Christian Huitema's Latest Posts

Header image: cloudy sky, waves on the sea, the sun shining.

The list of past posts, including those previously published on WordPress, is available here.

Migrating this blog to Private Octopus

Posted on 29 Oct 2022

My blog was first published on WordPress, but I am getting repeated feedback that not having advertisements would be better, and also that a blog on networking really should be accessible over IPv6. So, I am taking the plunge and migrating the blog to the server of my personal company, Private Octopus.

The new blog is published as a static web site, developed using Jekyll. The upside of Jekyll is that publishing as a static web site is much simpler to manage than alternatives that require database management. The downside is that I have to find a way to accept comments. And I would rather do that without adding the bunch of trackers that come with ready-made solutions from the age of surveillance capitalism.

Net result, the comment section is a bit experimental. I am integrating this with Mastodon, because I like the concept of decentralized social networks. The integration is inspired by the work of Yidhra Farm, which I hope I ported correctly.
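
For the curious, here is a minimal sketch of how such an integration can work, assuming each post stores the ID of the Mastodon status that announces it. The server name and status ID below are placeholders, not the actual configuration of this blog.

```python
# Minimal sketch: fetch replies to a Mastodon status and render them as
# comments on a static page. Assumes each blog post stores the ID of the
# Mastodon status announcing it; server name and status ID are placeholders.
import requests

MASTODON_SERVER = "https://example.social"   # hypothetical instance
STATUS_ID = "109012345678901234"             # hypothetical status ID

def fetch_comments(server, status_id):
    # The public "context" endpoint returns the ancestors and descendants of
    # a status; the descendants are the replies we display as comments.
    url = f"{server}/api/v1/statuses/{status_id}/context"
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json().get("descendants", [])

if __name__ == "__main__":
    for reply in fetch_comments(MASTODON_SERVER, STATUS_ID):
        author = reply["account"]["acct"]
        # "content" is HTML; a real integration would sanitize it first.
        print(f"{author}: {reply['content']}")
```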

The Pi-hole and DNS privacy

Posted on 14 Aug 2022

People who install it love the Pi-hole. The Pi-hole is a software DNS server, typically running on a Raspberry Pi, that can filter the DNS requests coming out of a local network and, for example, drop connections to advertisers. Once you install it, web browsing becomes notably snappier, not to mention much more private. But there is a catch. If you also configure your device or your browser to connect to an encrypted service, for example using DNS over HTTPS (DoH) to route DNS requests to Google, Cloudflare or Quad9, the DNS requests will just go to that server, bypassing the Pi-hole and its filtering.

We have known of the tension between encrypted DNS and DNS filtering for some time. Encrypted DNS is specifically designed to prevent intermediaries from meddling with the DNS traffic. Paul Vixie, for example, has been arguing for a long time that DoH is shifting power from network managers to big sites, and is in effect building an unbreakable pipeline for serving ads. Of course, the more people deploy Pi-hole style filtering, the more advertisers will push for solutions like DoH that bypass these filters.

DNS filtering does improve privacy, in the same way that ad blockers do. Consider that when loading a page, the advertisement systems run an auction to determine which advertiser will win the right to place an impression on that page. In the process, your browser history is captured not just by the advertiser that wins the auction, but by all advertisers that participate in it. I personally believe that this trafficking in private data should be illegal, but as of now in the USA it is not. So, yes, any sensible Internet user should be blocking ads. But blocking ads is not enough. The DNS traffic still flows in clear text, and the upstream intermediaries still collect DNS metadata. They can and probably will sell your browser history to advertisers. They can also implement their own filtering, censoring sites for political or commercial reasons. Only encryption can prevent that.

So, what next? First, it is kind of obvious that systems like the Pi-hole should be using DNS encryption to fetch DNS data. Instead of merely passing the filtered DNS requests to the local server and relying on its goodwill, they should forward them to a trusted service using encrypted DNS, be it DoH, DNS over TLS (DoT) or DNS over QUIC (DoQ). It would also be a good idea to let the Pi-hole work as an encrypted DNS server, so that browsers could connect directly to it rather than going all the way to Google or Cloudflare, although the discovery protocols to enable just that are still being standardized in the IETF ADD WG. But systems like the Pi-hole only protect people when they are browsing at home, in the network in which they have installed the filters.
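
As a rough illustration of that first point, here is a small sketch of a filter-then-forward resolver. The blocklist entries are made up, and the upstream shown is Cloudflare's public DoH JSON endpoint, but any trusted DoH service would work.

```python
# Minimal sketch of the idea: check a local blocklist, then forward the
# query to an encrypted resolver over DoH instead of plain UDP port 53.
import requests

BLOCKLIST = {"ads.example.com", "tracker.example.net"}  # illustrative entries
DOH_ENDPOINT = "https://cloudflare-dns.com/dns-query"

def resolve(name, rrtype="A"):
    if name in BLOCKLIST:
        return []  # filtered: behave like the Pi-hole and return no answer
    # Forward over HTTPS so upstream intermediaries cannot read or log the query.
    response = requests.get(
        DOH_ENDPOINT,
        params={"name": name, "type": rrtype},
        headers={"accept": "application/dns-json"},
        timeout=5,
    )
    response.raise_for_status()
    return response.json().get("Answer", [])

print(resolve("ads.example.com"))   # blocked -> []
print(resolve("www.example.com"))   # resolved over DoH
```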

For mobile devices, we need end-to-end solutions: solutions that work whether or not the local network is doing appropriate filtering. We also want a solution that is independent of the advertisers, because I certainly agree with Paul Vixie that sending all your browser history to Google does not improve privacy. We cannot rely only on personal DNS servers, because then advertisers would be able to attribute server traffic to specific users. What we want are moderate scale servers, with enough users that traffic cannot be directly traced to individuals. We know how to do that technically: deploy in the network something like a version of the Pi-hole capable of DNS encryption. What we don’t have is a model in which such servers can recoup their costs without relying on advertisements!

Fixing a potential DOS issue in the QUIC handshake

Posted on 17 Jul 2022

Back in May, there was a discussion on the QUIC implementers’ chat room. What if a client played games with acknowledgements during the initial exchange? Could it be used in a DOS amplification attack? Or maybe some other form of DOS attack?

In the proposed attack, the attacker sends a first packet, which typically contains the TLS 1.3 “Client Hello” and its extensions. The server will respond with up to three packets, the first one of which will be an Initial packet containing the TLS 1.3 “Server Hello” and its clear text extensions, followed by Handshake packets containing the remainder of the TLS server flight, or at least part of it. These packets will be sent to the “client address” as read from the incoming UDP message. The attacker has set that address to the IP address of the target, not its own. It will not receive these packets, and it cannot guess the encryption key required to decrypt the Handshake packets. But the attacker can guess that the packet number of the Initial packet was zero, because that’s the default. The attacker can thus create a second Initial packet, in which it acknowledges the packet number 0 from the server.

Spoofed ACK attack for DDOS amplification: a modified QUIC handshake in which the spoofed ACK validates the connection and the server sends many packets to the target.

One risk is that the server accepts the incoming acknowledgement and treats the IP address of the client as “validated”. The DOS amplification protections would be lifted, and the server may start sending more packets towards the IP address of the client – or rather, towards the IP address of the target in our case. That would be bad.
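
For reference, this is the kind of accounting QUIC servers perform before an address is validated. The sketch below is illustrative, not picoquic's actual code; it simply applies the anti-amplification limit from the QUIC specification, which lets a server send at most three times the bytes it has received from an unvalidated address.

```python
# Illustrative sketch of QUIC's anti-amplification rule: until the client
# address is validated, the server may send at most three times the number
# of bytes it has received from that address.
class AddressState:
    AMPLIFICATION_FACTOR = 3

    def __init__(self):
        self.bytes_received = 0
        self.bytes_sent = 0
        self.validated = False

    def on_datagram_received(self, size):
        self.bytes_received += size

    def can_send(self, size):
        if self.validated:
            return True
        return self.bytes_sent + size <= self.AMPLIFICATION_FACTOR * self.bytes_received

    def on_datagram_sent(self, size):
        self.bytes_sent += size
```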

It turns out that the picoquic implementation is protected against that attack: it will only validate the address if the incoming Initial packet carries the “connection ID” set by the server in its initial message. The only problem happens when the server uses a zero-length connection ID – a rare deployment, but one in which the fake packet would manage to validate the connection. That was a bug. It is now fixed. With that, the “fake acknowledgement” attack cannot be used to amplify DOS attacks.

That does not mean that the spoofed acknowledgement isn’t bad. It becomes just one of the many ways for third parties to spoof packets and interfere with the state of the connection. It is also an attack that is quite easy to fix. The attack depends on predicting the Initial packet number used by the server. It is easy to randomize that number. A recent code update just makes that the default behavior. As an added benefit, this means that acknowledgement of randomized packet numbers can be used to validate the client address, which improves the robustness of the connection establishment.
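
Here is a small sketch of that defense, with illustrative names rather than the actual picoquic structures: the server picks a random first Initial packet number, and treats an acknowledgement of that number as proof that the client really received its packets.

```python
# Sketch of the defense described above: if the server's first Initial packet
# number is random rather than zero, an acknowledgement of that number proves
# the peer actually received the server's packet, so it can safely be used to
# validate the client address. Names and ranges are illustrative.
import secrets

class ServerHandshakeState:
    def __init__(self):
        # Random starting packet number instead of the predictable 0.
        self.first_initial_pn = secrets.randbelow(2**20)
        self.address_validated = False

    def on_initial_ack(self, acked_pn, carries_server_chosen_cid):
        # Validate only if the ACK covers a packet number the attacker could
        # not guess, or if the packet echoes the server-chosen connection ID.
        if acked_pn == self.first_initial_pn or carries_server_chosen_cid:
            self.address_validated = True
```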

A Low Latency Internet with L4S

Posted on 05 Jul 2022

Many network applications work better with low latency: not only video conferences, but also video games and many other transaction-based applications. The usual answer is to implement some form of QoS control in the network, managing “real-time” applications as a class apart from bulk transfers. But QoS is a special service, often billed as an extra, requiring extra management, and relying on the cooperation of all network providers on the path. That has proven very hard to deploy. What if we adopted a completely different approach, in which all applications benefited from lower latency? That is basically the promise of the L4S architecture, which has two components: a simple Active Queue Management (AQM) scheme implemented in network routers provides feedback to the applications through Explicit Congestion Notification (ECN) marks, and an end-to-end congestion control algorithm named “Prague” uses these marks to regulate network transmissions and avoid building queues. The goal is to obtain Low Latency and Low Losses, the four Ls in L4S – the final S stands for “Scalable Throughput”.
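
To give an idea of what the sender side looks like, here is a rough sketch of the DCTCP-style response on which Prague builds; the constants and names are illustrative, not the actual Prague or picoquic code.

```python
# Rough sketch of the DCTCP-style response that Prague builds on: the sender
# tracks the fraction of ECN CE-marked packets per round trip, keeps a
# smoothed estimate "alpha", and shrinks the congestion window in proportion
# to alpha instead of halving it at the first mark. Constants are illustrative.
class PragueLikeController:
    def __init__(self, cwnd_bytes, g=1 / 16):
        self.cwnd = cwnd_bytes
        self.alpha = 0.0      # smoothed fraction of marked packets
        self.g = g            # smoothing gain

    def on_round_trip_end(self, packets_acked, packets_ce_marked):
        if packets_acked == 0:
            return
        marked_fraction = packets_ce_marked / packets_acked
        # Exponentially weighted moving average of the marking fraction.
        self.alpha = (1 - self.g) * self.alpha + self.g * marked_fraction
        # Reduce the window in proportion to the observed congestion level,
        # never below a floor of two 1200-byte packets.
        self.cwnd = max(2 * 1200, self.cwnd * (1 - self.alpha / 2))
```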

I am developing an implementation of the Prague algorithm for QUIC in Picoquic, updating an early implementation by Quentin Deconinck (see this PR). I am testing the implementation using network simulations. The first test uses L4S to control the transfer of 4 successive 1MB files over a simulated 10 Mbps connection, and the following QLOG visualization shows that after the initial "slow start", the network traffic is indeed well controlled:


QLOG Visualization, simulation with Prague CC

For comparison, using the classic New Reno algorithm on the same simulated network shows a very different behavior:


QLOG Visualization, simulation with New Reno CC.

The Reno graph shows the characteristic saw-tooth pattern, with the latency climbing as the buffers fill up, until some packets are dropped and the window size is reset. In contrast, the Prague trace shows a much smoother behavior. It is also more efficient.

The implementation effort did find some issues with the present specification: the Prague specification requires smoothing the ECN feedback over several round trips, but the reaction to congestion onset is then too slow; the redundancy between early ECN feedback detecting congestion and packet losses caused by that congestion needs to be managed; something strange is happening when traffic resumes from an idle period; and more effort should be applied to managing the start-up phase. I expect that these will be fixed as the specification progresses in the IETF. In any case, these results are quite encouraging, showing that L4S could indeed deliver on the promise of low loss and low latency, without causing a performance penalty.

Other protocols, like BBR, also deliver low latency without requiring network changes, but there are limits: they still suffer from congestion caused by applications using Cubic, and they react slowly to changes in the underlying network. BBR is worse than Prague in those respects, but it performs better on networks that do not yet support L4S, or on high-speed networks. Ideally, we should improve BBR by incorporating the L4S mechanisms. With that, we could evolve the Internet to always deliver low latency, without having to bother with the complexity of managing QoS. That would be great!

Edited, a few hours later:

It turns out that the "strange [thing] happening when traffic resumes from an idle period" was due to an interaction between the packet pacing implementation in Picoquic and the L4S queue management. Pacing is implemented as a leaky bucket, with the rate set to the data rate computed by congestion control and the bucket size set large enough to send small batches of packets. When the source stops sending for a while, the bucket fills up. When it starts sending again, the full bucket allows for quickly sending this batch of packets. After that, pacing kicks in. This builds up a small queue, which is generally not a big issue. But if the pacing rate is larger than the line rate, even by a tiny bit, the queue does not drain, and pretty much all packets get marked as "congestion experienced."
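
Here is a simplified sketch of that kind of leaky-bucket pacer, with illustrative names rather than the actual Picoquic code.

```python
# Sketch of the leaky-bucket pacer described above: credit accumulates at the
# pacing rate up to a fixed bucket size, and a packet may be sent whenever
# enough credit is available. After an idle period the bucket is full, so a
# burst of packets leaves back to back before pacing takes over again.
class LeakyBucketPacer:
    def __init__(self, pacing_rate_bps, bucket_bytes):
        self.pacing_rate = pacing_rate_bps / 8.0  # bytes per second
        self.bucket_size = bucket_bytes
        self.credit = bucket_bytes                # start with a full bucket
        self.last_update = 0.0

    def update(self, now):
        # Refill credit for the elapsed time, capped at the bucket size.
        self.credit = min(self.bucket_size,
                          self.credit + (now - self.last_update) * self.pacing_rate)
        self.last_update = now

    def can_send(self, now, packet_bytes):
        self.update(now)
        return self.credit >= packet_bytes

    def on_sent(self, packet_bytes):
        self.credit -= packet_bytes
```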

This was fixed by doing a test on the time needed to send a full window of packets. If the sender learns that a lot of packets were marked on a link that was not congested before, and that the "epoch" lasted longer than the RTT, then it classifies the measurement as "suspect" and only slows down transmission a little. With that, the transmission becomes even smoother, and the transfer time in the test drops to less than 3.5 seconds, as seen in the updated visualization.
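
The heuristic can be sketched as follows; the threshold and scaling factors are illustrative, not the values used in Picoquic.

```python
# Sketch (not Picoquic's actual code) of the heuristic described above: if a
# large fraction of packets is suddenly marked on a path that showed no prior
# congestion, and the measurement epoch lasted longer than one RTT, the sample
# is treated as "suspect" and the window is only trimmed slightly.
def congestion_response(cwnd, marked_fraction, epoch_duration, rtt, was_congested):
    suspect = (not was_congested
               and marked_fraction > 0.5
               and epoch_duration > rtt)
    if suspect:
        return cwnd * 0.98                       # slow down only a little
    return cwnd * (1 - marked_fraction / 2)      # normal proportional reduction
```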

An introduction service for Mastodon

Posted on 28 Apr 2022

It seems that many people have started using Mastodon recently. But the place feels a bit empty, especially if, like me, you join a small server. How do I find my friends on Mastodon? Today, there are two ways: they tell me their handle using some other service, or I stumble upon one of their posts promoted by a common friend. This might work eventually, but it is much less straightforward than just searching by name or email address, something that is easy on Facebook. Of course, finding people should not be too easy, because that would provide avenues for spam or harassment. Facebook provides options, such as only making your account visible to “your friends”, or “the friends of your friends”. These options, and search in general, are straightforward to implement on a centralized server. Can we implement them in a decentralized network?

One obvious and wrong solution would be to broadcast search requests to every server in the whole network. That’s obviously wrong because it creates a huge load, and because it broadcasts private information to a lot of servers who don’t need to know about it. Another obvious and wrong solution is to create a distributed database of all registered names, maybe as a distributed hash table. That would create less load than broadcasting all name queries, but that would still broadcast the name information and create big privacy issues.

My preferred solution would be for users to send the queries to all their friends, as in “Hey, Bob, this is Alice, do you know Carol’s handle?” The load is lower than broadcasting queries. The privacy risks are more contained, although not eliminated. The queries will likely be visible to Bob’s server, and that server can build a database of all of Alice’s friends. Big servers would build bigger databases. This might provide incentives for centralization.

Can we design the queries in ways that minimize the privacy risks? We could use end-to-end encryption, so that the queries are only visible to the actual users, not their servers. We could also encode the queries, so that they carry a one-way hash of the queried name, as in “Hey, Bob, this is Alice, do you know the handle of the user whose name hashes with this nonce to produce 0xdeadbeef?” Bob will have to compute the hashes of all his friends’ names, see whether one matches, and then return it.
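
A minimal sketch of such an encoded query, using a fresh nonce and SHA-256 as the one-way hash, could look like this; the names and handles are of course made up.

```python
# Sketch of the encoded query described above: Alice sends a nonce and the
# hash of the name she is looking for; Bob hashes each of his friends' names
# with the same nonce and answers only if one of them matches.
import hashlib
import secrets

def hash_name(name, nonce):
    return hashlib.sha256(nonce + name.lower().encode()).hexdigest()

def build_query(looked_up_name):
    nonce = secrets.token_bytes(16)
    return nonce, hash_name(looked_up_name, nonce)

def answer_query(nonce, hashed_name, friends):
    # "friends" maps display names to handles, e.g. {"Carol": "@carol@example.social"}
    for name, handle in friends.items():
        if hash_name(name, nonce) == hashed_name:
            return handle
    return None

nonce, query = build_query("Carol")
print(answer_query(nonce, query, {"Carol": "@carol@example.social"}))
```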

Of course, the security worries do not end there. If we have any kind of automated search, it can be abused. Spammers do that on Facebook today: they manage to befriend someone, download the list of that person’s friends, and then try to befriend every person in that list. Encrypting or hashing queries will not prevent that, but something else might. For example, when receiving the query, Bob might relay it to Carol: “Hey, Carol, this is Bob. I got a message from Alice asking for your handle, shall I respond?” Carol can then decide.

We could also make the messages more complex. Facebook, for example, decorates friend requests with lists of common friends, so that the user can make an informed decision. This falls into the general category of “introduction services”. These are hard to design, but we must start somewhere. It would make a ton of sense to add a distributed introduction service to a distributed social network, so maybe we should just start there.