A friend, Marc Blanchet, asked me last December whether it would be possible to use QUIC in space. Sure, the delays would be longer, but in theory it should be possible to scale the various time-related constants in the protocol, and then everything else should work. I waited until I had some free time, then took up the challenge, running a couple of simulations to see how Picoquic would behave on space links, such as between the Earth and Mars. I had already tested Picoquic on links with a 10 second round trip time (RTT), so there was hope.
First, I tried a simulation with a one-minute one-way delay. A bit short of Mars, but a good first step. Of course, the first trial did not work, because Picoquic was programmed with a “handshake completion timer” of 30 seconds, and the Picoquic server was enforcing a maximum idle timer of 2 minutes. There was already an API to set the idle timer, so I used it to set a value of at least 3 times the maximum supported RTT. Then, I updated the code to keep the handshake going until the larger of the 30 second default timer and the value of the idle timer. And, success, the handshake did work in the simulation. However, it was very noisy.
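The two timer rules just described can be sketched as follows. This is an illustration of the logic, not Picoquic's actual API; the function names are mine.

```python
# Sketch of the two timer rules described above. Function names are
# illustrative, not Picoquic's actual API.

def min_idle_timer(max_supported_rtt):
    """Idle timer set to at least 3x the maximum supported RTT."""
    return 3.0 * max_supported_rtt

def handshake_timeout(idle_timer):
    """Keep the handshake going for the larger of the 30 s default
    and the configured idle timer."""
    return max(30.0, idle_timer)

# One-minute one-way delay means a two-minute (120 s) RTT.
idle = min_idle_timer(120.0)        # 360 s
timeout = handshake_timeout(idle)   # 360 s, instead of the old 30 s
```

On a terrestrial path the same rule leaves the defaults untouched: with a sub-second RTT, `handshake_timeout` still returns 30 seconds.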
At the beginning of the connection, client and server do not know the RTT. The QUIC spec says to repeat the Initial packet if a response does not arrive within a timer, starting with a short initial timer value (Picoquic uses 250ms), and doubling that value after every repeat. That is a good exploration strategy, but Picoquic capped the timer at 1 second, so that on terrestrial links there would be enough trials on average to succeed even with 30% packet loss. With a two-minute RTT, that cap meant repeating the Initial packet more than 120 times before a response could possibly arrive. The fix was to make that cap a fraction of the idle timer value, which limited the count to about a dozen transmissions in our test. Still big, but acceptable.
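A rough model of that backoff makes the difference concrete. The cap of `idle_timer / 8` below is an illustrative guess, not the exact fraction Picoquic uses.

```python
# Rough model of the Initial retransmission backoff described above.
# The idle_timer / 8 cap is an illustrative guess, not Picoquic's
# exact fraction.

def initial_transmissions(rtt, first_timer=0.25, cap=1.0):
    """Count how many times the Initial is sent while waiting out one
    full RTT, with exponential backoff capped at `cap` seconds."""
    sends, waited, timer = 1, 0.0, first_timer
    while waited < rtt:
        waited += timer
        sends += 1
        timer = min(timer * 2, cap)
    return sends

rtt = 120.0            # one-minute one-way delay, two-minute RTT
idle_timer = 3 * rtt   # 360 s, as set for the test

initial_transmissions(rtt, cap=1.0)             # old 1 s cap: 120+ sends
initial_transmissions(rtt, cap=idle_timer / 8)  # new cap: about a dozen
```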
After the handshake, things get better, because both ends have at that point measured the RTT at least once. Most timer values used in the data transmission phase are proportional to this RTT, so they naturally adapt. The usual concern with long delay links is the duration of the slow start phase, during which the sender gradually increases the sending rate until the path bandwidth is assessed. The sending rate starts at a low value and is doubled every RTT, and for a 10 Mbps link that might require 5 or 6 RTTs. In our case, that would be 12 minutes before reaching full efficiency, which would not be good. But Picoquic already knew how to cope with that, because it had already been tested on satellite links.
Picoquic uses “chirping” to rapidly discover the path capacity. During the first RTT, Picoquic sends a small train of packets, measures the time between the first and last acknowledgement for that train, and gets a gross estimate of the link data rate. It then uses that estimate to accelerate the start-up algorithm (Picoquic uses Hystart) by propping up the sending rate. That works quite well for our long distance links, and we reach reasonable usage in 3 RTTs instead of 5. It could work even better if Picoquic used the full estimate provided by chirping, or maybe one derived from a previous connection, but estimates can be wrong, and we limit the potential damage by only using half their value.
Chirping takes care of congestion control, at least during startup, but we also have to consider flow control. If the client asks to “get this 100MB file” but flow control allows only 1MB at a time, the transmission on a very long delay link is going to take a very long time. But if the client says something like “get this 100MB file and, by the way, here are an extra 100MB of flow control credits”, the transmission will happen much faster. This is what we do in the tests, but it will have to be somehow automated in practical deployments.
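Some back-of-the-envelope numbers show why the extra credits matter. The model below is a rough lower bound that ignores slow start and link bandwidth: one round trip per window of flow-control credit.

```python
import math

# Back-of-the-envelope numbers for the flow-control point above, with
# a two-minute RTT (one-minute one-way delay). Rough lower bound:
# ignores slow start and bandwidth, counts one round trip per window.

FILE_SIZE = 100 * 10**6   # 100 MB transfer
RTT_MINUTES = 2.0

def transfer_time_minutes(flow_credit_bytes):
    round_trips = math.ceil(FILE_SIZE / flow_credit_bytes)
    return round_trips * RTT_MINUTES

transfer_time_minutes(1 * 10**6)     # 1 MB window: 200 minutes
transfer_time_minutes(100 * 10**6)   # full credit up front: 2 minutes
```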
Once we have solved congestion control and flow control, we need to worry about timers. In QUIC, most timers are proportional to the RTT, but a few are not. The idle timer is preset before the measurement, as discussed above. The BBR algorithm specifies a “probe RTT” interval of 10 seconds, which would not be good, but Picoquic was already programmed to use the max of that and 3 RTT. The main issue in the simulation was the “retire connection ID (CID)” interval.
Picoquic is programmed to switch to a new CID when resuming transmission after a long silence. This is a privacy feature: long silences often trigger a NAT rebinding, and changing the CID makes it harder for on-path observers to correlate the newly observed packets with the previous connection. However, the “long silence” was defined as 5 seconds, which is way too short in our case. We had to change that and define it as the larger of 5 seconds and 3 times the RTT.
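Both of the interval rules above follow the same pattern: the larger of a fixed floor and 3 RTTs. A minimal sketch, with illustrative function names:

```python
# Sketch of the two "minimum interval" rules above. The pattern is the
# same in both cases: max(fixed floor, 3 RTTs). Names are illustrative.

def probe_rtt_interval(rtt):
    """BBR probe RTT interval: at least 10 s, scaled on long-delay paths."""
    return max(10.0, 3.0 * rtt)

def cid_rotation_silence(rtt):
    """Silence duration after which a new CID is used: at least 5 s."""
    return max(5.0, 3.0 * rtt)

cid_rotation_silence(0.05)   # terrestrial path: the 5 s floor applies
cid_rotation_silence(120.0)  # one-minute one-way delay: 360 s
```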
With these changes, our “60 seconds delay” experiment was successful. That was a happy result, but Marc pointed out that 60 seconds is not that long. It takes more than 3 minutes to send a signal from Earth to Mars when Mars is at the closest distance, and 22 minutes when Mars is at the furthest. Sending signals to Jupiter takes 32 minutes to almost an hour, and to Saturn more than an hour. What if we repeated the experiment by simulating a 20 minute delay? Would things explode?
In theory, the code was ready for this 20 minute trial, but in practice it did in fact explode. Picoquic measures time in microseconds. 20 minutes is 1,200,000,000 microseconds. Multiply by 4 and you get a number that does not fit on 32 bits! The tests quickly surfaced these issues, and they had to be fixed. But after those fixes the transmissions worked as expected.
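The overflow is easy to reproduce. The snippet below emulates 32-bit unsigned arithmetic by masking Python's arbitrary-precision integers, the way a `uint32_t` would truncate the product.

```python
# The overflow described above, emulated by masking to 32 bits the
# way a uint32_t would truncate.

MICROS_20_MIN = 20 * 60 * 1_000_000   # 1,200,000,000 microseconds
product = 4 * MICROS_20_MIN           # 4,800,000,000: exceeds 2**32
as_uint32 = product & 0xFFFFFFFF      # what a 32-bit variable keeps

print(product > 2**32)   # True
print(as_uint32)         # 505032704: wrapped around, timers misfire
```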
I don’t know whether Picoquic will in fact be used in spaceships, but I found the exercise quite interesting. It reinforces my conviction that “if it is not tested, it does not work”. A bunch of little issues were found, which overall make the code more robust. And, well, one can always dream that QUIC will one day be used for transmissions between Earth and Mars.
I have been working on the Picoquic implementation of QUIC since 2017. Picoquic distinguished itself by performing very well on GEO satellite links. The main reason is that, 40 years ago, I was studying protocols for the transport of data over satellite links for my PhD. So, of course, I wanted to support that scenario well in my implementation of QUIC. Which explains why, this morning, someone was asking me about the ACK rate tuning work in Picoquic and why it gets good performance results over GEO. It turns out that I never wrote that down, so here it is.
Sending fewer ACKs reduces transmission overhead and message processing load, which is a good thing. Historically, ACKs were also used for ACK Clocking: if ACKs are sent very often, each one acknowledges few packets, and thus opens the congestion window just enough to allow a few more packets to be sent. If ACKs were too sparse, each would provide many credits, causing implementations to send packets in large bursts and maybe causing congestion on the path. But most implementations today implement some form of pacing, so ACK Clocking is no longer necessary to prevent such packet bursts. Of course, while having fewer ACKs reduces overhead, it also impacts RTT measurements and packet loss detection, so there is a limit to how few ACKs a transport implementation should send.
This was discussed in the QUIC Working Group. The discussions resulted in the publication of the QUIC Acknowledgement Frequency draft. The draft defines a QUIC control frame by which the sender of packets can tell receivers how many packets or how much time they should wait before sending an ACK. However, the draft only provides generic guidance on how these parameters shall be set. Picoquic implements the draft, and sets the packet threshold and ACK delay as follows:
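As a purely hypothetical illustration of the kind of rule involved, the parameters could be derived from the congestion window and the RTT. The divisors below are invented for the example; Picoquic's actual coefficients differ.

```python
# Hypothetical illustration of setting ack-frequency parameters.
# The divisors (cwnd/8, RTT/4) are invented for this example and are
# NOT Picoquic's actual coefficients.

def ack_frequency_params(cwnd_bytes, mtu_bytes, smoothed_rtt):
    """Ask the peer to ack once per fraction of the congestion window,
    but never to wait longer than a fraction of the RTT."""
    packet_threshold = max(2, (cwnd_bytes // mtu_bytes) // 8)
    ack_delay = smoothed_rtt / 4
    return packet_threshold, ack_delay

ack_frequency_params(120_000, 1_500, 0.6)   # GEO-like RTT of 600 ms
```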
The coefficients above were set in an empirical manner, based on simulations of a variety of network configurations. Each of these simulations is actually a test case in the Picoquic suite of tests, which would detect if a code change caused a performance regression in one of the configurations. These simulations include several GEO configurations, including for example the simulation of a high bandwidth data path and a low bandwidth return path. In that asymmetric configuration, having too many ACKs would cause congestion on the return path, but the chosen tunings avoid that.
In that asymmetric configuration, limiting the number of ACKs is not enough. QUIC ACK frames can grow very large if they are allowed to carry a large number of "ACK ranges". If the ACKs were too large, that too could saturate a narrow return path. Picoquic limits the number of ACK ranges to 32, and further limits the size of ACKs by not including ranges that are too old, were already acknowledged, or were already announced in 4 previous ACKs. And with all that, yes, we end up with good ACK behavior on GEO satellite links. And on other links too.
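The trimming rules can be sketched as a simple filter. The data representation below is mine, not Picoquic's internal structures.

```python
# Sketch of the ACK-size limits described above: keep at most 32
# ranges, and drop ranges already announced in 4 previous ACKs.
# The tuple representation is mine, not Picoquic's.

MAX_ACK_RANGES = 32
MAX_REPEAT_COUNT = 4

def ranges_to_include(ranges):
    """ranges: list of (lowest_pn, highest_pn, times_already_sent)
    tuples, newest first. Returns those worth putting in the next ACK."""
    fresh = [r for r in ranges if r[2] < MAX_REPEAT_COUNT]
    return fresh[:MAX_ACK_RANGES]

ranges_to_include([(10, 12, 0), (5, 7, 4), (1, 3, 1)])
# drops (5, 7, 4): that range was already announced 4 times
```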
My blog was first published on WordPress, but I am getting repeated feedback that not having advertisements would be better, and also that a blog on networking really should be accessible over IPv6. So, I am taking the plunge and migrating the blog to the server of my personal company, Private Octopus.
The new blog is published as a static web site, developed using Jekyll. The upside of Jekyll is that publishing a static web site is much simpler to manage than alternatives that require database management. The downside is that I have to find a way to accept comments. And I would rather do that without adding the bunch of trackers that come with the ready-made solutions of the age of surveillance capitalism.
Net result, the comment section is a bit experimental. I am integrating this with Mastodon, because I like the concept of decentralized social networks. The integration is inspired by the work of Yidhra Farm, which I hope I ported correctly.
People who install it love the Pi-hole. The Pi-hole is a DNS software server, typically running on a Raspberry Pi, that can filter the DNS requests coming out of a local network and, for example, drop connections to advertisers. Once you install that, web browsing becomes notably snappier, not to mention much more private. But there is a catch. If you also configure your device or your browser to connect to an encrypted service, for example using DNS over HTTPS (DoH) to route DNS requests to Google, Cloudflare or Quad9, the DNS requests will just go to that server, bypassing the Pi-hole and its filtering.
We have known of the tension between encrypted DNS and DNS filtering for some time. Encrypted DNS is specifically designed to prevent intermediaries from meddling with the DNS traffic. Paul Vixie, for example, has been arguing for a long time that DoH is shifting power from network managers to big sites, and is in effect building an unbreakable pipeline for serving ads. Of course, the more people deploy Pi-hole style filtering, the more advertisers will push for solutions like DoH that bypass these filters.
DNS filtering does improve privacy, in the same way that ad blockers do. Consider that when loading a page, the advertisement systems run an auction to determine which advertiser will win the right to place an impression on this page. In the process, your browser history is captured not just by the advertiser that wins the auction, but by all advertisers that participate in the auction. I personally believe that this trafficking in private data should be illegal, but as of now in the USA it is not. So, yes, any sensible Internet user should be blocking ads. But blocking ads is not enough. The DNS traffic still flows in clear text, and the upstream intermediaries still collect DNS metadata. They can and probably will sell your browser history to advertisers. They can also implement their own filtering, censoring sites for political or commercial reasons. Only encryption can prevent that.
So, what next? First, it is kind of obvious that systems like the Pi-hole should be using DNS encryption to fetch DNS data. Instead of merely passing the filtered DNS requests to the local server and relying on its goodwill, they should forward them to a trusted service using encrypted DNS, be it DoH, DNS over TLS (DoT) or DNS over QUIC (DoQ). It would also be a good idea to let the Pi-hole work as an encrypted DNS server, so that browsers could connect directly to it rather than going all the way to Google or Cloudflare, although the discovery protocols to enable just that are still being standardized in the IETF ADD WG. But systems like the Pi-hole only protect people when they are browsing at home, in the network in which they have installed the filters.
For mobile devices, we need end to end solutions, solutions that work whether or not the local network is doing appropriate filtering. We also want a solution that is independent of the advertisers, because I certainly agree with Paul Vixie that sending all your browser history to Google does not improve privacy. We cannot only rely on personal DNS servers, because then advertisers will be able to attribute server traffic to specific users. What we want are moderate scale servers, with enough users so that traffic cannot be directly traced to individuals. We know how to do that technically: deploy in the network something like a version of the Pi-hole capable of DNS encryption. What we don’t have is a model in which such servers can recoup their costs without relying on advertisements!
Back in May, there was a discussion on the QUIC implementers’ chat room. What if a client played games with acknowledgements during the initial exchange? Could it be used in a DOS amplification attack? Or maybe some other form of DOS attack?
In the proposed attack, the attacker sends a first packet, which typically contains the TLS 1.3 “Client Hello” and its extensions. The server will respond with up to three packets, the first of which will be an Initial packet containing the TLS 1.3 “Server Hello” and its clear text extensions, followed by Handshake packets containing the remainder of the TLS server flight, or at least part of it. These packets will be sent to the “client address” as read from the incoming UDP message. The attacker has set that to the IP address of the target, not its own. It will not receive these packets, and it cannot guess the encryption key required to decrypt the Handshake packets. But the attacker can guess that the packet number of the Initial packet was zero, because that's the default. The attacker can thus create a second Initial packet, in which it acknowledges the packet number 0 from the server.
One risk is that the server accepts the incoming acknowledgement and treats the IP address of the client as “validated”. The DOS amplification protections would be lifted, and the server may start sending more packets towards the IP address of the client – or rather, towards the IP address of the target in our case. That would be bad.
It turns out that the Picoquic implementation is protected against that attack: it will only validate the address if the incoming Initial packet carries the “connection ID” set by the server in its initial message. The only problem happens when the server uses a zero-length connection ID, a rare deployment, but one in which the fake packet would manage to validate the connection. That was a bug. It is now fixed. With that, the “fake acknowledgement” attack cannot be used to amplify DOS attacks.
That does not mean that the spoofed acknowledgement isn't bad. It remains one of the many ways for third parties to spoof packets and interfere with the state of the connection. It is also an attack that is quite easy to defeat: it depends on predicting the Initial packet number used by the server, and it is easy to randomize that number. A recent code update makes that the default behavior. As an added benefit, this means that acknowledgements of randomized packet numbers can be used to validate the client address, which improves the robustness of the connection establishment.
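The fix is a one-liner in spirit. This sketch shows the idea; the range is illustrative, not the one the code actually uses.

```python
import secrets

# Sketch of the fix described above: start the server's Initial packet
# numbers at a random value, so an off-path attacker cannot guess which
# number to acknowledge. The range below is illustrative.

def random_initial_packet_number():
    """Random starting packet number instead of the predictable 0."""
    return secrets.randbelow(2**24)

# A spoofed ACK for packet 0 now almost never matches a real packet,
# and a matching ACK is strong evidence the peer really saw our packet.
```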