02 Apr 2022
The QUIC specification was finalized by the IETF a year ago, but the interop tests still detect occasional failures. The most challenging tests are series of 50 connection attempts in a lossy environment, with about 30% packet loss. The tests only pass if all attempts succeed. You might think that 30% packet loss is not a realistic environment, but it is a very good way to test the implementation of the “state machine” and detect loose ends. One specific loose end is the interaction between packet losses and the “anti amplification” feature of QUIC, which requires servers to limit how much data they send until the client has completed a three-way handshake.
The typical QUIC handshake proceeds as shown in figure 1: the client sends an “initial” packet, the server responds with its own initial packet that provides the “server hello” and the “handshake” key, and then a couple of “handshake” packets that enable the client to compute the “application” key and finalize the exchange. The important elements there are that the client cannot decrypt the handshake packets if it has not received the server's initial packet, and that it will not be able to send its own crypto data until it has received the server's handshake packets. The other important element is that by the time the server has sent its initial and handshake packets, it has exhausted its “anti-amplification” credits, and will not send any more data until it receives a packet from the client that completes the three-way handshake. Now, let’s examine what happens if some packets are lost.
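To make the “credits” idea concrete, here is a minimal sketch of the accounting a server might keep per client address. It relies on the rule in section 8.1 of RFC 9000 that a server may send at most three times the bytes it has received from an unvalidated address. The class and method names are hypothetical, not taken from any particular implementation.

```python
from dataclasses import dataclass

# RFC 9000, section 8.1: before address validation, a server may send
# at most three times the number of bytes received from that address.
AMPLIFICATION_FACTOR = 3


@dataclass
class AmplificationBudget:
    """Hypothetical per-address send credit kept by a server."""
    bytes_received: int = 0
    bytes_sent: int = 0
    address_validated: bool = False

    def on_datagram_received(self, size: int, is_handshake_packet: bool = False) -> None:
        # Every received byte adds credit, including repeated client Initials.
        self.bytes_received += size
        # A successfully processed Handshake packet from the client
        # validates the address and lifts the limit.
        if is_handshake_packet:
            self.address_validated = True

    def available(self) -> float:
        if self.address_validated:
            return float("inf")
        return max(0, AMPLIFICATION_FACTOR * self.bytes_received - self.bytes_sent)

    def try_send(self, size: int) -> bool:
        # Refuse to send if the datagram would exceed the remaining credit.
        if size > self.available():
            return False
        self.bytes_sent += size
        return True


budget = AmplificationBudget()
budget.on_datagram_received(1200)          # client Initial
sent = [budget.try_send(1200) for _ in range(4)]
print(sent)                                 # the fourth send is refused
```

With a 1200-byte client Initial, the server gets 3600 bytes of credit, which is roughly one Initial plus a couple of Handshake packets. After that, it stays silent until the client sends something.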
If the client initial is lost, the client will repeat it after a timeout. Similarly, if the client handshake packet is lost, the client will repeat it. All implementations do that. But there are more interesting failures if some of the server packets are lost. The first case is, what if the server’s initial packet is lost?
The client that does not receive the server’s initial packet has only one option, because it does not know whether its own packet was lost or whether the server packet was lost. In theory, the client could notice the arrival of handshake packets, but those packets cannot be decrypted yet. The client will thus repeat its own Initial packet. When it receives that, the server has multiple options. A simple and not very helpful one would be to just send back an initial acknowledgement, telling the client to stop repeating. The really bad option is to just repeat the handshake packets, because the client cannot understand them. The more helpful option is to repeat the server’s Initial packet, including the crypto frames, so the client can start processing handshake messages. Proactively repeating the handshake messages as well would also be helpful. Most implementations handle that case correctly.
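The “helpful” server behavior can be sketched as a small planning function: on a duplicate client Initial, resend the server Initial first, then as many Handshake packets as the remaining anti-amplification credit allows. This is only an illustration; the function name, the packet labels and the sizes are assumptions.

```python
def plan_retransmission(credit: int, initial_size: int, handshake_sizes: list[int]) -> list[str]:
    """Return labels of the packets to resend after a duplicate client Initial.

    credit: bytes the server may still send to the unvalidated address.
    initial_size: size of the server Initial (ACK plus CRYPTO frames).
    handshake_sizes: sizes of the server Handshake packets, in order.
    """
    plan: list[str] = []
    # The Initial goes first: without it the client cannot derive the
    # handshake keys, so Handshake packets alone would be useless.
    if initial_size > credit:
        return plan
    plan.append("initial")
    credit -= initial_size
    # Proactively repeat Handshake packets while credit remains.
    for index, size in enumerate(handshake_sizes):
        if size > credit:
            break
        plan.append(f"handshake[{index}]")
        credit -= size
    return plan


# A repeated 1200-byte client Initial adds 3600 bytes of credit.
print(plan_retransmission(3600, 1200, [1200, 1200, 800]))
```

Putting the Initial first is the point of the sketch: whatever credit is left over after that can be spent on Handshake packets, never the other way around.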
The more difficult case is what happens if the handshake packets are lost, and specifically if the client’s initial packet is acknowledged.
Suppose that the handshake packets sent by the server are lost. The client has received the initial packet from the server and can now send or receive handshake packets. The natural option for the client is to send an initial acknowledgement, but if that packet is lost the whole exchange is going to stop:
That scenario has occurred many times during interop testing. The client has to be programmed to break that silence by repeating some packets. There are three options:
My preference is to repeat a Handshake packet, because this demonstrates to the server that the client has received all the required information in the Initial packet, and thus is more or less equivalent to sending both an Initial packet with an ACK and the handshake packet. The handshake packet cannot carry an acknowledgement, because no handshake packet has been received, but it can carry a PING frame. If the server is smart, it will decide to immediately repeat the missing packets, but if it is not, that might require another round trip, as shown in the following figure:
The first handshake packet received by the server from the client completes the three-way handshake and gets the server out of anti-amplification mode. The not-too-smart server will acknowledge that packet, and the smarter client will immediately acknowledge that Handshake packet, even though it only contained an ACK frame. When the client acknowledges the 3rd handshake packet of the server, most servers will immediately repeat their initial handshake packets, and the QUIC handshake will proceed.
In truth, this procedure is already specified in section 8.1 of RFC 9000. “Loss of an Initial or Handshake packet from the server can cause a deadlock if the client does not send additional Initial or Handshake packets. A deadlock could occur when the server reaches its anti-amplification limit and the client has received acknowledgments for all the data it has sent. In this case, when the client has no reason to send additional packets, the server will be unable to send more data because it has not validated the client's address. To prevent this deadlock, clients MUST send a packet on a Probe Timeout (PTO); see Section 6.2 of [QUIC-RECOVERY]. Specifically, the client MUST send an Initial packet in a UDP datagram that contains at least 1200 bytes if it does not have Handshake keys, and otherwise send a Handshake packet.” But that short paragraph is often glossed over. Let’s hope that belaboring the point a little will help reduce the number of interop failures!
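For implementers, the client-side rule quoted above boils down to a very small decision on each Probe Timeout. The sketch below is one way to write it. The names are hypothetical, and the choice of a PING frame as the probe payload is an assumption for the case where the client has nothing else to send.

```python
from typing import NamedTuple

# RFC 9000, section 8.1: a probe Initial must fill a UDP datagram
# of at least 1200 bytes.
MIN_INITIAL_DATAGRAM_SIZE = 1200


class Probe(NamedTuple):
    packet_type: str        # "initial" or "handshake"
    frames: tuple           # frames to carry; PING is an assumed default
    min_datagram_size: int  # pad the UDP datagram up to this size


def choose_pto_probe(has_handshake_keys: bool) -> Probe:
    """Pick the packet a client sends on PTO while the handshake is incomplete."""
    if has_handshake_keys:
        # A Handshake packet proves that the server Initial was received,
        # which lets the server lift its anti-amplification limit.
        return Probe("handshake", ("PING",), 0)
    # Without Handshake keys, the only option is a padded Initial,
    # which also gives the server fresh sending credit.
    return Probe("initial", ("PING",), MIN_INITIAL_DATAGRAM_SIZE)


print(choose_pto_probe(has_handshake_keys=False))
print(choose_pto_probe(has_handshake_keys=True))
```

The important part is not the payload but the fact that the client keeps sending something: as long as the handshake is incomplete, silence on the client side can leave the server blocked.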