27 Jan 2021
Yesterday, I was providing a broad brush description of the implementation of multipath QUIC in picoquic. Today, I will present a bit more details on a specific point, the measurement of transmission delays. Just like TCP, QUIC implementations use round trip time (RTT) estimates in many ways:
In TCP, or in “single path” versions of QUIC, the RTT is estimated by comparing the time at which a packet is acknowledged to the time at which it was sent. With multipath versions of QUIC this simple definition is insufficient, because the packets and their acknowledgements are not always sent on the same path. Consider for example the test case in which in which the first path is a low bandwidth and low delay “terrestrial” path, and the second path is a high bandwidth and high delay “satellite” path. Assume that the one-way delay is 10 ms on the terrestrial paths, and 300 ms on the satellite paths. Depending on the circumstances, we can experience different “round trip” times:
For QUIC, the transport specification states that the RTT is measured “per path”, so we know that we should maintain separate statistics for the satellite and the terrestrial link. But even that simple specification is ambiguous. For example, the satellite path RTT will be computed as mixing some measurement of about 600ms and other measurements of about 310ms, leading to some confusion.
One way to remove that confusion would be to mandate always sending acknowledgements on the same path on which packets are received, but this is suboptimal. Receiving acknowledgement through the shorter path result in a shorter control loop, and thus better performance. Also, there is sometimes a need to send acknowledgements for packets received on a path that has since been abandoned. The alternative would be to not acknowledge these packets, but that would cause undue retransmissions.
In the picoquic implementation, I use timestamps as defined in the QUIC timestamp extension draft. Since each acknowledgement carries a timestamp, the code can estimate the two one-way delays: from send time to time stamp one way, from time stamp to acknowledgement time the other way. With that, the code can maintain one way delay estimates for each path and each direction, and also estimates of the delay deviation for each one-way path. When time stamps are not available, the code use the estimate of delay deviations to apportion delays to each path, using formulas similar to those of a Kalman filter.
In practical deployments there is a degree of uncertainty in these measurements because time stamps are not derived from synchronized global clocks. The code must deal with a phase uncertainty and with a possible clock skew, as explained in the timestamp extension draft. In practice, the phase uncertainty is not a problem, because delay-based decisions either only consider variations in one way delays, or the sum of one way delays in both directions. The phase uncertainty does not affect the measurement of variations, and it disappears when adding measurements from each direction. Clock skew is a bigger problem, but it mostly affect the few connections that last a long time. The solution is probably to implement a simplified version of the NTP algorithms. Another item for the to-do list.
Computing one-way delays requires getting series of delay measurements covering various combination of paths. This does not always happen naturally, for example in client-server scenarios in which the server sends data packets and the client mostly sends acknowledgements. By default, the server will not send acknowledgements of ack-only packets, and the client will not get enough measurements to maintain valid delay estimates. In the picoquic implementation, this is solved by frequently adding PING frames to acknowledgement packets.
Delay measurements are used in the code in a variety of ways. The picoquic implementation of HYSTART tracks the evolution of one-way delays during slow start. The bandwidth estimation in BBR measures the amount of data received between the time stamps at beginning and end of a measurement period. The delay measurements are also used to compute packet loss timers, acknowledgement delays, and reordering delays. In all these areas, the multipath implementation uses conservative tradeoffs. Packet loss timers are based on the worst case scenarios, combining the one-way delay on the sending path and the worse one-way delays of all possible acknowledgement paths. The acknowledgement delay is computed from the RTT of the fastest path. The initial value of reordering delays is set to fraction of the one-way delay of the sending path. Some tuning will certainly be needed in the future.