24 Mar 2025
The IETF draft defining the multipath extension for QUIC is almost ready, but there was an interesting debate during the IETF meeting in Bangkok about its status. One of the reasons is that we have only a few implementations, and not all that much deployment in production. The other reason is that congestion control for multipath is still somewhat experimental. The draft is carefully worded to only use stable references, merely following RFC9000 and stating that there should be an independent congestion controller for each path, meaning that a QUIC multipath connection with N paths will be equivalent to N single-path connections. That seems reasonable, but it is in fact a bit of a low bar. For example, in RFC6356, we see three more ambitious goals:
Goal 1 (Improve Throughput) A multipath flow should perform at least as well as a single path flow would on the best of the paths available to it.
Goal 2 (Do no harm) A multipath flow should not take up more capacity from any of the resources shared by its different paths than if it were a single flow using only one of these paths. This guarantees it will not unduly harm other flows.
Goal 3 (Balance congestion) A multipath flow should move as much traffic as possible off its most congested paths, subject to meeting the first two goals.
These goals are implicitly adopted in the latest specification for Multipath TCP in RFC8684.
The QUIC extension for multipath only mentions a subset of these goals, stating that congestion control must be per-path, as specified in RFC9000. The guiding principle is the same as Goal 2 of RFC6356, ensuring that a QUIC multipath connection using multiple paths will not use more resources than a set of independent QUIC connections running on each of these paths.
We could have a philosophical discussion about what constitutes “fairness” in multipath. On one hand, we could consider that a connection between two endpoints A and B should be “fair” versus any other connection between these points. If it can use Wi-Fi and 5G at the same time, that means it should not be using more resources than the best of Wi-Fi and 5G. On the other hand, users can argue that they are paying for access to both networks, and should be able to use the sum of the resources provided by these two networks. The first approach requires users to be cooperative, the second accepts that users will be selfish. In practice, it is probably better to engineer the network as if users would be selfish, because indeed they will be. But even if we do accept selfishness, we still have at least two congestion control issues: shared bottlenecks and simultaneous backup.
Suppose that a QUIC connection uses two paths, defined by two sets of source and destination IP addresses. In practice, QUIC paths are initiated by clients, and the two paths are likely to use two different client IP addresses, but a single server address. We could easily find configurations in which the server-side connection is the bottleneck. In that case, the congestion controllers on the two paths could be playing a zero-sum game. If one increases its sending rate, the other one will experience congestion and will need to slow down.
The counter-productive effects of this kind of shared path competition can be mitigated by “coupled congestion control” algorithms. The idea is to detect that multiple paths are using the same bottleneck, for example by noticing that congestion signals like packet losses, ECN marks or delay increases are happening simultaneously on these paths. When that happens, it might be wise for the congestion controllers on each path to cooperate, for example by increasing sending rates more slowly than if they were alone.
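To make the detection idea concrete, here is a minimal sketch of one possible heuristic: count how often a congestion event on one path coincides with a congestion event on the other. This is only an illustration, not something taken from the draft or from any implementation. The function names, the 50 ms coincidence window and the 0.7 threshold are all assumptions picked for the example.

```python
from bisect import bisect_left

def coincidence_ratio(events_a, events_b, window):
    """Fraction of congestion events on path A that have at least one
    congestion event on path B within +/- `window` seconds.
    Both inputs are sorted lists of timestamps (hypothetical format)."""
    if not events_a or not events_b:
        return 0.0
    matched = 0
    for t in events_a:
        # First event on path B that is not earlier than t - window.
        i = bisect_left(events_b, t - window)
        if i < len(events_b) and events_b[i] <= t + window:
            matched += 1
    return matched / len(events_a)

def likely_shared_bottleneck(events_a, events_b, window=0.05, threshold=0.7):
    """Guess that two paths share a bottleneck when most congestion
    events coincide in both directions. The window and threshold are
    arbitrary values for illustration, not recommendations."""
    return (coincidence_ratio(events_a, events_b, window) >= threshold
            and coincidence_ratio(events_b, events_a, window) >= threshold)

# Timestamps (seconds) of losses, ECN marks or delay spikes on each path.
wifi_events = [1.00, 2.00, 3.00, 4.00]
cell_events = [1.01, 2.02, 2.99, 4.03]   # tracks the Wi-Fi events closely
other_events = [1.50, 2.50, 3.50, 4.50]  # unrelated timing

print(likely_shared_bottleneck(wifi_events, cell_events))
print(likely_shared_bottleneck(wifi_events, other_events))
```

A real detector would need to cope with different RTTs on the two paths and with coincidences that happen by chance, which is part of why this remains a research topic.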
The experimental algorithm proposed in RFC6356 is an example of that approach. It is a variation of the New Reno algorithm. Instead of always increasing the congestion window for a flow by “one packet per RTT per path”, it would use the minimum of that and something like “one packet per RTT across all paths”. (The actual formula is more complex, see the RFC text for details.)
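For readers who want to see the shape of that formula, here is a small sketch of the "linked increases" rule as I read it in RFC6356: on each ACK, path i grows its window by the minimum of a coupled term, scaled by a factor alpha computed across all paths, and the usual per-path New Reno term. The function names and the way paths are represented as (cwnd, RTT) pairs are my own choices for the example; check the RFC text before relying on any detail.

```python
def lia_alpha(paths):
    """Aggressiveness factor from RFC6356.
    `paths` is a list of (cwnd_bytes, rtt_seconds) tuples; this
    representation is an assumption made for the sketch."""
    cwnd_total = sum(cwnd for cwnd, _ in paths)
    best = max(cwnd / (rtt * rtt) for cwnd, rtt in paths)
    denom = sum(cwnd / rtt for cwnd, rtt in paths) ** 2
    return cwnd_total * best / denom

def lia_increase(paths, i, bytes_acked, mss):
    """Window increase, in bytes, for path i when `bytes_acked` bytes
    are acknowledged during congestion avoidance."""
    cwnd_i, _ = paths[i]
    cwnd_total = sum(cwnd for cwnd, _ in paths)
    coupled = lia_alpha(paths) * bytes_acked * mss / cwnd_total
    uncoupled = bytes_acked * mss / cwnd_i  # plain New Reno increase
    return min(coupled, uncoupled)

MSS = 1200
# Two identical paths, 10 packets of window each, 50 ms RTT.
two_paths = [(10 * MSS, 0.05), (10 * MSS, 0.05)]
one_path = [(10 * MSS, 0.05)]

print(lia_alpha(two_paths))
print(lia_increase(two_paths, 0, MSS, MSS))
print(lia_increase(one_path, 0, MSS, MSS))
```

With a single path, alpha is 1 and the rule reduces to New Reno. With two identical paths, alpha works out to one half, and each path increases its window four times more slowly than it would alone, whether or not the two paths actually share a bottleneck.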
I have two issues with that: it seems a bit too conservative, and in any case it will only be efficient on those paths where New Reno is adequate. Most Internet connections are using more modern congestion control algorithms than New Reno, such as Cubic or BBR. We would have to develop an equivalent coupling algorithm for these algorithms. But we can also notice that the formula will slow the congestion window increase even if the paths are not coupled, which is clearly not what selfish users would want. In short, this is still a research issue, which explains why the QUIC multipath draft does not mandate any particular solution.
Another issue with multipath is the “simultaneous backup” problem. Many multipath configurations are aiming for redundancy rather than load sharing. They will maintain an active path and a backup path, sending most of the traffic on the active path, and only switching to the backup path if they detect an issue. For example, they would use a Wi-Fi connection until it breaks, and then automatically start sending data on a 5G connection. The problem happens when multiple connections on multiple devices do that at the same time. They were all using the same Wi-Fi network, they all detect an issue at about the same time, so they all switch to using the 5G network around the same time. That’s the classic “thundering herd” problem: an instant surge of traffic causing immediate congestion as all these connections compete for the same 5G radio frequencies.
The “thundering herd” problem will solve itself eventually, as all connections notice the congestion and reduce their sending rate, but it would be nice if we avoided the packet losses and increased delays when these “simultaneous backups” happen. The classic solutions are to introduce random delays before backing up, and also to probe bandwidth cautiously after a backup. Again, this is largely a research issue. My main recommendation would be for networks to implement something like L4S (see RFC9330), so each connection will receive “congestion experienced” ECN marks and quickly react.
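The two classic mitigations can be sketched in a few lines. This is a toy illustration under stated assumptions, not a recommendation: the function names, the half-second maximum jitter and the restart window of 10 packets are all values I picked for the example.

```python
import random

def backup_switch_delay(max_jitter_s=0.5, rng=random):
    """Random delay, in seconds, to wait before moving traffic to the
    backup path, so that devices that lost the same Wi-Fi network do
    not all hit the cellular network at the same instant.
    The 0.5 s default is an arbitrary value for illustration."""
    return rng.uniform(0.0, max_jitter_s)

def window_after_backup(previous_cwnd, mss, restart_packets=10):
    """Probe cautiously: do not carry the old path's congestion window
    over to the backup path, restart from a small window instead.
    The 10-packet restart window is an assumption of this sketch."""
    return min(previous_cwnd, restart_packets * mss)

rng = random.Random(2025)  # seeded only to make the example repeatable
delays = [backup_switch_delay(rng=rng) for _ in range(5)]
print(delays)
print(window_after_backup(previous_cwnd=100_000, mss=1200))
```

The jitter spreads the surge over time, and the small restart window limits how much each connection contributes to it. Neither removes the need for the network to signal congestion quickly.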
So, yes, we do have a couple of research issues for congestion control and multipath…
If you want to start or join a discussion on this post, the simplest way is to send a toot on the Fediverse/Mastodon to @huitema@social.secret-wg.org.