25 Oct 2021
Many Internet Service Providers are nice companies who love their customers, but quite a few have developed a sideline of collecting logs of traffic and selling them to advertisers, or in fact to whoever is willing to pay. Even the FTC is not happy about that. New technology like “DNS over HTTPS”, or DoH, lets users resolve names over an encrypted connection to a third-party service, and thus hide whole chunks of metadata from their ISP. That seems like a win for privacy, until we realize that the main alternative DNS services are provided by big companies like Google or Cloudflare. So people face a dilemma: do they leak their stream of DNS queries to their ISP, or to Google or Cloudflare instead? My concern is that in most cases it is not "instead of" but "in addition to", at least if we look at the entire picture rather than at the DNS stream in isolation. And once you stop looking at it in isolation, you realize that the effort should focus on encrypting the SNI in HTTPS connections.
Take the classic example of loading "www.example.com". Today, in practice, that means asking for the IP address of "www.example.com" through a DNS API, and then establishing an encrypted HTTPS connection to that address. So, who sees what? The DNS provider sees the stream of requests, but these are not the only "muddy footprints", to reuse the metaphor in Geoff Huston's presentation. The ISP sees lots of things. It sees the IP traffic, and thus knows that some customer at IP address X is connecting to some service at IP address Y. The ISP also sees the unencrypted part of the HTTPS connection, the TLS ClientHello, which today carries the plain-text name "www.example.com" in the Server Name Indication (SNI) extension.
When the average user goes to their ISP's DNS resolver and then loads the page, the ISP sees the DNS request for "www.example.com", the IP addresses in the packets, and the SNI "www.example.com" in the clear-text data. Google or Cloudflare see none of that, unless they happen to have some business relation with the company behind "example.com". (In many cases they do, but hopefully not in all cases.)
When the privacy-seeking user tries to hide their footprints by using an encrypted connection to a third-party DNS resolver, the ISP still sees the IP addresses in the packets, and the SNI "www.example.com" in the clear-text data. Getting that data might be a bit less convenient than just extracting a log of DNS queries, but if the ISP is set on extracting revenue from the metadata it will make the effort. So, for the ISP's business, not much changes. But now, instead of just the ISP seeing the data, the third party also sees the DNS stream. The accurate description is that, instead of just the ISP collecting metadata, both the ISP and the third-party DNS provider do. What privacy did the user gain, exactly?
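To make the "who sees what" comparison concrete, here is a toy Python model of the two cases, assuming no ECH, so the SNI travels in clear text. It only restates the argument as data; the function name, labels and addresses (taken from documentation ranges) are hypothetical, and it is not a measurement of any real network.

```python
def observers(hostname, client_ip, server_ip, use_third_party_dns):
    """Return the metadata each party sees when loading `hostname`
    over HTTPS without ECH, i.e. with the SNI in clear text."""
    seen = {"ISP": set(), "third-party DNS": set()}
    # The DNS query goes either to the ISP resolver, or encrypted to a third party.
    resolver = "third-party DNS" if use_third_party_dns else "ISP"
    seen[resolver].add(("dns query", hostname))
    # The ISP carries the packets, so it always sees the IP addresses...
    seen["ISP"].add(("ip flow", client_ip, server_ip))
    # ...and, without ECH, the clear-text SNI in the TLS ClientHello.
    seen["ISP"].add(("sni", hostname))
    return seen


plain = observers("www.example.com", "192.0.2.1", "198.51.100.7", False)
doh = observers("www.example.com", "192.0.2.1", "198.51.100.7", True)

# The ISP still learns the host name from the SNI when DoH is used.
print(("sni", "www.example.com") in doh["ISP"])  # True
# And the third party now also holds the query stream.
print(doh["third-party DNS"])  # {('dns query', 'www.example.com')}
```

Switching to DoH removes one entry from the ISP's set but adds a new holder of the query stream, which is the "in addition to" effect described above.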
That's why I believe any privacy gain depends on encrypting the SNI. Let's go back to the IP address resolution. In a lot of cases, the web site "www.example.com" is not accessed directly, but rather through a content distribution network like, say, "cdn.example.net". When the ISP sees the traffic flowing from address X to address Y, address Y does not directly identify "www.example.com". Instead, it points to a server managed by "cdn.example.net". Suppose now that "www.example.com" and "cdn.example.net" cooperate and deploy "Encrypted Client Hello" (ECH), the emerging specification for encrypting metadata like the SNI. If they do it right, we get to a stage where neither the ISP nor the DNS provider gets more metadata than "address X is connecting to one of the services hosted by cdn.example.net".
How does it work? Using ECH properly requires two DNS transactions. In the first transaction, the user's application learns that "www.example.com" is accessed through "cdn.example.net", and also learns which key to use to encrypt the metadata. In the second transaction, the application resolves the name "cdn.example.net" and obtains the address of the nearest server instance. Now, let's assume a little bit of magic that would allow the user to perform the first transaction privately, and then cache the result.
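The two transactions and the cache can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the resolvers are hypothetical in-memory tables, `private_lookup_https_rr` stands in for the "little bit of magic" (for instance an encrypted query), and the ECH configuration is a placeholder, not a real key. Each lookup logs the name it was asked for, to show which resolver learns what.

```python
import time

# Hypothetical stand-ins for real resolvers; the data is illustrative only.
# Each lookup records the name it was asked for, so we can see who learns what.
query_log = {"private": [], "local": []}

HTTPS_RECORDS = {  # answers to the first (private) transaction
    "www.example.com": {
        "target": "cdn.example.net",
        "ech_config": b"placeholder-ECHConfigList",  # not a real key
        "ttl": 3600,
    },
}
ADDRESSES = {"cdn.example.net": "203.0.113.10"}  # answers to the second one


def private_lookup_https_rr(name):
    """First transaction, assumed to go over a private, encrypted channel."""
    query_log["private"].append(name)
    return HTTPS_RECORDS[name]


def local_lookup_address(name):
    """Second transaction, which the local or ISP resolver may answer."""
    query_log["local"].append(name)
    return ADDRESSES[name]


_cache = {}  # name -> (expiry time, HTTPS record)


def resolve_with_ech(name, now=None):
    """Return (address, client-facing name, ECH config) for `name`."""
    now = time.monotonic() if now is None else now
    entry = _cache.get(name)
    if entry is None or entry[0] <= now:
        # Cache miss or expired entry: redo the private first transaction.
        record = private_lookup_https_rr(name)
        _cache[name] = (now + record["ttl"], record)
    else:
        record = entry[1]
    # Only the CDN name is revealed to the local resolver.
    address = local_lookup_address(record["target"])
    return address, record["target"], record["ech_config"]


resolve_with_ech("www.example.com", now=0)
resolve_with_ech("www.example.com", now=60)
print(query_log)
# {'private': ['www.example.com'], 'local': ['cdn.example.net', 'cdn.example.net']}
```

Caching for the record's TTL means the private lookup happens once per TTL period, while the frequent address lookups only ever mention "cdn.example.net".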
When it comes to loading the page, we don’t get the same leaks as before. Even if we use the ISP resolver, the ISP will see a request for "cdn.example.net", traffic from IP X to IP Y corresponding to "cdn.example.net", and a clear-text SNI that now says "cdn.example.net". Success. It is hard to see what privacy could be gained by sending the traffic to Google or Cloudflare instead.
Now, of course, there is still a need to perform the first transaction beforehand. The information is stored in the “HTTPS RR” (https://datatracker.ietf.org/doc/draft-ietf-dnsop-svcb-https/), a DNS record published for the target service, in our case “www.example.com”. By default, it is certainly best not to load that record through the ISP, which could then combine two pieces of metadata: a lookup of the ECH information for “www.example.com”, followed by a connection to “cdn.example.net”. Instead, it makes sense to look up that information through a third party, and then cache the result.
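For a feel of what that first transaction returns, here is a simplified Python parser for an HTTPS record in zone-file presentation format (the draft cited above has since been published as RFC 9460). It is a sketch, not a compliant parser: it ignores quoted values and escapes, and the sample record, including the "BASE64-ECHCONFIGLIST" placeholder standing in for the actual ECH key material, is hypothetical.

```python
def parse_https_rr(line):
    """Parse one HTTPS record in zone-file presentation format:
    owner TTL class HTTPS SvcPriority TargetName [key=value ...].
    Simplified: quoted values and escape sequences are not handled."""
    fields = line.split()
    owner, ttl, rr_class, rr_type, priority, target = fields[:6]
    if rr_class != "IN" or rr_type != "HTTPS":
        raise ValueError("not an IN HTTPS record")
    params = {}
    for item in fields[6:]:
        key, _, value = item.partition("=")
        params[key] = value  # e.g. "alpn" -> "h2,h3"; "ech" -> base64 config
    return {
        "owner": owner,
        "ttl": int(ttl),
        "priority": int(priority),  # 0 = AliasMode, > 0 = ServiceMode
        "target": target,           # the name to resolve next, e.g. the CDN
        "params": params,
    }


# Illustrative record; "BASE64-ECHCONFIGLIST" stands in for a real ECH key.
record = parse_https_rr(
    "www.example.com. 3600 IN HTTPS 1 cdn.example.net. "
    "alpn=h2,h3 ech=BASE64-ECHCONFIGLIST"
)
print(record["target"], record["params"]["alpn"].split(","))
# cdn.example.net. ['h2', 'h3']
```

The target name and the ech parameter are exactly the two pieces of information that the first transaction must deliver privately; the TTL tells the client how long it may cache them.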
So, maybe we won’t be leaking information to the ISP, nor to the big companies deploying third-party DNS services. But we do need to encrypt the SNI and learn how to deploy ECH!