03 Jan 2011
Extending DNS operation for peer-to-peer name resolution may be much simpler than many expect. DNS resolvers know how to maintain caches of previously resolved names. These caches can be used for progressive peer-to-peer name resolution without relying on centralized servers.
Let’s assume first that the names in the peer-to-peer domain can be sorted, for example using alphabetic order – case insensitive, of course, since this is DNS. A node that receives a query for a peer-to-peer name will examine the requested name. If the name is known locally, the node will provide the answer. If the name is not known, the node will forward the request to one of the peers known from the cache, whose name is closest to the requested name according to the alphabetic order. The process will repeat until the name is found, or, if the name is not reachable, no progress can be made. The response will then be returned, just like any regular DNS query.
This may be better explained with an example. Suppose that a P2P name resolver, “rose.p2p,” wants to resolve the name “flower.p2p.” It will first look for the “closest” name in its own cache, maybe “earth.p2p” and send it the request. “Earth.p2p” will look for the closest name in its cache, maybe “fire.p2p,” and forward the request. “Fire.p2p” may forward the request to “Flare.p2p”, which forwards it to “flour.p2p,” which in turn could have the data for “flower.p2p.”
What I just described is basically the PNRP name resolution algorithm, adapted to DNS names instead of PNRP’s binary identifiers. The resolution process has the side effect of propagating knowledge in the network, as all intermediate nodes can see the response and cache it. In our example, rose, earth, fire and flare all learned about “flower.” The interesting point is that the density of cached names increases closer to the name itself. I won’t go in a lengthy mathematical demonstration, but this property has a very good effect on the network performance. As the network learns, queries get resolved ever faster, allowing the system to manage very large number of names.
Of course, there is a catch. The propagation of knowledge works, but it only works because there was an initial “chain of knowledge.” In our example, “flare.p2p” knew about “flour.p2p,” which knew about “flower.p2p.” This initial knowledge cannot be discovered by the side effect of queries, and has to be somehow “seeded” in the network. This has been studied a lot in the context of distributed hash tables, and also during the design of PNRP. The solution is to add to the heuristic described above a “leaf-set management” procedure.
In the context of DHT, the leaf-set designates the other nodes in the network whose indices are closest to a particular node. We can prove that if all nodes in the network know their leaf-set, every name in the network can be resolved. Of course, in the worst case the resolution will require visiting half the nodes in the network, but in practice the caches will allow for efficient short cuts. Efficient resolution requires thus a combination of leaf-set management and efficient caching.
It turns out that the leaf-set can be managed by a simple process. At regular intervals, a node can look at its own cache and find in the cache the two closest names to its own – specifically the one just before its name in alphabetic order and the one just below. It can then send to each of these nodes a query for made up name located in alphabetical order between the sender and the receiver, in the hope of learning whether some names exist that are even closer than the ones in the cache. If new names are found, the process can be repeated until the node is convinced of knowing the two names closest to its own in the network. To each of these, the node can send an “unsolicited DNS response” advertising its own name and address. This will be an extension to the regular DNS protocol, a specific extension for peer-to-peer domains. This simple extension ensures that the network is seeded with all the relevant names, and that queries will succeed, even if they require initially many relays.
We can see that P2P DNS could be implemented with very minimal extensions to the basic DNS protocol: allow for relays of requests to the closest name in the cache instead of relying solely on NS records; perform “leaf-set discovery” by sending fake queries at regular intervals; and, perform “leaf-set maintenance” by pushing unsolicited DNS responses to the leaf-set. Of course, this is just the sketch of the solution. The full specification will need to deal with cache management issues, such as what replacement algorithm to implement if the cache is full. And it will also have to deal with security issues. But that will be the matter for another post. For now, let’s just notice that developing P2P DNS does not need to be hard!