How to Abolish the DNS Hierarchy --- But it's a Bad Idea

2 July 2011

There’s been a fair amount of controversy of late about ICANN’s decision to dramatically increase the number of top-level domains. With a bit of effort, though — and with little disruption to the infrastructure — we could abolish the issue entirely. Any string whatsoever could be used, and it would all Just Work. That is, it would Just Work in a narrow technical sense; it would hurt innovation and it would likely have serious economic failure modes.

The trick is to use a cryptographic hash function to convert a string of bytes into a sequence of hexadecimal digits. For example, if one applies the SHA-1 function to the string

Amazon.com

the result is a46af6931d9dace2200617548fab3274549e308f. Add a dot after every pair of hex digits, tacking on a suffix like .arb (for "arbitrary", since .hash might be seen as having other connotations), and you get

a4.6a.f6.93.1d.9d.ac.e2.20.06.17.54.8f.ab.32.74.54.9e.30.8f.arb

which looks like a domain name, albeit a weird one. It not only looks like one, it is; that string could be added to the DNS today with no changes in code. We could even distribute the servers; at every level, there are 256 easy-to-split subtrees. So what’s wrong?
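
For concreteness, here is a minimal sketch of that mapping in Python. The function name to_arb_name is made up for illustration, and encoding the input as UTF-8 is an assumption; the scheme as described only says "a string of bytes".

```python
import hashlib

def to_arb_name(name: str, suffix: str = "arb") -> str:
    """Map an arbitrary string to a dotted, hash-based domain name.

    Assumption: the string is encoded as UTF-8 before hashing.
    """
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()  # 40 hex digits
    # Split the digest into 20 two-digit labels, one per byte of the hash.
    labels = [digest[i:i + 2] for i in range(0, len(digest), 2)]
    return ".".join(labels + [suffix])

if __name__ == "__main__":
    print(to_arb_name("Amazon.com"))
```

The result is 20 two-character labels plus the suffix, 63 characters in all, which is comfortably inside the DNS limits on label and name length.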

The technical limitation is that every end point would have to be upgraded to do the hashing. Yes, that’s a problem, but we’ve been through it before; supporting internationalized domain names required the same thing. And it works.

But how do endpoints know to do the hashing in this scheme? Something in the URL bar of a web browser? There are lots of things on the net that aren’t web browsers; how will they know what to do? You can’t necessarily tell from a string if it should be used literally or via this hashing scheme; "Amazon.com" appears to be the legal name of the corporation.

There’s another problem: canonicalization. Similar strings will produce very different hash values. Here’s an example:

New York Times 7e145e463809ea5e7c28f2ddf103499f942c9ea3
The New York Times 1950c50c10f288dd6e9190361c968e1b8c4a3775
N.Y. Times e69011929d6d30347ddca11c7955a07df8390984
NY Times 48b6b7d57f0ed2885816f1df96da1ffa86f09dda

We could no doubt define some set of rules that would handle many common cases. Equally certain, we’d miss many more. Companies could think of their own rules, but if they missed some we’d be back to cybersquatting and typosquatting. This would be worse, though, because the names are so spread out.
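
To illustrate how far a simple rule set gets, here is a sketch with three made-up rules: fold case, drop punctuation, and drop a leading "The". These rules and the name canonicalize are purely hypothetical, not a proposal, and they only handle ASCII.

```python
import hashlib
import re

def canonicalize(name: str) -> str:
    """Apply three illustrative rules; real rules would need far more cases."""
    s = name.lower()                   # rule 1: fold case
    s = re.sub(r"[^a-z0-9 ]", "", s)   # rule 2: drop punctuation (ASCII only)
    words = s.split()
    if words and words[0] == "the":    # rule 3: drop a leading "the"
        words = words[1:]
    return " ".join(words)

def sha1_hex(s: str) -> str:
    return hashlib.sha1(s.encode("utf-8")).hexdigest()

if __name__ == "__main__":
    for raw in ["New York Times", "The New York Times", "N.Y. Times", "NY Times"]:
        canon = canonicalize(raw)
        print(f"{raw!r:24} -> {canon!r:18} {sha1_hex(canon)}")
```

With these rules the four spellings collapse to two canonical forms, "new york times" and "ny times", which still hash to unrelated names. Closing that gap takes an abbreviation table, and then another rule for the next case someone thinks of.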

The real issue, though, is economic: who would run the different pieces of .arb? There are currently about 100M names in .com. Let’s allow for growth and assume 1,000,000,000 names. To handle canonicalization, assume another factor of 10, for about 10B names. Does that work? To a first approximation, sure; we can delegate at each period in the name, and there are 256 values at each level. That means that going down just two levels, we could have 65,536 different registries, each handling about 150K names. That’s easy to do, but a given registry could handle more than one zone. Let’s assume that 1.5M names is a good size (which is somewhat challenging, though it’s clearly possible since it works today). That means we’d need about 6,600 registries. But they have no way to do marketing; there’s no way to target any particular business segment, since names are mapped to more or less random parts of the name tree. If a registry failed, an unpredictable portion of the net would suddenly be unreachable.
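
The back-of-the-envelope numbers above can be checked with a few lines of Python. All the inputs are the assumptions from the paragraph; the variable names are mine.

```python
# Inputs taken from the estimate in the text.
total_names = 10_000_000_000      # 1B names, times 10 for canonicalization
fanout_per_level = 256            # one label per hash byte
levels = 2
names_per_registry = 1_500_000    # assumed comfortable registry size

zones = fanout_per_level ** levels             # delegations two levels down
names_per_zone = total_names / zones           # roughly 150K
registries = total_names / names_per_registry  # roughly 6,600-6,700

if __name__ == "__main__":
    print(f"zones two levels down: {zones:,}")
    print(f"names per zone:        {names_per_zone:,.0f}")
    print(f"registries needed:     {registries:,.0f}")
```

Dividing the names directly gives a figure a little above 6,600; giving each registry ten of the 65,536 zones gives one a little below. Either way, "about 6,600" is the right order of magnitude.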

Most of us never see registries; when we want to create a new domain, we do business with a registrar. But every registrar would need to do business with every registry! The number of relationships would get ungainly, and again, there’s no way to do targeted marketing. The registrars for, say, .museum can target museums, while ignoring, say, banks. With this scheme, everyone is doing business with everyone. It’s great to have a global market; it’s also very expensive.

https://www.cs.columbia.edu/~smb/blog/2011-07/2011-07-02.html