Why Isn't My Web Site Encrypted?

16 January 2010

In an NY Times Room for Debate posting, I urged a lot more use of encryption, even for routine posts. But my blog and web site are not encrypted. Why not? And can I fix it?

The short answer to the first question is simple: when I set up the blog, a few years ago, I just didn’t think about it. The second question, though, is remarkably hard to answer.

Proper web site design uses relative links. That is, instead of writing something like

<a href="http://www.cs.columbia.edu/~smb/blog/2010-01/2010-01-13a.html">…</a>

to refer to the previous post, I should simply write

<a href="2010-01-13a.html">…</a>

That makes it a lot easier to move web pages around. If people only viewed the blog as a web site, I would do that. But many people view it via a variety of RSS readers, which poses several problems.

First, many RSS readers don’t seem to do the right thing with relative links. Relative links that work perfectly well on the web site don’t work at all via RSS feeds. Maybe my directory structure is wrong for that; still, I haven’t gotten it to work. For that matter, links to postings in the RSS feed itself appear to need to be absolute. Again, maybe I’m doing it wrong, but I could never get that to work properly.
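One possible explanation is that a relative link only means something once it is resolved against a base URL, and a feed reader may pick a different base than a browser does. The following is a minimal sketch using Python's standard `urllib.parse.urljoin`. The feed location `FEED_URL` is a hypothetical path assumed purely for illustration; which base any particular RSS reader actually uses is not something this sketch can tell you.

```python
from urllib.parse import urljoin

# The page that contains the relative link.
PAGE_URL = "http://www.cs.columbia.edu/~smb/blog/2010-01/2010-01-16.html"
# Hypothetical feed location, assumed here only for illustration.
FEED_URL = "http://www.cs.columbia.edu/~smb/blog/feed.rss"

RELATIVE_LINK = "2010-01-13a.html"


def resolve(base: str, link: str) -> str:
    """Resolve a link against the URL of the document it appears in."""
    return urljoin(base, link)


# Resolved against the page, the link lands in the 2010-01 directory.
on_site = resolve(PAGE_URL, RELATIVE_LINK)
# Resolved against the assumed feed URL, the directory component is lost.
via_feed = resolve(FEED_URL, RELATIVE_LINK)

print(on_site)
print(via_feed)
```

Under these assumptions the same relative link resolves to two different URLs, which would be consistent with links that work on the site but fail in a reader.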

I also need to maintain backwards compatibility; I want all old links to continue to work.

There’s another problem: if you use https: (i.e., if you use an encrypted web page), you need a trust anchor, a starting point for the certificates that verify a site’s identity. Your browser has a lot built in; last time I checked, Firefox listed about 165 trust anchors (sometimes known as "certificate authorities" in this case). What trust anchors do RSS readers use? There are only a handful of important browsers; there are many more RSS readers and aggregators. What about search engines? Whom do they trust? (Do search engines even crawl https:-protected pages? Content isn’t very findable unless it’s indexed by Google, Bing, Yahoo, etc.)
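Every TLS client carries its own answer to the trust-anchor question. As a rough sketch, this is how one could inspect the set that a Python program gets by default, using only the standard `ssl` module. The helper name `count_trust_anchors` is made up for this example, and the number it reports depends entirely on the machine it runs on, so it says nothing about what any given RSS reader or crawler trusts.

```python
import ssl


def count_trust_anchors(context: ssl.SSLContext) -> int:
    """Count the CA certificates currently loaded into this context.

    Certificates kept in a directory-style store are loaded lazily,
    so they may not be included in this count.
    """
    return len(context.get_ca_certs())


# create_default_context() loads the platform's default trust store.
context = ssl.create_default_context()
anchor_count = count_trust_anchors(context)
print(f"{anchor_count} trust anchors loaded")
```

A client written in another language or shipped with its own certificate bundle could report a very different set, which is exactly the uncertainty described above.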

Finally, a noticeable portion of my web site is generated by programs. I’d have to modify the programs and/or their configuration files or wrapper scripts to spit out https: instead of http:, or possibly even create duplicate copies of pages. I’d also have to go back and fix up the absolute URLs where I can. I can’t just do a blind substitution, though, because things like BibTeX entries need to contain the absolute references (to the https: copy?), rather than relative ones.
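One way to avoid a blind substitution is to rewrite only URLs that point at the site itself, keeping them absolute and leaving links to other hosts alone. The sketch below is an assumption about how such a fix-up might look, not a description of the scripts that actually generate this site. The function name `upgrade_own_links` and the sample paths are hypothetical.

```python
# Only links under this prefix belong to the site being upgraded.
SITE_PREFIX = "http://www.cs.columbia.edu/~smb/"


def upgrade_own_links(text: str, prefix: str = SITE_PREFIX) -> str:
    """Switch http: to https: only for URLs that start with the site prefix."""
    secure_prefix = "https://" + prefix[len("http://"):]
    return text.replace(prefix, secure_prefix)


# Hypothetical sample: a BibTeX-style field and a link to another host.
sample = (
    "url = {http://www.cs.columbia.edu/~smb/papers/example.pdf}\n"
    '<a href="http://www.example.com/page.html">elsewhere</a>\n'
)

upgraded = upgrade_own_links(sample)
print(upgraded)
```

The BibTeX-style reference stays absolute but moves to https:, while the external link is untouched. Running the function a second time changes nothing, because the http: prefix no longer appears in the text.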

So what am I going to do? I will indeed upgrade the site to ensure that everything is accessible with or without encryption. It’s going to take a while to do that, especially because the semester starts in a few days and I’m not going to have much free time. But remember this: if I can’t do a flash cut to ubiquitous encryption, neither can a big web site like Google or the NY Times. Granted, being a web site maintainer isn’t my full-time job; on the other hand, my site is a lot less complex.

https://www.cs.columbia.edu/~smb/blog/2010-01/2010-01-16.html