28 February 2009
For some reason, I started thinking about a couple of old court cases. Among other points, both relied on the notion that a computer doing something with data was somehow different than a person doing the same thing. Is that true? Should it be?
The first case was Warshak v. United States, 490 F.3d 455 (6th Cir. 2007), which I wrote about some time ago. The court noted:
The fact that a computer scans millions of e-mails for signs of pornography or a virus does not invade an individual's content-based privacy interest in the e-mails and has little bearing on his expectation of privacy in the content.
Somehow, to this court, there was some sort of difference between a computer looking for nasty things and a person doing the same thing.
That decision was vacated for legal reasons (532 F.3d 521 (6th Cir. 2008)), so it sets no precedent, but the argument is similar to the one made about the FBI's Carnivore wiretap system: that having a computer filter the data did not mean that the discarded data was "searched" without a warrant. Is this a reasonable position?
There are many different issues here. One, of course, is accuracy: is the filtering correct? Carnivore once got its filter so wrong that an FBI agent discarded email intercepts about Osama bin Laden because email from "non-covered targets" was picked up as well. This issue — what to do about erroneous intercepts — concerns the FBI, too, for legal, political, and PR reasons. The spirit of their point, though, remains the same: as long as their filters properly discard unauthorized data, no unlawful search has taken place. But is that true?
In Smith v. Maryland, 442 U.S. 735 (1979), the U.S. Supreme Court ruled that individuals had no legitimate expectation of privacy in data (in this case, dialed phone numbers) given to a third party: the phone company. After all, people know that the phone company collects that data and does things with it. The court also addressed automation directly:
The switching equipment that processed those numbers is merely the modern counterpart of the operator who, in an earlier day, personally completed calls for the subscriber. Petitioner concedes that, if he had placed his calls through an operator, he could claim no legitimate expectation of privacy. We are not inclined to hold that a different constitutional result is required because the telephone company has decided to automate.
Let's turn that around. Suppose that the government decided to use people rather than software to do filtering, when the warrant does not allow inspection of content. Is that permissible? The central question is whether or not mechanization of the process makes a constitutional difference. As software gets "smarter", the question becomes more and more important.
4 February 2009
A while back, I commented on access to the source code of alcohol breath testers. Briefly, the Minnesota Supreme Court ruled that defendants had the right to see the source code to the device. Since then, however, things haven't gone so well.
As Ars Technica reports, subsequent court rulings have not been as favorable. The original case went back down and up the court ladder; last May, the Minnesota Court of Appeals ruled that the defendants had not shown that "the source code may relate to their guilt or innocence", and denied access to the code. Curiously, the ruling made no mention of the fact that the state of Minnesota owns the code in question. That case is now back on appeal to the state Supreme Court.
The Court of Appeals came to a similar conclusion in another case: the defendant had not shown that the source code might help. In both cases, that court showed a fundamental misunderstanding of software, testing, etc. (Aside: that opinion had the ominous heading "This opinion will be unpublished and may not be cited except as provided by Minn. Stat. § 480A.08, subd. 3 (2008)." Would I be violating the law by pointing at it? Fortunately, a glimpse at the statute showed that they were only concerned with specifying procedures for lawyers to use when using the ruling; a legal "citation" is not nearly the same as an academic one!)
Perhaps the problem was a poor brief by the appellants. In the second case, the court wrote
Respondent Brunner provided the district court with a copy of the written testimony of Dr. David Wagner, a computer scientist, before a congressional committee inquiring into computerized voting systems. Although this testimony includes some explanation of what a "source code" is in general, it has no specific application to the Intoxilyzer or to the operation of breath-testing instruments. Respondent Brunner did not provide any affidavit from an expert on the design or operation of the Intoxilyzer or breath-testing instruments more generally. Accordingly, the district court was without any record from which to determine that the disclosure of the source code would "relate to the guilt or innocence of the defendant," or would lead to the discovery or development of admissible evidence on the reliability of the Intoxilyzer.
(I assume that the court is referring to Dave Wagner from Berkeley, whom I've had the privilege of knowing for about 15 years.) Perhaps the lawyers didn't make it clear. More likely, though, the court didn't understand the fundamental point: writing complex software is hard, and voting machines are just a case in point. Looking at source code will often disclose exactly the corner cases that "black box testing" (testing based just on the specifications) simply will not catch. Consider these quotes from Kohno, Stubblefield, Rubin, and Wallach's classic paper based on analysis of leaked Diebold voting machine source code:
All of the data on a storage device is encrypted using a single, hardcoded DES key:
#define DESKEY ((des_key*)"F2654hD4")
… from the CVS logs, we see this particular key has been used without change since December 1998, when the CVS tree for AccuVote-TS version 3 began
A black box tester may be able to observe that a key is nowhere specified or exported. From such testing, though, it is quite impossible to tell how long the key has been used, yet its very longevity is itself a weakness. Similarly, examination of the code shows that "before being encrypted, a 16-bit cyclic redundancy check (CRC) of the plaintext data is computed." From the outside, it is extremely difficult to tell that it is a CRC that is being used, rather than, say, an encrypted HMAC.
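Why the CRC-versus-HMAC distinction matters can be shown in a few lines. The following is only a sketch under stated assumptions, not the voting machine's code: the key, the record contents, and the function names are all invented, and Python's 32-bit zlib.crc32 stands in for the 16-bit CRC mentioned in the paper. It shows that anyone can recompute a CRC over altered data, while a valid HMAC cannot be produced without the secret key.

```python
import hashlib
import hmac
import zlib

# Hypothetical values, invented purely for illustration.
SECRET_KEY = b"hypothetical-secret-key"
ORIGINAL = b"candidate A: 100 votes"
TAMPERED = b"candidate A: 900 votes"


def crc_tag(data: bytes) -> int:
    """Keyless checksum: anyone who has the data can compute it."""
    return zlib.crc32(data)


def hmac_tag(key: bytes, data: bytes) -> bytes:
    """Keyed integrity tag: computing it requires the secret key."""
    return hmac.new(key, data, hashlib.sha256).digest()


def verify_crc(data: bytes, tag: int) -> bool:
    return crc_tag(data) == tag


def verify_hmac(key: bytes, data: bytes, tag: bytes) -> bool:
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(hmac_tag(key, data), tag)


# An attacker who alters the record simply recomputes the CRC.
forged_crc = crc_tag(TAMPERED)
crc_accepts_forgery = verify_crc(TAMPERED, forged_crc)

# Without SECRET_KEY the attacker can only guess at a key.
forged_mac = hmac_tag(b"attacker-guess", TAMPERED)
hmac_accepts_forgery = verify_hmac(SECRET_KEY, TAMPERED, forged_mac)

print("CRC accepts forged record: ", crc_accepts_forgery)
print("HMAC accepts forged record:", hmac_accepts_forgery)
```

A tester who sees only the device's inputs and outputs cannot readily tell which of these two checks is in use; a reader of the source can tell at a glance.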
It is quite plausible (I would venture to guess likely) that there are similar flaws in the software for this device. Let me make up an example: suppose there is a time-of-day sensitivity in the code. (Note that I have no knowledge that there is such a thing, nor am I asserting that the examples I posit are real.) Perhaps the analysis depends on a photochemical reaction. Alternatively, perhaps the display differs by time of day. That's not unreasonable; I own a GPS that uses a black-on-white display during the day, but after (computed) sunset uses a white-on-black display. I suspect that most drunk driving arrests occur at night, but that most certification is done during normal working hours, i.e., during the day. Such a sensitivity would never be detected by daytime testing. More importantly, given the court's reasoning, the defendant would have no a priori reason to suspect such a thing, absent examination of the code. Is the alcohol test photosensitive? I have no idea. However, a code examination would almost certainly reveal it if it were, and hence reveal new avenues for testing.
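To make the made-up example concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the function names, the night-mode offset, the units, and the test times are my inventions and describe no real breath tester. The point is only that a test suite run during working hours can pass completely while a flaw sits in a branch that executes only at night.

```python
from datetime import time

# Hypothetical device logic: every name and number below is invented.
LEGAL_LIMIT = 80    # reported value in thousandths of a percent (0.080%)
NIGHT_OFFSET = 10   # made-up flaw: a display adjustment leaks into the result


def is_night(t: time) -> bool:
    """Treat 19:00 through 05:59 as night (an arbitrary assumption)."""
    return t >= time(19, 0) or t < time(6, 0)


def reported_value(sensor_units: int, t: time) -> int:
    """Convert a raw sensor figure into the value shown to the officer."""
    value = sensor_units
    if is_night(t):
        # This branch never runs during daytime certification.
        value += NIGHT_OFFSET
    return value


def over_limit(sensor_units: int, t: time) -> bool:
    return reported_value(sensor_units, t) >= LEGAL_LIMIT


def daytime_certification() -> bool:
    """Black-box tests: known inputs, expected outputs, office hours only."""
    readings = [0, 50, 79, 80, 120]
    test_times = [time(9, 0), time(12, 30), time(16, 45)]
    return all(reported_value(r, t) == r
               for r in readings for t in test_times)


print("Certification passed:", daytime_certification())
print("Same sample at 14:00 over limit:", over_limit(75, time(14, 0)))
print("Same sample at 23:30 over limit:", over_limit(75, time(23, 30)))
```

In this sketch the same sample is under the limit in the afternoon and over it late at night, yet the certification run reports no problem. Someone reading the source would spot the night branch at once and know to test it.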
Another point noted by Kohno et al. is that one can generally glean an overall impression of code quality, and hence reliability, from looking at the code. Consider this complimentary observation:
…the code reflects an awareness of avoiding such common hazards as buffer overflows. Most string operations already use their safe equivalents, and there are comments, e.g., should really use snprintf, reminding the developers to change others. While we are not prepared to claim that there are no exploitable buffer overflows in the current code, there are at the very least no glaringly obvious ones.
That, too, can only be learned by looking at the source code. Conversely, the absence of such checks is a strong indication that the code was written by programmers who are, quite frankly, not up to professional standards. Which category does this code fall in? Absent qualified examination, we have no idea.
Software is a powerful and necessary part of modern life. Many common objects that were once entirely mechanical (phones, cars, sewing machines, even toasters) are now based on software, but software is often faulty. Declaring someone guilty "beyond a reasonable doubt", without examining the software that provided crucial evidence, is just wrong.