25 March 2019
Briefly, some crew of attackers—I suspect an intelligence agency; more on that below—has managed to abuse ASUS' update channel and private signing key to distribute bogus patches. These patches checked the victims' MAC addresses; machines on this list (about 600 of them) downloaded the malware payload from a bogus website that masqueraded as belonging to ASUS.
The reason this is so bad is that trust in the update channel is utterly vital. All software is at least potentially buggy, and some of those bugs will be security holes. For this reason, virtually all software is shipped with a built-in update mechanism. Indeed, on consumer versions of Windows 10 patching is automatic, and while this poses some risks, overall it has almost certainly significantly improved the security of the Internet: most penetrations exploit known holes, holes for which patches exist but have not been installed.
Now we have an attack that points out the danger of malicious updates. If this scares people away from patching their systems, it will hurt the entire Internet, possibly in a disastrous way. Did the people who planned this operation take this risk into account?
I once blogged that
In cyberattacks, there are no accepted rules… The world knows, more or less, what is acceptable behavior in the physical world: what constitutes an act of war, what is spying, what you can do about these, etc. Do the same rules apply in cyberspace?

ShadowHammer is norm-destroying—or rather, it would be, if such norms existed.
Ten years ago, the New York Times reported on a plan to hack Saddam Hussein's bank accounts. They refrained because of the possible consequences and side-effects:
“We are deeply concerned about the second- and third-order effects of certain types of computer network operations, as well as about laws of war that require attacks be proportional to the threat,” said one senior officer.

This officer, who like others spoke on the condition of anonymity because of the classified nature of the work, also acknowledged that these concerns had restrained the military from carrying out a number of proposed missions. “In some ways, we are self-deterred today because we really haven’t answered that yet in the world of cyber,” the officer said.

Whoever launched this attack was either not worried about such issues—or felt that the payoff was worth it.
I am convinced that this attack was launched by some country's intelligence service. I say this for three reasons: it abuses a very sensitive channel, it shows very selective targeting, and the targeting is based on information—MAC addresses—that isn't that widely available.
The nature of the channel is the first clue. Code-signing keys are precious commodities. While one would hope that a company the size of ASUS would use a hardware security module to protect its keys, at the very least they would be expected to have strong defenses around them. This isn't the first time that code-signing keys have been abused—Stuxnet did it, too—but it's not a common thing. This alone shows the attacker's sophistication.
The highly selective nature of the attack is the next clue. Only ASUS users were affected, and of the estimated 500,000 computers that downloaded the bogus update, the real damage was done to only 600. An ordinary thief, one who wanted bank account logins and passwords, wouldn't bother with this sort of restriction. Also, limiting the number of machines that had the actual malicious payload minimizes the risk of discovery. Any attacker might worry about discovery, but governments really don't want covert operations tied back to them.
Finally, there's the question of how the party behind this attack (and we don't know who it is, though Kaspersky has tied it to the BARIUM APT, which some have linked to China) obtained the targets' MAC addresses. MAC addresses aren't secret, but they're not trivially available to most parties. They're widely available on-LAN; that might suggest that the attacker already had a toehold in the targets' networks. Under certain circumstances, other LANs within an enterprise can see them, too (DHCP Relay, if you're curious). If any of these machines are laptops that have been used elsewhere, e.g., at a hotel or public hotspot, someone who had penetrated that infrastructure could monitor them. They could be on shipping boxes, or in some vendor database, e.g., inside ASUS—which we already know has been compromised. It's even possible to get them externally, if the victims (a) use IPv6, (b) use stateless IP address configuration, (c) don't use the privacy-enhanced version, and (d) visit the attacker's IPv6 website. In any of these scenarios, you'd also have to link particular MAC addresses to particular targets.
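The IPv6 route is concrete enough to sketch. With classic stateless address autoconfiguration and no privacy extensions, the low 64 bits of the address are derived from the MAC by the EUI-64 rule: insert ff:fe in the middle and flip the universal/local bit. Both steps are trivially reversible, so a web server that sees such an address can recover the MAC. A minimal Python sketch (the sample address is invented):

```python
def mac_from_eui64(iid):
    """Invert the EUI-64 construction used by classic SLAAC.

    The interface identifier is the MAC with ff:fe inserted in the
    middle and the universal/local bit of the first byte flipped.
    """
    if len(iid) != 8 or iid[3] != 0xFF or iid[4] != 0xFE:
        return None  # not an EUI-64-derived identifier
    mac = bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:8]
    return ":".join("%02x" % b for b in mac)

# Low 64 bits of a made-up SLAAC address, 2001:db8::21a:2bff:fe3c:4d5e
print(mac_from_eui64(bytes.fromhex("021a2bfffe3c4d5e")))  # → 00:1a:2b:3c:4d:5e
```

Privacy extensions (randomized interface identifiers) exist precisely to close this channel, which is why clue (c) matters.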
Any or all of these are possible. But they all require significant investment and really good intelligence. To me, this plus the other two clues strongly point to some country's intelligence agency.
So: we have a state actor willing to take significant risks with the total security of the Internet, in pursuit of an objective that may or may not be that important. This is, shall we say, bad. The question is what the security community should recommend as a response. The answer is not obvious.
"Don't patch" is a horrid idea. As I noted, that's a sure-fire recipe for disaster. In fact, if the ShadowHammerers' goal was to destroy the Internet, this is a pretty good first step, to be followed by attacks on the patch channels of other major vendors. (Hmm: as I write this, I'm installing patches to my phone and tablet…)
Cautious individuals and sites may wish to defer installing patches; indeed, the newest version of Windows 10 appears to permit a deferral of 35 days. That allows time for bugs to be shaken out of the patch, and for confirmation that the update is indeed a real one. (Zetter noted that some ASUS users did wonder about the ShadowHammer patch.) Sometimes, though, you can't wait. Equifax was apparently hit very soon after the vulnerability was announced.
Nor is waiting for a vendor announcement a panacea. A high-end attacker—that is to say, a major intelligence agency—can piggyback malware on an existing patch, possibly by suborning insiders.
A high-end vendor might have an independent patch verification team. It would anonymously download patches, reverse-engineer them, and see if they did what they're supposed to do. Of course, that's expensive, and small IoT vendors may not be able to afford that. Besides, there are many versions of some patches, e.g., for different language packs.
Ultimately, I suspect that there is no single answer. System penetration via bogus updates was predicted 45 years ago in the classic Karger/Schell report on Multics security. (For those following along at home, it's in the section on trap doors.) Caution and auditing by all concerned seem to be the best technical path forward. But policy makers have a role, too. We desperately need international agreements on military norms for cyberspace. These won't be easy to devise nor to enforce, but ultimately, self-restraint may be the best answer.
Update: Juan Andres Guerrero-Saade points out that Flame also abused the update channel. This is quite correct, and I should have been clearer about that. My blog post on Flame, cited above, was written a few days before that aspect of it was described publicly, and I misremembered the attack as spoofing a code-signing certificate à la Stuxnet. Flame was thus just as damaging to vital norms.
Update 2: Matt Blaze has an excellent New York Times op-ed on the importance of patching, despite this incident.
11 March 2019
Mark Zuckerberg shocked a lot of people by promising a new focus on privacy for Facebook. There are many skeptics; Zuckerberg himself noted that the company doesn't "currently have a strong reputation for building privacy protective services". And there are issues that his blog post doesn't address; Zeynep Tufekci discusses many of them. While I share many of her concerns, I think there are some other issues—and risks.
The Velocity of Content
Facebook has been criticized for being a channel where bad stuff—anti-vaxxer nonsense, fake news (in the original sense of the phrase…), bigotry, and more—can spread very easily. Tufekci called this out explicitly:
At the moment, critics can (and have) held Facebook accountable for its failure to adequately moderate the content it disseminates—allowing for hate speech, vaccine misinformation, fake news and so on. Once end-to-end encryption is put in place, Facebook can wash its hands of the content. We don't want to end up with all the same problems we now have with viral content online—only with less visibility and nobody to hold responsible for it.

Some critics have called for Facebook to do more to curb such ideas. The company itself has announced it will stop recommending anti-vaccination content. Free speech advocates, though, worry about this a lot. It's not that anti-vaxxer content is valuable (or even coherent…); rather, it's that encouraging such a huge, influential company to censor communications is very dangerous. Besides, it doesn't scale; automated algorithms will make mistakes and can be biased; people not only make mistakes, too, but find the activity extremely stressful. As someone who is pretty much a free speech absolutist myself, I really dislike censorship. That said, as a scientist I prefer not closing my eyes to unpleasant facts. What if Facebook really is different enough that a different paradigm is needed?
Is Facebook that different? I confess that I don't know. That is, it has certain inherent differences, but I don't know if they're great enough in effect to matter, and if so, if the net benefit is more or less than the net harm. Still, it's worth taking a look at what these differences are.
Before Gutenberg, there was essentially no mass communication: everything was one person speaking or writing to a few others. Yes, the powerful—kings, popes, and the like—could order their subordinates to pass on certain messages, and this could have widespread effect. Indeed, this phenomenon was even recognized in the Biblical Book of Esther:
3:12 Then were the king's scribes called on the thirteenth day of the first month, and there was written according to all that Haman had commanded unto the king's lieutenants, and to the governors that were over every province, and to the rulers of every people of every province according to the writing thereof, and to every people after their language; in the name of king Ahasuerus was it written, and sealed with the king's ring.

3:13 And the letters were sent by posts into all the king's provinces, to destroy, to kill, and to cause to perish, all Jews, both young and old, little children and women, in one day, even upon the thirteenth day of the twelfth month, which is the month Adar, and to take the spoil of them for a prey.

3:14 The copy of the writing for a commandment to be given in every province was published unto all people, that they should be ready against that day.

3:15 The posts went out, being hastened by the king's commandment, and the decree was given in Shushan the palace. And the king and Haman sat down to drink; but the city Shushan was perplexed.

By and large, though, this was rare.
Gutenberg's printing press made life a lot easier. People other than potentates could produce and distribute fliers, pamphlets, newspapers, books, and the like. Information became much more democratic, though, as has often been observed, "freedom of the press belongs to those who own printing presses". There was mass communication, but there were still gatekeepers: most people could not in practice reach a large audience without the permission of a comparative few. Radio and television did not change this dynamic.
Enter the Internet. There was suddenly easy, cheap, many-to-many communication. A U.S. court recognized this. All parties to the case (on government-mandated censorship of content accessible to children) stipulated, among other things:
79. Because of the different forms of Internet communication, a user of the Internet may speak or listen interchangeably, blurring the distinction between "speakers" and "listeners" on the Internet. Chat rooms, e-mail, and newsgroups are interactive forms of communication, providing the user with the opportunity both to speak and to listen.

The judges recognized the implications:
80. It follows that unlike traditional media, the barriers to entry as a speaker on the Internet do not differ significantly from the barriers to entry as a listener. Once one has entered cyberspace, one may engage in the dialogue that occurs there. In the argot of the medium, the receiver can and does become the content provider, and vice-versa.
81. The Internet is therefore a unique and wholly new medium of worldwide human communication.
It is no exaggeration to conclude that the Internet has achieved, and continues to achieve, the most participatory marketplace of mass speech that this country—and indeed the world—has yet seen. The plaintiffs in these actions correctly describe the "democratizing" effects of Internet communication: individual citizens of limited means can speak to a worldwide audience on issues of concern to them. Federalists and Anti-Federalists may debate the structure of their government nightly, but these debates occur in newsgroups or chat rooms rather than in pamphlets. Modern-day Luthers still post their theses, but to electronic bulletin boards rather than the door of the Wittenberg Schlosskirche. More mundane (but from a constitutional perspective, equally important) dialogue occurs between aspiring artists, or French cooks, or dog lovers, or fly fishermen.

But what if this is the problem? What if this new many-to-many communication is precisely what is causing trouble? More precisely, what if the problem is the velocity of communication, in units of people per day?
Indeed, the Government's asserted "failure" of the Internet rests on the implicit premise that too much speech occurs in that medium, and that speech there is too available to the participants. This is exactly the benefit of Internet communication, however. The Government, therefore, implicitly asks this court to limit both the amount of speech on the Internet and the availability of that speech. This argument is profoundly repugnant to First Amendment principles.
High velocity propagation appears to be exacerbated by automation, either explicitly or as a side-effect. YouTube's recommendation algorithm appears to favor extremist content. Facebook has a similar problem:
Contrast this, however, with another question from Ms. Harris, in which she asked Ms. Sandberg how Facebook can “reconcile an incentive to create and increase your user engagement when the content that generates a lot of engagement is often inflammatory and hateful.” That astute question Ms. Sandberg completely sidestepped, which was no surprise: No statistic can paper over the fact that this is a real problem.

The velocity, in these cases, appears to be a side-effect of this algorithmic desire for engagement. Sometimes, though, bots appear to be designed to maximize the spread of malicious content. Either way, information spreads far more quickly than it used to, and on a many-to-many basis.
Facebook, Twitter and YouTube have business models that thrive on the outrageous, the incendiary and the eye-catching, because such content generates “engagement” and captures our attention, which the platforms then sell to advertisers, paired with extensive data on users that allow advertisers (and propagandists) to “microtarget” us at an individual level.
Zuckerberg suggests that Facebook wants to focus on smaller-scale communications:
This is different from broader social networks, where people can accumulate friends or followers until the services feel more public. This is well-suited to many important uses—telling all your friends about something, using your voice on important topics, finding communities of people with similar interests, following creators and media, buying and selling things, organizing fundraisers, growing businesses, or many other things that benefit from having everyone you know in one place. Still, when you see all these experiences together, it feels more like a town square than a more intimate space like a living room.

What if Facebook evolves that way, and moves more towards small-group communication rather than being a digital town square? What will be the effect? Will smaller-scale many-to-many communications behave this way?
There is an opportunity to build a platform that focuses on all of the ways people want to interact privately. This sense of privacy and intimacy is not just about technical features—it is designed deeply into the feel of the service overall. In WhatsApp, for example, our team is obsessed with creating an intimate environment in every aspect of the product. Even where we've built features that allow for broader sharing, it's still a less public experience. When the team built groups, they put in a size limit to make sure every interaction felt private. When we shipped stories on WhatsApp, we limited public content because we worried it might erode the feeling of privacy to see lots of public content—even if it didn't actually change who you're sharing with.
I personally like being able to share my thoughts with the world. I was, after all, one of the creators of Usenet; I still spend far too much time on Twitter. But what if this velocity is bad for the world? I don't know if it is, and I hope it isn't—but what if it is?
One final thought on this… In democracies, restrictions on speech are more likely to pass legal scrutiny if they're content-neutral. For example, a loudspeaker truck advocating some controversial position can be banned under anti-noise regulations, regardless of what it is saying. It is quite possible that a velocity limit would be accepted—and it's not at all clear that this would be desirable. Authoritarian governments are well aware of the power of mass communications:
The use of big-character-posters did not end with the Cultural Revolution. Posters appeared in 1976, during student movements in the mid-1980s, and were central to the Democracy Wall movement in 1978. The most famous poster of this period was Wei Jingsheng's call for democracy as a "fifth modernization." The state responded by eliminating the clause in the Constitution that allowed people the right to write big-character-posters, and the People’s Daily condemned them for their responsibility in the "ten years of turmoil" and as a threat to socialist democracy. Nonetheless the spirit of the big-character-poster remains a part of protest repertoire, whether in the form of the flyers and notes put up by students in Hong Kong's Umbrella Movement or as ephemeral posts on the Chinese internet.

As the court noted, "Federalists and Anti-Federalists may debate the structure of their government nightly, but these debates occur in newsgroups or chat rooms rather than in pamphlets." Is it good if we give up high-velocity, many-to-many communications?
Certainly, there are other channels than Facebook. But it's unique: with 2.32 billion users, it reaches about 30% of the world's population. Any change it makes will have worldwide implications. I wonder if they'll be for the best.
Possible Risks

Zuckerberg spoke of much more encryption, but he also noted the risks of encrypted content: "Encryption is a powerful tool for privacy, but that includes the privacy of people doing bad things. When billions of people use a service to connect, some of them are going to misuse it for truly terrible things like child exploitation, terrorism, and extortion. We have a responsibility to work with law enforcement and to help prevent these wherever we can". What does this imply?
One possibility, of course, is that Facebook might rely more on metadata for analysis: "We are working to improve our ability to identify and stop bad actors across our apps by detecting patterns of activity." But he also spoke of analysis "through other means". What might they be? Doing client-side analysis? About 75% of Facebook users employ mobile devices to access the service; Facebook clients can look at all sorts of things. Content analysis can happen that way, too; though Facebook doesn't use content to target ads, might it use it for censorship, good or bad?
Encryption also annoys many governments. Governments disliking encryption is not new, of course, but the more people use it, the more upset they will get. This will be exacerbated if encrypted messaging is used for mass communications; Tufekci is specifically concerned about that: "Once end-to-end encryption is put in place, Facebook can wash its hands of the content. We don't want to end up with all the same problems we now have with viral content online—only with less visibility and nobody to hold responsible for it." We can expect pressure for back doors to increase—but they'll still be a dangerous idea, for all of the reasons we've outlined. (And of course that interacts with the free speech issue.)
I'm not even convinced that Facebook can actually pull this off. Here's the problem with encryption: who has the keys? Note carefully: you need the key to read the content—but that implies that if the authorized user loses her key, she herself has lost access to her content and messages. The challenge for Facebook, then, is protecting keys against unauthorized parties—Zuckerberg specifically calls out "heavy-handed government intervention in many countries" as a threat—but also making them available to authorized users who have suffered some mishap. Matt Green calls this the "mud puddle test": if you drop your device in a mud puddle and forget your password, how do you recover your keys?
Apple has gone to great lengths to lock themselves out of your password. Facebook could adopt a similar strategy—but that could mean that a forgotten password means loss of all encrypted content. Facebook of course has a way to recover from a forgotten password—but will that recover a lost key? Should it? So-called secondary authentication is notoriously weak. Perhaps it's an acceptable tradeoff to regain access to your account but lose access to older content—indeed, Zuckerberg explicitly spoke of the desirability of evanescent content. But even if that's a good tradeoff—Zuckerberg says "you'd have the ability to change the timeframe or turn off auto-deletion for your threads if you wanted"—if someone else (including a government) took control of your account, it would violate another principle Facebook holds dear: "there must never be any doubt about who you are communicating with".
How Facebook handles this dilemma will be very important. Key recovery will make many users very happy, but it will allow the "heavy-handed government intervention" Zuckerberg decries. A user-settable option on key recovery? The usability of any such an option is open to serious question; beyond that, most users will go with the default, and will thus inherit the risks of that default.
19 February 2019
Microsoft is shipping a patch to eliminate SHA-1 hashes from its update process. There's nothing wrong with eliminating SHA-1—but their reasoning may be very interesting.
SHA-1 is a "cryptographic hash function". That is, it takes an input file of any size and outputs 20 bytes. An essential property of cryptographic hash functions is that in practice (though obviously not in theory), no two files should have the same hash value unless the files are identical.
SHA-1 no longer has that property; we've known that for about 15 years. But definitions matter. SHA-1 is susceptible to a "collision attack": an attacker can simultaneously create two files that have the same SHA-1 hash. However, given an existing file and hence its hash, it is not possible, as far as anyone knows, to generate a second file with that same hash. This attack, called a "pre-image attack", is far more serious. (There's a third type of attack, a "second pre-image attack", which I won't go into.)
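To make the terminology concrete, here is the basic property as seen through Python's standard hashlib (the patch contents are, of course, invented for illustration):

```python
import hashlib

update = b"vendor update, version 3.6.8"  # invented patch contents
digest = hashlib.sha1(update).digest()
print(len(digest))  # → 20: SHA-1 always yields 20 bytes

# Changing even one character yields an unrelated digest; a collision
# attack instead finds two specially constructed inputs whose digests match.
tampered = b"vendor update, version 3.6.9"
print(hashlib.sha1(tampered).digest() == digest)  # → False
```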
In the ordinary sequence of events, someone at Microsoft prepares an update file. Its hash—its SHA-1 hash, in many cases—is calculated; this value is then digitally signed. Someone who wished to create a fake update would have to either crack the signature algorithm or, somehow, produce a fake update that had the same hash value as the legitimate one. But that's a pre-image attack, and SHA-1 is still believed to be secure against those. So: is this update useless? Not quite—there's still a risk.
Recall that SHA-1 is vulnerable to a collision attack. This means that if two updates are prepared simultaneously, one good and one evil, there can be a signed, malicious update. In other words, the threat model here is a corrupt insider. By eliminating use of SHA-1 for updates, Microsoft is protecting users against misbehavior by one of its own employees.
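The insider scenario can be sketched with a deliberately weakened hash, since producing a real SHA-1 collision takes serious computation. Everything here (the key, the messages, the two-byte "hash") is invented for illustration; the point is only that a single signature ends up covering both files:

```python
import hashlib, hmac, itertools

SIGNING_KEY = b"vendor-private-key"  # hypothetical stand-in for the real key

def weak_hash(data):
    # Two-byte truncation of SHA-256: weak enough that a collision can
    # be brute-forced in roughly 65,000 tries, standing in for SHA-1's
    # far more expensive (but now feasible) collision attack.
    return hashlib.sha256(data).digest()[:2]

def sign(data):
    # As in real update systems, what gets signed is the hash of the file.
    return hmac.new(SIGNING_KEY, weak_hash(data), "sha256").digest()

def verify(data, sig):
    return hmac.compare_digest(sign(data), sig)

# The corrupt insider prepares a benign update, then searches for a
# malicious twin with the same hash: a collision attack.
good = b"benign update v1"
evil = next(b"evil update %d" % i
            for i in itertools.count()
            if weak_hash(b"evil update %d" % i) == weak_hash(good))

sig = sign(good)           # the vendor knowingly signs only the benign file
print(verify(evil, sig))   # → True: the signature covers the evil file too
```

Switching to SHA-256 for the signed hash forecloses exactly this move, even by someone inside the signing process.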
Now, perhaps this is just housekeeping. Microsoft can get SHA-1 out of its code base, and thus discourage its use. And it's past time to do that; the algorithm is about 25 years old and does have serious weaknesses. But it's also recognition that an insider who turns to the Dark Side can be very dangerous.
25 January 2019
My thoughts on algorithmic bias are in an op-ed at Ars Technica.
9 November 2018
My thesis is simple: the way we protect privacy today is broken and cannot be fixed without a radical change in direction.
For almost 50 years, privacy protection has been based on the Fair Information Practice Principles (FIPPs). There are several provisions, including purpose specification and transparency, but fundamentally, the underlying principle is notice and consent: users must be informed about collection and consent to it. This is true for both the strong GDPR model and the much weaker US model: ultimately, users have to understand what's being collected and how the data will be used, and agree to it. Unfortunately, the concept no longer works (if indeed it ever did). Arthur Miller (no, not the playwright) put it this way:
A final note on access and dissemination. Excessive reliance should not be placed on what too often is viewed as a universal solvent—the concept of consent. How much attention is the average citizen going to pay to a governmental form requesting consent to record or transmit information? It is extremely unlikely that the full ramifications of the consent will be spelled out in the form; if they were, the document probably would be so complex that the average citizen would find it incomprehensible. Moreover, in many cases the consent will be coerced, not necessarily by threatening a heavy fine or imprisonment, but more subtly by requiring consent as a prerequisite to application for a federal job, contract, or subsidy.
The problem today is worse. Privacy policies are vague and ambiguous; besides, no one reads them. And given all of the embedded content on web pages, no one knows which policies to read.
What should replace notice and consent? It isn't clear. One possibility is use controls: users specify what their information can be used for, rather than who can collect it. But use controls pose their own problems. They may be too complex to use, there are continuity issues, and—at least in the US—there may be legal issues standing in their way.
I suspect that what we need is a fundamentally new paradigm. While we're at it, we should also work on a better definition of privacy harms. People in the privacy community take for granted that too much information collection is bad, but it is often hard to explain to others just what the issue is. It often seems to boil down to "creepiness factor".
These are difficult research questions. Until we have something better, we should use use controls; until we can deploy those, we need regulatory changes about how embedded content is handled. In the US, we should also clarify the FTC's authority to act against privacy violators.
None of this is easy. But our "data shadow" is growing longer every day; we need to act quickly.
27 October 2018
The 2018 Texas general election is going to be a disaster, and that's independent of who wins or loses. To be more precise, I should say "who appears to win or lose", because we're never really going to know. Despite more than 15 years of warnings, Texas still uses DRE (Direct Recording Electronic) voting machines, where a vote is entered into a computer and there is no independent record of how people actually voted. And now, with early voting having started, people's votes are being changed by the voting machines.
This isn't the first time votes have been miscounted because of DRE machine failures. In 2004, "Carteret County lost 4,438 votes during the early-voting period leading up to Election Day because a computer didn't record them." Ed Felten has often written about how the machines' own outputs show inconsistencies. (For some reason, images are not currently showing on those blog posts, so I've linked to a Wayback Machine copy.)
It doesn't help that the problem here appears to be due to a completely avoidable design error by the vendor. Per the Texas Secretary of State's office, bad things can happen if voters operate controls while the page is rendering. That's an excuse, not a reason, and it's a bad one. Behavior like that is completely unacceptable from a human factors perspective. If the system will misbehave from data entry during rendering, the input controls should be disabled or inputs from them should be ignored during that time—period, end of discussion. There is literally no excuse for not doing this correctly. Programming this correctly is "hard"? Sorry; not an acceptable answer. And judging from how quickly Texas officials "diagnosed" the problem, it appears that they've known about the issue and let it ride. Again, this is completely unacceptable.
I've been warning about buggy voting machine software for more than 10 years:
Ironically, for all that I'm a security expert, my real concern with electronic voting machines is ordinary bugs in the code. These have demonstrably happened. One of the simplest cases to understand is the counter overflow problem: the voting machine used too small a field for the number of votes cast. The machine used binary arithmetic (virtually all modern computers do), so the critical number was 32,767 votes; the analogy is trying to count 10,000 votes if your counter only has 4 decimal digits. In that vein, the interesting election story from 2000 wasn't Florida, it was Bernalillo County, New Mexico; you can see a copy of the Wall Street Journal story about the problem here.

I haven't changed my mind:
Bellovin is "much more worried about computer error — buggy code — than cyberattacks," he says. "There have been inexplicable errors in some voting machines. It's a really hard problem to deal with. It's not like, say, an ATM system, where they print out a log of every transaction and take pictures, and there's a record. In voting you need voter privacy — you can't keep logs — and there's no mechanism for redoing your election if you find a security problem later."
The National Academies' recent report on the future of voting noted that

The rapid growth in the prominence of DREs brought greater voice to concerns about their use, particularly their vulnerability to software malfunctions and external security risks. And as with the lever machines that preceded them, without a paper record, it is not possible to conduct a convincing audit of the results of an election.

and recommended that
4.11 Elections should be conducted with human-readable paper ballots. These may be marked by hand or by machine (using a ballot-marking device); they may be counted by hand or by machine (using an optical scanner). Recounts and audits should be conducted by human inspection of the human-readable portion of the paper ballots. Voting machines that do not provide the capacity for independent auditing (e.g., machines that do not produce a voter-verifiable paper audit trail) should be removed from service as soon as possible.
4.12 Every effort should be made to use human-readable paper ballots in the 2018 federal election. All local, state, and federal elections should be conducted using human-readable paper ballots by the 2020 presidential election.
This election will undoubtedly end up in court: there's a hotly contested Senate race, and both campaigns are very well-funded. Whatever the outcome, many people will feel that they were disenfranchised—and it didn't have to happen.
6 September 2018
The National Academies of Sciences, Engineering, and Medicine has just released a new report on The Future of Voting. The recommendations in that report are clear, unambiguous, far-reaching, and (in my opinion) absolutely correct. I won't try to summarize it—if you can't read the whole thing, at least read the executive summary—but a few of the recommendations are worth highlighting:
- Protect voter registration databases and electronic poll books
- Votes should be cast only by human-readable paper ballots, though machine-marking is acceptable.
- Recounts should be done by human inspection of these ballots
- Risk-limiting audits should be implemented as soon as possible, in all jurisdictions
Also: though it isn't a major part of the report, the committee did briefly address those who suggest that the blockchain should be employed to secure elections. Again, they were unambiguous:
While the notion of using a blockchain as an immutable ballot box may seem promising, blockchain technology does little to solve the fundamental security issues of elections, and indeed, blockchains introduce additional security vulnerabilities.
If you're at all concerned about voting, read this report. My congratulations to the committee on a wonderful job.
24 August 2018
This morning, I saw a link to a fascinating document. Briefly, it's a declassified TICOM document on some German cryptanalytic efforts during World War II. There are a number of interesting things about it, starting with the question of why it took until 2018 to declassify this sort of information. But maybe there's an answer lurking here…
(Aside: I'm on the road and don't have my library handy; I may update this post when I get home.)
TYPEX—originally Type 10, or Type X—was the British high-level cipher device. It was based on the commercial Enigma as modified by the British. The famous German military Enigma was also derived from the commercial model. Although the two parties strengthened it in different ways, there were some fundamental properties—and fundamental weaknesses—that both inherited from the original design. And the Germans had made significant progress against TYPEX—but they couldn't take it to the next level.
The German Army Cryptanalytic Agency, OKH/In 7/VI, did a lot of statistical work on TYPEX. They eventually figured out more or less everything about how it worked, learning only later that the German army had captured three TYPEX units at Dunkirk. All that they were missing were the rotors, and in particular how they were wired and where the "notch" was on each. (The notch controlled when the rotor would kick over to the next position.) And if they'd had the rotor details and a short "crib" (known plaintext)?
The approximate number of tests required would be about 6 × 14³ = 16,464. This was not by any means a large number and could certainly be tackled by hand. No fully mechanised method was suggested, but a semi-mechanised scheme using a converted Enigma and a lampboard was suggested. There can be no doubt that it would have worked if the conditions (a) and (b) had ever been fulfilled. Moreover, the step from a semi-mechanised approach to a fully automatic method would not have been a difficult one.
In other words, the Germans never cracked TYPEX because they didn't know anything about the rotors and never managed to "pinch" any. But the British did have the wiring of the Enigma rotors. How?
It turns out that the British never did figure that one out. It was the work of a brilliant Polish mathematician, Marian Rejewski; the Poles eventually gave their results to the French and the British, since they realized that even perfect knowledge of German plans wouldn't help if their army was too weak to exploit the knowledge.
Rejewski was, according to David Kahn, the first person to use mathematics other than statistics and probability in cryptanalysis. In particular, he used group theory and permutation theory to figure out the rotor wiring. This was coupled with a German mistake in how they encrypted the indicators, the starting positions of the rotors. (Space prohibits a full discussion of what that means. I recommend Kahn's Seizing the Enigma and Budiansky's Battle of Wits for more details.)
But what if the Germans had solved TYPEX? What would that have meant? Potentially, it would have been a very big deal.
The first point is that since TYPEX and the German military Enigma had certain similarities, the ability to crack TYPEX (which is generally considered stronger than Enigma) might have alerted the Germans that the British could do the same to them—which was, of course, the case. If that wasn't enough, the British often used TYPEX to communicate ULTRA—the intelligence derived from cryptanalysis of Enigma and some other systems—to field units. (Aside: the British used one-time pads to send ULTRA to army-level commands but used TYPEX for lower-level units.) In other words, had the German army gained the ability to read TYPEX, it might have been extremely serious. And although their early work was on 1940 and earlier TYPEX, "Had they succeeded in reading early traffic it seems reasonable to conjecture that they might have maintained continuity beyond the change on 1/7/40 when the 'red' drums were introduced." It's certainly the case that the British exploited continuity with Enigma; most historians agree that if the Germans had used Enigma at the start of the war as well as they used it at the end, it's doubtful that the British could have cracked it.
There are a couple of other interesting points in the TICOM report. For one thing, at least early in the war British cipher clerks were making the same sorts of mistakes as the German clerks did: "operators were careless about moving the wheels between the end of one message and the start of the next". The British called their insight about similar laziness by the Germans the "Herivel tip". And the British didn't even encipher their indicators; they sent them in the clear. (To be sure, the bad way the Germans encrypted their indicators was what led to the rotor wiring being recovered, thus showing that not even trying can be better than doing something badly!)
So where are we? The Germans knew how TYPEX worked and had devised an attack that was feasible if they had the rotor wiring. But they never captured any rotors and they lacked someone with the brilliance of Marian Rejewski, so they couldn't make any progress. We're also left with a puzzle: why was this so sensitive that it wasn't declassified until more than 70 years after the war? Might the information have been useful to someone else, someone who did know the rotor wiring?
It wouldn't have been the U.S. The U.S. and the British cooperated very closely on ULTRA, though the two parties didn't share everything: "None of our allies was permitted even to see the [SIGABA] machine, let alone have it." Besides, TICOM was a joint project; the US had the same information on TYPEX's weaknesses. However, might the Soviets have benefited? They had plenty of well-placed agents in the U.K. Might they have had the rotor wirings? I don't know—but I wonder if something other than sheer inertia kept that report secret for so many years.
8 August 2018
I keep hearing stories of people using "foldering" for covert communications. Foldering is the process of composing a message for another party, but instead of sending it as an email, you leave it in the Drafts folder. The other party then logs in to the same email account and reads the message; they can then reply via the same technique. Foldering has been used for a long time, most famously by then-CIA director David Petraeus and his biographer/lover Paula Broadwell. Why is foldering used? What is it good for, and what are its weaknesses? There's a one-word answer to its strength—metadata—but its utility (to the extent that it had any) is largely that of a bygone era.
Before I start, I need to define a few technical terms. In the email world, there are "MUAs"—Mail User Agents—and "MTAs"—Mail Transfer Agents. They're different.
An MUA is what you use to compose and read email. It could be a dedicated mail program—the Mail app on iPhones and MacOS, Outlook on Windows, etc. An MUA needs to be configured with the domain names of the user's outbound and inbound email servers. MUAs live on user machines, like laptops and phones; MTAs are servers, and are run by corporations, ISPs, and mail providers like Google. And there's a third piece, an inbound mail server. A receiving MTA hands off the mail to the inbound mail server; the MUA talks to it and pulls down email from it.
Webmail systems are a bit funny. Technically, they're remote MUAs that you talk to via a web browser. But they still talk to MTAs and inbound mail servers, though you don't see this. The MUA and MTA might be on the same computer for a small operation (perhaps running the open source squirrelmail package); for something the size of Gmail or Hotmail, the webmail servers are on separate machines from the MTAs. However, foldering doesn't involve an MTA. Rather, it involves composing messages and leaving them in some folder. The folders are all stored on disk—as it turns out, on disk managed by the inbound mail server, even though you're composing mail. (Why? Because only inbound mail servers and MUAs know about folders; MTAs don't. The MUA could have a draft mail folder (it probably does), but by sending it to the inbound mail server, you can start composing email on one device and continue from another.)
Webmail systems are, as I said, MUAs. For technical reasons, they generally don't have any permanent folder storage of their own; they just talk to the inbound mail server.
So: foldering via a webmail system involves a web server and an inbound mail server. It does not involve an MTA—and that's important.
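The mechanics are simple enough to sketch with Python's standard imaplib, which speaks directly to the inbound mail server. The hostname and credentials below are placeholders, and the save step is shown but not run; the point is that the message is appended to a folder over IMAP and never handed to an MTA:

```python
import imaplib
import time
from email.message import EmailMessage

def compose_draft(account, body):
    """Build a message that is never handed to an MTA; it exists only
    to sit in the shared account's Drafts folder."""
    msg = EmailMessage()
    msg["From"] = account
    msg["Subject"] = "notes"
    msg.set_content(body)
    return msg

def save_to_drafts(host, user, password, msg):
    """Append the draft over IMAP.  Only the inbound mail server is
    involved; no sender/receiver metadata is generated in transit."""
    with imaplib.IMAP4_SSL(host) as imap:
        imap.login(user, password)
        imap.append("Drafts", r"(\Draft)",
                    imaplib.Time2Internaldate(time.time()),
                    msg.as_bytes())

draft = compose_draft("alice@example.com", "Meet at the usual place.")
# save_to_drafts("imap.example.com", "alice@example.com", "password", draft)
```

The other party logs in to the same account and reads the draft the same way; nothing ever traverses the MTA-to-MTA path that eavesdroppers watch.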
If you're trying to engage in covert communications, you're not going to use your own mail systems—it's too obvious what's going on. Accordingly, you'll probably use a free commercial email service such as Google's Gmail or Microsoft's Outlook. The party with whom you're communicating will do the same. Let's follow the path of a typical email from a Gmail user (per the usual conventions in cryptography, we'll call her Alice) to an Outlook user named Bob.
The sender logs in to Gmail, probably via a web browser though possibly via an MUA app. Even back in the mists of time, the login connection was encrypted. However, until 2010, the actual session wasn't encrypted by default, though users had been able to turn on encryption since at least 2008. Let's assume that our hypothetical conspirators or lovers were security-conscious, and thus turned on encryption for this link. That meant that no eavesdropper could see what was going on, and in particular could not see who logged in to Gmail or to whom a particular email was being sent. After Alice clicks "Send", though, the webmail MUA hands the message off to the MTA—and that's where the security breaks down. Back then, the MTA-to-MTA traffic was not encrypted; thus, someone—an intelligence agency?—monitoring the Internet backbone would see the emails. Bingo: our conspirators are burned. And even if we're talking about simple legal processes, the sender and recipient of such email messages are (probably) legally metadata and hence are readily available to law enforcement.
Suppose, though, that Alice and Bob used foldering. There are no MTAs involved, hence no sender/receiver metadata, and no unencrypted content flowing anywhere. They're safe—or so they thought…
When Alice logs into Gmail, her IP address is recorded. It, too, is metadata. An eavesdropper doesn't know that it's Alice, but her IP address is visible. More importantly, it's logged by Gmail: user Alice logged in from 203.0.113.42. Oddly enough, "Alice"—it's really Bob, of course—logged in from 198.51.100.17 as well, and those two IP addresses aren't physically located anywhere near each other. That discrepancy might even be logged. Regardless, it's in Gmail's log files, and if Alice or Bob are under suspicion, a simple subpoena for the log files (or a simple hack of the mail server) will show what's going on: these two IP addresses are showing a decidedly odd login pattern, and one of them belongs to a party under suspicion.
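The cross-check a provider, or an investigator with the log files, could run is not sophisticated. Here is a hypothetical sketch (the accounts and addresses are invented) that flags accounts logging in from more than one source IP:

```python
from collections import defaultdict

# Hypothetical (account, source IP) pairs, as a provider's logs might
# record them.  (Real users log in from many IPs: phone, home, office.
# A real detector would weigh geography and timing, not just counts.)
login_log = [
    ("alice", "203.0.113.42"),
    ("alice", "198.51.100.17"),   # "Alice" again, from far away
    ("carol", "192.0.2.7"),
    ("carol", "192.0.2.7"),
]

def flag_shared_accounts(log):
    """Return accounts seen from more than one source IP: the pattern
    a foldering channel produces when two people share one mailbox."""
    ips = defaultdict(set)
    for account, ip in log:
        ips[account].add(ip)
    return {acct for acct, seen in ips.items() if len(seen) > 1}

print(flag_shared_accounts(login_log))   # {'alice'}
```

The metadata that foldering was supposed to suppress reappears, just in a different place: the provider's logs rather than the wire.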
So where are we, circa 2010? Suppose neither Alice nor Bob were suspected of anything and they sent email. An intelligence agency monitoring assorted Internet links would see email between the two of them; if one was being targeted, it would be able to pick off the contents of the messages. If they used foldering, though, they would be much safer: there wouldn't be any incriminating unencrypted traffic. The spooks would see traffic from Alice's and Bob's IP addresses to Gmail or Outlook, but that's not suspicious. The login names and the sessions themselves are protected.
Suppose, though, that Alice and/or Bob were under suspicion by law enforcement. A subpoena would get the login IP addresses; the discrepancy would stick out like a sore thumb, and the investigation would proceed apace.
In other words, in 2010 foldering would protect against Internet eavesdropping but not against law enforcement.
The world is very different today. Following the Snowden revelations, many email providers turned on encryption for MTA-to-MTA traffic. As a consequence, our hypothetical intelligence agency can't see that email is flowing between Alice and Bob; it's all protected. If they're being investigated, of course, a subpoena will show the email—but the same sort of subpoena would also show the login IP addresses.
Where does that leave us? Today, an attacker with access to log files, either via subpoena or by hacking a mail server, can see the communication metadata whether Alice and Bob are using foldering or simply sending email. An eavesdropper can't see the communications in either case. This is in contrast to 2010, when an eavesdropper could learn a lot from email but couldn't from a foldering channel.
Conclusion: if Alice and Bob and their mail services take normal 2018 precautions, foldering adds very little security.
7 August 2018
There have been many news stories of late about potential attacks on the American electoral system. Which attacks are actually serious? As always, the answer depends on economics.
There are two assertions I'll make up front. First, the attacker—any attacker—is resource-limited. They may have vast resources, and in particular they may have more resources than the defenders—but they're still limited. Why? They'll throw enough resources at the problem to solve it, i.e., to hack the election, and use anything left over for the next problem, e.g., hacking the Brexit II referendum… There's always another target.
Second, elections are a system. That is, there are multiple interacting pieces. The attacker can go after any of them; the defender has to protect them all. And protecting just one piece very well won't help; after all, "you don't go through strong security, you go around it." But again, the attacker has limited resources. Their strategy, then, is to find the greatest leverage, the point to attack that costs the defenders the most to protect.
There are many pieces to a voting system; I'll concentrate on the major ones: the voting machines, the registration system, electronic poll books, and vote-tallying software. Also note that many of these pieces can be attacked indirectly, via a supply chain attack on the vendors.
There's another point to consider: what are the attacker's goals? Some will want to change vote totals; others will be content with causing enough obvious errors that no one believes the results—and that can result in chaos.
The actual voting machines get lots of attention. That's partly a hangover from the 2000 Bush–Gore election, where myriad technological problems in Florida's voting system (e.g., the butterfly ballot in Palm Beach County and the hanging chads on the punch card voting machines) arguably cost Gore the state and hence the presidential election.
And purely computerized (DRE—Direct Recording Electronic) voting machines are indeed problematic. They make mistakes. If there's ever a real problem, there's nothing to recount. It's crystal-clear to virtually every computer scientist who has studied the issue that DRE machines are a bad idea. But: if you want to change the results of a nation-wide election or set of elections in the U.S., going after DRE machines is probably the wrong idea. Why not? Because it's too expensive.
There are many different election administrations in the U.S.: about 10,000 of them. Yes, sometimes an entire state uses the same type of machine—but each county administers its own machines. Storing the voting machines? Software updates? Done by the county. Programming the ballot? Done by the county. And if you want to attack them? Yup—you have to go to that county. And voting machines are rarely, if ever, connected to the Internet, which means that you pretty much need physical presence to do anything nasty.
Now, to be sure, if you are at the polling place you may be able to do really nasty things to some voting machines. But it's not an attack that scales well for the attacker. It may be a good way to attack a local election, but nothing larger. A single Congressional race? Maybe, but let's do a back-of-the-envelope calculation. The population of the U.S. is about 325,000,000. That means that each election area has about 32,500 people. (Yes, I know it's very non-uniform. This is a back-of-the-envelope calculation.) There are 435 representatives, so each one has about 747,000 constituents, or about 23 election areas. (Again: back of the envelope.) So: you'd need a physical presence in up to 23 different jurisdictions, and maybe many precincts in each, to tamper with the machines there. As I said, it's not an attack that scales very well. We need to fix our voting machines—after all, think of Florida in 2000—but for an attacker who wants to change the result of a national election, it's not the best approach.
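The envelope arithmetic is easy to reproduce:

```python
US_POPULATION = 325_000_000
ELECTION_ADMINISTRATIONS = 10_000   # rough figure for the U.S.
HOUSE_SEATS = 435

people_per_admin = US_POPULATION / ELECTION_ADMINISTRATIONS
people_per_district = US_POPULATION / HOUSE_SEATS
admins_per_district = people_per_district / people_per_admin

print(round(people_per_admin))      # 32500
print(round(people_per_district))   # 747126
print(round(admins_per_district))   # 23
```

Roughly two dozen separate jurisdictions per House seat, each needing a physical visit: that is the scaling problem for the attacker.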
There's one big exception: a supply chain attack might be very feasible for a nation-state attacker. There are not many vendors of voting equipment; inserting malware in just a few places could work very well. But there's a silver lining in that cloud: because there are many fewer places to defend than 50 states or 10,000 districts, defense is much less expensive and hence more possible—if we take the problem seriously.
And don't forget the chaos issue. If, say, every voting machine in a populous county of a battleground state showed a preposterous result—perhaps a 100% margin for some candidate, or 100 times as many votes cast as there are registered voters in the area—no one will believe that that result is valid. What then? Rerun the voting in just that county? Here's what the Constitution says:
The Congress may determine the Time of chusing the Electors, and the Day on which they shall give their Votes;
which Day shall be the same throughout the United States.
The voter registration systems are a more promising target for an attacker. While these are, again, locally run, there is often a statewide portal to them. In fact, 38 states have or are about to have online voter registration.
In 2016, Russia allegedly attacked registration systems in a number of states. Partly, they wanted to steal voter information, but an attacker could easily delete or modify voter records, thus effectively disenfranchising people. Provisional ballots? Sure, if your polling place has enough of them, and if you and the poll workers know what to do. I've been a poll worker. Let's just say that handling exceptional cases isn't the most efficient process. And consider the public reaction if many likely supporters (based on demographics) of a given candidate are the ones who are disproportionately deleted. (Could the attackers register phony voters? Sure, but to what end? In-person voter fraud is exceedingly rare; how many times can Boris and Natasha show up to vote? Again, that doesn't scale. That's also why requiring an ID to vote is solving a non-problem.)
There's another point. Voting software is specialized; its attack surface should be small. It's possible to get that wrong, as in some now-decertified Virginia voting machines, and there's always the underlying operating system; still, if the machines aren't networked, during voting the only exposure should be via the voting interface.
A lot of registration software, though, is a more-or-less standard web platform, and is therefore subject to all of the risks of any other web service. SQL injection, in particular, is a very real risk. So an attack on the registration system is not only more scalable, it's easier.
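To see why SQL injection is such a cheap attack, here is a minimal illustration using Python's built-in sqlite3 (the table and names are invented; a real registration system would be larger, but the failure mode is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE voters (name TEXT, registered INTEGER)")
conn.execute("INSERT INTO voters VALUES ('Alice Jones', 1)")

def lookup_unsafe(name):
    # Vulnerable: the user-supplied name is pasted into the SQL text.
    return conn.execute(
        f"SELECT registered FROM voters WHERE name = '{name}'").fetchall()

def lookup_safe(name):
    # Parameterized: the driver treats the name as data, not as SQL.
    return conn.execute(
        "SELECT registered FROM voters WHERE name = ?", (name,)).fetchall()

# A classic injection payload turns the unsafe query into "match everyone".
payload = "x' OR '1'='1"
print(lookup_unsafe(payload))   # leaks every row
print(lookup_safe(payload))     # matches nothing
```

The fix has been standard practice for two decades, which is exactly why its continued absence from web-facing government systems is so worrying.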
Before the election, voter rolls are copied to what are known as poll books. Sometimes, these are paper books; other places use electronic ones. The electronic ones are networked to each other; however, they are generally not connected to the Internet. If that networking is set up incorrectly, there can be risks; generally, though, they're networked on a LAN. That means that you have to be at the polling place to exploit them. In other words, there's some risk, but it's not much greater than the voting machines.
There's one more critical piece: the vote-tallying software. Tallies from each precinct are transmitted to the county's election board; there may be links to the state, to news media, etc. In other words, this software is networked and hence very subject to attack. However: this is used for the election night count; different procedures can be and often are used for the official canvass. And even without attacks, many things can go wrong:
In Iowa, a hard-to-read fax from Scott County caused election officials initially to give Vice President Gore an extra 2,006 votes. In Outagamie County, Wis., a typo in a tally sheet threw Mr. Bush hundreds of votes he hadn't won.
But: the ability to do a more accurate count the second time around depends on there being something different to count: paper ballots. That's what saved the day in 2000 in Bernalillo County, New Mexico. The problem: "The paper tallies, resembling grocery-store receipts, seemed to show that many more ballots had been cast overall than were cast in individual races. For example, tallies later that night would show that, of about 38,000 early ballots cast, only 25,000 were cast for Mr. Gore or Mr. Bush." And the cause? Programming the vote-counting system:
As they worked, Mr. Lucero's computer screen repeatedly displayed a command window offering a pull-down menu. From the menu, the two men should have clicked on "straight party." Either they didn't make the crucial click, or they did and the software failed to work. As a result, the Accu-Vote machines counted a straight-party vote as one ballot cast, but didn't distribute any votes to each of the individual party candidates.
Crucially, though, once they fixed the programming they could retally those paper ballots. (By the way, programming the tallying computer can itself be complex. Bernalillo County, which had a population of 557,000 then, required 114 different ballots.)
There's a related issue: the systems that distribute votes to the world. Alaska already suffered such an attack; it could happen elsewhere, too. And it doesn't have to be via hacking; a denial of service attack could also do the job of causing chaos.
The best way to check the ballot-counting software is risk-limiting audits. A risk-limiting audit checks a random subset of the ballots cast. The closer the apparent margin, the more ballots are checked by hand. "Risk-limiting audits guarantee that if the vote tabulation system found the wrong winner, there is a large chance of a full hand count to correct the results." And it doesn't matter whether the wrong count was due to buggy software or an attack. In other words, if there is a paper trail, and if it's actually looked at, via either a full hand-count or a risk-limiting audit, the tallying software isn't a good target for an attacker. One caveat: how much chaos might there be if the official count or the recount deliver results significantly different than the election night fast count?
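The core mechanism can be sketched in a few lines. This is a simplified version of a BRAVO-style ballot-polling audit (the real procedure, due to Lindeman and Stark, has considerably more machinery, including handling ties and multiple contests); it shows how agreement with the reported result quickly builds statistical confidence, while disagreement escalates to a full hand count:

```python
def ballot_polling_audit(reported_winner_share, drawn_ballots, risk_limit=0.1):
    """Simplified BRAVO-style sequential test.  Multiply a test
    statistic by 2p for each randomly drawn winner ballot and by
    2(1-p) for each loser ballot; confirm the reported outcome once
    the statistic reaches 1/risk_limit, else escalate to a full
    hand count."""
    p = reported_winner_share
    t, threshold = 1.0, 1.0 / risk_limit
    for drawn, ballot in enumerate(drawn_ballots, start=1):
        t *= 2 * p if ballot == "winner" else 2 * (1 - p)
        if t >= threshold:
            return ("confirmed", drawn)
    return ("full hand count", len(drawn_ballots))

# A 60/40 reported result, and the random sample keeps agreeing:
print(ballot_polling_audit(0.6, ["winner"] * 40))   # ('confirmed', 13)
# The sample contradicts the reported result: escalate.
print(ballot_polling_audit(0.6, ["loser"] * 40))    # ('full hand count', 40)
```

Note how few ballots a comfortable margin requires; it is the close races that force large samples, which is why the margin determines the audit's workload.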
There's one more point: much of the election machinery, other than the voting machines themselves, is an ordinary IT installation, and hence is subject to all of the security ills that any other IT organization can be subject to. This specifically includes things like insider attacks and ransomware—and some attackers have been targeting local governments:
Attempted ransomware attacks against local governments in the United States have become unnervingly common. A 2016 survey of chief information officers for jurisdictions across the country found that obtaining ransom was the most common purpose of cyberattacks on a city or county government, accounting for nearly one-third of all attacks.
The threat of attacks has induced at least one jurisdiction to suspend online return of absentee ballots. They're wise to be cautious—and probably should have been that cautious to start.
Again, elections are complex. I've only covered the major pieces here; there are many more ways things can go wrong. But of this sample, it's pretty clear that the attackers' best target is the registration system. (Funny, the Russians seemed to know that, too.) Actual voting machines are not a great target, but the importance of risk-limiting audits (even if the only problem is a close race) means that replacing DRE voting machines with something that provides a paper trail is quite important. The vote-counting software is even less interesting if proper audits are done, though don't discount the utility to some parties of chaos and mistrust.
Update: No sooner did I write about how impossible results could lead to chaos than this story appeared about DRE machines in Georgia: "[i]n Habersham County's Mud Creek precinct, … 276 registered voters managed to cast 670 ballots". There were other problems, too. I suspect bugs rather than malice—but we don't really know yet.