May 2017
The n^2 Problem (1 May 2017)
Physicality and Comprehensibility (8 May 2017)
Patching is Hard (12 May 2017)
Who Pays? (16 May 2017)

Patching is Hard

12 May 2017

There are many news reports of a ransomware worm. Much of the National Health Service in the UK has been hit; so has FedEx. The patch for the flaw exploited by this malware has been out for a while, but many companies haven’t installed it. Naturally, this has prompted a lot of victim-blaming: they should have patched their systems. Yes, they should have, but many didn’t. Why not? Because patching is very hard and very risk, and the more complex your systems are, the harder and riskier it is.

Patching is hard? Yes—and every major tech player, no matter how sophisticated they are, has had catastrophic failures when they tried to change something. Google once bricked Chromebooks with an update. A Facebook configuration change took the site offline for 2.5 hours. Microsoft ruined network configuration and partially bricked some computers; even their newest patch isn’t trouble-free. An iOS update from Apple bricked some iPad Pros. Even Amazon knocked AWS off the air.

There are lots of reasons for any of these, but let’s focus on OS patches. Microsoft—and they’re probably the best in the business at this—devotes a lot of resources to testing patches. But they can’t test every possible user device configuration, nor can they test against every software package, especially if it’s locally written. An amazing amount of software inadvertently relies on OS bugs; sometimes, a vendor deliberately relies on non-standard APIs because there appears to be no other way to accomplish something. The inevitable result is that on occasion, these well-tested patches will break some computers. Enterprises know this, so they’re generally slow to patch. I learned the phrase "never install .0 of anything" in 1971, but while software today is much better, it’s not perfect and never will be. Enterprises often face a stark choice with security patches: take the risk of being knocked of the air by hackers, or take the risk of knocking yourself off the air. The result is that there is often an inverse correlation between the size of an organization and how rapidly it installs patches. This isn’t good, but with the very best technical people, both at the OS vendor and on site, it may be inevitable.

To be sure, there are good ways and bad ways to handle patches. Smart companies immediately start running patched software in their test labs, pounding on it with well-crafted regression tests and simulated user tests. They know that eventually, all operating systems become unsupported, and they plan (and budget) for replacement computers, and they make sure their own applications run on newer operating systems. If it won’t, they update or replace those applications, because running on an unsupported operating system is foolhardy.

Companies that aren’t sophisticated enough don’t do any of that. Budget-constrained enterprises postpone OS upgrades, often indefinitely. Government agencies are often the worst at that, because they’re dependent on budgets that are subject to the whims of politicians. But you can’t do that and expect your infrastructure to survive. Windows XP support ended more than three year ago. System administrators who haven’t upgraded since then may be negligent; more likely, they couldn’t persuade management (or Congress or Parliament…) to fund the necessary upgrade.

(The really bad problem is with embedded systems—and hospitals have lots of those. That’s "just" the Internet of Things security problem writ large. But IoT devices are often unpatchable; there’s no sustainable economic model for most of them. That, however, is a subject for another day.)

Today’s attack is blocked by the MS17-010 patch, which was released March 14. (It fixes holes allegedly exploited by the US intelligence community, but that’s a completely different topic. I’m on record as saying that the government should report flaws.) Two months seems like plenty of time to test, and it probably is enough—but is it enough time for remediation if you find a problem? Imagine the possible conversation between FedEx’s CSO and its CIO:

"We’ve got to install MS17-010; these are serious holes."

"We can’t just yet. We’ve been testing it for the last two weeks; it breaks the shipping label software in 25% of our stores."

"How long will a fix take?"

"About three months—we have to get updated database software from a vendor, and to install it we have to update the API the billing software uses."

"OK, but hurry—these flaws have gotten lots of attention. I don’t think we have much time."

So—if you’re the CIO, what do you do? Break the company, or risk an attack? (Again, this is an imaginary conversation.)

That patching is so hard is very unfortunate. Solving it is a research question. Vendors are doing what they can to improve the reliability of patches, but it’s a really, really difficult problem.

https://www.cs.columbia.edu/~smb/blog/2017-05/2017-05-12.html