March/April 2012 (Vol. 10, No. 2) pp. 102-103
1540-7993/12/$31.00 © 2012 IEEE
Published by the IEEE Computer Society
Numbers Worth Having
Early numbers are always wrong. —Mary Cheney
Writing in The New York Times six weeks to the day after 9/11, Richard Berner, then chief US economist for Morgan Stanley, predicted a decade's worth of increased security spending and prophesied its impact on productivity growth, the latter being the engine of most everything else (tinyurl.com/83v8p3e). US productivity over the interval 2001–2011 showed a compound annual growth rate (CAGR) of 2.25 percent (median 2.06 percent; tinyurl.com/34rl5e4).
Over that same interval, the CAGR for gross domestic product (GDP) was 1.64 percent (median 2.20 percent; tinyurl.com/7p6zjd4).
And over that interval once again, the share of GDP going to defense showed a CAGR of 4.49 percent (median 3.98 percent).
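For the record, those growth rates follow the standard CAGR formula. A minimal sketch (the endpoint values here are illustrative, not the actual series):

```python
def cagr(begin_value, end_value, years):
    """Compound annual growth rate: (end / begin) ** (1 / years) - 1."""
    return (end_value / begin_value) ** (1.0 / years) - 1.0

# Illustrative values only: a series growing from 100 to 125 over 10 years
# works out to roughly 2.26 percent per year.
print(f"{cagr(100.0, 125.0, 10):.2%}")
```

The median figures cited alongside each CAGR are a useful cross-check, since a single anomalous year can move a compound rate considerably.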
Taken together, we do see productivity growth being consumed by the aftereffects of 9/11. (N.B., a truly precise explication of the myriad side conditions and caveats can't fit in this column.)
Just over 10 years later, the Pentagon is discussing how to differentially draw down its overall forces—"working to keep those with valuable specialties such as cyber warfare and acquisitions" (tinyurl.com/848dr9n). In other words, going forward, the percentage of total military personnel devoted to cyberspace grows monotonically. Treating the defense sector as if it were a country, can we expect the same kinds of curves over the next 10 years?
Recently, Ponemon and Bloomberg (P&B) together surveyed cybersecurity at 172 enterprises (124 private, 48 public) central to critical infrastructure. In their survey, P&B chose to look at what it would take to reach 95 percent protection—that is, imperfect. They found that those 172 enterprises collectively spend US$5.3 billion on cybersecurity but would have to spend $46.6 billion to reach 95 percent protection—8.8 times as much. Among the survey population, annual spending varied from $16 million for companies in the food sector to $67 million in communications, and the step up in spending needed to reach 95 percent protection varies—sevenfold for utilities, 13-fold for financial sector firms, and so on. Those of you who are consultants will find it unsurprising that the largest share of cybersecurity spending today goes for governance and compliance. One wishes for more numbers like those of P&B.
In fact, one wishes for more numbers all the time. To pick a question—what is the viability of public-key infrastructure given recent revelations?—one might ask what the history of revocation is and what it predicts. Alexandre Dulaunoy has given us a summation of the reasons for certificate revocation by pulling the CRLs from all public CAs (tinyurl.com/7roh4jq). Sadly, the majority of revoked certs show no reason at all; those that do spread across the standard reason codes.
There are a lot of numbers that it would be nice to have. Let's start with one that we can do by sampling: What are the odds that a random PDF is poisoned? Are the odds of safety better or worse for PDF resumes, which your average Human Resources Department ingests every day?
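Such a sampling exercise is straightforward to frame as a proportion estimate with a confidence interval. A sketch using the Wilson score interval, with a purely hypothetical sample (the 30-in-1,000 figure is invented for illustration):

```python
import math

def proportion_ci(hits, n, z=1.96):
    """Wilson score 95% interval for an observed proportion hits/n."""
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half

# Hypothetical: 30 poisoned PDFs found in a random sample of 1,000
lo, hi = proportion_ci(30, 1000)
print(f"point estimate 3.0%, 95% CI {lo:.1%} to {hi:.1%}")
```

A few thousand sampled documents would narrow the interval enough to compare, say, resumes against PDFs at large.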
Suppose your goal is to achieve that 95 percent protection level that P&B used as a calibrator. If you agree with Adi Shamir's point in his Turing Award Lecture, that halving your vulnerability requires doubling your expenditure, then you can extend the cost curve to however many "nines" you want (tinyurl.com/7j5jbs3).
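Taken literally, that rule means cost scales inversely with residual vulnerability. A back-of-envelope sketch, using P&B's $46.6 billion as the assumed price of 95 percent protection:

```python
def cost_for_protection(current_cost, current_protection, target_protection):
    """Shamir's rule of thumb -- doubling expenditure halves residual
    vulnerability -- makes cost scale inversely with vulnerability."""
    return current_cost * (1.0 - current_protection) / (1.0 - target_protection)

# Extending P&B's $46.6B (the assumed cost of 95 percent protection):
# each extra "nine" of protection multiplies the bill roughly tenfold.
print(cost_for_protection(46.6, 0.95, 0.99))   # ~$233B for 99 percent
print(cost_for_protection(46.6, 0.95, 0.999))  # ~$2,330B for 99.9 percent
```

At those multiples, the question of where to stop buying protection answers itself for most budgets.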
It appears that with the low expense of cloud computing, there's no longer any reason to squeeze bits out of your programs. Yes, there was a time when getting a DOS file system to run in 4K proved you had the right stuff, but "Hello, World" has been a megabyte or more for a long time now. With cheap cloud, program bloat just doesn't matter anymore, so is there any remaining reason, besides minimizing attack surface, not to link every library you could possibly need? In any case, a good set of numbers would be those that established the trendline for, say, the median program running on a cloud or clouds. Better still would be a trendline measure of attack surface, which, to be maximally useful to other parties, would be randomized enough to be generalizable. Perhaps the posting of gasoline mileage ratings for cars offers us a useful analogy.
We all talk about the reality of a fast-changing cybersecurity environment driven by the overlap of new attackers and new technology. There is, at the same time, a lot of good academic work going on, but wouldn't it be nice to have a measure of how the cybersecurity body of knowledge is growing, and what share of that body of knowledge outdates yearly? Put differently, a half-life of the cybersecurity literature would be a helpful thing to have. Among other things, it would be a correlate of how much specialization we need or should expect. If the half-life were found to be shrinking, the implication about what it takes to get to 95 percent would be obvious.
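If the yearly outdating share were ever measured, the half-life would follow directly from exponential decay. A sketch with an invented 10 percent yearly rate:

```python
import math

def half_life_years(yearly_outdating_fraction):
    """With exponential decay, a fraction p of the body of knowledge
    outdating per year implies a half-life of ln(2) / -ln(1 - p) years."""
    return math.log(2) / -math.log(1.0 - yearly_outdating_fraction)

# Hypothetical: were 10 percent of results to outdate each year...
print(f"{half_life_years(0.10):.1f} years")  # about 6.6 years
```

Tracking that one number over time would tell us whether the field's knowledge is compounding or churning.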
Akin to this is a phenomenon that I see regularly: the rediscovery of something that had been known before but was thought to have become obsolete. Rate numbers for rediscovery compared to discovery are what a statistician might call sampling with or without replacement, a distinction that has its uses. Can we say that the number of as-yet-undiscovered cybersecurity research results is likely to be infinite, or is it bounded enough that we're gaining on it?
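The with-replacement view invites the capture-recapture estimate ecologists use (Lincoln–Petersen): treat already-known results as "marked" and rediscoveries as recaptures. The counts below are invented purely for illustration:

```python
def lincoln_petersen(known_results, sample_size, rediscoveries):
    """Capture-recapture estimate of total population size: if a sample
    of findings contains already-known ('marked') results, then
    N ~= known * sample / rediscoveries."""
    return known_results * sample_size / rediscoveries

# Hypothetical: 5,000 known results; of 200 findings in a year, 40 turn
# out to be rediscoveries, suggesting ~25,000 discoverable results total.
print(lincoln_petersen(5000, 200, 40))
```

A rediscovery rate that holds steady year over year would argue for a bounded population; one that falls toward zero would argue we aren't gaining on it.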
There might be folks working on the correlation between domain squatting and whether the domain is used in spam or phishing campaigns; that would be a nice number to have. A squatted domain might generate a little advertising income once in a blue moon, but renting it out might generate more. It's said that two-thirds of all registered domains are stockpiled by squatters, so one must assume that some of them are accomplices to cybercrime, wittingly or not.
In The Haves and the Have Nots, Branko Milanovic of the World Bank demonstrated that whereas in Europe inequality between people was an intracountry matter far more than an intercountry matter, in Asia, the inequality is largely between countries rather than within them. This presents a couple of questions for which data would be attractive: If economic inequality in Asia is intercountry, is unequal access to cyberspace intercountry and, if not, what is at work? More to the point of this magazine, if digitally attacking Americans is just the latest version of "I rob banks because that's where the money is," then can we expect large near-term increases in transborder cyberattacks originating in Asia but also targeting Asia? Why not? China is the world's biggest computing monoculture, so yet more experimental evidence might soon be observable.
As I've written before, ratios are often the most instructive. My colleague Iván Arce suggests dividing the cost of digital breaches by data volume, then plotting trendlines. He suggests the quotient as a measure of "efficiency"; were it to remain constant or even just grow linearly, then the attackers' ability to extract value per unit of data isn't changing much. This would help support or disprove the prevailing idea that attacks have increased not only exponentially but also at a higher rate than overall technology adoption.
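Given such a series, the trendline is just an ordinary least-squares slope. A sketch over invented cost-per-record figures (the dollar values are made up for illustration):

```python
def trend_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Invented breach cost-per-record figures by year, purely for illustration
years = [2007, 2008, 2009, 2010, 2011]
cost_per_record = [197, 202, 204, 214, 194]
print(f"{trend_slope(years, cost_per_record):+.2f} $/record/year")  # about +0.60
```

A slope near zero on the real series would be Arce's "constant efficiency"; a steepening one would mean attackers are extracting more value per unit of data, not just more data.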
There are a lot more numbers it would be good to have, the cybersecurity evaluation of digital exhaust being just one more to throw in. Please be in touch if you have either numbers or ideas.
Daniel E. Geer Jr. is the chief information security officer for In-Q-Tel. He was formerly vice president and chief scientist at Verdasys and is a past president of the Usenix Association. Contact him at firstname.lastname@example.org.