Issue No.01 - January/February (2010 vol.12)
Published by the IEEE Computer Society
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MCSE.2010.7
Born from a desire to predict the future, epidemiology has largely been limited to studying the past. Now, computational epidemiology researchers are harnessing computing power to crack the complicated mystery of how diseases spread.
When news reports declared that two well-publicized computer models underestimated the initial spread of the 2009 swine flu pandemic, people asked why the models didn't work better. But the real question is why the models worked as well as they did given the difficulty that scientists face in tracing the human behavior patterns that spread disease.
The Prediction Challenge
As Armin Mikler, director of the Computational Epidemiology Research Laboratory at the University of North Texas, explains it, the science of epidemiology sprang from the human desire to predict the future. Ever since 19th century doctor John Snow traced a deadly outbreak of cholera to certain London water wells, scientists have attempted to track human behavior to forecast—and curtail—the spread of disease. From those roots, epidemiology has grown into a broad discipline.
Mikler, like many epidemiologists around the world, works with doctors, statisticians, social scientists, computer scientists, and public health officials to sort through the myriad genetic and environmental factors that promote disease. In the US, critical data comes from the Centers for Disease Control and Prevention (CDC). The goal is to one day track every illness—from cancer and heart disease to obesity and alcoholism. Epidemiologists have their work cut out for them: given the rate of international travel today, any communicable illness has the potential to cross the globe in a matter of hours.
Traditionally, researchers have examined past outbreaks, working backward to pinpoint likely causes. As a result, Mikler says, the science of public health "has become very good at analyzing what has happened, but is not very well equipped to predict what might happen." As Carlos Castillo-Chavez, director of the Mathematical, Computational, and Modeling Sciences Center at Arizona State University, puts it: "We can't do experiments. We can't infect someone and see what happens. We have to make decisions based on limited data."
By analyzing past outbreaks, epidemiologists are working to pinpoint factors that will most influence outbreaks in the future. Such predictions are difficult, however, because human behavior is notoriously random. When people are sick, they might go out or stay home. They might see a doctor or not. And, if they do visit a doctor, that doctor might run tests or simply diagnose the problem using his or her own best judgment. All such behaviors are essentially invisible to scientists and clearly complicate the prediction task.
Still, as Castillo-Chavez notes, there's tremendous public pressure to generate specific predictions, such as the number of people who will become infected. "We demand to know—even though science has shown that prediction is rarely a possibility."
A Model Case: Swine Flu
With the availability of massive data storage and fast processors, computational epidemiology has developed in the hope of filling the knowledge gap by simulating the spread of disease. Using computers to find patterns in data can help guide public health policy decisions, including how to distribute limited resources such as vaccines.
Swine flu efforts in the US offer a recent case in point and also illustrate the challenges facing the still-nascent field of computational epidemiology. One of the most prominent 2009 swine flu computer models came from Dirk Brockmann, professor of engineering and applied mathematics, and his team at Northwestern University. Their model correctly pegged the disease as entering the US from Mexico, with the most intense outbreaks in California, Texas, Florida, and New York.
In its first projection on 3 May 2009, the model estimated that by that month's end there would be approximately 2,000 cases in the US, a number Brockmann describes as having "an enormous error bar." This initial number was widely reported in the press. When, at the end of May, the CDC reported 7,500 confirmed cases (and an estimated 100,000 unreported cases), The New York Times ran a story that asked, "What went wrong?" 1
In fact, nothing had gone wrong. Brockmann's team had continued to refine the simulations, and by 5 May their estimate was that approximately 7,000 cases would occur by 17 May—a projection that would raise the possible number of cases to 100,000 by month's end. So, after three simulation trials, the team was actually pretty close to the CDC's own report on the number of potential cases. (As of early September, the number of confirmed US cases was just under 44,000, with 302 people dead, and an estimated 1 million cases unreported; numbers have since continued to rise dramatically.)
The model's initial numbers were low because the team had underestimated the number of initial infections in Mexico. Once corrected, the projections fell in line with CDC estimates. Brockmann's success suggests that computer models can effectively help guide public policy—when good initial data is available, that is. But where do those initial numbers come from? Ultimately, they're based on suggestions from public health officials and knowledge about human behavior.
The Social Network Model
Epidemiology has grown more mathematical over the past century, according to Madhav Marathe, a professor of computer science and the deputy director of the Network Dynamics and Simulation Science Laboratory at Virginia Tech's Virginia Bioinformatics Institute.
Marathe and Keith Bisset, a senior research associate at the NDSS Lab, note that disease models based on simple differential equations and aggregate data worked well before the Earth's population became urbanized and mobile. Now diseases thrive in crowded cities and are easily carried abroad, creating large, complex social networks.
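For readers unfamiliar with the "simple differential equations" approach, the sketch below integrates the textbook SIR (susceptible, infected, recovered) compartment model with forward Euler steps. It is an illustration only, not a model used by any researcher quoted here; the function name `simulate_sir` and every parameter value are assumptions chosen for demonstration.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the classic SIR equations with forward Euler steps.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I

    beta (transmission rate) and gamma (recovery rate) are per-day rates.
    Returns a list of (time, S, I, R) tuples.
    """
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r  # total population stays constant in this model
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((k * dt, s, i, r))
    return history


if __name__ == "__main__":
    # Illustrative values only: 10,000 people, 10 initially infected.
    history = simulate_sir(beta=0.4, gamma=0.2, s0=9990, i0=10, r0=0, days=160)
    peak_time, _, peak_infected, _ = max(history, key=lambda row: row[2])
    print(f"Peak of about {peak_infected:.0f} infected near day {peak_time:.0f}")
```

Because the model tracks only three aggregate totals, it treats everyone as mixing uniformly with everyone else, which is the assumption that breaks down for an urbanized, mobile population.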
To contend with this complexity, researchers have begun to base their epidemiological models on computational networks—mathematical constructs of real-world networks. As Bisset points out, many of the basic principles of network theory that apply to particle physics, transportation science, and economics also apply to epidemiology. That's because network theory describes complex interactions and relationships between generic objects, or network nodes. Individual nodes interact with other nodes based on network connections; in social networks, individual attributes—including a person's behavior and interactions with others—determine the course of a disease over time (see Figure 1).
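The node-and-connection idea can be sketched in a few lines. The example below builds a random contact network and lets an infection pass only along its edges. It is a toy under stated assumptions, not Simdemics or any team's actual model: the helper names (`build_contact_network`, `spread_on_network`), the random wiring, and the probabilities are all hypothetical.

```python
import random


def build_contact_network(n_people, n_contacts, rng):
    """Return an undirected contact graph as {person: set_of_neighbors}.

    Each person draws n_contacts random acquaintances; a real model would
    derive contacts from census, travel, or activity data instead.
    """
    neighbors = {p: set() for p in range(n_people)}
    for p in range(n_people):
        for q in rng.sample(range(n_people), n_contacts):
            if q != p:
                neighbors[p].add(q)
                neighbors[q].add(p)
    return neighbors


def spread_on_network(neighbors, seeds, p_transmit, p_recover, days, rng):
    """Simulate S -> I -> R dynamics where infection travels only along edges.

    Returns (infected_per_day, final_state); the count for each day is
    recorded before that day's transmissions and recoveries are applied.
    """
    state = {p: "S" for p in neighbors}
    for p in seeds:
        state[p] = "I"
    infected_per_day = []
    for _ in range(days):
        infected = [p for p in sorted(state) if state[p] == "I"]
        infected_per_day.append(len(infected))
        newly_infected, newly_recovered = set(), set()
        for p in infected:
            for q in sorted(neighbors[p]):
                if state[q] == "S" and rng.random() < p_transmit:
                    newly_infected.add(q)
            if rng.random() < p_recover:
                newly_recovered.add(p)
        # Apply all changes at once so the update order within a day is irrelevant.
        for q in newly_infected:
            state[q] = "I"
        for p in newly_recovered:
            state[p] = "R"
    return infected_per_day, state


if __name__ == "__main__":
    rng = random.Random(7)
    network = build_contact_network(n_people=500, n_contacts=3, rng=rng)
    curve, final = spread_on_network(network, seeds=[0], p_transmit=0.2,
                                     p_recover=0.25, days=60, rng=rng)
    print("Peak infected:", max(curve))
    print("Ever infected:", sum(1 for s in final.values() if s != "S"))
```

Unlike an aggregate model, each node here could carry its own attributes, such as age or a tendency to stay home when sick, which is where individual behavior enters the simulation.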
Like the other researchers interviewed for this article, Brockmann and his team run their network-based models on computer clusters with multicore processors. "Every simulation we run is different, because we also simulate random events," Brockmann says.
"In order to get good statistics, we may run 1,000 pandemic events and then compute the expected outcome by averaging. We want to be able to adjust our simulations on a daily basis during the initial outbreak of an epidemic. Therefore, we need very fast computers, and lots of them."
Brockmann's team starts with small clusters for coarse-grained simulations, and then moves on to larger clusters for more detail; they ran their most detailed swine flu model on the BlueGene supercomputer at Argonne National Laboratory. According to Brockmann, a model is ready for public consumption when it's structurally stable. "That means, when you slightly alter the equations that are involved, the qualitative features of the model dynamics do not change," he says, adding that "you have to have a good understanding of how the various dynamical ingredients interact individually before you add them all together in a complete model."
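The practice Brockmann describes, running many randomized outbreaks and averaging them, can be sketched as follows. The outbreak model here is a small Reed-Frost chain-binomial process substituted purely for illustration; it is not the Northwestern team's model, and the names `run_outbreak` and `expected_outcome` and all parameter values are assumptions.

```python
import random
import statistics


def run_outbreak(population, initial_infected, p_contact, rng):
    """Simulate one Reed-Frost outbreak and return the total number of cases.

    In each generation, a susceptible person escapes infection only by
    avoiding effective contact with every currently infected person.
    """
    susceptible = population - initial_infected
    infected = initial_infected
    total_cases = initial_infected
    while infected > 0 and susceptible > 0:
        p_infection = 1.0 - (1.0 - p_contact) ** infected
        new_cases = sum(1 for _ in range(susceptible)
                        if rng.random() < p_infection)
        susceptible -= new_cases
        infected = new_cases  # the previous generation recovers
        total_cases += new_cases
    return total_cases


def expected_outcome(n_runs, seed, population, initial_infected, p_contact):
    """Average many independent random outbreaks.

    Returns (mean, standard_error, individual_results).
    """
    rng = random.Random(seed)
    results = [run_outbreak(population, initial_infected, p_contact, rng)
               for _ in range(n_runs)]
    mean = statistics.mean(results)
    stderr = statistics.stdev(results) / n_runs ** 0.5
    return mean, stderr, results


if __name__ == "__main__":
    mean, stderr, results = expected_outcome(
        n_runs=1000, seed=42, population=100, initial_infected=1, p_contact=0.02)
    print(f"Mean outbreak size over {len(results)} runs: {mean:.1f} +/- {stderr:.1f}")
    print(f"Smallest run: {min(results)}, largest run: {max(results)}")
```

The spread between the smallest and largest runs shows why a single simulation is not a forecast. Because each run is independent, the runs can also be divided among many processors, which is one reason such work suits large clusters.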
Thus, to ensure that their assumptions are valid, computational epidemiologists must work closely with statisticians and public health experts. As Marathe notes, "consensus building is very important." It's also important to set a context before releasing results, according to Castillo-Chavez, who says that emphasizing all the caveats is crucial before unveiling a model to an impatient public. "We have to be clear about our assumptions … there should be truth in advertising."
In an effort to build better models and thus produce more reliable results, researchers are digging up data in innovative ways. Brockmann's team, for example, used data from Where's George?—an Internet site that tracks the movements of dollar bills—as a proxy for face-to-face human contact. Mikler's team chose a different proxy: blog postings. Hoping that bloggers who caught the flu would write about it, they downloaded some 10 Tbytes of blog entries between October 2008 and August 2009. So far, they're finding a relatively strong correlation between blogs and CDC data.
With better data sources, Marathe believes that computational epidemiology could soon become less of a predictor and more of a real-time tracking tool. He foresees more work being done on supercomputers, with increasingly elaborate simulations produced rapidly, as an epidemic unfolds.
Initial efforts toward this goal are already under way. At the University of North Texas, Mikler's team has built a simulation chamber—a kind of "situation room"—in which computer scientists, epidemiologists, and public health officials can gather to visualize disease data from multiple sources on a large screen, manipulate the data, and make real-time decisions. At Arizona State, Castillo-Chavez oversees a similar laboratory, the Decision Theater (www.decisiontheater.org), which enables real-time surveillance in a dynamic, visual way.
Martin Meltzer, senior health economist at the CDC, notes that a key challenge will be for researchers to show all this elaborate data in a way that's simple to understand, but not so simple that important information is lost. Also, because the CDC must issue recommendations to public health officials, all models must be easily accessed on desktop computers. "I spend my time building models that people can download from the 'Net," says Meltzer.
That's precisely why Bisset, Marathe, and their colleagues at Virginia Tech developed Simdemics software, which lets officials with different levels of computational experience set up detailed experiments to study various "what if" scenarios. Simdemics has three variants: EpiSims, EpiSimdemics, and EpiFast, which let users trade off between model generality and processing speed.
Ways to model the human aspect of disease dissemination will continue to evolve. In the future, Brockmann believes that models will simulate not just the spread of a disease but also people's fear of it, which changes their behavior and thus alters the disease's course. "Despite the enormous detail many models have nowadays, this sort of feedback loop has not been investigated systematically yet," he says. "Based on new Internet technologies, I believe this could be accomplished."
Bisset agrees. Modeling people's behavioral responses to an epidemic is "a beautiful question to tackle in the next few years," he says. He and Marathe have added a behavioral feedback loop to Simdemics, and are now testing it. As Brockmann cautions, however, increases in model complexity and computing power won't automatically translate into greater understanding of diseases. "I think we need to unravel the underlying structures that shape the patterns and dynamics of infectious diseases," he says. That notion meshes with Meltzer's general message from the CDC: keep it simple. "I would issue this challenge: can you reduce your model to a spreadsheet? Then you have a chance of connecting with policy makers."
Pam Frost Gorder is a freelance science writer living in Columbus, Ohio. Contact her at email@example.com.