Issue No. 01 - January/February (2007 vol. 9)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MCSE.2007.8
Pam Frost Gorder, Freelance Writer
On the second day of an earthquake research meeting in Tokushima, Japan, John Rundle sits in his hotel room, making a phone call through his laptop computer. Although it's only 6:00 a.m. local time, he's long been awake, still anchored in California time. In a few hours, he'll share some of the latest earthquake research from his laboratory at the University of California, Davis, with the conference attendees. His forecast, set to appear in the 1 December 2006 issue of Physical Review Letters, is an eerie one: northern California is likely to experience an earthquake of at least magnitude 6 on the Richter scale in the next 18 months.
The statement is astonishing, not just because it predicts a major earthquake, but because it does so with such precision. The physical forces that govern seismic events are so complex that scientists have struggled to assemble forecasts on the scale of decades. Now a new statistical technique has enabled Rundle's team to narrow the forecast window to less than three years—and in the present case, to only a year and a half. The calculations require a day of supercomputing time, but he and his partners plan to automate research-quality forecasts and make them available online. "Every day, we will recompute forecasts for these large earthquakes for northern and southern California," he says. "You can only do this with grid computing."
The idea of linking supercomputers into a computational grid to confront big problems isn't a new one, but Rundle and others in Earth science are doing something different. They're linking grids together—effectively, using grids of grids—thanks to a software movement that takes its name from the musical phenomenon known as the mashup.
Data on Demand
Artists have made musical mashups since at least the 1950s, combining portions of two or more songs into one. But the genre achieved new popularity in the 1990s, when music-editing software became widely available, and people could publish homemade mashups on the Internet. As music critic Sasha Frere-Jones explains, "The most celebrated mashups are melodically tuned, positing a harmonic relationship between, say, Madonna's voice and the Sex Pistols' guitars" ("1 + 1 + 1 = 1," The New Yorker, 10 Jan. 2005).
The philosophy of software mashups isn't so different: two complementary Web applications merge to create a custom tool. The number of mashups is growing every day, as companies like Google and Yahoo make the APIs for their programs publicly available. (Google offers its developer tools at http://code.google.com/ and Yahoo at http://developer.yahoo.com/.) As of mid-November 2006, the comprehensive ProgrammableWeb site (www.programmableweb.com) lists more than 1,200 mashups, nearly two-thirds of which are map-related. The most commonly mashed-up API is Google Maps. Users have combined it with everything from apartment listings to crime statistics; any data set with a geographic component will do.
Marlon Pierce, assistant director of the Community Grids Lab (CGL) at Indiana University, was already developing grid applications when Google Maps debuted in the summer of 2005. That December, at the meeting of the American Geophysical Union in San Francisco, he reported CGL's first Google mashup: a seismic data map of the American Southwest.
"A lot of people are interested in Google Maps and Google Earth," Pierce says. "Google did a really good job making them both easy to use, but they also made it easy for people to develop new applications with them. So that was a lesson for the rest of us: a particular application can be complicated, but the face it presents to the outside world should be simple."
The mashup helps get the most out of computing grids. An application can reside on one grid, and a data set on another. The person who wants to use both could be anywhere—as long as all the applications and data sets follow compatible standards, and toolkits exist to bring them together, the combinations are endless.
Pierce is project director for the CrisisGrid (www.crisisgrid.org), a project at CGL that aims to build and integrate such services on the Web, especially for Earth science. "This is our basic quest, to integrate those applications (what I'd call 'Execution Grid' services) with 'Data Grid' services. We want to provide a common framework," he says. CrisisGrid data could be anything from GPS station output to roadmaps to census data, all of which can be managed by geographical information system (GIS) services. "These problems may take a long time to calculate, and some of them have to be solved on supercomputers, but the data you get out of them is not at all abstract. You can display them with standard GIS-type services."
He and his colleagues use a distributed computing methodology called publish/subscribe. Users subscribe to a data set, and each new batch of data is published and pushed to them, much like a subscription newsfeed. Middleware filters convert the data to desired formats, and other filters insert the data into Web tools such as Google Maps. In this way, the data set and the Web application that created it pass out of the originators' hands and into users' hands. Pierce says that's the idea of the mashup: "You just provide a service; you don't really care what anybody does with it."
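The publish/subscribe pattern with format-converting filters can be sketched in a few lines of Python. This is only an illustrative, in-memory toy, not CGL's middleware: the Broker class, the to_map_marker filter, the topic name "gps/socal", and the station record are all hypothetical names and values invented for the example.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory broker (hypothetical): routes messages by topic."""

    def __init__(self):
        # topic -> list of (filters, handler) pairs
        self._subs = defaultdict(list)

    def subscribe(self, topic, handler, filters=()):
        """Register a handler; filters run in order on each message."""
        self._subs[topic].append((tuple(filters), handler))

    def publish(self, topic, message):
        """Push a message to every subscriber of the topic."""
        for filters, handler in self._subs[topic]:
            item = dict(message)  # each subscriber gets its own copy
            for convert in filters:
                item = convert(item)
            handler(item)


def to_map_marker(reading):
    """Hypothetical filter: GPS reading -> generic map-marker record."""
    return {
        "lat": reading["lat"],
        "lng": reading["lon"],
        "title": f"{reading['station']}: {reading['offset_mm']:.1f} mm",
    }


# A subscriber that simply collects markers for some map widget.
markers = []
broker = Broker()
broker.subscribe("gps/socal", markers.append, filters=[to_map_marker])

# The data originator publishes without knowing who is listening.
broker.publish(
    "gps/socal",
    {"station": "STN1", "lat": 34.05, "lon": -118.25, "offset_mm": 2.5},
)
```

The publisher never refers to the map code, and the subscriber never refers to the data source; the filter chain is the only place where the two formats meet, which is what lets either side be swapped out.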
Andrea Donnellan leads the Interferometric Synthetic Aperture Radar (InSAR) science team at NASA's Jet Propulsion Laboratory (JPL), which gathers satellite imagery of seismic activity. She predicts that interoperable Web services will be even more common in the future than they are today, and she expects more scientists will apply pattern-recognition methods to grid data. Donnellan looks forward to a future InSAR mission that will cover the entire globe and produce roughly 200 Gbytes of data per day. "Computational methods will be necessary for ingesting and understanding the data," she says.
Real Hazards, Virtual California
Unlike hurricanes, which form seasonally and can be spotted before they endanger life, earthquakes can happen any time, anywhere without warning. On 15 November 2006—less than a week after Donnellan and Rundle returned to the US from the meeting in Tokushima—an 8.1-magnitude earthquake struck the northernmost part of Japan. The next day, a 5.2-magnitude quake struck Taiwan. As of this writing, no casualties have been reported from either quake, although the first one sent tsunami waves across the Pacific Ocean that damaged docks in northern California.
Experts don't question whether the "Big One"—an earthquake of magnitude 8 or more—will strike California via the San Andreas Fault. It's only a question of when, where, and how big. The 1906 San Francisco earthquake was blamed on the fault, and probably would have registered near magnitude 8 had the scale existed then. Earth scientists are working hard to develop forecasts to help California residents and government agencies prepare for future quakes. As a result, the entire fault system and much of the southwestern US are dotted with permanent GPS stations that enable the monitoring of every tiny tremor. All that seismic data is archived, and much of it is publicly available.
Such a data set is maintained by the Scripps Orbit and Permanent Array Center (SOPAC) at the Scripps Institution of Oceanography at the University of California, San Diego. Pierce and his team, led by Geoffrey Fox at Indiana University, are working with SOPAC to convert a Web application that previously processed daily data to one that can process real-time data. SOPAC uses data mining software to pick out modes in seismic activity—patterns of GPS station movement.
"Our challenge is to apply this mode of detection to data that comes out once per second instead of once per day. We don't know what kind of modes to expect, so it will be interesting to see what might be hidden in those GPS signals," Pierce says.
Rundle's UC Davis team used data mining—a combination of algorithmic techniques called Relative Intensity Pattern Informatics (RIPI)—to construct its new forecast. To test the technique, they created a "hindcast" of seismic activity in northern and southern California since 1960. RIPI correctly predicted the time "windows" during which 16 out of 17 major groups of earthquakes occurred. "That's not a random coincidence. There's something causal at work," Rundle says. "We've discovered an underlying pattern in the timing of these earthquakes."
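The bookkeeping behind a hindcast score like "16 out of 17" can be illustrated with a short sketch. It does not implement RIPI, whose details the article does not give; it only shows, under simple assumptions, how one might count the events that fall inside forecast windows. The function name hit_rate and the toy windows and event times are made up for illustration and are not real catalog data.

```python
def hit_rate(event_times, windows):
    """Fraction of events that fall inside at least one forecast window.

    event_times: iterable of event times (e.g. decimal years)
    windows: iterable of (start, end) pairs in the same units
    """
    event_times = list(event_times)
    windows = list(windows)
    if not event_times:
        raise ValueError("need at least one event")
    hits = sum(
        any(start <= t <= end for start, end in windows)
        for t in event_times
    )
    return hits / len(event_times)


# Made-up toy values in decimal years, NOT the real catalog or RIPI windows.
toy_windows = [(1970.0, 1972.0), (1985.0, 1987.5)]
toy_events = [1971.2, 1986.0, 1990.3]
toy_score = hit_rate(toy_events, toy_windows)  # two of three events hit

# The figure quoted in the article: 16 of 17 event groups, about 94 percent.
reported_fraction = 16 / 17
```

A hit rate on its own says nothing about how much of the total time span the windows cover, so a fuller evaluation would also compare the score against what randomly placed windows of the same total length would achieve.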
If RIPI is correct, the next major earthquake to strike California will happen in the northern part of the state by mid-2008. Rundle emphasizes that the situation for southern California could change at any time, but at least his method gives an indication of whether a region is at high risk. Soon, he and his group believe, they might be able to predict the locations of individual major earthquakes to within 50 kilometers or less.
Real seismic data fueled this new forecast, but the team is also using simulated data in a project called Virtual California. It applies everything that scientists know about the physics of fault systems to simulate millions of years of earthquakes. Rundle hopes to mine the simulations for clues as to how the real San Andreas operates.
Virtual California is one of three simulators under development for the QuakeSim project at NASA (http://quakesim.jpl.nasa.gov/). Donnellan heads this project, which aims for interoperability between all its freely downloadable tools.
Pierce sees mashups as a democratizing force in the building of grid applications. He also suspects that letting people develop their own tools will be a boon to research. "I think we should encourage scientific mashups. When you look at the grid projects that have been the most successful, like high-energy physics or astronomy, the work has been led by the physical scientists," he says. "They have been making their own software."
Not everyone is a programming expert, and that's why mashups work so well. Scientists don't need to know specific details about how the tools work, as long as they know the parts that matter for the problem they're trying to solve.
Still, Pierce won't call the use of scientific mashups a paradigm shift. "I think the possibility [of a paradigm shift] is there, but I think it will take some very successful applications to convince people that this is what they should do."
JPL's Donnellan agrees that ease of use is the key. "The more off-the-shelf and plug-and-play grid computing is, the more earthquake scientists will make use of the available resources," she says.
To Rundle, the progress being made by his group at UC Davis (http://hirsute.cse.ucdavis.edu/~rundle/), as well as by the CrisisGrid and QuakeSim projects, proves that there's a major place for computational science within earthquake forecasting. He calls for more collaboration between the Earth and computational sciences and statistical physics, which is especially suited for dealing with multidimensional data systems such as earthquake phenomena.
"Without the ideas of statistical physics, we would never have been able to do this," Rundle says. "Much of the theory we used came from areas well outside of Earth science, and that has to continue if we're going to make progress."