Pages: pp. 5-9
The world's biggest physics experiment starts in May 2008. To support it, the world's biggest computing experiment has already begun. The payoffs could reach far beyond physics.
The epicenter for both experiments is the European Organization for Nuclear Research (CERN) in Geneva, Switzerland. There, the Large Hadron Collider (LHC) will smash protons together with such force that it could release showers of subatomic particles that haven't existed since the big bang. Sophisticated detectors will catch those particles and release a corresponding flood of data—some 15 Pbytes a year—which scientists will have to painstakingly compare to similar volumes of simulation data before they can make new discoveries. For five years, scientists and engineers have been building the computing grid that will make it possible.
When all that data floods the LHC computing grid (LCG) for the first time, not only will physicists be watching but also scientists in other disciplines who hope to run their own super-sized experiments in the future. Now CERN and others are working to organize a permanent computing grid to support them all.
When the LHC powers up in May, it will be the world's most powerful particle accelerator. In IEEE Spectrum, Francois Grey, spokesperson for CERN's IT department, outlined some of the expected computing challenges (July 2006, pp. 28–33). With 600 million proton collisions per second, and hundreds of debris particles streaming from each one, there's no way that CERN could record every collision—so it will keep data from one in a million.
Even then, it will collect 30 Gbytes of data per minute, or 15 Pbytes a year—three times more data than all the academic research libraries in the US and Europe combined. Add an equal amount of simulation data, which physicists must use to analyze the experimental data, and the LCG could amass hundreds of petabytes over the experiment's lifespan.
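The figures quoted above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the inputs are the article's numbers, while the decimal unit convention (1 Pbyte = 1 million Gbytes) and the assumption of year-round running are simplifications introduced here.

```python
# Back-of-the-envelope check of the data rates quoted in the article.
# Assumptions (not from the article): decimal units and continuous running.

COLLISIONS_PER_SECOND = 600e6      # proton collisions per second
KEEP_FRACTION = 1e-6               # about one collision in a million is kept
GBYTES_PER_MINUTE = 30             # recorded data rate
MINUTES_PER_YEAR = 60 * 24 * 365   # assumes the collider never pauses

kept_per_second = COLLISIONS_PER_SECOND * KEEP_FRACTION
pbytes_per_year = GBYTES_PER_MINUTE * MINUTES_PER_YEAR / 1e6

print(f"Collisions kept per second: {kept_per_second:.0f}")
print(f"Recorded data per year: {pbytes_per_year:.1f} Pbytes")
```

Under these assumptions the yearly total comes out slightly above 15 Pbytes, which agrees with the rounded figure in the text.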
Three major computing grids will share the load: the Enabling Grids for E-sciencE (EGEE) grid, which has member clusters in Europe, Asia, and North America; NorduGrid in the Scandinavian countries; and the Open Science Grid in the US. These grids comprise hundreds of clusters around the world.
Any grid, Grey pointed out in an interview, is really just an infrastructure that lets users schedule computing jobs on far-away clusters. So the LCG's essence is the middleware that keeps everyone connected: gLite (http://glite.web.cern.ch/). It uses the Globus Toolkit to manage general grid functions, and Condor, a workload management system developed by the University of Wisconsin-Madison, to perform distributed computing tasks.
Starting in May, the grid will be busy transferring and storing the LHC's start-up calibration data. Scientists hope to perform the first real physics experiments in July.
Raman Sundrum of Johns Hopkins University will visit CERN in July, but he won't be working directly on the LHC experiment. Yet, he and countless other theoretical physicists are critical to the project because they've developed the models that it will test and will help decipher what it finds. Largely working with pencil and paper, Sundrum puzzles out the equations that describe the universe's microscopic laws. Knowing what particles formed with the big bang will help physicists understand what the universe is made of today, he explains. The LHC might produce a much-sought-after particle called the Higgs boson, which physicists believe is responsible for giving all other known elementary particles their mass.
"We understand a lot about the mass-inducing Higgs mechanism in general, but its particular incarnation in the real world? We don't know. And that is the central physics that the LHC is aimed at," Sundrum says. The results could hint at the existence of new dimensions or dark matter's composition.
Physicists have specifically designed two LHC experiments to detect the Higgs and related particles. One is the Compact Muon Solenoid (CMS) experiment, and the other is A Toroidal LHC ApparatuS (ATLAS). Both projects involve hundreds of collaborating institutions and have developed new network technologies to get the data to their members.
High-energy physicist David Colling runs the physics team of the London e-Science Centre at Imperial College London, which is developing grid infrastructures and middleware to support CMS. He recently co-authored a paper in the Proceedings of the Second IEEE International Conference on e-Science and Grid Computing on the challenges of integrating one of the e-Science Centre clusters into the LCG infrastructure (IEEE CS Press, 2006, pp. 153–160). The 200-server cluster already had an infrastructure that wasn't entirely compatible, so the team modified part of the LCG software distribution to make the merger work. The source code they used is available online (www.gridpp.ac.uk/wiki/LCG-on-SGE).
Shawn McKee, a high-energy astrophysicist at the University of Michigan, has been working on a different aspect of grid integration as network project manager for ATLAS's US arm. He led the team that developed the ATLAS middleware UltraLight (www.ultralight.org).
Regardless of which experiment finds the Higgs boson first—if it indeed exists—the lessons they've learned will apply to any other branch of science that has to deal with massive data. Or, as McKee says, "Grids are pretty agnostic in regard to the type of science you do on them; they just know about bits [of data] that have to move, and access that has to be provided in an infrastructure. The details can be filled in by each discipline."
McKee believes that, within five years, other disciplines will eclipse particle physics in terms of their need for computing and data storage, particularly climate studies, neuroscience, and fusion research. Astronomy has large data processing needs as well. But bioinformatics will probably be the first to catch up, according to Colling. He points out that the discipline already has a grid presence one quarter the size of that of particle physics.
Colling and McKee agree that particle physics is pushing grid science forward because of its immediate need to support the LHC. But what happens when that immediate need is over? Grids require a great deal of funding and effort to maintain. Does that mean that when the LHC is done serving science, the LCG will cease to exist?
Over the past few years, CERN's Grey says, a feeling has grown in the community that there should be some kind of permanent grid infrastructure to truly nurture big science. Researchers "are not going to make a big effort to work on grids [if there's] no guarantee that in a couple of years the grid will still be there."
Enter CERN's Bob Jones, computer scientist and EGEE project director. He encouraged 36 separate European national grid initiatives to form the European Grid Initiative (EGI; www.eu-egi.org), which won initial funding from the EU in September 2007. The EGI aims to make a sustainable grid to coordinate national grid infrastructures, develop middleware standards, and link the European grid infrastructure with similar ones elsewhere.
Jones was busily preparing the funding proposal for the EGEE's next two-year phase when he explained the need for EGI in an email: "EGEE has established a pan-European production grid infrastructure, which is the cornerstone of the LHC computing grid. The LHC will take data for at least 15 years, while EGEE is funded on two-year cycles, so the LHC provides a prime example of why EGI is necessary."
Jones also sees a trend developing: disciplines that historically have had less experience with large-scale computing projects, such as the social sciences and humanities, are beginning to embrace grids. To support this widening user base, grid infrastructure will need to become simpler to use, he says. This would be a task for the EGI.
Jones would like to see the grid user base grow even wider by making grids accessible to people all over the world, especially in regions in which the local infrastructure is still developing. He points to EGEE extension projects in Latin America, Asia, Africa, and the Middle East. "In this way, we can give researchers and students at local universities and institutes access to a first-class IT infrastructure so they can address issues such as environmental protection, disaster recovery, and public health."
The Large Hadron Collider computing grid (LCG) is arranged in a hierarchy, with CERN as Tier 0. Tier 1 contains 11 major data centers (mostly high-energy physics labs), and hundreds of smaller data centers, such as university clusters, make up Tier 2.
As Tier 0, CERN's tasks are most daunting. First, it must determine which raw collision data to keep. From theoretical work such as Raman Sundrum's at Johns Hopkins, and from simulations and experiments in other accelerators, physicists have an idea of what kinds of electronic signatures might be left behind by the exotic particles they seek. Algorithms will scour the raw data and pull out those "one in a million" collisions that are worth a second look.
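As a rough illustration of this kind of selection, the toy filter below streams through a list of events and keeps only those that pass a cut. The `energy` field, the threshold value, and the function name are hypothetical stand-ins chosen for this sketch; the real selection criteria used at CERN are far more elaborate and are not described in the article.

```python
from typing import Iterable, Iterator

def select_events(events: Iterable[dict], min_energy: float) -> Iterator[dict]:
    """Yield only events whose (hypothetical) energy value passes the cut."""
    for event in events:
        if event["energy"] >= min_energy:
            yield event

# Made-up events for illustration only.
raw = [
    {"id": 1, "energy": 12.0},
    {"id": 2, "energy": 950.0},
    {"id": 3, "energy": 47.5},
    {"id": 4, "energy": 1210.0},
]

kept = list(select_events(raw, min_energy=900.0))
print([e["id"] for e in kept])  # prints [2, 4]
```

Writing the filter as a generator means events are examined one at a time, so the full raw stream never has to sit in memory at once.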
Then, CERN will convert that data into "physical" data that reconstructs the collisions—the energies involved and the paths that particles followed. These data products are called event summary data (ESDs). CERN will keep the original ESDs and distribute enough pieces among the Tier 1 partners to make two more complete copies. Because the ESDs offer a detailed view of events inside the collider, they're critical assets for calibrating the detectors. In fact, the University of Michigan's Shawn McKee says that at this stage of the data processing, his team will put significant effort into comparing the ESDs to simulations. "One of our first tasks will be to recreate those original events, and confirm that we understand how the detector works and how to interpret the data."
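One simple way to picture the ESD distribution described above is a round-robin placement that puts every piece at two different Tier 1 sites, so that the pieces held across Tier 1 add up to two complete copies. This is a sketch under stated assumptions, not the LCG's actual placement policy: the site names, the chunk names, and the round-robin rule are all invented for illustration.

```python
from collections import defaultdict

def place_replicas(chunks, sites, copies=2):
    """Assign each chunk to `copies` distinct sites in round-robin order."""
    if copies > len(sites):
        raise ValueError("need at least as many sites as copies")
    placement = defaultdict(list)
    for i, chunk in enumerate(chunks):
        for k in range(copies):
            # Consecutive indices modulo len(sites) are distinct
            # as long as copies <= len(sites).
            site = sites[(i * copies + k) % len(sites)]
            placement[site].append(chunk)
    return dict(placement)

# Hypothetical names: 11 Tier 1 sites (the article's count) and 22 ESD pieces.
sites = [f"tier1-{n:02d}" for n in range(11)]
chunks = [f"esd-{n:03d}" for n in range(22)]

placement = place_replicas(chunks, sites, copies=2)
for site in sites:
    print(site, len(placement[site]))
```

With these numbers each site ends up holding four pieces, and no single site holds both replicas of the same piece.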
The ESDs are too large to be very useful to researchers, so CERN will assemble smaller products called analysis object data (AODs). Each Tier 1 partner will store a copy of the AODs; the Tier 2s will store pieces of yet more copies. Finally, research institutions will access the Tier 2 networks to download portions of the AODs that are of interest and assemble their own customized data products. They can study energy signatures in detail and compare the results to simulations—all on the same grid. CERN's Francois Grey estimates that each tier will support one-third of the simulation.
For most scientists, this is when things will get really interesting, McKee explains: "when you compare the simulation with the real data and you see a difference, that's when you'll realize that there could be 'new physics' going on. Then, if someone had a theory to explain it, you'd run another simulation and compare it again." At every stage, records of who did what to the data—both experimental and simulated—are stored with each data product. "That kind of metadata needs to exist because you need to make sure that the new physics wasn't the result of a miscalibration or misunderstanding."
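The bookkeeping McKee describes, a record of who did what traveling with each data product, can be sketched as a small data structure. The class names, fields, and sample entries below are hypothetical and chosen only to illustrate the idea; the article does not specify how the LCG stores this metadata.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class ProvenanceRecord:
    who: str        # person or site that acted on the data
    action: str     # what was done
    when: datetime  # when it was done (UTC)

@dataclass
class DataProduct:
    name: str
    kind: str  # "experimental" or "simulated"
    history: list[ProvenanceRecord] = field(default_factory=list)

    def record(self, who: str, action: str) -> "DataProduct":
        """Append a provenance entry and return self so calls can be chained."""
        self.history.append(
            ProvenanceRecord(who, action, datetime.now(timezone.utc))
        )
        return self

# Illustrative entries only.
aod = DataProduct("aod-sample", "experimental")
aod.record("tier0", "reconstructed from raw data").record("tier1", "calibration applied")

for rec in aod.history:
    print(rec.who, "-", rec.action)
```

Keeping the history attached to the product itself means that anyone who later sees an unexpected result can trace it back through every processing step.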
The Oregon summer is one of our well-kept secrets out here in the upper-left corner. It hardly ever rains, the sun shines a lot, and the temperature is cool enough to enjoy being outside, yet still sleep at night with a blanket. So why have I spent so much of the precious summer thinking about cyberinfrastructure (CI) rather than standing in a river fishing or writing a good book? As you might expect, the answer has more to do with money than good sense—namely, writing a grant proposal to pay for the creation of some building blocks in the infrastructure. Now that the proposal is floating in the bureaucratic digital ether, I feel that I should be able to provide some thoughtful reflections on CI's nature, although to be honest, it will be six months before I know if my CI vision agrees with that of the money distributors.
In my browsing for meanings of CI, the first use appears to be in a 1998 press briefing by Jeffrey Hunker (www.fas.org/irp/news/1998/05/980522-wh3.htm), who was then the Director of the Critical Infrastructure Assurance Office and concerned about threats to the US. To him, the CI that needed protection included the Internet, electric power systems, transportation systems, and banking and financial systems. This clearly is a broad view, with all cyber-based information systems being covered and one in which CI represents the multiple and overlaying grids spread across the country, somewhat like the Visible Man's arteries, veins, and bones. And much like in a human, a breakdown in any one grid can cause widespread harm (although I keep hoping futilely for the financial grid to break down after I purchase carpets with my Visa card in some far corner of the world).
The word "cyber" itself appears to be an abbreviation of cybernetics, a term first used in its modern sense by Norbert Wiener (Cybernetics, or Control and Communication in the Animal and the Machine, John Wiley & Sons, 1948) to denote the study of control processes in biological, mechanical, and electrical systems, and especially the flow of mathematical information within these systems. (The word cyber derives from the Greek word for governing, which I suppose is another term for control.) If we now extend this idea to include computer and information systems, and the hardware and software for them, we're coming to CI. By combining cyber with the word infrastructure, I believe that CI also denotes the idea that cybersystems will be there for us to use without much thought or concern about their reliability and details (as with our highways and bridges?), and thus free us up to be our creative and original selves.
The most-used meaning of "cyberinfrastructure," at least by those of us trolling for grant support, is (surprise, surprise) that put forward by the US National Science Foundation (NSF). In 2003, the NSF took a data-centric view of CI as "the new research environments that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services over the Internet. In scientific usage, cyberinfrastructure is a technological solution to the problem of efficiently connecting data, computers, and people with the goal of enabling derivation of novel scientific theories and knowledge" (www.nsf.gov/news/news_summ.jsp?cntn_id=100330). High minded, but rather officious.
But as fashions change, so too do definitions. A more human view of CI is that given by Fran Berman, the San Diego Supercomputer Center's director: "Cyberinfrastructure is the coordinated aggregate of software, hardware, and other technologies, as well as human expertise, required to support current and future discoveries in science and engineering. The challenge of cyberinfrastructure is to integrate relevant and often disparate resources to provide a useful, usable, and enabling framework for research and discovery characterized by broad access and 'end-to-end' coordination" (http://vis.sdsc.edu/sbe/SBE-CISE_Workshop_Intro.pdf). This definition moves away from CI's control aspect, and doesn't seem like something we need to protect from foreign invaders.
That CI includes the people element and is needed for research and creativity might just reflect Berman's view of our modern world. However, it must be true seeing that the NSF's Blue Ribbon Advisory Panel on CI also came to the same conclusion: "Like the physical infrastructure of roads, bridges, power grids, telephone lines, and water systems that support modern society, 'cyberinfrastructure' refers to the distributed computer, information and communication technologies combined with the personnel and integrating components that provide a long-term platform to empower the modern scientific research endeavor" (www.nsf.gov/od/oci/reports/toc.jsp).
I like these later definitions because they make me feel that my limited efforts to improve computational science education for people are strengthening the CI, and thus building a stronger society. Indeed, the aforementioned Blue Ribbon Panel envisions us building new types of scientific and engineering knowledge environments and organizations on the CI to pursue research in new ways and with increased efficacy. Although I can't envision a world without a good book to read each night, maybe the next good book that I'll write will have to be CI-correct.