Issue No. 06 - November/December (2007 vol. 9)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MCSE.2007.119
Pam Frost Gorder, Freelance Writer
The world's biggest physics experiment starts in May 2008. To support it, the world's biggest computing experiment has already begun. The payoffs could reach far beyond physics.
The epicenter for both experiments is the European Organization for Nuclear Research (CERN) in Geneva, Switzerland. There, the Large Hadron Collider (LHC) will smash protons together with such force that it could release showers of subatomic particles that haven't existed since the big bang. Sophisticated detectors will catch those particles and release a corresponding flood of data—some 15 Pbytes a year—which scientists will have to painstakingly compare to similar volumes of simulation data before they can make new discoveries. For five years, scientists and engineers have been building the computing grid that will make it possible.
When all that data floods the LHC computing grid (LCG) for the first time, physicists won't be the only ones watching: scientists in other disciplines who hope to run their own super-sized experiments in the future will be watching, too. Now CERN and others are working to organize a permanent computing grid to support them all.
One in a Million
When the LHC powers up in May, it will be the world's most powerful particle accelerator. In IEEE Spectrum, Francois Grey, spokesperson for CERN's IT department, outlined some of the expected computing challenges (July 2006, pp. 28–33). With 600 million proton collisions per second, and hundreds of debris particles streaming from each one, there's no way that CERN could record every collision—so it will keep data from one in a million.
Even then, it will collect 30 Gbytes of data per minute, or 15 Pbytes a year—three times more data than all the academic research libraries in the US and Europe combined. Add an equal amount of simulation data, which physicists must use to analyze the experimental data, and the LCG could amass hundreds of petabytes over the experiment's lifespan.
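The figures above can be cross-checked with a few lines of arithmetic. The sketch below is only a rough consistency check, not CERN's own accounting: it assumes decimal units (1 Pbyte = 1,000,000 Gbytes) and a steady 30 Gbytes per minute for a full 365-day year, and the per-event size it derives is an implied average that the article doesn't state. The function name is hypothetical.

```python
# Figures quoted in the article.
COLLISIONS_PER_SECOND = 600_000_000   # proton collisions per second
KEEP_ONE_IN = 1_000_000               # CERN keeps one collision in a million
GBYTES_PER_MINUTE = 30                # recorded data rate

# Assumptions (not from the article): decimal units and a steady
# data rate over a full 365-day year.
MINUTES_PER_YEAR = 60 * 24 * 365
GBYTES_PER_PBYTE = 1_000_000
MBYTES_PER_GBYTE = 1_000


def lhc_data_estimates():
    """Return (recorded events/s, Pbytes/year, implied Mbytes/event)."""
    recorded_per_second = COLLISIONS_PER_SECOND // KEEP_ONE_IN
    pbytes_per_year = GBYTES_PER_MINUTE * MINUTES_PER_YEAR / GBYTES_PER_PBYTE
    mbytes_per_second = GBYTES_PER_MINUTE * MBYTES_PER_GBYTE / 60
    mbytes_per_event = mbytes_per_second / recorded_per_second
    return recorded_per_second, pbytes_per_year, mbytes_per_event


if __name__ == "__main__":
    events, pbytes, mbytes = lhc_data_estimates()
    print(f"Recorded collisions per second: {events}")
    print(f"Data per year: {pbytes:.1f} Pbytes")
    print(f"Implied average size per recorded collision: {mbytes:.2f} Mbytes")
```

Under these assumptions, the one-in-a-million filter leaves 600 recorded collisions per second, and the yearly total comes out slightly above the quoted 15 Pbytes, so the article's per-minute and per-year figures agree with each other.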
Three major computing grids will share the load: the Enabling Grids for E-sciencE (EGEE) grid, which has member clusters in Europe, Asia, and North America; NorduGrid in the Scandinavian countries; and the Open Science Grid in the US. These grids comprise hundreds of clusters around the world.
Any grid, Grey pointed out in an interview, is really just an infrastructure that lets users schedule computing jobs on far-away clusters. So the LCG's essence is the middleware that keeps everyone connected: gLite (http://glite.web.cern.ch/). It uses the Globus Toolkit to manage general grid functions, and Condor, a workload management system developed by the University of Wisconsin-Madison, to perform distributed computing tasks.
Starting in May, the grid will be busy transferring and storing the LHC's start-up calibration data. Scientists hope to perform the first real physics experiments in July.
Seeking the Higgs
Raman Sundrum of Johns Hopkins University will visit CERN in July, but he won't be working directly on the LHC experiment. Yet, he and countless other theoretical physicists are critical to the project because they've developed the models that it will test and will help decipher what it finds. Largely working with pencil and paper, Sundrum puzzles out the equations that describe the universe's microscopic laws. Knowing what particles formed with the big bang will help physicists understand what the universe is made of today, he explains. The LHC might produce a much-sought-after particle called the Higgs boson, which physicists believe is responsible for giving all other known elementary particles their mass.
"We understand a lot about the mass-inducing Higgs mechanism in general, but its particular incarnation in the real world? We don't know. And that is the central physics that the LHC is aimed at," Sundrum says. The results could hint at the existence of new dimensions or dark matter's composition.
Physicists have specifically designed two LHC experiments to detect the Higgs and related particles. One is the Compact Muon Solenoid (CMS), and the other is A Toroidal LHC ApparatuS (ATLAS). Both projects involve hundreds of collaborating institutions and have developed new network technologies to get the data to their members.
High-energy physicist David Colling runs the physics team of the London e-Science Centre at Imperial College London, which is developing grid infrastructures and middleware to support CMS. He recently co-authored a paper in the Proceedings of the Second IEEE International Conference on e-Science and Grid Computing on the challenges of integrating one of the e-Science Centre clusters into the LCG infrastructure (IEEE CS Press, 2006, pp. 153–160). The 200-server cluster already had an infrastructure that wasn't entirely compatible, so the team modified part of the LCG software distribution to make the merger work. The source code they used is available online (www.gridpp.ac.uk/wiki/LCG-on-SGE).
Shawn McKee, a high-energy astrophysicist at the University of Michigan, has been working on a different aspect of grid integration as network project manager for ATLAS's US arm. He led the team that developed the ATLAS middleware UltraLight (www.ultralight.org).
Regardless of which experiment finds the Higgs boson first (if it indeed exists), the lessons both teams have learned will apply to any other branch of science that has to deal with massive data. Or, as McKee says, "Grids are pretty agnostic in regard to the type of science you do on them; they just know about bits [of data] that have to move, and access that has to be provided in an infrastructure. The details can be filled in by each discipline."
Toward a Permanent Grid
McKee believes that, within five years, other disciplines will eclipse particle physics in terms of their need for computing and data storage, particularly climate studies, neuroscience, and fusion research. Astronomy has large data processing needs as well. But bioinformatics will probably be the first to catch up, according to Colling. He points out that the discipline already has a grid presence one quarter the size of that of particle physics.
Colling and McKee agree that particle physics is pushing grid science forward because of its immediate need to support the LHC. But what happens when that immediate need is over? Grids require a great deal of funding and effort to maintain. Does that mean that when the LHC is done serving science, the LCG will cease to exist?
Over the past few years, CERN's Grey says, a feeling has grown in the community that there should be some kind of permanent grid infrastructure to truly nurture big science. Researchers "are not going to make a big effort to work on grids [if there's] no guarantee that in a couple of years the grid will still be there."
Enter CERN's Bob Jones, computer scientist and EGEE project director. He encouraged 36 separate European national grid initiatives to form the European Grid Initiative (EGI; www.eu-egi.org), which won initial funding from the EU in September 2007. The EGI aims to make a sustainable grid to coordinate national grid infrastructures, develop middleware standards, and link the European grid infrastructure with similar ones elsewhere.
Jones was busily preparing the funding proposal for the EGEE's next two-year phase when he explained the need for EGI in an email: "EGEE has established a pan-European production grid infrastructure, which is the cornerstone of the LHC computing grid. The LHC will take data for at least 15 years, while EGEE is funded on two-year cycles, so the LHC provides a prime example of why EGI is necessary."
Jones also sees a trend developing: disciplines that historically have had less experience with large-scale computing projects, such as the social sciences and humanities, are beginning to embrace grids. To support this widening user base, grid infrastructure will need to become simpler to use, he says. This would be a task for the EGI.
Jones would like to see the grid user base grow even wider by making grids accessible to people all over the world, especially in regions in which the local infrastructure is still developing. He points to EGEE extension projects in Latin America, Asia, Africa, and the Middle East. "In this way, we can give researchers and students at local universities and institutes access to a first-class IT infrastructure so they can address issues such as environmental protection, disaster recovery, and public health."