Tufts University
University College London
Pages: 10-11
This is the second of two special issues devoted to scientific applications of grid computing. In the previous guest editors' introduction (Computing in Science & Engineering, vol. 7, no. 5, Sept./Oct. 2005, pp. 10–13), we explained the modern definition of grid computing:
Grid computing is distributed computing performed transparently across multiple administrative domains.
We also reviewed the sequence of historical events that led naturally to grid computing, beginning with the parallel computing revolution of the mid to late 1980s. It is much too early to decide whether today's fledgling efforts at grid computing will ultimately change the way that scientific computing is done as profoundly as did parallel computing. Nevertheless, the undeniable undercurrent of excitement and optimism—as well as the pioneering spirit—among grid computing's early practitioners should be instantly recognizable to all who remember what parallel scientific computing research was like in the mid 1980s.
Although it's been possible to geographically distribute data over the Internet for decades, it was only the advent of the World Wide Web in the early 1990s that made this capability available to the average user. To achieve this, early Web developers had to propose new standards and see them through to adoption by diverse communities of researchers, including the directors of computer centers, computer scientists, software engineers, applications programmers, computational scientists, and so on.
Likewise, it's been possible to geographically distribute computing over the Internet for quite some time, but we need something akin to the Web to make this capability transparent to wide communities of users. Solving the serious issues involving security, data formatting, and communication protocols will require both technical and political leadership. An imposed solution isn't an option, because of the "multiple administrative domains" mentioned in the definition. As with the Web, what's needed are solutions so transparent and so compelling that a consensus naturally builds around them. This enterprise is very difficult, however, because distributing computing is intrinsically harder than distributing data.
Cross-site deployment of scientific software, for example, will become routine only when it's possible to reserve dedicated access to computer resources at different computer centers easily and automatically. Currently, it isn't quite possible to do this on the US National Science Foundation's TeraGrid—some human interaction, such as an email message or telephone call, is still required. The creation of schedulers to automate the process will naturally require some standardization of the way each center makes its scheduling information available to the public, the way it accepts and acts on reservations, and so on. For the TeraGrid, work is in progress that should solve this particular problem soon. For other computer centers in the US and the rest of the world, such standards could take years to be adopted.
Another important example of the need for standardization is the specification of workflow in grid computing applications. There must be a simple and effective way for users to specify which parts of their jobs (both data and instructions) reside or execute on which machines at any given time. This is an important theme of the article in this issue by Kelvin Droegemeier and his colleagues, which describes a grid environment for research and education in the study of weather phenomena.
The article by Richard Bruin and his colleagues describes the eMinerals cluster, from the UK's Natural Environment Research Council e-Science testbed project, which aims to create an integrated infrastructure for realistic molecular-level simulations of minerals and associated reaction processes. The article also describes the cluster's organization and management, including details on how to back up data in a grid environment.
Finally, the article by Harvey Newman and his colleagues describes the UltraLight Project, which helps researchers analyze the enormous amounts of data that the next generation of particle physics experiments, such as those at CERN's Large Hadron Collider, will generate. These types of experiments are necessarily international in scope, and any grid computing solution for them must be general and flexible enough to involve computational platforms throughout the world.
As with the examples provided in the September/October 2005 issue, these articles describe cutting-edge research in scientific grid computing. Although it's also possible to use grid computing to address small-scale scientific computing needs, our aim in assembling both these issues has been to focus on large-scale work—indeed, work that can't be done in any other way, for want of computational resources. In the end, we believe that grid computing's impact on computational science ought to be judged by what new natural science it enables us to discover.