Issue No. 05 - September/October (2004 vol. 6)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MCSE.2004.43
Grid Computing: Making the Global Infrastructure a Reality, by Fran Berman, Geoffrey Fox, and Anthony J.G. Hey, John Wiley & Sons, 2003, ISBN 0470853190, US$105
The term "Grid computing" covers a wide variety of efforts in distributed computing, high-performance computation, self-repairing networks, and ubiquitous computing, as well as new techniques in high-availability computing. In recent years, the field has seen a burst of activity, with many new projects springing up, and has received increased attention in the technical and popular press.
Grid Computing: Making the Global Infrastructure a Reality (www.grid2002.org) weighs in at just over 1,000 pages, and promises to be "a comprehensive reference to the state-of-the-art in Grid computing." Divided into four parts, the book is an edited collection of 43 papers from more than 100 authors working in the field. The contents range from high-level design papers to descriptions of Grid-based applications in science and industry.
As the authors point out in the foreword, whatever Grid computing might or might not be, it is not yet mature. Given such a rapidly developing field composed of a plethora of projects—few of whose builders can even agree on exactly what "Grid" means—summarizing this domain is no easy task. This book distills, compiles, and presents the status of Grid computing's different aspects into a single reference—a significant contribution to the Grid community and beyond.
The Grid's History, Hype, and Future
Part A, "Overview and Motivation", surveys the Grid: its evolution, what it's intended to do, and how it is supposed to do it. It also looks toward future developments, anticipating what could be possible if and when Grid techniques become widespread. Unfortunately, this rarefied and forward-looking discussion succumbs to hype and makes some rather wild claims regarding not only the scale and scope of today's Grid ("harnessing of computing resources distributed around the galaxy"; alas, no further details on the Grid-enabled interstellar fleet), but also its current development level.
Arguably misleading statements about the Grid's current state appear throughout the book. These often stem from the authors' confusing use of the present tense when describing capabilities that as yet have only been designed at the very highest levels and remain far from practical implementation. At the very worst, the authors credit the Grid for other projects' successes. Part A's first chapter, an overview written by the editors, claims that the Top500 (www.top500.org) Web site tracks "the performance of the most high-performance nodes on the Grid." In fact, many machines listed on the site are at private institutions or US national labs and are unlikely to be connected to the Internet, let alone the Grid.
On the other hand, some sections do attempt to dispel common misconceptions about the Grid. For example, section 1.6 mentions that the often-used analogy with electrical power grids ultimately breaks down when performance is taken into account, because computational resources are insufficiently fungible. A power generator's location doesn't affect the performance of devices plugged into it, yet network effects such as latency and bandwidth make a disk array much more useful when it is connected to your local network than when it sits in another city.
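The latency and bandwidth point can be made concrete with a little arithmetic. The following minimal sketch is not taken from the book; the function name `transfer_time` and all of the link parameters are hypothetical values chosen purely for illustration. It assumes a simple model in which each sequential request pays one round-trip latency and the payload moves at the link's nominal bandwidth.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s, requests=1):
    """Estimate the time to fetch data split across sequential requests.

    Simplified model: every request pays one round-trip latency, and the
    payload moves at the link bandwidth. Protocol overhead, congestion,
    and caching are ignored.
    """
    return requests * latency_s + size_bytes / bandwidth_bytes_per_s


# Hypothetical link parameters (assumptions, not measurements).
LOCAL = {"latency_s": 0.0005, "bandwidth_bytes_per_s": 125_000_000}  # ~1 Gbit/s LAN
REMOTE = {"latency_s": 0.05, "bandwidth_bytes_per_s": 12_500_000}    # ~100 Mbit/s WAN

size = 1_000_000_000  # 1 GB read from the disk array
requests = 10_000     # issued as 10,000 sequential small reads

local_s = transfer_time(size, requests=requests, **LOCAL)
remote_s = transfer_time(size, requests=requests, **REMOTE)

print(f"local disk array:  {local_s:.1f} s")
print(f"remote disk array: {remote_s:.1f} s")
```

Under these assumed numbers, the same workload takes about 13 seconds against the local array and about 580 seconds against the remote one, and most of the remote time is accumulated latency rather than raw transfer. Electrical power has no comparable penalty for distance, which is where the analogy fails.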
Overall, Part A gives a comprehensive historical overview of the Grid, describing its origins in projects such as the seminal I-Way experiment (a distributed test bed with over 17 networked sites and 60 applications for the Supercomputing '95 conference) and going back to the first Grid-like ideas that emerged in the 1960s. It clearly describes the ambitious vision of Grid technologies, leaving Parts B and D to describe how these might be realized. The last chapter in this section is an excellent summary of Grid implementation experiences at NASA and the US Department of Energy, discussing not only the technical details, but also the political and organizational problems encountered.
Grid Architecture and Technologies
Part B, "Architecture and Technologies of the Grid", examines how the vision described in Part A is becoming a reality: first, by providing the high-level designs to which many of today's implementations seek to conform, and then going into details, including Grid middleware implementations.
The high-level description centers on the emerging Open Grid Services Architecture (OGSA) specification; this section contains the two papers that originally proposed the OGSA system. This bias is quite understandable and was probably the right thing to do at the time of writing. Unfortunately, since the book was written, the Open Grid Services Infrastructure (OGSI), the only part of OGSA to be codified in a concrete, implementable standard, has been made obsolete by the Web Services Resource Framework (WSRF), which addresses certain problems in the original OGSI design. Nonetheless, despite going into now-outdated details, this section is still highly relevant to the design of current Grid frameworks.
Part B goes on to describe an impressively diverse range of implemented Grid (or Grid-like) systems, such as the CCA/XCAT framework (www.extreme.indiana.edu/xcat), Legion/Avaki (www.avaki.com), Condor (www.cs.wisc.edu/condor), and Entropia (www.entropia.com). We were surprised that the editors didn't include a detailed chapter on the Globus Toolkit (www.globus.org), on which many of the projects described in the rest of the book are based. The book then returns to research-level topics, giving an interesting summary of more exotic Grid-like methodologies, such as semantic or peer-to-peer Grids.
Grid Computing Environments
Grid computing environments (GCEs) can be thought of as the tools and technologies required to properly use Grid resources and applications. The introductory chapter at the beginning of Part C provides a good overview and sets the context for the role and critical importance of Grid computation and programming environments. However, there are several instances where it is difficult to determine whether the focus is on Part C's remaining chapters or on the papers in a special issue of Concurrency and Computation: Practice and Experience referenced in the chapter, in which the bulk of the chapters in this section first appeared.
Grid aspects that form GCEs are not tightly defined [1], which helps explain why Part C covers a wide range of seemingly diverse topics. Chapter 21, "Grid Programming Models: Current Tools, Issues and Directions", lays out the challenges that effective Grid code development poses as well as the unique requirements of programming on a Grid characterized by dynamic and heterogeneous resources. For example, a Grid programmer must design and manage the interaction between remote services, data sources, and hardware resources. The chapter also provides an excellent overview of current Grid programming models and likely directions for future investigation.
Chapter 27 discusses Grid portal development kits and chapter 28 discusses Grid portals, which can be generically defined as easy-to-use, customizable interfaces that provide a single point of access to Grid resources and perform a variety of Grid operations. Part C also mentions middleware frameworks, such as Commodity Grid Toolkits (CoG kits; www.unix.globus.org/cog), Unicore (www.unicore.org), and Netsolve (http://icl.cs.utk.edu/netsolve), which aim to provide a seamless interface between standard programming and desktop systems and the Grid architecture, thus enabling effective use of the Grid's rich supply of services. Much more effort and activity is needed in the areas Part C covers before the barrier to application deployment falls and the Grid gains broader acceptance as a paradigm for computational science.
Grid Applications
Part D is devoted to Grid applications (though other sections contain significant application discussions). Like Parts A through C, Part D begins with a short introductory chapter that abstracts some general principles about the different types of applications well suited to the Grid. Although not in Part D, chapter 23, "Classifying and Enabling Grid Applications", provides an equally interesting classification and a far more passionate case for applications on Grids: "Applications must be the lifeline of the Grid," rather than the Grid being "yet another theoretical exercise" in computer-science concepts.
Generally, this section's papers, along with chapter 23, do a good job of convincing and motivating readers that any eventual stable and widely used Grid environment will evolve only if there is an exchange of ideas and requirements between technology developers and users—a "technology push and applications pull."
Most applications in this section are data-centric, the context for which is set in chapter 36, "The Data Deluge: An E-Science Perspective." The chapter has an interesting and little-known compilation of typical data quantities that various sources generate. This chapter's section on open archives and scholarly publishing will interest anyone not already aware of current thinking and trends. And although the paper doesn't make it exactly clear how Grid computing will advance these trends, readers who understand metadata, ontologies, and semantic Grid concepts should be able to make the connection.
While the book is a good overall presentation of the Grid, it suffers, like much Grid documentation, from assuming too much about readers' knowledge to be particularly useful to scientists new to the field. Grid newcomers unfamiliar with distributed computing's latest technologies and trends must navigate a bewildering thicket of standards documents that repeatedly explain some terms and concepts (often in subtly different ways) and neglect others, a situation that the editors don't try to remedy.
Many passages assume readers understand what an XML schema is, what precisely is meant by "ontology" in a Grid context, or the differences between encryption and authentication. While the book is highly informative and useful reading for someone already familiar with basic Grid ideas, the entry barrier for someone who is not is embarrassingly high. Such newcomers might be better served by reading past issues of CiSE's "Web Computing" series (www.computer.org/cise/).
Our practical experience is that the Grid, while full of potential, is a perplexing and shifting field, and it still can be an uphill struggle to get any application Grid-enabled, in any sense of the term. It's common to see disagreement among practitioners on what the Grid actually means or what does or doesn't constitute a Grid environment, let alone the correct way to build such a thing. The book contains many "The Grid is …" or "The Grid is not …" statements, and thus implicitly reflects this confusion. However, given its subtitle, "Making the Global Infrastructure a Reality", it could have treated this confusion more explicitly and focused more on what has been done and what is still lacking in attempts to insulate users from changing paradigms.
After reading through all the standards and systems descriptions, it's worrisome that no clear vision remains of how they will be unified into a single, usable whole. However, as a demonstration of the variety of Grid-like projects underway, and of current thoughts on how the technology will evolve, this text is a worthwhile, if sometimes demanding, read. It isn't a user's guide or manual for the Grid, but even if someone could write such a book, it probably would be obsolete by the time it was published.
Jonathan Chin is an Engineering and Physical Sciences Research Council postdoctoral research fellow at the Centre for Computational Science, University College London. His research interests include mesoscale modeling and Grid and high-performance computing. He has an MSci in physics from the University of Oxford. Contact him at firstname.lastname@example.org.
Peter V. Coveney is a professor in physical chemistry and director of the Centre for Computational Science at University College London. His research interests include theoretical and computational science, atomistic, mesoscale, and multiscale modeling, statistical mechanics, high-performance computing, and visualization. He has a BA, MA, and PhD in chemistry from the University of Oxford. He is a fellow of the Royal Society of Chemistry and of the Institute of Physics. Contact him at email@example.com.
Shantenu Jha is an Engineering and Physical Sciences Research Council postdoctoral research fellow at the Centre for Computational Science, University College London. His research interests include Grid computing and computational physics. He has degrees in computer science and physics from Syracuse University and the Indian Institute of Technology, Delhi. He is a member of the Global Grid Forum and of the American Physical Society. Contact him at firstname.lastname@example.org.