Issue No. 08 - August (2007 vol. 8)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MDSO.2007.47
Arturo Ortiz-Tapia, Mexican Petroleum Institute
The Linux Enterprise Cluster
No Starch Press, 2005
In The Linux Enterprise Cluster, Karl Kopper gathers almost all of the information necessary to build and administer a high-availability cluster based on Linux and its related free software. Such a cluster isn't the same as a "high-performance" cluster: Kopper is referring to one that makes programs and databases fail-safe and reliable for a large group of people, not to a cluster for parallel computing.
Kopper emphasizes two main ideas. First, a high-availability cluster is one that "should act like a single unified computing resource." This means that neither the programs nor the users working with the cluster's data should be aware that they are dealing with a cluster of computers. Several CPUs, hard disks, and a load balancer serve as a single, giant, "fail-safe" device that aims to offer programs, data, and other computing resources (including print spooling) to a large number of people, even if a single element of the cluster crashes.
Second, Kopper explains that it's possible to make such a cluster out of commodity hardware, such as what you would normally find in an office. Commodity hardware is standard, off-the-shelf equipment with nothing special about it. I recommend consulting Wikipedia (http://en.wikipedia.org/wiki/Commodity_computer) for a good definition.
But how can I build a cluster?
And how, you might ask, could anyone build such a cluster without a single point of failure? Kopper describes several packages for achieving that goal, principally the High Availability Linux project's Heartbeat solution. In essence, Heartbeat identifies which computer owns a resource and labels it the primary server. Another computer acts as a backup server, and Heartbeat checks the primary server periodically. If the primary fails, Heartbeat acts as an intermediary and makes the backup server take over.
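The primary/backup idea described above can be illustrated with a minimal Python sketch. This is not Heartbeat's code or configuration format; the names Node, FailoverMonitor, and max_missed are hypothetical, and the rule of waiting for several consecutive missed checks before failing over is an assumption made for the illustration, not something the review or the book is quoted as specifying.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A cluster member (hypothetical stand-in for a real server)."""
    name: str
    alive: bool = True


class FailoverMonitor:
    """Tracks which node owns a resource and fails over to a backup.

    Illustrative only: a real monitor would probe the primary over the
    network instead of reading an in-memory flag.
    """

    def __init__(self, primary: Node, backup: Node, max_missed: int = 3):
        self.primary = primary
        self.backup = backup
        self.max_missed = max_missed  # assumed threshold, not from the book
        self.missed = 0
        self.owner = primary  # the primary owns the resource at the start

    def check(self) -> Node:
        """Run one periodic check and return the current resource owner."""
        if self.owner is self.primary:
            if self.primary.alive:
                self.missed = 0  # a healthy reply resets the counter
            else:
                self.missed += 1
                if self.missed >= self.max_missed:
                    self.owner = self.backup  # backup takes over
        return self.owner
```

Calling check() repeatedly, for example from a timer loop, keeps returning the primary until it has missed max_missed consecutive checks, after which the backup is returned. In this sketch the backup keeps the resource even if the primary later recovers; automatic fail-back is a policy choice that is deliberately left out.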
Kopper also discusses Ganglia software, which can supervise the cluster. Ganglia "provides a sophisticated mechanism for collecting and storing cluster node performance data, and it includes tools for displaying this data in a Web browser." In other words, Ganglia can help you visualize how well (or badly) your cluster is behaving both globally and locally, almost in real time.
Other subjects Kopper discusses include how to build a Linux enterprise cluster, the Linux Virtual Server, the load balancer, and the Network File System. Of course, he also provides details on how to configure Heartbeat, Ganglia, and their peripheral resources, including case studies for administering the cluster. The book ends with a series of appendices covering downloading, troubleshooting, configuration, strategies for dependency failures, other potential cluster file systems and Linux virtual clusters, and the Apache configuration file. Kopper provides plenty of examples, with figures and possible configurations for the programs and subprograms mentioned.
I liked the IEEE definition of commodity cluster that Kopper quotes at the beginning of his book. I think he strongly wants to adhere to standards, and that's the feeling he conveys throughout. So, if you apply the information he provides, you will probably be on the right track for building a highly available enterprise cluster with no single point of failure.
The book includes a CD-ROM that has all the packages and figures I mentioned earlier. The CD-ROM works with Windows as well as Linux, an advantage for anyone who wants to start immediately. Keep in mind the warning that the software is provided "as is," so it might not do everything you expect. You can also download the cluster software from the Internet, and Kopper provides an appendix that explains what to download and how.
Kopper offers a comprehensive, pedagogical book with basic, intermediate, and advanced tools for anyone who wants to build a fail-safe cluster. As the title implies, the book is meant for enterprises rather than for the scientific community, which would perhaps prefer parallel clusters. I recommend this book mostly to those working in a company that already runs Linux or hopes to do so soon.
Arturo Ortiz-Tapia is a scientific researcher at the Mexican Petroleum Institute. Contact him at firstname.lastname@example.org.