High-Performance Computing Adds Virtualization to the Mix
Experts consider cluster computing — linking groups of commodity, x86-based computers so that they can function like one high-performance machine — to be a way to democratize supercomputing.
The approach has made high-performance computing (HPC) more affordable and easier to implement for small and mid-sized companies than traditional single-machine supercomputers, which are expensive and complex.
Virtualization has become a key new trend in cluster computing.
In traditional server virtualization, a single machine runs multiple operating systems. Businesses can thus consolidate many applications onto fewer servers, even if the applications require different OSs, which lets companies utilize their hardware more efficiently.
Instead of letting a single machine do the work of many, scale-up HPC virtualization — also called aggregation — efficiently binds many machines so that they can function as one virtual supercomputer.
"You can manage [work] with fewer human resources because the virtualization takes care of the lot of the things you used to worry about," said Mike Kahn, managing director of the Clipper Group consultancy.
Vendors are starting to sell HPC virtualization products, and companies are beginning to implement the approach. However, the technology is relatively new and still faces a number of challenges.
Control Data Corp., driven by the work of HPC pioneer Seymour Cray, introduced some of the first supercomputers in the mid-1960s.
Early machines were based on large scalar processors that shared a single memory pool.
In the early 1990s, vendors introduced new architectures based on massively parallel processing, which became the dominant HPC paradigm.
However, supercomputers were expensive and required more IT expertise than most companies possessed.
In 1994, NASA researchers clustered commodity components to create their Beowulf HPC machine.
This turned out to be much less expensive than traditional big-iron machines, explained Bob Quinn, chief technical officer at virtualization vendor 3Leaf Systems.
In cluster computing, each participating computer runs its own operating system. A job-scheduling manager handles systemwide tasks.
Today, two-thirds of supercomputers are built via clusters, noted Steve Conway, HPC research vice president at IDC, a market research firm.
Driving HPC Virtualization
HPC virtualization offers advantages other than saving money on supercomputing.
The scale-up approach offers better performance than traditional cluster computing, particularly for applications that require large shared memory. In cluster systems, each participating computer uses its own memory, so the systems aren't effective for applications that require shared memory.
Scale-up HPC virtualization also offers lower cost and less programming complexity than standard cluster systems.
Cluster systems have other problems that HPC virtualization addresses.
For example, the cluster infrastructure's installation, management, and I/O requirements are complex because the network manager must configure each machine to work with the entire cluster, said Shai Fultheim, CEO of HPC-virtualization vendor ScaleMP.
Cluster systems' parallel-computing programming model is also complex.
To maximize their effectiveness, cluster systems require load-balancing and distributed resource management, which must be provided manually by a programmer, rather than automatically by the OS.
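To make that complexity concrete, the hedged sketch below shows the kind of manual work distribution a cluster programmer must write by hand. It uses Python with the mpi4py message-passing bindings purely as an illustration; the article names no specific tools, and the even partitioning scheme is an assumption.

```python
# Illustrative sketch of manual work distribution on a traditional cluster,
# using the mpi4py MPI bindings. Each rank (process) has its own private
# memory and must exchange results explicitly via message passing.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's ID within the cluster job
size = comm.Get_size()   # total number of processes across all nodes

N = 10_000_000           # total amount of work to split up

# The programmer partitions the work by hand; neither the OS nor the
# cluster does this automatically.
chunk = N // size
start = rank * chunk
end = N if rank == size - 1 else start + chunk

# Each process computes only its own slice, in its own local memory.
partial = sum(i * i for i in range(start, end))

# Combining results requires an explicit message-passing call.
total = comm.reduce(partial, op=MPI.SUM, root=0)

if rank == 0:
    print(f"sum of squares below {N}: {total}")
```

Launched under an MPI runner such as mpirun, each copy of this program runs in its own OS instance on its own node, and nothing rebalances the load if one node falls behind.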
Better support in commodity AMD and Intel x86 chips for virtualization has helped drive HPC virtualization.
For example, the new chips have dedicated hardware extensions that virtualize critical subsystems such as memory and I/O. This lets the processors connect memory and I/O resources more easily and with less performance overhead.
There are two types of HPC virtualization.
With scale-up HPC-virtualization, a single hypervisor — the software that allocates a host machine's resources to each virtualized operating system or to each program running on a virtualized OS — runs across multiple computers. This aggregates multiple CPUs and memory systems and makes them appear as a single computer.
The hypervisor more efficiently handles the work that the job-scheduling manager performs in traditional cluster systems.
Scale-up systems also dynamically provision workloads across the virtual supercomputer, which makes the system easier to manage.
Moreover, the systems enable the OS to automatically handle the load-balancing and distributed resource management required to maximize cluster systems' effectiveness. Programmers must manually write code to perform this work in traditional cluster systems.
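For contrast, here is a hedged sketch of the same kind of computation as it might look under scale-up virtualization, where a single OS image spans the aggregated machine. The library choices are assumptions; the point is that ordinary shared-memory code suffices, and the OS, rather than the programmer, places and balances the work across whatever cores the virtual supercomputer exposes.

```python
# Illustrative sketch: on a scale-up (aggregated) system, one OS image
# sees all of the combined cores and memory, so ordinary parallel code
# works unchanged -- the OS handles placement and load balancing.
import os
from multiprocessing import Pool

N = 10_000_000

def sum_squares(bounds):
    start, end = bounds
    return sum(i * i for i in range(start, end))

if __name__ == "__main__":
    workers = os.cpu_count()   # every core the aggregated OS exposes
    chunk = N // workers
    ranges = [(w * chunk, N if w == workers - 1 else (w + 1) * chunk)
              for w in range(workers)]

    # No explicit message passing: the OS scheduler spreads the worker
    # processes across the cores the virtual supercomputer presents.
    with Pool(workers) as pool:
        total = sum(pool.map(sum_squares, ranges))

    print(f"sum of squares below {N}: {total}")
```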
Networking improvements have overcome the low bandwidth and high latency of Ethernet technologies used in early virtualization approaches, which made binding multiple machines into a virtual supercomputer difficult.
Scale-up virtualization is best for applications that require a lot of memory.
Cray is combining its CX1 supercomputer hardware with ScaleMP's vSMP software, which would let a single OS run up to 128 x86 cores as a single scale-up virtual HPC system.
3Leaf Systems' Distributed Virtual Machine Monitor enables the Red Hat Linux OS to run across multiple x86 servers, which could combine to form a scale-up virtual HPC system.
With scale-out HPC virtualization, on the other hand, a VM and hypervisor run on every processor core in each machine in the virtual supercomputer.
A grid-management tool, rather than a single hypervisor as in scale-up HPC virtualization, manages the overall system.
This enables the system to allocate resources with granularity, said Gary Tyreman, senior vice president for products and alliances at cloud-computing vendor Univa. Thus, the virtual supercomputer can more easily start and stop individual applications to let more important processes run when necessary.
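As a purely hypothetical illustration of that kind of granular control (it does not depict UniCloud or any real grid manager's interface), the toy sketch below suspends a lower-priority job when a more important one needs the cores.

```python
# Toy illustration of priority-based preemption in a grid manager.
# All names and behavior here are hypothetical, for explanation only.
from dataclasses import dataclass, field

@dataclass
class Job:
    name: str
    cores: int
    priority: int          # higher value = more important
    running: bool = False

@dataclass
class GridManager:
    total_cores: int
    jobs: list = field(default_factory=list)

    def free_cores(self):
        return self.total_cores - sum(j.cores for j in self.jobs if j.running)

    def submit(self, job):
        self.jobs.append(job)
        # Suspend lower-priority running jobs until the new job fits.
        for victim in sorted((j for j in self.jobs
                              if j.running and j.priority < job.priority),
                             key=lambda j: j.priority):
            if self.free_cores() >= job.cores:
                break
            victim.running = False
            print(f"suspending {victim.name} to free {victim.cores} cores")
        if self.free_cores() >= job.cores:
            job.running = True
            print(f"started {job.name} on {job.cores} cores")
        else:
            print(f"queued {job.name}: not enough free cores")

grid = GridManager(total_cores=64)
grid.submit(Job("log-analysis", cores=48, priority=1))
grid.submit(Job("gene-sequencing", cores=32, priority=5))
```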
Scale-out virtualization is best for processor-intensive applications.
Dell, Oracle, and Univa operate a grid, using Univa’s UniCloud VM management software, that offers scale-out virtualization services.
With scale-up HPC virtualization, organizations manage a single logical system rather than each machine in a complex cluster. This eliminates cluster file systems, cluster-interconnect issues, application provisioning, and the installation and updating of multiple operating systems and applications. However, improved cluster-management tools are starting to address these issues without the need for virtualization.
HPC virtualization enables greater application uptime and lower power consumption by making it easier for systems to manage resources and programs effectively.
The approach also lets developers create a simulated large-cluster system for testing and demonstration purposes before their applications are allowed to access a real, full-scale computing resource, said Northwestern University associate professor Peter Dinda.
HPC environments require ultrahigh-speed message passing, which many of today's virtualization approaches don't provide because they rely on Ethernet or InfiniBand connections between participating computers, which offer lower bandwidth and higher latency than communication within a single machine.
The need for virtualization systems to run a hypervisor can rob them of some performance, which is at a premium in HPC environments. This is an issue when processes running on multiple VMs must exchange information, said Tyreman.
HPC virtualization is best suited for applications that require a minimum amount of communication between nodes, because of lower bandwidth and higher latency within the virtual supercomputer, said Tyreman.
For the same reason, he added, the approach is more suitable for CPU- and memory-intensive applications — such as the analysis of a gene sequence or massive server log — than for those that are I/O-intensive.
Researchers are discussing adding more instrumentation to HPC virtualization systems to provide a clearer view of where CPU overhead is occurring within the environment.
According to ScaleMP's Fultheim, HPC virtualization will foster the development of an ecosystem for on-demand cloud-based supercomputing services.
On the other hand, Gordon Haff, senior analyst with market-research firm Illuminata, said HPC virtualization faces an uncertain future.
"The big question," he said, "is to what degree this approach simplifies [processes] while still yielding good performance. Whether the technology takes off depends on this."
George Lawton is a freelance technology writer based in Monte Rio, California. Contact him at firstname.lastname@example.org.