August 2013 Theme: Specialized Hardware for Grid, HPC, and Cloud
Guest Editor's Introduction: Art Sedighi

The trend for the 90s and a large portion of the early 2000s was commodity hardware — items that were widely available and relatively inexpensive — with the x86 architecture dominating the market. Since then, the trend has been toward specialized hardware, in which a hardware platform is designed and built for a specific purpose, as with supercomputers like the BlueGene. Yet the tides are turning again as we see commodity hardware replacing specialized hardware for several reasons:

  • Cost savings. Why should I pay for unnecessary extras — a PCI bridge in a high-performance computing (HPC) environment, for example?
  • Energy savings. Infrastructure expansion is no longer electrically feasible as datacenters are running out of power, not to mention space.
  • Hunger for information. Our appetite only grows: the faster processors and servers become, the more information we demand and, as a result, the more data analysis we expect.

In short, we want faster, cheaper, and better than we had yesterday. This, in turn, is driving a market shift toward specialized processors (ARM, FPGAs, and GPUs) that are smaller, less power hungry, and well suited to the specialized problem sets common in grid, HPC, and cloud environments.

No Need for Exotic Hardware

With the widespread availability of commodity-level components, nearly anyone can now build a specialized hardware solution to meet a given application’s performance, cost, and power-consumption constraints. The three articles featured in this month’s Computing Now theme demonstrate this rising trend.

In “Energy and Cost-Efficiency Analysis of ARM-based Clusters,” Zhonghong Ou and his colleagues compare workloads running on x86 and ARM processors. Specifically, they weigh the cost-efficiency of a cluster of four 2-core ARM Cortex-A9 processors against a single 4-core Intel Core 2 Q9400. From a performance perspective, we can expect the Core 2 to beat the ARM processors, but from a cost and energy perspective, ARM comes out ahead. A typical ARM processor consumes about 1 to 5 watts of power, compared to an x86 processor, which is rated at about 40 to 60 watts (over 100 watts once you add peripherals such as disk drives).

The authors calculate energy efficiency as the ratio of computation to the energy consumed (performance/power) and thus present an apples-to-apples comparison of the two approaches. They then look at three use cases:

  • an in-memory database using SQLite;
  • a Web application using Nginx and httpd; and
  • video transcoding using HD-VideoBench.

To compare the platforms, the authors calculate the energy-efficiency (EE) ratio as the number of ARM processors required to do the same amount of work as a given number of x86 processors. The in-memory database’s EE ratio (# ARM / # Intel) was 2.6 to 9.5, yet the ARM-based setup still costs just two-thirds as much as the Intel-based infrastructure. The other applications show EE ratios greater than 1 as well, but the in-memory database result is especially significant for grid and cloud infrastructures, where such databases are often used for data management.
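The arithmetic behind this kind of comparison can be sketched in a few lines. The figures below are hypothetical placeholders for illustration only; only the performance/power definition of energy efficiency comes from the article.

```python
def energy_efficiency(performance, power_watts):
    """Energy efficiency as computation per unit of power (performance/power)."""
    return performance / power_watts

# Hypothetical figures for illustration only -- not measurements from the article.
arm_perf, arm_power = 1_000, 3    # one ARM processor: 1,000 ops/s at 3 W
x86_perf, x86_power = 5_000, 50   # one x86 processor: 5,000 ops/s at 50 W

# EE ratio: how many ARM processors match one x86 processor's throughput
ee_ratio = x86_perf / arm_perf                # 5.0 ARM chips per x86 chip

# Even after scaling out to match throughput, the ARM cluster draws less power.
arm_cluster_power = ee_ratio * arm_power      # 5 chips x 3 W = 15 W vs. 50 W
print(ee_ratio, arm_cluster_power, x86_power)
```

The point of the metric is exactly this trade: an EE ratio above 1 means more ARM chips are needed, but the total power (and cost) of the ARM cluster can still come in well below the x86 baseline.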

In “Evaluating Performance and Energy on ARM-based Clusters for High-Performance Computing,” Edson L. Padoin and his colleagues take the approach one step further, using ARM-based boards to build a very large HPC environment. For their work, the authors use off-the-shelf boards: a PandaBoard with a 2-core ARM Cortex-A9 and a second board with an ARM Cortex-A8. Both run Ubuntu Linux with version 4.5 of the GNU Compiler Collection (GCC), which supports version 7 of the ARM instruction set architecture (ISA).

Both boards were given two workloads: integer-only and floating-point computations on 1,000 × 1,000 matrices. The PandaBoard with the 2-core A9 chip did significantly better on both: 755 MFlops versus 24 MFlops. Even with a higher clock rate and much higher performance, the A9 was also more energy efficient, at 92 MFlops per maximum watt versus the A8’s 20. The A9 thus shows that increased computational power doesn’t necessarily equate to proportionally increased power consumption.
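A quick back-of-the-envelope check on the reported numbers makes the result concrete (assuming the MFlops and MFlops-per-max-watt figures describe the same peak-load run): the A9 delivers roughly 30 times the A8’s performance while drawing only about 7 times the peak power.

```python
# Figures reported for the two boards: peak performance and energy efficiency.
a9_mflops, a9_eff = 755, 92   # PandaBoard (Cortex-A9): MFlops, MFlops per max watt
a8_mflops, a8_eff = 24, 20    # Cortex-A8 board

# Implied peak power draw: MFlops / (MFlops per watt) = watts.
a9_max_watts = a9_mflops / a9_eff   # ~8.2 W
a8_max_watts = a8_mflops / a8_eff   # ~1.2 W

perf_ratio = a9_mflops / a8_mflops          # ~31x the performance...
power_ratio = a9_max_watts / a8_max_watts   # ...for only ~7x the power
```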

In the final theme article, “On Achieving High Message Rates,” Holger Fröning and his colleagues discuss their work on the Extoll project, in which they created specialized hardware to allow very high message rates compared to conventional network interface controllers (NICs). Grid and clustered environments perform only as well as their underlying messaging and communication infrastructure. Most large HPC systems, such as the massively parallel processor (MPP) machines on the Top500 list, have tightly coupled, integrated network layers to aid interprocessor communication. As a result, they tend to perform better than clusters and grids, which are usually built from commodity, off-the-shelf hardware. MPPs are also very expensive compared to clusters built from commodity hardware.

Extoll is a customized 6-port NIC that uses a field-programmable gate array (FPGA) to allow dynamic configuration and reconfiguration of a torus network. Extoll’s main characteristics are support for multicore environments through hardware-level virtualized communication engines, very low overhead, and a minimized memory footprint. Virtualizing the underlying network protocol and layout makes integration into applications simple; the team has integrated the Extoll libraries into OpenMPI.

Fröning and colleagues report that Extoll consistently outperformed both a quad data rate (QDR) 40-Gbit InfiniBand network and a 10-Gbit Ethernet network. That said, the authors are looking to reimplement the Extoll engine as an application-specific integrated circuit (ASIC) to get roughly 5 to 10 times the performance of the FPGA-based version.


This month’s theme focuses on ARM processors and FPGAs as they apply to grid and cloud environments. These articles showcase the fact that specialized computing hardware is no longer solely for big-budget projects. ARM processors, FPGAs, and general-purpose GPUs are becoming more abundant and cost-effective thanks to lower manufacturing costs and the increasing availability of software tools that support specialized hardware. As the ecosystem for ARM, FPGA, and GPU technology matures, so will their presence in mission-critical applications and production environments.

Art Sedighi is a freelance consultant focusing on large infrastructure design and implementation and a member of the Computing Now editorial board. He has an MS in computer science from Rensselaer Polytechnic Institute. His professional interests include scheduling and game theory.



Translations by Osvaldo Perez and Tiejun Huang