Novel Architectures and Accelerators
In late 2008, I had the pleasure of working with Steve Gottlieb and Volodymyr Kindratenko on a special issue of Computing in Science and Engineering magazine. Entitled “Novel Architectures,” the issue discussed how to incorporate accelerators into one’s arsenal of programming techniques. Having spent most of my life working on more “conventional” multiprocessing/multithreading techniques for concurrent, parallel, and distributed computing applications, I found accelerators a new area at the time, although many of the underlying techniques were familiar from my early days in parallel computing. Many computers I used as a graduate student in the late 1980s (systems from Cray and Thinking Machines, among others) used the vector and data-parallel models.
Today, the computing community is evaluating several types of accelerators, most notably field-programmable gate arrays (FPGAs), graphics processing units for general-purpose computations (GPGPUs), Sony-Toshiba-IBM’s Cell Broadband Engine, and the ClearSpeed attached processor. Several high-performance computing vendors offer systems that include accelerators as an integral part of their product lines—for example, the SGI reconfigurable application-specific computing architecture uses FPGAs, the Cray XT5h uses vector processors and FPGA accelerators, and IBM’s hybrid system architecture uses a PowerXCell as a coprocessor. Several Beowulf PC clusters are reportedly outfitted with FPGAs, ClearSpeed, and GPGPU accelerators.
The jury is still out on which of these computational accelerator technologies will dominate the field, because each one brings to the table a different mix of benefits and challenges. However, it’s undeniable that since we published our special issue, GPGPU computing has made a very strong push forward, owing largely to the commodity nature of graphics chipsets such as those from NVIDIA. It’s possible to start writing your own GPGPU programs on any Windows or Linux PC outfitted with an appropriate NVIDIA-based graphics card, not to mention the MacBook line, which has employed NVIDIA graphics for a while now.
Since our 2008 special issue, it has become possible for anyone to build a “novel computer.” In my case, I just finished building a small form factor computer that includes the latest quad-core Intel Core i7 processor, an NVIDIA Fermi-based GPU, and up to 16GB of RAM. (You can buy the parts and build your own; install time is under 40 minutes.) In addition, I’m using my computer to develop sensing applications using the uber-cool Phidgets kit. The vision of the desktop (or laptop) supercomputer is truly coming to life and can be had for a small fortune of just under US$1300.
This Computing Now theme starts out with a few articles from the 2008 special issue of CiSE, for which we chose articles representing three popular alternatives for accelerators: GPGPUs, FPGAs, and cell processors. I’ve also included three articles that have appeared in other IEEE Computer Society publications since then. It’s impossible for six articles to do complete justice to this topic. So, I’ve chosen articles that are likely to be of interest to the widest possible audience, including those who are completely new to the topic, as I was just a precious few years ago.
“Moving Scientific Codes to Multicore Multiprocessor CPUs” (login is required to access the full text of this article) describes a restructuring method for implementing numerical algorithms for scientific computing that can help them run efficiently on the IBM Cell processor and other multicore CPUs.
“Computing Models for FPGA-Based Accelerators” (login is required to access the full text of this article) describes the critical phase of FPGA application development, which involves finding and mapping to an appropriate computing model. These models differ from those generally used in programming existing commodity CPUs. For example, whereas parallel computing models are often based on thread execution and interaction, FPGA computing can exploit more degrees of freedom than are available in software (such as fine-grained parallelism and communication).
The Quantum Chromodynamics Parallel Computing on the Cell Broadband Engine project is developing a massively parallel, scalable supercomputer for applications in lattice quantum chromodynamics (QCD). This is an example of how accelerators are increasingly becoming part of conventional cluster/supercomputer designs, especially for specialist computations with high-throughput needs such as QCD. Learn about this project in the article “QPACE: Quantum Chromodynamics Parallel Computing on the Cell Broadband Engine” (login is required to access the full text of this article).
A recent article in IEEE Software, “Joint Forces: From Multithreaded Programming to GPU Computing” (login is required to access the full text of this article), examines best practices for combined CPU-GPU software development. The authors argue that we must not only adopt new programming models but also be well-versed in the (parallel) methods required to achieve true performance gains.
“The GPU Computing Era” (login is required to access the full text of this article) from IEEE Micro describes how GPU computing is now at a tipping point. It’s being employed in demanding consumer applications and high-performance computing alike.
Numerous conference papers explore the use of GPGPUs, FPGAs, and cell processors. In “Parallel Option Pricing with BSDE Method on GPU” (login is required to access the full text of this article), we see a practical example of the use of accelerators in option pricing (an area outside of computational science that has always used parallel methods, mostly in secret).
You might also be interested in the full version of the original guest editors’ introduction for our “Novel Architectures” issue of CiSE. And for an excellent primer on GPU programming, check out “Getting Started with GPU Programming” (login is required to access the full text of this article).
Quick Look at Nvidia CUDA 3.2 Toolkit
This video is aimed at helping you get started with the Nvidia CUDA 3.2 toolkit. If you own a computer with a recent Nvidia graphics card, you can run parallel programs as if you owned your own private supercomputer, even if you have no immediate interest in writing code for this platform. In this video, I show how to get started with the toolkit and run the basic bandwidth test and n-body simulation (of interest in astrophysics).
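To give a feel for what the n-body demo is actually computing, here is a minimal serial sketch in plain Python of the all-pairs gravitational calculation at its heart. This is my own illustrative version, not the toolkit’s actual sample code; the CUDA nbody sample parallelizes exactly this O(N²) loop on the GPU, typically assigning one thread per body.

```python
import math

G = 6.674e-11     # gravitational constant (SI units)
SOFTENING = 1e-9  # small term to avoid division by zero for coincident bodies

def accelerations(pos, mass):
    """All-pairs gravitational acceleration on each body.

    pos is a list of [x, y, z] positions; mass is a list of masses.
    This double loop is the part a GPU version distributes across threads.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = sum(d * d for d in dx) + SOFTENING
            s = G * mass[j] / (r2 * math.sqrt(r2))  # G*m_j / r^3
            for k in range(3):
                acc[i][k] += s * dx[k]
    return acc

# Two equal 1e10-kg masses one meter apart attract each other symmetrically.
pos = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
mass = [1e10, 1e10]
acc = accelerations(pos, mass)
print(acc[0][0])  # positive: body 0 is pulled toward body 1
```

A real simulation would then integrate these accelerations over small time steps; the GPU’s advantage is that the quadratic force calculation dominates the cost and parallelizes naturally.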
Download Video (.flv)
A Look at Shuttle Multicore/GPU Development System
These clips show the Nvidia GPU being used to run the same n-body simulation (as the previous video) and a quick look inside the barebones computer that I built using the parts on the NewEgg wish list linked to in the main article.
Download Part 1 (.flv)
Download Part 2 (.flv)