Pages: pp. 4
Abstract—In this column, CiSE's editor in chief proposes a new benchmark for high-performance computing, this one based on flops per joule.
Keywords—High-performance computing, benchmarks, computational efficiency, scientific computing
The most recent Top500 list (www.top500.org/lists/2011/06/press-release) was published with the usual fanfare, including feature articles in several international news media. According to Wikipedia, "the TOP500 project ranks and details the 500 (non-distributed) most powerful known computer systems in the world." Here, the loaded term "most powerful" means time to compute a floating-point solution of a dense system of linear equations. The world of high-performance computing is indebted to Jack Dongarra and the creators of the Linpack software for supplying at least one clearly understood and easy-to-specify metric for measuring performance.
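To make concrete what such a metric measures, here is a toy sketch in Python. It is not the Linpack code itself: it solves a random dense system by Gaussian elimination with partial pivoting, times the solve, and divides a nominal operation count by the elapsed time. The function names are hypothetical, and the count 2n^3/3 + 2n^2 is the figure conventionally quoted for this kind of solve, taken here as an assumption.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on copies so the caller's data is left untouched.
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Bring the largest remaining entry of column k onto the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k from the rows below the pivot row.
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def nominal_flop_count(n):
    """Assumed nominal operation count for an n-by-n dense solve."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


def measure_rate(n, seed=0):
    """Return (floating-point operations per second, solution) for a random system."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    b = [rng.random() for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    return nominal_flop_count(n) / elapsed, x
```

Calling measure_rate(200) returns the achieved rate for a 200-by-200 system. Interpreted Python will report a rate many orders of magnitude below any machine on the list, which is itself a reminder that the number reflects the software as well as the hardware.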
Solving linear equations lies at the heart of many, many scientific computations. But solving Ax = b is not the only thing needed in scientific computing. Other things that easily come to mind include doing a depth-first search of a very large graph; sorting a long, long list of numbers; and finding a specific item in a huge multidimensional database. Many observers have grumbled about the absence of such tests for years but, so far as I know, none of them have been able to formulate a precise test that is both easy to state and understand and highly portable. The focus and discipline of the numerical computing/engineering community is required to do something like this.
Other observers point out, quite correctly I think, that things such as programming language expressiveness and compiler quality might be more important to productivity than raw speed. How one measures such things is a deep mystery. And the Linpack benchmark does in fact measure something, probably a combination of switching speeds, component architecture, and the skill of compiler writers. In this regard, it's instructive to note how closely performance on the benchmark tracks Moore's law. One wonders if this is an indication of the benchmark's accuracy or if it means that chip designers are subtly but strongly influenced by the benchmark's existence.
In principle, it might be possible to assemble a machine with several times the speed of the current "most powerful" one: just buy 10 or 15 times as many processors and hire a few hundred expert system architects, superb programmers, and computational scientists, plus a top-notch management team, and get to work. Naturally this would require some up-front capital outlay. But that's not a showstopper. It would also be necessary to supply electrical power to run the machine. The power can't be in the form of one lightning bolt, as in the Frankenstein movies, but rather must be expressed in units of energy, such as the watt-second or the joule (work to move an electric charge of one coulomb through an electrical potential difference of one volt). No one can afford to shut down a major city to place first in the Top 500 rankings.
In general, the purpose of a computation is to reduce entropy—that is, to summarize the data contained in numerous bits into fewer and increasingly understandable bits. Thus an energy unit, such as the joule, is as important as flops per second. Assuming the data is available, it would be extremely instructive to plot the Top 500 machines as points on an x-y grid, with the x-axis being flops per second and the y-axis being joules per flop.
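As a rough sketch of how the coordinates for such a plot could be computed, the Python below assumes each machine is described by its sustained benchmark rate and its average power draw during the run. Since one watt is one joule per second, dividing power by rate gives joules per floating-point operation. The Machine class, the function names, and the two sample machines are hypothetical illustrations, not data from the list.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    flops_per_second: float  # sustained rate on the benchmark
    power_watts: float       # average power during the run; 1 W = 1 J/s


def joules_per_flop(m):
    """Energy spent per floating-point operation: (J/s) / (operations/s)."""
    return m.power_watts / m.flops_per_second


def flops_per_joule(m):
    """Operations obtained per joule; the reciprocal of joules_per_flop."""
    return m.flops_per_second / m.power_watts


def plot_points(machines):
    """Return (x, y) pairs: x is the sustained rate, y is joules per operation."""
    return [(m.flops_per_second, joules_per_flop(m)) for m in machines]


# Made-up figures for illustration only; they describe no real system.
MACHINES = [
    Machine("machine-a", 1.0e15, 2.0e6),
    Machine("machine-b", 5.0e14, 5.0e5),
]
```

In this made-up example, machine-a is twice as fast as machine-b but spends twice the energy per operation, so the two points sit at opposite corners of the grid. That is exactly the kind of trade-off a ranking by speed alone hides.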
I thank Francis Sullivan for his interesting insights into this subject.