Are Supercomputers Going to Reach the Exaflop per Second Barrier Soon?

By Vladimir Getov

February 2, 2020

For almost all information-processing jobs, supercomputers are faster than anything else on the planet. Over the years, the community has closely followed the popular TOP500 ranking of supercomputers, with the Supercomputing conference being one of the events at which the semi-annual updates are announced. The latest TOP500 list was announced at SC19 in Denver by the team pictured below: (left to right) Martin Meuer, Michael Heroux, Wu Feng, Jack Dongarra, Horst Simon, and Erich Strohmaier.

SC19 Denver

Based on the High-Performance LINPACK (HPL) benchmark, the fastest supercomputers on the list have not changed since the June 2019 TOP500 ranking; however, the entry level to the list, i.e. the speed of the machine ranked No. 500, has now risen to an astonishing 1.14 Petaflop/s. Leading the way is Oak Ridge National Laboratory's IBM-built Summit system, which holds top honors with an HPL result of 148.6 Petaflop/s. In a rather distant second place at 94.6 Petaflop/s is another IBM machine: Lawrence Livermore National Laboratory's Sierra. Close behind at No. 3 is the Sunway TaihuLight supercomputer, with an HPL speed of 93.0 Petaflop/s. TaihuLight was developed by China's National Research Center of Parallel Computer Engineering and Technology (NRCPC) and is installed at the National Supercomputing Center in Wuxi.

While HPL is an excellent representative of compute-bound codes, many new applications with very high economic potential, such as big data analytics, machine learning, real-time feature recognition, recommendation systems, and even physical simulations, have emerged over the last 10-15 years. These codes typically feature irregular or dynamic solution grids and spend much more of their time on non-floating-point operations such as address computations and comparisons, with memory access patterns that are no longer regular or cache-friendly. The computational intensity of such programs is much lower, which drops the efficiency of the floating-point units, the focal point of modern core architectures, to only a few percent. The emergence of applications with data-intensive characteristics, i.e. with execution times dominated by data access and data movement, has recently been recognized as the “3rd Locality Wall” for advances in computer architecture.
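One common way to reason about this drop in efficiency is the roofline model, which this article does not use but which illustrates the point: attainable performance is the smaller of the peak floating-point rate and the product of memory bandwidth and computational intensity (flops per byte moved). The sketch below is a minimal illustration under assumed values; `PEAK_GFLOPS`, `MEM_BW_GBS`, and both intensities are made-up numbers and do not describe any real system.

```python
# Roofline sketch: hypothetical node, illustrative numbers only.
PEAK_GFLOPS = 7000.0   # assumed peak floating-point rate, Gflop/s
MEM_BW_GBS = 900.0     # assumed memory bandwidth, GB/s


def attainable_gflops(intensity_flops_per_byte: float) -> float:
    """Roofline bound: min(peak, bandwidth * computational intensity)."""
    return min(PEAK_GFLOPS, MEM_BW_GBS * intensity_flops_per_byte)


def efficiency(intensity_flops_per_byte: float) -> float:
    """Fraction of peak that the roofline bound allows."""
    return attainable_gflops(intensity_flops_per_byte) / PEAK_GFLOPS


if __name__ == "__main__":
    # Assumed intensities: a dense kernel reuses cached data many times,
    # a sparse kernel performs only a fraction of a flop per byte moved.
    for name, intensity in [("dense (assumed 50 flop/byte)", 50.0),
                            ("sparse (assumed 0.25 flop/byte)", 0.25)]:
        print(f"{name}: {attainable_gflops(intensity):.0f} Gflop/s, "
              f"{100 * efficiency(intensity):.1f}% of peak")
```

With these assumed values the dense kernel is limited only by the peak rate, while the sparse kernel is capped by memory bandwidth at roughly 3% of peak, which is the "few percent" regime described above.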

To highlight these inefficiencies, a new benchmark called HPCG (High Performance Conjugate Gradients) was introduced in 2014. While HPCG does not represent the worst-case scenario, it has been widely accepted as a typical performance yardstick for memory-bound applications. With HPL as the representative of compute-bound codes and HPCG as the representative of memory-bound codes, performance and energy results have been readily available, published twice per year, since June 2014. For consistency, we use the average of the top 10 performance and energy results for each of the two benchmarks, because very few HPCG results were published in the early years of publicly available HPCG measurements.
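To give a feel for why a conjugate gradient code is memory-bound, here is a minimal sketch of the plain (unpreconditioned) conjugate gradient method on a small sparse matrix stored in compressed sparse row (CSR) form. It is not the HPCG benchmark itself, which solves a much larger 3D problem and adds a preconditioner; the function names and the 1D test matrix are chosen here purely for illustration.

```python
# Sketch only: plain conjugate gradient on a tiny sparse system.
# The real HPCG benchmark uses a 3D problem and a preconditioner.

def build_laplacian_csr(n):
    """1D Laplacian (2 on the diagonal, -1 next to it) in CSR form."""
    values, col_idx, row_ptr = [], [], [0]
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def spmv(csr, x):
    """Sparse matrix-vector product; x is read indirectly through col_idx."""
    values, col_idx, row_ptr = csr
    return [sum(values[k] * x[col_idx[k]]
                for k in range(row_ptr[i], row_ptr[i + 1]))
            for i in range(len(row_ptr) - 1)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(csr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite CSR matrix A."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the initial guess x = 0
    p = list(r)              # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = spmv(csr, p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    A = build_laplacian_csr(8)
    b = spmv(A, [float(i) for i in range(1, 9)])
    print(conjugate_gradient(A, b))
```

In `spmv`, every stored nonzero contributes one multiply and one add but requires loading a matrix value, a column index, and an entry of `x` at an address known only at run time. That ratio of arithmetic to data movement is what makes codes of this kind memory-bound.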

HPL vs HPCG

The figure above shows a significant performance gap of nearly two orders of magnitude between HPL and HPCG results over the last several years. The increase in average HPL performance since June 2016 is due to the introduction of the Chinese Sunway TaihuLight system. The most recent increase in both HPL and HPCG performance is visible from June 2018, after the installation of the Summit supercomputer at ORNL. One might optimistically expect to see the gap closing and then assess the rate of this progress. Unfortunately, we have no evidence that the observed performance gap is closing to any degree. We can therefore conclude that one of the main challenges ahead will be to significantly increase the performance of memory-bound codes on any future computing system designed for this application domain. While Exaflop/s performance with HPL will be reached soon, it is clear that this achievement will leave the significant gap between dense and sparse system performance unchanged.

While the competition among the very fastest systems at the top of the TOP500 has remained the same, it is a different story with the Green500 list. This list ranks the most energy-efficient supercomputers, and it has changed considerably.

HPL vs HPCG Energy Efficiency

The current status of the energy efficiency challenge is shown in the figure above. Existing supercomputing designs appear able to scale up to 200 Petaflop/s while remaining within the recommended 20 MW system power envelope. An optimistic estimate based on this would require a fivefold improvement in energy efficiency and a sevenfold improvement in the HPL performance currently delivered by the Summit supercomputer. However, such an estimate is not realistic, since the systems with the best energy efficiency results are not the ones at the top of the HPL ranking (see the comments above about the top 10 ranked results). Therefore, a more realistic projection based on the current (end of 2019) Summit results is that a tenfold improvement in energy efficiency and ten times higher HPL performance are needed to reach the Exaflop/s barrier. Unfortunately, this would achieve the desired performance and energy efficiency only for the computation of dense systems, as in the HPL benchmark. A similar analysis of performance versus energy efficiency for sparse systems, based on the HPCG results, gives much more pessimistic projections. Here the two orders of magnitude lower performance that current supercomputing architectures deliver for sparse systems strongly impacts the energy efficiency.
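The optimistic factors quoted above appear to follow from simple arithmetic on figures given in this article: 200 Petaflop/s within 20 MW, Summit's HPL result of 148.6 Petaflop/s, and a target of 1 Exaflop/s within the same 20 MW envelope. The sketch below reproduces that calculation; the constant and function names are illustrative, and the assumption that the Exaflop/s target shares the 20 MW envelope is a reading of the text rather than a stated requirement.

```python
# Back-of-the-envelope check of the optimistic Exaflop/s estimate.
PFLOPS = 1e15                        # flop/s in one Petaflop/s
TARGET_FLOPS = 1000 * PFLOPS         # 1 Exaflop/s
POWER_ENVELOPE_W = 20e6              # recommended 20 MW envelope
SCALABLE_FLOPS = 200 * PFLOPS        # reachable within 20 MW by existing designs
SUMMIT_HPL_FLOPS = 148.6 * PFLOPS    # Summit HPL result quoted in this article


def gflops_per_watt(flops: float, watts: float) -> float:
    """Energy efficiency in Gflop/s per watt."""
    return flops / watts / 1e9


def efficiency_factor() -> float:
    """Efficiency improvement needed to fit 1 Exaflop/s into 20 MW."""
    current = gflops_per_watt(SCALABLE_FLOPS, POWER_ENVELOPE_W)
    required = gflops_per_watt(TARGET_FLOPS, POWER_ENVELOPE_W)
    return required / current


def performance_factor() -> float:
    """HPL speed-up needed over Summit to reach 1 Exaflop/s."""
    return TARGET_FLOPS / SUMMIT_HPL_FLOPS


if __name__ == "__main__":
    print(f"energy efficiency: x{efficiency_factor():.1f}")
    print(f"HPL performance:   x{performance_factor():.1f}")
```

Under these assumptions the efficiency factor comes out at five and the performance factor at about 6.7, which rounds to the sevenfold figure used in the optimistic estimate.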

The latest TOP500 results demonstrate that the worldwide efforts to reach Exascale processing speeds on a wide range of applications urgently need novel and innovative architectures that resolve the 3rd Locality Wall challenge. This includes both novel memory systems and interconnection networks offering much higher bandwidth and lower latency. Energy efficiency indicators also need urgent improvement, by at least an order of magnitude. Novel architecture proposals that address floating-point processing challenges can also be expected to have substantial impact, particularly for compute-bound applications.

Vladimir Getov

Vladimir Getov is a professor of distributed and high-performance computing (HPC) at the University of Westminster in London (UK) and Computer’s area editor for HPC. His research interests include parallel architectures and performance, energy-efficient computing, autonomous distributed systems, and HPC programming environments. Getov is the recipient of the IEEE Computer Society Golden Core Award and is a Senior Member of IEEE and ACM. Contact him at v.s.getov@westminster.ac.uk.