
An interview with John Shalf, recipient of the 2025 Seymour Cray Award.
John Shalf is Department Head for Computer Science at Lawrence Berkeley National Laboratory and former Deputy Director for Hardware Technology on the U.S. Exascale Computing Project (ECP), whose leadership has shaped the architectural direction of high-performance computing in the United States.
We connected with Dr. Shalf to discuss the convergence of HPC and hyperscale, lessons from public-private collaboration, and the future of computing architectures in the post-exascale era.
As former Deputy Director for Hardware Technology on the DOE-led Exascale Computing Project (ECP), what were the key architectural and hardware innovations that emerged from the project, and how are they shaping current exascale systems?
In the years leading up to the ECP, there was a notion of a set of swim lanes representing different paths that could get us to our project performance goals. The three original swim lanes were manycore, wide-SIMD/vector, and GPU-accelerated. Eventually the SIMD lane merged with manycore to become multicore with SIMD, which is prevalent in today’s CPU products. GPU acceleration ultimately won in the US program, whereas Japan adopted the SIMD/multicore swim lane for their Fugaku system. Indeed, Japan’s next-generation system, Fugaku-Next, is adopting a hybrid GPU/CPU architecture as well.
But there were also many important technologies initiated under the FastForward, DesignForward, and PathForward programs that have since emerged in the mainstream (early versions of Arm SVE and NVLink, for example) and continue to have impact in ways we could never have predicted when we made the initial investments. There were also some investments that didn’t make it, such as several processor-in-memory and dataflow-like ideas, but we need to up our game in our collaboration with industry for the next generation of systems beyond exascale.
But the bigger picture that has emerged post-exascale is that the architectures adopted by hyperscalers such as AWS, Microsoft, Meta, and Google are becoming indistinguishable from the GPU-accelerated systems we are deploying for HPC. There was some movement toward convergence of HPC and hyperscale leading up to the first exascale system, but I had no idea the convergence to liquid-cooled, GPU-accelerated systems (even the interconnects) would happen so quickly. The Microsoft Azure “Eagle” system is an example of that convergence; it has remained near the top of the TOP500 list of the fastest supercomputers in the world even though its real purpose is more in the AI space.
Your co-authored report, “The Landscape of Parallel Computing Research: A View from Berkeley,” has become a foundational reference in HPC. How do you assess the evolution of parallel computing paradigms since its publication, particularly in light of heterogeneous architectures and domain-specific accelerators?
We are approaching the 20th anniversary of that report, which was released in December of 2006. We are, in fact, in the process of writing a retrospective with David Patterson and many of the original authors to reassess our predictions. The original report took several years of weekly meetings, including many external speakers, to assemble all of the information that was packed into it.
One of our coauthors was fond of quoting Mike Tomlin, who said, “you should not be in the business of making predictions if you are afraid of being wrong.” Indeed, there were a lot of bold and provocative tidbits of Old Conventional Wisdom versus New Conventional Wisdom that have really stood the test of time. If there is anything we underestimated, it was the magnitude of the explosion of parallelism at every level of the system stack: instruction level, chip level, board, rack, pod, and datacenter level, with tens of thousands of many-threaded GPUs used to train large language models. The amazing growth of applications like AI and LLMs was far beyond anything we envisioned twenty years ago.
Phil Colella’s “seven dwarfs” of scientific computing algorithms – which eventually generalized to the “13 motifs” as we broadened the survey of algorithm patterns – has been extremely influential on my research. The observation was that there are a finite number of design patterns or abstractions for parallelism that we build algorithms around. Consequently, there are also a finite number of patterns for the hardware accelerators that could target those algorithm design patterns. Many books about parallel design patterns were written after that. But the bottom line is that it took a potentially infinite design space and brought it down to a finite design space that was more tractable. And I still rely on that notion of design space contraction and parameterization today.
As the 2024–2027 IEEE Electronics Packaging Society Distinguished Lecturer, you’re engaging with the intersection of HPC and electronics packaging. What are the most promising developments in packaging technologies that could impact future supercomputing performance and scalability?
I was asked to guest lecture at a few advanced packaging meetings when the ECP was starting up, and it was very eye-opening. I wondered initially why the microelectronics industry would dedicate entire conferences to building boxes for their products, but quickly learned the role such technologies are playing, and will play, in the future of computing. This was right when advanced packaging was rapidly moving from a sideshow to the main event in the future of electronics. Certainly, advanced packaging technologies such as chiplets and 3D stacking were instrumental in delivering all of the US-based exascale supercomputers, along with the technologies for the exploding AI and hyperscale industries. But the really big opportunities are yet to come, where we leverage the disaggregation and modularity afforded by chiplets to create a multi-vendor environment for rapid and cost-effective specialization of systems as one of the pillars of future energy-efficient systems. In this way, the new “motherboard” is the package. This is all being led by the hyperscalers and organizations such as the Open Compute Project (OCP) that create an open forum for those negotiations.
With advanced packaging, we have the opportunity to play for new value propositions in the service of science and the future of HPC. For the past few decades, our primary play for HPC value has been scale, but at this point scale just means higher capital and operations costs. Advanced packaging and targeted specialization enable us to deliver meaningful performance growth for science in the coming decades. And this is only possible if we work together with our colleagues in the hyperscale space to share and benefit from the really powerful supply chain they are creating for their datacenters, so that we can focus on the specializations – algorithms, software, and hardware – that really matter for science.
John Shalf will receive his award at SC 2025.