Pages: p. 321
Richard Oehler, AMD
The problem of finding parallelism in application code and exploiting it through automatic tools has long been recognized as the Holy Grail of high-performance computing (HPC). In the late 1970s, many academic institutions and government laboratories spent considerable time and energy investigating the issues and challenges in this area. Although they achieved some success with parallelizing Fortran compilers and compiler front ends, and somewhat more success in parallelizing high-performance math libraries, the general problem remains unsolved. There are no general-purpose tools that can parallelize run-of-the-mill commercial code, and writing (or rewriting) commercial code to exploit parallelism is very difficult and error-prone.
By the early 1990s, it had become clear that the introduction of chip-level multiprocessing (CMP) would put renewed emphasis on multithreading and multiprocessing capabilities. By the early 2000s, it was also clear that CPU single-thread performance scaling was at an end, owing to ever-increasing power, heat, and complexity problems. Processor chip vendors could no longer compete on frequency alone (that is, on the fastest single thread) to achieve performance leadership. They would need to rely on total parallel processing capability, including multithreaded performance, to win the performance race.
This new route to application performance leadership has moved the underlying, still-unsolved parallelization problem to the top of the list of problems that industry must solve if commercial application performance is to keep scaling. Many researchers in academia and industry have begun working on it. Interest in solving the problem is far broader than before, and its importance is more widely recognized than ever.
Solving this problem requires coordinated activity by academia and industry. Several organizations have such charters, and at least one has embraced concurrency as a long-term research agenda. In 2006, the Gigascale Systems Research Center (GSRC) refocused its agenda to include four themes and a driver. The theme most interesting to me was Design Technologies for Concurrent Systems, led by Wen-mei Hwu. "The Concurrency Challenge" article in this issue of IEEE Design & Test describes the multithreading problem, discusses how the solution space is being analyzed, and predicts where breakthroughs are likely to occur.
From my perspective, success in this endeavor requires major industry participation. That participation must come not only from processor chip and hardware system developers but also from operating system and tool providers and, most important, from the application writers and developers who will engage with, and in many ways drive, the search for acceptable solutions.
At AMD, we must be prepared to try new low-overhead approaches to the major hardware issues: synchronization and control of threads, concurrency and locking protocols, and debugging and performance-measurement assists. We need to provide the vehicles on which industry and academia can try new software approaches to concurrency. Finally, we must be prepared to share our experiences so that industry and academia can reach a common understanding of the hardware and software components needed to solve the multithreading and concurrency problems.
Richard Oehler is Server CTO and Corporate Fellow at AMD. Contact him at email@example.com.