Smith is being recognized for developing a novel dynamic re-routing algorithm for fat-tree interconnects, an algorithm that has yielded significant performance improvements for multi-job HPC workloads. Fernando is being recognized for developing computationally optimal, scalable, high-performance parallel algorithms for modern heterogeneous architectures, with applications in relativity, the geosciences, and computational fluid dynamics (CFD). The Fellowships are jointly presented by ACM and the IEEE Computer Society.
Fernando’s research focuses primarily on computational algorithms for the numerical solution of large-scale partial differential equations. New discoveries in science and engineering are increasingly driven by computer simulations rather than physical experiments, and in some fields, such as gravitational-wave (GW) astronomy, physical experiments are impossible. In the modern computational era, computing resources have grown exponentially, but they have also become increasingly complex, with ever-greater heterogeneity and fine-grained parallelism, which makes them increasingly difficult for domain scientists to use.
Fernando’s work aims to develop algorithms and computational codes that enable domain scientists to use modern supercomputers effectively. The key objectives are ease of use for domain scientists (through symbolic interfaces and automatic code generation), portability (the ability to run across different architectures), performance (efficient use of computing resources), and scalability (the ability to solve larger problems on next-generation machines).
In Fernando’s current research, the main driving applications have been computational relativity and GW astronomy, but his contributions are fundamental and have also had a significant impact on other areas, such as computational fluid dynamics (CFD).
Smith’s work on network contention has demonstrated the extent of the problem on modern HPC systems and given new insight into the underlying network conditions that lead to performance degradation. She has also developed a technique to dynamically re-route traffic on fat-tree-based interconnects when significant contention is occurring or is about to occur. In this work, Smith and her collaborators have studied two different HPC network architectures.
First, they performed controlled experiments to measure and analyze the effects of network contention on both production and synthetic applications on an InfiniBand (IB)-based fat-tree installation. They then designed a dynamic re-routing approach, called Adaptive Flow-Aware Routing (AFAR), to reduce network contention when it occurs. AFAR achieves up to a 46% improvement in job runtime compared with default routing, with median improvements of 13–25% for jobs that are sensitive to interference.
Second, they performed an empirical study of the effects of network contention on a scientific application running in production mode on a Cray XC30 dragonfly installation. They measured the performance variability of MILC, a quantum chromodynamics code, when it ran alongside other users’ jobs, and showed that the variability stemmed from communication performance rather than computational performance. Using machine learning, they demonstrated a strong correlation between network congestion and application performance. Smith is currently developing scheduling techniques that assign dedicated network resources to jobs in order to eliminate network contention between jobs.
The ACM/IEEE-CS George Michael Memorial HPC Fellowship is endowed in memory of George Michael, one of the founding fathers of the SC Conference series. The fellowship honors exceptional PhD students throughout the world whose research focuses on high-performance computing applications, networking, storage, or large-scale data analytics using the most powerful computers currently available. The fellowship includes a $5,000 honorarium and travel expenses to attend SC19 in Denver, Colorado, November 17–22, 2019, where the fellowships will be formally presented.
About IEEE Computer Society
The IEEE Computer Society is the world’s home for computer science, engineering, and technology. A global leader in providing access to computer science research, analysis, and information, the IEEE Computer Society offers a comprehensive array of unmatched products, services, and opportunities for individuals at all stages of their professional careers. Known as the premier organization that empowers the people who drive technology, it offers unparalleled resources, including membership, international conferences, peer-reviewed publications, a unique digital library, standards, and training programs. Visit www.computer.org for more information.