2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (2012)
Shanghai, China
May 21, 2012 to May 25, 2012
ISBN: 978-1-4673-0974-5
pp: 487-496
Scientific libraries are written generically to anticipate a variety of use cases, and this generality reduces optimization opportunities. Significant performance gains can be achieved by specializing library code to its execution context: the application in which it is invoked, the input data set used, the architectural platform and its backend compiler. Such specialization is not typically done because it is time-consuming, leads to nonportable code and requires performance-tuning expertise that application scientists may not have. Tool support for library specialization in this context could reduce the expertise required while significantly improving performance, code reuse and portability. In this work, we study the performance gains achieved by specializing the single-processor sparse linear algebra functions in PETSc (Portable, Extensible Toolkit for Scientific Computation) in the context of three scalable scientific applications on the Hopper Cray XE6 supercomputer at NERSC. We use CHiLL (Composable High-Level Loop Transformation Framework) to apply source-level transformations tailored to the special needs of sparse computations and automatically generate highly optimized PETSc functions. We demonstrate performance improvements of more than 1.8X on the library functions and overall gains of 9 to 24% on three scalable applications that use PETSc's sparse matrix capabilities.
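To make the idea of specializing a sparse kernel to its input data concrete, here is a minimal sketch in Python. It is not PETSc code and not output from CHiLL; the function names and the assumption that every row stores exactly three nonzeros are hypothetical, chosen only to illustrate how knowledge of the input can remove indirection and permit unrolling.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """General CSR sparse matrix-vector product y = A x."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Row length is only known at run time, via row_ptr.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


def spmv_csr_fixed3(col_idx, vals, x, n_rows):
    """Specialized variant (hypothetical): assumes exactly 3 stored
    entries per row, so row_ptr is unnecessary and the inner loop
    is fully unrolled."""
    y = [0.0] * n_rows
    for i in range(n_rows):
        k = 3 * i
        y[i] = (vals[k] * x[col_idx[k]]
                + vals[k + 1] * x[col_idx[k + 1]]
                + vals[k + 2] * x[col_idx[k + 2]])
    return y


# A 4x4 periodic tridiagonal matrix with 3 stored entries per row.
row_ptr = [0, 3, 6, 9, 12]
col_idx = [3, 0, 1, 0, 1, 2, 1, 2, 3, 2, 3, 0]
vals = [-1.0, 2.0, -1.0] * 4
x = [1.0, 2.0, 3.0, 4.0]

y_general = spmv_csr(row_ptr, col_idx, vals, x)
y_special = spmv_csr_fixed3(col_idx, vals, x, 4)
```

Both functions compute the same product; the specialized one is valid only when the three-entries-per-row assumption holds, which is the trade-off between generality and performance that the abstract describes.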
Sparse matrices, Libraries, Optimization, Context, Linear algebra, Computer architecture, Performance gain, Code Specialization, PETSc, Compilers for High Performance Computing, Scientific Computing, Code Optimization, Loop transformations

S. Ramalingam, M. Hall and C. Chen, "Improving High-Performance Sparse Libraries Using Compiler-Assisted Specialization: A PETSc Case Study," 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), Shanghai, China, 2012, pp. 487-496.