Parallel and Distributed Processing Symposium, International (2009)
May 23, 2009 to May 29, 2009
Ananta Tiwari, University of Maryland, Department of Computer Science, College Park, 20740 USA
Chun Chen, University of Utah, School of Computing, Salt Lake City, 84112 USA
Jacqueline Chame, University of Southern California, Information Sciences Institute, Marina del Rey, 90292 USA
Mary Hall, University of Utah, School of Computing, Salt Lake City, 84112 USA
Jeffrey K. Hollingsworth, University of Maryland, Department of Computer Science, College Park, 20740 USA
We describe a scalable and general-purpose framework for auto-tuning compiler-generated code. We combine Active Harmony's parallel search backend with the CHiLL compiler transformation framework to generate, in parallel, a set of alternative implementations of computation kernels and automatically select the best-performing one. For the tested kernels, the resulting compiler-generated code achieves performance comparable to that of the fully automated version of the ATLAS library. For various kernels, the tuned code runs 1.4 to 3.6 times faster than code produced by the native Intel compiler without search. Our search algorithm simultaneously evaluates different combinations of compiler optimizations and converges to solutions in only a few tens of search steps.
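The idea of evaluating several candidate configurations at once and keeping the fastest can be illustrated with a minimal sketch. This is not the Active Harmony search algorithm or the CHiLL interface: the parameter names (tile size, unroll factor), the `measure_runtime` cost function, and the exhaustive enumeration are all assumptions made for illustration. A real tuner would generate, compile, and time each code variant, and would use a guided search rather than trying every combination.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical search space: two transformation parameters for one kernel.
TILE_SIZES = [8, 16, 32, 64, 128]
UNROLL_FACTORS = [1, 2, 4, 8]


def measure_runtime(config):
    """Stand-in for "generate a variant, compile it, and time it".

    Assumption: a synthetic cost with its minimum at tile=32, unroll=4,
    used only so the sketch runs without a compiler.
    """
    tile, unroll = config
    return abs(tile - 32) / 32 + abs(unroll - 4) / 4 + 1.0


def tune(max_workers=4):
    """Evaluate all candidate configurations in parallel and return the best."""
    candidates = list(product(TILE_SIZES, UNROLL_FACTORS))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so runtimes[i] belongs to candidates[i].
        runtimes = list(pool.map(measure_runtime, candidates))
    best_runtime, best_config = min(zip(runtimes, candidates))
    return best_config, best_runtime


best_config, best_runtime = tune()
print(best_config, best_runtime)
```

In this sketch each worker thread plays the role of one node evaluating one code variant; replacing the exhaustive candidate list with points proposed by a search strategy would limit the number of variants that need to be built and timed.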
A. Tiwari, M. Hall, J. Chame, J. K. Hollingsworth and C. Chen, "A scalable auto-tuning framework for compiler optimization," 2009 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), Rome, 2009, pp. 1-12.