Parallel and Distributed Processing Symposium, International (2006)
Rhodes Island, Greece
Apr. 25, 2006 to Apr. 29, 2006
J. Chame, Inf. Sci. Inst., Southern California Univ., Marina del Rey, CA, USA
Chun Chen, Inf. Sci. Inst., Southern California Univ., Marina del Rey, CA, USA
P. Diniz, Inf. Sci. Inst., Southern California Univ., Marina del Rey, CA, USA
M. Hall, Inf. Sci. Inst., Southern California Univ., Marina del Rey, CA, USA
Yoon-Ju Lee, Inf. Sci. Inst., Southern California Univ., Marina del Rey, CA, USA
R.F. Lucas, Inf. Sci. Inst., Southern California Univ., Marina del Rey, CA, USA
In this paper, we describe a compilation system that automates much of the performance tuning that is currently done manually by application programmers interested in high performance. Our approach combines compiler models and heuristics with guided empirical search to take advantage of their complementary strengths. The models and heuristics limit the search to a small number of candidate implementations, and the empirical results give the compiler the most accurate information for selecting among candidates and tuning optimization parameter values. The overall approach can be employed to alleviate some of the performance problems that lead to inefficiencies in key applications today: register pressure, cache conflict misses, and the trade-off between synchronization, parallelism and locality in SMPs. The main focus of the paper is an algorithm for simultaneously optimizing across multiple levels of the memory hierarchy for dense-matrix computations. We have developed an initial compiler implementation, and present automatically generated results on matrix multiply. Results on two architectures, SGI R10000 and Sun UltraSparc IIe, outperform the native compiler, and either outperform or achieve performance comparable to the ATLAS self-tuning library and the hand-tuned vendor BLAS library. This paper also describes other components of the ECO system, including supporting tools and experiments with programmer-guided performance tuning. This approach has provided a foundation for a general framework for systematic optimization of domain-specific applications. Specifically, we are developing an optimization system for signal and image processing that exploits signal properties, and we are using machine learning and a knowledge-rich representation to optimize molecular dynamics simulation.
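The combination of a model that prunes the search space with empirical timing that picks among the survivors can be illustrated with a minimal sketch. This is not the ECO implementation: the function names (`candidate_tiles`, `tiled_matmul`, `empirical_search`), the single-level cache model (three square tiles must fit in cache), and the power-of-two candidate set are all assumptions chosen only to show the general idea on matrix multiply.

```python
import time


def candidate_tiles(n, cache_bytes, elem_bytes=8):
    """Model step (assumed heuristic): keep only power-of-two tile sizes t
    for which three t x t tiles fit in cache, i.e. 3 * t*t * elem_bytes <= cache_bytes."""
    limit = int((cache_bytes / (3 * elem_bytes)) ** 0.5)
    tiles = []
    t = 1
    while t <= min(limit, n):
        tiles.append(t)
        t *= 2
    return tiles or [1]  # always leave at least one candidate to measure


def tiled_matmul(a, b, n, tile):
    """Multiply two n x n matrices (lists of lists) using square tiles."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + tile, n)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += aik * row_b[j]
    return c


def empirical_search(n, cache_bytes):
    """Empirical step: time every candidate the model kept and return the fastest."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for tile in candidate_tiles(n, cache_bytes):
        start = time.perf_counter()
        tiled_matmul(a, b, n, tile)
        timings[tile] = time.perf_counter() - start
    best = min(timings, key=timings.get)
    return best, timings
```

Calling `empirical_search(64, 32 * 1024)` would time only the handful of tile sizes the model allows rather than every size up to 64. The measured winner depends on the machine, and in interpreted Python the timings mostly reflect interpreter overhead rather than cache behavior, so the sketch shows the structure of the search, not representative performance.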
empirical search, ECO project, compilation system, performance tuning, heuristics, optimization parameter, SMP, dense-matrix computations, compiler implementation, matrix multiplication, systematic optimization, domain-specific applications, signal processing, image processing
P. Diniz, Yoon-Ju Lee, Chun Chen, R. Lucas, M. Hall and J. Chame, "An overview of the ECO project," Parallel and Distributed Processing Symposium, International (IPDPS), Rhodes Island, Greece, 2006, pp. 314.