Performance Modeling and Optimal Block Size Selection for a BLAS-3 Based Tridiagonalization Algorithm
High Performance Computing and Grid in Asia Pacific Region, International Conference on (2005)
Nov. 30, 2005 to Dec. 3, 2005
Yusaku Yamamoto, Nagoya University, Japan
We construct a performance model for Bischof & Wu's tridiagonalization algorithm, which is fully based on the level-3 BLAS. The model has a hierarchical structure that reflects the hierarchical structure of the original algorithm. Given the matrix size, the two block sizes, and the performance data of the underlying BLAS routines, it predicts the execution time of the algorithm. Experiments on the Opteron and Alpha 21264A processors show that the model is quite accurate: it predicts the performance of the algorithm for matrix sizes from 1920 to 7680 and for various block sizes with relative errors below 10%. The model will serve as a key component of an automatically tuned library that selects the optimal block sizes itself. It can also be used in a Grid environment to help users find which of the available machines will solve their problem in the shortest time.
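The block size selection described above can be illustrated with a minimal sketch. This is not the paper's model: the loop structure in `predict_time`, the synthetic timing function `toy_gemm_time`, and all names and constants below are hypothetical stand-ins chosen only to show the idea of summing predicted BLAS call times at two levels and then searching over the two block sizes. In a real setting, the timing function would be replaced by measured performance data for the BLAS routines on the target machine.

```python
from itertools import product


def toy_gemm_time(m, n, k, peak_flops=4.0e9, half_size=64.0):
    """Hypothetical stand-in for measured BLAS data.

    Returns a made-up time (seconds) for an m x n x k matrix multiply.
    Efficiency grows with the smallest dimension, imitating the penalty
    that small blocks usually pay in level-3 BLAS.
    """
    smallest = min(m, n, k)
    efficiency = smallest / (smallest + half_size)
    return 2.0 * m * n * k / (peak_flops * efficiency)


def predict_time(n, outer_block, inner_block, gemm_time=toy_gemm_time):
    """Toy two-level model (an assumed structure, not the paper's).

    The outer loop walks over panels of width outer_block; each panel is
    handled in chunks of width inner_block, followed by one large update
    of the remaining part of the matrix.
    """
    total = 0.0
    for start in range(0, n, outer_block):
        remaining = n - start
        # Inner level: small multiplies inside the current panel.
        for _ in range(0, outer_block, inner_block):
            total += gemm_time(remaining, inner_block, inner_block)
        # Outer level: one large multiply updating the trailing part.
        total += gemm_time(remaining, remaining, outer_block)
    return total


def best_block_sizes(n, outer_candidates, inner_candidates,
                     gemm_time=toy_gemm_time):
    """Exhaustive search: return (time, outer, inner) with the least predicted time."""
    best = None
    for outer, inner in product(outer_candidates, inner_candidates):
        if outer % inner != 0:
            continue  # this toy model assumes the inner block divides the outer one
        t = predict_time(n, outer, inner, gemm_time)
        if best is None or t < best[0]:
            best = (t, outer, inner)
    return best


if __name__ == "__main__":
    for size in (1920, 3840, 7680):
        t, outer, inner = best_block_sizes(size, [32, 64, 128], [8, 16, 32])
        print(f"n={size}: outer={outer}, inner={inner}, predicted {t:.2f} s")
```

The same search could be repeated with one timing function per machine and the machine with the smallest predicted time chosen, which is the Grid use case mentioned in the abstract. Because the model is evaluated rather than the algorithm being run, the search costs far less than timing every block size combination directly.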
Y. Yamamoto, "Performance Modeling and Optimal Block Size Selection for a BLAS-3 Based Tridiagonalization Algorithm," High Performance Computing and Grid in Asia Pacific Region, International Conference on (HPCASIA), Beijing, China, 2005, pp. 249-256.