<p><b>Abstract</b>—Parallel I/O systems typically consist of individual processors, communication networks, and a large number of disks. Managing and utilizing these resources to meet performance, portability, and usability goals of high-performance scientific applications has become a significant challenge. For scientists, the problem is exacerbated by the need to retune the I/O portion of their code for each supercomputer platform where they obtain access. We believe that a parallel I/O system that automatically selects efficient I/O plans for user applications is a solution to this problem. In this paper, we present such an approach for scientific applications performing collective I/O requests on multidimensional arrays. Under our approach, an optimization engine in a parallel I/O system selects high-quality I/O plans without human intervention, based on a description of the application I/O requests and the system configuration. To validate our hypothesis, we have built an optimizer that uses rule-based and randomized search-based algorithms to tune parameter settings in Panda, a parallel I/O library for multidimensional arrays. Our performance results obtained from an IBM SP using an out-of-core matrix multiplication application show that the Panda optimizer is able to select high-quality I/O plans and deliver high performance under a variety of system configurations with a small total optimization overhead.</p>
Keywords: Parallel I/O, performance modeling, automatic performance optimization, simulated annealing.

M. Winslett and Y. Chen, "Automated Tuning of Parallel I/O Systems: An Approach to Portable I/O Performance for Scientific Applications," in IEEE Transactions on Software Engineering, vol. 26, pp. 362-383, 2000.