Automated Tuning of Parallel I/O Systems: An Approach to Portable I/O Performance for Scientific Applications
April 2000 (vol. 26, no. 4)
pp. 362-383

Abstract—Parallel I/O systems typically consist of individual processors, communication networks, and a large number of disks. Managing and utilizing these resources to meet performance, portability, and usability goals of high-performance scientific applications has become a significant challenge. For scientists, the problem is exacerbated by the need to retune the I/O portion of their code for each supercomputer platform where they obtain access. We believe that a parallel I/O system that automatically selects efficient I/O plans for user applications is a solution to this problem. In this paper, we present such an approach for scientific applications performing collective I/O requests on multidimensional arrays. Under our approach, an optimization engine in a parallel I/O system selects high-quality I/O plans without human intervention, based on a description of the application I/O requests and the system configuration. To validate our hypothesis, we have built an optimizer that uses rule-based and randomized search-based algorithms to tune parameter settings in Panda, a parallel I/O library for multidimensional arrays. Our performance results obtained from an IBM SP using an out-of-core matrix multiplication application show that the Panda optimizer is able to select high-quality I/O plans and deliver high performance under a variety of system configurations with a small total optimization overhead.
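The abstract describes an optimizer that tunes I/O plan parameters with rule-based and randomized search-based algorithms (simulated annealing appears among the index terms, and a genetic-algorithm variant is cited in [13]). The following is a minimal sketch of what such a randomized search over collective-I/O plan parameters might look like; the parameter names (io_nodes, stripe_kb, disk_layout) and the stand-in cost model are illustrative assumptions, not Panda's actual interface or performance model.

# Minimal sketch of a simulated-annealing search over I/O plan parameters.
# The parameter space and cost model below are illustrative assumptions,
# not Panda's actual interface or performance model.
import math
import random

# Hypothetical tunable parameters of a collective-I/O plan.
PARAM_SPACE = {
    "io_nodes":    [1, 2, 4, 8, 16],           # number of dedicated I/O nodes
    "stripe_kb":   [64, 128, 256, 512, 1024],  # stripe unit size on disk (KB)
    "disk_layout": ["row_block", "col_block", "block_cyclic"],
}

def estimated_cost(plan):
    """Stand-in cost model: returns an estimated I/O time (seconds).
    A real optimizer would consult a performance model built from the
    application's I/O request description and the system configuration."""
    base = 100.0 / plan["io_nodes"]
    stripe_penalty = abs(math.log2(plan["stripe_kb"] / 256.0))
    layout_penalty = {"row_block": 0.0, "col_block": 2.0, "block_cyclic": 1.0}
    return base + 5.0 * stripe_penalty + layout_penalty[plan["disk_layout"]]

def random_plan():
    return {k: random.choice(v) for k, v in PARAM_SPACE.items()}

def neighbor(plan):
    """Perturb one randomly chosen parameter of the current plan."""
    new = dict(plan)
    key = random.choice(list(PARAM_SPACE))
    new[key] = random.choice(PARAM_SPACE[key])
    return new

def anneal(steps=500, t0=10.0, alpha=0.99):
    current = random_plan()
    best = current
    temp = t0
    for _ in range(steps):
        candidate = neighbor(current)
        delta = estimated_cost(candidate) - estimated_cost(current)
        # Always accept improvements; accept regressions with a probability
        # that shrinks as the temperature cools.
        if delta < 0 or random.random() < math.exp(-delta / temp):
            current = candidate
            if estimated_cost(current) < estimated_cost(best):
                best = current
        temp *= alpha
    return best, estimated_cost(best)

if __name__ == "__main__":
    plan, cost = anneal()
    print("selected I/O plan:", plan, "estimated time:", round(cost, 2), "s")

In the paper's setting, the cost function would presumably be supplied by a performance model built from the application's I/O request description and the system configuration, and a genetic algorithm, as in [13], would replace this single-candidate loop with a population-based search over the same space.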

[1] A. Acharya, M. Uysal, R. Bennett, A. Mendelson, M. Beynon, J.K. Hollingsworth, J. Saltz, and A. Sussman, “Tuning the Performance of I/O Intensive Parallel Applications,” Proc. Fourth Ann. Workshop I/O in Parallel and Distributed Systems (IOPADS), May 1996.
[2] “Adaptive Simulated Annealing (ASA),” ftp.alumni.caltech.edu:/pub/ingber/ASA-shar, 1993.
[3] F.E. Bassow, IBM AIX Parallel I/O File System: Installation, Administration, and Use, IBM, Kingston, N.Y., Document Number SH34-6065-00, May 1995.
[4] K.P. Bennett, M.C. Ferris, and Y.E. Ioannidis, “A Genetic Algorithm for Database Query Optimization,” Technical Report CS-TR-91-1004, Univ. of Wisconsin at Madison, 1991.
[5] R. Bennett, K. Bryant, A. Sussman, R. Das, and J. Saltz, “Jovian: A Framework for Optimizing Parallel I/O,” Proc. Scalable Parallel Libraries Conf., 1994.
[6] P. Bodorik, J. Pyra, and J.S. Riordon, “Correcting Execution of Distributed Queries,” Proc. Second IEEE Int'l Symp. Databases in Parallel and Distributed Systems, pp. 192-201, 1990.
[7] R. Bordawekar, A. Choudhary, and J. Ramanujam, “Compilation and Communication Strategies for Out-of-Core Programs on Distributed-Memory Machines,” J. Parallel and Distributed Computing, vol. 38, no. 2, pp. 277-288, Nov. 1996.
[8] E. Borowsky, R. Golding, A. Merchant, E. Shriver, M. Spasojevic, and J. Wilkes, “Eliminating Storage Headaches through Self-Management,” Proc. Second Symp. Operating Systems Design and Implementation, 1996.
[9] P. Brezany, M. Gerndt, P. Mehrotra, and H. Zima, “Concurrent File Operations in High Performance FORTRAN,” Proc. Supercomputing '92, pp. 230-237, 1992.
[10] Y. Chen, “Automatic Parallel I/O Performance Optimization in Panda,” PhD thesis, Dept. of Computer Science, Univ. of Illinois, Feb. 1998.
[11] Y. Chen, I. Foster, J. Nieplocha, and M. Winslett, “Optimizing Collective I/O Performance on Parallel Computers: A Multisystem Study,” Proc. 11th ACM Int'l Conf. Supercomputing, pp. 28-35, July 1997.
[12] Y. Chen, M. Winslett, Y. Cho, and S. Kuo, “Automatic Parallel I/O Performance Optimization in Panda,” Proc. 10th Ann. ACM Symp. Parallel Algorithms and Architectures, June 1998.
[13] Y. Chen, M. Winslett, Y. Cho, and S. Kuo, “Automatic Parallel I/O Performance Optimization Using Genetic Algorithms,” Proc. Seventh Int'l Symp. High Performance Distributed Computing, July 1998.
[15] Y. Chen, M. Winslett, S. Kuo, Y. Cho, M. Subramaniam, and K.E. Seamons, “Performance Modeling for the Panda Array I/O Library,” Proc. Supercomputing '96, Nov. 1996.
[16] Y. Chen, M. Winslett, K.E. Seamons, S. Kuo, Y. Cho, and M. Subramaniam, “Scalable Message Passing in Panda,” Proc. Fourth Workshop Input/Output in Parallel and Distributed Systems, pp. 109-121, May 1996.
[17] Y. Cho, M. Winslett, J. Lee, Y. Chen, S. Kuo, and K. Motukuri, “Collective I/O on a SGI CRAY Origin 2000: Strategy and Performance,” Proc. 1998 Int'l Conf. Parallel and Distributed Processing Techniques and Applications, July 1998.
[18] Y. Cho, M. Winslett, M. Subramaniam, Y. Chen, S. Kuo, and K.E. Seamons, “Exploiting Local Data in Parallel Array I/O on a Practical Network of Workstations,” Proc. Fifth Workshop Input/Output in Parallel and Distributed Systems, pp. 1-13, Nov. 1997.
[19] P.F. Corbett and D.G. Feitelson, “The Vesta Parallel File System,” ACM Trans. Computer Systems, vol. 14, no. 3, pp. 225-264, Aug. 1996.
[20] R. Epstein and M. Stonebraker, “Analysis of Distributed Database Processing Strategies,” Proc. Sixth Int'l Conf. Very Large Databases, pp. 92-101, 1980.
[21] D.G. Feitelson, P.F. Corbett, and J. Prost, “Performance of the Vesta Parallel File System,” Proc. Ninth Int'l Parallel Processing Symp., pp. 150-158, Apr. 1995.
[22] High Performance Fortran Forum, “High Performance Fortran Language Specification version 1.0,” Technical Report CRPC-TR92225, Rice Univ., Jan. 1993.
[23] D.E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, Mass.: Addison-Wesley, 1989.
[24] R. Golding, P. Bosch, C. Staelin, T. Sullivan, and J. Wilkes, “Idleness Is Not Sloth,” Proc. USENIX 1995 Technical Conf. UNIX and Advanced Computing Systems, pp. 201-212, Jan. 1995.
[25] R. Golding, E. Shriver, T. Sullivan, and J. Wilkes, “Attribute-Managed Storage,” Proc. Workshop Modeling and Specification of I/O, Oct. 1995.
[26] G. Graefe, “Query Evaluation Techniques for Large Databases,” ACM Computing Surveys, vol. 25, no. 2, pp. 73-170, June 1993.
[27] J.H. Holland, Adaptation in Natural and Artificial Systems. Univ. of Michigan Press, 1975.
[28] J. Huber, C.L. Elford, D.A. Reed, A.A. Chien, and D.S. Blumenthal, “PPFS: A High Performance Portable Parallel File System,” Proc. Ninth ACM Int'l Conf. Supercomputing, pp. 385–394, July 1995.
[29] L. Ingber, “Simulated Annealing: Practice versus Theory,” Math. Computer Modelling, vol. 18, no. 11, pp. 29-57, 1993.
[30] Intel Corp., Paragon User's Guide, 1993.
[31] Y.E. Ioannidis and E. Wong, “Query Optimization by Simulated Annealing,” Proc. ACM-SIGMOD Conf., pp. 9-22, 1987.
[32] S. Kirkpatrick, C.D. Gelatt Jr., and M.P. Vecchi, “Optimization by Simulated Annealing,” Science, vol. 220, no. 4598, pp. 671-680, 1983.
[33] D. Kotz, “Disk-Directed I/O for MIMD Multiprocessors,” ACM Trans. Computer Systems, vol. 15, no. 1, pp. 41-74, Feb. 1997.
[34] D. Kotz and N. Nieuwejaar, “Dynamic File-Access Characteristics of a Production Parallel Scientific Workload,” Proc. Supercomputing '94, pp. 640–649, Nov. 1994.
[35] D. Kotz and N. Nieuwejaar, “Flexibility and Performance of Parallel File Systems,” ACM Operating Systems Review, vol. 30, no. 2, pp. 63-73, Apr. 1996.
[36] S. Kuo, M. Winslett, Y. Chen, Y. Cho, M. Subramaniam, and K. Seamons, “Application Experience with Panda,” Proc. Eighth SIAM Conf. Parallel Processing for Scientific Computing, Mar. 1997.
[37] T.M. Madhyastha, C.L. Elford, and D.A. Reed, “Optimizing Input/Output Using Adaptive File System Policies,” Proc. Fifth NASA Goddard Conf. Mass Storage Systems, pp. II:493-514, Sept. 1996.
[38] T. Madhyastha and D.A. Reed, “Intelligent, Adaptive File System Policy Selection,” Proc. Frontiers '96, 1996.
[39] J. Masso, E. Seidel, and P. Walker, “Adaptive Mesh Refinement in Numerical Relativity,” Proc. Seventh Marcel Grossmann Meeting on General Relativity, 1994.
[40] Message Passing Interface Forum, MPI: A Message-Passing Interface Standard, May 1994.
[41] Message Passing Interface Forum, MPI-2: Extensions to the Message-Passing Interface, 1997.
[42] J.A. Moore, P.J. Hatcher, and M.J. Quinn, “Efficient Data-Parallel Files via Automatic Mode Detection,” Proc. Fourth Workshop Input/Output in Parallel and Distributed Systems, pp. 1-14, May 1996.
[43] J.A. Moore and M.J. Quinn, “Enhancing Disk-Directed I/O for Fine-Grained Redistribution of File Data,” Parallel Computing, vol. 23, no. 4, 1997.
[44] The MPI-IO Committee, “MPI-IO: A Parallel File I/O Interface for MPI, Version 0.5,” Apr. 1996, http://lovelace.nas.nasa.gov/MPI-IO/mpi-io-report.0.5.ps.
[45] J. Nieplocha and I. Foster, “Disk Resident Arrays: An Array-Oriented I/O Library for Out-of-Core Computations,” Proc. Frontiers '96: Symp. Frontiers of Massively Parallel Computing, Sept. 1996.
[46] N. Nieuwejaar and D. Kotz, “The Galley Parallel File System,” Parallel Computing, vol. 23, no. 4, pp. 447-476, 1997.
[47] R.H. Patterson, G. Gibson, E. Ginting, D. Stodolsky, and J. Zelenka, “Informed Prefetching and Caching,” Proc. 15th ACM Symp. Operating Systems Principles, pp. 79-95, Dec. 1995.
[48] P. Pierce, “A Concurrent File System for a Highly Parallel Mass Storage System,” Proc. Fourth Conf. Hypercube Concurrent Computers and Applications, pp. 155-160, Mar. 1989.
[49] J.T. Poole, “Preliminary Survey of I/O Intensive Applications,” Technical Report CCSF-38, Scalable I/O Initiative, Caltech Concurrent Supercomputing Facilities, Caltech, 1994.
[50] A. Purakayastha, C.S. Ellis, and D. Kotz, “ENWRICH: A Compute-Processor Write Caching Scheme for Parallel File Systems,” Proc. Fourth Workshop Input/Output in Parallel and Distributed Systems, pp. 55-68, May 1996.
[51] R.H. Saavedra-Barrera, A.J. Smith, and E. Miya, “Machine Characterization Based on an Abstract High-Level Language Machine,” IEEE Trans. Computers, vol. 38, no. 12, pp. 1659-1679, Dec. 1989.
[52] K.E. Seamons, Y. Chen, P. Jones, J. Jozwiak, and M. Winslett, “Server-Directed Collective I/O in Panda,” Proc. Supercomputing '95, Dec. 1995.
[53] E. Shriver, “Performance Modeling for Realistic Storage Devices,” PhD thesis, New York Univ., 1997.
[54] A. Swami and A. Gupta, “Optimization of Large Join Queries,” Proc. ACM-SIGMOD Conf., pp. 8-17, 1988.
[55] R. Thakur et al., “Passion: Optimized I/O for Parallel Applications,” Computer, pp. 70-78, June 1996.
[56] R. van de Geijn and J. Watts, “SUMMA: Scalable Universal Matrix Multiplication Algorithm,” Technical Report TR-95-13, Dept. of Computer Sciences, Univ. of Texas at Austin, Apr. 1995.
[57] S.J. Worley and A.J. Smith, “Microbenchmarking and Performance Prediction for Parallel Computers,” Technical Report CSD-95-873, Dept. of Computer Science, Univ. of California at Berkeley, 1995.

Index Terms:
Parallel I/O, performance modeling, automatic performance optimization, simulated annealing.
Citation:
Ying Chen, Marianne Winslett, "Automated Tuning of Parallel I/O Systems: An Approach to Portable I/O Performance for Scientific Applications," IEEE Transactions on Software Engineering, vol. 26, no. 4, pp. 362-383, April 2000, doi:10.1109/32.844494