December 2002 (vol. 13 no. 12)
pp. 1303-1319

Abstract—Many large-scale applications have significant I/O requirements in addition to their computational and memory requirements. Unfortunately, the limited number of I/O nodes provided in a typical configuration of modern message-passing distributed-memory architectures, such as the Intel Paragon and the IBM SP-2, severely limits the I/O performance of these applications. In this paper, we examine several software optimization techniques and evaluate their effects on five different I/O-intensive codes drawn from both small and large application domains. Our goals in this study are twofold. First, we want to understand the behavior of large-scale data-intensive applications and the impact of I/O subsystems on their performance (and vice versa). Second, and more importantly, we seek to improve the applications' performance through a mix of software techniques. Our results reveal that different applications benefit from different optimizations: for example, some applications benefit from file layout optimizations, whereas others take advantage of collective I/O. A combination of architectural and software solutions is normally needed to obtain good I/O performance. In particular, we show that with a limited number of I/O resources, it is possible to obtain good performance by using appropriate software optimizations. We also show that, beyond a certain level, imbalance in the architecture degrades performance even when optimized software is used, which indicates the need for additional I/O resources.
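To make the distinction between file layout optimizations and collective I/O concrete, here is a toy request-count model. It is not taken from the paper. It assumes an N x N array stored in a file, distributed over P processes by column blocks (P divides N), and it counts one file request per maximal run of contiguous offsets, ignoring seek time, striping, and I/O-node contention. The function names (`contiguous_runs`, `independent_requests`, `two_phase_requests`) are hypothetical and chosen only for illustration. The collective case follows the general two-phase idea: processes first read contiguous row blocks, then redistribute the data in memory.

```python
# Hypothetical toy model (not from the paper): count contiguous file requests
# for an n x n array distributed by column blocks over p processes.
# Assumes p divides n; ignores seek cost, striping, and I/O-node contention.


def contiguous_runs(offsets):
    """Count maximal runs of consecutive file offsets (one run = one request)."""
    runs = 0
    prev = None
    for off in sorted(offsets):
        if prev is None or off != prev + 1:
            runs += 1
        prev = off
    return runs


def column_block(rank, n, p):
    """Columns owned by `rank` under a column-block distribution."""
    width = n // p
    return range(rank * width, (rank + 1) * width)


def independent_requests(n, p, layout="row"):
    """Each process reads only its own column block directly from the file."""
    total = 0
    for rank in range(p):
        cols = column_block(rank, n, p)
        if layout == "row":        # element (i, j) stored at offset i*n + j
            offsets = [i * n + j for i in range(n) for j in cols]
        elif layout == "column":   # element (i, j) stored at offset j*n + i
            offsets = [j * n + i for j in cols for i in range(n)]
        else:
            raise ValueError("layout must be 'row' or 'column'")
        total += contiguous_runs(offsets)
    return total


def two_phase_requests(n, p):
    """Phase 1: each process reads a contiguous block of n/p rows from a
    row-major file. Phase 2 (in-memory redistribution to column blocks)
    issues no file requests, so only phase 1 is counted here."""
    rows = n // p
    total = 0
    for rank in range(p):
        offsets = [i * n + j
                   for i in range(rank * rows, (rank + 1) * rows)
                   for j in range(n)]
        total += contiguous_runs(offsets)
    return total


if __name__ == "__main__":
    n, p = 64, 4
    print(independent_requests(n, p))                   # 256 small requests (n * p)
    print(independent_requests(n, p, layout="column"))  # 4 large requests (p)
    print(two_phase_requests(n, p))                     # 4 large requests (p)
```

Under these assumptions, independent access to a row-major file issues n·p small, noncontiguous requests. Either changing the file layout to match the distribution or using a two-phase collective read reduces this to p large contiguous requests. In a real code, the choice depends on whether the layout can be changed without penalizing other access patterns, which is consistent with the observation that different applications favor different optimizations.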


Index Terms:
I/O optimizations, parallel architectures, I/O-intensive applications, disk layout, collective I/O.
Meenakshi A. Kandaswamy, Mahmut Kandemir, Alok Choudhary, David Bernholdt, "An Experimental Evaluation of I/O Optimizations on Different Applications," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 12, pp. 1303-1319, Dec. 2002, doi:10.1109/TPDS.2002.1158267