Delegation-Based I/O Mechanism for High Performance Computing Systems
February 2012 (vol. 23 no. 2)
pp. 271-279
Arifa Nisar, University of California Santa Cruz, Santa Cruz
Wei-Keng Liao, Northwestern University, Evanston
Alok Choudhary, Northwestern University, Evanston
Massively parallel applications often require periodic data checkpointing for program restart and post-run data analysis. Although high performance computing systems provide massive parallelism and computing power to meet the demands of scientific applications, the I/O tasks of high-end applications do not scale. Strict data consistency semantics adopted from traditional file systems are inadequate for homogeneous parallel computing platforms. For high performance parallel applications, independent I/O is critical, particularly if checkpointing data are dynamically created or irregularly partitioned. In particular, parallel programs that generate a large number of unrelated I/O accesses on large-scale systems often face serious I/O serialization introduced by lock contention and conflicts at the file system layer. Because these applications may not be able to use I/O optimizations that require process synchronization, they pose a great challenge for parallel I/O architecture and software design. We propose an I/O mechanism to bridge the gap between scientific applications and parallel storage systems. A static file domain partitioning method is developed to align the I/O requests and produce a client-server mapping that minimizes file lock acquisition costs and eliminates lock contention. Our performance evaluations of production application I/O kernels demonstrate scalable performance and high I/O bandwidths.
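
The abstract does not spell out the partitioning algorithm, so the following is only a minimal sketch of what a static, lock-aligned file domain mapping could look like. It assumes a fixed lock granularity (here called stripe_size) and a round-robin assignment of aligned blocks to I/O delegate processes; the names stripe_size, num_delegates, delegate_for_offset, and split_request are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def delegate_for_offset(offset: int, stripe_size: int, num_delegates: int) -> int:
    """Return the delegate that owns the aligned block containing `offset`.

    Assumption: blocks of `stripe_size` bytes are assigned round-robin,
    so each block (and its file lock) has exactly one owner.
    """
    return (offset // stripe_size) % num_delegates


def split_request(
    offset: int, length: int, stripe_size: int, num_delegates: int
) -> List[Tuple[int, int, int]]:
    """Split one client request at block boundaries.

    Returns a list of (delegate, piece_offset, piece_length) tuples.
    No piece crosses a block boundary, so no two delegates ever need
    a lock on the same block.
    """
    pieces = []
    end = offset + length
    while offset < end:
        # End of the aligned block that contains the current offset.
        block_end = (offset // stripe_size + 1) * stripe_size
        piece_end = min(end, block_end)
        owner = delegate_for_offset(offset, stripe_size, num_delegates)
        pieces.append((owner, offset, piece_end - offset))
        offset = piece_end
    return pieces


if __name__ == "__main__":
    # A 10-byte request at offset 3, with 4-byte blocks and 2 delegates.
    for owner, off, size in split_request(3, 10, 4, 2):
        print(f"delegate {owner}: offset {off}, length {size}")
```

Because the mapping depends only on the file offset, every client can compute the owner of a request locally without synchronizing with other processes, which is the property independent (noncollective) I/O needs. A real system would likely choose the block size to match the file system's lock or stripe unit.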

Index Terms:
Parallel I/O, I/O delegation, MPI-IO, noncollective I/O, collaborative caching, parallel file systems, file locking.
Arifa Nisar, Wei-Keng Liao, Alok Choudhary, "Delegation-Based I/O Mechanism for High Performance Computing Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 2, pp. 271-279, Feb. 2012, doi:10.1109/TPDS.2011.166