Scalable Design and Implementations for MPI Parallel Overlapping I/O
November 2006 (vol. 17 no. 11)
pp. 1264-1276

Abstract—We investigate Message Passing Interface (MPI) I/O implementation issues for two overlapping access patterns: overlaps among processes within a single I/O operation and overlaps across a sequence of I/O operations. The former case considers whether I/O atomicity can be guaranteed in the overlapping regions; the latter focuses on the file consistency problem on parallel machines with client-side file caching enabled. Traditional solutions to both overlapping I/O problems use whole-file or byte-range file locking to ensure exclusive access to the overlapping regions and to bypass the file system cache. Unfortunately, file locking can not only serialize I/O but also increase the aggregate communication overhead between clients and I/O servers. For atomicity, we first differentiate MPI's requirements from those of the Portable Operating System Interface (POSIX) standard and propose two scalable approaches, graph coloring and process-rank ordering, which resolve access conflicts while maintaining I/O parallelism. For the file consistency problem across multiple I/O operations, we propose a method called Persistent File Domains, which tackles cache coherence with additional bookkeeping and coordination to guarantee safe cache access without using file locks.
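The graph-coloring idea can be illustrated with a small sketch (this is an illustrative reconstruction, not the authors' implementation): treat each process as a vertex, connect two vertices when their requested file regions overlap, and color the conflict graph greedily. Processes with the same color have no overlapping regions and can perform I/O concurrently; different colors run in separate synchronized phases, preserving atomicity without locks.

```python
# Hypothetical sketch of conflict resolution via greedy graph coloring.
# Each process i requests the byte range ranges[i] = (start, end); processes
# whose ranges overlap are assigned different colors (I/O phases).

def overlaps(a, b):
    """True if half-open byte ranges a = (start, end) and b intersect."""
    return a[0] < b[1] and b[0] < a[1]

def color_accesses(ranges):
    """Greedy coloring of the conflict graph induced by overlapping ranges.
    Returns one color (phase number) per process."""
    colors = [0] * len(ranges)
    for i in range(len(ranges)):
        # Colors already taken by earlier processes that conflict with i.
        used = {colors[j] for j in range(i) if overlaps(ranges[i], ranges[j])}
        c = 0
        while c in used:
            c += 1
        colors[i] = c
    return colors

# Four processes writing to a shared file; process 3 conflicts with 1 and 2,
# so three phases are needed.
phases = color_accesses([(0, 100), (50, 150), (150, 200), (120, 180)])
```

Processes then loop over the phases, and in each phase only the processes whose color matches issue their I/O; a barrier between phases keeps conflicting accesses apart.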


Index Terms:
MPI, MPI I/O, atomic I/O, file atomicity, file consistency, cache coherence, overlapping I/O.
Citation:
Wei-keng Liao, Kenin Coloma, Alok Choudhary, Lee Ward, Eric Russell, Neil Pundit, "Scalable Design and Implementations for MPI Parallel Overlapping I/O," IEEE Transactions on Parallel and Distributed Systems, vol. 17, no. 11, pp. 1264-1276, Nov. 2006, doi:10.1109/TPDS.2006.163