Issue No. 02 - February (2012 vol. 23)
ISSN: 1045-9219
pp: 271-279
Arifa Nisar , University of California Santa Cruz, Santa Cruz
Wei-Keng Liao , Northwestern University, Evanston
Alok Choudhary , Northwestern University, Evanston
ABSTRACT
Massively parallel applications often require periodic data checkpointing for program restart and post-run data analysis. Although high performance computing systems provide massive parallelism and computing power to fulfill the crucial requirements of scientific applications, the I/O tasks of high-end applications do not scale. Strict data consistency semantics adopted from traditional file systems are inadequate for homogeneous parallel computing platforms. For high performance parallel applications, independent I/O is critical, particularly if checkpointing data are dynamically created or irregularly partitioned. In particular, parallel programs generating a large number of unrelated I/O accesses on large-scale systems often face serious I/O serialization introduced by lock contention and conflicts at the file system layer. Because these applications may not be able to use I/O optimizations that require process synchronization, they pose a great challenge for parallel I/O architecture and software designs. We propose an I/O mechanism to bridge the gap between scientific applications and parallel storage systems. A static file domain partitioning method is developed to align the I/O requests and produce a client-server mapping that minimizes file lock acquisition costs and eliminates lock contention. Our performance evaluations of production application I/O kernels demonstrate scalable performance and high I/O bandwidth.
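The abstract's static file domain partitioning assigns each delegate a contiguous file region whose boundaries coincide with the file system's lock granularity, so no two delegates ever touch the same lock unit. The paper does not give the partitioning algorithm; the sketch below is one plausible reading of boundary alignment, where the function name, the lock-unit size, and the rounding policy are all illustrative assumptions rather than details from the paper.

```python
# Illustrative sketch (not the paper's algorithm): split [0, file_size)
# into one contiguous domain per I/O delegate, snapping every internal
# boundary to a multiple of the file system's lock unit (e.g., a stripe
# or lock-extent size) so domains never share a lock unit.

def aligned_file_domains(file_size, num_delegates, lock_unit):
    """Return [(start, end), ...] domains, one per delegate.

    Internal boundaries are rounded to the nearest lock_unit multiple,
    which keeps domains roughly balanced while eliminating lock
    contention between neighboring delegates.
    """
    boundaries = [0]
    for i in range(1, num_delegates):
        # Nominal even split point, snapped to the lock granularity.
        b = round(file_size * i / num_delegates / lock_unit) * lock_unit
        # Keep boundaries monotonic and inside the file.
        b = max(boundaries[-1], min(b, file_size))
        boundaries.append(b)
    boundaries.append(file_size)
    return list(zip(boundaries[:-1], boundaries[1:]))
```

For example, partitioning a 10 MB file among 4 delegates with a 64 KiB lock unit yields domains whose internal boundaries are all multiples of 65536, so each lock unit is owned by exactly one delegate.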
INDEX TERMS
Parallel I/O, I/O delegation, MPI-IO, noncollective I/O, collaborative caching, parallel file systems, file locking.
CITATION
Arifa Nisar, Wei-Keng Liao, Alok Choudhary, "Delegation-Based I/O Mechanism for High Performance Computing Systems", IEEE Transactions on Parallel & Distributed Systems, vol. 23, no. 2, pp. 271-279, February 2012, doi:10.1109/TPDS.2011.166