1. limited number of I/O resources (e.g., I/O nodes, disks), and
2. unoptimized I/O software.
1. To understand the I/O behavior of real scientific codes,
2. To evaluate the I/O optimizations available for different applications and determine which are appropriate for a given application,
3. To determine the point at which software optimizations become ineffective and whether a sparing increase in hardware resources will solve the problem, and finally,
4. To arrive at a general guideline (rule of thumb), applicable by the end user or through automation, for improving the I/O performance of applications.
1. The original version [25] that uses Unix I/O with a Fortran interface (to the underlying parallel file system) as provided by the Pacific Northwest Lab (PNL),
2. an optimized version that uses the PASSION [8] library's read/write calls, and
3. an optimized version that uses PASSION prefetch calls instead of read calls.
1. Write phase (performed only once): integrals are calculated and packed into a memory buffer, which is written to disk once it is full.
2. Read phase (performed in every iteration): integral values are read from disk into a memory buffer to compute the Fock matrix. The integral file resides on disk, striped across several I/O nodes, and is accessed through the parallel file system.
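The two phases can be illustrated with a minimal Python sketch. It is not the application's code: the function names (`write_phase`, `read_phase`, `run`), the record layout (one little-endian double per integral), and the use of a local temporary file in place of a striped parallel file system are all assumptions made for illustration. The per-iteration sum is only a stand-in for the Fock matrix computation.

```python
import os
import struct
import tempfile

# Assumed record layout: one double-precision value per integral.
RECORD = struct.Struct("<d")

def write_phase(path, integrals, buffer_capacity):
    """Pack values into a fixed-size buffer; flush to disk whenever it fills."""
    buffer = []
    flushes = 0
    with open(path, "wb") as f:
        for value in integrals:
            buffer.append(value)
            if len(buffer) == buffer_capacity:
                f.write(b"".join(RECORD.pack(v) for v in buffer))
                buffer.clear()
                flushes += 1
        if buffer:  # write the final, partially filled buffer
            f.write(b"".join(RECORD.pack(v) for v in buffer))
            flushes += 1
    return flushes

def read_phase(path, buffer_capacity):
    """Read the file back one buffer at a time and yield the values in it."""
    chunk_bytes = buffer_capacity * RECORD.size
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_bytes)
            if not chunk:
                break
            yield [v for (v,) in RECORD.iter_unpack(chunk)]

def run(num_integrals=10, buffer_capacity=4, iterations=3):
    """Write the integrals once, then re-read the whole file every iteration."""
    integrals = [float(i) for i in range(num_integrals)]
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        flushes = write_phase(path, integrals, buffer_capacity)
        sums = []
        for _ in range(iterations):
            total = 0.0
            for chunk in read_phase(path, buffer_capacity):
                total += sum(chunk)  # placeholder for the real computation
            sums.append(total)
        return flushes, sums
    finally:
        os.remove(path)
```

Because the file is written once but read in full on every iteration, the read phase dominates the I/O volume as the iteration count grows, which is why the optimized versions concentrate on reads.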
1. The PASSION library uses the file system's nonblocking (asynchronous) reads for prefetching. As a result, it must translate a single request for a logically contiguous chunk of data into multiple requests for physically contiguous chunks. This bookkeeping accounts for a large share of the prefetching overhead. Posting the individual requests adds further overhead, since each request must obtain a token before it can enter the queue of asynchronous requests to a given file.
2. Copying data from the prefetch buffer to the application buffer also accounts for a nonnegligible share of the prefetch overhead.
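The translation step in the first item can be sketched as follows. This is a hypothetical illustration, not PASSION's implementation: it assumes simple round-robin striping with a fixed stripe size, and `split_request` is a name introduced here. Each tuple it returns corresponds to one asynchronous request that would have to be posted separately.

```python
def split_request(offset, length, stripe_size, num_io_nodes):
    """Split one logically contiguous request into physically contiguous pieces.

    Assumes round-robin striping: stripe s of the file lives on I/O node
    s % num_io_nodes, at local offset (s // num_io_nodes) * stripe_size.
    Returns a list of (io_node, local_offset, piece_length) tuples.
    """
    pieces = []
    remaining = length
    position = offset
    while remaining > 0:
        stripe = position // stripe_size
        within = position % stripe_size          # offset inside this stripe
        piece = min(remaining, stripe_size - within)
        io_node = stripe % num_io_nodes
        local_offset = (stripe // num_io_nodes) * stripe_size + within
        pieces.append((io_node, local_offset, piece))
        position += piece
        remaining -= piece
    return pieces
```

For example, a 300-byte request starting at logical offset 100, with 128-byte stripes over four I/O nodes, becomes four separate requests. Under the scheme described above, each of them would need its own token and queue entry, which is where the posting overhead comes from.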
1. Unoptimized version: I/O done using the Chameleon library and
2. Optimized version: I/O done using a runtime system library performing two-phase I/O.
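The idea behind two-phase I/O can be shown with a small single-process simulation. The sketch below is an assumption-laden illustration rather than the runtime library's code: it assumes a row-major array on disk, a target distribution in which each process owns a block of columns, and array dimensions divisible by the number of processes. The names `two_phase_read` and `direct_request_count` are hypothetical, and list slicing stands in for both file reads and interprocessor communication.

```python
def two_phase_read(file_data, n_rows, n_cols, n_procs):
    """Simulate a two-phase read of a row-major n_rows x n_cols array.

    Assumed target distribution: process p owns a block of columns.
    Returns, for each process, its column block as a list of row slices.
    """
    assert n_rows % n_procs == 0 and n_cols % n_procs == 0
    rows_per_proc = n_rows // n_procs
    cols_per_proc = n_cols // n_procs

    # Phase 1: each process reads one large contiguous block of rows,
    # which conforms to the row-major layout of the file.
    staged = []
    for p in range(n_procs):
        start = p * rows_per_proc * n_cols
        end = start + rows_per_proc * n_cols
        staged.append(file_data[start:end])  # one contiguous request per process

    # Phase 2: data is redistributed in memory so that each process ends up
    # with its block of columns (stands in for interprocessor communication).
    owned = [[] for _ in range(n_procs)]
    for src in range(n_procs):
        for local_row in range(rows_per_proc):
            row = staged[src][local_row * n_cols:(local_row + 1) * n_cols]
            for dst in range(n_procs):
                owned[dst].append(row[dst * cols_per_proc:(dst + 1) * cols_per_proc])
    return owned

def direct_request_count(n_rows, n_procs):
    """Contiguous requests needed if each process reads its columns directly."""
    return n_rows * n_procs  # one small request per row per process
```

In this model a 4 x 4 array on two processes needs two large file requests with two-phase I/O, compared with eight small ones when each process reads its own columns directly. The second phase costs extra communication, so the approach pays off only when the saving in I/O requests outweighs it.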
1. Optimizations that target the access pattern of a number of processors and
2. Optimizations that target the access pattern of a single node.
M. Kandaswamy is with the Enterprise Server Group, Intel Corporation, Hillsboro, OR 97124-6497. E-mail: email@example.com.
M. Kandemir is with the Microsystems Design Group, Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802. E-mail: firstname.lastname@example.org.
A. Choudhary is with the Department of Electrical and Computer Engineering, Northwestern University, Technological Institute, 2145 Sheridan Road, Evanston, IL 60208-3118. E-mail: email@example.com.
D. Bernholdt is with the Northeast Parallel Architecture Center, Syracuse University, 111 College Place, Syracuse, NY 13244.
Manuscript received 16 Oct. 1998; revised 12 Aug. 2000; accepted 12 Dec. 2001.
For information on obtaining reprints of this article, please send e-mail to: firstname.lastname@example.org, and reference IEEECS Log Number 108073.
1. EDITOR'S NOTE: This paper originally appeared in TPDS, Vol. 13, No. 7. Unfortunately, due to extraordinary circumstances, errors were introduced into the figure captions of the paper. We are reprinting the paper in its entirety here.
Meenakshi A. Kandaswamy received the BE (Honors) degree in computer science and engineering from Regional Engineering College, Trichy, India in 1989. She received the MS and PhD degrees from the Department of Electrical Engineering and Computer Science at Syracuse University in 1995 and 1998, respectively. She currently works in the Enterprise Architecture Labs, Intel Corporation as a senior software engineer. Her research interests include high-performance I/O, parallel applications and benchmarks, multiprocessor file systems, performance modeling, and simulation.
Mahmut Kandemir received the BSc and MSc degrees in control and computer engineering from Istanbul Technical University, Istanbul, Turkey, in 1988 and 1992, respectively. He received the PhD degree from Syracuse University, Syracuse, New York in electrical engineering and computer science, in 1999. He has been an assistant professor in the Computer Science and Engineering Department at the Pennsylvania State University since August 1999. His main research interests are optimizing compilers, I/O intensive applications, and power-aware computing. He is a member of the IEEE and the ACM.
Alok Choudhary received the BE (Hons.) degree from Birla Institute of Technology and Science, Pilani, India in 1982, the MS degree from the University of Massachusetts, Amherst, in 1986 and the PhD degree from the University of Illinois, Urbana-Champaign, in electrical and computer engineering. He is a professor of electrical and computer engineering at Northwestern University. He received the US National Science Foundation's Young Investigator Award in 1993 (1993-1999). He has also received an IEEE Engineering Foundation award, an IBM Faculty Development award, and an Intel Research Council award. His main research interests are in high-performance computing and communication systems and their applications in many domains, including multimedia systems, information processing, and scientific computing. He is a senior member of the IEEE and a member of the ACM. He also serves on the High-Performance Fortran Forum, a forum of academia, industry, and government labs working on standardizing programming languages for portable programming on parallel computers.