SPIFFI-A Scalable Parallel File System for the Intel Paragon
November 1996 (vol. 7 no. 11)
pp. 1185-1200

Abstract—This paper presents the design and performance of SPIFFI, a scalable high-performance parallel file system intended for use by extremely I/O intensive applications including "Grand Challenge" scientific applications and multimedia systems. This paper contains experimental results from a SPIFFI prototype on a 64 node/64 disk Intel Paragon. The results show that SPIFFI provides high performance and linear scaleup on real hardware. The paper also explains how shared file pointers (i.e., file pointers that are shared by multiple processes) can simplify the design of a parallel application. By sequentializing I/O accesses and by providing dynamic I/O load balancing, a shared file pointer may even improve an application's performance.

This paper also presents the predictions of a SPIFFI simulator that we validated using the prototype. The simulator results show that SPIFFI continues to provide high performance even when it is scaled to configurations with as many as 128 disks or 256 compute nodes.
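
The shared file pointer idea described in the abstract can be illustrated with a minimal sketch. It is not SPIFFI's interface: the class `SharedFilePointer`, the function `read_all`, and the use of Python threads in place of Paragon compute processes are all assumptions made for illustration. Each worker atomically claims the next block from a single shared offset, so accesses reach the file in sequential order and faster workers naturally claim more blocks, which is one way to read the "dynamic I/O load balancing" the abstract mentions.

```python
import threading


class SharedFilePointer:
    """Hypothetical shared file pointer: one offset shared by all workers."""

    def __init__(self, file_size, block_size):
        self.file_size = file_size
        self.block_size = block_size
        self._offset = 0
        self._lock = threading.Lock()

    def next_block(self):
        """Atomically claim the next (offset, length) range, or None at end of file."""
        with self._lock:
            if self._offset >= self.file_size:
                return None
            start = self._offset
            length = min(self.block_size, self.file_size - start)
            self._offset = start + length
            return start, length


def read_all(pointer, num_workers):
    """Run workers that keep claiming blocks until the file is exhausted."""
    claimed = [[] for _ in range(num_workers)]

    def worker(rank):
        while True:
            block = pointer.next_block()
            if block is None:
                break
            # A real worker would issue the read for this range here.
            claimed[rank].append(block)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed
```

Calling `read_all(SharedFilePointer(1000, 64), 4)` returns, per worker, the list of byte ranges that worker claimed. No application code has to decide in advance which process reads which part of the file, which is the design simplification the abstract attributes to shared file pointers.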


Index Terms:
Parallel I/O, file system, massively parallel processor, high performance, scalability.
Citation:
Craig S. Freedman, Josef Burger, David J. DeWitt, "SPIFFI-A Scalable Parallel File System for the Intel Paragon," IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 11, pp. 1185-1200, Nov. 1996, doi:10.1109/71.544358