Learning to Classify Parallel Input/Output Access Patterns
August 2002 (vol. 13 no. 8)
pp. 802-813

Abstract—Input/output performance on current parallel file systems is sensitive to a good match of application access patterns to file system capabilities. Automatic input/output access pattern classification can determine application access patterns at execution time, guiding adaptive file system policies. In this paper, we examine and compare two novel input/output access pattern classification methods based on learning algorithms. The first approach uses a feedforward neural network previously trained on access pattern benchmarks to generate qualitative classifications. The second approach uses hidden Markov models trained on access patterns from previous executions to create a probabilistic model of input/output accesses. In a parallel application, access patterns can be recognized at the level of each local thread or as the global interleaving of all application threads. Classification of patterns at both levels is important for parallel file system performance; we propose a method for forming global classifications from local classifications. We present results from parallel and sequential benchmarks and applications that demonstrate the viability of this approach.
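
To make the probabilistic idea in the abstract concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes an access trace is a list of integer file block numbers, and it replaces the paper's hidden Markov model with a simpler first-order Markov chain whose states are directly observable. The function names (train_transitions, predict_next, classify) and the three qualitative labels are hypothetical, chosen only for this example.

```python
from collections import Counter, defaultdict


def train_transitions(block_trace):
    """Estimate P(next block | current block) from one observed trace.

    Assumption: the trace is a list of integer block numbers from a
    previous execution. This is a plain Markov chain, not an HMM.
    """
    counts = defaultdict(Counter)
    for current, following in zip(block_trace, block_trace[1:]):
        counts[current][following] += 1
    return {
        block: {nxt: n / sum(successors.values()) for nxt, n in successors.items()}
        for block, successors in counts.items()
    }


def predict_next(model, current_block):
    """Return the most probable successor block, or None if the block is unseen."""
    successors = model.get(current_block)
    if not successors:
        return None
    return max(successors, key=successors.get)


def classify(block_trace):
    """Assign a crude qualitative label from the strides between accesses.

    Hypothetical labels: a constant stride of 1 is "sequential", any other
    constant stride is "strided", and everything else is "irregular".
    """
    strides = [b - a for a, b in zip(block_trace, block_trace[1:])]
    if not strides:
        return "unknown"
    if all(s == 1 for s in strides):
        return "sequential"
    if len(set(strides)) == 1:
        return "strided"
    return "irregular"
```

A file system policy could, under these assumptions, call predict_next on the most recently read block to choose a prefetch candidate, and use classify on a recent window of accesses to select a caching policy. Combining per-thread labels into a global classification, as the paper proposes, is not covered by this sketch.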

Index Terms:
Parallel I/O, access pattern classification, adaptive policies, neural networks, hidden Markov models.
Tara M. Madhyastha, Daniel A. Reed, "Learning to Classify Parallel Input/Output Access Patterns," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 8, pp. 802-813, Aug. 2002, doi:10.1109/TPDS.2002.1028437