Issue No. 05 - May 2011 (vol. 22)
pp. 803-816
Jin Xiong , Chinese Academy of Sciences, Beijing
Yiming Hu , University of Cincinnati, Cincinnati
Guojie Li , Chinese Academy of Sciences, Beijing
Rongfeng Tang , Chinese Academy of Sciences, Beijing
Zhihua Fan , NetEase.com Inc, Beijing
ABSTRACT
Most supercomputers today are based on large clusters, which call for sophisticated, scalable, and decentralized metadata processing techniques. From the perspective of maximizing metadata throughput, an ideal metadata distribution policy should automatically balance namespace locality against even load distribution, without manual intervention. None of the existing metadata distribution schemes is designed to strike such a balance. We propose a novel metadata distribution policy, Dynamic Dir-Grain (DDG), which seeks to balance the requirements of preserving namespace locality and evenly distributing the load by dynamically partitioning the namespace into size-adjustable hierarchical units. Extensive simulation and measurement results show that DDG policies with a proper granularity significantly outperform traditional techniques such as the Random policy and the Subtree policy, by 40 percent to 62 times. In addition, from the perspective of file system reliability, metadata consistency is an equally important issue, but it is complicated by dynamic metadata distribution. The metadata consistency of cross-metadata-server operations cannot be ensured by traditional metadata journaling on each server. While the traditional two-phase commit (2PC) algorithm can be used, it is too costly for distributed file systems. We propose a consistent metadata processing protocol, S2PC-MP, which combines the two-phase commit algorithm with metadata processing to reduce overheads. Our measurement results show that S2PC-MP not only ensures fast recovery but also greatly reduces failure-free execution overheads.
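The trade-off the abstract describes can be illustrated with a minimal sketch: distribute metadata by hashing only a prefix of each path, where the prefix depth acts as the distribution granularity. This is not the paper's DDG algorithm (which adjusts the unit size dynamically); the function name and parameters below are illustrative assumptions.

```python
import hashlib

def server_for_path(path: str, depth: int, num_servers: int) -> int:
    """Map a file path to a metadata server by hashing only the
    first `depth` directory components.

    A small depth keeps whole subtrees on one server (namespace
    locality); a large depth spreads entries more evenly across
    servers. DDG's idea is to adjust this granularity per subtree
    rather than fix it globally, as this toy version does.
    """
    components = [c for c in path.split("/") if c]
    unit = "/".join(components[:depth])            # the distribution unit
    digest = hashlib.md5(unit.encode()).digest()   # stable hash of the unit
    return int.from_bytes(digest[:4], "big") % num_servers

# With depth=1, /home/alice and /home/bob share a server (locality);
# with depth=2, they may land on different servers (balance).
```

At depth 1 every path under a top-level directory maps to the same server, so a `readdir` or recursive traversal stays local; raising the depth trades that locality for a more even spread of load.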
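For the consistency half of the abstract, the baseline the paper improves on is classical two-phase commit: a cross-server operation (e.g., a rename spanning two metadata servers) is first prepared and journaled on every participant, and only committed once all have voted yes. The sketch below is the textbook 2PC structure with in-memory stand-ins, not the paper's S2PC-MP protocol; class and method names are illustrative.

```python
class Participant:
    """Toy metadata server: votes in phase 1, then applies or
    discards the staged change in phase 2."""

    def __init__(self, name: str):
        self.name = name
        self.staged = None
        self.log = []            # stands in for the server's journal

    def prepare(self, op) -> bool:
        self.staged = op
        self.log.append(("prepare", op))   # force to journal before voting
        return True                        # vote yes (toy servers never refuse)

    def commit(self) -> None:
        self.log.append(("commit", self.staged))
        self.staged = None

    def abort(self) -> None:
        self.log.append(("abort", self.staged))
        self.staged = None

def two_phase_commit(participants, op) -> bool:
    """Classical 2PC: the operation takes effect on all servers or none."""
    # Phase 1: collect votes; every participant must journal and agree.
    if all(p.prepare(op) for p in participants):
        for p in participants:             # Phase 2: commit everywhere
            p.commit()
        return True
    for p in participants:                 # Phase 2: abort everywhere
        p.abort()
    return False
```

Even in this stripped-down form, each operation costs a journal write plus a round of messages on every participant before the commit round, which is the fail-free overhead the abstract says S2PC-MP reduces by folding the commit protocol into ordinary metadata processing.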
INDEX TERMS
Distributed file systems, metadata management.
CITATION
Jin Xiong, Yiming Hu, Guojie Li, Rongfeng Tang, and Zhihua Fan, "Metadata Distribution and Consistency Techniques for Large-Scale Cluster File Systems," IEEE Transactions on Parallel & Distributed Systems, vol. 22, no. 5, pp. 803-816, May 2011, doi:10.1109/TPDS.2010.154