Supporting Scalable and Adaptive Metadata Management in Ultralarge-Scale File Systems
April 2011 (vol. 22 no. 4)
pp. 580-593
Yu Hua, Huazhong University of Science and Technology, Wuhan
Yifeng Zhu, University of Maine, Orono
Hong Jiang, University of Nebraska-Lincoln, Lincoln
Dan Feng, Huazhong University of Science and Technology, Wuhan
Lei Tian, Huazhong University of Science and Technology, Wuhan
This paper presents a scalable and adaptive decentralized metadata lookup scheme for ultralarge-scale file systems (Petabyte or even Exabyte scale). Our scheme logically organizes metadata servers (MDSs) into a multilayered query hierarchy and exploits grouped Bloom filters to route metadata requests efficiently to the desired MDSs through the hierarchy. The lookup can be executed at network or memory speed, without being bounded by the performance of slow disks. We also develop an effective workload-balancing method for server reconfigurations. The scheme is evaluated through extensive trace-driven simulations and a prototype implementation in Linux. Experimental results show that it can significantly improve metadata management scalability and query efficiency in ultralarge-scale storage systems.
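
The abstract does not spell out how the grouped Bloom filters or the query hierarchy are built, so the following is only a minimal Python sketch of the general idea: each MDS summarizes the paths it holds in a Bloom filter, and a lookup probes those in-memory filters group by group instead of reading from disk. The class names, the `lookup` function, the filter size, and the SHA-256-based hashing are all assumptions made for illustration, not the authors' design.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; positions are derived from salted SHA-256."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Salt the key with the hash index to get num_hashes positions.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key):
        # May return a false positive, never a false negative.
        return all(self.bits[pos] for pos in self._positions(key))


class MetadataServer:
    """Hypothetical MDS: an in-memory metadata table plus its Bloom filter."""

    def __init__(self, name):
        self.name = name
        self.metadata = {}
        self.filter = BloomFilter()

    def store(self, path, attrs):
        self.metadata[path] = attrs
        self.filter.add(path)


def lookup(path, groups):
    """Return the name of the MDS holding `path`, or None if no MDS has it.

    `groups` is a list of groups, each a list of MetadataServer objects.
    A filter hit is verified against the server's table, because a Bloom
    filter can report a path it never stored.
    """
    for group in groups:
        for mds in group:
            if path in mds.filter and path in mds.metadata:
                return mds.name
    return None
```

For example, with two groups of servers and `/home/a.txt` stored on one of them, `lookup("/home/a.txt", groups)` returns that server's name, while a path stored nowhere returns `None`. The verification step against the server's table is what keeps false positives from producing a wrong answer; they only cost an extra probe.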

Index Terms:
File systems, Bloom filters, metadata management, scalability, performance evaluation.
Citation:
Yu Hua, Yifeng Zhu, Hong Jiang, Dan Feng, Lei Tian, "Supporting Scalable and Adaptive Metadata Management in Ultralarge-Scale File Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 4, pp. 580-593, April 2011, doi:10.1109/TPDS.2010.116