Issue No.02 - February (2012 vol.23)
pp: 337-344
Hong Jiang , University of Nebraska-Lincoln, Lincoln
Yifeng Zhu , University of Maine, Orono
Dan Feng , Huazhong University of Science and Technology, Wuhan
Lei Tian , University of Nebraska-Lincoln, Lincoln
ABSTRACT
Existing data storage systems based on the hierarchical directory-tree organization do not meet the scalability and functionality requirements of exponentially growing data sets and increasingly complex metadata queries in large-scale, exabyte-level file systems with billions of files. This paper proposes a novel decentralized, semantic-aware metadata organization, called SmartStore, which exploits the semantics of files' metadata to judiciously aggregate correlated files into semantic-aware groups by using information retrieval tools. The key idea of SmartStore is to limit the search scope of a complex metadata query to a single group, or a minimal number of semantically correlated groups, and thereby avoid or alleviate brute-force search over the entire system. The decentralized design of SmartStore can improve system scalability and reduce query latency for complex queries (including range and top-k queries). Moreover, it is conducive to constructing semantic-aware caching and to supporting conventional filename-based point queries. We have implemented a prototype of SmartStore, and extensive experiments based on real-world traces show that SmartStore significantly improves system scalability and reduces query latency compared with database approaches. To the best of our knowledge, this is the first study on the implementation of complex queries in large-scale file systems.
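The key idea of restricting a range query to a few correlated groups can be illustrated with a minimal sketch. This is not the SmartStore implementation: the abstract describes grouping via information retrieval tools, whereas the sketch below swaps in a simple nearest-seed assignment, and the names `Group`, `build_groups`, `range_query`, the seed vectors, and the two attributes (size in KB, age in days) are all hypothetical. Each group keeps a bounding box over its members, so a query can skip any group whose box cannot contain a match.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]
Record = Tuple[str, Vector]  # (file name, metadata attribute vector)


@dataclass
class Group:
    """Hypothetical semantic group: members plus a bounding box over them."""
    centroid: Vector
    members: List[Record] = field(default_factory=list)
    lo: List[float] = field(default_factory=list)  # per-attribute minimum
    hi: List[float] = field(default_factory=list)  # per-attribute maximum

    def add(self, name: str, attrs: Vector) -> None:
        """Insert a record and grow the bounding box to cover it."""
        self.members.append((name, attrs))
        if not self.lo:
            self.lo, self.hi = list(attrs), list(attrs)
        else:
            self.lo = [min(a, b) for a, b in zip(self.lo, attrs)]
            self.hi = [max(a, b) for a, b in zip(self.hi, attrs)]

    def may_overlap(self, q_lo: Vector, q_hi: Vector) -> bool:
        """True if the query box intersects this group's bounding box."""
        if not self.members:
            return False
        return all(l <= qh and h >= ql
                   for l, h, ql, qh in zip(self.lo, self.hi, q_lo, q_hi))


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_groups(records: List[Record], seeds: List[Vector]) -> List[Group]:
    """Assign each record to the nearest seed (a stand-in for semantic grouping)."""
    groups = [Group(centroid=s) for s in seeds]
    for name, attrs in records:
        nearest = min(groups, key=lambda g: sq_dist(g.centroid, attrs))
        nearest.add(name, attrs)
    return groups


def range_query(groups: List[Group], q_lo: Vector, q_hi: Vector):
    """Return matching file names and the number of groups actually scanned."""
    hits, visited = [], 0
    for g in groups:
        if not g.may_overlap(q_lo, q_hi):
            continue  # whole group pruned without touching its members
        visited += 1
        for name, attrs in g.members:
            if all(ql <= a <= qh for a, ql, qh in zip(attrs, q_lo, q_hi)):
                hits.append(name)
    return sorted(hits), visited


# Illustrative data: attributes are (size_kb, age_days).
records = [("a.log", (4, 1)), ("b.log", (6, 2)),
           ("c.iso", (900, 300)), ("d.iso", (950, 310))]
groups = build_groups(records, seeds=[(5, 1), (900, 300)])
small_recent, scanned = range_query(groups, (0, 0), (10, 5))
```

In this toy data set the query for small, recent files scans only one of the two groups. The pruning test is the part that corresponds to the abstract's claim; how groups are formed and distributed across servers is left out of the sketch.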
INDEX TERMS
File systems, metadata management, scalability, performance evaluation.
CITATION
Hong Jiang, Yifeng Zhu, Dan Feng, Lei Tian, "Semantic-Aware Metadata Organization Paradigm in Next-Generation File Systems", IEEE Transactions on Parallel & Distributed Systems, vol.23, no. 2, pp. 337-344, February 2012, doi:10.1109/TPDS.2011.169