Issue No. 01 - March (2017 vol. 3)
Dongfang Zhao, Department of Computer Science and Engineering, University of Washington, Seattle, WA
Kan Qiao, Google Inc., Kirkland, WA
Zhou Zhou, Department of Computer Science, Illinois Institute of Technology, Chicago, IL
Tonglin Li, Oak Ridge National Laboratory, Oak Ridge, TN
Zhihan Lu, Department of Computer Science, University College London, London, United Kingdom
Xiaohua Xu, Department of Computer Science, Kennesaw State University, Kennesaw, GA
In the Big Data era, applications are generating orders of magnitude more data in both volume and quantity. While many systems have emerged to address this data explosion, the fact that these data’s descriptors, i.e., metadata, are also “big” is often overlooked. The conventional approach to the big metadata issue is to disperse metadata across multiple machines. However, it is extremely difficult to preserve both load balance and data locality with this approach. To this end, we propose hierarchical indirection layers for indexing the underlying distributed metadata. With these layers, data locality is achieved efficiently through the indirection while load balance is preserved. Three key challenges exist in this approach, however: first, how to achieve high resilience; second, how to ensure flexible granularity; third, how to restrain performance overhead. To address these challenges, we design Dindex, a distributed indexing service for metadata. Dindex incorporates a hierarchy of coarse-grained aggregation and horizontal key-coalition. Theoretical analysis shows that the overhead of building Dindex is compensated by only two or three queries. Dindex has been implemented with a lightweight distributed key-value store and integrated into a full-fledged distributed filesystem. Experiments demonstrated that Dindex accelerated metadata queries by up to 60 percent with negligible overhead.
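The tension between load balance and locality described in the abstract can be illustrated with a small sketch. It is not Dindex's actual design, which the abstract does not detail. It assumes metadata entries are placed on nodes by hashing the full path, and that a separate indirection index records, for every ancestor directory, which nodes hold entries beneath it. All names here (`Cluster`, `node_for`, `ancestors`, `NUM_NODES`) are hypothetical.

```python
import hashlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size, chosen only for illustration


def node_for(path: str) -> int:
    """Hash a full path to a node id: spreads load, ignores directory locality."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_NODES


def ancestors(path: str) -> list:
    """Return every ancestor directory of a file path, from '/' downward."""
    parts = path.strip("/").split("/")[:-1]
    dirs = ["/"]
    for i in range(1, len(parts) + 1):
        dirs.append("/" + "/".join(parts[:i]))
    return dirs


class Cluster:
    """Toy model (an assumption, not the paper's system)."""

    def __init__(self):
        # One in-memory key-value store per node.
        self.nodes = [dict() for _ in range(NUM_NODES)]
        # Indirection layer: directory -> ids of nodes holding entries under it.
        self.index = defaultdict(set)

    def put(self, path: str, meta: dict) -> None:
        nid = node_for(path)
        self.nodes[nid][path] = meta
        # Register the node at every level of the directory hierarchy.
        for d in ancestors(path):
            self.index[d].add(nid)

    @staticmethod
    def _prefix(directory: str) -> str:
        return directory.rstrip("/") + "/"

    def scan_query(self, directory: str):
        """Without the index: every node must be contacted."""
        prefix = self._prefix(directory)
        hits = [p for store in self.nodes for p in store if p.startswith(prefix)]
        return sorted(hits), len(self.nodes)

    def indexed_query(self, directory: str):
        """With the index: contact only nodes known to hold matching entries."""
        prefix = self._prefix(directory)
        targets = self.index.get(directory, set())
        hits = [p for nid in targets for p in self.nodes[nid] if p.startswith(prefix)]
        return sorted(hits), len(targets)
```

Both query methods return the matching paths together with the number of nodes contacted. In this toy model placement stays hash-based, so load balance is unaffected, while a directory query touches only the nodes listed in the index. The cost is the extra index maintenance on every `put`, which is the kind of overhead the abstract says is recovered after a few queries.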
Metadata, Indexing, Big data, Buildings, Electronic mail, Computer science
D. Zhao, K. Qiao, Z. Zhou, T. Li, Z. Lu and X. Xu, "Toward Efficient and Flexible Metadata Indexing of Big Data Systems," in IEEE Transactions on Big Data, vol. 3, no. 1, pp. 107-117, 2017.