Issue No.06 - June (2008 vol.19)
An efficient and distributed scheme for file mapping or file lookup is critical in decentralizing metadata management within a group of metadata servers. This paper presents a novel technique called HBA (Hierarchical Bloom filter Arrays) to map filenames to the metadata servers holding their metadata. Two levels of probabilistic arrays, namely Bloom filter arrays, with different levels of accuracy, are used on each metadata server. One array, with lower accuracy and representing the distribution of the entire metadata, trades accuracy for significantly reduced memory overhead, while the other array, with higher accuracy, caches partial distribution information and exploits the temporal locality of file access patterns. Both arrays are replicated to all metadata servers to support fast local lookups. We evaluate HBA through extensive trace-driven simulations and an implementation in Linux. Simulation results show our HBA design to be highly effective and efficient in improving the performance and scalability of file systems in clusters with 1,000 to 10,000 nodes (or super-clusters) and with the amount of data at the petabyte scale or higher. Our implementation indicates that HBA can reduce the metadata operation time of a single-metadata-server architecture by a factor of up to 43.9 when the system is configured with 16 metadata servers.
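The two-level lookup described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names (`BloomFilter`, `BloomFilterArray`), the `lookup` function, the SHA-256 double-hashing scheme, the filter sizes, and the rule of returning `None` when a level does not yield exactly one candidate are all assumptions made for illustration. The abstract does not specify how misses or multiple hits are resolved, so the fallback is left to the caller.

```python
import hashlib


class BloomFilter:
    """A plain Bloom filter over string keys (illustrative, not from the paper)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Assumed hashing scheme: double hashing over two 64-bit slices
        # of a single SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))


class BloomFilterArray:
    """One Bloom filter per metadata server, held locally as a replica."""

    def __init__(self, num_servers, num_bits, num_hashes):
        self.filters = [BloomFilter(num_bits, num_hashes)
                        for _ in range(num_servers)]

    def record(self, filename, server_id):
        self.filters[server_id].add(filename)

    def candidates(self, filename):
        # Servers whose filter reports a (possibly false-positive) hit.
        return [i for i, f in enumerate(self.filters) if filename in f]


def lookup(filename, hot_array, global_array):
    """Try the higher-accuracy array first, then the lower-accuracy one.

    Returns a server id when a level yields exactly one candidate,
    otherwise None (hypothetical convention; the caller must fall back).
    """
    for array in (hot_array, global_array):
        hits = array.candidates(filename)
        if len(hits) == 1:
            return hits[0]
    return None


# Hypothetical configuration: 4 servers; the "hot" array spends more bits
# per filter on a small set of recently used files, while the "global"
# array covers all files.
hot = BloomFilterArray(num_servers=4, num_bits=1 << 16, num_hashes=8)
glob = BloomFilterArray(num_servers=4, num_bits=1 << 16, num_hashes=4)
glob.record("/home/alice/report.txt", 2)
hot.record("/home/bob/data.bin", 1)
print(lookup("/home/alice/report.txt", hot, glob))
print(lookup("/home/bob/data.bin", hot, glob))
```

Because Bloom filters admit false positives, even a unique hit is only a likely answer; in a sketch like this the contacted server would still need to confirm that it holds the metadata.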
Distributed file systems, Distributed systems, Parallel systems, Storage Management, File Systems Management
Yifeng Zhu, Hong Jiang, Jun Wang, Feng Xian, "HBA: Distributed Metadata Management for Large Cluster-Based Storage Systems", IEEE Transactions on Parallel & Distributed Systems, vol.19, no. 6, pp. 750-763, June 2008, doi:10.1109/TPDS.2007.70788