2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID) (2018)
Washington, DC, USA
May 1, 2018 to May 4, 2018
ISBN: 978-1-5386-5815-4
pp: 452-461
ABSTRACT
Small files are known to pose major performance challenges for file systems. Yet such workloads are increasingly common in Big Data Analytics workflows and large-scale HPC simulations. These challenges stem mainly from the architecture common to most state-of-the-art file systems, which requires one or more metadata requests before a file can be read. As file size decreases, the overhead of this metadata management grows relative to the cost of reading the data itself. In this paper we propose a set of techniques leveraging consistent hashing and dynamic metadata replication to significantly reduce this metadata overhead. We implement these techniques in a new file system named TýrFS, built as a thin layer above the Týr object store. We show that TýrFS increases small file access performance by up to one order of magnitude compared to other state-of-the-art file systems, while causing only a minimal impact on file write throughput.
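The abstract names consistent hashing without describing how it is used. As a generic illustration only, and not TýrFS's actual design, the sketch below shows how a client could compute which node holds a file directly from its path, without first asking a metadata server. The `HashRing` class, the node names, the example path, and the choice of 64 virtual nodes are all assumptions made for this example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer position on the ring."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Generic consistent-hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, nodes, vnodes=64):
        # Each node is placed at `vnodes` pseudo-random points on the ring
        # so that keys spread evenly across nodes.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, path: str) -> str:
        """Return the node owning `path`: the first ring point clockwise from its hash."""
        idx = bisect.bisect_right(self._points, _hash(path)) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical cluster and file path, for illustration only.
ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.lookup("/data/sim/part-00042.bin")
```

Because the owner is a pure function of the path and the node list, any client can locate a file locally. Removing a node only remaps the keys that node owned, which is the property that distinguishes consistent hashing from a plain `hash(path) % n` scheme.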
INDEX TERMS
Big Data, data analysis, file organisation, meta data, parallel processing
CITATION

P. Matri, M. S. Perez, A. Costan and G. Antoniu, "TýrFS: Increasing Small Files Access Performance with Dynamic Metadata Replication," 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), Washington, DC, USA, 2018, pp. 452-461.
doi:10.1109/CCGRID.2018.00072