2012 IEEE International Symposium on Workload Characterization (IISWC) (2012)
La Jolla, CA, USA
Nov. 4, 2012 to Nov. 6, 2012
ISBN: 978-1-4673-4531-6
pp: 100-109
Cristina L. Abad, University of Illinois at Urbana-Champaign, USA
Nathan Roberts, Yahoo! Inc., USA
Yi Lu, University of Illinois at Urbana-Champaign, USA
Roy H. Campbell, University of Illinois at Urbana-Champaign, USA
ABSTRACT
A huge increase in data storage and processing requirements has led to Big Data, for which next-generation storage systems are being designed and implemented. However, we have a limited understanding of the workloads of Big Data storage systems. We consider the case of one common type of Big Data storage cluster: a cluster dedicated to supporting a mix of MapReduce jobs. We analyze 6-month traces from two large Hadoop clusters at Yahoo! and characterize the file popularity, temporal locality, and arrival patterns of the workloads. We identify several interesting properties and compare them with previous observations from web and media server workloads. To the best of our knowledge, this is the first study of how MapReduce workloads interact with the storage layer.
INDEX TERMS
access patterns, Big Data, MapReduce, HDFS
CITATION

C. L. Abad, N. Roberts, Y. Lu and R. H. Campbell, "A storage-centric analysis of MapReduce workloads: File popularity, temporal locality and arrival patterns," 2012 IEEE International Symposium on Workload Characterization (IISWC), La Jolla, CA, USA, 2012, pp. 100-109.
doi:10.1109/IISWC.2012.6402909