2011 International Green Computing Conference and Workshops
Predictive data and energy management in GreenHDFS
Orlando, FL
July 25-July 28
ISBN: 978-1-4577-1222-7
BibTeX:
@article{ 10.1109/IGCC.2011.6008563,
  author = {Rini T. Kaushik and Tarek Abdelzaher and Ryota Egashira and Klara Nahrstedt},
  title = {Predictive data and energy management in GreenHDFS},
  journal = {2012 International Green Computing Conference (IGCC)},
  volume = {0},
  year = {2011},
  isbn = {978-1-4577-1222-7},
  pages = {1-9},
  doi = {http://doi.ieeecomputersociety.org/10.1109/IGCC.2011.6008563},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
The sheer scale and rapid rise of Big Data mandate highly scalable, self-adaptive, and energy-conserving data-intensive compute clusters. Based on our analysis of traces from a production Hadoop cluster at Yahoo!, we observe that file size, file lifespan, and file heat are statistically correlated and very strongly associated with the hierarchical directory structure (i.e., the absolute file path) in which the files are organized. Leveraging that observation, we present predictive GreenHDFS, an energy-conserving variant of the Hadoop Distributed File System. It uses a supervised machine learning technique to learn the correlation between the directory hierarchy and the file attributes, and uses that correlation to guide novel predictive file zone placement, migration, and replication policies that significantly outperform current state-of-the-art reactive approaches. In GreenHDFS simulations driven by real-world traces from a large-scale (2600 servers, 5 petabytes) production Hadoop cluster at Yahoo!, we show that predictive GreenHDFS achieves a much better trade-off between performance and energy consumption.
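The abstract does not specify the learning technique or the zone layout, so the following is only a minimal illustrative sketch of the general idea of predicting a file attribute from its absolute path. It is not the authors' method. It assumes a hypothetical trace format of (path, heat) pairs, with heat taken to mean accesses per day, averages heat per directory prefix, and predicts a new file's heat by longest-prefix match. The zone names "hot" and "cold", the threshold, and all function names are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def path_prefixes(path):
    """Yield the directory prefixes of an absolute file path, longest first."""
    parts = path.strip("/").split("/")[:-1]  # drop the file name itself
    for depth in range(len(parts), 0, -1):
        yield "/" + "/".join(parts[:depth])


def train(records):
    """Learn the mean heat per directory prefix.

    records: iterable of (absolute_path, heat) pairs from past traces.
    The trace format and the heat unit are assumptions of this sketch.
    """
    samples = defaultdict(list)
    for path, heat in records:
        for prefix in path_prefixes(path):
            samples[prefix].append(heat)
    return {prefix: mean(values) for prefix, values in samples.items()}


def predict_heat(model, path, default):
    """Predict heat from the longest directory prefix seen during training."""
    for prefix in path_prefixes(path):
        if prefix in model:
            return model[prefix]
    return default  # no known ancestor directory


def choose_zone(model, path, cold_threshold=1.0):
    """Place a new file in a hypothetical 'cold' or 'hot' zone at creation time.

    Files under unknown directories default to the hot zone, a conservative
    choice made only for this sketch.
    """
    heat = predict_heat(model, path, default=cold_threshold)
    return "cold" if heat < cold_threshold else "hot"


# Hypothetical training traces: (absolute path, accesses per day).
traces = [
    ("/data/logs/2011/a.log", 0.1),
    ("/data/logs/2011/b.log", 0.3),
    ("/user/ads/model/part-0", 40.0),
]
model = train(traces)
zone_for_new_log = choose_zone(model, "/data/logs/2011/c.log")
zone_for_new_model = choose_zone(model, "/user/ads/model/part-1")
```

A per-directory average is a deliberately simple stand-in for a supervised learner. It shows why a path-based predictor can act when a file is created, whereas a reactive policy has to wait until access history accumulates.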
Index Terms:
energy consumption, predictive data, energy management, data-intensive compute clusters, Yahoo, file size, file lifespan, file heat, hierarchical directory structure, absolute file path, predictive GreenHDFS, Hadoop distributed file system, supervised machine learning technique, directory hierarchy, file attributes, predictive file zone placement, file migration, replication policy, large-scale production Hadoop cluster, GreenHDFS simulations
Citation:
Rini T. Kaushik, Tarek Abdelzaher, Ryota Egashira, Klara Nahrstedt, "Predictive data and energy management in GreenHDFS," 2011 International Green Computing Conference and Workshops (IGCC), pp. 1-9, 2011.
