2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems
Towards Synthesizing Realistic Workload Traces for Studying the Hadoop Ecosystem
Singapore, Singapore
July 25-July 27
ISBN: 978-0-7695-4430-4
BibTeX:
@inproceedings{10.1109/MASCOTS.2011.59,
  author = {Guanying Wang and Ali R. Butt and Henry Monti and Karan Gupta},
  title = {Towards Synthesizing Realistic Workload Traces for Studying the Hadoop Ecosystem},
  booktitle = {2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems},
  year = {2011},
  issn = {1526-7539},
  pages = {400-408},
  doi = {http://doi.ieeecomputersociety.org/10.1109/MASCOTS.2011.59},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
Designing cloud computing setups is a challenging task. It involves understanding the impact of a plethora of parameters, ranging from cluster configuration, partitioning, and networking characteristics to the behavior of the targeted applications. The size of the design space and the scale of the clusters make it cumbersome and error-prone to test different cluster configurations using real setups. Thus, the community is increasingly relying on simulations and models of cloud setups to infer system behavior and the impact of design choices. The accuracy of the results from such approaches depends on how accurate and realistic the employed workload traces are. Unfortunately, few cloud workload traces are publicly available. In this paper, we present the key steps towards analyzing the traces that have been made public, e.g., from Google, and inferring lessons that can be used to design realistic cloud workloads as well as to enable thorough quantitative studies of Hadoop design. Moreover, we leverage the lessons learned from the traces to undertake two case studies: (i) evaluating Hadoop job schedulers, and (ii) quantifying the impact of shared storage on Hadoop system performance.
Index Terms:
Cloud computing, Performance analysis, Design optimization, Software performance modeling
Citation:
Guanying Wang, Ali R. Butt, Henry Monti, Karan Gupta, "Towards Synthesizing Realistic Workload Traces for Studying the Hadoop Ecosystem," 2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), pp. 400-408, 2011.
