Issue No. 04, July-Aug. 2013 (vol. 30), pp. 88-94
Ariel Rabkin , Princeton University
Randy Howard Katz , University of California, Berkeley
This article examines a sample of several hundred support tickets for the Hadoop ecosystem, a widely used group of big data storage and processing systems. It presents a taxonomy of errors and of how support staff address them, and it describes the misconfigurations that are the dominant cause of failures. Some design "antipatterns" and missing platform features contribute to these problems. Developers can use various methods to build more robust distributed systems, helping users and administrators avoid some of these rough edges.
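Because misconfigurations dominate, one general way a platform can help is to check configuration at startup instead of letting a bad setting surface later as a runtime failure. The sketch below is an illustrative assumption, not the method the article proposes: it flags unknown option names (likely typos that a key-value lookup with defaults would silently ignore) and values that cannot be parsed as the expected type. The option names mirror a few Hadoop-style properties, but `KNOWN_OPTIONS` and `check_config` are hypothetical and do not reflect Hadoop's real option list or validation behavior.

```python
from difflib import get_close_matches

# Hypothetical schema for illustration: option name -> expected value type.
# NOT Hadoop's real option list, defaults, or parsing rules.
KNOWN_OPTIONS = {
    "dfs.replication": int,
    "mapreduce.job.reduces": int,
    "hadoop.tmp.dir": str,
}


def check_config(config):
    """Return human-readable problems found in `config`.

    `config` maps option names to raw string values, as they might be
    read from a key-value configuration file.
    """
    problems = []
    for key, raw in config.items():
        expected = KNOWN_OPTIONS.get(key)
        if expected is None:
            # Unknown key: probably a typo that would otherwise be ignored.
            hint = get_close_matches(key, KNOWN_OPTIONS, n=1)
            msg = f"unknown option {key!r}"
            if hint:
                msg += f" (did you mean {hint[0]!r}?)"
            problems.append(msg)
            continue
        try:
            expected(raw)  # Fail fast on values that cannot be parsed.
        except ValueError:
            problems.append(
                f"option {key!r}: cannot parse {raw!r} as {expected.__name__}"
            )
    return problems


if __name__ == "__main__":
    sample = {"dfs.replicaton": "3", "mapreduce.job.reduces": "ten"}
    for problem in check_config(sample):
        print(problem)
```

Run on the sample, the check reports the misspelled `dfs.replicaton` with a suggested correction and the unparseable reducer count before any job starts. Rejecting or warning about such settings up front trades a little startup strictness for errors that name the offending option directly.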
Keywords: Cluster approximation, Information management, Data handling, Data storage systems, Software development, Software reliability, Analytical models, system administration, reliability, distributed systems, cloud computing, big data
Ariel Rabkin, Randy Howard Katz, "How Hadoop Clusters Break", IEEE Software, vol.30, no. 4, pp. 88-94, July-Aug. 2013, doi:10.1109/MS.2012.73