2015 IEEE International Conference on Cluster Computing (CLUSTER) (2015)
Chicago, IL, USA
Sept. 8, 2015 to Sept. 11, 2015
Hadoop, one of the most widely adopted MapReduce frameworks, is inherently data-intensive. Several projects that depend on it, such as Mahout and Hive, inherit this characteristic. Meanwhile, I/O optimization is a daunting task, since applications' source code is not always available. I/O traces for Hadoop and its dependents are therefore increasingly important, because they can faithfully reveal intrinsic I/O behaviors without knowledge of the source code. Tracing not only helps to diagnose system bottlenecks but also helps to further optimize performance. To this end, we propose a transparent tracing and analysis tool suite, IOSIG+, which can be plugged into a Hadoop system. We make several contributions: 1) we describe our tracing approach; 2) we release the tracer, which can trace I/O operations without modifying the targets' source code; 3) we adopt several techniques to mitigate the execution overhead introduced at runtime; and 4) we create an analyzer that helps to discover new approaches to addressing I/O problems based on access patterns. The experimental results and analysis confirm the tool suite's effectiveness, and the observed overhead can be as low as 1.97%.
Java, Throughput, Optimization, Runtime, Tuning, Yarn, Performance evaluation
B. Feng, X. Yang, K. Feng, Y. Yin and X. Sun, "IOSIG+: On the Role of I/O Tracing and Analysis for Hadoop Systems," 2015 IEEE International Conference on Cluster Computing (CLUSTER), Chicago, IL, USA, 2015, pp. 62-65.