Issue No.03 - March (2013 vol.24)
Yongmin Tan, North Carolina State University, Raleigh
Vinay Venkatesh, IBM, Research Triangle Park
Xiaohui Gu, North Carolina State University, Raleigh
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.167
Large-scale hosting infrastructures have become the fundamental platforms for many real-world systems such as cloud computing infrastructures, enterprise data centers, and massive data processing systems. However, it is a challenging task to achieve both scalability and high precision while monitoring a large number of intranode and internode attributes (e.g., CPU usage, free memory, free disk, internode network delay). In this paper, we present the design and implementation of a Resilient self-Compressive Monitoring (RCM) system for large-scale hosting infrastructures. RCM achieves scalable distributed monitoring by performing online data compression to reduce remote data collection cost. RCM provides failure resilience to achieve robust monitoring for dynamic distributed systems where host and network failures are common. We have conducted extensive experiments using a set of real monitoring data from NCSU's Virtual Computing Lab (VCL), PlanetLab, a Google cluster, and real Internet traffic matrices. The experimental results show that RCM can achieve a compression ratio up to 200 percent higher, and overhead several orders of magnitude lower, than the existing approaches.
Monitoring, Training, Image coding, Peer to peer computing, Data compression, Distributed databases, Measurement, Distributed system monitoring, Online data compression
Yongmin Tan, Vinay Venkatesh, Xiaohui Gu, "Resilient Self-Compressive Monitoring for Large-Scale Hosting Infrastructures", IEEE Transactions on Parallel & Distributed Systems, vol. 24, no. 3, pp. 576-586, March 2013, doi:10.1109/TPDS.2012.167