Issue No. 03 - March 2013 (vol. 24)
pp. 576-586
Yongmin Tan, North Carolina State University, Raleigh
Vinay Venkatesh, IBM, Research Triangle Park
Xiaohui Gu, North Carolina State University, Raleigh
ABSTRACT
Large-scale hosting infrastructures have become the fundamental platforms for many real-world systems, such as cloud computing infrastructures, enterprise data centers, and massive data processing systems. However, achieving both scalability and high precision is challenging when monitoring a large number of intranode and internode attributes (e.g., CPU usage, free memory, free disk space, internode network delay). In this paper, we present the design and implementation of a Resilient self-Compressive Monitoring (RCM) system for large-scale hosting infrastructures. RCM achieves scalable distributed monitoring by performing online data compression to reduce the cost of remote data collection. RCM also provides failure resilience, achieving robust monitoring for dynamic distributed systems in which host and network failures are common. We have conducted extensive experiments using real monitoring data from NCSU's Virtual Computing Lab (VCL), PlanetLab, and a Google cluster, as well as real Internet traffic matrices. The experimental results show that RCM can achieve up to 200 percent higher compression ratio and several orders of magnitude less overhead than existing approaches.
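The abstract does not describe RCM's compression algorithm. To illustrate what online compression of a single monitoring stream can look like, here is a minimal sketch using a generic technique that is an assumption and not RCM's method: a node transmits a sample only when it deviates from the last transmitted value by more than a tolerance, and the collector rebuilds the stream by holding the last received value. The function names (`compress_stream`, `reconstruct`, `compression_ratio`) are hypothetical, and defining compression ratio as original samples divided by transmitted samples is likewise an assumption.

```python
from typing import List, Optional, Tuple

# Hypothetical helpers illustrating generic error-bounded online compression;
# this is NOT the RCM algorithm, which the abstract does not specify.


def compress_stream(samples: List[float], tolerance: float) -> List[Tuple[int, float]]:
    """Emit (index, value) only when a sample differs from the last emitted
    value by more than `tolerance`. The first sample is always emitted."""
    sent: List[Tuple[int, float]] = []
    last: Optional[float] = None
    for i, value in enumerate(samples):
        if last is None or abs(value - last) > tolerance:
            sent.append((i, value))
            last = value
    return sent


def reconstruct(sent: List[Tuple[int, float]], length: int) -> List[Optional[float]]:
    """Collector side: rebuild `length` samples by holding the last received value."""
    out: List[Optional[float]] = []
    j = 0
    current: Optional[float] = None
    for i in range(length):
        if j < len(sent) and sent[j][0] == i:
            current = sent[j][1]
            j += 1
        out.append(current)
    return out


def compression_ratio(n_original: int, n_sent: int) -> float:
    """Assumed definition: original samples per transmitted sample."""
    return n_original / max(n_sent, 1)


if __name__ == "__main__":
    cpu = [10.0, 10.2, 10.1, 35.0, 35.3, 34.9, 10.0]  # made-up CPU-usage samples
    sent = compress_stream(cpu, tolerance=1.0)
    print(sent)                                   # [(0, 10.0), (3, 35.0), (6, 10.0)]
    print(reconstruct(sent, len(cpu)))
    print(compression_ratio(len(cpu), len(sent)))
```

With this scheme every reconstructed value stays within `tolerance` of the true sample, so a single parameter trades precision against transmission cost. The sketch ignores failure handling, which the abstract identifies as a core concern of RCM, and any structure shared across attributes or nodes.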
INDEX TERMS
Monitoring, Training, Image coding, Peer to peer computing, Data compression, Distributed databases, Measurement, distributed system monitoring, Online data compression
CITATION
Yongmin Tan, Vinay Venkatesh, Xiaohui Gu, "Resilient Self-Compressive Monitoring for Large-Scale Hosting Infrastructures", IEEE Transactions on Parallel & Distributed Systems, vol.24, no. 3, pp. 576-586, March 2013, doi:10.1109/TPDS.2012.167