Issue No.09 - September (2011 vol.23)
pp: 1328-1344
ABSTRACT
Monitoring the global state of a distributed cloud application is a critical function in cloud datacenter management. State monitoring must meet two demanding objectives: a high level of correctness, which ensures a zero or low error rate, and high communication efficiency, which demands minimal communication cost in detecting state updates. Most existing work follows an instantaneous model, which triggers a state alert whenever a constraint is violated. This model may cause frequent and unnecessary alerts due to momentary value bursts and outliers, and countermeasures taken in response to such alerts may in turn cause problematic operations. In this paper, we present a WIndow-based StatE monitoring (WISE) framework for efficiently managing cloud applications. Window-based state monitoring reports an alert only when a state violation is continuous within a time window. We show that it is not only more resilient to value bursts and outliers, but also saves considerable communication when implemented in a distributed manner, based on four technical contributions. First, we present the architectural design and deployment options for window-based state monitoring with centralized parameter tuning. Second, we develop a new distributed parameter tuning scheme that enables WISE to scale to many more monitoring nodes, as each node tunes its monitoring parameters reactively without global information. Third, we introduce two optimization techniques, including their design rationale, correctness, and usage model, to further reduce communication cost. Finally, we provide an in-depth empirical study of the scalability of WISE, and evaluate the improvement brought by the distributed tuning scheme and the two performance optimizations. Our results show that WISE reduces communication by 50-90 percent compared with instantaneous monitoring approaches, and that the improved WISE gains a clear scalability advantage over its centralized version.
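The difference between the instantaneous model and the window-based model described above can be illustrated with a minimal single-stream sketch. This is not the distributed WISE algorithm or its parameter tuning; the function names, the threshold, the window length, and the sample trace are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, List


def instantaneous_alerts(values: Iterable[float], threshold: float) -> List[int]:
    """Instantaneous model: alert at every time index where the value exceeds the threshold."""
    return [t for t, v in enumerate(values) if v > threshold]


def window_alerts(values: Iterable[float], threshold: float, window: int) -> List[int]:
    """Window-based model: alert only when the last `window` values all exceeded the threshold."""
    alerts = []
    run = 0  # length of the current run of consecutive violations
    for t, v in enumerate(values):
        run = run + 1 if v > threshold else 0
        if run >= window:
            alerts.append(t)
    return alerts


# Hypothetical trace: one momentary burst at t=1, then a sustained violation from t=4 to t=7.
trace = [10, 95, 12, 11, 91, 92, 93, 94, 15]
print(instantaneous_alerts(trace, 90))  # [1, 4, 5, 6, 7]
print(window_alerts(trace, 90, 3))      # [6, 7]
```

In this toy trace the momentary burst at index 1 triggers an instantaneous alert but no window-based alert, which is the resilience to bursts and outliers that the abstract refers to.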
INDEX TERMS
optimisation, cloud computing, distributed processing, communication cost, state monitoring, distributed cloud application, cloud datacenter management, high communication efficiency, minimal communication cost, triggers state alert, momentary value, window-based state monitoring framework, state violation, architectural design, centralized parameter tuning, distributed parameter tuning scheme, WISE, optimization technique, Monitoring, Servers, Optimization, Tuning, Scalability, Knowledge engineering, Measurement, State monitoring, datacenter, cloud, distributed, aggregation
CITATION
"State Monitoring in Cloud Datacenters", IEEE Transactions on Knowledge & Data Engineering, vol.23, no. 9, pp. 1328-1344, September 2011, doi:10.1109/TKDE.2011.70