Response Time Reliability in Cloud Environments: An Empirical Study of n-Tier Applications at High Resource Utilization
Reliable Distributed Systems, IEEE Symposium on (2012)
Irvine, CA, USA
Oct. 8, 2012 to Oct. 11, 2012
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/SRDS.2012.61
When running mission-critical web-facing applications (e.g., electronic commerce) in cloud environments, predictable response time, e.g., as specified in service level agreements (SLAs), is a major performance reliability requirement. Through extensive measurements of n-tier application benchmarks in a cloud environment, we study three factors that significantly impact application response time predictability: bursty workloads (typical of web-facing applications), soft resource management strategies (e.g., global thread pool or local thread pool), and bursts in system software consumption of hardware resources (e.g., Java Virtual Machine garbage collection). Using a set of profit-based performance criteria derived from typical SLAs, we show that response time reliability is brittle, with large response time variations (on the order of several seconds) depending on each of those factors. For example, for the same workload and hardware platform, different apparently reasonable soft resource management strategies may result in profit differences of 26%. Similarly, modest increases in workload burstiness may result in profit drops of more than 50%. Our study shows that performance reliability of large-scale distributed applications is a significant and interesting research challenge. Furthermore, our results show that profit-based performance criteria may contribute significantly to the successful delimitation of performance unreliability boundaries and thus support effective management of clouds.
profit model, performance reliability, response time prediction, n-tier, web application
Q. Wang, Y. Kanemasa, J. Li, D. Jayasinghe, M. Kawaba and C. Pu, "Response Time Reliability in Cloud Environments: An Empirical Study of n-Tier Applications at High Resource Utilization," 2012 IEEE 31st International Symposium on Reliable Distributed Systems (SRDS 2012), Irvine, CA, 2012, pp. 378-383.