2013 IEEE 33rd International Conference on Distributed Computing Systems (2013)
Philadelphia, PA USA
July 8, 2013 to July 11, 2013
ISSN: 1063-6927
pp: 31-40
Qingyang Wang, Georgia Inst. of Technol., Atlanta, GA, USA
Yasuhiko Kanemasa, Cloud Comput. Res. Center, FUJITSU Labs. Ltd., Kawasaki, Japan
Jack Li, Georgia Inst. of Technol., Atlanta, GA, USA
Deepal Jayasinghe, Georgia Inst. of Technol., Atlanta, GA, USA
Toshihiro Shimizu, Cloud Comput. Res. Center, FUJITSU Labs. Ltd., Kawasaki, Japan
Masazumi Matsubara, Cloud Comput. Res. Center, FUJITSU Labs. Ltd., Kawasaki, Japan
Motoyuki Kawaba, Cloud Comput. Res. Center, FUJITSU Labs. Ltd., Kawasaki, Japan
Calton Pu, Georgia Inst. of Technol., Atlanta, GA, USA
ABSTRACT
Identifying the location of performance bottlenecks is a non-trivial challenge when scaling n-tier applications in computing clouds. Specifically, we observed that an n-tier application may experience significant performance loss when there are transient bottlenecks in component servers. Such transient bottlenecks arise frequently at high resource utilization and often result from transient events (e.g., JVM garbage collection) in an n-tier system and bursty workloads. Because of their short lifespan (e.g., milliseconds), these transient bottlenecks are difficult to detect using current system monitoring tools with sampling at intervals of seconds or minutes. We describe a novel transient bottleneck detection method that correlates throughput (i.e., request service rate) and load (i.e., number of concurrent requests) of each server in an n-tier system at fine time granularity. Both throughput and load can be measured through passive network tracing at millisecond-level time granularity. Using correlation analysis, we can identify the transient bottlenecks at time granularities as short as 50ms. We validate our method experimentally through two case studies on transient bottlenecks caused by factors at the system software layer (e.g., JVM garbage collection) and architecture layer (e.g., Intel SpeedStep).
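The per-window measurement described in the abstract can be illustrated with a minimal sketch. It assumes that request arrival and departure timestamps (in milliseconds) for a single server have already been extracted from a passive network trace. The function names, the trace representation, and the 95%-of-peak heuristic for picking a saturation load are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Tuple


def window_metrics(
    requests: List[Tuple[float, float]],
    start_ms: float,
    end_ms: float,
    window_ms: float = 50.0,
) -> Tuple[List[float], List[float]]:
    """Return per-window (loads, throughputs) for one server.

    requests: (arrival_ms, departure_ms) pairs, assumed to come from a trace.
    load: time-averaged number of concurrent requests in the window.
    throughput: requests completed in the window, in requests per second.
    """
    n = int((end_ms - start_ms) // window_ms)
    loads = [0.0] * n
    completions = [0] * n
    for arrive, depart in requests:
        for i in range(n):
            w0 = start_ms + i * window_ms
            w1 = w0 + window_ms
            # Portion of this request's lifetime that falls inside the window.
            overlap = min(depart, w1) - max(arrive, w0)
            if overlap > 0:
                loads[i] += overlap / window_ms
            if w0 <= depart < w1:
                completions[i] += 1
    throughputs = [c / (window_ms / 1000.0) for c in completions]
    return loads, throughputs


def estimate_saturation_load(
    loads: List[float], throughputs: List[float], fraction: float = 0.95
) -> float:
    """Hypothetical heuristic: the smallest load at which throughput
    reaches `fraction` of the peak observed throughput."""
    peak = max(throughputs)
    return min(l for l, t in zip(loads, throughputs) if t >= fraction * peak)


def flag_saturated(loads: List[float], load_threshold: float) -> List[int]:
    """Indices of windows whose load exceeds the given saturation load."""
    return [i for i, load in enumerate(loads) if load > load_threshold]


if __name__ == "__main__":
    # Made-up trace of four requests, for illustration only.
    trace = [(0, 40), (10, 60), (55, 90), (120, 130)]
    loads, throughputs = window_metrics(trace, start_ms=0, end_ms=150)
    print(loads, throughputs, flag_saturated(loads, load_threshold=1.0))
```

In this sketch a window is flagged when its load exceeds the chosen threshold; in practice the threshold would have to be derived from the server's own load/throughput correlation over a long enough trace.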
INDEX TERMS
Servers, Throughput, Transient analysis, Time factors, Monitoring, Time measurement, Passive networks
CITATION

Qingyang Wang et al., "Detecting Transient Bottlenecks in n-Tier Applications through Fine-Grained Analysis," 2013 IEEE 33rd International Conference on Distributed Computing Systems (ICDCS), Philadelphia, PA USA, 2013, pp. 31-40.
doi:10.1109/ICDCS.2013.17