Toward Fine-Grained, Unsupervised, Scalable Performance Diagnosis for Production Cloud Computing Systems
Issue No. 06 - June (2013 vol. 24)
DOI: http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.21
Haibo Mi, National University of Defense Technology, Changsha
Huaimin Wang, National University of Defense Technology, Changsha
Yangfan Zhou, The Chinese University of Hong Kong, Shatin
Michael Rung-Tsong Lyu, The Chinese University of Hong Kong, Hong Kong
Hua Cai, Alibaba Cloud Computing, Alibaba Inc., Hangzhou
Performance diagnosis is labor-intensive in production cloud computing systems. Such systems typically face many real-world challenges that existing diagnosis techniques for distributed systems cannot effectively address. An efficient, unsupervised tool for locating fine-grained performance anomalies in production cloud computing systems is still lacking. This paper proposes CloudDiag to bridge this gap. By combining a statistical technique with a fast matrix recovery algorithm, CloudDiag efficiently pinpoints the fine-grained causes of performance problems without requiring any domain-specific knowledge of the target system. CloudDiag has been applied to a practical production cloud computing system to diagnose performance problems. We demonstrate its effectiveness in three real-world case studies.
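The abstract does not name the statistical technique CloudDiag uses. Purely as a hedged illustration of what an unsupervised statistical screen of this kind can look like, the sketch below flags request categories whose latencies vary unusually, measured by the coefficient of variation (standard deviation divided by mean). It assumes requests have already been grouped into categories with latency samples. The helper names `coefficient_of_variation` and `flag_high_variance_categories`, the use of CV, and the 0.5 threshold are all hypothetical choices made here, not CloudDiag's documented method or parameters. The matrix recovery step is not shown.

```python
from statistics import mean, pstdev


def coefficient_of_variation(samples):
    """Population standard deviation divided by the mean (caller ensures mean > 0)."""
    return pstdev(samples) / mean(samples)


def flag_high_variance_categories(latencies_by_category, threshold=0.5):
    """Return category names whose latency CV exceeds `threshold`, highest CV first.

    Hypothetical helper: using CV and this threshold are illustrative
    assumptions, not parameters taken from CloudDiag.
    """
    # Skip categories with too few samples or a non-positive mean latency.
    scored = {
        name: coefficient_of_variation(samples)
        for name, samples in latencies_by_category.items()
        if len(samples) >= 2 and mean(samples) > 0
    }
    flagged = [name for name, cv in scored.items() if cv > threshold]
    return sorted(flagged, key=lambda name: scored[name], reverse=True)


if __name__ == "__main__":
    # Hypothetical latency samples in milliseconds.
    latencies_ms = {
        "read_block": [10, 11, 9, 10, 10],
        "write_block": [20, 21, 19, 200, 20],
    }
    print(flag_high_variance_categories(latencies_ms))
```

Running it prints `['write_block']`: the single 200 ms outlier gives that category a CV of about 1.29, while `read_block` stays near 0.06. A screen like this only narrows the search to suspicious request categories. Localizing the culprit within a category is the job of a separate step, which the abstract attributes to matrix recovery.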
Keywords: Production, Cloud computing, Electronic mail, Synchronization, Time factors, Data collection, Clocks, request tracing, performance diagnosis
H. Mi, H. Wang, Y. Zhou, M. R. Lyu and H. Cai, "Toward Fine-Grained, Unsupervised, Scalable Performance Diagnosis for Production Cloud Computing Systems," in IEEE Transactions on Parallel & Distributed Systems, vol. 24, no. 6, pp. 1245-1255, 2013.