Precise, Scalable, and Online Request Tracing for Multitier Services of Black Boxes
June 2012 (vol. 23 no. 6)
pp. 1159-1167
Bo Sang, Purdue University, West Lafayette
Jianfeng Zhan, Chinese Academy of Sciences, Beijing
Gang Lu, Chinese Academy of Sciences, Beijing
Haining Wang, College of William and Mary, Williamsburg
Dongyan Xu, Purdue University, West Lafayette
Lei Wang, Chinese Academy of Sciences, Beijing
Zhihong Zhang, Chinese Academy of Sciences, Beijing
Zhen Jia, Chinese Academy of Sciences, Beijing
As more and more multitier services are developed from commercial off-the-shelf components or heterogeneous middleware without available source code, both developers and administrators need a request tracing tool to 1) know exactly how a user request of interest travels through services of black boxes and 2) obtain macrolevel user request behaviors of services without manually analyzing massive logs. This need is heightened by IT system "agility," which requires the tracing tool to provide online performance data, since offline approaches cannot reflect system changes in real time. Moreover, considering the large scale of deployed services, a pragmatic tracing approach should be scalable in terms of the cost of collecting and analyzing logs. In this paper, we introduce a precise, scalable, and online request tracing tool for multitier services of black boxes. Our contributions are threefold. First, we propose a precise request tracing algorithm for multitier services of black boxes, which uses only application-independent knowledge. Second, we present a microlevel abstraction, the component activity graph, to represent the causal paths of each request. On the basis of this abstraction, we use dominated causal path patterns to represent repeatedly executed causal paths that account for significant fractions, and we further present a derived performance metric of causal path patterns, latency percentages of components, to enable debugging performance-in-the-large. Third, we develop two mechanisms, tracing on demand and sampling, to significantly increase the system scalability. We implement a prototype of the proposed system, called PreciseTracer, and release it as open source code. Compared with WAP5, a black-box tracing approach, PreciseTracer achieves higher tracing accuracy and faster response time. Our experimental results also show that PreciseTracer has low overhead and still achieves high tracing accuracy even if an aggressive sampling policy is adopted, indicating that PreciseTracer is a promising tracing tool for large-scale production systems.
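
The "latency percentages of components" metric named in the abstract can be illustrated with a small sketch. The abstract does not define the metric precisely, so the code below rests on assumptions that are not taken from the paper: a causal path is modeled as a flat list of hypothetical (component, start, end) segments, and a component's percentage is its share of the summed segment durations. The function name `latency_percentages` and the component names are illustrative only; this is not PreciseTracer's implementation.

```python
from collections import defaultdict


def latency_percentages(path):
    """Return each component's share (in percent) of the time spent on one causal path.

    `path` is a list of (component, start_ms, end_ms) segments. This flat
    representation is an assumption made for illustration; the paper's
    component activity graph is a richer structure.
    """
    totals = defaultdict(int)
    for component, start_ms, end_ms in path:
        if end_ms < start_ms:
            raise ValueError(f"segment for {component!r} ends before it starts")
        totals[component] += end_ms - start_ms

    whole = sum(totals.values())
    if whole == 0:
        raise ValueError("path has no measurable latency")

    return {component: 100.0 * spent / whole for component, spent in totals.items()}


# Hypothetical three-tier request: web server -> application server -> database and back.
example_path = [
    ("httpd", 0, 2),
    ("jboss", 2, 6),
    ("mysql", 6, 16),
    ("jboss", 16, 18),
    ("httpd", 18, 20),
]

if __name__ == "__main__":
    for component, share in sorted(latency_percentages(example_path).items()):
        print(f"{component}: {share:.1f}%")
```

In this made-up path the database tier accounts for half of the total time, which is the kind of macrolevel signal such a metric is meant to surface without reading raw logs.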


Index Terms:
Multitier service, black boxes, precise request tracing, micro- and macrolevel abstractions, online analysis, performance debugging, scalability.
Citation:
Bo Sang, Jianfeng Zhan, Gang Lu, Haining Wang, Dongyan Xu, Lei Wang, Zhihong Zhang, Zhen Jia, "Precise, Scalable, and Online Request Tracing for Multitier Services of Black Boxes," IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 6, pp. 1159-1167, June 2012, doi:10.1109/TPDS.2011.257