Spin Detection Hardware for Improved Management of Multithreaded Systems
June 2006 (vol. 17 no. 6)
pp. 508-521

Abstract—Spinning is a synchronization mechanism commonly used in applications and operating systems. Excessive spinning, however, often indicates performance or correctness (e.g., livelock) problems. Detecting if applications and operating systems are spinning is essential for achieving high performance, especially in consolidated servers running virtual machines. Prior research has used source or binary instrumentation to detect spinning. However, these approaches place a significant burden on programmers and may even be infeasible in certain situations. In this paper, we propose efficient hardware to detect spinning in unmodified applications and operating systems. Based on this hardware, we develop 1) scheduling and power policies that adaptively manage resources for spinning threads, 2) system support that helps detect when a multithreaded program is livelocked, and 3) hardware performance counters that accurately reflect system performance. Using full-system simulation with SPEC OMP, SPLASH-2, and Wisconsin commercial workloads, we demonstrate that our mechanisms effectively improve the management of multithreaded systems.
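For readers unfamiliar with the term, the following is a minimal software sketch of what "spinning" means, not the hardware mechanism the paper proposes. It assumes a shared flag polled in a busy-wait loop; the names `spin_until_set`, `demo`, and `max_spins` are hypothetical and chosen only for illustration. The bounded loop shows how a thread that never observes the flag change burns iterations without making progress, which is the kind of behavior the abstract associates with performance or livelock problems.

```python
import threading


def spin_until_set(flag, max_spins):
    """Busy-wait on flag["ready"].

    Returns (succeeded, spins), where spins counts loop iterations that
    did no useful work. max_spins is an illustrative bound so the sketch
    always terminates; a real spin loop typically has no such bound.
    """
    spins = 0
    while not flag["ready"]:
        spins += 1  # wasted iteration: no program state of interest changes
        if spins >= max_spins:
            return False, spins  # gave up: nobody released the waiter
    return True, spins


def demo():
    """Start one spinning thread, then release it from the main thread."""
    flag = {"ready": False}
    result = {}

    def waiter():
        result["outcome"] = spin_until_set(flag, max_spins=50_000_000)

    t = threading.Thread(target=waiter)
    t.start()
    flag["ready"] = True  # the event the waiter is spinning on
    t.join()
    return result["outcome"]


if __name__ == "__main__":
    ok, spins = demo()
    print("released:", ok, "wasted iterations:", spins)
    # A waiter that is never released exhausts its bound.
    print(spin_until_set({"ready": False}, max_spins=1000))
```

The number of wasted iterations in `demo` depends on thread scheduling and varies from run to run. The second call always exhausts its bound because no other thread ever sets the flag.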

Index Terms:
Deadlock, livelock, multiprocessor, multithreaded system, performance counter, scheduling, spinning, synchronization, virtualization.
Tong Li, Alvin R. Lebeck, Daniel J. Sorin, "Spin Detection Hardware for Improved Management of Multithreaded Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 17, no. 6, pp. 508-521, June 2006, doi:10.1109/TPDS.2006.78