CPU Accounting for Multicore Processors
February 2012 (vol. 61 no. 2)
pp. 251-264
Carlos Luque, Universitat Politecnica de Catalunya / Barcelona Supercomputing Center, Barcelona
Miquel Moreto, Universitat Politecnica de Catalunya, and Barcelona Supercomputing Center, Barcelona
Francisco J. Cazorla, Spanish National Research Council, and Barcelona Supercomputing Center, Barcelona
Roberto Gioiosa, Barcelona Supercomputing Center, Barcelona
Alper Buyuktosunoglu, IBM T.J. Watson Research Center, New York
Mateo Valero, Universitat Politecnica de Catalunya, and Barcelona Supercomputing Center, Barcelona
In single-threaded processors and Symmetric Multiprocessors (SMPs), the execution time of a task depends on the other tasks it runs with (the workload), since the Operating System (OS) time-shares the CPU(s) among the tasks in the workload. However, the time accounted to a task is roughly the same regardless of the workload in which it runs, since the OS does not charge the task for the periods in which it is not scheduled onto a CPU. Chip Multiprocessors (CMPs) complicate CPU utilization accounting, because the CPU time to account to a task depends not only on the time the task is scheduled onto a CPU, but also on the amount of hardware resources it receives during that period. Since hardware resources in a CMP are dynamically shared between tasks, the CPU time accounted to a task in a CMP depends on the workload it executes in. This is undesirable because the same task with the same input data set may be accounted differently depending on the workload in which it executes. In this paper, we identify how an inaccurate measurement of CPU utilization affects several key aspects of the system, such as OS statistics and the charging mechanisms of data centers. We propose a new hardware CPU accounting mechanism that improves the accuracy of CPU utilization measurement in CMPs, and we compare it with previous accounting mechanisms. Our results show that currently known mechanisms incur an average error of 16 percent in CPU utilization accounting. Our proposal reduces this error to less than 2.8 percent in a modeled 8-core processor system.
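To make the accounting problem concrete, the following Python sketch compares conventional scheduled-time accounting with a workload-independent reference. It assumes (our assumption for illustration, not a definition stated in the abstract) that the "correct" CPU time for a task is the time the same work takes when the task runs alone on the processor. `TaskRun`, `classic_accounting`, `oracle_accounting`, and all numbers are hypothetical. Estimating that reference online is the hard part a hardware accounting mechanism must solve; here it is simply given as an input.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class TaskRun:
    """One execution of a task inside some workload (all fields hypothetical)."""
    task: str
    workload: str
    scheduled_time: float  # seconds the OS kept the task on a core
    isolated_time: float   # assumed reference: seconds the same work takes running alone


Accountant = Callable[[TaskRun], float]


def classic_accounting(run: TaskRun) -> float:
    """Conventional OS accounting: charge every second the task was scheduled."""
    return run.scheduled_time


def oracle_accounting(run: TaskRun) -> float:
    """Workload-independent accounting, assuming the isolated time were known exactly."""
    return run.isolated_time


def relative_error(accounted: float, reference: float) -> float:
    """Accounting error in percent, relative to the reference CPU time."""
    return abs(accounted - reference) / reference * 100.0


def average_error(runs: Iterable[TaskRun], accountant: Accountant) -> float:
    """Mean relative error of an accounting policy over a set of runs."""
    errors = [relative_error(accountant(r), r.isolated_time) for r in runs]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical numbers: the same task and input, run in two different workloads.
    runs = [
        TaskRun("A", "A + light co-runners", scheduled_time=10.5, isolated_time=10.0),
        TaskRun("A", "A + cache-hungry co-runners", scheduled_time=13.0, isolated_time=10.0),
    ]
    for r in runs:
        charged = classic_accounting(r)
        print(f"{r.workload}: classic charges {charged:.1f} s, "
              f"error {relative_error(charged, r.isolated_time):.1f}%")
    print(f"average classic error: {average_error(runs, classic_accounting):.1f}%")
    print(f"average oracle error:  {average_error(runs, oracle_accounting):.1f}%")
```

With these made-up numbers, classic accounting charges task A 10.5 s in one workload and 13.0 s in the other (errors of 5% and 30%, 17.5% on average), while the oracle policy charges 10.0 s in both. The figures only illustrate that scheduled time alone depends on the co-runners; they are not results from the paper.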

Index Terms:
CPU accounting, chip-multiprocessor, shared last level of cache, cache partitioning algorithms.
Carlos Luque, Miquel Moreto, Francisco J. Cazorla, Roberto Gioiosa, Alper Buyuktosunoglu, Mateo Valero, "CPU Accounting for Multicore Processors," IEEE Transactions on Computers, vol. 61, no. 2, pp. 251-264, Feb. 2012, doi:10.1109/TC.2011.152