The Complexity of Optimal Job Co-Scheduling on Chip Multiprocessors and Heuristics-Based Solutions
July 2011 (vol. 22 no. 7)
pp. 1192-1205
Yunlian Jiang, The College of William and Mary, Williamsburg
Kai Tian, The College of William and Mary, Williamsburg
Xipeng Shen, The College of William and Mary, Williamsburg
Jinghe Zhang, University of North Carolina at Chapel Hill, Chapel Hill
Jie Chen, Thomas Jefferson National Accelerator Facility, VA
Rahul Tripathi, University of South Florida, Tampa
In Chip Multiprocessor (CMP) architectures, multiple cores commonly share some on-chip cache. The sharing may cause cache thrashing and contention among co-running jobs. Job co-scheduling tackles the problem by assigning jobs to cores so that the contention and the consequent performance degradation are minimized. Job co-scheduling includes two tasks: estimating co-run performance and determining suitable co-schedules. Most existing studies in job co-scheduling have concentrated on the first task but rely on simple techniques (e.g., trying different schedules) for the second. This paper presents a systematic exploration of the second task. It uncovers the computational complexity of determining optimal job co-schedules, proving its NP-completeness. It introduces a set of algorithms, based on graph theory and Integer/Linear Programming, for computing optimal co-schedules or their lower bounds in scenarios with or without job migrations. For complex cases, it proposes several heuristics-based algorithms and empirically demonstrates that they can approximate the optimal schedules effectively. These findings may facilitate the assessment of job co-schedulers by providing necessary baselines, and may offer insights for the development of co-scheduling algorithms in practical systems.
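
To make the second task concrete, the following is a minimal sketch, not the paper's algorithm. It assumes dual-core chips, so a co-schedule is a pairing of jobs and the best one is a minimum-weight perfect matching (a term listed in the index terms). The degradation figures in `COST` and the function names are hypothetical, and the exhaustive search merely stands in for a polynomial-time matching algorithm. A simple greedy heuristic is included for comparison.

```python
from typing import Dict, FrozenSet, List, Tuple

Pair = Tuple[int, int]
CostTable = Dict[FrozenSet[int], int]

# Hypothetical co-run degradation (percent slowdown summed over both jobs)
# for every pair of four jobs sharing a cache. Not measured data.
COST: CostTable = {
    frozenset((0, 1)): 10, frozenset((0, 2)): 20, frozenset((0, 3)): 50,
    frozenset((1, 2)): 50, frozenset((1, 3)): 20, frozenset((2, 3)): 90,
}


def optimal_pairing(jobs: List[int], cost: CostTable) -> Tuple[int, List[Pair]]:
    """Exhaustively search all pairings for the minimum total degradation.

    Assumes an even number of jobs. The first job is paired with each
    possible partner and the remaining jobs are solved recursively.
    """
    if not jobs:
        return 0, []
    first, rest = jobs[0], jobs[1:]
    best_total, best_pairs = float("inf"), []
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        sub_total, sub_pairs = optimal_pairing(remaining, cost)
        total = cost[frozenset((first, partner))] + sub_total
        if total < best_total:
            best_total, best_pairs = total, [(first, partner)] + sub_pairs
    return best_total, best_pairs


def greedy_pairing(jobs: List[int], cost: CostTable) -> Tuple[int, List[Pair]]:
    """Heuristic: repeatedly co-schedule the cheapest pair still available."""
    free = sorted(jobs)
    total, pairs = 0, []
    while free:
        candidates = [(a, b) for a in free for b in free if a < b]
        a, b = min(candidates, key=lambda p: cost[frozenset(p)])
        total += cost[frozenset((a, b))]
        pairs.append((a, b))
        free = [j for j in free if j not in (a, b)]
    return total, pairs


if __name__ == "__main__":
    print(optimal_pairing([0, 1, 2, 3], COST))  # (40, [(0, 2), (1, 3)])
    print(greedy_pairing([0, 1, 2, 3], COST))   # (100, [(0, 1), (2, 3)])
```

On this toy input the greedy choice of the cheapest pair (jobs 0 and 1) forces the most expensive pair (jobs 2 and 3) together, which illustrates why an optimal baseline is useful when assessing a heuristic co-scheduler.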

Index Terms:
Co-scheduling, shared cache, CMP scheduling, cache contention, perfect matching, integer programming.
Yunlian Jiang, Kai Tian, Xipeng Shen, Jinghe Zhang, Jie Chen, Rahul Tripathi, "The Complexity of Optimal Job Co-Scheduling on Chip Multiprocessors and Heuristics-Based Solutions," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 7, pp. 1192-1205, July 2011, doi:10.1109/TPDS.2010.193