Issue No.07 - July (2011 vol.22)
pp: 1192-1205
Yunlian Jiang, The College of William and Mary, Williamsburg
Kai Tian, The College of William and Mary, Williamsburg
Xipeng Shen, The College of William and Mary, Williamsburg
Jinghe Zhang, University of North Carolina at Chapel Hill, Chapel Hill
Jie Chen, The Thomas Jefferson National Accelerator Facility, VA
Rahul Tripathi, University of South Florida, Tampa
In Chip Multiprocessor (CMP) architectures, multiple cores commonly share some on-chip cache. The sharing may cause cache thrashing and contention among co-running jobs. Job co-scheduling tackles the problem by assigning jobs to cores so that the contention and the consequent performance degradation are minimized. Job co-scheduling includes two tasks: estimating co-run performance and determining suitable co-schedules. Most existing studies in job co-scheduling have concentrated on the first task but rely on simple techniques (e.g., trying different schedules) for the second. This paper presents a systematic exploration of the second task. It uncovers the computational complexity of determining optimal job co-schedules, proving its NP-completeness. It introduces a set of algorithms, based on graph theory and Integer/Linear Programming, for computing optimal co-schedules or their lower bounds in scenarios with or without job migrations. For complex cases, it proposes several heuristics-based algorithms and empirically demonstrates the feasibility of approximating the optimal co-schedules effectively. These discoveries may facilitate the assessment of job co-schedulers by providing necessary baselines, and may offer insights into the development of co-scheduling algorithms in practical systems.
Co-scheduling, shared cache, CMP scheduling, cache contention, perfect matching, integer programming.
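To make the link between co-scheduling and perfect matching concrete, the following is a minimal sketch, not the paper's algorithm. It assumes two cores per shared cache, so that a co-schedule is a pairing of jobs, and it assumes a symmetric matrix of co-run degradations is already known. The function name `optimal_pairing` and the numbers in `degradation` are hypothetical, chosen only for illustration. The search is exhaustive with memoization, so it is suitable only for a handful of jobs.

```python
from functools import lru_cache


def optimal_pairing(cost):
    """Return (total_cost, pairs) of a minimum-weight perfect matching.

    cost[i][j] is an assumed, symmetric co-run degradation incurred when
    jobs i and j share a cache. Exhaustive search over bitmasks of
    unscheduled jobs; exponential in the number of jobs.
    """
    n = len(cost)
    if n % 2:
        raise ValueError("need an even number of jobs")

    @lru_cache(maxsize=None)
    def best(mask):
        # mask has bit k set when job k has not been paired yet.
        if mask == 0:
            return 0, ()
        # Always pair the lowest-numbered unscheduled job next.
        i = (mask & -mask).bit_length() - 1
        rest = mask & ~(1 << i)
        result = None
        for j in range(i + 1, n):
            if rest & (1 << j):
                sub_cost, sub_pairs = best(rest & ~(1 << j))
                total = cost[i][j] + sub_cost
                if result is None or total < result[0]:
                    result = (total, ((i, j),) + sub_pairs)
        return result

    total, pairs = best((1 << n) - 1)
    return total, list(pairs)


# Hypothetical degradations (in percent) for four jobs on two dual-core chips.
degradation = [
    [0, 30, 5, 20],
    [30, 0, 25, 10],
    [5, 25, 0, 40],
    [20, 10, 40, 0],
]

print(optimal_pairing(degradation))  # (15, [(0, 2), (1, 3)])
```

With these made-up numbers the three possible pairings cost 70, 15, and 45, so jobs 0 and 2 share one cache and jobs 1 and 3 share the other. A practical implementation would replace the exhaustive search with a polynomial-time matching algorithm or an integer-programming solver.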
Yunlian Jiang, Kai Tian, Xipeng Shen, Jinghe Zhang, Jie Chen, Rahul Tripathi, "The Complexity of Optimal Job Co-Scheduling on Chip Multiprocessors and Heuristics-Based Solutions", IEEE Transactions on Parallel & Distributed Systems, vol.22, no. 7, pp. 1192-1205, July 2011, doi:10.1109/TPDS.2010.193