ASCII Text
Eddy Zheng Zhang, Yunlian Jiang, Xipeng Shen, "The Significance of CMP Cache Sharing on Contemporary Multithreaded Applications," IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 2, pp. 367-374, February 2012.
BibTeX
@article{10.1109/TPDS.2011.130,
  author = {Eddy Zheng Zhang and Yunlian Jiang and Xipeng Shen},
  title = {The Significance of {CMP} Cache Sharing on Contemporary Multithreaded Applications},
  journal = {IEEE Transactions on Parallel and Distributed Systems},
  volume = {23},
  number = {2},
  issn = {1045-9219},
  year = {2012},
  pages = {367--374},
  doi = {10.1109/TPDS.2011.130},
  url = {http://doi.ieeecomputersociety.org/10.1109/TPDS.2011.130},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
RefWorks, ProCite, RefMan, EndNote (RIS)
TY  - JOUR
JO  - IEEE Transactions on Parallel and Distributed Systems
TI  - The Significance of CMP Cache Sharing on Contemporary Multithreaded Applications
IS  - 2
SN  - 1045-9219
SP  - 367
EP  - 374
A1  - Eddy Zheng Zhang
A1  - Yunlian Jiang
A1  - Xipeng Shen
PY  - 2012
KW  - Shared cache
KW  - thread scheduling
KW  - parallel program optimizations
KW  - chip multiprocessors
VL  - 23
JA  - IEEE Transactions on Parallel and Distributed Systems
ER  - 

