Issue No.01 - January (2010 vol.21)
pp: 47-59
David A. Zier, NVIDIA Corporation, Beaverton
Ben Lee, Oregon State University, Corvallis
Thread-level parallelism (TLP) has been extensively studied as a way to overcome the limitations of exploiting instruction-level parallelism (ILP) on high-performance superscalar processors. One promising method of exploiting TLP is Dynamic Speculative Multithreading (D-SpMT), which extracts multiple threads from a sequential program without compiler support or instruction set extensions. This paper introduces Cascadia, a D-SpMT multicore architecture that provides multigrain thread-level support and is used to evaluate the performance of several benchmarks. Cascadia applies a unique sustainable IPC (sIPC) metric to a comprehensive loop tree to select the best-performing nested loop level to multithread. This paper also discusses the relationships that loops have with one another, in particular how loop nesting levels can be extended through procedures. In addition, it provides a detailed study of the effects that thread granularity and interthread dependencies have on the entire system.
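The loop-level selection mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how sIPC is computed or how Cascadia stores its loop tree, so everything below is an assumption made for illustration: the `LoopNode` class, the `best_loop_level` function, the loop names, and the sIPC values are all hypothetical. The sketch only shows the general idea of walking a loop tree (including a loop reached through a procedure call) and picking the nesting level with the highest score.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class LoopNode:
    """One loop in a hypothetical loop tree; children are loops nested inside it."""
    name: str
    sipc: float  # assumed per-loop sIPC score; how it is measured is not given in the abstract
    children: List["LoopNode"] = field(default_factory=list)


def best_loop_level(node: LoopNode, depth: int = 0) -> Tuple[LoopNode, int]:
    """Return the (loop, nesting depth) with the highest sIPC in the nest rooted at node."""
    best = (node, depth)
    for child in node.children:
        candidate = best_loop_level(child, depth + 1)
        # Strict comparison: on a tie, the loop found earlier is kept.
        if candidate[0].sipc > best[0].sipc:
            best = candidate
    return best


# Hypothetical nest: "callee_loop" stands for a loop inside a procedure called
# from "middle", which extends the nesting by one level.
tree = LoopNode("outer", 1.2, [
    LoopNode("middle", 2.5, [
        LoopNode("inner", 1.8),
        LoopNode("callee_loop", 2.1),
    ]),
])

chosen, depth = best_loop_level(tree)
print(chosen.name, depth)
```

With these made-up scores the middle loop is selected, which reflects the point that neither the outermost nor the innermost loop is necessarily the best level to multithread.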
Multithreading processors, multicore processors, simulation, speculative multithreading.
David A. Zier, Ben Lee, "Performance Evaluation of Dynamic Speculative Multithreading with the Cascadia Architecture", IEEE Transactions on Parallel & Distributed Systems, vol.21, no. 1, pp. 47-59, January 2010, doi:10.1109/TPDS.2009.47