Coarse-Grained Thread Pipelining: A Speculative Parallel Execution Model for Shared-Memory Multiprocessors
September 2001 (vol. 12 no. 9)
pp. 952-966

Abstract—This paper presents a new parallelization model, called coarse-grained thread pipelining, for exploiting speculative coarse-grained parallelism from general-purpose application programs in shared-memory multiprocessor systems. This parallelization model, which is based on the fine-grained thread pipelining model proposed for the superthreaded architecture, allows concurrent execution of loop iterations in a pipelined fashion with runtime data-dependence checking and control speculation. Speculative execution combined with runtime dependence checking allows the parallelization of a variety of program constructs that cannot be parallelized with existing runtime parallelization algorithms. The pipelined execution of loop iterations in this new technique also incurs lower parallelization overhead than other existing techniques. We evaluated the performance of this new model using real applications and a synthetic benchmark. These experiments show that programs whose grain size is sufficiently large compared to the parallelization overhead obtain significant speedup with this model. The results from the synthetic benchmark provide a means for estimating the performance obtainable from application programs parallelized with this model. The library routines developed for this thread pipelining model are also useful for evaluating the correctness of the code generated by the superthreaded compiler and for debugging and verifying the simulator for the superthreaded processor.
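To make the execution model concrete, the following is a minimal runnable sketch of pipelined loop-iteration execution with ordered commit. It is an illustration only, not the paper's library: the function names (`pipelined_loop`, `continuation`, `computation`) and the two-stage split are assumptions modeled on the abstract's description, in which each iteration first forwards its loop-carried values to the next iteration (allowing iterations to overlap in a pipeline) and then performs its remaining work speculatively, with stores buffered and committed in iteration order.

```python
import threading

def pipelined_loop(iterations, continuation, computation):
    """Sketch of coarse-grained thread pipelining (illustrative API).

    continuation(i, carried) -> loop-carried values forwarded to i+1.
    computation(i, carried)  -> dict of buffered (speculative) stores.
    """
    memory = {}                       # shared memory, updated in order
    lock = threading.Lock()
    forwarded = [threading.Event() for _ in range(iterations + 1)]
    done = [threading.Event() for _ in range(iterations)]
    carried = [None] * (iterations + 1)

    def run(i):
        forwarded[i].wait()           # wait for loop-carried values
        # Continuation stage: compute and forward the values the next
        # iteration needs, releasing it to start -- this creates the
        # pipelined overlap between iterations.
        carried[i + 1] = continuation(i, carried[i])
        forwarded[i + 1].set()
        # Computation stage: speculative work with stores buffered
        # locally instead of written to shared memory.
        buffered = computation(i, carried[i])
        # Commit buffered stores in iteration order, standing in for
        # the runtime dependence-enforcement step.
        if i > 0:
            done[i - 1].wait()
        with lock:
            memory.update(buffered)
        done[i].set()

    threads = [threading.Thread(target=run, args=(i,))
               for i in range(iterations)]
    for t in threads:
        t.start()
    carried[0] = 0                    # initial loop-carried value
    forwarded[0].set()
    for t in threads:
        t.join()
    return memory

# Example: each iteration depends on an induction value forwarded by
# its predecessor, while the body's stores commit in iteration order.
result = pipelined_loop(
    4,
    continuation=lambda i, c: c + 1,          # forward next value
    computation=lambda i, c: {f"a[{i}]": c * 10},
)
```

In a real implementation the commit step would validate buffered loads and stores against the addresses touched by earlier iterations and squash misspeculated work; the ordered commit above is a simplification of that check.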

Index Terms:
Runtime parallelization, shared-memory multiprocessors, coarse-grained parallelization, speculative execution, thread pipelining, superthreaded architecture.
Citation:
Iffat H. Kazi, David J. Lilja, "Coarse-Grained Thread Pipelining: A Speculative Parallel Execution Model for Shared-Memory Multiprocessors," IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 9, pp. 952-966, Sept. 2001, doi:10.1109/71.954629