Issue No. 5 - May 2013 (vol. 62)
pp. 1004-1016
Lin Gao , University of New South Wales, Sydney
Lian Li , University of New South Wales, Sydney
Jingling Xue , University of New South Wales, Sydney
Pen-Chung Yew , University of Minnesota, Twin Cities
ABSTRACT
Research on compiler techniques for thread-level loop speculation has so far focused on studying its performance limits: loop candidates worth parallelizing are selected manually by the researchers or based on extensive profiling and preexecution. It is therefore difficult to include these techniques in a production compiler for speculative multithreaded multicore processors. In a way, existing techniques are statically adaptive ("realized" by the researchers for different inputs) yet dynamically greedy (since all iterations of all selected loop candidates are always parallelized at run time). This paper introduces a Statically GrEEdy and Dynamically Adaptive (SEED) approach for thread-level speculation on loops that is quite different from most existing techniques. SEED relies on the compiler to select and optimize loop candidates greedily (possibly in an input-independent way) and provides a runtime scheduler to schedule loop iterations adaptively. To select loops for parallelization at runtime (subject to program inputs), loop iterations are prioritized in terms of their potential benefits rather than their degree of speculation, as in many prior studies. In our current implementation, the benefits of speculative threads are estimated by a simple yet effective cost model, which comprises a mechanism for efficiently tracing the loop nesting structures of the program and a mechanism for predicting the outcome of speculative threads. We have evaluated SEED using a set of SPECint2000 and Olden benchmarks. Compared to existing techniques with a program's loop candidates ideally selected a priori, SEED achieves comparable or better performance while automating the entire loop candidate selection process.
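The central idea described above, scheduling loop iterations by their estimated benefit rather than speculating on every iteration of every selected loop, can be illustrated with a minimal sketch. The cost model and all names below (estimated_benefit, success_prob, spawn_cost) are illustrative assumptions for exposition only, not the paper's actual formulas or runtime interface.

```cpp
// Hypothetical sketch of benefit-based iteration scheduling in the spirit of
// SEED's runtime scheduler. The compiler is assumed to have greedily selected
// the loop candidates; the scheduler decides which iterations to speculate on.
#include <iostream>
#include <queue>
#include <vector>

struct SpeculativeTask {
    int loop_id;          // which compiler-selected loop candidate the iteration belongs to
    int iteration;        // iteration index within that loop
    double exec_cycles;   // predicted sequential cost of the iteration
    double success_prob;  // predicted probability the speculation is not squashed
    double spawn_cost;    // overhead of forking and committing the speculative thread
};

// Assumed cost model: expected cycles saved by running the iteration
// speculatively instead of sequentially.
double estimated_benefit(const SpeculativeTask& t) {
    return t.success_prob * t.exec_cycles - t.spawn_cost;
}

struct ByBenefit {
    bool operator()(const SpeculativeTask& a, const SpeculativeTask& b) const {
        return estimated_benefit(a) < estimated_benefit(b);  // max-heap on benefit
    }
};

int main() {
    std::priority_queue<SpeculativeTask, std::vector<SpeculativeTask>, ByBenefit> ready;
    ready.push({/*loop*/1, /*iter*/0, 5000.0, 0.9, 300.0});
    ready.push({/*loop*/2, /*iter*/7,  800.0, 0.4, 300.0});
    ready.push({/*loop*/1, /*iter*/1, 5000.0, 0.8, 300.0});

    int free_cores = 2;  // cores available for speculative threads
    while (free_cores > 0 && !ready.empty()) {
        SpeculativeTask t = ready.top();
        ready.pop();
        if (estimated_benefit(t) <= 0.0) break;  // not worth speculating; run sequentially
        std::cout << "spawn loop " << t.loop_id << " iter " << t.iteration
                  << " (benefit " << estimated_benefit(t) << " cycles)\n";
        --free_cores;
    }
    return 0;
}
```

Under these assumptions, iterations with large predicted savings and high predicted success probability are spawned first, and speculation stops once the expected benefit no longer outweighs the spawn overhead; this mirrors, at a toy scale, prioritizing iterations by potential benefit rather than by degree of speculation.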
INDEX TERMS
Parallel processing, Instruction sets, Runtime, Optimization, Hardware, Dynamic scheduling, Thread-level speculation, Speculative compilation, Loop-level speculation
CITATION
Lin Gao, Lian Li, Jingling Xue, and Pen-Chung Yew, "SEED: A Statically Greedy and Dynamically Adaptive Approach for Speculative Loop Execution," IEEE Transactions on Computers, vol. 62, no. 5, pp. 1004-1016, May 2013, doi: 10.1109/TC.2012.41.