loading...
 This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
A Complete Compiler Approach to Auto-Parallelizing C Programs for Multi-DSP Systems
March 2005 (vol. 16 no. 3)
pp. 234-245

Abstract—Auto-parallelizing compilers for embedded applications have been unsuccessful due to the widespread use of pointer arithmetic and the complex memory model of multiple-address space digital signal processors (DSPs). This paper develops, for the first time, a complete auto-parallelization approach, which overcomes these issues. It first combines a pointer conversion technique with a new modulo elimination transformation for program recovery enabling later parallelization stages. Next, it integrates a novel data transformation technique that exposes the processor location of partitioned data. When this is combined with a new address resolution mechanism, it generates efficient programs that run on multiple address spaces without using message passing. Furthermore, as DSPs do not possess any data cache structure, an optimization is presented which transforms the program to both exploit remote data locality and local memory bandwidth. This parallelization approach is applied to the DSPstone and UTDSP benchmark suites, giving an average speedup of 3.78 on four Analog Devices TigerSHARC TS-101 processors.

[1] 234 R. Leupers, “Novel Code Optimization Techniques for DSPS,” Proc. Second European DSP Education and Research Conf., 1998.[2] V. Zivojnovic, J.M. Velarde, C. Schlager, and H. Meyr, “DSPstone: A DSP-Oriented Benchmarking Methodology,” Proc. Int'l Conf. Signal Processing Applications & Technology (ICSPAT '94), pp. 715-720, 1994.[3] S.L. Scott, “Synchronization and Communication in the T3E Multiprocessor,” Proc. Seventh Int'l Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS-VII), pp. 26-36, 1996.[4] S. Chakrabarti, M. Gupta, and J.-D. Choi, “Global Communication Analysis and Optimization,” Proc. SIGPLAN Conf. Programming Language Design and Implementation (PLDI '96), pp. 68-78, 1996.[5] S. Hiranandani, K. Kennedy, and C.-W. Tseng, “Compiling Fortran D for MIMD Distributed-Memory Machines,” Comm. ACM, vol. 35, no. 8, pp. 66-80, 1992.[6] J. Mellor-Crummey, V. Adve, B. Broom, D. Chavarria-Miranda, R. Fowler, G. Jin, K. Kennedy, and Q. Yi, “Advanced Optimization Strategies in the Rice DHPF Compiler,” Concurrency-Practice and Experience, vol. 14, nos. 8-9, pp. 741-767, 2002.[7] C.G. Lee and M. Stoodley, “UTDSP Benchmark Suite,” Univ. of Toronto, Canada, 1992, http://www.eecg.toronto.edu/corinna/DSP/ infrastructureUTDSP.html.[8] J.M. Anderson, S.P. Amarasinge, and M.S. Lam, “Data and Computation Transformations for Multiprocessors,” Proc. Fifth ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP '95), pp. 166-178, 1995.[9] M.F.P. O'Boyle and P.M.W. Knijnenburg, “Integrating Loop and Data Transformations for Global Optimisation,” J. Parallel and Distributed Computing, vol. 62, no. 4, pp. 563-590, 2002.[10] B. Franke and M.F.P. O'Boyle, “Array Recovery and High-Level Transformations for DSP Applications,” ACM Trans. Embedded Computing Systems, vol. 2, no. 2, pp. 132-162, 2003.[11] B. Bixby, K. Kennedy, and U. Kremer, “Automatic Data Layout Using 0-1 Integer Programming,” Proc. Parallel Architectures and Compiler Technology Conf. (PACT '94), 1994.[12] J. Garcia, E. Ayguad, and J. Labarta, “A Framework for Integrating Data Alignment, Distribution, and Redistribution in Distributed Memory Multiprocessors,” IEEE Trans. Parallel and Distributed Systems, vol. 12, no. 4, Apr. 2001.[13] D. Bau, I. Kodukla, V. Kotlyar, K. Pingali, and P. Stodghill, “Solving Alignment Using Elimentary Linear Algebra,” Proc. Seventh Int'l Workshop Languages and Compilers for Parallel Computing (LCPC '94), pp. 46-60, 1994.[14] K. Knobe, J. Lukas, and G. Steele, “Data Optimization: Allocation of Arrays to Reduce Communication on SIMD Machines,” J. Parallel and Distributed Computing, vol. 8, no. 2, pp. 102-118, 1990.[15] S. Chatterjee, J.R. Gilbert, L. Oliker, R. Schreiber, and T.J. Sheffler, “Algorithms for Automatic Alignment of Arrays,” J. Parallel and Distributed Computing, vol. 38, no. 2, pp. 145-157, 1996.[16] M.F.P. O'Boyle and F. Bodin, “Compiler Reduction of Synchronisation in Shared Virtual Memory Systems,” Proc. Ninth Int'l Conf. Supercomputing (ICS '95), pp. 318-327, 1995.[17] M.F.P. O'Boyle and E.A. Stohr, “Compile Time Barrier Synchronization Minimization,” IEEE Trans. Parallel and Distributed Systems, vol. 13, no. 6, pp. 529-543, June 2002.[18] W. Pugh, “Counting Solutions to Presburger Formulas: How and Why,” Proc. SIGPLAN Conf. Programming Languages Design and Implementation (PLDI '94), pp. 121-134, 1994.[19] M.W. Hall, J.M. Anderson, S.P. Amarasinghe, B.R. Murphy, S.-W. Liao, E. Bugnion, and M.S. Lam, “Maximizing Multiprocessor Performance with the SUIF Compiler,” Computer, vol. 29, no. 12, pp. 84-89, Dec. 1996.[20] R. Gupta, S. Pande, K. Psarris, and V. Sakar, “Compilation Techniques for Parallel Systems,” Parallel Computing, vol. 25, nos. 13-14, pp. 1741-1783, 1999.[21] J.R. Larus, “Compiling for Shared-Memory and Message-Passing Computers,” ACM Letters Programming Languages and Systems, vol. 2, nos. 1-4, pp. 165-180, 1993.[22] M. Kandemir, J. Ramanujam, and A. Choudhary, “Improving Cache Locality by a Combination of Loop and Data Transformations,” IEEE Trans. Computers, vol. 48, no. 2, pp. 159-167, Feb. 1999.[23] Y. Paek, A.G. Navarro, E.L. Zapata, and D.A. Padua, “Parallelization of Benchmarks for Scalable Shared-Memory Multiprocessors,” Proc. Conf. Parallel Architectures and Compiler Technology (PACT '98), 1998.[24] F. Balasa, F.H.M. Franssen, F.V.M. Catthoor, and H.J. De Man, “Transformation of Nested Loops with Modulo Indexing to Affine Recurrences,” Parallel Processing Letters, vol. 4, no. 3, pp. 271-280, 1994.[25] S. Carr, K.S. McKinley, and C.-W. Tseng, “Compiler Optimizations for Improving Data Locality,” Proc. Sixth Int'l Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI), pp. 252-262, 1994.[26] J. Teich and L. Thiele, “Uniform Design of Parallel Programs for DSP,” Proc. IEEE Int'l Symp. Circuits and Systems (ISCAS '91), pp. 344a-347a, 1991.[27] A. Kalavade, J. Othmer, B. Ackland, and K.J. Singh, “Software Environment for a Multiprocessor DSP,” Proc. 36th ACM/IEEE Design Automation Conf. (DAC '99), 1999.[28] D.M. Lorts, “Combining Parallelization Techniques to Increase Adaptability and Efficiency of Multiprocessing DSP Systems,” Proc. Ninth DSP Workshop (DSP 2000)— First Signal Processing Education Workshop (SPEd 2000), 2000.[29] I. Karkowski and H. Corporaal, “Exploiting Fine- and Coarse-Grain Parallelism in Embedded Programs,” Proc. Int'l Conf. Parallel Architectures and Compilation Techniques (PACT '98), pp. 60-67, 1998.

Index Terms:
Parallel processors, interprocessor communications, real-time and embedded systems, signal processing systems, measurement, evaluation, modeling, simulation of multiple-processor systems, conversion from sequential to parallel forms, restructuring, reverse engineering, and reengineering, performance measures, compilers, arrays.
Citation:
Bj? Franke, Michael F.P. O'Boyle, "A Complete Compiler Approach to Auto-Parallelizing C Programs for Multi-DSP Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 3, pp. 234-245, Mar. 2005, doi:10.1109/TPDS.2005.26
Usage of this product signifies your acceptance of the Terms of Use.