Automatic Mapping of System of N-Dimensional Affine Recurrence Equations (SARE) onto Distributed Memory Parallel Systems
March 2000 (vol. 26 no. 3)
pp. 262-275

Abstract: Automatic extraction of parallelism from algorithms, and the consequent generation of parallel code, is a challenging problem. In this work, we present a procedure for automatic parallel code generation for algorithms described through a Set of Affine Recurrence Equations (SARE): starting from the original SARE description in an N-dimensional iteration space, the algorithm is converted into parallel code for a (possibly virtual) m-dimensional distributed memory parallel machine ($m<N$). In the paper, we prove some theorems that form the mathematical basis of the proposed parallel code generation tool. The projection technique used in the tool is based on the polytope model. Affine transformations are introduced to project the polytope from the original iteration space onto another polytope in the processor-time $({\rm t,p})$ space, preserving the SARE semantics. Points in $({\rm t,p})$ are identified by the m-dimensional p coordinate and the n-dimensional t coordinate, so that $N=n+m$. Along with the polytope transformation, a methodology for generating the code within processors is given. Finally, a cost function is introduced; it guides the heuristic search for the polytope transformation and is derived from the actual implementation of the method on an MPP SIMD machine.
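
As an illustration of the projection idea only, the following minimal Python sketch maps a small 2-dimensional iteration space onto a processor-time space with $N=2$, $n=1$, and $m=1$. Everything concrete in it is an assumption made for the example and is not taken from the paper: the 4x4 domain, the dependence vectors of a hypothetical uniform recurrence X[i,j] = f(X[i-1,j], X[i,j-1]), the matrix T, and the helper names. It also assumes a purely linear unimodular map rather than a general affine one, for simplicity.

```python
from itertools import product

# Assumed sizes for the illustration: N = n + m = 1 + 1.
N_TIME = 1

# Assumed transformation: first N_TIME rows give t, the remaining rows give p.
T = [[1, 1],   # t = i + j
     [0, 1]]   # p = j

# Assumed dependence vectors of X[i, j] = f(X[i-1, j], X[i, j-1]).
DEPENDENCES = [(1, 0), (0, 1)]


def det2(a):
    """Determinant of a 2x2 integer matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def apply(matrix, point):
    """Matrix-vector product over integers, returned as a tuple."""
    return tuple(sum(r * x for r, x in zip(row, point)) for row in matrix)


def split(image):
    """Split a transformed point into its (t, p) parts."""
    return image[:N_TIME], image[N_TIME:]


def is_causal(matrix, deps):
    """True if every dependence advances time (t part lexicographically > 0)."""
    zero = (0,) * N_TIME
    return all(apply(matrix, d)[:N_TIME] > zero for d in deps)


# Project every point of the assumed iteration domain into (t, p).
domain = list(product(range(4), repeat=2))
mapping = {pt: split(apply(T, pt)) for pt in domain}

if __name__ == "__main__":
    print("unimodular:", abs(det2(T)) == 1)
    print("causal:", is_causal(T, DEPENDENCES))
    for pt in [(0, 0), (1, 2), (2, 3)]:
        t, p = mapping[pt]
        print(f"iteration {pt} -> time {t}, processor {p}")
```

In this sketch, unimodularity guarantees that distinct integer iteration points land on distinct (t, p) pairs, and the causality check stands in for preserving the recurrence semantics: a value is never scheduled before the values it depends on. The cost function and heuristic search mentioned in the abstract are not modeled here.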

Index Terms:
Automatic parallelization, polytope model, affine functions, n-dimensional projection, SARE.
Alessandro Marongiu, Paolo Palazzari, "Automatic Mapping of System of N-Dimensional Affine Recurrence Equations (SARE) onto Distributed Memory Parallel Systems," IEEE Transactions on Software Engineering, vol. 26, no. 3, pp. 262-275, March 2000, doi:10.1109/32.842951