Pipelined Data Parallel Algorithms-II: Design
October 1990 (vol. 1 no. 4)
pp. 486-499

A methodology for designing pipelined data-parallel algorithms on multicomputers is presented. The design procedure starts with a sequential algorithm expressed as a nested loop with constant loop-carried dependencies. Its main focus is partitioning the loop by grouping related iterations together. Grouping is necessary to balance communication overhead against the available parallelism and to produce pipelined execution patterns, which result in pipelined data-parallel computations. A grouping must satisfy the dependence relationships among the iterations while allowing the granularity to be controlled. Various properties of grouping are studied, and methods for generating communication-efficient groupings are given. Given a grouping and an assignment of the groups to the processors, an analytic model is combined with the grouping results to describe the behavior and to estimate the performance of the resulting parallel program. Expressions characterizing the performance are derived.
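The grouping idea described above can be illustrated with a small sketch. The following Python fragment is an illustration of the general technique, not the paper's specific method: iterations of a doubly nested loop with constant loop-carried dependencies (each iteration (i, j) depends on (i-1, j) and (i, j-1)) are grouped into column blocks, and the blocks are executed in wavefront order, so block k can begin row i as soon as block k-1 has finished that row. This is exactly the pipelined execution pattern that grouping is meant to expose. All names (`sequential`, `pipelined`, `BLOCK`) are illustrative.

```python
N = 8      # problem size (illustrative)
BLOCK = 2  # group width: columns are grouped BLOCK at a time

def sequential(n):
    """Reference nested loop with constant loop-carried dependencies:
    iteration (i, j) reads (i-1, j) and (i, j-1)."""
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            up = a[i - 1][j] if i > 0 else 1
            left = a[i][j - 1] if j > 0 else 1
            a[i][j] = up + left  # dependence-carrying update
    return a

def pipelined(n, block):
    """Same computation with iterations grouped into column blocks.
    Each block models one processor; at wavefront step s, block k
    executes row i = s - k. Both dependencies of that row were
    produced in earlier steps, so the schedule is legal and the
    blocks form a pipeline."""
    a = [[0] * n for _ in range(n)]
    nblocks = (n + block - 1) // block
    for step in range(n + nblocks - 1):
        for k in range(nblocks):
            i = step - k
            if 0 <= i < n:
                for j in range(k * block, min((k + 1) * block, n)):
                    up = a[i - 1][j] if i > 0 else 1
                    left = a[i][j - 1] if j > 0 else 1
                    a[i][j] = up + left
    return a
```

Choosing `BLOCK` controls granularity: wider groups mean fewer boundary values crossing between processors (less communication) but a longer pipeline fill, which is the trade-off the grouping methodology is designed to balance.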

[1] W. L. Athas and C. L. Seitz, "Multicomputers: Message-passing concurrent computers," IEEE Comput. Mag., pp. 9-24, Aug. 1988.
[2] C. Callahan and K. Kennedy, "Compiling programs for distributed-memory multiprocessors," in Proc. 1988 Workshop on Programming Languages and Compilers for Parallel Computing, Aug. 1988.
[3] R. Cytron, "Doacross: Beyond vectorization for multiprocessors (extended abstract)," in Proc. 1986 Int. Conf. Parallel Processing, Aug. 1986, pp. 836-844.
[4] W. J. Dally and C. L. Seitz, "Deadlock-free message routing in multiprocessor interconnection networks," IEEE Trans. Comput., vol. C-36, no. 5, pp. 547-553, May 1987.
[5] K. Gallivan, W. Jalby, and D. Gannon, "On the problem of optimizing data transfers for complex memory systems," in Proc. 1988 ACM Int. Conf. Supercomput., St. Malo, France, July 1988, pp. 238-253.
[6] D. C. Grunwald and D. A. Reed, "Networks for parallel processors: Measurements and prognostications," in Proc. Third Conf. Hypercube Concurrent Comput. Appl., vol. I, 1988, pp. 610-619.
[7] M. A. Holliday, "Page table management in local/remote architectures," in Proc. 1988 ACM Int. Conf. Supercomput., July 1988, pp. 1-8.
[8] K. Hwang, "Advanced parallel processing with supercomputer architectures," Proc. IEEE, pp. 1348-1379, Oct. 1987.
[9] C. T. King, W. H. Chou, and L. M. Ni, "Pipelined data parallel algorithms-Part I: Concept and modeling," IEEE Trans. Parallel Distributed Syst., this issue, pp. 470-485.
[10] C. Koelbel, P. Mehrotra, and J. Von Rosendale, "Semi-automatic process partitioning for parallel computation," Int. J. Parallel Programming, vol. 16, no. 5, pp. 365-382, 1987.
[11] D. J. Kuck, R. H. Kuhn, B. Leasure, D. A. Padua, and M. Wolfe, "Compiler transformation of dependence graphs," in Conf. Rec. 8th ACM Symp. Principles Program. Languages, Williamsburg, VA, Jan. 1981.
[12] P. Lee and Z. M. Kedem, "Synthesizing linear array algorithms from nested for loop algorithms," IEEE Trans. Comput., vol. 37, no. 12, pp. 1578-1598, Dec. 1988.
[13] G. Li and B. W. Wah, "The design of optimal systolic arrays," IEEE Trans. Comput., pp. 66-77, Jan. 1985.
[14] K. Li, "IVY: A shared virtual memory system for parallel computing," in Proc. 1988 Int. Conf. Parallel Processing, vol. 2, Aug. 1988, pp. 94-101.
[15] W. L. Miranker and A. Winkler, "Spacetime representations of computational structures," Computing, vol. 32, pp. 93-114, 1984.
[16] D. I. Moldovan, "On the analysis and synthesis of VLSI algorithms," IEEE Trans. Comput., vol. C-31, no. 11, pp. 1121-1126, Nov. 1982.
[17] L. M. Ni and C. T. King, "On partitioning and mapping for hypercube computing," Int. J. Parallel Programming, vol. 17, no. 6, pp. 475-495, Dec. 1988.
[18] D. A. Padua and M. J. Wolfe, "Advanced compiler optimizations for supercomputers," Commun. ACM, vol. 29, no. 12, pp. 1184-1201, Dec. 1986.
[19] J. K. Peir and R. Cytron, "Minimum distance: A method for partitioning recurrences for multiprocessors," IEEE Trans. Comput., vol. 38, no. 8, pp. 1203-1211, Aug. 1989.
[20] G. F. Pfister, W. Brentley, D. George, S. Harvey, W. Kleinfelder, K. McAuliffe, E. Melton, V. Norton, and J. Weiss, "The IBM Research Parallel Processor Prototype (RP3): Introduction and architecture," in Proc. 1985 Int. Conf. Parallel Processing, Aug. 1985, pp. 764-771.
[21] C. D. Polychronopoulos, D. J. Kuck, and D. A. Padua, "Utilizing multidimensional loop parallelism on large-scale parallel processor systems," IEEE Trans. Comput., vol. 38, no. 9, pp. 1285-1296, Sept. 1989.
[22] D. Pountain, "Configuring parallel programs-Part I," Byte, vol. 14, no. 13, pp. 349-352, Dec. 1989.
[23] M. J. Quinn, P. J. Hatcher, and K. C. Jourdenais, "Compiling C* programs for a hypercube multicomputer," in Proc. ACM SIGPLAN Parallel Programming: Experience with Appl., Languages, Syst., 1988, pp. 57-65.
[24] C. V. Ramamoorthy and G. S. Ho, "Performance evaluation of asynchronous concurrent systems using Petri nets," IEEE Trans. Software Eng., vol. SE-6, no. 5, pp. 440-449, Sept. 1980.
[25] C. Scheurich and M. Dubois, "Dynamic page migration in multiprocessors with distributed global memory," IEEE Trans. Comput., vol. 38, no. 8, pp. 1154-1163, Aug. 1989.
[26] M. Wolfe and U. Banerjee, "Data dependence and its application to parallel processing," Int. J. Parallel Programming, vol. 16, no. 2, pp. 137-178, Apr. 1987.

Index Terms:
data parallel algorithms; sequential algorithm; nested loop; loop-carried dependencies; partitioning; parallelism; pipelined execution patterns; grouping; dependence relationships; parallel program; parallel algorithms; performance evaluation
C.T. King, W.H. Chou, L.M. Ni, "Pipelined Data Parallel Algorithms-II: Design," IEEE Transactions on Parallel and Distributed Systems, vol. 1, no. 4, pp. 486-499, Oct. 1990, doi:10.1109/71.80176