Pipelined Data Parallel Algorithms-I: Concept and Modeling
October 1990 (vol. 1 no. 4)
pp. 470-485

The basic concept of pipelined data-parallel algorithms is introduced by contrasting them with other styles of computation and through a simple example, a pipelined image distance transformation algorithm. Pipelined data-parallel algorithms are a class of algorithms that use pipelined operations and data-level partitioning to achieve parallelism. Applications that involve data parallelism and recurrence relations are good candidates for this kind of algorithm. Such computations are well suited to distributed-memory multicomputers. By controlling the granularity through data partitioning and overlapping operations through pipelining, it is possible to achieve a balanced computation on multicomputers. An analytic model is presented for pipelined data-parallel computation on multicomputers; it uses timed Petri nets to describe data pipelining operations. As a case study, the model is applied to a pipelined matrix multiplication algorithm. The predicted results closely match the performance measured on a 64-node NCUBE hypercube multicomputer.
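
The trade-off between granularity and pipelining described in the abstract can be illustrated with a small timing sketch. This is not the timed Petri net model of the paper; it is a simplified max-plus recurrence for a linear chain of nodes, written here only for illustration. The function name `pipeline_makespan`, all of its parameters, and the cost assumptions (each node is busy for a fixed startup time per block, and the transfer delay between nodes overlaps with computation) are hypothetical.

```python
def pipeline_makespan(total_work, num_nodes, num_blocks,
                      t_compute, t_startup, t_per_item):
    """Finish time of a linear pipeline in which every node processes
    every data block, and block b on node n needs block b from node n-1.

    Assumptions (illustrative only, not taken from the paper):
      - the data are split into num_blocks equal blocks (the granularity);
      - a node is busy for items * t_compute + t_startup per block;
      - a block then takes items * t_per_item to reach the next node,
        and this transfer overlaps with computation.
    """
    items = total_work / num_blocks
    busy = items * t_compute + t_startup   # node occupancy per block
    wire = items * t_per_item              # transfer delay per block

    # finish[n][b] = time at which node n completes block b
    finish = [[0.0] * num_blocks for _ in range(num_nodes)]
    for n in range(num_nodes):
        for b in range(num_blocks):
            # the block must have arrived from the upstream node ...
            data_ready = finish[n - 1][b] + wire if n > 0 else 0.0
            # ... and this node must have finished its previous block
            node_ready = finish[n][b - 1] if b > 0 else 0.0
            finish[n][b] = max(data_ready, node_ready) + busy
    return finish[-1][-1]


if __name__ == "__main__":
    # Sweep the granularity for a fixed amount of work on 4 nodes.
    for blocks in (1, 2, 4, 8):
        print(blocks, pipeline_makespan(8, 4, blocks, 1.0, 2.0, 0.0))
```

Under these assumptions, very coarse blocks leave downstream nodes idle while the pipeline fills, and very fine blocks pay the per-block startup cost too often, so the shortest finish time falls at an intermediate block count.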

Index Terms:
pipelined data-parallel algorithms; pipelined operations; data level partitioning; data parallelism; Petri nets; parallel algorithms
C.T. King, W.H. Chou, L.M. Ni, "Pipelined Data Parallel Algorithms-I: Concept and Modeling," IEEE Transactions on Parallel and Distributed Systems, vol. 1, no. 4, pp. 470-485, Oct. 1990, doi:10.1109/71.80175