
C.T. King, W.H. Chou, L.M. Ni, "Pipelined Data Parallel Algorithms-I: Concept and Modeling," IEEE Transactions on Parallel and Distributed Systems, vol. 1, no. 4, pp. 470-485, October 1990.
BibTeX
@article{ 10.1109/71.80175,
  author = {C.T. King and W.H. Chou and L.M. Ni},
  title = {Pipelined Data Parallel Algorithms-I: Concept and Modeling},
  journal = {IEEE Transactions on Parallel and Distributed Systems},
  volume = {1},
  number = {4},
  issn = {1045-9219},
  year = {1990},
  pages = {470-485},
  doi = {http://doi.ieeecomputersociety.org/10.1109/71.80175},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
Index Terms: pipelined data-parallel algorithms; pipelined operations; data level partitioning; data parallelism; Petri nets; parallel algorithms
The basic concept of pipelined data-parallel algorithms is introduced by contrasting the algorithms with other styles of computation and by a simple example (a pipeline image distance transformation algorithm). Pipelined data-parallel algorithms are a class of algorithms which use pipelined operations and data level partitioning to achieve parallelism. Applications which involve data parallelism and recurrence relations are good candidates for this kind of algorithm. The computations are ideal for distributed-memory multicomputers. By controlling the granularity through data partitioning and overlapping the operations through pipelining, it is possible to achieve a balanced computation on multicomputers. An analytic model is presented for modeling pipelined data-parallel computation on multicomputers. The model uses timed Petri nets to describe data pipelining operations. As a case study, the model is applied to a pipelined matrix multiplication algorithm. Predicted results match closely with the measured performance on a 64-node NCUBE hypercube multicomputer.
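The interplay of data partitioning and pipelining described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm, which the abstract does not spell out. It assumes a city-block metric, a forward raster scan in which each pixel depends on its upper and left neighbours (a recurrence relation), a partition of the image into column strips with one strip per pipeline stage, and one time step per strip-row. All function names are hypothetical.

```python
INF = 10**9  # stands in for "no feature pixel seen yet"


def _init(image):
    # Feature pixels (0) start at distance 0; all other pixels start at INF.
    return [[0 if v == 0 else INF for v in row] for row in image]


def _update(dist, r, c):
    # Recurrence of the forward scan: look at the upper and left neighbours.
    up = dist[r - 1][c] if r > 0 else INF
    left = dist[r][c - 1] if c > 0 else INF
    dist[r][c] = min(dist[r][c], up + 1, left + 1)


def forward_pass_sequential(image):
    """Plain raster-order forward scan, used as the reference result."""
    dist = _init(image)
    for r in range(len(image)):
        for c in range(len(image[0])):
            _update(dist, r, c)
    return dist


def forward_pass_pipelined(image, num_stages):
    """Simulate the same scan with column strips assigned to pipeline stages.

    At time step t, stage p works on row t - p. The left-boundary value it
    needs was produced by stage p - 1 one step earlier, so the stages overlap.
    Returns the distance map and the number of time steps used.
    """
    rows, cols = len(image), len(image[0])
    dist = _init(image)
    # Stage p owns columns bounds[p] .. bounds[p + 1] - 1.
    bounds = [p * cols // num_stages for p in range(num_stages + 1)]
    steps = 0
    for t in range(rows + num_stages - 1):
        for p in range(num_stages):
            r = t - p
            if 0 <= r < rows:
                for c in range(bounds[p], bounds[p + 1]):
                    _update(dist, r, c)
        steps += 1
    return dist, steps


if __name__ == "__main__":
    image = [
        [0, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1],
        [1, 1, 0, 1, 1, 1],
        [1, 1, 1, 1, 1, 1],
    ]
    dist, steps = forward_pass_pipelined(image, num_stages=3)
    print(dist == forward_pass_sequential(image), steps)
```

Under these assumptions the pipelined schedule finishes in rows + stages - 1 time steps, whereas a single processor needs rows × stages strip-row updates. Narrower strips shorten the pipeline fill time but increase the number of boundary values passed between stages, which is the granularity trade-off the abstract refers to.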