An Optimal Implementation of Broadcasting with Selective Reduction
March 1993 (vol. 4 no. 3)
pp. 256-269

A model of parallel computation called broadcasting with selective reduction (BSR) can be viewed as a concurrent-read concurrent-write (CRCW) parallel random access machine (PRAM) with one extension. BSR permits an additional type of concurrent memory access, the BROADCAST instruction, by means of which all N processors may gain access to all M memory locations simultaneously for the purpose of writing. At each memory location, a subset of the incoming broadcast data is selected and reduced to a single value, which is then stored in that location. For several problems, BSR algorithms are known that require fewer steps than the corresponding best-known PRAM algorithms using the same number of processors. A circuit implementing the BSR model is introduced, and it is shown that its size and depth are of the same order as those of an optimal circuit implementing the PRAM. Thus, if it is reasonable to assume that CRCW PRAM instructions execute in constant time, it is no less reasonable to assume a constant-time BROADCAST instruction.
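To make the selection-and-reduction semantics of a single BROADCAST step concrete, here is a minimal sequential sketch in Python. It assumes the tag/limit formulation commonly used to describe BSR, which the abstract does not spell out. Each processor i broadcasts a pair (tag, datum), and each memory location j holds a limit. A datum is selected at location j when `tag <op> limit` holds for a chosen comparison, and the selected data are then combined with a reduction operator. The helper `broadcast`, its parameters, and the rule that a location with no selected data keeps its old value are all illustrative assumptions. The loop takes O(NM) sequential time, so it models only the semantics and not the paper's circuit.

```python
import operator
from functools import reduce

# Comparison operators assumed available for selection (illustrative choice).
SELECTORS = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
    "!=": operator.ne,
}


def broadcast(tags, data, limits, memory, select, combine):
    """Sequentially simulate one BSR BROADCAST step (hypothetical helper).

    Processor i broadcasts (tags[i], data[i]) to every memory location j.
    Location j selects datum i iff SELECTORS[select](tags[i], limits[j]),
    then stores the reduction of the selected data under `combine`.
    Assumption: a location that selects nothing keeps its previous value.
    """
    test = SELECTORS[select]
    result = list(memory)  # do not mutate the caller's memory
    for j, limit in enumerate(limits):
        selected = [d for t, d in zip(tags, data) if test(t, limit)]
        if selected:
            result[j] = reduce(combine, selected)
    return result
```

For example, all prefix sums of `x = [3, 1, 4, 1, 5]` follow from one call. Processor i uses tag i, location j uses limit j, selection is `<=`, and reduction is `operator.add`, so `broadcast(list(range(5)), x, list(range(5)), [0] * 5, "<=", operator.add)` leaves each location j holding the sum of `x[0..j]`, that is `[3, 4, 8, 9, 14]`.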

Index Terms: optimal implementation; broadcasting with selective reduction; parallel computation; concurrent-read concurrent-write; parallel random access machine; PRAM; concurrent memory access; BROADCAST instruction; memory locations; instruction sets; parallel algorithms; random-access storage
L. Fava Lindon, S.G. Akl, "An Optimal Implementation of Broadcasting with Selective Reduction," IEEE Transactions on Parallel and Distributed Systems, vol. 4, no. 3, pp. 256-269, March 1993, doi:10.1109/71.210809