The Block Distributed Memory Model
August 1996 (vol. 7 no. 8)
pp. 830-840

Abstract—We introduce a computation model for developing and analyzing parallel algorithms on distributed memory machines. The model allows the design of algorithms using a single address space and does not assume any particular interconnection topology. We capture performance by incorporating a cost measure for interprocessor communication induced by remote memory accesses. The cost measure includes parameters reflecting memory latency, communication bandwidth, and spatial locality. Our model allows the initial placement of the input data and pipelined prefetching.

We use our model to develop parallel algorithms for various data rearrangement problems, load balancing, sorting, FFT, and matrix multiplication. We show that most of these algorithms achieve optimal or near-optimal communication complexity while simultaneously guaranteeing an optimal speedup in computational complexity. Ongoing experimental work to test and evaluate these algorithms has so far shown very promising results.
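
The abstract names the ingredients of the cost measure (memory latency, communication bandwidth, spatial locality, pipelined prefetching) without giving a formula. The Python sketch below is only an illustration of how such a measure might be evaluated. It assumes a cost of `tau + sigma * m` for fetching one block of `m` contiguous remote words, and assumes that back-to-back prefetch requests pay the latency `tau` once. The names `tau`, `sigma`, and all three functions are hypothetical and are not taken from the abstract.

```python
# Illustrative sketch only: the cost formulas are assumptions,
# not definitions quoted from the abstract.

def remote_access_cost(m, tau, sigma=1.0):
    """Assumed cost of one remote access to a block of m contiguous words.

    tau   -- assumed fixed latency per remote request
    sigma -- assumed time per word (inverse of bandwidth)
    """
    return tau + sigma * m


def unpipelined_cost(block_sizes, tau, sigma=1.0):
    """Assumed cost when each block request waits for the previous one."""
    return sum(remote_access_cost(m, tau, sigma) for m in block_sizes)


def pipelined_cost(block_sizes, tau, sigma=1.0):
    """Assumed cost when requests are issued back to back (prefetching).

    The latency is paid once; the words still cost sigma each.
    """
    return tau + sigma * sum(block_sizes)


if __name__ == "__main__":
    blocks = [8, 8, 8, 8]  # four blocks of eight contiguous words
    print(unpipelined_cost(blocks, tau=100))
    print(pipelined_cost(blocks, tau=100))
```

Under these assumptions, larger contiguous blocks and pipelined requests both amortize the latency term, which is the intuition behind rewarding spatial locality in a model of this kind.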

Index Terms:
Parallel algorithms, parallel model, personalized communication, broadcasting, load balancing, sorting, Fast Fourier Transform, matrix multiplication.
Joseph F. JáJá, Kwan Woo Ryu, "The Block Distributed Memory Model," IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 8, pp. 830-840, Aug. 1996, doi:10.1109/71.532114