
Joseph F. JáJá and Kwan Woo Ryu, "The Block Distributed Memory Model," IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 8, pp. 830-840, Aug. 1996. doi: 10.1109/71.532114. IEEE Computer Society, Los Alamitos, CA, USA.
Keywords: parallel algorithms, parallel model, personalized communication, broadcasting, load balancing, sorting, Fast Fourier Transform, matrix multiplication.
Abstract—We introduce a computation model for developing and analyzing parallel algorithms on distributed memory machines. The model allows the design of algorithms using a single address space and does not assume any particular interconnection topology. We capture performance by incorporating a cost measure for interprocessor communication induced by remote memory accesses. The cost measure includes parameters reflecting memory latency, communication bandwidth, and spatial locality. Our model allows the initial placement of the input data and pipelined prefetching.
We use our model to develop parallel algorithms for various data rearrangement problems, load balancing, sorting, FFT, and matrix multiplication. We show that most of these algorithms achieve optimal or near-optimal communication complexity while simultaneously guaranteeing an optimal speedup in computational complexity. Ongoing experimental work in testing and evaluating these algorithms has thus far shown very promising results.
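The abstract describes a communication cost measure built from memory latency, bandwidth, and spatial locality, with support for pipelined prefetching. The following is a minimal sketch of how a measure of that kind might be evaluated. It assumes a cost of the form "latency plus number of blocks times block size times per-word transfer time". The function name, the parameter names (`tau`, `sigma`, `m`), and the pipelining rule are illustrative assumptions, not formulas quoted from the abstract.

```python
import math


def remote_access_cost(words, tau, sigma, m, pipelined=True):
    """Estimate the time to fetch `words` words from remote memory.

    Hypothetical parameters (assumptions for illustration only):
      tau   -- latency charged per remote request
      sigma -- transfer time per word (inverse bandwidth)
      m     -- block size in words; data moves in whole blocks,
               which is how spatial locality enters the cost
    """
    if words <= 0:
        return 0.0
    # Partial blocks are rounded up: a block is transferred whole.
    blocks = math.ceil(words / m)
    transfer = blocks * m * sigma
    if pipelined:
        # Assumed: pipelined prefetches overlap, so latency is paid once.
        return tau + transfer
    # Assumed: without pipelining, every block request pays the latency.
    return blocks * tau + transfer
```

Under these assumptions, contiguous data costs less than scattered data, because fewer blocks are requested for the same number of useful words, and pipelining matters most when `tau` is large relative to `sigma * m`.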