
Mark W. Goudreau, Kevin Lang, Satish B. Rao, Torsten Suel, Thanasis Tsantilas, "Portable and Efficient Parallel Computing Using the BSP Model," IEEE Transactions on Computers, vol. 48, no. 7, pp. 670-689, July 1999. IEEE Computer Society, Los Alamitos, CA, USA. ISSN 0018-9340. DOI: 10.1109/12.780876.

Keywords: BSP, minimum spanning tree problem, models of parallel computation, $N$-body problem, parallel computing, parallel graph algorithms, shortest path problem.

Abstract: The Bulk-Synchronous Parallel (BSP) model was proposed by Valiant as a standard interface between parallel software and hardware. In theory, the BSP model has been shown to allow the asymptotically optimal execution of architecture-independent software on a variety of architectures. Our goal in this work is to experimentally examine the practical use of the BSP model on current parallel architectures. We describe the design and implementation of the Green BSP Library, a small library of functions that implement the BSP model, and of several applications that were written for this library. We then discuss the performance of the library and application programs on several parallel architectures. Our results are positive in that we demonstrate efficiency and portability over a range of parallel architectures and show that the BSP cost model is useful for predicting performance trends and estimating execution times.
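
As a rough illustration of how a BSP cost model can estimate execution time, the sketch below uses the textbook form of the model, in which a superstep costs w + g*h + L: w is the largest local computation on any processor, h is the largest number of packets any processor sends or receives, g is the per-packet communication cost, and L is the synchronization cost. This is an assumption for illustration only; the exact variant and the measured parameters used in the paper may differ, and the names `Superstep` and `bsp_cost` and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Superstep:
    """Profile of one superstep (hypothetical helper type)."""
    work: float  # w: maximum local computation on any processor, in time units
    h: int       # h: maximum number of packets sent or received by any processor


def bsp_cost(supersteps: Iterable[Superstep], g: float, L: float) -> float:
    """Estimate run time as the sum over supersteps of w + g*h + L."""
    return sum(s.work + g * s.h + L for s in supersteps)


# Made-up program profile and machine parameters, for illustration only.
steps = [Superstep(work=1000.0, h=50), Superstep(work=400.0, h=200)]
estimate = bsp_cost(steps, g=2.0, L=100.0)
print(estimate)  # (1000 + 2*50 + 100) + (400 + 2*200 + 100) = 2100.0
```

Because g and L are properties of the machine while w and h are properties of the program, the same program profile can be re-evaluated with another machine's parameters to compare expected performance trends.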