Portable and Efficient Parallel Computing Using the BSP Model
July 1999 (vol. 48 no. 7)
pp. 670-689

Abstract—The Bulk-Synchronous Parallel (BSP) model was proposed by Valiant as a standard interface between parallel software and hardware. In theory, the BSP model has been shown to allow the asymptotically optimal execution of architecture-independent software on a variety of architectures. Our goal in this work is to experimentally examine the practical use of the BSP model on current parallel architectures. We describe the design and implementation of the Green BSP Library, a small library of functions that implement the BSP model, and of several applications that were written for this library. We then discuss the performance of the library and application programs on several parallel architectures. Our results are positive in that we demonstrate efficiency and portability over a range of parallel architectures and show that the BSP cost model is useful for predicting performance trends and estimating execution times.
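To illustrate how the BSP cost model can be used to estimate execution times, the following minimal sketch applies the standard formulation from Valiant's model. A superstep costs $w + gh + L$, where $w$ is the largest local computation time, $h$ is the largest number of messages any processor sends or receives, $g$ is the per-message communication cost, and $L$ is the barrier synchronization cost. A program's predicted time is the sum of its superstep costs. The function names, the message format (a list of `(source, destination)` pairs), and the parameter values in the example are hypothetical. They are not the Green BSP Library's API, and the paper's own analysis may refine this basic formula.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

# Hypothetical message format: one (source_pid, destination_pid) pair per message.
Message = Tuple[int, int]


def h_relation(messages: Iterable[Message], p: int) -> int:
    """Return h: the maximum number of messages any of the p processors sends or receives."""
    sent = Counter()
    received = Counter()
    for src, dst in messages:
        sent[src] += 1
        received[dst] += 1
    # Counter returns 0 for processors that sent or received nothing.
    return max(max(sent[i], received[i]) for i in range(p))


def superstep_cost(work: Sequence[float], messages: Iterable[Message],
                   g: float, L: float) -> float:
    """Standard BSP superstep cost: max local work + g * h + L."""
    p = len(work)
    return max(work) + g * h_relation(messages, p) + L


def predicted_time(supersteps: List[Tuple[Sequence[float], List[Message]]],
                   g: float, L: float) -> float:
    """Predicted running time: the sum of the costs of all supersteps."""
    return sum(superstep_cost(w, msgs, g, L) for w, msgs in supersteps)


if __name__ == "__main__":
    p = 4
    # Superstep 1: a total exchange, so every processor sends and receives p - 1 messages.
    total_exchange = [(i, j) for i in range(p) for j in range(p) if i != j]
    # Superstep 2: local computation only, with no communication.
    program = [([100, 120, 90, 110], total_exchange), ([50] * p, [])]
    # Hypothetical machine parameters, chosen for illustration only.
    print(predicted_time(program, g=2.0, L=40.0))  # 166.0 + 90.0 = 256.0
```

Changing `g` and `L` to values measured on a particular machine turns the same program description into a machine-specific time estimate. This is the sense in which the cost model predicts performance trends across architectures.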

Index Terms:
BSP, minimum spanning tree problem, models of parallel computation, $N$-body problem, parallel computing, parallel graph algorithms, shortest path problem.
Mark W. Goudreau, Kevin Lang, Satish B. Rao, Torsten Suel, Thanasis Tsantilas, "Portable and Efficient Parallel Computing Using the BSP Model," IEEE Transactions on Computers, vol. 48, no. 7, pp. 670-689, July 1999, doi:10.1109/12.780876