Experiences with Parallel N-Body Simulation
December 2000 (vol. 11 no. 12)
pp. 1306-1323

Abstract: This paper describes our experiences developing high-performance code for astrophysical N-body simulations. Recent N-body methods are based on an adaptive tree structure. The tree must be built and maintained across physically distributed memory; moreover, the communication requirements are irregular and adaptive. Together with the need to balance the computational workload among processors, these issues pose interesting challenges and tradeoffs for high-performance implementation. Our implementation was guided by the need to keep solutions simple and general. We use a technique for implicitly representing a dynamic global tree across multiple processors, which substantially reduces both the programming complexity and the performance overheads of distributed-memory architectures. Our contributions include methods for vectorizing the computation and minimizing communication time, both of which are justified theoretically and experimentally. The code has been tested by varying the number and distribution of bodies on different configurations of the Connection Machine CM-5. The overall performance on instances with 10 million bodies is typically over 48 percent of the peak machine rate, which compares favorably with other approaches.
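The abstract does not spell out how the workload is balanced, but the index terms name a Peano-Hilbert space-filling curve. One common way such a curve is used in tree codes is to sort bodies by their position along the curve and cut the sorted sequence into contiguous segments of roughly equal estimated work. The following Python sketch illustrates that general idea only. It is not the authors' implementation: the 2D setting, the function names `hilbert_index` and `partition_by_work`, and the per-body `work` estimate are all assumptions made for illustration.

```python
def hilbert_index(order, x, y):
    """Position of grid cell (x, y) along a 2D Hilbert curve.

    The grid has 2**order cells per side; x and y are integer cell
    coordinates in [0, 2**order). This is the standard bitwise
    construction, shown in 2D for brevity (hypothetical helper).
    """
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the sub-square has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def partition_by_work(bodies, num_procs, order=10):
    """Split bodies into num_procs contiguous curve segments of similar work.

    bodies: list of (x, y, work) with x, y in [0, 1) and work > 0.
    The work value is an assumed per-body cost estimate, for example
    the interaction count from a previous time step.
    """
    n = 1 << order

    def key(body):
        cx = min(int(body[0] * n), n - 1)
        cy = min(int(body[1] * n), n - 1)
        return hilbert_index(order, cx, cy)

    ordered = sorted(bodies, key=key)
    total = sum(b[2] for b in ordered)
    parts = [[] for _ in range(num_procs)]
    acc = 0.0
    for b in ordered:
        # Assign by the midpoint of this body's share of cumulative work.
        p = min(int((acc + b[2] / 2) * num_procs / total), num_procs - 1)
        parts[p].append(b)
        acc += b[2]
    return parts
```

Because consecutive positions on the Hilbert curve are spatially adjacent cells, each contiguous segment tends to cover a compact region, which is why this ordering is attractive when both load balance and locality matter.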

Index Terms:
N-body simulations, parallel processing, Barnes-Hut algorithm, adaptive tree structure, Peano-Hilbert space filling curve.
Pangfeng Liu, Sandeep N. Bhatt, "Experiences with Parallel N-Body Simulation," IEEE Transactions on Parallel and Distributed Systems, vol. 11, no. 12, pp. 1306-1323, Dec. 2000, doi:10.1109/71.895795