Issue No. 3, May/June 2012 (vol. 14)
pp. 30-39
Rio Yokota , Boston University
ABSTRACT
Algorithms designed to efficiently solve the classical N-body problem of mechanics fit well on GPU hardware and exhibit excellent scalability on many GPUs. Their computational intensity makes them a promising approach for other applications amenable to an N-body formulation. Adding features such as autotuning makes multipole-type algorithms ideal for heterogeneous computing environments.
INDEX TERMS
Scientific computing, GPU programming, fast N-body algorithms, computational science, autotuning
CITATION
Rio Yokota, "Hierarchical N-body Simulations with Autotuning for Heterogeneous Systems", Computing in Science & Engineering, vol.14, no. 3, pp. 30-39, May/June 2012, doi:10.1109/MCSE.2012.1
REFERENCES
1. J. Barnes and P. Hut, "A Hierarchical O(N log N) Force-Calculation Algorithm," Nature, vol. 324, 1986, pp. 446–449.
2. L. Greengard and V. Rokhlin, "A Fast Algorithm for Particle Simulations," J. Computational Physics, vol. 73, no. 2, 1987, pp. 325–348.
3. R. Yokota and L.A. Barba, "Treecode and Fast Multipole Method for N-Body Simulation with CUDA," GPU Computing Gems Emerald Edition, W.-M. Hwu, ed., Elsevier/Morgan Kaufmann, 2011, pp. 113–132.
4. M.S. Warren and J.K. Salmon, "A Portable Parallel Particle Program," Computer Physics Comm., vol. 87, nos. 1-2, 1995, pp. 266–290.
5. W. Dehnen, "A Hierarchical O(N) Force Calculation Algorithm," J. Computational Physics, vol. 179, no. 1, 2002, pp. 27–42.
6. H. Cheng, L. Greengard, and V. Rokhlin, "A Fast Adaptive Multipole Algorithm in Three Dimensions," J. Computational Physics, vol. 155, no. 2, 1999, pp. 468–498.
7. T. Hamada and T. Iitaka, "The Chamomile Scheme: An Optimized Algorithm for N-Body Simulations on Programmable Graphics Processing Units," 2007; http://arxiv.org/pdf/astro-ph0703100v1.pdf .
8. L. Nyland, M. Harris, and J. Prins, "Fast N-Body Simulation with CUDA," GPU Gems 3, Addison-Wesley Professional, 2007, pp. 677–695.
9. R.G. Belleman, J. Bédorf, and S.F. Portegies Zwart, "High Performance Direct Gravitational N-Body Simulations on Graphics Processing Units II: An Implementation in CUDA," New Astronomy, vol. 13, no. 2, 2008, pp. 103–112.
10. E. Gaburov, S. Harfst, and S. Portegies Zwart, "Sapporo: A Way to Turn Your Graphics Cards into a GRAPE-6," New Astronomy, vol. 14, no. 7, 2009, pp. 630–637.
11. S. Williams, A. Waterman, and D. Patterson, "Roofline: An Insightful Visual Performance Model for Multicore Architectures," Comm. ACM, vol. 52, no. 4, 2009, pp. 65–76.
12. J.K. Salmon, "Parallel Hierarchical N-body Methods," doctoral dissertation, Physics, Mathematics and Astronomy Dept., California Inst. Technology, 1990.
13. M.S. Warren and J.K. Salmon, "A Parallel Hashed Oct-Tree N-Body Algorithm," Proc. 1993 ACM/IEEE Conf. Supercomputing, ACM, 1993, pp. 12–21.
14. J. Dubinski, "A Parallel Tree Code," New Astronomy, vol. 1, no. 2, 1996, pp. 133–147.
15. H. Sundar, R.S. Sampath, and G. Biros, "Bottom-Up Construction and 2:1 Balance Refinement of Linear Octrees in Parallel," SIAM J. Scientific Computing, vol. 30, no. 5, 2008, pp. 2675–2708.
16. R. Yokota et al., "Biomolecular Electrostatics Using a Fast Multipole BEM on up to 512 GPUs and a Billion Unknowns," Computer Physics Comm., vol. 182, no. 6, 2011, pp. 1271–1283.