Issue No.03 - May/June (2012 vol.14)

pp: 30-39

Rio Yokota, Boston University

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MCSE.2012.1

ABSTRACT

Algorithms designed to efficiently solve the classical N-body problem of mechanics fit well on GPU hardware and exhibit excellent scalability on many GPUs. Their computational intensity makes them a promising approach for other applications amenable to an N-body formulation. Adding features such as autotuning makes multipole-type algorithms ideal for heterogeneous computing environments.

INDEX TERMS

Scientific computing, GPU programming, fast N-body algorithms, computational science, autotuning

CITATION

Rio Yokota, "Hierarchical N-body Simulations with Autotuning for Heterogeneous Systems",

*Computing in Science & Engineering*, vol. 14, no. 3, pp. 30-39, May/June 2012, doi:10.1109/MCSE.2012.1