
Rio Yokota, Lorena A. Barba, "Hierarchical N-body Simulations with Autotuning for Heterogeneous Systems," Computing in Science and Engineering, vol. 14, no. 3, pp. 30–39, May/June 2012.
DOI: http://doi.ieeecomputersociety.org/10.1109/MCSE.2012.1
ISSN: 1521-9615
Publisher: IEEE Computer Society, Los Alamitos, CA, USA
Keywords: scientific computing, GPU programming, fast N-body algorithms, computational science, autotuning
Algorithms designed to efficiently solve the classical N-body problem of mechanics fit well on GPU hardware and exhibit excellent scalability on many GPUs. Their computational intensity makes them a promising approach for other applications amenable to an N-body formulation. Adding features such as autotuning makes multipole-type algorithms ideal for heterogeneous computing environments.