Issue No. 01 - Jan. 2014 (vol. 25)

pp. 43-52

Eugenio Rustico, University of Catania, Catania

Giuseppe Bilotta, Istituto Nazionale di Geofisica e Vulcanologia - Osservatorio Etneo, Catania

Alexis Herault, Département Ingénierie Mathématique, Conservatoire National des Arts et Métiers, Paris

Ciro Del Negro, Istituto Nazionale di Geofisica e Vulcanologia - Osservatorio Etneo, Catania

Giovanni Gallo, University of Catania, Catania

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.340

ABSTRACT

We present a multi-GPU version of GPUSPH, a CUDA implementation of fluid-dynamics models based on the smoothed particle hydrodynamics (SPH) numerical method. SPH is a well-known Lagrangian model for the simulation of free-surface fluid flows; it exposes a high degree of parallelism and has already been successfully ported to GPU. We extend the GPU-based simulator to run simulations on multiple GPUs simultaneously, both to gain speed and to overcome the memory limitations of a single device. The computational domain is split spatially with minimal overlap, and the shared volume slices are updated at every iteration of the simulation. Data transfers run asynchronously with computations, completely hiding the overhead introduced by slice exchange. A simple yet effective load-balancing policy preserves performance when asymmetric fluid topologies make the simulation unbalanced. The measured speedup (up to 4.5x for 6 GPUs) closely follows the expected one (5x for 6 GPUs), and simulations can be run with more particles than would fit on a single device. We use the Karp-Flatt metric to formally estimate the overall efficiency of the parallelization.
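
As an illustration of the Karp-Flatt metric mentioned above, the following minimal sketch computes the experimentally determined serial fraction e = (1/ψ − 1/p) / (1 − 1/p) from a speedup ψ on p devices. The function name `karp_flatt` is hypothetical, and feeding it the two speedup figures quoted in the abstract is only an example; this is not the authors' own evaluation, which is given in the full paper.

```python
def karp_flatt(speedup: float, processors: int) -> float:
    """Return the Karp-Flatt serial fraction e = (1/psi - 1/p) / (1 - 1/p).

    speedup    -- measured speedup psi over the single-device run
    processors -- number of devices p (must be at least 2)
    """
    if processors < 2:
        raise ValueError("the metric is only defined for p > 1")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    inv_p = 1.0 / processors
    return (1.0 / speedup - inv_p) / (1.0 - inv_p)


# Speedup figures quoted in the abstract, both for 6 GPUs.
# Using them here is an illustrative assumption, not the paper's analysis.
e_measured = karp_flatt(4.5, 6)   # reported best speedup
e_expected = karp_flatt(5.0, 6)   # reported expected speedup

print(f"serial fraction at 4.5x on 6 GPUs: {e_measured:.4f}")
print(f"serial fraction at 5.0x on 6 GPUs: {e_expected:.4f}")
```

A smaller value of e means less serial or overhead-bound work; a value that grows with the number of devices would point to parallel overhead rather than inherently serial code.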

INDEX TERMS

Graphics processing units, Computational modeling, Kernel, Numerical models, Load modeling, Parallel processing, Load management, HPC, GPU, multi-GPU, SPH, CUDA, fluid dynamics, numerical simulations, load balancing, parallel computing

CITATION

Eugenio Rustico, Giuseppe Bilotta, Alexis Herault, Ciro Del Negro, Giovanni Gallo, "Advances in Multi-GPU Smoothed Particle Hydrodynamics Simulations",

*IEEE Transactions on Parallel & Distributed Systems*, vol. 25, no. 1, pp. 43-52, Jan. 2014, doi:10.1109/TPDS.2012.340