Issue No. 10, Oct. 2012 (vol. 23)

pp. 1915-1922

Kamesh Madduri, Lawrence Berkeley National Laboratory, Berkeley

Jimmy Su, University of California at Berkeley, Berkeley

Samuel Williams, Lawrence Berkeley National Laboratory, Berkeley

Leonid Oliker, Lawrence Berkeley National Laboratory, Berkeley

Stéphane Ethier, Princeton Plasma Physics Laboratory, Princeton

Katherine Yelick, University of California at Berkeley and Lawrence Berkeley National Laboratory, Berkeley

DOI: http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.28

ABSTRACT

We are now in the multicore revolution, which is witnessing a rapid evolution of architectural designs due to power constraints and correspondingly limited microprocessor clock speeds. Understanding how to efficiently utilize these systems in the context of demanding numerical algorithms is an urgent challenge to meet the ever-growing computational needs of high-end computing. In this work, we examine multicore parallel optimization of the particle-to-grid interpolation step in particle-mesh methods, an inherently complex optimization problem due to its low computational intensity, irregular data accesses, and potential fine-grained data hazards. Our evaluated kernels are derived from two important numerical computations: a biological simulation of the heart using the Immersed Boundary (IB) method, and a Gyrokinetic Particle-in-Cell (PIC)-based application for studying fusion plasma microturbulence. We develop several novel synchronization and grid decomposition schemes, as well as low-level optimization techniques, to maximize performance on three modern multicore platforms: Intel's Xeon X5550 (Nehalem), AMD's Opteron 2356 (Barcelona), and Sun's UltraSPARC T2+ (Niagara). Results show that our optimizations lead to significant performance improvements, achieving up to a 5.6× speedup compared to the reference parallel implementation. Our work also provides valuable insight into the design of future autotuning frameworks for particle-to-grid interpolation on next-generation systems.
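The "fine-grained data hazards" mentioned above arise because particles processed by different threads can deposit onto the same grid point at the same time. The sketch below makes this concrete under loudly stated assumptions: a hypothetical 1D periodic grid with linear (cloud-in-cell) weighting stands in for the IB and PIC kernels, and the function names (`cic_weights`, `deposit_serial`, `deposit_locked`, `deposit_private`) are illustrative. It contrasts two generic strategies, one lock per grid point on a shared grid versus thread-private grid copies summed at the end; neither is claimed to be the paper's own scheme.

```python
import math
import threading
from concurrent.futures import ThreadPoolExecutor

# Illustrative sketch (assumption): a 1D periodic grid, not the paper's kernels.


def cic_weights(x, n_cells, dx):
    """Linear (cloud-in-cell) weights of a particle at position x:
    returns [(left_index, weight), (right_index, weight)]."""
    s = x / dx
    i = math.floor(s)          # grid point to the left of the particle
    frac = s - i               # fractional distance from that point, in cells
    return [(i % n_cells, 1.0 - frac), ((i + 1) % n_cells, frac)]


def deposit_serial(positions, charges, n_cells, dx):
    """Sequential reference: scatter each particle's charge onto the grid."""
    grid = [0.0] * n_cells
    for x, q in zip(positions, charges):
        for j, w in cic_weights(x, n_cells, dx):
            grid[j] += q * w
    return grid


def _chunks(n, n_workers):
    """Split range(n) into at most n_workers contiguous (start, stop) pieces."""
    size = max(1, math.ceil(n / n_workers))
    return [(k, min(k + size, n)) for k in range(0, n, size)]


def deposit_locked(positions, charges, n_cells, dx, n_workers=4):
    """Strategy A: one shared grid guarded by one lock per grid point, so each
    read-modify-write is protected (fine-grained synchronization)."""
    grid = [0.0] * n_cells
    locks = [threading.Lock() for _ in range(n_cells)]

    def work(span):
        for p in range(*span):
            for j, w in cic_weights(positions[p], n_cells, dx):
                with locks[j]:
                    grid[j] += charges[p] * w

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(work, _chunks(len(positions), n_workers)))  # re-raise errors
    return grid


def deposit_private(positions, charges, n_cells, dx, n_workers=4):
    """Strategy B: each thread scatters into its own private grid copy with no
    synchronization; the copies are summed (reduced) afterwards."""
    def work(span):
        start, stop = span
        return deposit_serial(positions[start:stop], charges[start:stop], n_cells, dx)

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(work, _chunks(len(positions), n_workers)))
    grid = [0.0] * n_cells
    for partial in partials:   # reduction over the private grids
        for j, v in enumerate(partial):
            grid[j] += v
    return grid
```

For example, `deposit_private([0.5, 1.25, 3.9], [1.0, 1.0, 1.0], n_cells=4, dx=1.0)` yields values close to `[1.4, 1.25, 0.25, 0.1]`, matching `deposit_serial` up to floating-point rounding. The trade-off is synchronization cost on every update (Strategy A) versus extra memory for one grid copy per thread plus a reduction pass (Strategy B). Note that in CPython the global interpreter lock serializes bytecode execution, so this sketch illustrates data layout and correctness rather than parallel speedup.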

INDEX TERMS

Heart, Synchronization, Optimization, Interpolation, Multicore processing, Kernel, Computational modeling, lock-free, Particle mesh, particle-to-grid interpolation, multicore performance tuning, atomic

CITATION

Kamesh Madduri, Jimmy Su, Samuel Williams, Leonid Oliker, Stéphane Ethier, Katherine Yelick, "Optimization of Parallel Particle-to-Grid Interpolation on Leading Multicore Platforms",

*IEEE Transactions on Parallel & Distributed Systems*, vol. 23, no. 10, pp. 1915-1922, Oct. 2012, doi:10.1109/TPDS.2012.28.