Assessing Accelerator-Based HPC Reverse Time Migration
January 2011 (vol. 22 no. 1)
pp. 147-162
Mauricio Araya-Polo, Barcelona Supercomputing Center, Barcelona
Javier Cabezas, Barcelona Supercomputing Center, Barcelona
Mauricio Hanzich, Barcelona Supercomputing Center, Barcelona
Miquel Pericas, Barcelona Supercomputing Center, Barcelona
Félix Rubio, Barcelona Supercomputing Center, Barcelona
Isaac Gelado, Universitat Politecnica de Catalunya, Barcelona
Muhammad Shafiq, Barcelona Supercomputing Center, Barcelona
Enric Morancho, Universitat Politecnica de Catalunya, Barcelona
Nacho Navarro, Universitat Politecnica de Catalunya, Barcelona
Eduard Ayguade, Barcelona Supercomputing Center, Barcelona
José María Cela, Barcelona Supercomputing Center, Barcelona
Mateo Valero, Barcelona Supercomputing Center, Barcelona
Oil and gas companies trust Reverse Time Migration (RTM), the most advanced seismic imaging technique, with crucial decisions on drilling investments. The economic value of the oil reserves whose localization requires RTM is on the order of 10^13 dollars. But RTM requires vast computational power, which has somewhat hindered its practical success. Although accelerator-based architectures deliver enormous computational power, little attention has been devoted to assessing the effort required to implement RTM on them. The aim of this paper is to identify the major limitations imposed by different accelerators during RTM implementation, as well as potential bottlenecks arising from architectural features. Moreover, we suggest a wish list of features that, from our experience, should be included in the next generation of accelerators to cope with the requirements of applications like RTM. We present mappings of an RTM algorithm to the IBM Cell/B.E., the NVIDIA Tesla, and an FPGA platform modeled after the Convey HC-1. All three implementations outperform a traditional processor (Intel Harpertown) in performance by a factor of 10x, but at the cost of a huge development effort, mainly due to immature development frameworks and the lack of well-suited programming models. These results show that accelerators are well-positioned platforms for this kind of workload. Because our RTM implementation is based on an explicit high-order finite-difference scheme, some of the conclusions of this work can be extrapolated to applications with similar numerical schemes, for instance, magneto-hydrodynamics or atmospheric flow simulations.
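The computational core the abstract refers to is the forward/backward propagation of a wavefield with an explicit high-order finite-difference stencil. The following C sketch illustrates one time step of such a kernel (2nd order in time, 8th order in space, 25-point stencil) for the 3D acoustic wave equation. It is a minimal illustration under assumed names and a unit grid spacing absorbed into the velocity term, not the paper's actual implementation.

```c
#include <stddef.h>

/* Illustrative RTM-style time step: hypothetical names, not the paper's code.
 * p_next = 2*p_cur - p_prev + (v*dt)^2 * laplacian(p_cur)
 * Grid spacing h is assumed to be 1 (folded into vel2dt2). */
#define HALO 4  /* 8th-order spatial stencil -> 4-point halo per side */

/* 8th-order central-difference weights for the second derivative */
static const float C[HALO + 1] = {
    -205.0f / 72.0f, 8.0f / 5.0f, -1.0f / 5.0f, 8.0f / 315.0f, -1.0f / 560.0f
};

static inline size_t idx(size_t i, size_t j, size_t k, size_t ny, size_t nz) {
    return (i * ny + j) * nz + k;  /* k is the unit-stride dimension */
}

void rtm_step(float *p_next, const float *p_cur, const float *p_prev,
              const float *vel2dt2,  /* precomputed (v*dt)^2 per cell */
              size_t nx, size_t ny, size_t nz)
{
    for (size_t i = HALO; i < nx - HALO; i++)
        for (size_t j = HALO; j < ny - HALO; j++)
            for (size_t k = HALO; k < nz - HALO; k++) {
                size_t c = idx(i, j, k, ny, nz);
                /* Center point contributes once per axis */
                float lap = 3.0f * C[0] * p_cur[c];
                for (int r = 1; r <= HALO; r++) {
                    lap += C[r] * (p_cur[idx(i + r, j, k, ny, nz)]
                                 + p_cur[idx(i - r, j, k, ny, nz)]
                                 + p_cur[idx(i, j + r, k, ny, nz)]
                                 + p_cur[idx(i, j - r, k, ny, nz)]
                                 + p_cur[c + r] + p_cur[c - r]);
                }
                p_next[c] = 2.0f * p_cur[c] - p_prev[c] + vel2dt2[c] * lap;
            }
}
```

The triple loop with a wide, low-arithmetic-intensity halo is exactly what makes this workload memory-bound, and why the paper's three accelerator ports revolve around blocking the grid into on-chip memories (SPE local store, GPU shared memory, FPGA BRAM).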

[1] J.A. Kahle, M.N. Day, H.P. Hofstee, C.R. Johns, T.R. Maeurer, and D. Shippy, "Introduction to the Cell Multiprocessor," IBM J. Research and Development, vol. 49, nos. 4/5, pp. 589-604, 2005.
[2] S. Patel and W.-M.W. Hwu, "Accelerator Architectures," IEEE Micro, vol. 28, no. 4, pp. 4-12, July/Aug. 2008.
[3] S. Hauck and A. DeHon, Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation. Morgan Kaufmann Publishers Inc., 2007.
[4] E. Baysal, D.D. Kosloff, and J.W.C. Sherwood, "Reverse Time Migration," Geophysics, vol. 48, no. 11, pp. 1514-1524, 1983.
[5] R. Baud, R. Peterson, G. Richardson, L. French, J. Regg, T. Montgomery, T. Williams, C. Doyle, and M. Dorner, "Deepwater Gulf of Mexico 2002: America's Expanding Frontier," OCS Report, vol. MMS 2002-021, pp. 1-133, 2002.
[6] D.E. Shaw, M.M. Deneroff, R.O. Dror, J.S. Kuskin, R.H. Larson, J.K. Salmon, C. Young, B. Batson, K.J. Bowers, J.C. Chao, M.P. Eastwood, J. Gagliardo, J.P. Grossman, C.R. Ho, D.J. Ierardi, I. Kolossváry, J.L. Klepeis, T. Layman, C. McLeavey, M.A. Moraes, R. Mueller, E.C. Priest, Y. Shan, J. Spengler, M. Theobald, B. Towles, and S.C. Wang, "Anton, a Special-Purpose Machine for Molecular Dynamics Simulation," Proc. 34th Ann. Int'l Symp. Computer Architecture (ISCA '07), pp. 1-12, 2007.
[7] Y. Sun, F. Qin, S. Checkles, and J.P. Leveille, "3D Prestack Kirchhoff Beam Migration for Depth Imaging," Geophysics, vol. 65, pp. 1592-1603, 2000.
[8] A.G.F. Ortigosa, Q. Liao, and W. Cai, "Speeding Up RTM Velocity Model Building Beyond Algorithmics," Proc. SEG Int'l Exposition and 78th Ann. Meeting, Nov. 2008.
[9] A. Ray, G. Kondayya, and S.V.G. Menon, "Developing a Finite Difference Time Domain Parallel Code for Nuclear Electromagnetic Field Simulation," IEEE Trans. Antennas and Propagation, vol. 54, no. 4, pp. 1192-1199, Apr. 2006.
[10] S. Operto, J. Virieux, P. Amestoy, L. Giraud, and J.Y. L'Excellent, "3D Frequency-Domain Finite-Difference Modeling of Acoustic Wave Propagation Using a Massively Parallel Direct Solver: A Feasibility Study," SEG Technical Program Expanded Abstracts, pp. 2265-2269, 2006.
[11] S. Kamil, P. Husbands, L. Oliker, J. Shalf, and K. Yelick, "Impact of Modern Memory Subsystems on Cache Optimizations for Stencil Computations," Proc. Workshop Memory System Performance (MSP '05), pp. 36-43, 2005.
[12] M.E. Wolf and M.S. Lam, "A Data Locality Optimizing Algorithm," ACM SIGPLAN Notices, vol. 26, no. 6, pp. 30-44, 1991.
[13] R. de la Cruz, M. Araya-Polo, and J.M. Cela, "Introducing the Semi-Stencil Algorithm," Proc. Eighth Int'l Conf. Parallel Processing and Applied Math., 2009.
[14] G. Rivera and C.W. Tseng, "Tiling Optimizations for 3D Scientific Computations," Proc. High Performance Networking and Computing Conf., 2000.
[15] L. Dagum and R. Menon, "OpenMP: An Industry-Standard API for Shared-Memory Programming," IEEE Computational Science and Eng., vol. 5, no. 1, pp. 46-55, Jan. 1998.
[16] E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym, "Nvidia Tesla: A Unified Graphics and Computing Architecture," IEEE Micro, vol. 28, no. 2, pp. 39-55, Mar./Apr. 2008.
[17] M. Garland, S.L. Grand, J. Nickolls, J. Anderson, J. Hardwick, S. Morton, E. Phillips, Y. Zhang, and V. Volkov, "Parallel Computing Experiences with CUDA," IEEE Micro, vol. 28, no. 4, pp. 13-27, July/Aug. 2008.
[18] P. Micikevicius, "3D Finite Difference Computation on GPUs Using CUDA," Proc. Second Workshop General Purpose Processing on Graphics Processing Units (GPGPU-2), pp. 79-84, 2009.
[19] C. He, G. Qin, M. Lu, and W. Zhao, "An Efficient Implementation of High-Accuracy Finite Difference Computing Engine on FPGAs," Proc. 17th IEEE CS Int'l Conf. Application-Specific Systems, Architectures and Processors (ASAP '06), pp. 95-98, 2006.
[20] M. Shafiq, M. Pericas, R. de la Cruz, M. Araya, N. Navarro, and E. Ayguade, "Exploiting Memory Customization in FPGA for 3D Stencil Computations," Proc. Int'l Conf. Field-Programmable Technology (FPT '09), 2009.
[21] P. Bellens, J.M. Perez, R.M. Badia, and J. Labarta, "CellSs: A Programming Model for the Cell BE Architecture," Proc. ACM/IEEE Conf. Supercomputing (SC '06), p. 86, 2006.
[22] I. Gelado, J. Cabezas, J. Stone, S. Patel, N. Navarro, and W.-M. Hwu, "An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel Systems," Proc. 15th Int'l Conf. Architectural Support for Programming Languages and Operating Systems, 2010.

Index Terms:
Reverse time migration, accelerators, GPU, Cell/B.E., FPGA, geophysics.
Mauricio Araya-Polo, Javier Cabezas, Mauricio Hanzich, Miquel Pericas, Félix Rubio, Isaac Gelado, Muhammad Shafiq, Enric Morancho, Nacho Navarro, Eduard Ayguade, José María Cela, Mateo Valero, "Assessing Accelerator-Based HPC Reverse Time Migration," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 1, pp. 147-162, Jan. 2011, doi:10.1109/TPDS.2010.144