Dominik Göddeke, Robert Strzodka, "Cyclic Reduction Tridiagonal Solvers on GPUs Applied to Mixed-Precision Multigrid," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 1, pp. 22-32, January 2011.
DOI: http://doi.ieeecomputersociety.org/10.1109/TPDS.2010.61. ISSN: 1045-9219. Publisher: IEEE Computer Society, Los Alamitos, CA, USA.
Keywords: GPU computing, mixed-precision iterative refinement, multigrid, tridiagonal solvers, cyclic reduction, finite elements, NVIDIA CUDA.
[1] NVIDIA Corporation, "NVIDIA CUDA Programming Guide Version 2.3," http://www.nvidia.com/cuda, July 2009.
[2] J.D. Owens, M. Houston, D.P. Luebke, S. Green, J.E. Stone, and J.C. Phillips, "GPU Computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899, May 2008.
[3] M. Garland, S. Le Grand, J. Nickolls, J.A. Anderson, J. Hardwick, S. Morton, E.H. Phillips, Y. Zhang, and V. Volkov, "Parallel Computing Experiences with CUDA," IEEE Micro, vol. 28, no. 4, pp. 13-27, July 2008.
[4] E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym, "NVIDIA Tesla: A Unified Graphics and Computing Architecture," IEEE Micro, vol. 28, no. 2, pp. 39-55, Mar./Apr. 2008.
[5] J. Nickolls, I. Buck, M. Garland, and K. Skadron, "Scalable Parallel Programming with CUDA," ACM Queue, vol. 6, no. 2, pp. 40-53, Mar./Apr. 2008.
[6] J. Bolz, I. Farmer, E. Grinspun, and P. Schröder, "Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid," ACM Trans. Graphics, vol. 22, no. 3, pp. 917-924, July 2003.
[7] N. Goodnight, C. Woolley, G. Lewin, D.P. Luebke, and G. Humphreys, "A Multigrid Solver for Boundary Value Problems Using Programmable Graphics Hardware," Proc. Conf. Graphics Hardware, M. Doggett, W. Heidrich, W.R. Mark, and A. Schilling, eds., pp. 102-111, July 2003.
[8] R. Strzodka, M. Droske, and M. Rumpf, "Image Registration by a Regularized Gradient Flow—a Streaming Implementation in DX9 Graphics Hardware," Computing, vol. 73, no. 4, pp. 373-389, Nov. 2004.
[9] D. Göddeke, R. Strzodka, and S. Turek, "Performance and Accuracy of Hardware-Oriented Native-, Emulated- and Mixed-Precision Solvers in FEM Simulations," Int'l J. Parallel, Emergent and Distributed Systems, vol. 22, no. 4, pp. 221-256, Jan. 2007.
[10] M. Kazhdan and H. Hoppe, "Streaming Multigrid for Gradient-Domain Operations on Large Images," ACM Trans. Graphics, vol. 27, no. 3, pp. 1-10, Aug. 2008.
[11] Z. Feng and P. Li, "Multigrid on GPU: Tackling Power Grid Analysis on Parallel SIMT Platforms," Proc. IEEE/ACM Int'l Conf. Computer-Aided Design (ICCAD '08), pp. 647-654, Nov. 2008.
[12] E. Elsen, P. LeGresley, and E. Darve, "Large Calculation of the Flow over a Hypersonic Vehicle Using a GPU," J. Computational Physics, vol. 227, no. 24, pp. 10148-10161, Dec. 2008.
[13] M. Kass, A.E. Lefohn, and J.D. Owens, "Interactive Depth of Field Using Simulated Diffusion," Technical Report 06-01, Pixar Animation Studios, Jan. 2006.
[14] S. Sengupta, M.J. Harris, Y. Zhang, and J.D. Owens, "Scan Primitives for GPU Computing," Proc. Conf. Graphics Hardware, T. Aila and M. Segal, eds., pp. 97-106, Aug. 2007.
[15] R.W. Hockney, "A Fast Direct Solution of Poisson's Equation Using Fourier Analysis," J. ACM, vol. 12, no. 1, pp. 95-113, Jan. 1965.
[16] R.W. Hockney and C.R. Jesshope, Parallel Computers. Adam Hilger, Nov. 1981.
[17] H.S. Stone, "An Efficient Parallel Algorithm for the Solution of a Tridiagonal Linear System of Equations," J. ACM, vol. 20, no. 1, pp. 27-38, Jan. 1973.
[18] Y. Zhang, J. Cohen, and J.D. Owens, "Fast Tridiagonal Solvers on the GPU," Proc. 15th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP '10), pp. 127-136, Jan. 2010.
[19] M. Grajewski, M. Köster, and S. Turek, "Mathematical and Numerical Analysis of a Robust and Efficient Grid Deformation Method in the Finite Element Context," SIAM J. Scientific Computing, vol. 31, no. 2, pp. 1539-1557, Nov. 2008.
[20] S. Turek, C. Becker, and S. Kilian, "Hardware-Oriented Numerics and Concepts for PDE Software," Future Generation Computer Systems, vol. 22, nos. 1/2, pp. 217-238, Feb. 2004.
[21] S. Turek, D. Göddeke, C. Becker, S.H. Buijssen, and H. Wobker, "FEAST—Realisation of Hardware-Oriented Numerics for HPC Simulations with Finite Elements," Concurrency and Computation: Practice and Experience, special issue Proc. ISC 2008, Feb. 2010, doi:10.1002/cpe.1584.
[22] S. Turek, C. Becker, S. Kilian, S.H.M. Buijssen, D. Göddeke, and H. Wobker, "FEAST—Finite Element Analysis and Solution Tools," http://www.feast.tu-dortmund.de, 2008.
[23] D. Göddeke, H. Wobker, R. Strzodka, J. Mohd-Yusof, P.S. McCormick, and S. Turek, "Co-Processor Acceleration of an Unmodified Parallel Solid Mechanics Code with FEASTGPU," Int'l J. Computational Science and Eng., vol. 4, no. 4, pp. 254-269, Oct. 2009.
[24] D. Göddeke, S.H. Buijssen, H. Wobker, and S. Turek, "GPU Acceleration of an Unmodified Parallel Finite Element Navier-Stokes Solver," Proc. IEEE Int'l Conf. High Performance Computing and Simulation (HPCS '09), pp. 12-21, June 2009.
[25] O. Axelsson and V.A. Barker, Finite Element Solution of Boundary Value Problems, vol. 35. SIAM, 2001.
[26] D.C. Pham, S. Asano, M. Bolliger, M.N. Day, H.P. Hofstee, C.R. Johns, J.A. Kahle, A. Kameyama, J. Keaty, Y. Masubuchi, M. Riley, D. Shippy, D.L. Stasiak, M. Suzuoki, M. Wang, J. Warnock, S. Weitzel, D. Wendel, T. Yamazaki, and K. Yazawa, "The Design and Implementation of a First-Generation CELL Processor," Proc. Int'l Solid-State Circuits Conf. (ISSCC '05), Digest of Technical Papers, vol. 1, pp. 184-592, Feb. 2005.
[27] NVIDIA Corporation, "Whitepaper: NVIDIA's Next Generation CUDA Compute Architecture: Fermi," http://www.nvidia.com/object/fermi_architecture.html, Sept. 2009.
[28] J.H. Wilkinson, Rounding Errors in Algebraic Processes. Prentice-Hall, 1963.
[29] R.S. Martin, G. Peters, and J.H. Wilkinson, "Iterative Refinement of the Solution of a Positive Definite System of Equations," Numerische Mathematik, vol. 8, no. 3, pp. 203-216, May 1966.
[30] H.J. Bowdler, R.S. Martin, G. Peters, and J.H. Wilkinson, "Solution of Real and Complex Systems of Linear Equations," Numerische Mathematik, vol. 8, no. 3, pp. 217-234, May 1966.
[31] C.B. Moler, "Iterative Refinement in Floating Point," J. ACM, vol. 14, no. 2, pp. 316-321, Apr. 1967.
[32] D.E. Knuth, The Art of Computer Programming, Volume 2: Seminumerical Algorithms, third ed. Addison-Wesley, 1997.
[33] L.H. Thomas, "Elliptic Problems in Linear Difference Equations over a Network," Watson Scientific Computing Laboratory Report, Columbia Univ., 1949.
[34] D.W. Peaceman and H.H. Rachford Jr., "The Numerical Solution of Parabolic and Elliptic Differential Equations," J. Soc. for Industrial and Applied Math., vol. 3, no. 1, pp. 28-41, Mar. 1955.

