
Bibliographic References  
ASCII Text
Dominik Göddeke, Robert Strzodka, "Cyclic Reduction Tridiagonal Solvers on GPUs Applied to Mixed-Precision Multigrid," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 1, pp. 22-32, January 2011.
BibTeX
@article{10.1109/TPDS.2010.61,
  author = {Dominik Göddeke and Robert Strzodka},
  title = {Cyclic Reduction Tridiagonal Solvers on GPUs Applied to Mixed-Precision Multigrid},
  journal = {IEEE Transactions on Parallel and Distributed Systems},
  volume = {22},
  number = {1},
  issn = {1045-9219},
  year = {2011},
  pages = {22-32},
  doi = {http://doi.ieeecomputersociety.org/10.1109/TPDS.2010.61},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
RefWorks, ProCite/RefMan/EndNote
TY  - JOUR
JO  - IEEE Transactions on Parallel and Distributed Systems
TI  - Cyclic Reduction Tridiagonal Solvers on GPUs Applied to Mixed-Precision Multigrid
IS  - 1
SN  - 1045-9219
SP  - 22
EP  - 32
EPD - 22-32
A1  - Dominik Göddeke
A1  - Robert Strzodka
PY  - 2011
KW  - GPU Computing
KW  - mixed-precision iterative refinement
KW  - multigrid
KW  - tridiagonal solvers
KW  - cyclic reduction
KW  - finite elements
KW  - NVIDIA CUDA
VL  - 22
JA  - IEEE Transactions on Parallel and Distributed Systems
ER  - 
[1] NVIDIA Corporation, "NVIDIA CUDA Programming Guide Version 2.3," http://www.nvidia.com/cuda, July 2009.
[2] J.D. Owens, M. Houston, D.P. Luebke, S. Green, J.E. Stone, and J.C. Phillips, "GPU Computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899, May 2008.
[3] M. Garland, S.L. Grand, J. Nickolls, J.A. Anderson, J. Hardwick, S. Morton, E.H. Phillips, Y. Zhang, and V. Volkov, "Parallel Computing Experiences with CUDA," IEEE Micro, vol. 28, no. 4, pp. 13-27, July 2008.
[4] E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym, "NVIDIA Tesla: A Unified Graphics and Computing Architecture," IEEE Micro, vol. 28, no. 2, pp. 39-55, Mar./Apr. 2008.
[5] J. Nickolls, I. Buck, M. Garland, and K. Skadron, "Scalable Parallel Programming with CUDA," ACM Queue, vol. 6, no. 2, pp. 40-53, Mar./Apr. 2008.
[6] J. Bolz, I. Farmer, E. Grinspun, and P. Schröder, "Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid," ACM Trans. Graphics, vol. 22, no. 3, pp. 917-924, July 2003.
[7] N. Goodnight, C. Woolley, G. Lewin, D.P. Luebke, and G. Humphreys, "A Multigrid Solver for Boundary Value Problems Using Programmable Graphics Hardware," Proc. Conf. Graphics Hardware, M. Doggett, W. Heidrich, W.R. Mark, and A. Schilling, eds., pp. 102-111, July 2003.
[8] R. Strzodka, M. Droske, and M. Rumpf, "Image Registration by a Regularized Gradient Flow—a Streaming Implementation in DX9 Graphics Hardware," Computing, vol. 73, no. 4, pp. 373-389, Nov. 2004.
[9] D. Göddeke, R. Strzodka, and S. Turek, "Performance and Accuracy of Hardware-Oriented Native, Emulated and Mixed-Precision Solvers in FEM Simulations," Int'l J. Parallel, Emergent and Distributed Systems, vol. 22, no. 4, pp. 221-256, Jan. 2007.
[10] M. Kazhdan and H. Hoppe, "Streaming Multigrid for Gradient-Domain Operations on Large Images," ACM Trans. Graphics, vol. 27, no. 3, pp. 1-10, Aug. 2008.
[11] Z. Feng and P. Li, "Multigrid on GPU: Tackling Power Grid Analysis on Parallel SIMT Platforms," Proc. IEEE/ACM Int'l Conf. Computer-Aided Design (ICCAD '08), pp. 647-654, Nov. 2008.
[12] E. Elsen, P. LeGresley, and E. Darve, "Large Calculation of the Flow over a Hypersonic Vehicle Using a GPU," J. Computational Physics, vol. 227, no. 24, pp. 10148-10161, Dec. 2008.
[13] M. Kass, A.E. Lefohn, and J.D. Owens, "Interactive Depth of Field Using Simulated Diffusion," Technical Report 06-01, Pixar Animation Studios, Jan. 2006.
[14] S. Sengupta, M.J. Harris, Y. Zhang, and J.D. Owens, "Scan Primitives for GPU Computing," Proc. Conf. Graphics Hardware, T. Aila and M. Segal, eds., pp. 97-106, Aug. 2007.
[15] R.W. Hockney, "A Fast Direct Solution of Poisson's Equation Using Fourier Analysis," J. ACM, vol. 12, no. 1, pp. 95-113, Jan. 1965.
[16] R.W. Hockney and C.R. Jesshope, Parallel Computers. Adam Hilger, Nov. 1981.
[17] H.S. Stone, "An Efficient Parallel Algorithm for the Solution of a Tridiagonal Linear System of Equations," J. ACM, vol. 20, no. 1, pp. 27-38, Jan. 1973.
[18] Y. Zhang, J. Cohen, and J.D. Owens, "Fast Tridiagonal Solvers on the GPU," Proc. 15th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP '10), pp. 127-136, Jan. 2010.
[19] M. Grajewski, M. Köster, and S. Turek, "Mathematical and Numerical Analysis of a Robust and Efficient Grid Deformation Method in the Finite Element Context," SIAM J. Scientific Computing, vol. 31, no. 2, pp. 1539-1557, Nov. 2008.
[20] S. Turek, C. Becker, and S. Kilian, "Hardware-Oriented Numerics and Concepts for PDE Software," Future Generation Computer Systems, vol. 22, nos. 1/2, pp. 217-238, Feb. 2004.
[21] S. Turek, D. Göddeke, C. Becker, S.H. Buijssen, and H. Wobker, "FEAST—Realisation of Hardware-Oriented Numerics for HPC Simulations with Finite Elements," Concurrency and Computation: Practice and Experience, special issue Proc. ISC 2008, Feb. 2010, doi:10.1002/cpe.1584.
[22] S. Turek, C. Becker, S. Kilian, S.H.M. Buijssen, D. Göddeke, and H. Wobker, "FEAST—Finite Element Analysis and Solution Tools," http://www.feast.tu-dortmund.de, 2008.
[23] D. Göddeke, H. Wobker, R. Strzodka, J. Mohd-Yusof, P.S. McCormick, and S. Turek, "Co-Processor Acceleration of an Unmodified Parallel Solid Mechanics Code with FEASTGPU," Int'l J. Computational Science and Eng., vol. 4, no. 4, pp. 254-269, Oct. 2009.
[24] D. Göddeke, S.H. Buijssen, H. Wobker, and S. Turek, "GPU Acceleration of an Unmodified Parallel Finite Element Navier-Stokes Solver," Proc. IEEE Int'l Conf. High Performance Computing and Simulation (HPCS '09), pp. 12-21, June 2009.
[25] O. Axelsson and V.A. Barker, Finite Element Solution of Boundary Value Problems, vol. 35. SIAM, 2001.
[26] D.C. Pham, S. Asano, M. Bolliger, M.N. Day, H.P. Hofstee, C.R. Johns, J.A. Kahle, A. Kameyama, J. Keaty, Y. Masubuchi, M. Riley, D. Shippy, D.L. Stasiak, M. Suzuoki, M. Wang, J. Warnock, S. Weitzel, D. Wendel, T. Yamazaki, and K. Yazawa, "The Design and Implementation of a First-Generation CELL Processor," Proc. Int'l Solid-State Circuits Conf. (ISSCC '05), Digest of Technical Papers, vol. 1, pp. 184-592, Feb. 2005.
[27] NVIDIA Corporation, "Whitepaper: NVIDIA's Next Generation CUDA Compute Architecture: Fermi," http://www.nvidia.com/object/fermi_architecture.html, Sept. 2009.
[28] J.H. Wilkinson, Rounding Errors in Algebraic Processes. Prentice-Hall, 1963.
[29] R.S. Martin, G. Peters, and J.H. Wilkinson, "Iterative Refinement of the Solution of a Positive Definite System of Equations," Numerische Mathematik, vol. 8, no. 3, pp. 203-216, May 1966.
[30] H.J. Bowdler, R.S. Martin, G. Peters, and J.H. Wilkinson, "Solution of Real and Complex Systems of Linear Equations," Numerische Mathematik, vol. 8, no. 3, pp. 217-234, May 1966.
[31] C.B. Moler, "Iterative Refinement in Floating Point," J. ACM, vol. 14, no. 2, pp. 316-321, Apr. 1967.
[32] D.E. Knuth, The Art of Computer Programming, Volume 2: Seminumerical Algorithms, third ed. Addison-Wesley, 1997.
[33] L.H. Thomas, "Elliptic Problems in Linear Difference Equations over a Network," Watson Scientific Computing Laboratory Report, Columbia Univ., 1949.
[34] D.W. Peaceman and H.H. Rachford Jr., "The Numerical Solution of Parabolic and Elliptic Differential Equations," J. Soc. for Industrial and Applied Math., vol. 3, no. 1, pp. 28-41, Mar. 1955.