
Bibliographic References  
ASCII Text
Manish Kumar Jaiswal, Nitin Chandrachoodan, "FPGA-Based High-Performance and Scalable Block LU Decomposition Architecture," IEEE Transactions on Computers, vol. 61, no. 1, pp. 60-72, January 2012.
BibTeX
@article{ 10.1109/TC.2011.24,
  author = {Manish Kumar Jaiswal and Nitin Chandrachoodan},
  title = {FPGA-Based High-Performance and Scalable Block LU Decomposition Architecture},
  journal = {IEEE Transactions on Computers},
  volume = {61},
  number = {1},
  issn = {0018-9340},
  year = {2012},
  pages = {60-72},
  doi = {http://doi.ieeecomputersociety.org/10.1109/TC.2011.24},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
RefWorks Procite/RefMan/Endnote
TY  - JOUR
JO  - IEEE Transactions on Computers
TI  - FPGA-Based High-Performance and Scalable Block LU Decomposition Architecture
IS  - 1
SN  - 0018-9340
SP  - 60
EP  - 72
EPD  - 60-72
A1  - Manish Kumar Jaiswal,
A1  - Nitin Chandrachoodan,
PY  - 2012
KW  - LU decomposition
KW  - block LU
KW  - FPGA
KW  - hardware acceleration
KW  - floating point arithmetics
KW  - single/double precision
KW  - scaling
KW  - ATLAS
KW  - Intel-MKL
KW  - GPU.
VL  - 61
JA  - IEEE Transactions on Computers
ER  -
[1] A. Edelman, "Large Dense Numerical Linear Algebra in 1993: The Parallel Computing Influence," Int'l J. Supercomputer Applications, vol. 7, pp. 113-128, 1993.
[2] J.J. Dongarra and D.W. Walker, "Software Libraries for Linear Algebra Computations on High Performance Computers," SIAM Rev., vol. 37, pp. 151-180, 1995.
[3] B.A. Hendrickson and D.E. Womble, "The Torus-Wrap Mapping for Dense Matrix Calculations on Massively Parallel Computers," SIAM J. Scientific Computing, vol. 15, no. 5, pp. 1201-1226, 1994.
[4] R. Harrington, "Origin and Development of the Method of Moments for Field Computation," IEEE Antennas and Propagation Magazine, vol. 32, no. 3, pp. 31-35, June 1990.
[5] J.L. Hess, "Panel Methods in Computational Fluid Dynamics," Ann. Rev. of Fluid Mechanics, vol. 22, pp. 225-274, Jan. 1990.
[6] L. Zhuo and V.K. Prasanna, "High-Performance and Parameterized Matrix Factorization on FPGAs," Proc. Int'l Conf. Field Programmable Logic and Applications (FPL '06), pp. 1-6, Aug. 2006.
[7] J.W. Demmel, N.J. Higham, and R.S. Schreiber, "Stability of Block LU Factorization," Numerical Linear Algebra with Applications, vol. 2, no. 2, pp. 173-190, 1995.
[8] J.W. Demmel and N.J. Higham, "Stability of Block Algorithms with Fast Level-3 BLAS," ACM Trans. Math. Software, vol. 18, no. 3, pp. 274-291, Sept. 1992.
[9] M.K. Jaiswal and N. Chandrachoodan, "A High Performance Implementation of LU Decomposition on FPGA," Proc. 13th VLSI Design and Test Symp. (VDAT '09), pp. 124-134, July 2009.
[10] "Automatically Tuned Linear Algebra Software (ATLAS)," http://www.netlib.org/atlas/, 2011.
[11] H.T. Kung and J. Subhlok, "A New Approach for Automatic Parallelization of Blocked Linear Algebra Computations," Supercomputing '91: Proc. ACM/IEEE Conf. Supercomputing, pp. 122-129, 1991.
[12] G. von Laszewski, M. Parashar, A.G. Mohamed, and G.C. Fox, "On the Parallelization of Blocked LU Factorization Algorithms on Distributed Memory Architectures," Supercomputing '92: Proc. ACM/IEEE Conf. Supercomputing, pp. 170-179, 1992.
[13] Y. Zhang, T. Tang, G. Li, and X. Yang, "Implementation and Optimization of Dense LU Decomposition on the Stream Processor," Parallel Processing and Applied Mathematics, R. Wyrzykowski, J. Dongarra, K. Karczewski, and J. Wasniewski, eds., pp. 78-88, Springer, 2008.
[14] A. Sudarsanam, S. Young, A. Dasu, and T. Hauser, "Multi-FPGA Based High Performance LU Decomposition," Proc. 10th High Performance Embedded Computing (HPEC) Workshop, Sept. 2006.
[15] S. Choi and V.K. Prasanna, "Time and Energy Efficient Matrix Factorization Using FPGA," Proc. Int'l Conf. Field-Programmable Logic and Applications (FPL '03), vol. 2278, pp. 507-519, Sept. 2003.
[16] G. Govindu, S. Choi, and V.K. Prasanna, "Efficient Floating-Point Based Block LU Decomposition on FPGAs," Proc. 11th Reconfigurable Architectures Workshop, Apr. 2004.
[17] G. Govindu, S. Choi, V. Prasanna, V. Daga, S. Gangadharpalli, and V. Sridhar, "A High-Performance and Energy-Efficient Architecture for Floating-Point Based LU Decomposition on FPGAs," Proc. 18th Int'l Parallel and Distributed Processing Symp., p. 149, Apr. 2004.
[18] L. Zhuo and V.K. Prasanna, "High-Performance Designs for Linear Algebra Operations on Reconfigurable Hardware," IEEE Trans. Computers, vol. 57, no. 8, pp. 1057-1071, Aug. 2008.
[19] W. Zhang, V. Betz, and J. Rose, "Portable and Scalable FPGA-Based Acceleration of a Direct Linear System Solver," Proc. Int'l Conf. Field-Programmable Technology (FPT '08), pp. 17-24, Dec. 2008.
[20] "SRC Supercomputers," http://www.srccomp.com/, 2008.
[21] "SGI Supercomputers," http://www.sgi.com/, 2011.
[22] "Cray XD1 Supercomputers," http://www.cray.com/, 2008.
[23] N. Galoppo, N. Govindaraju, M. Henson, and D. Manocha, "LU-GPU: Efficient Algorithms for Solving Dense Linear Systems on Graphics Hardware," Proc. ACM/IEEE Conf. Supercomputing (SC), p. 3, Nov. 2005.
[24] V. Volkov and J.W. Demmel, "Benchmarking GPUs to Tune Dense Linear Algebra," SC '08: Proc. ACM/IEEE Conf. Supercomputing, pp. 1-11, 2008.
[25] F. Ino, M. Matsui, K. Goda, and K. Hagihara, "Performance Study of LU Decomposition on the Programmable GPU," Proc. Int'l Conf. High Performance Computing (HiPC), vol. 3769, pp. 83-94, 2005.
[26] S. Tomov, R. Nath, H. Ltaief, and J. Dongarra, "Dense Linear Algebra Solvers for Multicore with GPU Accelerators," Proc. Int'l Workshop High-Level Parallel Programming Models and Supportive Environments (HIPS '10), Jan. 2010.
[27] M.K. Jaiswal and N. Chandrachoodan, "Efficient Implementation of Floating-Point Reciprocator on FPGA," Proc. 22nd Int'l Conf. VLSI Design (VLSID '09), pp. 267-271, 2009.
[28] M.K. Jaiswal and N. Chandrachoodan, "Efficient Implementation of IEEE Double Precision Floating-Point Multiplier on FPGA," Proc. IEEE Region 10 and the Third Int'l Conf. Industrial and Information Systems (ICIIS '08), pp. 1-4, Dec. 2008.
[29] L. Gopalakrishnan, "QDR II SRAM Interface for Virtex-5 Devices," Xilinx Application Note (XAPP853), http://www.xilinx.com/support/documentation/application_notes/xapp853.pdf, Oct. 2008.
[30] J. Sun, G. Peterson, and O. Storaasli, "High-Performance Mixed-Precision Linear Solver for FPGAs," IEEE Trans. Computers, vol. 57, no. 12, pp. 1614-1623, Dec. 2008.
[31] "AMD Core Math Library (ACML)," http://developer.amd.com/cpu/Libraries/acml/Pages/default.aspx, 2011.
[32] Intel Corporation, "Intel Math Kernel Library (Intel MKL) 10.2 In-Depth," http://software.intel.com/sites/products/collateral/hpc/mkl/mkl_indepth.pdf, 2009.
[33] E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak, J. Langou, H. Ltaief, P. Luszczek, and S. Tomov, "Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects," J. Physics: Conference Series, vol. 180, 2009.
[34] J. Dongarra, "LINPACK Benchmarking and Beyond," http://www.netlib.org/utk/people/JackDongarra/SLIDES/dod0610.pdf, June 2010.
[35] J. Humphrey, "CULA 2.2 Sneak Preview," http://www.culatools.com/blog/2010/09/10/cula22sneakpreview/, 2010.