
ASCII Text
Ling Zhuo, Viktor K. Prasanna, "High-Performance Designs for Linear Algebra Operations on Reconfigurable Hardware," IEEE Transactions on Computers, vol. 57, no. 8, pp. 1057-1071, August 2008.
BibTeX
@article{ 10.1109/TC.2008.55,
  author = {Ling Zhuo and Viktor K. Prasanna},
  title = {High-Performance Designs for Linear Algebra Operations on Reconfigurable Hardware},
  journal = {IEEE Transactions on Computers},
  volume = {57},
  number = {8},
  issn = {0018-9340},
  year = {2008},
  pages = {1057-1071},
  doi = {http://doi.ieeecomputersociety.org/10.1109/TC.2008.55},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
RefWorks Procite/RefMan/Endnote
TY  - JOUR
JO  - IEEE Transactions on Computers
TI  - High-Performance Designs for Linear Algebra Operations on Reconfigurable Hardware
IS  - 8
SN  - 0018-9340
SP  - 1057
EP  - 1071
EPD - 1057-1071
A1  - Ling Zhuo
A1  - Viktor K. Prasanna
PY  - 2008
KW  - Reconfigurable hardware
KW  - Computations on matrices
KW  - Parallel algorithms
VL  - 57
JA  - IEEE Transactions on Computers
ER  -
[1] Xilinx Incorporated, http://www.xilinx.com, 2008.
[2] O. Storaasli, R.C. Singleterry, and S. Brown, “Scientific Computations on a NASA Reconfigurable Hypercomputer,” Proc. Fifth Ann. Int'l Conf. Military and Aerospace Programmable Logic Devices, Sept. 2002.
[3] K.D. Underwood and K.S. Hemmert, “Closing the Gap: CPU and FPGA Trends in Sustainable Floating-Point BLAS Performance,” Proc. 12th Ann. IEEE Symp. Field-Programmable Custom Computing Machines, Apr. 2004.
[4] M. Smith, J. Vetter, and X. Liang, “Accelerating Scientific Applications with the SRC-6 Reconfigurable Computer: Methodologies and Analysis,” Proc. 19th IEEE Int'l Parallel and Distributed Processing Symp., Apr. 2005.
[5] Z. Guo, W. Najjar, F. Vahid, and K. Vissers, “A Quantitative Analysis of the Speedup Factors of FPGAs over Processors,” Proc. 12th ACM/SIGDA Int'l Symp. Field Programmable Gate Arrays, pp. 162-170, Feb. 2004.
[6] V. Aggarwal, A. George, and K. Slatton, “Reconfigurable Computing with Multiscale Data Fusion for Remote Sensing,” Proc. 14th ACM/SIGDA Int'l Symp. Field Programmable Gate Arrays, p. 235, Feb. 2006.
[7] S. Bajracharya, C. Shu, K. Gaj, and T. El-Ghazawi, “Implementation of Elliptic Curve Cryptosystems over $\mathrm{GF}(2^n)$ in Optimal Normal Basis on a Reconfigurable Computer,” Proc. 12th ACM/SIGDA Int'l Symp. Field Programmable Gate Arrays, Feb. 2004.
[8] D.A. Buell and J.P. Davis, “Reconfigurable Computing Applied to Problems in Communications Security,” Proc. Fifth Ann. Int'l Conf. Military and Aerospace Programmable Logic Devices, Sept. 2002.
[9] A. Koohi, N. Bagherzadeh, and C. Pan, “A Fast Parallel Reed-Solomon Decoder on a Reconfigurable Architecture,” Proc. First IEEE/ACM/IFIP Int'l Conf. Hardware/Software Codesign and System Synthesis, Oct. 2003.
[10] Cray Inc., http://www.cray.com/, 2008.
[11] SRC Computers, Inc., http://www.srccomp.com/, 2008.
[12] Silicon Graphics, Inc., http://www.sgi.com/, 2008.
[13] D. Bader, B. Moret, and P. Sanders, “High-Performance Algorithm Engineering for Parallel Computation,” Lecture Notes in Computer Science, vol. 2547, pp. 1-23, 2002.
[14] C. Lawson, R. Hanson, D. Kincaid, and F. Krogh, “Basic Linear Algebra Subprograms for FORTRAN Usage,” ACM Trans. Math. Software, vol. 5, no. 3, pp. 308-323, 1979.
[15] L. Zhuo and V.K. Prasanna, “Scalable and Modular Algorithms for Floating-Point Matrix Multiplication on FPGAs,” Proc. 18th Int'l Parallel and Distributed Processing Symp., Apr. 2004.
[16] M. Smith, J. Vetter, and S. Alam, “Scientific Computing Beyond CPUs: FPGA Implementations of Common Scientific Kernels,” Proc. Eighth Ann. Int'l Conf. Military and Aerospace Programmable Logic Devices, Sept. 2005.
[17] R. Barrett, M. Berry, T.F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H.V. der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, second ed. SIAM, 1994.
[18] W.H. Press, B.P. Flannery, S.A. Teukolsky, and W.T. Vetterling, Numerical Recipes in C: The Art of Scientific Computing. Cambridge Univ. Press, 1992.
[19] IEEE 754 Standard for Binary Floating-Point Arithmetic, IEEE, 1984.
[20] R.C. Whaley, A. Petitet, and J.J. Dongarra, “Automated Empirical Optimization of Software and the ATLAS Project,” Parallel Computing, vol. 27, nos. 1-2, pp. 3-35, 2001; also available as Univ. of Tennessee LAPACK Working Note #147, UT-CS-00-448, 2000 (www.netlib.org/lapack/lawns/lawn147.ps).
[21] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J.D. Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen, “LAPACK User's Guide Third Edition,” http://www.netlib.org/lapack/lug/lapack_lug.html, Aug. 1999.
[22] L.S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R.C. Whaley, ScaLAPACK Users' Guide, SIAM, 1997.
[23] A. Chtchelkanova, J. Gunnels, G. Morrow, J. Overfelt, and R. van de Geijn, “Parallel Implementation of BLAS: General Techniques for Level 3 BLAS,” Concurrency: Practice and Experience, vol. 9, no. 9, pp. 837-857, 1997.
[24] K. Compton and S. Hauck, “Reconfigurable Computing: A Survey of Systems and Software,” ACM Computing Surveys, vol. 34, no. 2, pp. 171-210, June 2002.
[25] G. Govindu, R. Scrofano, and V.K. Prasanna, “A Library of Parameterizable Floating-Point Cores for FPGAs and Their Application to Scientific Computing,” Proc. Int'l Conf. Eng. Reconfigurable Systems and Algorithms, June 2005.
[26] X. Wang, S. Braganza, and M. Leeser, “Advanced Components in the Variable Precision Floating-Point Library,” Proc. 14th Ann. IEEE Symp. Field-Programmable Custom Computing Machines, Apr. 2006.
[27] S. Alam, P. Agarwal, M. Smith, J. Vetter, and D. Caliga, “Using FPGA Devices to Accelerate Biomolecular Simulations,” Computer, vol. 40, no. 3, pp. 66-73, Mar. 2007.
[28] D. Benyamin, W. Luk, and J. Villasenor, “Optimizing FPGA-Based Vector Product Designs,” Proc. Seventh Ann. IEEE Symp. Field-Programmable Custom Computing Machines, pp. 188-197, Apr. 1999.
[29] J.W. Jang, S. Choi, and V.K. Prasanna, “Area and Time Efficient Implementation of Matrix Multiplication on FPGAs,” Proc. First IEEE Int'l Conf. Field Programmable Technology, Dec. 2002.
[30] S. Choi and V.K. Prasanna, “Time and Energy Efficient Matrix Factorization Using FPGAs,” Proc. 13th Int'l Conf. Field Programmable Logic and Applications, Sept. 2003.
[31] Y. Dou, S. Vassiliadis, G. Kuzmanov, and G. Gaydadjiev, “64-Bit Floating-Point FPGA Matrix Multiplication,” Proc. 13th ACM/SIGDA Int'l Symp. Field Programmable Gate Arrays, Feb. 2005.
[32] L. Zhuo and V.K. Prasanna, “Sparse Matrix-Vector Multiplication on FPGAs,” Proc. 13th ACM/SIGDA Int'l Symp. Field Programmable Gate Arrays, Feb. 2005.
[33] J. Sun, G. Peterson, and O. Storaasli, “Sparse Matrix-Vector Multiplication Design on FPGAs,” Proc. 15th Ann. IEEE Symp. Field-Programmable Custom Computing Machines, Apr. 2007.
[34] M. deLorimier and A. DeHon, “Floating-Point Sparse Matrix-Vector Multiply for FPGAs,” Proc. 13th ACM/SIGDA Int'l Symp. Field Programmable Gate Arrays, Feb. 2005.
[35] S. Akella, M. Smith, R. Mills, S. Alam, R. Barrett, and J. Vetter, “Sparse Matrix-Vector Multiplication Kernel on a Reconfigurable Computer,” Proc. Workshop High Performance Embedded Computing, Sept. 2005.
[36] G. Govindu, S. Choi, V.K. Prasanna, V. Daga, S. Gangadharpalli, and V. Sridhar, “A High-Performance and Energy-Efficient Architecture for Floating-Point Based LU Decomposition on FPGAs,” Proc. Int'l Conf. Eng. Reconfigurable Systems and Algorithms, June 2004.
[37] V. Daga, G. Govindu, S. Gangadharpalli, V. Sridhar, and V.K. Prasanna, “Efficient Floating-Point Based Block LU Decomposition on FPGAs,” Proc. Int'l Conf. Eng. Reconfigurable Systems and Algorithms, June 2004.
[38] L. Zhuo and V.K. Prasanna, “Design Tradeoffs for BLAS Operations on Reconfigurable Hardware,” Proc. 34th Int'l Conf. Parallel Processing, June 2005.
[39] L. Zhuo and V.K. Prasanna, “High-Performance and Area-Efficient Reduction Circuits on FPGAs,” Proc. 17th Int'l Symp. Computer Architecture and High Performance Computing, Oct. 2005.
[40] J. Hong and H. Kung, “I/O Complexity: The Red-Blue Pebble Game,” Proc. 13th Ann. ACM Symp. Theory of Computing, pp. 326-333, May 1981.
[41] L. Zhuo and V. Prasanna, “Scalable and Modular Algorithms for Floating-Point Matrix Multiplication on Reconfigurable Computing Systems,” IEEE Trans. Parallel and Distributed Systems, vol. 18, no. 4, pp. 433-448, Apr. 2007.
[42] T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein, Introduction to Algorithms, second ed. The MIT Press, 2001.
[43] D. Womble, D. Greenberg, R. Riesen, and S. Wheat, “Out of Core, Out of Mind: Practical Parallel I/O,” Proc. Scalable Parallel Libraries Conf., pp. 10-16, citeseer.ist.psu.edu/womble93out.html, 1993.
[44] Mentor Graphics Corp., http://www.mentor.com/, 2008.
[45] AMD Core Math Library, http://developer.amd.com/acml.aspx, 2008.
[46] S. Hunold and T. Rauber, “Automatic Tuning of PDGEMM Towards Optimal Performance,” Proc. European Conf. Parallel Processing, Aug. 2005.