Yi-Gang Tai, Chia-Tien Dan Lo, and Kleanthis Psarris, "Accelerating Matrix Operations with Improved Deeply Pipelined Vector Reduction," IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 2, pp. 202-210, Feb. 2012. ISSN 1045-9219. DOI: 10.1109/TPDS.2011.141. IEEE Computer Society, Los Alamitos, CA, USA.
Keywords: reconfigurable hardware, pipeline processors, parallel algorithms, parallel and vector implementations, algorithm design and analysis.
[1] Xilinx Floating-Point Operator v3.0, Xilinx, Inc., http://www.xilinx.com/support/documentation/ip_documentation/floating_point_ds335.pdf, Sept. 2006.
[2] Y.-G. Tai, C.-T. D. Lo, and K. Psarris, "An Improved Reduction Algorithm with Deeply Pipelined Operators," Proc. IEEE Int'l Conf. Systems, Man and Cybernetics (SMC '09), pp. 3060-3065, Oct. 2009.
[3] Y.-G. Tai, C.-T. D. Lo, and K. Psarris, "Multiple Data Set Reduction on FPGAs," Proc. Int'l Conf. Field-Programmable Technology (FPT '10), Dec. 2010.
[4] P.M. Kogge, The Architecture of Pipelined Computers. McGraw-Hill, 1981.
[5] L.M. Ni and K. Hwang, "Vector-Reduction Techniques for Arithmetic Pipelines," IEEE Trans. Computers, vol. C-34, no. 5, pp. 404-411, May 1985.
[6] H. Sips and H. Lin, "An Improved Vector-Reduction Method," IEEE Trans. Computers, vol. 40, no. 2, pp. 214-217, Feb. 1991.
[7] G.R. Morris, V.K. Prasanna, and R.D. Anderson, "A Hybrid Approach for Mapping Conjugate Gradient onto an FPGA-Augmented Reconfigurable Supercomputer," Proc. 14th Ann. IEEE Symp. Field-Programmable Custom Computing Machines (FCCM '06), pp. 3-12, 2006.
[8] G.R. Morris, V.K. Prasanna, and R.D. Anderson, "An FPGA-Based Application-Specific Processor for Efficient Reduction of Multiple Variable-Length Floating-Point Data Sets," Proc. 17th IEEE Int'l Conf. Application-Specific Systems, Architectures and Processors (ASAP '06), pp. 323-330, 2006.
[9] L. Zhuo, G.R. Morris, and V.K. Prasanna, "Designing Scalable FPGA-Based Reduction Circuits Using Pipelined Floating-Point Cores," Proc. 19th IEEE Int'l Parallel and Distributed Processing Symp. (IPDPS '05), p. 147a, 2005.
[10] L. Zhuo and V.K. Prasanna, "High-Performance and Area-Efficient Reduction Circuits on FPGAs," Proc. 17th Int'l Symp. Computer Architecture and High Performance Computing, Oct. 2005.
[11] G.R. Morris, L. Zhuo, and V.K. Prasanna, "High-Performance FPGA-Based General Reduction Methods," Proc. 10th IEEE Symp. Field-Programmable Custom Computing Machines (FCCM '05), Apr. 2005.
[12] L. Zhuo, G.R. Morris, and V.K. Prasanna, "High-Performance Reduction Circuits Using Deeply Pipelined Operators on FPGAs," IEEE Trans. Parallel and Distributed Systems, vol. 18, no. 10, pp. 1377-1392, Oct. 2007.
[13] Y.-G. Tai, C.-T. D. Lo, and K. Psarris, "Applying Out-of-Core QR Decomposition Algorithms on FPGA-Based Systems," Proc. 17th Int'l Conf. Field Programmable Logic and Applications (FPL '07), 2007.
[14] Y.-G. Tai, C.-T. D. Lo, and K. Psarris, "Accelerating Matrix Decomposition with Replications," Proc. 15th Reconfigurable Architectures Workshop (RAW '08), 2008.
[15] B.C. Gunter and R.A. van de Geijn, "Parallel Out-of-Core Computation and Updating of the QR Factorization," ACM Trans. Math. Software, vol. 31, no. 1, pp. 60-78, 2005.
[16] A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, "Parallel Tiled QR Factorization for Multicore Architectures," technical report, LAPACK Working Note #190, http://www.netlib.org/lapack/lawnspdf/lawn190.pdf, 2007.
[17] B. Hadri, H. Ltaief, E. Agullo, and J. Dongarra, "Enhancing Parallelism of Tile QR Factorization for Multicore Architectures," technical report, LAPACK Working Note #222, Innovative Computing Laboratory, Univ. of Tennessee, http://www.netlib.org/lapack/lawnspdf/lawn222.pdf, 2009.
[18] Virtex-II Pro / Virtex-II Pro X Complete Data Sheet, Xilinx, Inc., http://direct.xilinx.com/bvdocs/publications/ds083.pdf, 2007.
[19] Virtex-4 Family Overview, Xilinx, Inc., http://www.xilinx.com/support/documentation/data_sheets/ds112.pdf, 2007.
[20] Virtex-5 Family Overview, Xilinx, Inc., http://www.xilinx.com/support/documentation/data_sheets/ds100.pdf, 2009.

