Issue No. 02 - February (2012 vol. 23)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPDS.2011.141
Yi-Gang Tai, University of Texas at San Antonio, San Antonio
Chia-Tien Dan Lo, Southern Polytechnic State University, Marietta
Kleanthis Psarris, University of Texas at San Antonio, San Antonio
Many scientific or engineering applications involve matrix operations, in which reduction of vectors is a common operation. If the core operator of the reduction is deeply pipelined, which is usually the case, dependencies between the input data elements cause data hazards. To tackle this problem, we propose a new reduction method with low latency and high pipeline utilization. The performance of the proposed design is evaluated for both single data set and multiple data set scenarios. Further, QR decomposition is used to demonstrate how the proposed method can accelerate its execution. We implement the design on an FPGA and compare its results to other methods.
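To illustrate the data hazard the abstract describes, the following minimal Python sketch models a fully pipelined adder. It is not the reduction method proposed in the paper; it shows a generic, well-known workaround (interleaved partial sums). The function names, the `latency` parameter, and the cycle model (one add issued per cycle, result available `latency` cycles after issue) are all assumptions made for illustration.

```python
def serial_cycles(n, latency):
    """Cycles to reduce n values when each add must wait for the previous result.

    Assumed model: n - 1 dependent adds, each stalling for the full latency.
    """
    return max(n - 1, 0) * latency


def interleaved_accumulate_cycles(n, latency):
    """Cycles for the accumulate phase with `latency` independent partial sums.

    Add i updates partial sum i % latency, whose previous update was issued
    `latency` cycles earlier, so one add can be issued every cycle without a
    stall. The final merge of the partial sums is NOT counted here.
    """
    return n + latency - 1 if n > 0 else 0


def interleaved_sum(values, latency):
    """Functional view of the same schedule: round-robin partial sums, then merge."""
    partial = [0.0] * latency
    for i, v in enumerate(values):
        partial[i % latency] += v
    total = 0.0
    for p in partial:  # the merge phase, done serially here for simplicity
        total += p
    return total


if __name__ == "__main__":
    print(serial_cycles(100, 8))                  # dependent adds stall the pipeline
    print(interleaved_accumulate_cycles(100, 8))  # pipeline stays full while accumulating
    print(interleaved_sum(range(1, 11), 4))
```

Under this assumed model, the serial schedule leaves most pipeline stages idle, while the interleaved schedule keeps them busy but still has to merge the remaining partial sums, which is the part a dedicated reduction circuit has to handle efficiently. Note that with floating-point inputs, reordering the additions can change the rounding of the result.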
Reconfigurable hardware, pipeline processors, parallel algorithms, parallel and vector implementations, algorithm design and analysis.
Y. Tai, K. Psarris and C. D. Lo, "Accelerating Matrix Operations with Improved Deeply Pipelined Vector Reduction," in IEEE Transactions on Parallel & Distributed Systems, vol. 23, no. 2, pp. 202-210, 2011.