IEEE Transactions on Computers (TC) has moved to the OnlinePlus publication model starting with 2013 issues!

From the August 2014 issue

Algorithm, Architecture, and Floating-Point Unit Codesign of a Matrix Factorization Accelerator

By Ardavan Pedram, Andreas Gerstlauer, and Robert A. van de Geijn

This paper examines the mapping of algorithms encountered when solving dense linear systems and linear least-squares problems to a custom Linear Algebra Processor. Specifically, the focus is on Cholesky, LU (with partial pivoting), and QR factorizations and their blocked algorithms. As part of the study, we expose the benefits of redesigning floating-point units and their surrounding datapaths to support these complicated operations. We show how adding moderate complexity to the architecture greatly alleviates complexities in the algorithm. We study design tradeoffs and the effectiveness of architectural modifications to demonstrate that we can improve power and performance efficiency to a level that can otherwise only be expected of full-custom ASIC designs. A feasibility study of inner kernels is extended to the blocked level and shows that, at the block level, the Linear Algebra Core (LAC) can achieve high efficiencies of up to 45 GFLOPS/W for both Cholesky and LU factorization, and over 35 GFLOPS/W for QR factorization. While maintaining such efficiencies, our extensions to the MAC units can achieve up to 10, 12, and 20 percent speedup for the blocked algorithms of Cholesky, LU, and QR factorization, respectively.



Editorials and Announcements

Announcements

New Essential Set

Editorials

Guest Editorials

Reviewers List

Annual Index


Access Recently Published TC Articles

Subscribe to the RSS feed of the latest TC content added to the digital library.

Sign up for the Transactions Connection newsletter.


Importance of Coherence Protocols with Network Applications on Multi-Core Processors

Automated Generation of Performance and Dependability Models for the Assessment of Wireless Sensor Networks

IEEE Transactions on Computers (TC) is a monthly journal that publishes research in areas such as computer organizations and architectures, digital devices, operating systems, and new and important applications and trends.
Read the full scope of TC