From the August 2014 issue
Algorithm, Architecture, and Floating-Point Unit Codesign of a Matrix Factorization Accelerator
By Ardavan Pedram, Andreas Gerstlauer, and Robert A. van de Geijn
This paper examines the mapping of algorithms encountered when solving dense linear systems and linear least-squares problems to a custom Linear Algebra Processor. Specifically, the focus is on Cholesky, LU (with partial pivoting), and QR factorizations and their blocked algorithms. As part of the study, we expose the benefits of redesigning floating-point units and their surrounding datapaths to support these complicated operations. We show how adding moderate complexity to the architecture greatly alleviates complexities in the algorithm. We study design tradeoffs and the effectiveness of architectural modifications to demonstrate that we can improve power and performance efficiency to a level that can otherwise only be expected of full-custom ASIC designs. A feasibility study of the inner kernels is extended to the blocked level and shows that, at the block level, the Linear Algebra Core (LAC) can achieve high efficiencies of up to 45 GFLOPS/W for both Cholesky and LU factorization, and over 35 GFLOPS/W for QR factorization. While maintaining such efficiencies, our extensions to the MAC units can achieve speedups of up to 10, 12, and 20 percent for the blocked algorithms of Cholesky, LU, and QR factorization, respectively.
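For readers unfamiliar with the term, a "blocked" factorization works on square sub-blocks of the matrix, so that most of the arithmetic falls into multiply-accumulate-heavy updates while the square roots and divisions stay confined to a small inner kernel. The following is a minimal pure-Python sketch of a generic right-looking blocked Cholesky factorization. It is a textbook illustration only, not the authors' mapping onto the LAC; the function name `blocked_cholesky` and the block-size parameter `nb` are hypothetical names chosen for this example.

```python
import math

def blocked_cholesky(A, nb):
    """Return lower-triangular L with L * L^T == A, for symmetric positive
    definite A given as a list of lists. nb is the block size (illustrative)."""
    n = len(A)
    L = [row[:] for row in A]  # work on a copy; only the lower triangle is used
    for k in range(0, n, nb):
        e = min(k + nb, n)
        # 1. Inner kernel: unblocked Cholesky of the diagonal block L[k:e, k:e].
        #    This is where the square roots and divisions occur.
        for j in range(k, e):
            L[j][j] = math.sqrt(L[j][j])
            for i in range(j + 1, e):
                L[i][j] /= L[j][j]
            for c in range(j + 1, e):
                for i in range(c, e):
                    L[i][c] -= L[i][j] * L[c][j]
        # 2. Panel solve: L21 = A21 * inv(L11)^T, one column at a time.
        for i in range(e, n):
            for j in range(k, e):
                s = L[i][j] - sum(L[i][p] * L[j][p] for p in range(k, j))
                L[i][j] = s / L[j][j]
        # 3. Trailing update: A22 -= L21 * L21^T (lower triangle only).
        #    This step is pure multiply-accumulate work.
        for i in range(e, n):
            for j in range(e, i + 1):
                L[i][j] -= sum(L[i][p] * L[j][p] for p in range(k, e))
    # Clear the strictly upper triangle, which was never updated.
    for i in range(n):
        for j in range(i + 1, n):
            L[i][j] = 0.0
    return L
```

Calling `blocked_cholesky(A, 2)` on a 3 x 3 symmetric positive definite matrix processes one 2 x 2 diagonal block, one 1 x 2 panel, and a final 1 x 1 block. In this sketch, steps 2 and 3 dominate the operation count for large matrices, which is why blocked variants are commonly used to keep multiply-accumulate units busy.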
Editorials and Announcements
- Dr. Paolo Montuschi Announced as New Editor-in-Chief of the IEEE Transactions on Computers
- Get Your Journals as eBooks for Free
- eBooks of issues of TC can now be downloaded from the Computer Society Digital Library
- IEEE Transactions on Computers EIC Albert Zomaya Receives Two IEEE Awards
New Essential Set
- "Cloud Computing" available at computer.org/store
- "Industrial Implementations of Floating-Point Units" available at computer.org/store
- State of the Journal by Albert Y. Zomaya (Jan 2012)
- State of the Journal by Albert Y. Zomaya (May 2011)
- Special Section on Computer Arithmetic (August 2014)
- Special Issue on Network-on-Chip (March 2014)
- Special Issue on Cloud of Clouds (January 2014)
- Special Section on Concurrent On-Line Testing and Error/Fault Resilience of Digital Systems (September 2011)
Access Recently Published TC Articles
Importance of Coherence Protocols with Network Applications on Multi-Core Processors
Automated Generation of Performance and Dependability Models for the Assessment of Wireless Sensor Networks