Issue No.01 - Jan. (2013 vol.24)
pp: 85-91
G. R. Morris, Eng. R&D Center, US Army, Vicksburg, MS, USA
K. H. Abed, Dept. of Comput. Eng., Jackson State Univ., Jackson, MS, USA
ABSTRACT
High-performance heterogeneous computers that employ field programmable gate arrays (FPGAs) as computational elements are known as high-performance reconfigurable computers (HPRCs). For floating-point applications, these FPGA-based processors must satisfy a variety of heuristics and rules of thumb to achieve a speedup compared with their software counterparts. By way of a simple sparse matrix Jacobi iterative solver, this paper illustrates some of the issues associated with mapping floating-point kernels onto HPRCs. The Jacobi method was chosen based on heuristics developed from earlier research. Furthermore, Jacobi is relatively easy to understand, yet is complex enough to illustrate the mapping issues. This paper is not trying to demonstrate the speedup of a particular application nor is it suggesting that Jacobi is the best way to solve equations. The results demonstrate a nearly threefold wall clock runtime speedup when compared with a software implementation. A formal analysis shows that these results are reasonable. The purpose of this paper is to illuminate the challenging floating-point mapping process while simultaneously showing that such mappings can result in significant speedups. The ideas revealed by research such as this have already been and should continue to be used to facilitate a more automated mapping process.
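For readers unfamiliar with the kernel named in the abstract, the following is a minimal software-only sketch of a sparse matrix Jacobi iteration. It is not the authors' FPGA design or their reference software. The compressed sparse row (CSR) layout, the function name jacobi_csr, the tolerance, and the iteration limit are all assumptions chosen for illustration. Each sweep computes x_new[i] = (b[i] - sum over j != i of A[i][j] * x[j]) / A[i][i], which is the row-wise multiply-accumulate (reduction) pattern that a floating-point mapping would have to pipeline.

```python
def jacobi_csr(values, col_idx, row_ptr, b, tol=1e-10, max_iters=1000):
    """Solve A x = b with the Jacobi method; A is given in CSR form.

    values, col_idx, row_ptr: hypothetical CSR arrays for A (illustrative layout).
    Returns (x, iterations_used). Convergence is only expected for suitable
    matrices, e.g. strictly diagonally dominant ones.
    """
    n = len(b)
    x = [0.0] * n  # initial guess: the zero vector
    for iteration in range(1, max_iters + 1):
        x_new = [0.0] * n
        for i in range(n):
            diag = 0.0
            off_diag_sum = 0.0
            # Walk the nonzeros of row i; this is the per-row reduction.
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j == i:
                    diag = values[k]
                else:
                    off_diag_sum += values[k] * x[j]
            if diag == 0.0:
                raise ZeroDivisionError(f"zero diagonal entry in row {i}")
            x_new[i] = (b[i] - off_diag_sum) / diag
        # Stop when the largest componentwise change falls below tol.
        delta = max(abs(x_new[i] - x[i]) for i in range(n))
        x = x_new
        if delta < tol:
            return x, iteration
    return x, max_iters


# Example: a 3x3 strictly diagonally dominant tridiagonal system,
# A = [[4, 1, 0], [1, 4, 1], [0, 1, 4]], b = [5, 6, 5].
values = [4.0, 1.0, 1.0, 4.0, 1.0, 1.0, 4.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
b = [5.0, 6.0, 5.0]
solution, iterations = jacobi_csr(values, col_idx, row_ptr, b)
```

Every entry of x_new depends only on the previous iterate x, so the rows can be processed independently within a sweep. That independence is one plausible reason Jacobi is a convenient vehicle for studying hardware mappings, although the sketch above makes no claim about how the paper exploits it.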
INDEX TERMS
Jacobian matrices, Field programmable gate arrays, Reconfigurable architectures, Iterative methods, Jacobi iterative method, Field programmable gate array (FPGA), reconfigurable computer (RC), high-performance reconfigurable computer (HPRC), high-performance heterogeneous computer (HPHC)
CITATION
G. R. Morris, K. H. Abed, "Mapping a Jacobi Iterative Solver onto a High-Performance Heterogeneous Computer", IEEE Transactions on Parallel & Distributed Systems, vol.24, no. 1, pp. 85-91, Jan. 2013, doi:10.1109/TPDS.2012.121