Issue No.01 - January (2011 vol.22)
pp: 7-21
Bernd Burgstaller, Yonsei University, Seoul
Raymes Khoury, The University of Sydney, Sydney
ABSTRACT
Matrix languages, including MATLAB and Octave, are established standards for applications in science and engineering. They provide interactive programming environments that are easy to use because their scripting languages offer matrix data types. Current implementations of matrix languages do not fully utilize high-performance, special-purpose chip architectures, such as the IBM PowerXCell processor (Cell). We present a new framework that extends Octave to harvest the computational power of the Cell. With this framework, the programmer is relieved of the burden of introducing explicit notions of parallelism. Instead, the programmer uses a new matrix data type to execute matrix operations in parallel on the synergistic processing elements (SPEs) of the Cell. We employ lazy evaluation semantics for our new matrix data type to obtain execution traces of matrix operations. Traces are converted to data dependence graphs; operations in the data dependence graph are lowered (split into submatrices), scheduled, and executed on the SPEs. Thereby, we exploit 1) data parallelism, 2) instruction-level parallelism, 3) pipeline parallelism, and 4) task parallelism of matrix language programs. We conducted extensive experiments to show the validity of our approach. Our Cell-based implementation achieves speedups of up to a factor of 12 over code run on recent Intel Core2 Quad processors.
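The pipeline described in the abstract (lazy evaluation, trace, data dependence graph, lowering into submatrices) can be illustrated with a small sketch. This is not the authors' framework: the class `LazyMatrix` and the functions `dependence_graph`, `evaluate_rows`, `num_rows`, and `materialize` are hypothetical names chosen for illustration, the example is written in Python rather than Octave, only elementwise addition and multiplication are supported, and the row blocks are evaluated sequentially where a real system would schedule them onto the SPEs.

```python
import itertools


class LazyMatrix:
    """Hypothetical lazy matrix: operators record a trace node instead of computing."""
    _ids = itertools.count()

    def __init__(self, op, operands=(), data=None):
        self.id = next(LazyMatrix._ids)
        self.op = op              # "leaf", "add", or "mul" (both elementwise)
        self.operands = operands  # the nodes this result depends on
        self.data = data          # concrete rows, only set for leaves

    @classmethod
    def leaf(cls, rows):
        return cls("leaf", data=[list(r) for r in rows])

    def __add__(self, other):
        return LazyMatrix("add", (self, other))

    def __mul__(self, other):
        return LazyMatrix("mul", (self, other))


def dependence_graph(root):
    """Return {node id: [operand ids]} for every node reachable from root."""
    graph, stack = {}, [root]
    while stack:
        node = stack.pop()
        if node.id not in graph:
            graph[node.id] = [o.id for o in node.operands]
            stack.extend(node.operands)
    return graph


def evaluate_rows(node, lo, hi):
    """Evaluate rows [lo, hi) of node; one call stands in for one parallel task."""
    if node.op == "leaf":
        return [row[:] for row in node.data[lo:hi]]
    a = evaluate_rows(node.operands[0], lo, hi)
    b = evaluate_rows(node.operands[1], lo, hi)
    f = (lambda x, y: x + y) if node.op == "add" else (lambda x, y: x * y)
    return [[f(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def num_rows(node):
    """Row count of the result, read from the leftmost leaf of the trace."""
    while node.op != "leaf":
        node = node.operands[0]
    return len(node.data)


def materialize(root, block_rows=2):
    """Lower the trace into row blocks and evaluate each block independently."""
    n = num_rows(root)
    result = []
    for lo in range(0, n, block_rows):
        result.extend(evaluate_rows(root, lo, min(lo + block_rows, n)))
    return result


a = LazyMatrix.leaf([[1, 2], [3, 4], [5, 6]])
b = LazyMatrix.leaf([[10, 20], [30, 40], [50, 60]])
c = (a + b) * a              # nothing is computed here; only the trace grows
graph = dependence_graph(c)  # edges from each operation to its operands
result = materialize(c)      # computation happens only at this point
```

Because the operations in this sketch are elementwise, each row block depends only on the same rows of its operands, so the blocks are independent of one another. That independence is what makes the data-parallel execution mentioned in the abstract possible; operations such as matrix multiplication would need a more involved partitioning.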
INDEX TERMS
Programming languages, lazy evaluation, scheduling, data partitioning, math script languages, Cell Broadband Engine architecture.
CITATION
Bernd Burgstaller, Raymes Khoury, "Accelerating the Execution of Matrix Languages on the Cell Broadband Engine Architecture," IEEE Transactions on Parallel & Distributed Systems, vol. 22, no. 1, pp. 7-21, January 2011, doi:10.1109/TPDS.2010.58