Issue No.03 - March (2009 vol.20)
pp: 404-418
Nawal Copty, Sun Microsystems, Inc., Santa Clara
Alejandro Duran, Universitat Politècnica de Catalunya, Barcelona Supercomputing Center, Barcelona
Jay Hoeflinger, Intel, Champaign
Yuan Lin, Sun Microsystems, Inc., Santa Clara
Federico Massaioli, CASPUR, Rome
Xavier Teruel, Universitat Politècnica de Catalunya, Barcelona Supercomputing Center, Barcelona
Priya Unnikrishnan, IBM, Toronto
Guansong Zhang, IBM, Toronto
OpenMP has been very successful in exploiting structured parallelism in applications. With increasing application complexity, there is a growing need to address irregular parallelism in the presence of complicated control structures, as is evident in the various industry and research efforts aimed at this challenging problem. One of the primary goals of OpenMP 3.0 is to define a standard dialect for expressing and efficiently exploiting unstructured parallelism. This paper presents the design of the OpenMP tasking model by members of the OpenMP 3.0 tasking subcommittee, which was formed for this purpose. The paper summarizes the subcommittee's efforts, spanning two years, in designing, evaluating, and seamlessly integrating the tasking model into the OpenMP specification. We present the design goals and key features of the tasking model, including a rich set of examples and an in-depth discussion of the rationale behind various design choices. We compare a prototype implementation of the tasking model with existing models and evaluate it on a wide range of applications. The comparison shows that the OpenMP tasking model provides expressiveness, flexibility, and substantial potential for performance and scalability.
Concurrent, distributed, and parallel languages; concurrent programming; concurrent programming structures
Nawal Copty, Alejandro Duran, Jay Hoeflinger, Yuan Lin, Federico Massaioli, Xavier Teruel, Priya Unnikrishnan, Guansong Zhang, "The Design of OpenMP Tasks", IEEE Transactions on Parallel & Distributed Systems, vol.20, no. 3, pp. 404-418, March 2009, doi:10.1109/TPDS.2008.105