Issue No. 8, Aug. 2012 (vol. 23)
pp. 1369-1386
Javier Diaz , Indiana University, Bloomington
Camelia Muñoz-Caro , Universidad de Castilla-La Mancha, Ciudad Real
Alfonso Niño , Universidad de Castilla-La Mancha, Ciudad Real
In this work, we present a survey of the parallel programming models and tools available today, with special consideration of their suitability for high-performance computing. We review the shared and distributed memory approaches, as well as the current heterogeneous parallel programming model. In addition, we analyze how the partitioned global address space (PGAS) and hybrid parallel programming models combine the advantages of shared and distributed memory systems. The work is completed by considering languages with specific parallel support and the distributed programming paradigm. In all cases, we present characteristics, strengths, and weaknesses. The study shows that the availability of multi-core CPUs has given new impetus to the shared memory parallel programming approach. In addition, we find that hybrid parallel programming is the current way of harnessing the capabilities of computer clusters with multi-core nodes. On the other hand, heterogeneous programming is found to be an increasingly popular paradigm, as a consequence of the availability of multi-core CPU+GPU systems. The use of open industry standards such as OpenMP, MPI, or OpenCL, as opposed to proprietary solutions, seems to be the way to standardize and extend the use of parallel programming models.
Parallelism and concurrency, distributed programming, heterogeneous (hybrid) systems.
Javier Diaz, Camelia Muñoz-Caro, Alfonso Niño, "A Survey of Parallel Programming Models and Tools in the Multi and Many-Core Era", IEEE Transactions on Parallel & Distributed Systems, vol. 23, no. 8, pp. 1369-1386, Aug. 2012, doi:10.1109/TPDS.2011.308
[1] D. Kirk and W. Hwu, Programming Massively Parallel Processors: A Hands-on Approach. Morgan Kaufmann, 2010.
[2] W. Hwu, K. Keutzer, and T.G. Mattson, "The Concurrency Challenge," IEEE Design and Test of Computers, vol. 25, no. 4, pp. 312-320, July 2008.
[3] H. Sutter and J. Larus, "Software and the Concurrency Revolution," ACM Queue, vol. 3, no. 7, pp. 54-62, 2005.
[4] W-C. Feng and P. Balaji, "Tools and Environments for Multicore and Many-Core Architectures," Computer, vol. 42, no. 12, pp. 26-27, Dec. 2009.
[5] R.R. Loka, W-C. Feng, and P. Balaji, "Serial Computing Is Not Dead," Computer, vol. 43, no. 9, pp. 6-9, Mar. 2010.
[6] J. Dongarra, I. Foster, G. Fox, W. Gropp, K. Kennedy, L. Torczon, and A. White, The Sourcebook of Parallel Computing. Morgan Kaufmann Publishers, 2003.
[7] H. Kasim, V. March, R. Zhang, and S. See, "Survey on Parallel Programming Model," Proc. IFIP Int'l Conf. Network and Parallel Computing, vol. 5245, pp. 266-275, Oct. 2008.
[8] M.J. Sottile, T.G. Mattson, and C.E. Rasmussen, Introduction to Concurrency in Programming Languages. CRC Press, 2010.
[9] G.R. Andrews, Foundations of Multithreaded, Parallel, and Distributed Programming. Addison Wesley, 1999.
[10] T.G. Mattson, B.A. Sanders, and B. Massingill, Patterns for Parallel Programming. Addison-Wesley Professional, 2005.
[11] V. Kumar, A. Grama, A. Gupta, and G. Karypis, Introduction to Parallel Computing: Design and Analysis of Algorithms. Benjamin/Cummings Publishing Company, 1994.
[12] G. Fox, M. Johnson, G. Lyzenga, S. Otto, J. Salmon, and D. Walker, Solving Problems on Concurrent Processors, vol. 1. Prentice Hall, 1988.
[13] M.J. Quinn, Parallel Computing: Theory and Practice. McGraw-Hill, 1994.
[14] P.B. Hansen, Studies in Computational Science: Parallel Programming Paradigms. Prentice-Hall, 1995.
[15] K.M. Chandy and J. Misra, Parallel Program Design: A Foundation. Addison-Wesley, 1988.
[16] OpenMP, "API Specification for Parallel Programming," Oct. 2011.
[17] W. Gropp, S. Huss-Lederman, A. Lumsdaine, E. Lusk, B. Nitzberg, W. Saphir, and M. Snir, MPI: The Complete Reference, the MPI-2 Extensions, vol. 2. The MIT Press, Sept. 1998.
[18] K. Kedia, "Hybrid Programming with OpenMP and MPI," Technical Report 18.337J, Massachusetts Inst. of Technology, May 2009.
[19] D.A. Jacobsen, J.C. Thibaulty, and I. Senocak, "An MPI-CUDA Implementation for Massively Parallel Incompressible Flow Computations on Multi-GPU Clusters," Proc. 48th AIAA Aerospace Sciences Meeting Including the New Horizons Forum and Aerospace Exposition, Jan. 2010.
[20] C.-T. Yang, C.-L. Huang, and C.-F. Li, "Hybrid CUDA, OpenMP, and MPI Parallel Programming on Multicore GPU Clusters," Computer Physics Comm., vol. 182, no. 1, 2011.
[21] POSIX 1003.1 FAQ, Oct. 2011.
[22] D.R. Butenhof, Programming with POSIX Threads. Addison-Wesley, 1997.
[23] IEEE, "IEEE P1003.1c/D10: Draft Standard for Information Technology - Portable Operating Systems Interface (POSIX)," Sept. 1994.
[24] A. Grama, G. Karypis, V. Kumar, and A. Gupta, Introduction to Parallel Computing, second ed. Addison-Wesley, 2003.
[25] B. Chapman, G. Jost, and R. van der Pas, Using OpenMP: Portable Shared Memory Parallel Programming. MIT Press, 2007.
[26] OpenMP 3.0 Specification, Oct. 2011.
[27] P.S. Pacheco, Parallel Programming with MPI. Morgan Kaufmann, 1996.
[28] W. Gropp, E. Lusk, and A. Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface, second ed. MIT Press, 1999.
[29] W. Gropp, E. Lusk, and R. Thakur, Using MPI-2: Advanced Features of the Message-Passing Interface. MIT Press, 1999.
[30] Globus, Oct. 2011.
[31] Message Passing Interface Forum, "MPI-2: Extensions to the Message-Passing Interface," July 1997.
[32] W. Gropp and R. Thakur, "Thread Safety in an MPI Implementation: Requirements and Analysis," Parallel Computing, vol. 33, no. 9, pp. 595-604, Sept. 2007.
[33] M. Valiev, E.J. Bylaska, N. Govind, K. Kowalski, T.P. Straatsma, H.J.J. van Dam, D. Wang, J. Nieplocha, E. Apra, T.L. Windus, and W.A. de Jong, "NWChem: A Comprehensive and Scalable Open-Source Solution for Large Scale Molecular Simulations," Computer Physics Comm., vol. 181, pp. 1477-1489, http://www.nwchem-sw.org, Oct. 2011.
[34] M.S. Gordon and M.W. Schmidt, "Advances in Electronic Structure Theory: GAMESS a Decade Later," Theory and Applications of Computational Chemistry, the First Forty Years, C.E. Dykstra, G. Frenking, K.S. Kim, and G.E. Scuseria, eds., Chapter 41, pp. 1167-1189, Elsevier, Oct. 2011.
[35] H. Lin, X. Ma, W. Feng, and N. Samatova, "Coordinating Computation and I/O in Massively Parallel Sequence Search," IEEE Trans. Parallel and Distributed Systems, vol. 22, no. 4, pp. 529-543, Oct. 2011.
[36] M. Macedonia, "The GPU Enters Computing's Mainstream," Computer, vol. 36, no.10, pp. 106-108, Oct. 2003.
[37] AMD Fusion, Oct. 2011.
[38] Sandy Bridge, Oct. 2011.
[39] I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan, "Brook for GPUs: Stream Computing on Graphics Hardware," Proc. SIGGRAPH, 2004.
[40] W.R. Mark, R.S. Glanville, K. Akeley, M.J. Kilgard, "Cg: A System for Programming Graphics Hardware in a C-Like Language," Proc. SIGGRAPH, 2003.
[41] CUDA Zone, Oct. 2011.
[42] Khronos Group, http://www.khronos.org/opencl, Oct. 2011.
[43] Microsoft DirectX Developer Center, Oct. 2011.
[44] Sophisticated Library for Vector Parallelism: Intel Array Building Blocks, Intel, 2010.
[45] Nvidia Developer Zone, Oct. 2011.
[46] Nvidia Company. Nvidia CUDA Programming Guide, v3.0, 2010.
[47] Nvidia Company. Nvidia CUDA C Programming Best Practices Guide, Version 3.0, 2010.
[48] M. Wolfe, "Compilers and More: Knights Ferry Versus Fermi," HPCwire, Aug. 2010.
[49] K. Skaugen, "Petascale to Exascale: Extending Intel's HPC Commitment," Proc. Int'l Supercomputing Conf. (ISC '10), 2010.
[50] OpenCL 1.1 Specification, Oct. 2011.
[51] Introduction to OpenCL, Oct. 2011.
[52] W.D. Hillis and G.L. Steele, "Data Parallel Algorithms," Comm. ACM, vol. 29, pp. 1170-1183, 1986.
[53] M. Quinn, Parallel Programming in C with MPI and OpenMP. McGraw-Hill, 2004.
[54] OpenCL 1.1 C++ Bindings Specification, http://www.khronos.org/registry/cl/specs/opencl-cplusplus-1.1.pdf, Oct. 2011.
[55] Shader Model 5 (Microsoft MSDN), Oct. 2011.
[56] A. Ghuloum et al., "Future-Proof Data Parallel Algorithms and Software on Intel Multi-Core Architecture," Intel Technology J., vol. 11, no. 4, pp. 333-347, 2007.
[57] W. Kim and M. Voss, "Multicore Desktop Programming with Intel Threading Building Blocks," IEEE Software, vol. 28, no. 1, pp. 23-31, Jan./Feb. 2011.
[58] Intel Threading Building Blocks, Oct. 2011.
[59] J. Krüger and R. Westermann, "Linear Algebra Operators for GPU Implementation of Numerical Algorithms," ACM Trans. Graphics, vol. 22, pp. 908-916, 2003.
[60] D.C. Rapaport, "Enhanced Molecular Dynamics Performance with a Programmable Graphics Processor," Computer Physics Comm. vol. 182, pp. 926-934, 2011.
[61] F. Xu and K. Mueller, "Accelerating Popular Tomographic Reconstruction Algorithms on Commodity PC Graphics Hardware," IEEE Trans. Nuclear Science, vol. 52, pp. 654-663, June 2005.
[62] S.A. Manavski and G. Valle, "CUDA Compatible GPU Cards as Efficient Hardware Accelerators for Smith-Waterman Sequence Alignment," BMC Bioinformatics, vol. 9, no. 2, pp. 1-9, Mar. 2008.
[63] J.D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger, A.E. Lefohn, and T.J. Purcel, "A Survey of General-Purpose Computation on Graphics Hardware," Computer Graphics Forum, vol. 26, no. 1, pp. 80-113, 2007.
[64] J.D. Owens, M. Houston, D. Luebke, S. Green, J.E. Stone, and J.C. Phillips, "GPU Computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899, May 2008.
[65] J. Protic, M. Tomasevic, and V. Milutinovic, "A Survey of Distributed Shared Memory Systems," Proc. 28th Hawaii Int'l Conf. System Sciences (HICSS '95), pp. 74-84, 1995.
[66] C. Coarfa, Y. Dotsenko, J. Mellor-Crummey, F. Cantonnet, T. El-Ghazawi, A. Mohanty, and Y. Yao, "An Evaluation of Global Address Space Languages: Co-Array Fortran and Unified Parallel C," Proc. 10th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, pp. 36-47, 2005.
[67] V. Saraswat, G. Almasi, G. Bikshandi, C. Cascaval, D. Grove, D. Cunningham, O. Tardieu, I. Peshansky, and S. Kodali, "The Asynchronous Partitioned Global Address Space Model," Proc. First Workshop Advances in Message Passing, 2010.
[68] DARPA High Productivity Computing Systems (HPCS), Oct. 2011.
[69] D. Bonachea and J. Jeong, "GASNet: A Portable High-Performance Communication Layer for Global Address-Space Languages," CS258 Parallel Computer Architecture Project, 2002.
[70] GASNet, Oct. 2011.
[71] A. Mainwaring and D. Culler, "Active Messages: Organization and Applications Programming Interface," technical report, UC Berkeley, 1995.
[72] J. Nieplocha and B. Carpenter, "ARMCI: A Portable Remote Memory Copy Library for Distributed Array Libraries and Compiler Run-Time Systems," Proc. Third Workshop Runtime Systems for Parallel Programming (RTSPP) of IPPS/SPDP '99, 1999.
[73] J. Nieplocha, V. Tipparaju, M. Krishnan, and D. Panda, "High Performance Remote Memory Access Comunications: The ARMCI Approach," Int'l J. High Performance Computing and Applications, vol. 20, pp. 233-253, 2006.
[74] Aggregate Remote Memory Copy Interface, Oct. 2011.
[75] The KeLP Programming System, Oct. 2011.
[76] S.J. Fink, S.R. Kohn, and S.B. Baden, "Efficient Run-Time Support for Irregular Block-Structured Applications," J. Parallel and Distributed Computing, vol. 50, pp. 61-82, 1998.
[77] W. Carlson, J. Draper, D. Culler, K. Yelick, E. Brooks, and K. Warren, "Introduction to UPC and Language Specification," Technical Report CCS-TR-99-157, IDA Center for Computing Sciences, 1999.
[78] T. El-Ghazawi, W. Carlson, T. Sterling, and K. Yelick, UPC: Distributed Shared Memory Programming. John Wiley and Sons, 2005.
[79] Unified Parallel C, Oct. 2011.
[80] R.W. Numrich and J.K. Reid, "Co-Arrays in the Next Fortran Standard," ACM SIGPLAN Fortran Forum, vol. 24, pp. 4-17, 2005.
[81] Co-Array Fortran, Apr. 2011.
[82], Oct. 2011.
[83] J. Reid, "Coarrays in the Next Fortran Standard," ACM SIGPLAN Fortran Forum, vol. 29, no. 2, pp. 10-27, 2010.
[84] Titanium, Oct. 2011.
[85] K.A. Yelick, L. Semenzato, G. Pike, C. Miyamoto, B. Liblit, A. Krishnamurthy, P.N. Hilfinger, S.L. Graham, D. Gay, P. Colella, and A. Aiken, "Titanium: A High-Performance Java Dialect," Proc. ACM Workshop Java for High-Performance Network Computing, 1998.
[86] X10 Language, Oct. 2011.
[87] J. Muttersbach, T. Villiger, and W. Fichtner, "Practical Design of Globally-Asynchronous Locally-Synchronous Systems," Proc. Sixth Int'l Symp. Advanced Research in Asynchronous Circuits and Systems (ASYNC '00), pp. 52-59, 2000.
[88] M. Weiland, "Chapel, Fortress and X10: Novel Languages for HPC," technical report from the HPCx Consortium, 2007.
[89] Chapel Language, Oct. 2011.
[90] D. Callahan, B.L. Chamberlain, and H.P. Zima, "The Cascade High Productivity Language," Proc. Ninth Int'l Workshop High-Level Parallel Programming Models and Supportive Environments (HIPS), pp. 52-60, 2004.
[91] Project Fortress, Oct. 2011.
[92] G. Steele, "Fortress: A New Programming Language for Scientific Computing," Sun Labs Open House, 2005.
[93] T. Sterling, P. Messina, and P.H. Smith, Enabling Technologies for Petaflops Computing. MIT Press, 1995.
[94] C. Wright, "Hybrid Programming Fun: Making Bzip2 Parallel with MPICH2 & pthreads on the Cray XD1," Proc. CUG, 2006.
[95] P. Johnson, "Pthread Performance in an MPI Model for Prime Number Generation," CSCI 4576 - High-Performance Scientific Computing, Univ. of Colorado, 2007.
[96] W. Pfeiffer and A. Stamatakis, "Hybrid MPI/Pthreads Parallelization of the RAxML Phylogenetics Code," Proc. Ninth IEEE Int'l Workshop High Performance Computational Biology, Apr. 2010.
[97] L. Smith and M. Bull, "Development of Mixed Mode MPI/OpenMP Applications," Proc. Workshop OpenMP Applications and Tools (WOMPAT '00), July 2000.
[98] R. Rabenseifner, "Hybrid Parallel Programming on HPC Platforms," Proc. European Workshop OpenMP (EWOMP '03), 2003.
[99] B. Estrade, "Hybrid Programming with MPI and OpenMP," Proc. High Performance Computing Workshop, 2009.
[100] S. Bova, C. Breshears, R. Eigenmann, H. Gabb, G. Gaertner, B. Kuhn, B. Magro, S. Salvini, and V. Vatsa, "Combining Message-Passing and Directives in Parallel Applications," SIAM News, vol. 32, no. 9, pp. 10-14, 1999.
[101] I.J. Bush, C.J. Noble, and R.J. Allan, "Mixed OpenMP and MPI for Parallel Fortran Applications," Proc. Second European Workshop OpenMP, 2000.
[102] P. Luong, C.P. Breshears, and L.N. Ly, "Coastal Ocean Modeling of the U.S. West Coast with Multiblock Grid and Dual-Level Parallelism," Proc. ACM/IEEE Conf. Supercomputing '01, 2001.
[103] R.D. Loft, S.J. Thomas, and J.M. Dennis, "Terascale Spectral Element Dynamical Core for Atmospheric General Circulation Models," Proc. ACM/IEEE Conf. Supercomputing'01, 2001.
[104] K. Nakajima, "Parallel Iterative Solvers for Finite-Element Methods Using an OpenMP/MPI hybrid Programming Model on the Earth Simulator," Parallel Computing, vol. 31, pp. 1048-1065, 2005.
[105] R. Aversa, B. Di Martino, M. Rak, S. Venticinque, and U. Villano, "Performance Prediction through Simulation of a Hybrid MPI/OpenMP Application," Parallel Computing, vol. 31, pp. 1013-1033, 2005.
[106] F. Cappello and D. Etiemble, "MPI Versus MPI+OpenMP on the IBM SP for the NAS Benchmarks," Proc. Conf. High Performance Networking and Computing, 2000.
[107] J. Duthie, M. Bull, A. Trew, and L. Smith, "Mixed Mode Applications on HPCx," Technical Report HPCxTR0403, HPCx Consortium, 2004.
[108] L. Smith, "Mixed Mode MPI/OpenMP Programming," Technical Report Technology Watch 1, UK High-End Computing, EPCC, United Kingdom, 2000.
[109] D.S. Henty, "Performance of hybrid Message-Passing and Shared-Memory Parallelism for Discrete Element Modeling," Proc. ACM/IEEE Conf. Supercomputing'00, 2000.
[110] E. Chow and D. Hysom, "Assessing Performance of Hybrid MPI/OpenMP Programs on SMP Clusters," Technical Report UCRL-JC-143957, Lawrence Livermore Nat'l Laboratory 2001.
[111] J.C. Thibault and I. Senocak, "CUDA Implementation of a Navier-Stokes Solver on Multi-GPU Desktop Platforms for Incompressible Flows," Proc. 47th AIAA Aerospace Sciences Meeting, 2010.
[112] S. Jun Park and D. Shires, "Central Processing Unit/Graphics Processing Unit (CPU/GPU) Hybrid Computing of Synthetic Aperture Radar Algorithm," Technical Report ARL-TR-5074, US Army Research Laboratory, 2010.
[113] H. Jang, A. Park, and K. Jung, "Neural Network Implementation using CUDA and OpenMP," Proc. Digital Image Computing: Techniques and Applications, pp. 155-161, 2008.
[114] G. Sims, "Parallel Cloth Simulation Using OpenMP and CUDA," thesis dissertation, Graduate Faculty of the Louisiana State Univ. and Agricultural and Mechanical College, 2009.
[115] Y. Wang, Z. Feng, H. Guo, C. He, and Y. Yang, "Scene Recognition Acceleration using CUDA and OpenMP," Proc. First Int'l Conf. Information Science and Eng. (ICISE '09), 2009.
[116] Q. Chen and J. Zhang, "A Stream Processor Cluster Architecture Model with the Hybrid Technology of MPI and CUDA," Proc. First Int'l Conf. Information Science and Eng. (ICISE '09), 2009.
[117] J.C. Phillips, J.E. Stone, and K. Schulten, "Adapting a Message-Driven Parallel Application to GPU-Accelerated Clusters," Proc. ACM/IEEE Conf. Supercomputing, 2008.
[118] H. Schive, C. Chien, S. Wong, Y. Tsai, and T. Chiueh, "Graphic-Card Cluster for Astrophysics (GraCCA) - Performance Tests," New Astronomy, vol. 13, no. 6, pp. 418-435, 2008.
[119] D.A. Jacobsen, J.C. Thibault, and I. Senocak, "An MPI-CUDA Implementation for Massively Parallel Incompressible Flow Computations on Multi-GPU Clusters," Proc. 48th AIAA Aerospace Sciences Meeting, 2010.
[120] N.P. Karunadasa and D.N. Ranasinghe, "On the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters," Proc. Int'l Conf. High Performance Computing, 2009.
[121] V. Strassen, "Gaussian Elimination Is Not Optimal," Numerische Mathematik, vol. 13, pp. 354-356, 1969.
[122] M.R. Hestenes and E. Stiefel, "Methods of Conjugate Gradients for Solving Linear Systems," J. Research of the Nat'l Bureau of Standards, vol. 49, no. 6, pp. 409-436, 1952.
[123] A.E. Walsh, J. Couch, and D.H. Steinberg, Java 2 Bible. Wiley Publishing, 2000.
[124] B. Amedro, V. Bodnartchouk, D. Caromel, C. Delbé, F. Huet, and G.L. Taboada, "Current State of Java for HPC," Technical Report RT-0353, INRIA, 2008.
[125] NAS Parallel Benchmarks, Oct. 2011.
[126] R.V. Nieuwpoort, J. Maassen, G. Wrzesinska, R. Hofman, C. Jacobs, T. Kielmann, and H.E. Bal, "Ibis: A Flexible and Efficient Java Based Grid Programming Environment," Concurrency and Computation: Practice and Experience, vol. 17, pp. 1079-1107, 2005.
[127] G.L. Taboada, J. Touriño, and R. Doallo, "Java for High Performance Computing: Assessment of Current Research and Practice," Proc. Seventh Int'l Conf. Principles and Practice of Programming in Java (PPPJ '09), pp. 30-39, 2009.
[128] A. Shafi, B. Carpenter, M. Baker, and A. Hussain, "A Comparative Study of Java and C Performance in Two Large-Scale Parallel Applications," Concurrency and Computation: Practice & Experience, vol. 15, no. 21, pp. 1882-1906, 2010.
[129] B. Blount and S. Chatterjee, "An Evaluation of Java for Numerical Computing," Scientific Programming, vol. 7, no. 2, pp. 97-110, 1999.
[130] Java Grande Forum, Oct. 2011.
[131] M. Baker, B. Carpenter, S.H. Ko, and X. Li, "mpiJava: A Java Interface to MPI," Proc. First UK Workshop Java for High Performance Network Computing, 1998.
[132] A. Shafi, B. Carpenter, and M. Baker, "Nested Parallelism for Multi-Core HPC Systems Using Java," J. Parallel Distributed Computing, vol. 69, pp. 532-545, 2009.
[133] G.L. Taboada, S. Ramos, J. Touriño, and R. Doallo, "Design of Efficient Java Message-Passing Collectives on Multi-Core Clusters," J. Supercomputing, vol. 55, pp. 126-154, 2011.
[134] High Performance Fortran, http://hpff.rice.edu/index.htm, Oct. 2011.
[135] H. Richardson, "High Performance Fortran: History, Overview and Current Developments," Technical Report TMC-261, Thinking Machines Corporation, 1996.
[136] C.H.Q. Ding, "High Performance Fortran for Practical Scientific Algorithms: An Up-to-Date Evaluation," Future Generation Computer Systems, vol. 15, pp. 343-352, 1999.
[137] R.D. Blumofe, C.F. Joerg, B.C. Kuszmaul, C.E. Leiserson, K.H. Randall, and Y. Zhou, "Cilk: An Efficient Multithreaded Runtime System," J. Parallel and Distributed Computing, vol. 37, pp. 55-69, 1996.
[138] Cilk Project, Oct. 2011.
[139] Intel Cilk Plus, Oct. 2011.
[140] B.L. Chamberlain, S.-E. Choi, E.C. Lewis, C. Lin, L. Snyder, and W.D. Weathersby, "ZPL: A Machine Independent Programming Language for Parallel Computers," IEEE Trans. Software Eng., vol. 26, no. 3, pp. 197-211, Mar. 2000.
[141] L. Snyder, "The Design and Development of ZPL," Proc. Third ACM SIGPLAN History of Programming Languages Conf., June 2007.
[142] ZPL Web, Oct. 2011.
[143] H. Wu, G. Turkiyyahi, and W. Keirouzt, "ZPLCLAW: A Parallel Portable Toolkit for Wave Propagation Problems," Proc. Am. Soc. of Civil Eng. (ASCE) Structures Congress, 2000.
[144] Erlang, Oct. 2011.
[145] S. Vinoski, "Reliability with Erlang," IEEE Internet Computing, vol. 11, no. 6, pp. 79-81, Nov./Dec. 2007.
[146] P.W. Trinder, K. Hammond, H.-W. Loidl, and S.L. Jones, "Algorithm+Strategy=Parallelism," J. Functional Programming, vol. 8, no. 1, pp. 23-60, 1998.
[147] S. Marlow, S.P. Jones, and S. Singh, "Runtime Support for Multicore Haskell," ACM SIGPLAN Notices - ICFP '09, vol. 44, no. 9, pp. 65-78, 2009.
[148] A.S. Tanenbaum and M.V. Steen, Distributed Systems: Principles and Paradigms, second ed. Prentice Hall, 2007.
[149] I. Foster and C. Kesselman, The Grid: Blueprint for a New Computing Infrastructure. Morgan Kauffman, 1998.
[150] B. Wilkinson, Grid Computing. Chapman & Hall/CRC, 2010.
[151] gLite, Oct. 2011.
[152] EGEE, Oct. 2011.
[153] S. Reyes, C. Muñoz-Caro, A. Niño, R.M. Badia, and J.M. Cela, "Performance of Computationally Intensive Parameter Sweep Applications on Internet-Based Grids of Computers: the Mapping of Molecular Potential Energy Hypersurfaces," Concurrency and Computation: Practice and Experience, vol. 19, pp. 463-481, 2007.
[154] C. Sun, B. Kim, G. Yi, and H. Park, "A Model of Problem Solving Environment for Integrated Bioinformatics Solution on Grid by Using Condor," Proc. Int'l Conf. Grid and Cooperative Computing (GCC), pp. 935-938, 2004.
[155] Large Hadron Collider (LHC) Computing Grid Project for High Energy Physics Data Analysis, Oct. 2011.
[156] OMG, Oct. 2011.
[157] A. Birrell and B. Nelson, "Implementing Remote Procedure Calls," ACM Trans. Computer Systems, vol. 2, no. 1, pp. 39-59, 1984.
[158] S. Vinoski, "CORBA: Integrating Diverse Applications within Distributed Heterogeneous Environments," IEEE Comm. Magazine, vol. 35, no. 2, pp. 46-55, Feb. 1997.
[159] M. Henning, "The Rise and Fall of CORBA," ACM Queue, vol. 4, pp. 28-34, June 2006.
[160] Y. Gong, "CORBA Application in Real-Time Distributed Embedded Systems," Survey Report, ECE 8990 Real-Time Systems Design, 2003.
[161] CORBA/e, Oct. 2011.
[162] COM, Oct. 2011.
[163] ComSource, http://www.opengroup.org/comsource, Oct. 2011.
[164] P. Emerald, C. Yennun, H.S. Yajnik, D. Liang, J.C. Shih, C.Y. Wang, and Y.M. Wang, "DCOM and CORBA Side by Side, Step by Step, and Layer by Layer," C++ Report, vol. 10, no. 1, pp. 18-29, 1998.
[165] G. Alonso, F. Casati, H. Kuno, and V. Machiraju, Web Services: Concepts, Architectures and Applications. Springer-Verlag, 2004.
[166] A. Gokhale, B. Kumar, and A. Sahuguet, "Reinventing the Wheel? CORBA vs. Web Services," Proc. Conf. World Wide Web (WWW '02), 2002.
[167] SOAP, Apr. 2011.
[168] WSDL, Oct. 2011.
[169] E. Cerami, "Web Services Essentials. Distributed Applications with XML-RPC, SOAP, UDDI & WSDL, O'Reilly," 2002.
[170] UDDI Specification, Oct. 2011.
[171] http:/, Oct. 2011.
[172] http:/, Oct. 2011.
[173] http:/, Oct. 2011.
[174] http:/, Oct. 2011.
[175] W.W. Eckerson, "Three Tier Client/Server Architecture: Achieving Scalability, Performance, and Efficiency in Client Server Applications," Open Information Systems, vol. 10, no. 1, 1995.
[176], Oct. 2011.
[177] Workflows for e-Science, I.J. Taylor, E. Deelman, D.B. Gannon, and M. Shields, eds. Springer-Verlag, 2007.
[178] EMBRACE Service Registry, Oct. 2011.
[179] A. Sahai, S. Graupner, and W. Kim, "The Unfolding of the Web Services Paradigm," Technical Report HPL-2002-130, Hewlett-Packard, 2002.
[180] T. Erl, Service-Oriented Architecture: Concepts, Technology, and Design. Prentice-Hall, 2005.
[181] S. Mulik, S. Ajgaonkar, and K. Sharma, "Where Do You Want to Go in Your SOA Adoption Journey?," IT Professional, vol. 10, no. 3, pp. 36-39, May/June 2008.
[182] J. McGovern, S. Tyagi, M. Stevens, and S. Mathew, "Service Oriented Architecture," Java Web Services Architecture, Chapter 2, Morgan Kaufmann, 2003.
[183] R.T. Fielding, "Architectural Styles and the Design of Network-Based Software Architectures," PhD dissertation, Univ. of California, Irvine, 2000.
[184] R.T. Fielding and R.N. Taylor, "Principled Design of the Modern Web Architecture," ACM Trans. Internet Technology, vol. 2, no. 2, pp. 115-150, May 2002.
[185] S. Vinoski, "REST Eye for the SOA Guy," IEEE Internet Computing, vol. 11, no. 1, pp. 82-84, Jan./Feb., 2007.
[186] ZeroC Ice, www.zeroc.com/ice.html, Oct. 2011.
[187] M. Henning and M. Spruiell, Distributed Programming with Ice, ZeroC, 2003, www.zeroc.com/Ice-Manual.pdf, Oct. 2011.
[188] M. Henning, "A New Approach to Object-Oriented Middleware," IEEE Internet Computing, vol. 8, no. 1 pp. 66-75, Jan./Feb. 2004.
[189] Scopus, http://www.scopus.com/home.url, Oct. 2011.
[190] "A Call to Arms for Parallel Programming Standards," HPCWire, SC10 Features, Nov. 2010.