Performance Modeling of Distributed Hybrid Architectures
January 2004 (vol. 15 no. 1)
pp. 81-92
G.D. van Albada, IEEE Computer Society

Abstract: Hybrid architectures are systems in which a high-performance general-purpose computer is coupled to one or more Special Purpose Devices (SPDs). Such a system can be the optimal choice for several fields of computational science. Configuring the system and finding the optimal mapping of the application tasks onto the hybrid machine is often not straightforward. Performance modeling is a tool for tackling these problems. We have developed a performance model to simulate the behavior of a hybrid architecture consisting of a parallel multiprocessor in which some nodes host a GRAPE board. GRAPE is a very high-performance SPD used in computational astrophysics. We validate our model on the architecture at our disposal and show examples of the predictions it can produce.

Index Terms:
Performance modeling, N-body codes, special purpose devices, hybrid architectures.
Piero F. Spinnato, G.D. van Albada, Peter M.A. Sloot, "Performance Modeling of Distributed Hybrid Architectures," IEEE Transactions on Parallel and Distributed Systems, vol. 15, no. 1, pp. 81-92, Jan. 2004, doi:10.1109/TPDS.2004.1264788