P-3PC: A Point-to-Point Communication Model for Automatic and Optimal Decomposition of Regular Domain Problems
July 2002 (vol. 13, no. 7)
pp. 758-768

One of the most fundamental problems confronting automatic parallelization tools is finding an optimal domain decomposition for a given application. For regular domain problems (such as simple matrix manipulations), this task may seem trivial. However, communication costs in message passing programs often depend significantly on the memory layout of the data blocks to be transmitted. As a consequence, straightforward domain decompositions may be nonoptimal. In this paper, we introduce a new point-to-point communication model (called P-3PC, or the “Parameterized model based on the Three Paths of Communication”) that is specifically designed to overcome this problem. In comparison with related models (e.g., LogGP), P-3PC is similar in complexity, but more accurate in many situations. Although the model is aimed at MPI's standard point-to-point operations, it is applicable to similar message passing definitions as well. The effectiveness of the model is tested in a framework for automatic parallelization of low level image processing applications. Experiments are performed on two Beowulf-type systems, each having a different interconnection network and a different MPI implementation. Results show that, where other models frequently fail, P-3PC correctly predicts the communication costs related to any type of domain decomposition.
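The layout effect the abstract refers to is easy to see directly: in a row-major array, a block of rows occupies one contiguous memory region, whereas a block of columns is strided and must first be packed (or described by a derived datatype such as MPI's MPI_Type_vector) before it can be sent, which changes the communication cost. A minimal NumPy sketch of this distinction (illustrative only, not code from the paper):

```python
import numpy as np

# A 4x6 row-major (C-order) array, as produced by default in NumPy.
a = np.arange(24).reshape(4, 6)

# Two full rows: one contiguous stretch of memory.
row_block = a[0:2, :]

# Two full columns: a strided view; the elements are scattered
# across memory and would need packing before transmission.
col_block = a[:, 0:2]

print(row_block.flags['C_CONTIGUOUS'])  # True
print(col_block.flags['C_CONTIGUOUS'])  # False
```

A row-wise decomposition of this array can thus be transmitted directly, while a column-wise one incurs an extra packing step — precisely the kind of layout-dependent cost that makes straightforward decompositions nonoptimal.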

[1] A. Alexandrov, M. Ionescu, K.E. Schauser, and C. Scheiman, “LogGP: Incorporating Long Messages into the LogP Model,” Proc. Symp. Parallel Algorithms and Architectures '95, July 1995.
[2] C.A. Moritz and M.I. Frank, “LoGPC: Modeling Network Contention in Message-Passing Programs,” IEEE Trans. Parallel and Distributed Systems, vol. 12, no. 4, 2001.
[3] H. Bal, “The Distributed ASCI Supercomputer Project,” Operating Systems Review, vol. 34, pp. 76-96, Oct. 2000.
[4] A. Bar-Noy and S. Kipnis, “Designing Broadcasting Algorithms in the Postal Model for Message-Passing Systems,” Math. Systems Theory, vol. 27, no. 5, pp. 431-452, 1994.
[5] R.A.F. Bhoedjang, T. Rühl, and H.E. Bal, “LFC: A Communication Substrate for Myrinet,” Proc. Conf. Advanced School for Computing and Imaging (ASCI '98), pp. 31-37, June 1998.
[6] J. Bruck, L. De Coster, N. Dewulf, C.-T. Ho, and R. Lauwereins, “On the Design and Implementation of Broadcast and Global Combine Operations Using the Postal Model,” IEEE Trans. Parallel and Distributed Systems, vol. 7, no. 3, pp. 256-265, Mar. 1996.
[7] D. Culler, R. Karp, D. Patterson, A. Sahay, K.E. Schauser, E. Santos, R. Subramonian, and T. von Eicken, “LogP: Towards a Realistic Model of Parallel Computation,” Proc. Fourth ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP '93), May 1993.
[8] I.T. Foster, Designing and Building Parallel Programs. Addison-Wesley, Reading, Mass., 1995.
[9] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. Sunderam, PVM: Parallel Virtual Machine—A Users' Guide and Tutorial for Networked Parallel Computing. The MIT Press, 1994.
[10] J.M. Geusebroek, A.W.M. Smeulders, and H. Geerts, “A Minimum Cost Approach for Segmenting Networks of Lines,” Int'l J. Computer Vision, vol. 43, no. 2, pp. 99-111, July 2001.
[11] W. Gropp, E. Lusk, N. Doss, and A. Skjellum, “A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard,” Parallel Computing, vol. 22, no. 6, pp. 789–828, 1996.
[12] S.E. Hambrusch and A. Khokhar, “C³: A Parallel Model for Coarse-Grained Machines,” J. Parallel and Distributed Computing, vol. 32, no. 2, pp. 139-154, 1996.
[13] S. Akhtar, “Reliability of k-out-of-n:G Systems with Imperfect Fault-Coverage,” IEEE Trans. Reliability, vol. 43, pp. 101-106, Mar. 1994.
[14] M. Lauria, “LogP Characterization of FM on the VU's DAS Machine,” technical report, Dipartimento di Informatica e Sistemistica, Univ. di Napoli Federico II, 1997.
[15] O.A. McBryan, “An Overview of Message Passing Environments,” Parallel Computing, vol. 20, no. 4, pp. 11-24, Apr. 1994. Also, see the entire special issue, “Message Passing Interfaces.”
[16] W.F. McColl, “Scalability, Portability and Predictability: The BSP Approach to Parallel Programming,” Future Generation Computer Systems, vol. 12, pp. 265-272, 1996.
[17] Message Passing Interface Forum, “MPI: A Message-Passing Interface Standard (version 1.1),” technical report, Univ. of Tennessee, Knoxville, Tenn., June 1995.
[18] Message Passing Interface Forum, “MPI-2: Extensions to the Message-Passing Interface,” technical report, Univ. of Tennessee, Knoxville, Tenn., July 1997.
[19] N. Nupairoj and L.M. Ni, “Performance Evaluation of Some MPI Implementations on Workstation Clusters,” Proc. 1994 Scalable Parallel Libraries Conf. (SPLC '94), pp. 98-105, Oct. 1994.
[20] M. Prieto, I.M. Llorente, and F. Tirado, “A Review of Regular Domain Partitioning,” SIAM News, vol. 33, no. 1, Jan. 2000.
[21] M. Prieto, I.M. Llorente, and F. Tirado, “Data Locality Exploitation in the Decomposition of Regular Domain Problems,” IEEE Trans. Parallel and Distributed Systems, vol. 11, no. 11, pp. 1141-1149, Nov. 2000.
[22] F.J. Seinstra and D. Koelma, “Modeling Performance of Low Level Image Processing Routines on MIMD Computers,” Proc. Conf. Advanced School for Computing and Imaging (ASCI '99), pp. 307-314, June 1999.
[23] F.J. Seinstra and D. Koelma, “P-3PC: A Simple and Accurate Model of Point-to-Point Communication,” technical report, ISIS, Faculty of Science, Univ. of Amsterdam, The Netherlands, Dec. 2000.
[24] F.J. Seinstra, D. Koelma, and J.M. Geusebroek, “A Software Architecture for User Transparent Parallel Image Processing on MIMD Computers,” Proc. Seventh Int'l Euro-Par Conf. (Euro-Par 2001), pp. 653-662, Aug. 2001.
[25] A.J. van der Steen and R. van der Pas, “A Performance Analysis of the SGI Origin 2000,” Proc. Third Int'l Meeting on Vector and Parallel Processing, pp. 534-547, 1999.
[26] L.G. Valiant, “A Bridging Model for Parallel Computation,” Comm. ACM, vol. 33, no. 8, pp. 103-111, Aug. 1990.
[27] Z. Xu, X. Zhang, and L. Sun, “Semi-Empirical Multiprocessor Performance Predictions,” J. Parallel and Distributed Computing, vol. 39, no. 1, pp. 14-28, 1996.
[28] X. Zhang, Y. Yan, and K. He, “Latency Metric: An Experimental Method for Measuring and Evaluating Program and Architecture Scalability,” J. Parallel and Distributed Computing, vol. 22, no. 3, pp. 392-410, Sept. 1994.

Index Terms:
MPI, point-to-point communication, performance optimization, performance modeling, automatic domain decomposition.
F.J. Seinstra, D. Koelma, "P-3PC: A Point-to-Point Communication Model for Automatic and Optimal Decomposition of Regular Domain Problems," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 7, pp. 758-768, July 2002, doi:10.1109/TPDS.2002.1019863