LoGPC: Modeling Network Contention in Message-Passing Programs
April 2001 (vol. 12 no. 4)
pp. 404-415

Abstract—In many real applications, for example, those with frequent and irregular communication patterns or those using large messages, network contention and contention for message processing resources can be a significant part of the total execution time. This paper presents a new cost model, called LoGPC, that extends the LogP [9] and LogGP [4] models to account for the impact of network contention and network interface DMA behavior on the performance of message passing programs. We validate LoGPC by analyzing three applications implemented with Active Messages [11], [19] on the MIT Alewife multiprocessor. Our analysis shows that network contention accounts for up to 50 percent of the total execution time. In addition, we show that the impact of communication locality on the communication costs is at most a factor of two on Alewife. Finally, we use the model to identify trade-offs between synchronous and asynchronous message passing styles.
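
For orientation, the LogGP baseline that LoGPC extends is commonly summarized as charging one k-byte message a time of o + (k-1)G + L + o, where o is the per-message processor overhead, G the gap per byte, and L the network latency. The Python sketch below encodes only that baseline. The `contention_delay` argument is a hypothetical placeholder for the extra delay that LoGPC models; it is not the paper's contention formula, which the abstract does not give, and all names and numeric values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogGPParams:
    """LogGP parameters; units are arbitrary but must be consistent (e.g. cycles)."""
    L: float  # network latency
    o: float  # processor overhead per send or receive
    g: float  # minimum gap between consecutive short messages
    G: float  # gap per byte for long messages
    P: int    # number of processors


def loggp_message_time(p: LogGPParams, k_bytes: int) -> float:
    """Contention-free time for one k-byte message: o + (k - 1) * G + L + o."""
    if k_bytes < 1:
        raise ValueError("message must contain at least one byte")
    return p.o + (k_bytes - 1) * p.G + p.L + p.o


def time_with_contention(p: LogGPParams, k_bytes: int,
                         contention_delay: float) -> float:
    """Baseline time plus an externally supplied contention delay.

    contention_delay is a hypothetical stand-in for the term a
    contention-aware model such as LoGPC would compute; it is an
    input here, not derived from the paper.
    """
    if contention_delay < 0:
        raise ValueError("contention delay cannot be negative")
    return loggp_message_time(p, k_bytes) + contention_delay


# Illustrative values only, not measurements from Alewife.
params = LogGPParams(L=10.0, o=2.0, g=4.0, G=0.5, P=16)
baseline = loggp_message_time(params, 101)
contended = time_with_contention(params, 101, contention_delay=6.0)
```

Comparing `contended` against `baseline` gives the share of message time attributable to contention, which is the kind of quantity the abstract reports at the level of whole applications.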

[1] A. Agarwal, “Limits on Interconnection Network Performance,” IEEE Trans. Parallel and Distributed Systems, vol. 2, no. 4, pp. 398-412, Oct. 1991.
[2] A. Agarwal et al., “The MIT Alewife Machine: Architecture and Performance,” Proc. Int'l Symp. Computer Architecture, pp. 2-13, June 1995.
[3] A. Arpaci-Dusseau, D. Culler, K. Schauser, and R. Martin, “Fast Parallel Sorting under LogP: Experience with the CM-5,” IEEE Trans. Parallel and Distributed Systems, vol. 7, no. 8, pp. 791-805, Aug. 1996.
[4] A. Alexandrov, M. Ionescu, K.E. Schauser, and C. Scheiman, “LogGP: Incorporating Long Messages into the LogP Model,” Proc. Symp. Parallel Algorithms and Architectures '95, July 1995.
[5] C.A. Moritz, K. Al-Tawil, B.F. Rodriguez, “MPI Performance Comparison on MPP and Workstation Clusters,” Proc. 10th Int'l Conf. Parallel and Distributed Computing, Oct. 1997.
[6] G. Chochia, C. Boeres, and P. Thanisch, “Analysis of Multicomputer Schedules in Cost and Latency Model of Communication,” Abstract Machine Workshop, 1996.
[7] F. Chong, R. Barua, F. Dahlgren, J. Kubiatowicz, and A. Agarwal, “The Sensitivity of Communication Mechanisms to Bandwidth and Latency,” Proc. Fourth Int'l Symp. High Performance Computer Architecture, Feb. 1998.
[8] D.E. Culler, A. Dusseau, S.C. Goldstein, A. Krishnamurthy, S. Lumetta, T. von Eicken, and K. Yelick, “Parallel Programming in Split-C,” Supercomputing, 1993.
[9] D. Culler, R. Karp, D. Patterson, A. Sahay, K.E. Schauser, E. Santos, R. Subramonian, and T. von Eicken, “LogP: Towards a Realistic Model of Parallel Computation,” Fourth Symp. Principles and Practices Parallel Programming, SIGPLAN '93, ACM, May 1993.
[10] D. Culler et al., “Assessing Fast Network Interfaces,” IEEE Micro, Feb. 1996, pp. 35-43.
[11] T. von Eicken et al., “Active Messages: A Mechanism for Integrated Communication and Computation,” Proc. 19th Int'l Symp. Computer Architecture, Assoc. of Computing Machinery, N.Y., May 1992, pp. 256-266.
[12] K. Johnson, “The Impact of Communication Locality on Large-Scale Multiprocessor Performance,” Proc. 19th Ann. Int'l Symp. Computer Architecture, pp. 392-402, May 1992.
[13] A.G. Greenberg, “On the Time Complexity of Broadcast Communication Schemes,” Proc. 14th ACM Symp. Theory of Computing, pp. 354-364, May 1982.
[14] C. Holt, M. Heinrich, J.P. Singh, E. Rothberg, and J. Hennessy, “The Effects of Latency, Occupancy and Bandwidth on the Performance of Cache-Coherent Multiprocessors,” Technical Report CSL-TR-95, Stanford Univ., Jan. 1995.
[15] R. Jeffery and M. Berry, “A Framework for Evaluation and Prediction of Metrics Program Success,” 1st Int'l Software Metrics Symp., IEEE Computer Soc. Press, Los Alamitos, Calif., 1993, pp. 28-39.
[16] K. Keeton, T. Anderson, and D. Patterson, “LogP Quantified: The Case for Low-Overhead Local Area Networks,” Hot Interconnects III: A Symp. High Performance Interconnects, Aug. 1995.
[17] V. Karamcheti and A.A. Chien, “A Comparison of Architectural Support for Messaging in the TMC CM-5 and the Cray T3D,” Proc. ISCA '95, 1995.
[18] C.P. Kruskal and M. Snir, “The Performance of Multistage Interconnection Networks for Multiprocessors,” IEEE Trans. Computers, vol. 37, pp. 1091-1098, Dec. 1983.
[19] K. Mackenzie, J. Kubiatowicz, M. Frank, W. Lee, V. Lee, A. Agarwal, and F. Kaashoek, “Exploiting Two-Case Delivery for Fast Protected Messaging,” Proc. Fourth Int'l Symp. High Performance Computer Architecture, Feb. 1998.
[20] N.K. Madsen, “Divergence Preserving Discrete Surface Integral Methods for Maxwell's Curl Equations Using Nonorthogonal Unstructured Grids,” Technical Report 92.04, RIACS, Feb. 1992.
[21] R.P. Martin et al., “Effects of Communication Latency, Overhead and Bandwidth in a Cluster Architecture,” Computer Architecture News, May 1997, pp. 85-97.
[22] C.H. Papadimitriou and M. Yannakakis, “Towards an Architecture-Independent Analysis of Parallel Algorithms,” SIAM J. Computing, vol. 19, no. 2, pp. 322-328, Apr. 1990.

Index Terms:
Multiprocessors, modeling, pipelining, contention, network.
Citation:
Csaba Andras Moritz, Matthew I. Frank, "LoGPC: Modeling Network Contention in Message-Passing Programs," IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 4, pp. 404-415, April 2001, doi:10.1109/71.920589