This Article 
 Bibliographic References 
 Add to: 
CCL: A Portable and Tunable Collective Communication Library for Scalable Parallel Computers
February 1995 (vol. 6 no. 2)
pp. 154-164

Abstract—A collective communication library for parallel computers includes frequently used operations such as broadcast, reduce, scatter, gather, concatenate, synchronize, and shift. Such a library provides users with a convenient programming interface, efficient communication operations, and the advantage of portability. A library of this nature, the Collective Communication Library (CCL), intended for the line of scalable parallel computer products by IBM, has been designed. CCL is part of the parallel application programming interface of the recently announced IBM 9076 Scalable POWERparallel System 1 (SP1). In this paper, we examine several issues related to the functionality, correctness, and performance of a portable collective communication library while focusing on three novel aspects in the design and implementation of CCL: 1) the introduction of process groups, 2) the definition of semantics that ensures correctness, and 3) the design of new and tunable algorithms based on a realistic point-to-point communication model.

Index Terms—Collective communication algorithms, collective communication semantics, message-passing parallel systems, portable library, process group, tunable algorithms.

[1] A. Aggarwal and S. Kipnis,“Message-time tradeoff for combining data in message-passing systems,”IBM Res. Rep. RC-18349, Sept. 1992.
[2] Y. Amir, D. Dolev, S. Kramer, and D. Malki,“Transis: A communication sub-system for high availability,”inProc. 22nd Ann Int Symp Fault-Tolerant Computing, July 1992, pp. 76–84.
[3] V. Bala and S. Kipnis,“Efficient collective communication over dynamically determined sets of processors in massively parallel computers,”IBM Res. Rep. RC-17771, Mar. 1992.
[4] ——,“Process groups: A mechanism for the coordination of and communication among processes in the Venus collective communication library,”inProc. 7th Int Parallel Processing Symp, Apr. 1993.
[5] A. Bar-Noy and S. Kipnis, "Designing Broadcasting Algorithms in the Postal Model for Message-Passing Systems," Proc. ACM Symp. Parallel Algorithms and Architectures, pp. 11-22, June 1992.
[6] ——,“Broadcasting multiple messages in simultaneous send/receive systems,”inProc. 5th IEEE Symp Parallel, Distributed Processing, Dec. 1993.
[7] A. Bar-Noy, S. Kipnis, and B. Schieber,“An optimal algorithm for computing census functions in message-passing systems,”Parallel Processing Lett., vol. 3, no. 1, Mar. 1993.
[8] A. Beguelin,et al.,“A user's guide to PVM Parallel Virtual Machine,”ORNL Tech. Rep. ORNL/TM-11826, May 1992.
[9] K. Birmanet al.,“The ISIS system manual,”Department of Computer Science, Cornell University, Ithaca, NY, Sept. 1990.
[10] J. Brucket al.,“Survey of routing issues for the Vulcan parallel computer,”IBM Res. Rep. RJ-8839, June 1992.
[11] J. Brucket al.,“A proposal for common group structures in a collective communication library,”IBM Res. Rep. RJ-9421, Mar. 1993.
[12] J. Bruck, R. Cypher, and C. T. Ho,“Efficient fault-tolerant mesh and hypercubes architectures,”inProc. 1992 Int. Symp. Fault-Tolerant Computing, pp. 162–169.
[13] J. Bruck, R. Cypher, and C. T. Ho,“Multiple message broadcasting with generalized Fibonacci trees,”inProc. 4th IEEE Symp. Parallel, Distributed Processing, Dec. 1992, pp. 424–431.
[14] J. Bruck, C. T. Ho, and S. Kipnis,“Concatenating data optimally in message-passing systems,”IBM Res. Rep. RJ-9191, Jan. 1993.
[15] J. Bruck, R. Cypher, C. T. Ho, and S. Kipnis,“Efficient algorithms for the index operation in message-passing systems,”IBM Res. Rep. RJ-9300, Apr. 1993.
[16] N. Carriero and D. Gelernter, "Linda in Context," Comm. ACM, vol. 32, no. 4, Apr. 1989, pp. 444-458.
[17] D. Cheriton and W. Zwaenepoel, “Distributed Process Groups in the V Kernel,” ACM Trans. Computer Systems, Vol. 3, No. 2, May 1985, pp. 77‐107.
[18] D. Culler,R. Karp,D. Patterson,A. Sahay,K.E. Schauser,E. Santos,R. Subramonian,, and T. von Eicken,“LogP: Towards a realistic model of parallel computation,” Fourth Symp. Principles and Practices Parallel Programming, SIGPLAN’93, ACM, May 1993.
[19] R. Cypher and E. Leu,“Message-passing semantics and portable parallel programs,”IBM Res. Rep. RJ-9654, Jan. 1994.
[20] W. J. Dallyet al.,“The J-machine: A fine-grain concurrent computer,”inProc. Inform. Processing 89, 1989, pp. 1147–1153.
[21] J. Dongarra et al.,“Document for a standard message-passing interface,” Message Passing Interface Forum, Univ. of Tennessee, Tech. Report CS-93-214, Nov. 1993.
[22] J. Dongarra, R. Hempel, A. Hey, and D. Walker,“A proposal for a user-level, message-passing interface in a distributed memory environment,”ORNL Tech. Rep. ORNL/TM-12231, Oct. 1992.
[23] S. Dutt and J.P. Hayes, “Designing Fault-Tolerant Systems Using Auto-morphisms,” J. Parallel and Distributed Computing, vol. 12, no. 3, pp. 249–268, 1991.
[24] B. Elspas and J. Turner,“Graphs with circulant adjacency matrices,”J. Combinatorial Theory, no. 9, pp. 297–307, 1970.
[25] B. G. Fitch and M. E. Giampapa,“The Vulcan operation environment: A brief overview and status report,”inProc. 5th Workshop Use Parallel Processors Meteorology, ECMWF, Nov. 1992.
[26] G. Fox and W. Furmanski,“Optimal communication algorithms for regular decompositions on the hypercube,”inProc. 3rd Conf. Hypercube Concurrent Comput. Applicat., ACM, pp. 648–713, 1988.
[27] G. Fox,M. Johnson,G. Lyzenga,S. Otto,J. Salmon,, and D. Walker,Solving Problems on Concurrent Processors, Vol. I: General Techniques andRegular Problems.Englewood Cliffs, N.J.: Prentice Hall 1988.
[28] D. Fryeet al.,“An external user interface for scalable parallel systems: FORTRAN interface,”Tech. Rep., IBM Highly Parallel Supercomputing Systems Laboratory, Nov. 1992.
[29] G. A. Geist, M. T. Heath, B. W. Peyton, and P. H. Worley,“A user's guide to PICL: A portable instrumented communication library,”ORNL Tech. Rep. ORNL/TM-11616, Oct. 1990.
[30] M. E. Giampapa, B. G. Fitch, G. R. Irwin, and D. G. Shea,“Vulcan operating environment: Programmer's reference,”Intern. Memorandum, IBM T. J. Watson Research Center, Yorktown Heights, NY, Mar. 1991.
[31] R. Hempel,“The ANL/GMD macros (PARMACS) in FORTRAN for portable parallel programming using the message passing programming model, user's guide and reference manual,”Tech. Memorandum, Gesellschaft für Mathematik und Datenverabeitung mbH, West Germany.
[32] C.-T. Ho,“Optimal broadcasting on SIMD hypercubes without indirect addressing capability,”J. Parallel Distrib. Comput., 13(2):246–255, Oct. 1991.
[33] S. L. Johnsson and C.-T. Ho,“Optimum broadcasting and personalized communication in hypercubes,”IEEE Trans. Comput.,vol. 38, pp. 1249–1268, Sept. 1989.
[34] J. F. Palmer,“The NCUBE family of parallel supercomputers,”inProc. Int. Conf. Comput. Design, IEEE, 1986.
[35] A. Skjellum and A. P. Leung,“Zipcode: A portable multicomputer communication library atop the Reactive Kernel,”inProc. 5th Distrib. Memory Computing Conf., IEEE, Apr. 1990, pp. 328–337.
[36] Connection Machine CM-5 Tech. Summary, Thinking Machines Corporation, 1991.
[37] Express 3.0 Introductory Guide, Parasoft Corporation, 1990.
[38] Paragon XP/S Overview, Intel Corporation, 1991.
[39] “Vulcan system summary,”Int. Memorandum, IBM T. J. Watson Research Center, Yorktown Heights, NY.

Vasanth Bala, Jehoshua Bruck, Robert Cypher, Pablo Elustondo, Alex Ho, Ching-Tien Ho, Shlomo Kipnis, Marc Snir, "CCL: A Portable and Tunable Collective Communication Library for Scalable Parallel Computers," IEEE Transactions on Parallel and Distributed Systems, vol. 6, no. 2, pp. 154-164, Feb. 1995, doi:10.1109/71.342126
Usage of this product signifies your acceptance of the Terms of Use.