Static and Run-Time Algorithms for All-to-Many Personalized Communication on Permutation Networks
December 1994 (vol. 5 no. 12)
pp. 1266-1274

With the advent of new routing methods, the distance that a message is sent is becoming relatively less important. Thus, assuming no link contention, permutation seems to be an efficient collective communication primitive. In this paper, we present several algorithms for decomposing all-to-many personalized communication into a set of disjoint partial permutations, and we study their effectiveness from the view of static scheduling as well as run-time scheduling. An approximate analysis shows that with n processors, and assuming that every processor sends and receives d messages to random destinations, our algorithm can perform the scheduling in O(dn ln d) time, on average, and can use an expected number of d + log d partial permutations to carry out the communication. We present experimental results of our algorithms on the CM-5.
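
To make the idea of decomposing a communication pattern into disjoint partial permutations concrete, here is a minimal sketch in Python. It is not one of the algorithms from the paper, whose details are not given in this abstract; it is a simple first-fit heuristic chosen for illustration. The function name `greedy_partial_permutations` and the `(source, destination)` tuple representation are assumptions made here. Each phase it produces is a partial permutation: no processor sends more than one message and no processor receives more than one message within a phase.

```python
from typing import List, Set, Tuple

Message = Tuple[int, int]  # (source processor, destination processor); hypothetical representation


def greedy_partial_permutations(messages: List[Message]) -> List[List[Message]]:
    """First-fit heuristic (illustrative only, not the paper's method).

    Places each message in the earliest phase in which its source is not
    already sending and its destination is not already receiving.
    """
    phases: List[List[Message]] = []
    busy_src: List[Set[int]] = []  # sources already used in each phase
    busy_dst: List[Set[int]] = []  # destinations already used in each phase

    for src, dst in messages:
        # Look for an existing phase with both endpoints free.
        for k in range(len(phases)):
            if src not in busy_src[k] and dst not in busy_dst[k]:
                break
        else:
            # No phase fits, so open a new one.
            phases.append([])
            busy_src.append(set())
            busy_dst.append(set())
            k = len(phases) - 1

        phases[k].append((src, dst))
        busy_src[k].add(src)
        busy_dst[k].add(dst)

    return phases


if __name__ == "__main__":
    # Three processors, each sending and receiving d = 2 messages.
    pattern = [(0, 1), (0, 2), (1, 2), (1, 0), (2, 0), (2, 1)]
    for i, phase in enumerate(greedy_partial_permutations(pattern)):
        print(f"phase {i}: {phase}")
```

Since every processor sends d messages, any schedule of this kind needs at least d phases. A first-fit pass like this one may use more, depending on the order in which messages are considered.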


Index Terms:
scheduling; multiprocessor interconnection networks; performance evaluation; run-time algorithms; static algorithms; all-to-many personalized communication; permutation networks; run-time scheduling; CM-5
S. Ranka, J.C. Wang, G. Fox, "Static and Run-Time Algorithms for All-to-Many Personalized Communication on Permutation Networks," IEEE Transactions on Parallel and Distributed Systems, vol. 5, no. 12, pp. 1266-1274, Dec. 1994, doi:10.1109/71.334900