Annealed Embeddings of Communication Patterns in an Interconnection Cached Network
November 1995 (vol. 6 no. 11)
pp. 1153-1167

Abstract: The communication needs of many parallel applications exhibit what we call switching locality. In such applications, each computation entity (process, thread, etc.) tends to restrict its communication to a small set of other entities. The physical location or proximity of these entities can be arbitrary, as long as the communication degree is small. The Interconnection Cached Network (ICN) is a reconfigurable network ideally suited for exploiting such locality. The use of fast small crossbar switches (Interconnection Caches) with a larger, but slower, reconfigurable network (optimized for connectivity) lets the ICN adapt to the communication requirements of individual applications, potentially achieving higher performance. Embedding communication patterns efficiently in an ICN requires finding a bounded $\ell$-contraction of the underlying communication graph.

The problem of identifying whether a graph has a bounded $\ell$-contraction for a given integer $\ell$ is known to be NP-complete for $\ell > 2$. We describe a heuristic algorithm based on simulated annealing for this problem. We test the effectiveness of our approach by using it to embed graphs that represent regular communication patterns and for which the best solutions are deterministically known. The algorithm does not rely on any structural information about the communication pattern and is therefore applicable to irregular patterns as well. The results of applying our heuristics to embed such irregular graphs are also presented. These embeddings in the ICN allow low-latency communication paths to be established between the computation entities of parallel applications.
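
As a rough illustration of the annealing idea described above, the following Python sketch searches for a grouping of graph vertices into blocks. The abstract does not define a bounded $\ell$-contraction, so the sketch assumes one plausible reading: every block holds at most $\ell$ vertices, and at most `max_links` edges leave each block. The function names (`cost`, `anneal`), the `max_links` parameter, the swap move, and the geometric cooling schedule are all hypothetical choices made for this example; they are not taken from the paper and should not be read as the authors' algorithm.

```python
import math
import random


def cost(adj, block_of, max_links):
    """Total number of inter-block edge endpoints above max_links, summed over blocks.

    adj maps each vertex to a list of its neighbours (undirected graph).
    block_of maps each vertex to the id of the block it is assigned to.
    A cost of 0 means every block respects the assumed link bound.
    """
    links = {}
    for v, nbrs in adj.items():
        b = block_of[v]
        links.setdefault(b, 0)
        for u in nbrs:
            if block_of[u] != b:
                links[b] += 1
    return sum(max(0, d - max_links) for d in links.values())


def anneal(adj, ell, max_links, steps=20000, t0=2.0, alpha=0.9995, seed=0):
    """Simulated annealing over assignments of vertices to blocks of size <= ell.

    Moves swap the blocks of two vertices, so block sizes never change and the
    size bound holds throughout. Returns (best_assignment, best_cost).
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    rng.shuffle(nodes)
    # Initial assignment: consecutive runs of ell shuffled vertices per block.
    block_of = {v: i // ell for i, v in enumerate(nodes)}
    cur = cost(adj, block_of, max_links)
    best, best_cost = dict(block_of), cur
    t = t0
    for _ in range(steps):
        if best_cost == 0:
            break  # a feasible grouping under the assumed bounds was found
        a, b = rng.sample(nodes, 2)
        if block_of[a] == block_of[b]:
            continue  # swapping within one block changes nothing
        block_of[a], block_of[b] = block_of[b], block_of[a]
        new = cost(adj, block_of, max_links)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new <= cur or rng.random() < math.exp((cur - new) / t):
            cur = new
            if cur < best_cost:
                best, best_cost = dict(block_of), cur
        else:
            block_of[a], block_of[b] = block_of[b], block_of[a]  # undo the swap
        t *= alpha  # geometric cooling (an assumed schedule)
    return best, best_cost


if __name__ == "__main__":
    # A 16-vertex ring, a regular pattern for which a good grouping is easy to see:
    # four contiguous arcs of four vertices, each with two edges leaving it.
    ring = {v: [(v - 1) % 16, (v + 1) % 16] for v in range(16)}
    assignment, remaining = anneal(ring, ell=4, max_links=2)
    print("remaining violation:", remaining)
```

Because the search only ever evaluates `cost`, it needs no knowledge of the pattern's structure, which is the property the abstract highlights for irregular graphs. A returned cost above zero means only that this run found no feasible grouping, not that none exists.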

Index Terms:
Interconnection cache, interconnection networks, switching locality, latency reduction, optical networks, reconfigurable parallel architectures, process mapping, simulated annealing.
Citation:
Vipul Gupta, Eugen Schenfeld, "Annealed Embeddings of Communication Patterns in an Interconnection Cached Network," IEEE Transactions on Parallel and Distributed Systems, vol. 6, no. 11, pp. 1153-1167, Nov. 1995, doi:10.1109/71.476187