Using Emulations to Enhance the Performance of Parallel Architectures
October 1999 (vol. 10 no. 10)
pp. 1067-1081

Abstract: We illustrate the potential of techniques and results from the theory of network emulations to enhance the performance of a parallel architecture. The vehicle for this demonstration is a suite of algorithms that endow an $N$-processor bit-serial processor array ${\cal A}$ with a “meta-instruction” GAUGE $k$, which (logically) reconfigures ${\cal A}$ into an $N/k$-processor virtual machine ${\cal B}_k$ that has: 1) a datapath and memory bus whose emulated width is $k$ bits, as opposed to ${\cal A}$'s 1-bit width, and 2) an instruction set that operates on $k$-bit words, in contrast to ${\cal A}$'s instruction set, which operates on 1-bit words. In order to stress the strength of the approach, we show (via pseudocode) how our emulation techniques can be implemented efficiently even if ${\cal A}$ operates in strict SIMD mode, with only single-bit masking capabilities and with no indexed memory accesses. We describe at an algorithmic level how to implement our technique, including datapath conversion (“corner-turning”) and the creation of the word-parallel instruction sets, on arrays of any regular network topology. We instantiate our technique in detail for arrays based on topologies with quite disparate characteristics: the hypercube, the de Bruijn network, and a genre of mesh with reconfigurable buses. Importantly, the emulations that underlie our technique do not alter the native machine's instruction set, hence allowing an invariant programming model across gauges.
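The data-layout change behind “corner-turning” can be illustrated with a small sequential sketch. This is not the paper's SIMD algorithm, only a model of the layout it has to produce, under the following assumptions: each of the $N$ processors initially holds one $k$-bit word bit-serially (one bit per memory address), $k$ divides $N$, and each group of $k$ consecutive processors becomes one virtual $k$-bit processor. The names `corner_turn`, `to_bits`, and `read_word` are hypothetical helpers introduced for this illustration.

```python
def to_bits(word, k):
    """Return the k low-order bits of word, least significant bit first."""
    return [(word >> i) & 1 for i in range(k)]


def corner_turn(memory, k):
    """Transpose every k x k block of bits (illustrative model only).

    memory[p][i] is bit i of the word stored bit-serially in processor p.
    In the result, processor base + i holds bit i of every word of its
    group, so memory address j of the group holds one whole word spread
    across the k processors of the group.
    """
    n = len(memory)
    if n % k != 0:
        raise ValueError("k must divide the number of processors")
    turned = [[0] * k for _ in range(n)]
    for base in range(0, n, k):          # one group per virtual processor
        for i in range(k):               # bit position / processor in group
            for j in range(k):           # word index / memory address
                turned[base + i][j] = memory[base + j][i]
    return turned


def read_word(turned, group, address, k):
    """Reassemble the k-bit word at a given address of a virtual processor."""
    return sum(turned[group * k + i][address] << i for i in range(k))


if __name__ == "__main__":
    k = 4
    words = [5, 3, 12, 9, 0, 15, 7, 8]   # N = 8 processors, N/k = 2 groups
    memory = [to_bits(w, k) for w in words]
    turned = corner_turn(memory, k)
    for g in range(len(words) // k):
        print([read_word(turned, g, a, k) for a in range(k)])
```

Reading back address `a` of group `g` recovers the word originally held by processor `g*k + a`, and applying `corner_turn` twice restores the original layout, since each block is simply transposed. A real bit-serial array would have to realize the same permutation with nearest-neighbor or network communication steps rather than indexed loops.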

Index Terms:
Parallel architecture, multiprocessor interconnection, parallel algorithms.
Bojana Obrenic, Martin C. Herbordt, Arnold L. Rosenberg, Charles C. Weems, "Using Emulations to Enhance the Performance of Parallel Architectures," IEEE Transactions on Parallel and Distributed Systems, vol. 10, no. 10, pp. 1067-1081, Oct. 1999, doi:10.1109/71.808155