A Processor-Time-Minimal Systolic Array for Cubical Mesh Algorithms
January 1992 (vol. 3 no. 1)
pp. 4-13
Using a directed acyclic graph (DAG) model of algorithms, the paper focuses on time-minimal multiprocessor schedules that use as few processors as possible. Such a processor-time-minimal scheduling of an algorithm's DAG is first illustrated using a triangular-shaped 2-D directed mesh (representing, for example, an algorithm for solving a triangular system of linear equations). Then, algorithms represented by an $n \times n \times n$ directed mesh are investigated. This cubical directed mesh is fundamental; it represents the standard algorithm for computing a matrix product as well as many other algorithms. Completion of the cubical mesh requires $3n-2$ steps. It is shown that the number of processing elements needed to achieve this time bound is at least $\lceil 3n^2/4 \rceil$. A systolic array for the cubical directed mesh is then presented. It completes the mesh using the minimum number of steps and exactly $\lceil 3n^2/4 \rceil$ processing elements: it is processor-time-minimal. The systolic array's topology is that of a hexagonally shaped, cylindrically connected, 2-D directed mesh.
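
The two figures quoted in the abstract can be checked numerically for small n. The sketch below is not taken from the paper; it assumes the cubical mesh has nodes (i, j, k) with 0 <= i, j, k < n and arcs that increase exactly one coordinate by 1. Under that assumption every node lies on a longest path, so a schedule that finishes in the minimum number of steps has to run node (i, j, k) at step i + j + k + 1, and the largest plane i + j + k = s gives a lower bound on the processor count. The function names are hypothetical.

```python
from collections import Counter
from itertools import product


def wavefront_sizes(n):
    """Count the mesh nodes (i, j, k) lying on each plane i + j + k = s."""
    return Counter(i + j + k for i, j, k in product(range(n), repeat=3))


def min_steps(n):
    """Nodes on a longest path: one per plane, s = 0 .. 3(n - 1)."""
    return len(wavefront_sizes(n))


def min_processors(n):
    """Largest plane: in a time-minimal schedule its nodes share one step."""
    return max(wavefront_sizes(n).values())


if __name__ == "__main__":
    for n in range(1, 9):
        assert min_steps(n) == 3 * n - 2
        # (3n^2 + 3) // 4 equals ceil(3n^2 / 4) in integer arithmetic
        assert min_processors(n) == (3 * n * n + 3) // 4
    print(min_steps(4), min_processors(4))
```

For n = 4 this reports 10 steps and 12 processing elements, matching $3n-2$ and $\lceil 3n^2/4 \rceil$. The brute-force count only confirms the lower bound; it does not construct the hexagonal systolic array that attains it.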


Index Terms:
hexagon shaped; cylinder connected; processor-time-minimal systolic array; cubical mesh algorithms; directed acyclic graph; time-minimal multiprocessor schedules; processor-time-minimal scheduling; triangular shaped 2-D directed mesh; matrix product; processing elements; topology; 2-D directed mesh; computational complexity; directed graphs; parallel algorithms; systolic arrays
P. Cappello, "A Processor-Time-Minimal Systolic Array for Cubical Mesh Algorithms," IEEE Transactions on Parallel and Distributed Systems, vol. 3, no. 1, pp. 4-13, Jan. 1992, doi:10.1109/71.113078