Design of Space-Optimal Regular Arrays for Algorithms with Linear Schedules
May 1995 (vol. 44 no. 5)
pp. 683-694
BibTeX:
@article{10.1109/12.381953,
  author    = {Pen-Yuang Chang and Jong-Chuang Tsay},
  title     = {Design of Space-Optimal Regular Arrays for Algorithms with Linear Schedules},
  journal   = {IEEE Transactions on Computers},
  volume    = {44},
  number    = {5},
  pages     = {683--694},
  month     = may,
  year      = {1995},
  issn      = {0018-9340},
  doi       = {10.1109/12.381953},
  publisher = {IEEE Computer Society},
  address   = {Los Alamitos, CA, USA},
}

Abstract—The problem of designing space-optimal 2D regular arrays for N×N×N cubical mesh algorithms with linear schedule ai+bj+ck, where 1 ≤ a ≤ b ≤ c and N = nc, is studied. Three novel nonlinear processor allocation methods, each of which combines a partitioning technique (gcd-partition) with a different nonlinear processor allocation procedure (traces), are proposed to handle different cases. In cases where a+b ≤ c, which are handled by the first processor allocation method, space-optimal designs can always be obtained in which the number of processing elements is N²/c. For the other cases, where a+b > c and either a = b or b = c, two further optimal processor allocation methods are proposed. In addition, closed-form expressions for the optimal number of processing elements are derived for these cases.
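To make the N²/c figure concrete, the Python sketch below counts, by brute force, how many index points (i, j, k) of the N×N×N cube the schedule t = ai+bj+ck assigns to the same time step. Because a processing element executes at most one index point per time step, this count is a lower bound on the number of processing elements for any allocation compatible with the schedule. It is not the authors' gcd-partition or trace method. The function names `max_concurrent_points` and `first_case_applies` are hypothetical, and the condition gcd(a, b, c) = 1 is an extra assumption of this sketch, not something stated in the abstract. For the sample parameters with a+b ≤ c, the count equals N²/c.

```python
from collections import Counter
from itertools import product
from math import gcd


def max_concurrent_points(a: int, b: int, c: int, N: int) -> int:
    """Largest number of index points (i, j, k) in [0, N)^3 that the linear
    schedule t = a*i + b*j + c*k assigns to one time step.

    A PE executes at most one index point per step, so any processor
    allocation using this schedule needs at least this many PEs.
    """
    counts = Counter(
        a * i + b * j + c * k for i, j, k in product(range(N), repeat=3)
    )
    return max(counts.values())


def first_case_applies(a: int, b: int, c: int, N: int) -> bool:
    # Conditions from the abstract: 1 <= a <= b <= c, a + b <= c, N = n*c.
    # ASSUMPTION (not from the abstract): gcd(a, b, c) == 1.
    return (
        1 <= a <= b <= c
        and a + b <= c
        and N % c == 0
        and gcd(gcd(a, b), c) == 1
    )


if __name__ == "__main__":
    for a, b, c, N in [(1, 1, 2, 4), (1, 2, 3, 6), (2, 3, 5, 5), (1, 1, 1, 3)]:
        bound = max_concurrent_points(a, b, c, N)
        print(
            f"a={a} b={b} c={c} N={N}: max concurrent = {bound}, "
            f"N^2/c = {N * N / c:g}, first case: {first_case_applies(a, b, c, N)}"
        )
```

The last sample, a = b = c = 1 with N = 3, falls outside the first case (a+b > c). There the count is 7, below N²/c = 9. The abstract treats such cases with separate allocation methods, and this simple counting bound need not equal their optimum.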

Index Terms:
Algorithm mapping, data dependency, linear schedule, matrix multiplication, optimizing compiler, space-optimal, systolic array.
Citation:
Pen-Yuang Chang, Jong-Chuang Tsay, "Design of Space-Optimal Regular Arrays for Algorithms with Linear Schedules," IEEE Transactions on Computers, vol. 44, no. 5, pp. 683-694, May 1995, doi:10.1109/12.381953