Systolic Opportunities for Multidimensional Data Streams
April 2002 (vol. 13 no. 4)
pp. 388-398

Portable image processing applications require an efficient, scalable platform with localized computing regions. This paper presents a new class of area I/O systolic architecture to exploit the physical data locality of planar data streams by processing data where it falls. A synthesis technique using dependence graphs, data partitioning, and computation mapping is developed to handle planar data streams and to systematically design arrays with area I/O. Simulation results show that the use of area I/O provides a 16 times speedup over systems with perimeter I/O. Performance comparisons for a set of signal processing algorithms show that systolic arrays that consider planar data streams in the design process are up to three times faster than traditional arrays.

Index Terms:
parallel computer architecture, systolic arrays, area I/O, design and performance evaluation
Citation:
Sek M. Chai, Scott Wills, "Systolic Opportunities for Multidimensional Data Streams," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 4, pp. 388-398, April 2002, doi:10.1109/71.995819