Portable Video Supercomputing
August 2004 (vol. 53 no. 8)
pp. 960-973

Abstract: As inexpensive imaging chips and wireless telecommunications are incorporated into an increasing array of portable products, the need for high-efficiency, high-throughput embedded processing will become an important challenge in computer architecture. Videocentric applications, such as wireless videoconferencing, real-time video enhancement and analysis, and new, immersive modes of distance education, will exceed the computational capabilities of current microprocessor and digital signal processor (DSP) architectures. A new class of embedded computers, Portable Video Supercomputers, will combine supercomputer performance with the energy efficiency required for deployment in portable systems. This paper examines one candidate portable video supercomputer, a low-memory, monolithically integrated SIMD architecture (SIMPil) that exploits the substantial data parallelism that exists in a suite of implemented video processing applications. The processing element microarchitecture is optimized using a novel technique that combines application simulation and technology modeling to provide a desired combination of performance, area, and energy consumption. Analysis results show that, for MPEG encoding, a SIMPil array implemented in 100 nm CMOS provides 100x greater performance and 10x higher energy efficiency than today's DSPs implemented in 150 nm CMOS. This is accomplished using execution parallelism and a carefully selected processing element design. This research demonstrates that appropriately designed SIMD arrays, implemented monolithically in today's technology, can provide high performance and high efficiency for embedded video processing.

Index Terms:
Image and video processing systems, high efficiency SIMD architecture, processor design methodologies.
Antonio Gentile, D. Scott Wills, "Portable Video Supercomputing," IEEE Transactions on Computers, vol. 53, no. 8, pp. 960-973, Aug. 2004, doi:10.1109/TC.2004.48