Issue No. 05 - May (2007 vol. 56)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TC.2007.1010
Shorin Kyo, Media and Information Research Laboratories, NEC Corporation, Nakahara-Ku, Kawasaki, Kanagawa, Japan
Shin'ichiro Okazaki, Media and Information Research Laboratories, NEC Corporation, Nakahara-Ku, Kawasaki, Kanagawa, Japan
Tamio Arai, Department of Precision Engineering, University of Tokyo, Bunkyo-Ku, Tokyo, Japan
Embedded processors for video image recognition usually need to address not only the conventional trade-off between cost (die size and power) and real-time performance, but must also remain highly flexible because of the immense diversity of recognition targets, situations, and applications. This paper describes IMAP, a highly parallel SIMD linear processor and memory array architecture that addresses these trade-off requirements. By using parallel and systolic algorithmic techniques on top of a simple linear array architecture, IMAP exploits not only the straightforward per-image-row data level parallelism (DLP), but also the inherent DLP of other memory access patterns frequently found in image recognition tasks, while allowing programming to be done in an explicitly parallel C language (1DC). We describe and evaluate IMAP-CE, one of the latest IMAP processors, which integrates 128 8-bit 4-way VLIW PEs running at 100 MHz, 128 2-KByte RAMs, and one 16-bit RISC control processor onto a single chip. The PE instruction set is enhanced to support 1DC code. The die size of IMAP-CE is 11 × 11 mm², integrating 32.7 M transistors, and the power consumption is approximately 2 watts on average. IMAP-CE is evaluated mainly by comparing its performance running 1DC code with that of a 2.4 GHz Intel P4 running optimized C code. With the parallelizing techniques applied, benchmark results show a speed increase of up to 20 times for image filter kernels and of 4 times for a full image recognition application.
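The per-row data level parallelism mentioned in the abstract can be pictured with a small sketch in plain C. This is not 1DC and not IMAP code: the function names, the one-column-per-PE mapping, the 3-tap box filter, and the clamped borders are all assumptions made for illustration. The inner loop stands in for what a linear SIMD array would do in a single step, with every PE reading its own pixel and those of its left and right neighbours.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PE 128 /* PE count stated in the abstract; one image column per PE is an assumption */

/* Hypothetical helper: one "SIMD step" of a 3-tap horizontal box filter.
 * Each iteration of the loop models one PE handling its own column and
 * reading the pixel held by its left and right neighbour PE.
 * Border PEs reuse their own pixel (clamping), which is an assumption. */
void row_filter_step(const uint8_t *in, uint8_t *out)
{
    for (int pe = 0; pe < NUM_PE; ++pe) {
        int left  = in[pe > 0 ? pe - 1 : pe];
        int right = in[pe < NUM_PE - 1 ? pe + 1 : pe];
        out[pe] = (uint8_t)((left + in[pe] + right) / 3);
    }
}

/* Hypothetical driver: the image is stored row-major with NUM_PE pixels
 * per row, and rows are processed one after another. On a linear array
 * this outer loop would be the only sequential part. */
void filter_image(int rows, const uint8_t *src, uint8_t *dst)
{
    assert(rows >= 0);
    for (int r = 0; r < rows; ++r) {
        row_filter_step(src + (long)r * NUM_PE, dst + (long)r * NUM_PE);
    }
}
```

On a sequential CPU the inner loop costs NUM_PE iterations per row; in the array model sketched here it would collapse to one step per row, which is the kind of gain the abstract attributes to row-level DLP.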
systolic arrays, C language, embedded systems, image recognition, instruction sets, parallel algorithms, parallel memories, random-access storage, reduced instruction set computing
S. Kyo, S. Okazaki and T. Arai, "An Integrated Memory Array Processor for Embedded Image Recognition Systems," in IEEE Transactions on Computers, vol. 56, no. 5, pp. 622-634, 2007.