Automated Custom Instruction Generation for Domain-Specific Processor Acceleration
October 2005 (vol. 54 no. 10)
pp. 1258-1270
Application-specific extensions to the computational capabilities of a processor provide an efficient mechanism to meet the growing performance and power demands of embedded applications. Hardware, in the form of new function units (or coprocessors), and the corresponding instructions are added to a baseline processor to meet the critical computational demands of a target application. In this paper, the design of a system to automate the instruction set customization process is presented. A dataflow graph design space exploration engine efficiently identifies computation subgraphs from which to create custom hardware, and a compiler subgraph matching framework seamlessly exploits this hardware. We demonstrate the effectiveness of this system across a range of application domains and study the applicability of the custom hardware across an entire application domain. Finally, generalization techniques are presented that enable the application-specific hardware to be used more effectively across a domain.
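To make "identifying computation subgraphs" concrete, here is a minimal sketch of candidate discovery on a dataflow graph. It is not the authors' exploration engine. It exhaustively grows connected subgraphs of a small hypothetical basic-block DFG and keeps those that satisfy assumed constraints: at most three distinct input values, one output, no memory operations, and convexity (no dataflow path may leave the subgraph and re-enter it, since such a subgraph could not execute as a single instruction). The graph, opcodes, and the `MAX_IN`/`MAX_OUT`/`MAX_SIZE` limits are illustrative assumptions. Exhaustive growth like this scales poorly with graph size, so a practical engine has to prune the search; this sketch makes no attempt to do so.

```python
from typing import Dict, FrozenSet, List, Set, Tuple, Union

Operand = Union[int, str]  # int = id of producing node, str = live-in register

# Hypothetical DFG of one basic block: node id -> (opcode, operands).
DFG: Dict[int, Tuple[str, List[Operand]]] = {
    0: ("add", ["r1", "r2"]),
    1: ("shl", [0, "r3"]),
    2: ("and", [1, "r4"]),
    3: ("load", [2]),
    4: ("xor", [3, 1]),
}

FORBIDDEN = {"load", "store"}        # assumption: memory ops stay in the base pipeline
MAX_IN, MAX_OUT, MAX_SIZE = 3, 1, 4  # assumed register-port and size limits


def users_of(dfg: Dict[int, Tuple[str, List[Operand]]]) -> Dict[int, Set[int]]:
    """Map each node to the set of nodes that consume its result."""
    users: Dict[int, Set[int]] = {n: set() for n in dfg}
    for n, (_, ops) in dfg.items():
        for op in ops:
            if isinstance(op, int):
                users[op].add(n)
    return users


USERS = users_of(DFG)


def inputs(sub: FrozenSet[int]) -> Set[Operand]:
    """Distinct values (live-ins or results of outside nodes) read by the subgraph."""
    return {op for n in sub for op in DFG[n][1] if op not in sub}


def outputs(sub: FrozenSet[int]) -> Set[int]:
    """Nodes whose result is needed outside the subgraph (no users = assumed live-out)."""
    return {n for n in sub if not USERS[n] or USERS[n] - sub}


def is_convex(sub: FrozenSet[int]) -> bool:
    """False if some dataflow path leaves the subgraph and re-enters it."""
    stack = [u for n in sub for u in USERS[n] if u not in sub]
    seen: Set[int] = set()
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        for u in USERS[v]:
            if u in sub:
                return False  # path sub -> ... -> v -> u re-enters sub
            stack.append(u)
    return True


def neighbors(n: int) -> Set[int]:
    """Producers and consumers of node n (connectivity ignores edge direction)."""
    return {op for op in DFG[n][1] if isinstance(op, int)} | USERS[n]


def enumerate_candidates() -> List[FrozenSet[int]]:
    """Grow connected subgraphs of allowed ops; keep those meeting all constraints."""
    allowed = {n for n, (opc, _) in DFG.items() if opc not in FORBIDDEN}
    seen: Set[FrozenSet[int]] = set()
    frontier = [frozenset([n]) for n in allowed]
    while frontier:
        sub = frontier.pop()
        if sub in seen:
            continue
        seen.add(sub)
        if len(sub) < MAX_SIZE:
            for n in sub:
                for m in (neighbors(n) & allowed) - sub:
                    frontier.append(sub | {m})
    valid = (
        s for s in seen
        if len(s) >= 2
        and len(inputs(s)) <= MAX_IN
        and len(outputs(s)) <= MAX_OUT
        and is_convex(s)
    )
    return sorted(valid, key=sorted)


if __name__ == "__main__":
    for cand in enumerate_candidates():
        ops = [DFG[n][0] for n in sorted(cand)]
        print(sorted(cand), ops, "inputs:", sorted(map(str, inputs(cand))))
```

With these assumed limits only the add→shl pair (nodes 0 and 1) survives: {1, 2} needs two outputs, and {1, 4} is non-convex because the path 1→2→3→4 passes through the load outside the subgraph. Loosening `MAX_OUT` to 2 would admit {1, 2} as well, which illustrates how port constraints shape the candidate set.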

Index Terms: Automatic synthesis, instruction set interpretation, special-purpose, instruction set design, special-purpose and application-based systems.
Nathan T. Clark, Hongtao Zhong, Scott A. Mahlke, "Automated Custom Instruction Generation for Domain-Specific Processor Acceleration," IEEE Transactions on Computers, vol. 54, no. 10, pp. 1258-1270, Oct. 2005, doi:10.1109/TC.2005.156