Issue No.08 - August (2009 vol.58)
pp: 1035-1048
Yedidya Hilewitz, Intel Corporation, Hudson
Ruby B. Lee, Princeton University, Princeton
ABSTRACT
This paper describes a new basis for implementing the shifter functional unit in microprocessors, one that supports new advanced bit manipulations as well as the standard shifter operations. Our design is based on the inverse butterfly and butterfly data path circuits, rather than the barrel shifter or log-shifter designs currently used. We show how this new shifter can implement the standard shift and rotate operations, as well as the more advanced extract, deposit, and mix operations found in some processors. Furthermore, it can perform important new classes of even more advanced bit manipulation instructions, such as arbitrary bit permutations, bit gather (or parallel extract), and bit scatter (or parallel deposit). Thus, our new functional unit provides the functionality of three functional units (the basic shifter, the multimedia-mix unit, and the advanced bit manipulation functional unit) while having a latency only slightly longer than that of the log-shifter. When it performs only the existing functions of a shifter, it has significantly smaller area.
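To make the operations named in the abstract concrete, the following is a minimal software sketch of their bit-level semantics. It is a reference model only and does not represent the butterfly or inverse butterfly data path described in the paper. The function names (`rotate_right`, `bit_gather`, `bit_scatter`), the `width` parameter, and the convention that gathered bits are packed at the low-order end are assumptions made for illustration.

```python
def rotate_right(x: int, r: int, width: int = 64) -> int:
    """Rotate the low `width` bits of x right by r positions."""
    word_mask = (1 << width) - 1
    x &= word_mask
    r %= width
    return ((x >> r) | (x << (width - r))) & word_mask


def bit_gather(x: int, mask: int, width: int = 64) -> int:
    """Parallel extract: collect the bits of x at positions where mask is 1
    and pack them contiguously, in order, at the low-order end (assumed)."""
    result = 0
    out_pos = 0
    for i in range(width):
        if (mask >> i) & 1:
            result |= ((x >> i) & 1) << out_pos
            out_pos += 1
    return result


def bit_scatter(x: int, mask: int, width: int = 64) -> int:
    """Parallel deposit: take the low-order bits of x, in order, and place
    them at the positions where mask is 1; all other bits are zero."""
    result = 0
    in_pos = 0
    for i in range(width):
        if (mask >> i) & 1:
            result |= ((x >> in_pos) & 1) << i
            in_pos += 1
    return result


if __name__ == "__main__":
    # Gather the upper nibble of an 8-bit value, then scatter it back.
    gathered = bit_gather(0b10110010, 0b11110000, width=8)
    print(bin(gathered))                                      # 0b1011
    print(bin(bit_scatter(gathered, 0b11110000, width=8)))    # 0b10110000
    print(bin(rotate_right(0b00000001, 1, width=8)))          # 0b10000000
```

In this model, a conventional extract of a single contiguous field is bit gather with a contiguous mask, and a deposit into a zeroed word is bit scatter with a contiguous mask, which is one way to see why a single unit could cover both the standard and the advanced operations.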
INDEX TERMS
Shifter, rotation, shift, permutation, butterfly, inverse butterfly, bit manipulation, bit gather, bit scatter, microprocessor, instruction set architecture, processor architecture, circuit design, extract, deposit, mix, multimedia, arithmetic, parallel operations.
CITATION
Yedidya Hilewitz, Ruby B. Lee, "A New Basis for Shifters in General-Purpose Processors for Existing and Advanced Bit Manipulations", IEEE Transactions on Computers, vol.58, no. 8, pp. 1035-1048, August 2009, doi:10.1109/TC.2008.219