Interlock Collapsing ALU's
July 1993 (vol. 42 no. 7)
pp. 825-839

A device is presented that can execute interlocked fixed-point arithmetic logic unit (ALU) instructions in parallel with the instructions that cause the execution interlock. The device incorporates a 3-1 ALU design and can execute two's complement, unsigned binary, and binary logical operations. It is shown that the status of ALU operations performed on a 3-1 ALU can be determined in parallel, so that the proposed device complies with the predetermined architectural behavior of single-instruction execution. The device requires no more logic stages than a 3-1 binary adder built from a carry-save adder (CSA) followed by a carry-lookahead adder (CLA). Design considerations for a commonly available CMOS technology are also reported, indicating that the device will not lengthen the machine cycle of an implementation. It is suggested that the device can maintain full architectural compatibility.
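
The idea of collapsing an interlock can be illustrated with a small behavioral sketch. It is not the circuit described in the paper: it models addition only, assumes a 32-bit datapath, and uses hypothetical function names (`carry_save_add`, `alu_3_1_add`, `serial_execution`, `collapsed_execution`) chosen for illustration. Given the dependent pair `r1 = a + b; r2 = r1 + c`, a 3-1 ALU can form `r2` directly from `a`, `b`, and `c` while a conventional 2-1 ALU forms `r1`, so the second instruction need not wait for the first.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit datapath; the width is an assumption of this sketch


def carry_save_add(a, b, c):
    """3-2 carry-save adder: reduce three operands to a sum word and a carry word."""
    sum_word = a ^ b ^ c                          # bitwise sum, carries ignored
    carry_word = ((a & b) | (a & c) | (b & c)) << 1  # majority bits, shifted into place
    return sum_word & MASK, carry_word & MASK


def alu_3_1_add(a, b, c):
    """Behavioral 3-1 add: a CSA stage followed by one two-operand addition.

    The final '+' stands in for the carry-lookahead adder (CLA) stage.
    """
    sum_word, carry_word = carry_save_add(a, b, c)
    return (sum_word + carry_word) & MASK


def serial_execution(a, b, c):
    """Interlocked pair executed one after the other: r2 waits for r1."""
    r1 = (a + b) & MASK
    r2 = (r1 + c) & MASK
    return r1, r2


def collapsed_execution(a, b, c):
    """Same pair with the interlock collapsed: r1 and r2 depend only on a, b, c."""
    r1 = (a + b) & MASK        # conventional 2-1 ALU
    r2 = alu_3_1_add(a, b, c)  # 3-1 ALU, independent of r1
    return r1, r2
```

Both paths produce the same architected results for `r1` and `r2`, including when the sums wrap around at 32 bits. The sketch leaves out subtraction, logical operations, and status (condition code) generation, which the paper treats separately.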

Index Terms:
interlocked fixed point arithmetic logic unit; two's complement; unsigned binary; binary logical operations; single instruction execution; carry-save adder; carry-lookahead adder; CMOS technology; machine cycle; architectural compatibility; adders; CMOS integrated circuits; digital arithmetic; parallel processing.
S. Vassiliadis, J. Phillips, B. Blaner, "Interlock Collapsing ALU's," IEEE Transactions on Computers, vol. 42, no. 7, pp. 825-839, July 1993, doi:10.1109/12.237723