Yooichi Shintani, Toru Shonai, Hiroshi Kurokawa, Kazunori Kuriyama, Akira Yamaoka, "Hierarchical Execution to Speed Up Pipeline Interlock in Mainframe Computers," IEEE Transactions on Computers, vol. 45, no. 5, pp. 589-599, May 1996. ISSN 0018-9340. DOI: http://doi.ieeecomputersociety.org/10.1109/12.509910. IEEE Computer Society, Los Alamitos, CA, USA.
Keywords: benchmark, code optimization, compiler, hierarchical execution, mainframe computer, pipeline.
Abstract: This paper introduces a methodology, called hierarchical execution, that reduces stalls caused by pipeline interlocks such as data and control dependencies. Because a large body of software has accumulated in mainframe computer systems as object code, it is important to improve performance without recompiling that code for optimization. Our methodology adds a simple pre-ALU that asynchronously generates results with shorter latency than the main ALU, which reduces overhead, especially for address generation interlocks and branch instructions. The method was implemented in Hitachi's mainframe processors, the M-680 and M-880. In the M-680, the pre-ALU and the instruction decoder together process instructions in superpipelined fashion, which further improves performance. For the evaluated benchmarks, the aggregate effect of hierarchical execution on CPU time is 10% on average, with only a 1.6% increase in hardware. Hierarchical execution therefore improved cost performance by roughly 8%.
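The cost-performance figure can be checked with a short calculation. The sketch below assumes that the 10% figure is a performance gain and that cost scales directly with the 1.6% hardware increase. The abstract does not spell out this formula, so the function name `cost_performance_gain` and the ratio it computes are illustrative assumptions, not the authors' stated method.

```python
def cost_performance_gain(perf_gain: float, hw_increase: float) -> float:
    """Relative gain in performance per unit of hardware.

    Assumption (not stated in the abstract): cost performance is the
    ratio of relative performance to relative hardware amount.
    """
    return (1.0 + perf_gain) / (1.0 + hw_increase) - 1.0


# Figures quoted in the abstract: 10% average effect, 1.6% more hardware.
gain = cost_performance_gain(perf_gain=0.10, hw_increase=0.016)
print(f"{gain:.1%}")  # prints "8.3%"
```

Under these assumptions the result is about 8.3%, consistent with the rough 8% stated in the abstract.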

