In-Line Interrupt Handling and Lock-Up Free Translation Lookaside Buffers (TLBs)
May 2006 (vol. 55 no. 5)
pp. 559-574
The effects of the general-purpose precise interrupt mechanisms in use for the past few decades have received very little attention. When modern out-of-order processors handle interrupts precisely, they typically begin by flushing the pipeline to make the CPU available to execute handler instructions. In doing so, the CPU flushes many instructions that have already been brought into the reorder buffer. Some of these instructions may have reached a very deep stage in the pipeline, representing significant work that is wasted. In addition, refetching and reexecuting the flushed instructions costs several cycles and wastes energy for each exception detected. This paper concentrates on improving the performance of precisely handling software-managed translation lookaside buffer (TLB) interrupts, one of the most frequently occurring interrupts. The paper presents a novel method of in-lining the interrupt handler within the reorder buffer. Since the first-level interrupt handlers of TLBs are usually small, they could potentially fit in the reorder buffer along with the user-level code already there. As a result, the instructions that would otherwise be flushed from the pipe need not be refetched and reexecuted. Additionally, in-lining allows instructions independent of the exceptional instruction to continue to execute in parallel with the handler code. In-lining the TLB interrupt handler thus provides lock-up free TLBs. This paper proposes the prepend and append schemes of in-lining the interrupt handler into the available reorder buffer space. The two schemes are implemented on a performance model of the Alpha 21264 processor built by Alpha designers at the Palo Alto Design Center (PADC), California. We compare the overhead and performance impact of handling TLB interrupts by the traditional scheme, the append in-lined scheme, and the prepend in-lined scheme.
For small, medium, and large memory footprints, the overhead is quantified by comparing the number and pipeline state of instructions flushed, the energy savings, and the performance improvements. We find that lock-up free TLBs reduce the overhead of refetching and reexecuting the instructions flushed by 30-95 percent, reduce the execution time by 5-25 percent, and also reduce the energy wasted by 30-90 percent.
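
The core decision described in the abstract, in-lining the handler when it fits in the free reorder buffer space and otherwise falling back to a traditional flush, can be illustrated with a small sketch. This is not the authors' performance model: the `ReorderBuffer` class, the `handle_tlb_miss` function, and the entry counts are hypothetical names and numbers chosen for illustration, and the sketch ignores the distinction between the prepend and append schemes as well as all pipeline timing.

```python
from dataclasses import dataclass


@dataclass
class ReorderBuffer:
    """Toy reorder buffer that tracks only entry counts (hypothetical model)."""
    capacity: int   # total number of ROB entries
    occupied: int   # entries held by in-flight user-level instructions

    @property
    def free(self) -> int:
        return self.capacity - self.occupied


def handle_tlb_miss(rob: ReorderBuffer, handler_len: int) -> tuple[str, int]:
    """Service a TLB miss; return (mode, number of instructions flushed).

    Assumes handler_len <= rob.capacity. If the handler fits in the free
    entries, it is in-lined next to the user code and nothing is flushed.
    Otherwise the in-flight instructions are flushed to make room, which
    stands in for the traditional precise-interrupt scheme.
    """
    if handler_len <= rob.free:
        rob.occupied += handler_len   # handler shares the ROB with user code
        return ("inline", 0)
    flushed = rob.occupied            # work that must be refetched and reexecuted
    rob.occupied = handler_len        # ROB now holds only the handler
    return ("flush", flushed)


# Usage with made-up sizes: a 10-instruction handler and an 80-entry ROB.
roomy = ReorderBuffer(capacity=80, occupied=60)
full = ReorderBuffer(capacity=80, occupied=75)
print(handle_tlb_miss(roomy, 10))   # handler fits in the 20 free entries
print(handle_tlb_miss(full, 10))    # only 5 entries free, so fall back to a flush
```

The flushed count returned in the fallback case corresponds to the refetch and reexecution overhead that the in-lined schemes aim to avoid.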

Index Terms:
Reorder-buffer (ROB), precise interrupts, exception handlers, in-line interrupt, lock-up free, translation lookaside buffers (TLBs), performance modeling.
Aamer Jaleel, Bruce Jacob, "In-Line Interrupt Handling and Lock-Up Free Translation Lookaside Buffers (TLBs)," IEEE Transactions on Computers, vol. 55, no. 5, pp. 559-574, May 2006, doi:10.1109/TC.2006.77