MPS: Miss-Path Scheduling for Multiple-Issue Processors
December 1998 (vol. 47 no. 12)
pp. 1382-1397

Abstract: Many contemporary multiple-issue processors employ out-of-order scheduling hardware in the processor pipeline. Such scheduling hardware can yield good performance without relying on compile-time scheduling. The hardware can also schedule around unexpected run-time occurrences such as cache misses. As issue widths increase, however, the complexity of such scheduling hardware increases considerably and can lengthen the cycle time of the processor. This paper presents the design of a multiple-issue processor that uses an alternative approach called miss-path scheduling, or MPS. Scheduling hardware is removed from the processor pipeline altogether and placed on the path between the instruction cache and the next level of memory. Scheduling is performed at cache-miss time, as instructions are received from memory. Scheduled blocks of instructions are issued to an aggressively clocked in-order execution core. Details of a hardware scheduler that can perform speculation are outlined and shown to be feasible. Simulation results are presented that highlight the effectiveness of an MPS design.
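The flow described in the abstract (schedule once on an instruction-cache miss, then reuse the scheduled block on later fetches) can be illustrated with a small software model. The sketch below is not the paper's hardware design: the names `Instr`, `schedule_block`, and `ScheduleCache` are hypothetical, the scheduler is a plain greedy list scheduler, every operation is assumed to have unit latency, and speculation is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: one destination, any number of sources."""
    name: str
    dest: Optional[str]
    srcs: Tuple[str, ...] = ()


def schedule_block(instrs: List[Instr], width: int) -> List[List[Instr]]:
    """Greedy list scheduling into issue groups of at most `width` instructions.

    Assumes unit latency: a consumer must sit at least one group after its
    producer (RAW), as must a second writer of the same register (WAW).
    A writer may share a group with an earlier reader of its destination (WAR).
    """
    groups: List[List[Instr]] = []
    placed: List[Tuple[Instr, int]] = []  # (instruction, group index), program order
    for ins in instrs:
        earliest = 0
        for prev, g in placed:
            raw = prev.dest is not None and prev.dest in ins.srcs
            waw = prev.dest is not None and prev.dest == ins.dest
            war = ins.dest is not None and ins.dest in prev.srcs
            if raw or waw:
                earliest = max(earliest, g + 1)
            elif war:
                earliest = max(earliest, g)
        g = earliest
        while g < len(groups) and len(groups[g]) >= width:
            g += 1  # group is full, try the next one
        while len(groups) <= g:
            groups.append([])
        groups[g].append(ins)
        placed.append((ins, g))
    return groups


class ScheduleCache:
    """Toy model: scheduling happens only on a miss, never on the fetch path."""

    def __init__(self, memory: Dict[int, List[Instr]], width: int) -> None:
        self.memory = memory              # next level of memory: address -> raw block
        self.width = width
        self.blocks: Dict[int, List[List[Instr]]] = {}
        self.misses = 0

    def fetch(self, addr: int) -> List[List[Instr]]:
        if addr not in self.blocks:       # miss: schedule while filling the cache
            self.misses += 1
            self.blocks[addr] = schedule_block(self.memory[addr], self.width)
        return self.blocks[addr]          # hit: already-scheduled groups


memory = {
    0x100: [
        Instr("load", "r1", ("r9",)),
        Instr("add", "r2", ("r1",)),
        Instr("mul", "r3", ("r8",)),
        Instr("sub", "r4", ("r2", "r3")),
    ]
}
cache = ScheduleCache(memory, width=2)
first = cache.fetch(0x100)
second = cache.fetch(0x100)
group_names = [[i.name for i in g] for g in first]
```

In this model the in-order core would simply issue the returned groups one after another; the scheduling cost is paid once per miss rather than on every pass through the pipeline.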


Index Terms:
Multiple instruction issue, miss path scheduling, instruction level parallelism, schedule cache.
Sanjeev Banerjia, Sumedh W. Sathaye, Kishore N. Menezes, Thomas M. Conte, "MPS: Miss-Path Scheduling for Multiple-Issue Processors," IEEE Transactions on Computers, vol. 47, no. 12, pp. 1382-1397, Dec. 1998, doi:10.1109/12.737684