Time-Predictable Out-of-Order Execution for Hard Real-Time Systems
September 2010 (vol. 59 no. 9)
pp. 1210-1223
Jack Whitham, University of York, York
Neil Audsley, University of York, York
Superscalar out-of-order CPU designs can achieve higher performance than simpler in-order designs through exploitation of instruction-level parallelism in software. However, these CPU designs are often considered to be unsuitable for hard real-time systems because of the difficulty of guaranteeing the worst-case execution time (WCET) of software. This paper proposes and evaluates modifications for a superscalar out-of-order CPU core to allow instruction-level parallelism to be exploited without sacrificing time predictability and support for WCET analysis. Experiments using the M5 O3 CPU simulator show that WCETs can be two to four times smaller than those obtained using an idealized in-order CPU design, as instruction-level parallelism is exploited without compromising timing safety.

Index Terms:
Real-time and embedded systems, superscalar and dynamically scheduled microarchitectures.
Jack Whitham, Neil Audsley, "Time-Predictable Out-of-Order Execution for Hard Real-Time Systems," IEEE Transactions on Computers, vol. 59, no. 9, pp. 1210-1223, Sept. 2010, doi:10.1109/TC.2010.109