Supporting Demanding Hard-Real-Time Systems with STI
October 2005 (vol. 54 no. 10)
pp. 1188-1202
Software thread integration (STI) is a compilation technique that enables the efficient use of an application's fine-grain idle time on generic processors without special hardware support. With STI, a primary function is automatically interleaved with a secondary function to create a single implicitly multithreaded function. This minimizes context switching and hence both improves performance and offers very fine-grain concurrency. In this work, we extend STI techniques to address two challenges. First, we reduce response time for interrupts or other high-priority threads by introducing polling servers into integrated threads. Second, we enable integration with long host threads, expanding the domain of STI. We derive methods to evaluate the response time for threads in systems with and without these new integration methods. We demonstrate these concepts by integrating various threads in a sample hard-real-time system on a highly constrained microcontroller. We use an inexpensive 20 MHz AVR 8-bit microcontroller to generate monochrome NTSC video while servicing a high-speed (115.2 kbaud) serial communication link. We have built and tested this system, achieving graphics rendering speedups of 3.99x to 13.5x.
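To make the polling-server idea concrete, the following is a minimal host-side sketch in portable C, not the authors' tool output. STI as described above is applied automatically by a compiler; here the polling points are placed by hand. The names `sim_uart`, `poll_server`, `host_with_polls`, and `cycles_per_bit` are hypothetical and introduced only for illustration. The helper also works out the cycle budget implied by the abstract's figures: a 20 MHz clock and a 115.2 kbaud link leave roughly 173 CPU cycles per serial bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: whole CPU cycles available per serial bit. */
static uint32_t cycles_per_bit(uint32_t cpu_hz, uint32_t baud)
{
    assert(baud > 0);
    return cpu_hz / baud;               /* integer division truncates */
}

/* Simulated receiver; stands in for a real UART status/data register pair. */
typedef struct {
    int     pending;                    /* nonzero when a byte is waiting */
    uint8_t data;                       /* the waiting byte */
} sim_uart;

/* Polling server: service at most one pending request, then return. */
static int poll_server(sim_uart *u, uint8_t *rx, size_t *rx_len, size_t rx_cap)
{
    if (!u->pending || *rx_len >= rx_cap)
        return 0;
    rx[(*rx_len)++] = u->data;
    u->pending = 0;
    return 1;
}

/* A long host thread (here: clearing a buffer) with a polling point
 * every poll_every iterations. The wait before a pending request is
 * noticed is then bounded by the work between two polling points,
 * not by the length of the whole host thread. Returns the number of
 * polling points executed. */
static size_t host_with_polls(uint8_t *fb, size_t n, size_t poll_every,
                              sim_uart *u, uint8_t *rx, size_t *rx_len,
                              size_t rx_cap)
{
    size_t polls = 0;
    assert(poll_every > 0);
    for (size_t i = 0; i < n; i++) {
        fb[i] = 0;                          /* host work */
        if ((i + 1) % poll_every == 0) {    /* hand-placed polling point */
            poll_server(u, rx, rx_len, rx_cap);
            polls++;
        }
    }
    return polls;
}
```

In this sketch, a smaller `poll_every` shortens the worst-case wait before a request is serviced but adds polling overhead to the host thread, which is the kind of trade-off a response-time analysis would need to quantify.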

Index Terms: Software thread integration, embedded systems, fine-grain concurrency, post-pass compiler, hardware-to-software migration.
Benjamin J. Welch, Shobhit O. Kanaujia, Adarsh Seetharam, Deepaksrivats Thirumalai, Alexander G. Dean, "Supporting Demanding Hard-Real-Time Systems with STI," IEEE Transactions on Computers, vol. 54, no. 10, pp. 1188-1202, Oct. 2005, doi:10.1109/TC.2005.169