Achieving Efficiency and Portability in Systems Software: A Case Study on POSIX-Compliant Multithreaded Programs
September 2005 (vol. 31 no. 9)
pp. 785-800
Portable (standards-compliant) systems software is usually associated with unavoidable overhead from the standards-prescribed interface. For example, consider the POSIX Threads standard facility for using thread-specific data (TSD) to implement multithreaded code. The first TSD reference must be preceded by pthread_getspecific(), typically implemented as a function or macro with 40-50 instructions. This paper proposes a method that uses the runtime specialization facility of the Tempo program specializer to convert such unavoidable source code into simple memory references of one or two instructions for execution. Consequently, the source code remains standard compliant and the executed code's performance is similar to direct global variable access. Measurements show significant performance gains over a range of code sizes. A random number generator (10 lines of C) shows a speedup of 4.8 times on a SPARC and 2.2 times on a Pentium. A time converter (2,800 lines) was sped up by 14 and 22 percent, respectively, and a parallel genetic algorithm system (14,000 lines) was sped up by 13 and 5 percent.
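
As an illustration of the standards-prescribed pattern the abstract refers to, the following minimal sketch keeps the state of a small random number generator in thread-specific data. It is not the paper's code and does not show the Tempo specialization; the names rand_key, rand_state_t, tsd_rand, and threads_have_independent_state are hypothetical, and the generator constants are an arbitrary linear congruential choice. Only the pthread calls themselves come from the POSIX Threads interface.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical names throughout; only the pthread_* calls are POSIX. */
typedef struct { unsigned long seed; } rand_state_t;

static pthread_key_t rand_key;
static pthread_once_t rand_once = PTHREAD_ONCE_INIT;

/* Create the key once; free() runs on each thread's state at thread exit. */
static void make_key(void)
{
    if (pthread_key_create(&rand_key, free) != 0)
        abort();
}

/* Generator whose state is thread-specific data (TSD). */
unsigned long tsd_rand(void)
{
    rand_state_t *st;

    pthread_once(&rand_once, make_key);

    /* The lookup that must precede the first TSD reference in a function. */
    st = pthread_getspecific(rand_key);
    if (st == NULL) {
        /* First call in this thread: allocate and register its state. */
        st = malloc(sizeof *st);
        if (st == NULL)
            abort();
        st->seed = 1;
        if (pthread_setspecific(rand_key, st) != 0)
            abort();
    }

    /* Arbitrary linear congruential step, used only for illustration. */
    st->seed = st->seed * 1103515245UL + 12345UL;
    return (st->seed >> 16) & 0x7fff;
}

static void *worker(void *arg)
{
    *(unsigned long *)arg = tsd_rand();
    return NULL;
}

/* Returns 1 if two fresh threads each start from their own initial state. */
int threads_have_independent_state(void)
{
    pthread_t a, b;
    unsigned long ra = 0, rb = 1;

    if (pthread_create(&a, NULL, worker, &ra) != 0)
        return 0;
    if (pthread_create(&b, NULL, worker, &rb) != 0) {
        pthread_join(a, NULL);
        return 0;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return ra == rb;
}
```

Every call to tsd_rand() pays for the pthread_getspecific() lookup before it can touch the generator state; that lookup is the overhead the paper aims to reduce to a one- or two-instruction memory reference while leaving source code like this unchanged.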

Index Terms:
Performance, portability, threads, software libraries, concurrent programming, runtime specialization, thread-specific data.
Yasushi Shinjo, Calton Pu, "Achieving Efficiency and Portability in Systems Software: A Case Study on POSIX-Compliant Multithreaded Programs," IEEE Transactions on Software Engineering, vol. 31, no. 9, pp. 785-800, Sept. 2005, doi:10.1109/TSE.2005.98