Continuous Program Optimization: Design and Evaluation
June 2001 (vol. 50, no. 6)
pp. 549-566

Abstract—This paper presents a system in which already executing user code is continually and automatically reoptimized in the background, using dynamically collected execution profiles as a guide. Whenever a new code image has been constructed in the background in this manner, it is hot-swapped in place of the previously executing one. Control is then transferred to the new code and construction of yet another code image is initiated in the background. Two new runtime optimization techniques have been implemented in the context of this system: object layout adaptation and dynamic trace scheduling. The former constantly improves the storage layout of dynamically allocated data structures to increase data cache locality. The latter increases instruction-level parallelism by continually adapting the instruction schedule to the predominantly executed program paths. The empirical results presented in this paper make a case in favor of continuous optimization, but they also indicate some of its pitfalls and current shortcomings. If not applied judiciously, the cost of dynamic optimization outweighs its benefit in many situations, so that no break-even point is ever reached. In favorable circumstances, however, speed-ups of over 96 percent have been observed. The main beneficiaries of continuous optimization appear to be shared libraries in specific application domains which, at different times, can be optimized in the context of the currently dominant client application.
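The loop the abstract describes (profile the running program, construct an optimized code image in the background, hot-swap it in, and repeat) can be pictured in very rough terms with the Java sketch below. It is a minimal illustration, not the authors' Oberon-based implementation: names such as ContinuousOptimizer, CodeImage, and reoptimizeForever are hypothetical, the "profile" is reduced to a call counter, and the "recompilation" step is a placeholder that preserves the routine's semantics.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.LongAdder;

public class ContinuousOptimizer {

    /** One compiled version of a routine; a strategy object stands in for a native code image. */
    interface CodeImage { long run(long input); }

    // Execution profile: a single counter stands in for the paper's edge/path profiles.
    static final LongAdder profile = new LongAdder();

    // The currently installed code image; every call dispatches through this reference,
    // so replacing it is the "hot swap".
    static final AtomicReference<CodeImage> current = new AtomicReference<CodeImage>(
            input -> { profile.increment(); return input + 1; });

    // Entry point used by client code.
    static long call(long input) { return current.get().run(input); }

    // Background reoptimizer: periodically inspect the profile, build a new image,
    // and atomically transfer control to it; then start over.
    static void reoptimizeForever() {
        while (!Thread.currentThread().isInterrupted()) {
            long calls = profile.sum();                  // snapshot of profile data
            if (calls > 0) {                             // only rebuild once there is data
                CodeImage optimized = input -> {         // the "reoptimized" image
                    profile.increment();
                    return input + 1;                    // same semantics, new code
                };
                current.set(optimized);                  // hot swap
            }
            try { Thread.sleep(1000); } catch (InterruptedException e) { return; }
        }
    }

    public static void main(String[] args) {
        Thread optimizer = new Thread(ContinuousOptimizer::reoptimizeForever);
        optimizer.setDaemon(true);                       // keeps optimizing in the background
        optimizer.start();
        for (long i = 0; i < 5; i++) System.out.println(call(i));
    }
}
```

The one property mirrored here is that client calls always dispatch through a single swap point, so control transfers to each new image without stopping the running program.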

[1] L.P. Deutsch and A.M. Schiffman, “Efficient Implementation of the Smalltalk-80 System,” Proc. 11th ACM Symp. Principles of Programming Languages (POPL), pp. 297-302, Jan. 1984.
[2] M. Franz, “Code-Generation On-the-Fly: A Key to Portable Software,” PhD thesis, Institut für Computersysteme, ETH Zürich, 1994.
[3] U. Hölzle, “Adaptive Optimization for SELF: Reconciling High Performance with Exploratory Programming,” PhD thesis, Dept. of Computer Science, Stanford Univ., 1994.
[4] U. Hölzle and D. Ungar, “Reconciling Responsiveness with Performance in Pure Object-Oriented Languages,” ACM Trans. Programming Languages and Systems, vol. 18, no. 4, pp. 355-400, July 1996.
[5] A.-R. Adl-Tabatabai, M. Cierniak, G.-Y. Lueh, V.M. Parikh, and J.M. Stichnoth, “Fast, Effective Code Generation in a Just-in-Time Java Compiler,” Proc. ACM SIGPLAN '98 Conf. Programming Language Design and Implementation, pp. 280-290, June 1998.
[6] X. Zhang, Z. Wang, N. Gloy, J.B. Chen, and M.D. Smith, “System Support for Automatic Profiling and Optimization,” Proc. 16th ACM Symp. Operating Systems Principles, pp. 15-26, Oct. 1997.
[7] R.J. Hookway and M.A. Herdeg, “Digital FX!32: Combining Emulation and Binary Translation,” Digital Technical J., vol. 9, no. 1, pp. 3-12, 1997.
[8] N. Wirth and J. Gutknecht, Project Oberon. Addison-Wesley, 1992.
[9] J. Gutknecht, “Oberon System 3: Vision of a Future Software Technology,” Software—Concepts and Tools, vol. 15, no. 1, pp. 26-33, 1994.
[10] J. Gutknecht and M. Franz, “Oberon with Gadgets: A Simple Component Framework,” Object-Oriented Application Frameworks, vol. 2, 1999.
[11] T. Kistler, “Continuous Program Optimization,” PhD thesis, Dept. of Information and Computer Science, Univ. of California, Irvine, Nov. 1999.
[12] M. Franz and T. Kistler, “Slim Binaries,” Comm. ACM, vol. 40, no. 12, pp. 87-94, Dec. 1997, also published as Technical Report TR 96-24, Dept. of Information and Computer Science, Univ. of California, Irvine, June 1996.
[13] T. Ball and J.R. Larus, “Optimally Profiling and Tracing Programs,” ACM Trans. Programming Languages and Systems, vol. 16, no. 4, pp. 1319–1360, July 1994.
[14] T. Ball, P. Mataga, and M. Sagiv, “Edge Profiling versus Path Profiling: The Showdown,” Proc. 25th ACM SIGPLAN Symp. Principles of Programming Languages (POPL), pp. 134-148, Jan. 1998.
[15] T. Kistler and M. Franz, “Computing the Similarity of Profiling Data—Heuristics for Guiding Adaptive Compilation,” Proc. Workshop Profile and Feedback-Directed Compilation (in conjunction with PACT '98), Oct. 1998, also published as Technical Report TR 98-30, Dept. of Information and Computer Science, Univ. of California, Irvine, Dec. 1998.
[16] M. Brandis, “Optimizing Compilers for Structured Programming Languages,” PhD thesis, Institut für Computersysteme, ETH Zürich, 1995.
[17] R. Cytron, J. Ferrante, B.K. Rosen, M.N. Wegman, and F.K. Zadeck, “Efficiently Computing Static Single Assignment Form and the Control Dependence Graph,” ACM Trans. Programming Languages and Systems, vol. 13, no. 4, pp. 451-490, Oct. 1991.
[18] T. Kistler and M. Franz, “Automated Data-Member Layout of Heap Objects to Improve Memory-Hierarchy Performance,” ACM Trans. Programming Languages and Systems (TOPLAS), vol. 22, no. 3, pp. 490-505, 2000.
[19] T.C. Mowry, M.S. Lam, and A. Gupta, “Design and Evaluation of a Compiler Algorithm for Prefetching,” Proc. Fifth Int'l Conf. Architectural Support for Programming Languages and Operating Systems, Oct. 1992.
[20] M. Wolf and M. Lam, “A Data Locality Optimizing Algorithm,” Proc. SIGPLAN Conf. Programming Language Design and Implementation, pp. 30-44, June 1991.
[21] J. Gosling, B. Joy, and G. Steele, The Java Language Specification, Addison-Wesley, Reading, Mass., 1996.
[22] N. Wirth, “The Programming Language Oberon,” Software—Practice and Experience, vol. 18, no. 7, pp. 671-690, 1988.
[23] S.S. Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann, San Francisco, Calif., 1997.
[24] T.M. Chilimbi, B. Davidson, and J.R. Larus, “Cache-Conscious Structure Definition,” Proc. ACM SIGPLAN '99 Conf. Programming Language Design and Implementation (PLDI), pp. 13-26, 1999.
[25] N. Gloy, T. Blackwell, M.D. Smith, and B. Calder, “Procedure Placement Using Temporal Ordering Information,” Proc. 30th Ann. IEEE/ACM Int'l Symp. Microarchitecture (MICRO), pp. 303-313, Dec. 1997.
[26] B.W. Kernighan and S. Lin, “An Efficient Heuristic Procedure for Partitioning Graphs,” The Bell System Technical J., pp. 291-307, Feb. 1970.
[27] S. Dutt, “New Faster Kernighan-Lin-Type Graph-Partitioning Algorithms,” Proc. IEEE/ACM Int'l Conf. Computer-Aided Design, Nov. 1993.
[28] G. Karypis and V. Kumar, “A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs,” SIAM J. Scientific Computing, to appear.
[29] Motorola, Inc., PowerPC: Addendum to PowerPC 604 RISC Microprocessor User's Manual: PowerPC 604e Microprocessor Supplement and User's Manual Errata, 1996.
[30] K. Pettis and R.C. Hansen, “Profile Guided Code Positioning,” Proc. SIGPLAN 1990 Conf. Programming Language Design and Implementation, pp. 16-27, June 1990.
[31] P.P. Chang, S.A. Mahlke, and W.W. Hwu, “Using Profile Information to Assist Classic Code Optimizations,” Software—Practice and Experience, vol. 21, no. 12, pp. 1301-1321, 1991.
[32] P.P. Chang, W.Y. Chen, S.A. Mahlke, and W.-M.W. Hwu, “Profile-Guided Automatic Inline Expansion for C Programs,” Software—Practice and Experience, vol. 22, no. 5, pp. 349-369, May 1992.
[33] W.Y. Chen, S.A. Mahlke, N.J. Warter, S. Anik, and W.-M.W. Hwu, “Profile-Assisted Instruction Scheduling,” Int'l J. Parallel Programming, vol. 22, no. 2, pp. 151-181, Apr. 1994.
[34] P.P. Chang, W.Y. Chen, S.A. Mahlke, and W.-M.W. Hwu, “Comparing Static and Dynamic Code Scheduling for Multiple-Instruction-Issue Processors,” Proc. 24th Ann. IEEE/ACM Int'l Symp. Microarchitecture (MICRO), pp. 25-33, Nov. 1991.
[35] Motorola, Inc., PowerPC 604: RISC Microprocessor User's Manual, 1994.
[36] J.A. Fisher, “Trace Scheduling: A Technique for Global Microcode Compaction,” IEEE Trans. Computers, vol. 30, no. 7, pp. 478-490, 1981.
[37] W.Y. Chen, S.A. Mahlke, N.J. Warter, R.E. Hank, R.A. Bringmann, S. Anik, and W.-M.W. Hwu, “Using Profile Information to Assist Advanced Compiler Optimization and Scheduling,” Advances in Languages and Compilers for Parallel Processing, U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, eds. London: Pitman Publishing, 1993.
[38] C. Young and M.D. Smith, “Better Global Scheduling Using Path Profiles,” Proc. 31st Ann. ACM/IEEE Int'l Symp. Microarchitecture (MICRO), pp. 115-126, Dec. 1998.
[39] H.S. Warren Jr., “Instruction Scheduling for the IBM RISC System/6000 Processor,” IBM J. Research and Development, vol. 34, pp. 85-92, 1990.
[40] D.N. Truong, F. Bodin, and A. Seznec, “Improving Cache Behavior of Dynamically Allocated Data Structures,” Proc. 1998 Int'l Conf. Parallel Architectures and Compilation Techniques (PACT), pp. 322-329, Oct. 1998.
[41] A. Rogers, M. Carlisle, J. Reppy, and L. Hendren, “Supporting Dynamic Data Structures on Distributed Memory Machines,” ACM Trans. Programming Languages and Systems, vol. 17, no. 2, Mar. 1995.
[42] D. Finkel, R. Kinicki, J. Lehmann, and J. CaraDonna, “Comparisons of Distributed Operating System Performance Using the WPI Benchmark Suite,” Technical Report CS-TR-92-2, Worcester Polytechnic Inst., 1992.
[43] J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson, “An Extended Set of Basic Linear Algebra Subroutines,” ACM Trans. Mathematical Software, vol. 14, pp. 18-32, 1988.
[44] J. Anderson et al., “Continuous Profiling: Where Have All the Cycles Gone?” Proc. 16th ACM Symp. Operating Systems Principles, pp. 1-14, Oct. 1997.
[45] Y. Wu, Y.-F. Lee, and H. Wang, “An Efficient Software-Hardware Collaborative Profiling Technique for Wide-Issue Processors,” Proc. Workshop Binary Translation, Oct. 1999.
[46] G.J. Hansen, “Adaptive Systems for the Dynamic Run-Time Optimization of Programs,” PhD thesis, Dept. of Computer Science, Carnegie-Mellon Univ., Mar. 1974.
[47] B. Alpern, A. Cocchi, D. Lieber, M. Mergen, and V. Sarkar, “Jalapeño—A Compiler-Supported Java Virtual Machine for Servers,” Proc. ACM SIGPLAN 1999 Workshop Compiler Support for System Software (WCSSS '99), May 1999.
[48] B. Alpern, C.R. Attanasio, J.J. Barton, A. Cocchi, S.F. Hummel, D. Lieber, T. Ngo, M. Mergen, J.C. Shepherd, and S. Smith, “Implementing Jalapeño in Java,” Proc. ACM SIGPLAN '99 Conf. Object-Oriented Programming Systems, Languages and Applications (OOPSLA), Nov. 1999.
[49] M. Cierniak, G.-Y. Lueh, and J.M. Stichnoth, “Practicing JUDO: Java under Dynamic Optimizations,” Proc. ACM SIGPLAN '00 Conf. Programming Language Design and Implementation (PLDI), pp. 13-26, June 2000.
[50] U. Hölzle and D. Ungar, “Optimizing Dynamically-Dispatched Calls with Run-Time Type Feedback,” Proc. ACM SIGPLAN '94 Conf. Programming Language Design and Implementation, June 1994.
[51] J. Dean and C. Chambers, “Towards Better Inlining Decisions Using Inlining Trials,” Proc. Conf. Lisp and Functional Programming, pp. 273-282, July-Sept. 1994.
[52] C. Chambers, “The Design and Implementation of the SELF Compiler, an Optimizing Compiler for Object-Oriented Programming Languages,” PhD thesis, Stanford Univ., Apr. 1992.
[53] U. Hölzle, C. Chambers, and D. Ungar, “Optimizing Dynamically-Typed Object-Oriented Languages With Polymorphic Inline Caches,” Proc. Fifth European Conf. Object-Oriented Programming (ECOOP), pp. 21-38, July 1991.
[54] C. Chambers and D. Ungar, “Customization: Optimizing Compiler Technology for SELF, A Dynamically-Typed Object-Oriented Programming Language,” Proc. ACM SIGPLAN '89 Conf. Programming Language Design and Implementation (PLDI), pp. 146-160, June 1989.
[55] J.-D. Choi, M. Gupta, M. Serrano, V.C. Sreedhar, and S. Midkiff, “Escape Analysis for Java,” Proc. ACM SIGPLAN '99 Conf. Object-Oriented Programming Systems, Languages and Applications (OOPSLA), Nov. 1999.
[56] V. Bala, E. Duesterwald, and S. Banerjia, “Transparent Dynamic Optimization: The Design and Implementation of Dynamo,” Technical Report HPL-1999-78, Hewlett Packard Laboratories, June 1999.
[57] C. Zheng and C. Thompson, “PA-RISC to IA-64: Transparent Execution, No Recompilation,” Computer, vol. 33, no. 3, pp. 47-52, Mar. 2000.
[58] K. Ebcioğlu and E.R. Altman, “DAISY: Dynamic Compilation for 100% Architectural Compatibility,” Proc. 24th Ann. Int'l Symp. Computer Architecture (ISCA), pp. 26-37, 1997.
[59] M. Gschwind, E. Altman, S. Sathaye, P. Ledak, and D. Appenzeller, “Dynamic and Transparent Binary Translation,” Computer, pp. 54-59, Mar. 2000.
[60] A. Klaiber, “The Technology behind Crusoe Processors,” Transmeta Corp., Jan. 2000.
[61] D.R. Engler, W.C. Hsieh, and M.F. Kaashoek, “`C: A Language for High-Level, Efficient, and Machine-Independent Dynamic Code Generation,” Proc. 23rd ACM SIGPLAN Symp. Principles of Programming Languages (POPL), pp. 131-144, Jan. 1996.
[62] P. Lee and M. Leone, “Optimizing ML with Run-Time Code Generation,” Proc. ACM SIGPLAN '96 Conf. Programming Language Design and Implementation (PLDI), pp. 137-148, May 1996.
[63] R. Marlet, C. Consel, and P. Boinot, “Efficient Incremental Run-Time Specialization for Free,” Proc. ACM SIGPLAN '99 Conf. Programming Language Design and Implementation (PLDI), pp. 281-292, May 1999.
[64] B. Grant, M. Philipose, M. Mock, C. Chambers, and S.J. Eggers, “An Evaluation of Staged Run-Time Optimization in DyC,” Proc. ACM SIGPLAN '99 Conf. Programming Language Design and Implementation (PLDI), pp. 293-304, May 1999.
[65] T.M. Chilimbi and J.R. Larus, “Using Generational Garbage Collection to Implement Cache-Conscious Data Placement,” Proc. Int'l Symp. Memory Management, pp. 37-48, 1998.
[66] T.M. Chilimbi, M.D. Hill, and J.R. Larus, “Cache-Conscious Structure Layout,” Proc. ACM SIGPLAN '99 Conf. Programming Language Design and Implementation (PLDI), pp. 1-12, 1999.
[67] B. Calder et al., “Cache-Conscious Data Placement,” Proc. Eighth Int'l Conf. Architectural Support for Programming Languages and Operating Systems, pp. 139-149, 1998.
[68] B.L. Deitrich and W.-M.W. Hwu, “Speculative Hedge: Regulating Compile-Time Speculation against Profile Variations,” Proc. 29th Ann. IEEE/ACM Int'l Symp. Microarchitecture (MICRO), pp. 70-79, Dec. 1996.
[69] C. Chekuri, R. Johnson, R. Motwani, B.K. Natarajan, B.R. Rau, and M. Schlansker, “Profile-Driven Instruction Level Parallel Scheduling with Applications to Super Blocks,” Proc. 29th Ann. IEEE/ACM Int'l Symp. Microarchitecture (MICRO), pp. 58-67, 1996.

Index Terms:
Dynamic compilation, continuous optimization, memory optimization, trace scheduling, profiling.
Citation:
Thomas Kistler, Michael Franz, "Continuous Program Optimization: Design and Evaluation," IEEE Transactions on Computers, vol. 50, no. 6, pp. 549-566, June 2001, doi:10.1109/12.931893