Java Runtime Systems: Characterization and Architectural Implications
February 2001 (vol. 50 no. 2)
pp. 131-146

Abstract: The Java Virtual Machine (JVM) is the cornerstone of Java technology, and its efficiency in executing portable Java bytecodes is crucial to the success of this technology. Interpretation, Just-In-Time (JIT) compilation, and hardware realization are well-known ways to implement a JVM, and previous research has proposed optimizations for each of these techniques. However, each technique has its pros and cons and may not be uniformly attractive for all hardware platforms. An understanding of the architectural implications of JVM implementations running real applications is therefore crucial to developing efficient Java runtime systems on a wide range of platforms. Toward this goal, this paper examines architectural issues from both the hardware and the JVM implementation perspectives. The paper starts by identifying the important execution characteristics of Java applications from a bytecode perspective. It then explores the potential of a smart JIT compiler strategy that can dynamically choose to interpret or compile based on the associated costs, and it investigates the CPU and cache architectural support that would benefit JVM implementations. We also study the available parallelism during the different execution modes using applications from the SPECjvm98 benchmarks. At the bytecode level, fewer than 45 of the 256 bytecodes constitute 90 percent of the dynamic bytecode stream. Method sizes fall into a trimodal distribution with peaks at 1, 9, and 26 bytecodes across all benchmarks. The architectural issues explored in this study show that, when Java applications are executed with a JIT compiler, selective translation using good heuristics can improve performance, but the saving is only 10-15 percent at best. The instruction and data cache performance of Java applications is seen to be better than that of C/C++ applications, except for data cache performance in the JIT mode, where write misses resulting from the installation of JIT compiler output dominate the misses and degrade data cache performance. A study of the available parallelism shows that Java programs executed using JIT compilers have parallelism comparable to C/C++ programs for small window sizes but fall behind when the window size is increased. Java programs executed using the interpreter have very little parallelism because of the stack-based nature of the JVM instruction set, which dominates the interpreted execution mode. In addition, this work offers insights and architectural proposals for designing an efficient Java runtime system.
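The interpret-or-compile decision mentioned in the abstract can be illustrated with a minimal sketch. This is not the heuristic used in the paper; it is a simple break-even comparison under stated assumptions. The names `MethodProfile`, `interpret_cost`, `jit_cost`, and `should_compile`, the three cost constants, and the idea that the expected call count is known in advance are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MethodProfile:
    """Hypothetical per-method profile; not a structure from the paper."""
    name: str
    bytecode_count: int   # static size of the method in bytecodes
    expected_calls: int   # assumed known, e.g. from a previous profiling run


# Assumed costs in arbitrary time units per bytecode (illustrative only).
INTERP_COST = 10.0    # interpreting one bytecode once
NATIVE_COST = 1.0     # executing the compiled equivalent of one bytecode once
COMPILE_COST = 200.0  # one-time cost of translating one bytecode


def interpret_cost(m: MethodProfile) -> float:
    """Total cost if every invocation of the method is interpreted."""
    return m.expected_calls * m.bytecode_count * INTERP_COST


def jit_cost(m: MethodProfile) -> float:
    """One-time translation cost plus native execution of every invocation."""
    translate = m.bytecode_count * COMPILE_COST
    execute = m.expected_calls * m.bytecode_count * NATIVE_COST
    return translate + execute


def should_compile(m: MethodProfile) -> bool:
    """Compile only when translation pays for itself over the expected calls."""
    return jit_cost(m) < interpret_cost(m)


if __name__ == "__main__":
    for m in (MethodProfile("hot_loop_body", 26, 1000),
              MethodProfile("rarely_called", 9, 5)):
        choice = "compile" if should_compile(m) else "interpret"
        print(f"{m.name}: {choice}")
```

With these assumed constants the method size cancels out, so the break-even point depends only on the call count: translation pays off once a method is expected to run more than about 22 times. A real runtime would have to estimate the call count and the costs online rather than assume them.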


Index Terms:
Java, Java bytecodes, CPU and cache architectures, ILP, performance evaluation, benchmarking.
Citation:
Ramesh Radhakrishnan, N. Vijaykrishnan, Lizy Kurian John, Anand Sivasubramaniam, Juan Rubio, Jyotsna Sabarinathan, "Java Runtime Systems: Characterization and Architectural Implications," IEEE Transactions on Computers, vol. 50, no. 2, pp. 131-146, Feb. 2001, doi:10.1109/12.908989