Issue No.11 - November (2011 vol.60)
pp: 1521-1534
Kuo-Yi Chen, National Cheng Kung University, Tainan
J. Morris Chang, Iowa State University, Ames
Ting-Wei Hou, National Cheng Kung University, Tainan
This paper studies the performance and scalability of multithreaded Java programs on multicore systems. First, we examine how benchmark performance scales with the number of processor cores and application threads. Second, by correlating low-level hardware performance data with Java Virtual Machine (JVM) threads and system components, we present a detailed analysis of performance and scalability, including hardware stall events and memory system latencies. Third, we detail memory resource usage to identify potential bottlenecks. Finally, we propose JVM tuning techniques to alleviate these bottlenecks and improve performance and scalability. The study reveals several key findings. First, lock contention usually imposes a strong limit on scalability. Second, in terms of memory access latency, most memory stalls are caused by L2 cache misses and cache-to-cache transfers. Finally, the overhead of minor garbage collections can be an important factor in throughput reduction. Based on these findings, we examine appropriate JVM tuning techniques. We observe that using a parallel garbage collector and an appropriate ratio of young to old generation can alleviate the overhead of minor collections and improve garbage collection efficiency. Moreover, cache utilization can be enhanced by using thread-local allocation buffers, which leads to significant performance improvements.
Garbage collection, Java, lock contention, multicore, performance counter, scalability, virtual machine.
Kuo-Yi Chen, J. Morris Chang, Ting-Wei Hou, "Multithreading in Java: Performance and Scalability on Multicore Systems", IEEE Transactions on Computers, vol.60, no. 11, pp. 1521-1534, November 2011, doi:10.1109/TC.2010.232
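The garbage collection overhead mentioned in the abstract can be observed from inside a running JVM. The following is a minimal sketch, not the authors' measurement method (the paper relies on hardware performance counters): it starts several allocation-heavy threads and reads the accumulated collector time through the standard java.lang.management API. The class name GcOverheadSketch, the helper methods, and the workload sizes are hypothetical and chosen only for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcOverheadSketch {

    /** Sums the accumulated collection time (ms) over all collectors the JVM reports. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report a time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /**
     * Hypothetical workload: each thread allocates `iterations` short-lived 1 KiB arrays.
     * Returns the total number of bytes requested across all threads.
     */
    public static long runWorkload(int threads, int iterations) {
        long[] perThreadBytes = new long[threads];
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                long sum = 0;
                for (int i = 0; i < iterations; i++) {
                    byte[] garbage = new byte[1024];   // becomes unreachable right away
                    garbage[i % garbage.length] = 1;   // touch the array so it is really used
                    sum += garbage.length;
                }
                perThreadBytes[id] = sum;
            });
            workers.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : workers) {
                worker.join(); // join makes each worker's result visible to this thread
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        long total = 0;
        for (long bytes : perThreadBytes) {
            total += bytes;
        }
        return total;
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        long bytes = runWorkload(threads, 200_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        long gcMs = totalGcMillis() - gcBefore;

        System.out.println("threads=" + threads + " bytes=" + bytes
                + " elapsedMs=" + elapsedMs + " gcMs=" + gcMs);
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": collections=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
    }
}
```

On a HotSpot JVM, the tuning knobs the abstract describes correspond to options such as -XX:+UseParallelGC, -XX:NewRatio=<n>, and -XX:+UseTLAB. These flag names are an assumption about the reader's JVM rather than something the abstract states, so a comparison might look like running java -XX:+UseParallelGC -XX:NewRatio=2 GcOverheadSketch against a run with different settings and comparing the reported gcMs values. The numbers depend on the machine and JVM version and say nothing about the results reported in the paper.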