Measuring Benchmark Similarity Using Inherent Program Characteristics
June 2006 (vol. 55 no. 6)
pp. 769-782
This paper proposes a methodology for measuring the similarity between programs based on their inherent microarchitecture-independent characteristics, and demonstrates two applications for it: 1) finding a representative subset of programs from benchmark suites and 2) studying the evolution of four generations of SPEC CPU benchmark suites. Using the proposed methodology, we find a representative subset of programs from three popular benchmark suites: SPEC CPU2000, MediaBench, and MiBench. We show that this subset of representative programs can be effectively used to estimate the average benchmark suite IPC, L1 data cache miss rates, and speedup on 11 machines with different ISAs and microarchitectures, which saves simulation time with little loss in accuracy. From our study of the similarity between the four generations of SPEC CPU benchmark suites, we find that, other than a dramatic increase in the dynamic instruction count and increasingly poor temporal data locality, the inherent program characteristics have remained more or less unchanged.
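
The abstract does not spell out how similarity is computed or how the subset is chosen, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes each program is described by a vector of microarchitecture-independent characteristics, that characteristics are z-score normalized, that similarity is Euclidean distance, and that representatives are chosen by a simple greedy farthest-point selection. The program names and all numeric values are hypothetical placeholders.

```python
import math

def zscore_columns(rows):
    """Normalize each characteristic (column) to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a characteristic is constant, to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def euclidean(a, b):
    """Distance between two programs in the normalized characteristic space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pick_representatives(names, rows, k):
    """Assumed selection rule: start with the program closest to the centroid,
    then repeatedly add the program farthest from everything chosen so far."""
    points = zscore_columns(rows)
    centroid = [sum(c) / len(c) for c in zip(*points)]
    chosen = [min(range(len(points)),
                  key=lambda i: euclidean(points[i], centroid))]
    while len(chosen) < k:
        candidates = [i for i in range(len(points)) if i not in chosen]
        nxt = max(candidates,
                  key=lambda i: min(euclidean(points[i], points[j])
                                    for j in chosen))
        chosen.append(nxt)
    return [names[i] for i in chosen]

# Hypothetical programs and made-up characteristics:
# (fraction of loads, fraction of branches, average basic block size)
names = ["prog_a", "prog_b", "prog_c", "prog_d"]
rows = [
    [0.30, 0.10, 5.0],
    [0.31, 0.11, 5.2],
    [0.10, 0.25, 12.0],
    [0.11, 0.24, 11.5],
]

representatives = pick_representatives(names, rows, k=2)
print(representatives)
```

With two tight groups of similar programs, as in this toy input, the selection picks one program from each group, which is the intuition behind simulating a subset instead of the whole suite.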

Index Terms:
Measurement techniques, modeling techniques, performance of systems, performance attributes.
Ajay Joshi, Aashish Phansalkar, Lieven Eeckhout, Lizy Kurian John, "Measuring Benchmark Similarity Using Inherent Program Characteristics," IEEE Transactions on Computers, vol. 55, no. 6, pp. 769-782, June 2006, doi:10.1109/TC.2006.85