Issue No.06 - June (2009 vol.58)
pp: 759-769
Shaoshan Liu, University of California, Irvine
The newly emerging many-core-on-a-chip designs have renewed intense interest in parallel processing. By applying Amdahl's formulation to the programs in the PARSEC and SPLASH-2 benchmark suites, we find that most applications may not have sufficient parallelism to utilize modern parallel machines efficiently. The long sequential portions of these programs are caused by computation latency as well as communication latency. However, value prediction techniques may allow the “parallelization” of the sequential portion by predicting values before they are produced. In conventional superscalar architectures, computation latency dominates the sequential sections, so value prediction may be used to predict a computation result before it is produced. In many-core architectures, where communication latency increases with the number of cores, value prediction may be used to reduce both communication and computation latency. In this paper, we extend Amdahl's formulation to model the data redundancy inherent in each benchmark, thereby identifying the potential of value prediction techniques. Our analysis shows that the performance of the PARSEC benchmarks may improve by 180 percent, and that of the SPLASH-2 suite by 230 percent, compared to when only the intrinsic parallelism is considered. This demonstrates the immense potential of fine-grained value prediction in reducing communication latency in many-core architectures.
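As a rough illustration of the reasoning in the abstract, the sketch below computes the classic Amdahl speedup and then a simplified variant in which part of the sequential portion is treated as parallelizable because its values are predicted correctly. The function names and the `predictable_share` parameter are assumptions made for this example; the paper's actual extension models the data redundancy of each benchmark and is not reproduced here. The sketch also ignores misprediction penalties.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Classic Amdahl speedup: 1 / ((1 - f) + f / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    sequential_fraction = 1.0 - parallel_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / cores)


def speedup_with_prediction(parallel_fraction: float, cores: int,
                            predictable_share: float) -> float:
    """Hypothetical variant (an assumption, not the paper's model):
    a share of the sequential portion is predicted correctly and is
    therefore counted as parallel work."""
    if not 0.0 <= predictable_share <= 1.0:
        raise ValueError("predictable_share must be in [0, 1]")
    sequential_fraction = 1.0 - parallel_fraction
    effective_parallel = parallel_fraction + predictable_share * sequential_fraction
    return amdahl_speedup(effective_parallel, cores)


if __name__ == "__main__":
    # Illustrative inputs only; these are not measurements from the paper.
    for cores in (4, 16, 64):
        base = amdahl_speedup(0.9, cores)
        predicted = speedup_with_prediction(0.9, cores, 0.5)
        print(f"{cores:3d} cores: intrinsic {base:.2f}x, with prediction {predicted:.2f}x")
```

With `predictable_share` set to 0 the variant reduces to the classic formula, which makes it easy to compare the intrinsic-parallelism case against a case where prediction shortens the sequential portion.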
Parallelism and concurrency, performance analysis, multiprocessors, value prediction.
Shaoshan Liu, "Potential Impact of Value Prediction on Communication in Many-Core Architectures", IEEE Transactions on Computers, vol.58, no. 6, pp. 759-769, June 2009, doi:10.1109/TC.2009.28