Vol. 23, No. 8, Aug. 2012, pp. 1361-1368
Julio Sahuquillo, Universidad Politécnica de Valencia, Valencia
Salvador Petit, Universidad Politécnica de Valencia, Valencia
Rafael Ubal, Northeastern University, Boston
David R. Kaeli, Northeastern University, Boston
Out-of-order retirement of instructions has been shown to be an effective technique for increasing the number of in-flight instructions. This form of runtime scheduling can reduce pipeline stalls caused by head-of-line blocking in the reorder buffer (ROB), where a long-latency instruction at the head keeps completed instructions behind it from retiring. Enlarging the instruction window can be highly beneficial to multiprocessors that implement a strict memory model, especially when both loads and stores suffer long cache-miss latencies whose stalls must be overlapped with instruction execution. Based on the Validation Buffer (VB) architecture, a previously proposed checkpoint-free, out-of-order retirement architecture for single processors, this paper proposes a cost-effective, scalable, out-of-order retirement multiprocessor capable of enforcing sequential consistency without impacting the design of the memory hierarchy or interconnect. Our simulation results indicate that using a VB can deliver speedups of 3 to 20 percent, depending on the ROB size, over both relaxed and sequentially consistent in-order retirement in future multiprocessor systems.
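Head-of-line blocking can be illustrated with a toy cycle-level model. The sketch below is a minimal illustration under stated assumptions, not the authors' simulator and not the VB's retirement rule: at most one instruction is dispatched per cycle into a window of `window_size` entries, each instruction completes a fixed number of cycles after dispatch, and the hypothetical `simulate` function compares strict in-order retirement with an idealized policy in which any completed instruction may leave the window. The idealized policy ignores the precise-exception and memory-ordering constraints that a real out-of-order retirement design such as the VB must still honor.

```python
from collections import deque


def simulate(latencies, window_size, in_order):
    """Return the number of cycles needed to retire every instruction.

    Toy model (an assumption, not the paper's simulator):
    - at most one instruction is dispatched per cycle, if the window has room;
    - instruction i completes latencies[i] cycles after it is dispatched;
    - in_order=True: entries leave only from the head, so a slow head
      instruction blocks completed instructions behind it;
    - in_order=False: any completed entry leaves (idealized; ignores
      precise exceptions and memory-ordering rules).
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    window = deque()  # (index, completion_cycle) pairs in program order
    next_instr = 0
    retired = 0
    cycle = 0
    n = len(latencies)
    while retired < n:
        # Retire stage: free window entries whose instructions have completed.
        if in_order:
            while window and window[0][1] <= cycle:
                window.popleft()
                retired += 1
        else:
            still_running = deque(e for e in window if e[1] > cycle)
            retired += len(window) - len(still_running)
            window = still_running
        # Dispatch stage: insert the next instruction if an entry is free.
        if next_instr < n and len(window) < window_size:
            window.append((next_instr, cycle + latencies[next_instr]))
            next_instr += 1
        cycle += 1
    return cycle


if __name__ == "__main__":
    trace = [10, 1, 1, 1, 1, 1]  # one long-latency load, then five short ops
    print("in-order retirement:", simulate(trace, 2, in_order=True))       # 15
    print("out-of-order retirement:", simulate(trace, 2, in_order=False))  # 11
```

With a 10-cycle load at the head, five single-cycle instructions behind it, and a two-entry window, the in-order policy needs 15 cycles because the load occupies the head entry and the window fills behind it, while the idealized policy needs 11 cycles because the short instructions drain past the load. This is the stall that out-of-order retirement targets; the paper's contribution is doing so while still enforcing sequential consistency.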
Index Terms: Out-of-order retirement, multicore processors, validation buffer, sequential consistency.
Julio Sahuquillo, Salvador Petit, Rafael Ubal, David R. Kaeli, "A Sequentially Consistent Multiprocessor Architecture for Out-of-Order Retirement of Instructions", IEEE Transactions on Parallel & Distributed Systems, vol.23, no. 8, pp. 1361-1368, Aug. 2012, doi:10.1109/TPDS.2011.255