Issue No.10 - Oct. (2011 vol.22)
pp: 1610-1623
Blas Cuesta, Universidad Politécnica de Valencia, Valencia
Antonio Robles, Universidad Politécnica de Valencia, Valencia
José Duato, Universidad Politécnica de Valencia, Valencia
Token Coherence is a cache coherence protocol that simultaneously captures the best attributes of the traditional approaches to coherence: direct communication between processors (like snooping-based protocols) and no reliance on bus-like interconnects (like directory-based protocols). This is possible thanks to a class of unordered requests that usually succeed in resolving cache misses. The problem with unordered requests is that they can cause protocol races, which prevent some misses from being resolved. To eliminate races and ensure the completion of unresolved misses, Token Coherence uses a starvation prevention mechanism named persistent requests. This mechanism is extremely inefficient and, moreover, endangers the scalability of Token Coherence, since it requires storage structures at each node whose size grows proportionally to the system size. As multiprocessors continue to include an increasing number of nodes, both the performance and the scalability of cache coherence protocols will remain key aspects. In this work, we propose an alternative starvation prevention mechanism, named priority requests, that outperforms persistent requests. This mechanism reduces application runtime by more than 20 percent (on average) in a 64-processor system. Furthermore, thanks to the flexibility of priority requests, their storage requirements can be drastically reduced, thereby improving the overall scalability of Token Coherence. Although this is achieved at the expense of a slight performance degradation, priority requests still outperform persistent requests significantly.
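To make the race described in the abstract concrete, the following is a minimal Python sketch of the general token-counting idea: a node may read a block while holding at least one token and may write only while holding all of them. It is a toy model, not the mechanism proposed in the paper. The names `Node`, `transfer`, `racing_writers`, and the constant `TOTAL_TOKENS`, as well as the four-node setup and the particular message interleaving, are assumptions made for illustration.

```python
from dataclasses import dataclass

TOTAL_TOKENS = 4  # assumed: one token per node in a 4-node toy system


@dataclass
class Node:
    name: str
    tokens: int = 0

    def can_read(self) -> bool:
        # Reading a block requires holding at least one of its tokens.
        return self.tokens >= 1

    def can_write(self) -> bool:
        # Writing requires holding every token, so no other node can read.
        return self.tokens == TOTAL_TOKENS


def transfer(src: Node, dst: Node, count: int) -> None:
    """Move tokens between nodes; tokens are never created or destroyed."""
    if count < 0 or count > src.tokens:
        raise ValueError("cannot transfer more tokens than the sender holds")
    src.tokens -= count
    dst.tokens += count


def racing_writers() -> list[Node]:
    """Two unordered write requests race; each collects only part of the tokens."""
    nodes = [Node(f"P{i}", tokens=1) for i in range(TOTAL_TOKENS)]
    p0, p1, p2, p3 = nodes
    # Hypothetical interleaving: P0's request reaches P2 first and
    # P1's request reaches P3 first, so the tokens are split.
    transfer(p2, p0, 1)
    transfer(p3, p1, 1)
    return nodes


if __name__ == "__main__":
    for node in racing_writers():
        print(node.name, node.tokens, "can write:", node.can_write())
```

In this interleaving P0 and P1 each end up with two of the four tokens, so neither write miss completes. A starvation prevention mechanism, such as the persistent or priority requests discussed above, exists to break exactly this kind of tie.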
Cache coherence, token coherence, starvation prevention, scalability.
Blas Cuesta, Antonio Robles, José Duato, "Efficient and Scalable Starvation Prevention Mechanism for Token Coherence", IEEE Transactions on Parallel & Distributed Systems, vol.22, no. 10, pp. 1610-1623, Oct. 2011, doi:10.1109/TPDS.2011.80