Proceedings Fifth International Symposium on High-Performance Computer Architecture (1999)
Jan. 9, 1999 to Jan. 12, 1999
Recent research shows that the occupancy of the coherence controllers is a major performance bottleneck for distributed cache coherent shared memory multiprocessors. A significant part of the occupancy is due to the latency of accessing the directory, which is usually kept in DRAM memory. Most coherence controller designs that use protocol processors for executing the coherence protocol handlers use the data cache of the protocol processor for caching directory entries along with protocol handler data. Analogously, a fast Directory Cache (DC) can also be used by the hardwired coherence controller designs in order to minimize directory access time. However, the existing hardwired controllers do not use a directory cache. Moreover, the performance impact of caching directory entries has not been studied in the literature before.
This paper studies the performance of directory caches using parallel applications from the SPLASH-2 suite. We demonstrate that using a directory cache can result in 40% or more improvement in the execution time of applications that are communication intensive. We also investigate in detail the various directory cache design parameters: cache size, cache line size, and associativity. Our experimental results show that the directory cache size requirements grow sub-linearly with the increase in the application's data set size. The results also show the performance advantage of multi-entry directory cache lines, as a result of spatial locality and the absence of sharing of directories. The impact of the associativity of the directory caches on performance is less than that of the size and the line size.
Also, we find a clear linear relation between the directory cache miss ratio and the coherence controller occupancy, and between both measures and the execution time of the applications, which can help system architects evaluate the impact of directory cache (or coherence controller) designs on overall system performance.
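The three design parameters named in the abstract (cache size, cache line size, and associativity) can be illustrated with a small simulation. The following is a minimal sketch only, not the authors' simulator: the class name `DirectoryCache`, the helper `sweep_miss_ratio`, the LRU replacement policy, and the modulo set indexing are all assumptions made for illustration. It models a line as holding the directory entries of several consecutive memory blocks, so a sequential access pattern benefits from spatial locality.

```python
from collections import OrderedDict


class DirectoryCache:
    """Hypothetical set-associative directory cache with LRU replacement."""

    def __init__(self, num_lines, entries_per_line=1, associativity=1):
        if num_lines % associativity != 0:
            raise ValueError("num_lines must be a multiple of associativity")
        self.entries_per_line = entries_per_line
        self.associativity = associativity
        self.num_sets = num_lines // associativity
        # One OrderedDict per set; insertion order tracks recency (oldest first).
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, block_number):
        """Look up the directory entry of a memory block; return True on a hit."""
        # Consecutive blocks share one line when entries_per_line > 1.
        line_tag = block_number // self.entries_per_line
        cache_set = self.sets[line_tag % self.num_sets]
        if line_tag in cache_set:
            cache_set.move_to_end(line_tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(cache_set) >= self.associativity:
            cache_set.popitem(last=False)  # evict the least recently used line
        cache_set[line_tag] = True
        return False

    def miss_ratio(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


def sweep_miss_ratio(entries_per_line, num_lines=16, num_blocks=64):
    """Miss ratio for one sequential sweep over num_blocks directory entries."""
    cache = DirectoryCache(num_lines, entries_per_line=entries_per_line)
    for block in range(num_blocks):
        cache.access(block)
    return cache.miss_ratio()
```

Under these assumptions, a sequential sweep over 64 blocks misses on every access with single-entry lines (`sweep_miss_ratio(1)` gives 1.0) and on one access in four with four-entry lines (`sweep_miss_ratio(4)` gives 0.25). This is a toy access pattern, not a SPLASH-2 workload, and it says nothing about the magnitudes reported in the paper.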
M. M. Michael and A. K. Nanda, "Design and Performance of Directory Caches for Scalable Shared Memory Multiprocessors," Proceedings Fifth International Symposium on High-Performance Computer Architecture (HPCA), Orlando, Florida, 1999, pp. 142.