Issue No.02 - July-Dec. (2012 vol.11)
pp: 49-52
Yang Li , University of Pittsburgh, Pittsburgh
Rami Melhem , University of Pittsburgh, Pittsburgh
Alex K. Jones , University of Pittsburgh, Pittsburgh
Traversing the page table during virtual-to-physical address translation causes significant pipeline stalls when misses occur in the translation-lookaside buffer (TLB). To mitigate this penalty, we propose a fast, scalable, multi-level TLB organization that leverages page-sharing behavior and performs efficient TLB entry placement. Our proposed partial sharing TLB (PSTLB) reduces TLB misses by around 60%. PSTLB also improves TLB performance by nearly 40% compared to traditional private TLBs and by 17% over the state-of-the-art scalable TLB proposal.
Prefetching, Benchmark testing, Virtual private networks, Runtime, Partial Sharing, Tiles, Oceans, Fluids, CMPs, TLBs
Yang Li, Rami Melhem, Alex K. Jones, "Leveraging Sharing in Second Level Translation-Lookaside Buffers for Chip Multiprocessors," IEEE Computer Architecture Letters, vol. 11, no. 2, pp. 49-52, July-Dec. 2012, doi:10.1109/L-CA.2011.35