A Task-Centric Memory Model for Scalable Accelerator Architectures
January/February 2010 (vol. 30 no. 1)
pp. 29-39
John H. Kelm, University of Illinois at Urbana-Champaign
Daniel R. Johnson, University of Illinois at Urbana-Champaign
Steven S. Lumetta, University of Illinois at Urbana-Champaign
Sanjay J. Patel, University of Illinois at Urbana-Champaign

This article presents a memory model for parallel compute accelerators that use task-based programming models. The model relies on a software protocol that works in collaboration with hardware caches to maintain a coherent, single-address-space view of memory without requiring hardware cache coherence. It supports visual computing applications, an increasingly important class of workloads capable of exploiting 1,000-core processors.

Index Terms:
accelerator, memory model, parallel architecture, software coherence
Citation:
John H. Kelm, Daniel R. Johnson, Steven S. Lumetta, Sanjay J. Patel, Matthew I. Frank, "A Task-Centric Memory Model for Scalable Accelerator Architectures," IEEE Micro, vol. 30, no. 1, pp. 29-39, Jan.-Feb. 2010, doi:10.1109/MM.2010.6