Rigel: A 1,024-Core Single-Chip Accelerator Architecture
July/August 2011 (vol. 31 no. 4)
pp. 30-41
Daniel R. Johnson, University of Illinois at Urbana-Champaign
Matthew R. Johnson, University of Illinois at Urbana-Champaign
John H. Kelm, University of Illinois at Urbana-Champaign
William Tuohy, University of Illinois at Urbana-Champaign
Steven S. Lumetta, University of Illinois at Urbana-Champaign
Sanjay J. Patel, University of Illinois at Urbana-Champaign

Rigel is a single-chip accelerator architecture with 1,024 independent processing cores targeted at a broad class of data- and task-parallel computation. This article discusses Rigel's motivation, evaluates its performance scalability as well as power and area requirements, and explores memory systems in the context of 1,024-core single-chip accelerators. The authors also consider future opportunities and challenges for large-scale designs.

Index Terms:
Multiple data-stream architectures (multiprocessors), multiple instruction, multiple data processors, parallel processors, parallel architectures, multicore, single-chip multiprocessors
Citation:
Daniel R. Johnson, Matthew R. Johnson, John H. Kelm, William Tuohy, Steven S. Lumetta, Sanjay J. Patel, "Rigel: A 1,024-Core Single-Chip Accelerator Architecture," IEEE Micro, vol. 31, no. 4, pp. 30-41, July-Aug. 2011, doi:10.1109/MM.2011.40