Explicit Communication and Synchronization in SARC
September/October 2010 (vol. 30 no. 5)
pp. 30-41
Manolis Katevenis, FORTH-ICS, Heraklion
Vassilis Papaefstathiou, FORTH-ICS, Heraklion
Stamatis Kavadias, FORTH-ICS, Heraklion
Dionisios Pnevmatikatos, FORTH-ICS, Heraklion
Federico Silla, Universidad Politecnica de Valencia, Valencia
Dimitrios Nikolopoulos, FORTH-ICS, Heraklion

A new network interface optimized for SARC supports synchronization and explicit communication and provides a robust mechanism for event responses. Full-system simulation of the authors' design achieved a 10- to 40-percent speed increase over traditional cache architectures on 64 cores, a two- to four-fold decrease in on-chip network traffic, and a three- to five-fold decrease in lock and barrier latency.

Index Terms:
interprocessor communication, explicit communication, synchronization, configurable local memory, scratchpad, user-level RDMA, SARC
Citation:
Manolis Katevenis, Vassilis Papaefstathiou, Stamatis Kavadias, Dionisios Pnevmatikatos, Federico Silla, Dimitrios Nikolopoulos, "Explicit Communication and Synchronization in SARC," IEEE Micro, vol. 30, no. 5, pp. 30-41, Sept.-Oct. 2010, doi:10.1109/MM.2010.77