Issue No.05 - September/October (2010 vol.30)
pp: 30-41
Manolis Katevenis, FORTH-ICS, Heraklion
Vassilis Papaefstathiou, FORTH-ICS, Heraklion
Stamatis Kavadias, FORTH-ICS, Heraklion
Dionisios Pnevmatikatos, FORTH-ICS, Heraklion
Federico Silla, Universidad Politecnica de Valencia, Valencia
Dimitrios Nikolopoulos, FORTH-ICS, Heraklion
A new network interface optimized for SARC supports synchronization and explicit communication and provides a robust mechanism for event responses. Full-system simulation of the authors' design achieved a 10- to 40-percent speed increase over traditional cache architectures on 64 cores, a two- to four-fold decrease in on-chip network traffic, and a three- to five-fold decrease in lock and barrier latency.
interprocessor communication, explicit communication, synchronization, configurable local memory, scratchpad, user-level RDMA, SARC
Manolis Katevenis, Vassilis Papaefstathiou, Stamatis Kavadias, Dionisios Pnevmatikatos, Federico Silla, Dimitrios Nikolopoulos, "Explicit Communication and Synchronization in SARC", IEEE Micro, vol.30, no. 5, pp. 30-41, September/October 2010, doi:10.1109/MM.2010.77