Second IEEE International Symposium on Cluster Computing and the Grid (CCGRID'02)
Efficient Utilization of Memory Mapped NICs onto Clusters using Pipelined Schedules
Berlin, Germany
May 21-24, 2002
ISBN: 0-7695-1582-7
This paper describes the performance benefits attained by using enhanced network interfaces to achieve low-latency communication. We use the DMA communication mode to send data to other nodes while the CPU performs useful calculations. Zero-copy communication is achieved through pinned-down physical memory regions provided by the NIC's driver modules. Our testbed concerns the parallel execution of tiled nested loops on a Linux PC cluster with PCI-SCI NICs (Dolphin D330). Tiles exchange data and should also have a large computational grain, so that their parallel execution is beneficial. We schedule tiles much more efficiently by exploiting the inherent overlap between the communication and computation phases of successive, atomic tile executions. The applied nonblocking schedule resembles a pipelined datapath in which computation phases are overlapped with communication phases instead of being interleaved with them. Experimental evaluation shows that when enhanced communication features such as DMA transfers, memory-mapped interfaces and zero-copy mechanisms are used, overall performance improves considerably compared to conventional, CPU- and kernel-bound communication primitives.
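The benefit of overlapping rather than interleaving the two phases can be illustrated with a deliberately simplified cost model. This is an assumption for illustration, not the paper's own scheduling formulation: each tile needs `t_comp` of CPU time followed by `t_comm` of transfer time on a DMA engine that runs independently of the CPU, buffering is unlimited, and inter-tile data dependencies are ignored. The function names are hypothetical.

```python
# Simplified two-resource cost model (illustrative assumption, not the paper's
# formulation): each tile needs t_comp of CPU time, then t_comm of transfer
# time on a DMA engine that works independently of the CPU.


def blocking_time(n_tiles: int, t_comp: float, t_comm: float) -> float:
    """Interleaved schedule: the CPU waits for each send before the next tile."""
    return n_tiles * (t_comp + t_comm)


def pipelined_time(n_tiles: int, t_comp: float, t_comm: float) -> float:
    """Nonblocking schedule: tile i is sent by DMA while the CPU computes tile i+1."""
    cpu_free = 0.0  # time at which the CPU can start the next computation
    dma_free = 0.0  # time at which the DMA engine can start the next transfer
    for _ in range(n_tiles):
        comp_end = cpu_free + t_comp          # compute the tile on the CPU
        cpu_free = comp_end                   # CPU moves on without waiting for the send
        send_start = max(comp_end, dma_free)  # send once data is ready and DMA is idle
        dma_free = send_start + t_comm
    return dma_free


def pipelined_closed_form(n_tiles: int, t_comp: float, t_comm: float) -> float:
    """Fill and drain, plus (n - 1) steps paced by the slower of the two stages."""
    return t_comp + t_comm + (n_tiles - 1) * max(t_comp, t_comm)


if __name__ == "__main__":
    n, t_comp, t_comm = 4, 3.0, 2.0
    print("blocking:", blocking_time(n, t_comp, t_comm))    # 20.0
    print("pipelined:", pipelined_time(n, t_comp, t_comm))  # 14.0
```

Under this model the steady-state cost per tile drops from `t_comp + t_comm` to `max(t_comp, t_comm)`, so the gain is largest when the two phases take comparable time, and it relies on the CPU not being involved in the transfer, which is what the DMA mode is meant to provide.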
Citation:
Aristidis Sotiropoulos, Georgios Tsoukalas, Nectarios Koziris, "Efficient Utilization of Memory Mapped NICs onto Clusters using Pipelined Schedules," Second IEEE International Symposium on Cluster Computing and the Grid (CCGRID'02), p. 238, 2002.