In-Kernel Integration of Operating System and Infiniband Functions for High Performance Computing Clusters: A DSM Example
September 2005 (vol. 16 no. 9)
pp. 830-840
Liran Liss, IEEE
Yitzhak Birk, IEEE Computer Society
Assaf Schuster

Abstract: The Infiniband (IB) System Area Network (SAN) enables applications to access hardware directly from user level, reducing the overhead of user-kernel crossings during data transfer. However, distributed applications that exhibit close coupling between network and OS services may benefit from accessing IB from the kernel through IB's native Verbs interface, which permits tight integration of these services. We assess this approach using a sequential-consistency Distributed Shared Memory (DSM) system as an example. We first develop primitives that abstract the low-level communication and kernel details and efficiently serve the application's communication, memory, and scheduling needs. Next, we combine the primitives to form a kernel DSM protocol. The approach is evaluated using our full-fledged Linux kernel DSM implementation over Infiniband. We show that overheads are reduced substantially and that overall application performance improves, in terms of both absolute execution time and scalability, relative to an entirely user-level implementation.

Index Terms:
Hardware/software interfaces, high-speed networks, distributed shared memory, parallel computing.
Liran Liss, Yitzhak Birk, Assaf Schuster, "In-Kernel Integration of Operating System and Infiniband Functions for High Performance Computing Clusters: A DSM Example," IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 9, pp. 830-840, Sept. 2005, doi:10.1109/TPDS.2005.111