Design Trade-Offs for User-Level I/O Architectures
August 2006 (vol. 55 no. 8)
pp. 962-973
To address the growing I/O bottleneck, next-generation distributed I/O architectures employ scalable point-to-point interconnects and minimize operating system overhead by providing user-level access to the I/O subsystem. Reduced I/O overhead allows I/O-intensive applications to use latency-hiding techniques efficiently, which improves throughput. This paper presents the design of a novel, scalable user-level I/O architecture and evaluates how much each of its architectural mechanisms contributes to overall performance. Results demonstrate that eliminating data movement across protection domains is the dominant contributor to improved scalability. Eliminating system call and interrupt overhead yields only a small additional benefit, which may not justify the additional hardware support it requires. While this evaluation is based on one specific design, the conclusions can be generalized to other user-level I/O architectures.
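The abstract separates per-request overhead into a data-movement component and fixed system call and interrupt components. The sketch below is a toy cost model, not the authors' simulator or methodology, that shows why removing the copy can dominate once transfers grow large. The class `IOCostModel`, the function `breakdown`, and all parameter values are hypothetical, chosen only for illustration. They are not measurements from the paper.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class IOCostModel:
    """Hypothetical per-request I/O overhead model (illustrative values only)."""
    syscall_us: float       # assumed fixed cost of entering and leaving the kernel
    interrupt_us: float     # assumed fixed cost of a completion interrupt
    copy_us_per_kb: float   # assumed cost of copying 1 KB across protection domains

    def overhead_us(self, kb: float, *, copy: bool, syscall: bool, interrupt: bool) -> float:
        """Sum only the overhead components that the given design still pays."""
        total = 0.0
        if syscall:
            total += self.syscall_us
        if interrupt:
            total += self.interrupt_us
        if copy:
            total += self.copy_us_per_kb * kb  # grows linearly with transfer size
        return total


def breakdown(model: IOCostModel, kb: float) -> Dict[str, float]:
    """Compare three hypothetical design points for a single request of `kb` kilobytes.

    Costs this toy model ignores (doorbell writes, polling, address translation)
    would make the user-level cases nonzero in practice.
    """
    return {
        # Conventional kernel path: system call, interrupt, and a data copy.
        "conventional": model.overhead_us(kb, copy=True, syscall=True, interrupt=True),
        # Copy eliminated, but the kernel is still entered and interrupts still fire.
        "zero_copy": model.overhead_us(kb, copy=False, syscall=True, interrupt=True),
        # Copy, system call, and interrupt all eliminated.
        "full_user_level": model.overhead_us(kb, copy=False, syscall=False, interrupt=False),
    }


if __name__ == "__main__":
    # Made-up parameters, not data from the paper.
    m = IOCostModel(syscall_us=1.0, interrupt_us=2.0, copy_us_per_kb=0.5)
    for size_kb in (1, 8, 64):
        print(size_kb, breakdown(m, size_kb))
```

With these assumed parameters, a 64 KB request drops from 35 µs to 3 µs when only the copy is removed. Eliminating system calls and interrupts as well saves just the remaining 3 µs. This mirrors the shape of the trade-off the abstract describes, but the actual magnitudes depend entirely on the platform and workload.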

Index Terms:
Architecture, user-level, input/output devices, performance analysis, simulation.
Lambert Schaelicke, Alan L. Davis, "Design Trade-Offs for User-Level I/O Architectures," IEEE Transactions on Computers, vol. 55, no. 8, pp. 962-973, Aug. 2006, doi:10.1109/TC.2006.122