VI-Attached Database Storage
January 2005 (vol. 16 no. 1)
pp. 35-50

Abstract: This article presents a VI-attached database storage architecture to improve database transaction rates. More specifically, we examine how VI-based interconnects can be used to improve I/O path performance between a database server and a storage subsystem. To facilitate the interaction between client applications and a VI-aware storage system, we design and implement a software layer called DSA that is layered between applications and VI. DSA takes advantage of specific VI features and deals with many of its shortcomings. We provide and evaluate one kernel-level and two user-level implementations of DSA. These implementations trade transparency and generality for performance to different degrees and, unlike research prototypes, are designed to be suitable for real-world deployment. We have also investigated many design trade-offs in the storage cluster. We present detailed measurements using a commercial database management system with both microbenchmarks and industrial database workloads on a mid-size 4-CPU database server and a large 32-CPU database server. We also compare the effectiveness of VI-attached storage with an iSCSI configuration and conclude that storage protocols implemented using DSA over VI have significant performance advantages. More generally, our results show that VI-based interconnects and user-level communication can improve all aspects of the I/O path between the database system and the storage back-end. We also find that making effective use of VI in I/O-intensive environments requires substantially more functionality than VI currently provides. Finally, new storage APIs that help minimize kernel involvement in the I/O path are needed to fully exploit the benefits of VI-based communication.

Index Terms:
Database storage, storage server, Virtual Interface, user-level communication, performance evaluation, server cluster.
Yuanyuan Zhou, Angelos Bilas, Suresh Jagannathan, Dimitrios Xinidis, Cezary Dubnicki, Kai Li, "VI-Attached Database Storage," IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 1, pp. 35-50, Jan. 2005, doi:10.1109/TPDS.2005.13