Issue No.10 - October (2009 vol.58)
pp: 1356-1368
Rubao Lee , Chinese Academy of Sciences, Beijing
Zhiwei Xu , Chinese Academy of Sciences, Beijing
ABSTRACT
This paper addresses the problem of improving the throughput of distributed query processing in an RDBMS-based data integration system. An RDBMS can use a buffer pool to cache disk pages in memory and reduce disk accesses, but a buffer pool cannot serve data integration queries because its foundation, the memory-disk hierarchy, does not exist there. Without a data sharing mechanism, system throughput is limited: unnecessary data requests increase the burden on data sources, and redundant transfers of result data waste network bandwidth. To address the problem, we present a new technique called request window, which detects and exploits data sharing opportunities among concurrent queries. Request window exploits a new kind of locality, stream request locality, which reflects common query interests among independent users within a short time period. This locality makes it possible to collect a group of related data requests and process them as a batch. Evaluation on a PostgreSQL-based data integration system shows that request window can significantly increase system throughput when running a distributed TPC-H workload.
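The batching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `RequestWindow`, the methods `submit` and `flush`, and the stand-in data source `fake_source` are all hypothetical, and the sketch assumes that "related" requests are simply requests with identical text, which is a simplification the abstract does not confirm.

```python
from collections import defaultdict


def fake_source(request_key):
    """Hypothetical stand-in for a remote data source."""
    return [f"row for {request_key}"]


class RequestWindow:
    """Toy batching window (an assumption, not the paper's design):
    identical requests collected in one window share a single fetch."""

    def __init__(self, fetch):
        self.fetch = fetch                # callable: request key -> rows
        self.pending = defaultdict(list)  # request key -> waiting query ids
        self.fetch_count = 0              # number of requests sent to the source

    def submit(self, query_id, request_key):
        # Collect the request instead of sending it to the source right away.
        self.pending[request_key].append(query_id)

    def flush(self):
        # Close the window: one fetch per distinct request,
        # with the result shared by every query that asked for it.
        results = {}
        for request_key, query_ids in self.pending.items():
            rows = self.fetch(request_key)
            self.fetch_count += 1
            for query_id in query_ids:
                results[query_id] = rows
        self.pending.clear()
        return results
```

In this sketch, three concurrent queries that submit two distinct requests cause only two fetches from the source when the window is flushed, which is the kind of saving in source load and network transfer that the abstract describes.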
INDEX TERMS
Buffer management, distributed databases, locality, query processing.
CITATION
Rubao Lee, Zhiwei Xu, "Exploiting Stream Request Locality to Improve Query Throughput of a Data Integration System", IEEE Transactions on Computers, vol. 58, no. 10, pp. 1356-1368, October 2009, doi:10.1109/TC.2009.80