Issue No. 12, December 2011 (vol. 23)
pp. 1763-1780
Yi Luo, Lab. Le2i, CNRS Dijon, Dijon, France
ABSTRACT
With the increasing amount of text data stored in relational databases, there is a demand for RDBMSs to support keyword queries over text data. Because a search result is often assembled from multiple relational tables, traditional IR-style ranking and query evaluation methods cannot be applied directly. In this paper, we study the effectiveness and efficiency issues of answering top-k keyword queries in relational database systems. We propose a new ranking formula that adapts existing IR techniques based on a natural notion of a virtual document. We also propose several efficient query processing methods for the new ranking method. We have conducted extensive experiments on large-scale real databases using two popular RDBMSs. The experimental results demonstrate significant improvements over the alternative approaches in terms of both retrieval effectiveness and efficiency.
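The abstract does not spell out the SPARK2 ranking formula. Purely as an illustration of the virtual-document idea, the sketch below treats each joined tuple tree (the tuples assembled from several tables into one answer) as a single document and scores it with the classic pivoted-normalization TF-IDF from IR. This is a generic stand-in, not the paper's formula. The function names `virtual_document` and `top_k`, the slope `s=0.2`, and the choice to compute document frequencies over the candidate trees themselves are all hypothetical assumptions made to keep the example self-contained.

```python
import heapq
import math
from collections import Counter


def virtual_document(tuple_tree):
    """Merge the text of every tuple in a joined tuple tree into one bag of words."""
    words = []
    for text in tuple_tree:
        words.extend(text.lower().split())
    return Counter(words)


def top_k(query, tuple_trees, k, s=0.2):
    """Rank candidate tuple trees by pivoted-normalization TF-IDF over their virtual documents.

    Hypothetical helper: statistics (df, average length) are computed over the
    candidate trees themselves rather than over the underlying tables.
    """
    docs = [virtual_document(tree) for tree in tuple_trees]
    n_docs = len(docs)
    # Document frequency: each virtual document counts once per distinct term.
    df = Counter(term for doc in docs for term in doc)
    avg_dl = sum(sum(doc.values()) for doc in docs) / max(n_docs, 1) or 1.0
    terms = {t.lower() for t in query}

    scored = []
    for i, doc in enumerate(docs):
        # Pivoted length normalization penalizes trees with many words.
        norm = (1 - s) + s * sum(doc.values()) / avg_dl
        score = 0.0
        for term in terms:
            tf = doc.get(term, 0)
            if tf == 0:
                continue
            tf_part = (1 + math.log(1 + math.log(tf))) / norm  # dampened term frequency
            idf_part = math.log((n_docs + 1) / df[term])        # rarer terms weigh more
            score += tf_part * idf_part
        scored.append((score, i))
    # Keep only the k highest-scoring trees as (score, tree index) pairs.
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    trees = [
        ["Widom database", "keyword search"],
        ["Widom", "stream"],
        ["keyword", "keyword search database"],
    ]
    print(top_k(["keyword", "database"], trees, k=2))
```

Here tree 2 ranks above tree 0 because "keyword" occurs twice across its tuples, which is only visible once the tuples are scored together as one virtual document rather than tuple by tuple. A real system would also have to avoid enumerating every candidate tree, which is the efficiency problem the abstract refers to.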
INDEX TERMS
text analysis, query processing, question answering (information retrieval), relational databases, large-scale real database, SPARK2, top-k keyword query answering, text data storage, RDBMS, multiple relational table, efficiency issues, effectiveness issues, IR-style ranking, query evaluation method, virtual document, query processing method, keyword search, information retrieval, Electronic mail, Semantics, Top-k
CITATION
Yi Luo, "SPARK2: Top-k Keyword Query in Relational Databases," IEEE Transactions on Knowledge & Data Engineering, vol. 23, no. 12, pp. 1763-1780, December 2011, doi:10.1109/TKDE.2011.60