Issue No.02 - Feb. (2013 vol.25)
pp: 461-475
Guoliang Li, Tsinghua University, Beijing
Jianhua Feng, Tsinghua University, Beijing
Chen Li, University of California, Irvine, Irvine
ABSTRACT
A search-as-you-type system computes answers on the fly as a user types in a keyword query character by character. We study how to support search-as-you-type on data residing in a relational DBMS. We focus on how to support this type of search using the native database language, SQL. A main challenge is how to leverage existing database functionalities to meet the high-performance requirement of interactive speed. We study how to use auxiliary indexes stored as tables to increase search performance. We present solutions for both single-keyword queries and multikeyword queries, and develop novel techniques for fuzzy search using SQL by allowing mismatches between query keywords and answers. We present techniques to answer first-N queries and discuss how to support updates efficiently. Experiments on large, real data sets show that our techniques enable a DBMS on a commodity computer to support search-as-you-type on tables with millions of records.
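As a rough illustration of the idea of an auxiliary index stored as a table, the following minimal sketch answers a single-keyword prefix query with plain SQL. It is not the authors' method: the table names `records` and `inverted_index`, the sample rows, and the helper functions are all hypothetical, and SQLite (through Python's standard `sqlite3` module) stands in for the DBMS. The `LIMIT` clause only loosely mirrors a first-N query.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a data table and a keyword index table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE records (rid INTEGER PRIMARY KEY, title TEXT);
        -- Hypothetical auxiliary index: one row per (keyword, record) pair.
        CREATE TABLE inverted_index (keyword TEXT, rid INTEGER);
        CREATE INDEX idx_keyword ON inverted_index (keyword);
        """
    )
    # Made-up sample data, for illustration only.
    titles = {
        1: "search as you type in databases",
        2: "fuzzy search using sql",
        3: "privacy preserving data publishing",
    }
    for rid, title in titles.items():
        conn.execute("INSERT INTO records VALUES (?, ?)", (rid, title))
        for keyword in set(title.split()):
            conn.execute("INSERT INTO inverted_index VALUES (?, ?)", (keyword, rid))
    return conn


def prefix_search(conn, prefix, limit=10):
    """Return up to `limit` records containing a keyword that starts with `prefix`."""
    if not prefix:
        return []
    # Turn the prefix into a half-open range [prefix, upper) so the B-tree
    # index on keyword can serve the lookup as a range scan.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return conn.execute(
        """
        SELECT DISTINCT r.rid, r.title
        FROM inverted_index AS i
        JOIN records AS r ON r.rid = i.rid
        WHERE i.keyword >= ? AND i.keyword < ?
        ORDER BY r.rid
        LIMIT ?
        """,
        (prefix, upper, limit),
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    # Simulate a user typing "sql" one character at a time.
    for typed in ("s", "sq", "sql"):
        print(typed, prefix_search(conn, typed))
```

Each keystroke issues a fresh query here. Fuzzy matching and multikeyword queries, which the abstract also covers, are outside the scope of this sketch.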
INDEX TERMS
Indexes, Data privacy, Privacy, Engines, Publishing, Correlation, fuzzy search, Search-as-you-type, databases, SQL
CITATION
Guoliang Li, Jianhua Feng, Chen Li, "Supporting Search-As-You-Type Using SQL in Databases", IEEE Transactions on Knowledge & Data Engineering, vol.25, no. 2, pp. 461-475, Feb. 2013, doi:10.1109/TKDE.2011.148