Issue No.01 - Jan. (2014 vol.26)
pp: 30-42
Joel Coffman, University of Virginia, Charlottesville
Alfred C. Weaver, University of Virginia, Charlottesville
ABSTRACT
Extending the keyword search paradigm to relational data has been an active area of research within the database and IR community during the past decade. Many approaches have been proposed, but despite numerous publications, there remains a severe lack of standardization for the evaluation of proposed search techniques. Lack of standardization has resulted in contradictory results from different evaluations, and the numerous discrepancies muddle what advantages are proffered by different approaches. In this paper, we present the most extensive empirical performance evaluation of relational keyword search techniques to appear to date in the literature. Our results indicate that many existing search techniques do not provide acceptable performance for realistic retrieval tasks. In particular, memory consumption precludes many search techniques from scaling beyond small data sets with tens of thousands of vertices. We also explore the relationship between execution time and factors varied in previous evaluations; our analysis indicates that most of these factors have relatively little impact on performance. In summary, our work confirms previous claims regarding the unacceptable performance of these search techniques and underscores the need for standardization in evaluations--standardization exemplified by the IR community.
INDEX TERMS
Keyword search, Benchmark testing, Internet, Databases, Encyclopedias, Electronic publishing, empirical evaluation, relational database, information retrieval
CITATION
Joel Coffman, Alfred C. Weaver, "An Empirical Performance Evaluation of Relational Keyword Search Techniques", IEEE Transactions on Knowledge & Data Engineering, vol.26, no. 1, pp. 30-42, Jan. 2014, doi:10.1109/TKDE.2012.228