Issue No.12 - December (2011 vol.23)
pp: 1795-1810
Bolin Ding , University of Illinois at Urbana-Champaign, Urbana
Cindy Xide Lin , University of Illinois at Urbana-Champaign, Urbana
Jiawei Han , University of Illinois at Urbana-Champaign, Urbana
Chengxiang Zhai , University of Illinois at Urbana-Champaign, Urbana
Ashok Srivastava , NASA Ames Research Center, Moffett Field
Nikunj C. Oza , NASA Ames Research Center, Moffett Field
ABSTRACT
Previous studies on supporting free-form keyword queries over RDBMSs provide users with linked structures (e.g., a set of joined tuples) that are relevant to a given keyword query. Most of them focus on ranking individual tuples from one table, or joins of tuples from multiple tables, that contain a set of keywords. In this paper, we study the problem of keyword search in a data cube with text-rich dimension(s) (a so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. We define a keyword-based query language and an IR-style relevance model for scoring and ranking cells in the text cube. Given a keyword query, our goal is to find the top-k most relevant cells. We propose four approaches: inverted-index one-scan, document sorted-scan, bottom-up dynamic programming, and search-space ordering. The search-space ordering algorithm explores only a small portion of the text cube to find the top-k answers, and enables early termination. Extensive experimental studies are conducted to verify the effectiveness and efficiency of the proposed approaches.
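To make the problem statement concrete, the following is a minimal brute-force sketch of top-k cell search. It is not one of the four proposed approaches: it materializes every cell and scores it directly. The toy rows, the names `doc_score` and `top_k_cells`, the `min_docs` threshold, and the relevance function (average keyword count per document in a cell) are all illustrative assumptions; the abstract does not specify the paper's relevance model.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy database: each row has structural dimensions plus a document.
ROWS = [
    {"dims": {"model": "A320", "phase": "landing"}, "text": "engine vibration on landing"},
    {"dims": {"model": "A320", "phase": "takeoff"}, "text": "engine fire warning"},
    {"dims": {"model": "B737", "phase": "landing"}, "text": "gear warning on landing"},
]


def doc_score(text, keywords):
    """Assumed per-document relevance: number of query-keyword occurrences."""
    tokens = text.lower().split()
    return sum(tokens.count(k.lower()) for k in keywords)


def top_k_cells(rows, keywords, k, min_docs=1):
    """Brute force: enumerate every cell, score it, and return the k best.

    A cell fixes values for a subset of dimensions and aggregates all
    documents that match those values. The empty subset is the cell that
    aggregates the whole database.
    """
    dims = sorted(rows[0]["dims"])
    totals = defaultdict(lambda: [0, 0])  # cell -> [score sum, document count]
    for row in rows:
        score = doc_score(row["text"], keywords)
        for size in range(len(dims) + 1):
            for subset in combinations(dims, size):
                cell = tuple((d, row["dims"][d]) for d in subset)
                totals[cell][0] += score
                totals[cell][1] += 1
    # Assumed cell relevance: average document score; min_docs is a
    # hypothetical threshold that filters out very small cells.
    scored = [(total / count, cell)
              for cell, (total, count) in totals.items()
              if count >= min_docs]
    # Highest score first; ties are broken by the cell description.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


if __name__ == "__main__":
    for score, cell in top_k_cells(ROWS, ["engine"], k=2, min_docs=2):
        print(round(score, 3), dict(cell))
```

The number of cells a single row contributes to grows exponentially with the number of dimensions, which is why exhaustive enumeration of this kind does not scale and why an approach that explores only part of the cube and terminates early is attractive.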
INDEX TERMS
Keyword search, multidimensional text data, data cube.
CITATION
Bolin Ding, Cindy Xide Lin, Jiawei Han, Chengxiang Zhai, Ashok Srivastava, Nikunj C. Oza, "Efficient Keyword-Based Search for Top-K Cells in Text Cube", IEEE Transactions on Knowledge & Data Engineering, vol.23, no. 12, pp. 1795-1810, December 2011, doi:10.1109/TKDE.2011.34