Efficient Keyword-Based Search for Top-K Cells in Text Cube
December 2011 (vol. 23 no. 12)
pp. 1795-1810
Bolin Ding, University of Illinois at Urbana-Champaign, Urbana
Bo Zhao, University of Illinois at Urbana-Champaign, Urbana
Cindy Xide Lin, University of Illinois at Urbana-Champaign, Urbana
Jiawei Han, University of Illinois at Urbana-Champaign, Urbana
Chengxiang Zhai, University of Illinois at Urbana-Champaign, Urbana
Ashok Srivastava, NASA Ames Research Center, Moffett Field
Nikunj C. Oza, NASA Ames Research Center, Moffett Field
Previous studies on supporting free-form keyword queries over RDBMSs provide users with linked structures (e.g., a set of joined tuples) that are relevant to a given keyword query. Most of them focus on ranking individual tuples from one table or joins of multiple tables containing a set of keywords. In this paper, we study the problem of keyword search in a data cube with text-rich dimension(s) (so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. We define a keyword-based query language and an IR-style relevance model for scoring/ranking cells in the text cube. Given a keyword query, our goal is to find the top-k most relevant cells. We propose four approaches: inverted-index one-scan, document sorted-scan, bottom-up dynamic programming, and search-space ordering. The search-space ordering algorithm explores only a small portion of the text cube for finding the top-k answers, and enables early termination. Extensive experimental studies are conducted to verify the effectiveness and efficiency of the proposed approaches.
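The cell and query definitions in the abstract can be made concrete with a brute-force baseline. The sketch below is not any of the four approaches the paper proposes. The row layout (a tuple of dimension values plus a text string), the "*" wildcard for an aggregated dimension, the scoring function, and the min_support filter are all illustrative assumptions. In particular, the relevance score is a hypothetical stand-in (length-normalized term frequency over the concatenated documents of a cell), not the paper's IR-style model.

```python
from collections import Counter
from itertools import product
import heapq

STAR = "*"  # wildcard: the dimension is aggregated away in this cell


def matches(cell, attrs):
    """A row falls into a cell if it agrees on every non-'*' dimension."""
    return all(c == STAR or c == a for c, a in zip(cell, attrs))


def all_cells(rows):
    """Enumerate every non-empty cell: each row with d dimensions
    generalizes to 2^d cells by replacing any subset of its values with '*'."""
    cells = set()
    for attrs, _ in rows:
        for mask in product((False, True), repeat=len(attrs)):
            cells.add(tuple(STAR if m else a for m, a in zip(mask, attrs)))
    return cells


def top_k_cells(rows, query, k, min_support=1):
    """Brute-force baseline: score every cell and keep the k best.

    Hypothetical relevance model (NOT the paper's): concatenate the cell's
    documents and sum the length-normalized frequencies of the query terms.
    min_support is a hypothetical filter that skips cells with too few documents.
    """
    query = [q.lower() for q in query]
    scored = []
    for cell in all_cells(rows):
        docs = [text for attrs, text in rows if matches(cell, attrs)]
        if len(docs) < min_support:
            continue
        tokens = " ".join(docs).lower().split()
        if not tokens:
            continue
        counts = Counter(tokens)
        relevance = sum(counts[q] for q in query) / len(tokens)
        scored.append((relevance, cell))
    # Ties on relevance are broken by comparing the cell tuples themselves.
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    # Hypothetical two-dimensional text database: (product, year) -> review text.
    rows = [
        (("laptop", "2010"), "battery drains fast battery hot"),
        (("laptop", "2011"), "screen cracked"),
        (("phone", "2010"), "battery swollen"),
        (("phone", "2011"), "battery fine screen fine"),
    ]
    for relevance, cell in top_k_cells(rows, ["battery"], k=2, min_support=2):
        print(cell, round(relevance, 3))
    # prints:
    # ('*', '2010') 0.429
    # ('phone', '*') 0.333
```

This baseline touches up to n * 2^d cells and rescans all rows for each one. That cost is what the proposed approaches aim to reduce; according to the abstract, search-space ordering explores only a small portion of the text cube and can terminate early.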

Index Terms:
Keyword search, multidimensional text data, data cube.
Citation:
Bolin Ding, Bo Zhao, Cindy Xide Lin, Jiawei Han, Chengxiang Zhai, Ashok Srivastava, Nikunj C. Oza, "Efficient Keyword-Based Search for Top-K Cells in Text Cube," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 12, pp. 1795-1810, Dec. 2011, doi:10.1109/TKDE.2011.34