Issue No. 12 - December (2011 vol. 23)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2011.34
Bolin Ding, University of Illinois at Urbana-Champaign, Urbana
Bo Zhao, University of Illinois at Urbana-Champaign, Urbana
Cindy Xide Lin, University of Illinois at Urbana-Champaign, Urbana
Jiawei Han, University of Illinois at Urbana-Champaign, Urbana
Chengxiang Zhai, University of Illinois at Urbana-Champaign, Urbana
Ashok Srivastava, NASA Ames Research Center, Moffett Field
Nikunj C. Oza, NASA Ames Research Center, Moffett Field
Previous studies on supporting free-form keyword queries over RDBMSs provide users with linked structures (e.g., a set of joined tuples) that are relevant to a given keyword query. Most of them focus on ranking individual tuples from one table or joins of multiple tables containing a set of keywords. In this paper, we study the problem of keyword search in a data cube with text-rich dimension(s) (a so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. We define a keyword-based query language and an IR-style relevance model for scoring/ranking cells in the text cube. Given a keyword query, our goal is to find the top-k most relevant cells. We propose four approaches: inverted-index one-scan, document sorted-scan, bottom-up dynamic programming, and search-space ordering. The search-space ordering algorithm explores only a small portion of the text cube to find the top-k answers and enables early termination. Extensive experimental studies are conducted to verify the effectiveness and efficiency of the proposed approaches.
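To make the notions of a cell and a top-k cell query concrete, the following is a minimal brute-force sketch in Python. It is not one of the four approaches named in the abstract, and the relevance function is a stand-in: the abstract does not specify the paper's IR-style model, so the average keyword count per document is assumed here purely for illustration. The names `build_cells`, `score`, `top_k_cells`, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from heapq import nlargest
from itertools import combinations


def build_cells(rows, num_dims):
    """Map every cell to the documents it aggregates.

    A cell is a tuple with one entry per dimension; "*" marks a
    dimension that is aggregated away (not constrained).
    """
    cells = defaultdict(list)
    for attrs, doc in rows:
        # Each row contributes to every cell obtained by keeping
        # some subset of its attribute values and starring the rest.
        for r in range(num_dims + 1):
            for kept in combinations(range(num_dims), r):
                cell = tuple(
                    attrs[i] if i in kept else "*" for i in range(num_dims)
                )
                cells[cell].append(doc)
    return cells


def score(docs, keywords):
    """Assumed relevance: average number of keyword hits per document."""
    hits = sum(doc.lower().split().count(kw) for doc in docs for kw in keywords)
    return hits / len(docs)


def top_k_cells(rows, num_dims, keywords, k):
    """Exhaustively score all non-empty cells and return the k best."""
    keywords = [kw.lower() for kw in keywords]
    cells = build_cells(rows, num_dims)
    return nlargest(
        k, ((score(docs, keywords), cell) for cell, docs in cells.items())
    )


# Hypothetical multidimensional text database: (product, brand) + document.
rows = [
    (("laptop", "acme"), "battery overheats quickly"),
    (("laptop", "zen"), "screen is bright"),
    (("phone", "acme"), "battery drains and battery swells"),
]

for cell_score, cell in top_k_cells(rows, 2, ["battery"], 3):
    print(cell, cell_score)
```

This sketch materializes every cell, so its cost grows exponentially with the number of dimensions. That cost is what motivates approaches that explore only part of the cube and terminate early, as the abstract describes for search-space ordering.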
Keyword search, multidimensional text data, data cube.
B. Ding et al., "Efficient Keyword-Based Search for Top-K Cells in Text Cube," in IEEE Transactions on Knowledge & Data Engineering, vol. 23, no. 12, pp. 1795-1810, 2011.