Issue No. 10, Oct. 2012 (vol. 24)
pp. 1731-1746
Lu Qin, The Chinese University of Hong Kong, Hong Kong
Jeffrey Xu Yu, The Chinese University of Hong Kong, Hong Kong
Lijun Chang, The Chinese University of Hong Kong, Hong Kong
ABSTRACT
Keyword search in RDBs has been extensively studied in recent years. Existing studies have focused on finding all or the top-k interconnected tuple-structures that contain the keywords. In reality, the number of such interconnected tuple-structures for a keyword query can be large, and it becomes very difficult for users to obtain any valuable information beyond the individual interconnected tuple-structures. It is also challenging to provide a mechanism similar to group-&-aggregate for those interconnected tuple-structures. In this paper, we study computing structural statistics for keyword queries by extending the group-&-aggregate framework. We consider an RDB as a large directed graph in which nodes represent tuples and edges represent the links among tuples. Instead of using tuples as the members of a group, we consider rooted subgraphs. Such a rooted subgraph represents an interconnected tuple-structure among tuples, some of which contain keywords. The dimensions of the rooted subgraphs are determined by dimensional keywords in a data-driven fashion. Two rooted subgraphs are placed in the same group if they are isomorphic with respect to the dimensions, in other words, the dimensional keywords. The scores of the rooted subgraphs are computed by a user-given score function if the rooted subgraphs contain some of the general keywords. Here, the general keywords are used to compute scores rather than to determine dimensions. The aggregates are computed for every group using an SQL aggregate function over the computed scores. We give our motivation using a real data set. We propose new approaches to compute structural statistics for keyword queries, conduct extensive performance studies using two large real data sets and a large synthetic data set, and confirm the effectiveness and efficiency of our approach.
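To make the group-&-aggregate idea concrete, here is a minimal Python sketch, not the authors' algorithm: it is a naive baseline that takes already-enumerated rooted subgraphs, groups them, and aggregates. It assumes each rooted subgraph is a rooted tree of tuples, labels each tuple by the dimensional keywords it contains, and swaps in AHU-style canonical strings as the isomorphism test. The names `Node`, `canonical`, `keyword_hits`, and `structural_statistics`, the whitespace tokenizer, and the choice to skip subgraphs containing no general keyword are all hypothetical illustrations.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List, Sequence


@dataclass
class Node:
    """One tuple of a rooted subgraph (assumption: the subgraph is a rooted tree)."""
    text: str  # hypothetical: the tuple's searchable text
    children: List["Node"] = field(default_factory=list)


def iter_nodes(root: Node) -> Iterator[Node]:
    # Depth-first walk over every tuple in the rooted subgraph.
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(node.children)


def tokens(node: Node) -> List[str]:
    # Naive whitespace tokenizer; keywords are expected in lower case.
    return node.text.lower().split()


def dim_label(node: Node, dim_keywords: Sequence[str]) -> str:
    # A tuple's dimension value: the dimensional keywords it contains.
    words = set(tokens(node))
    return ",".join(sorted(k for k in dim_keywords if k in words))


def canonical(node: Node, dim_keywords: Sequence[str]) -> str:
    # AHU-style canonical form of a rooted labeled tree: two trees get the
    # same string iff they are isomorphic w.r.t. the dimensional labels.
    kids = sorted(canonical(c, dim_keywords) for c in node.children)
    return "[" + dim_label(node, dim_keywords) + "".join(kids) + "]"


def keyword_hits(root: Node, gen_keywords: Sequence[str]) -> float:
    # Hypothetical user-given score: occurrences of general keywords.
    return float(sum(tokens(n).count(k) for n in iter_nodes(root) for k in gen_keywords))


def structural_statistics(
    subgraphs: Iterable[Node],
    dim_keywords: Sequence[str],
    gen_keywords: Sequence[str],
    score: Callable[[Node, Sequence[str]], float],
    aggregate: Callable[[List[float]], float],
) -> Dict[str, float]:
    groups: Dict[str, List[float]] = defaultdict(list)
    for g in subgraphs:
        # Assumption: a subgraph with no general keyword is not scored, so skip it.
        if not any(k in tokens(n) for n in iter_nodes(g) for k in gen_keywords):
            continue
        groups[canonical(g, dim_keywords)].append(score(g, gen_keywords))
    # One aggregate value per group, like an SQL aggregate after GROUP BY.
    return {key: aggregate(vals) for key, vals in groups.items()}


if __name__ == "__main__":
    graphs = [
        Node("author alice", [Node("paper keyword search sigmod")]),
        Node("author bob", [Node("paper keyword ranking sigmod")]),
        Node("author carol", [Node("paper keyword graphs vldb")]),
        Node("author dave", [Node("paper xml indexing vldb")]),  # no general keyword
    ]
    print(structural_statistics(graphs, ["sigmod", "vldb"], ["keyword"], keyword_hits, sum))
    # → {'[[sigmod]]': 2.0, '[[vldb]]': 1.0}
```

Any Python callable over a list of scores can stand in for the SQL aggregate, for example `sum`, `max`, or `len` for a COUNT. Sorting the children's canonical strings makes the grouping key independent of child order, which is what lets structurally identical subgraphs fall into the same group.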
INDEX TERMS
Aggregates, Monitoring, Keyword search, Cities and towns, Educational institutions, Relational databases, Structural statistics
CITATION
Lu Qin, Jeffrey Xu Yu, Lijun Chang, "Computing Structural Statistics by Keywords in Databases," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 10, pp. 1731-1746, Oct. 2012, doi:10.1109/TKDE.2012.78