Issue No.10 - Oct. (2012 vol.24)
pp: 1819-1832
Arash Termehchy , University of Illinois at Urbana-Champaign, Urbana
Marianne Winslett , University of Illinois at Urbana-Champaign, Urbana
Yodsawalai Chodpathumwan , University of Illinois at Urbana-Champaign, Urbana
Austin Gibbons , Stanford University, Stanford
ABSTRACT
Real-world databases often have extremely complex schemas. With thousands of entity types and relationships, each with a hundred or so attributes, it is extremely difficult for new users to explore the data and formulate queries. Schema free query interfaces (SFQIs) address this problem by allowing users with no knowledge of the schema to submit queries. We postulate that SFQIs should deliver the same answers when given alternative designs for the same underlying data set. In this paper, we introduce and formally define design independence, which captures this property for SFQIs. We establish a theoretical framework to measure the amount of design independence provided by an SFQI. We show that most current SFQIs provide a very limited degree of design independence. We also show that SFQIs based on the statistical properties of data can provide design independence when the changes in the schema do not introduce or remove redundancy in the data. We propose a novel XML SFQI called Duplication Aware Coherency Ranking (DA-CR) based on information-theoretic relationships among the data items in the database, and prove that DA-CR is design independent. Our extensive empirical study using three real-world data sets shows that the average case design independence of current SFQIs is considerably lower than that of DA-CR. We also show that the ranking quality of DA-CR is better than or equal to that of current SFQI methods.
INDEX TERMS
XML, Algorithm design and analysis, Redundancy, Data mining, Databases, Database languages, Heuristic algorithms, Design independence, Query interface
CITATION
Arash Termehchy, Marianne Winslett, Yodsawalai Chodpathumwan, Austin Gibbons, "Design Independent Query Interfaces", IEEE Transactions on Knowledge & Data Engineering, vol.24, no. 10, pp. 1819-1832, Oct. 2012, doi:10.1109/TKDE.2012.57