Issue No.12 - December (2008 vol.20)

pp: 1669-1682

Ke Yi , Hong Kong University of Science and Technology, Hong Kong

Feifei Li , Florida State University, Tallahassee

George Kollios , Boston University, Boston

Divesh Srivastava , AT&T Labs-Research, Florham Park

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2008.90

ABSTRACT

This work introduces new algorithms for processing top-$k$ queries in uncertain databases under the widely adopted x-relation model. An x-relation consists of a number of x-tuples, and each x-tuple randomly instantiates into one tuple chosen from one or more alternatives. Soliman et al. [25] first introduced the problem of top-$k$ query processing in uncertain databases and proposed various algorithms to answer such queries. Under the x-relation model, our new results significantly improve on the state of the art in both running time and memory usage. In the single-alternative case, our new algorithms are 2 to 3 orders of magnitude faster than the previous algorithms. In the multi-alternative case, the improvement is even more dramatic: while the previous algorithms have exponential complexity in both time and space, our algorithms run in near-linear or low-polynomial time. Our study covers both types of top-$k$ queries proposed in [25]. We provide both a theoretical analysis and an extensive experimental evaluation to demonstrate the superiority of the new approaches over existing solutions.
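To make the x-relation model and the two query types of [25] (called U-Topk and U-kRanks there) concrete, the following is a minimal brute-force sketch. It is not the algorithm proposed in this paper: it simply enumerates every possible world, which is the kind of exponential blow-up the abstract contrasts against. The toy data, the function names (`possible_worlds`, `top_k_ids`, `u_topk`, `u_kranks`), the tie-breaking by id, and the convention that worlds with fewer than k tuples contribute no U-Topk answer are all illustrative assumptions.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy x-relation: each x-tuple is a list of mutually exclusive
# alternatives (tuple_id, score, probability). If an x-tuple's probabilities
# sum to less than 1, the remaining mass means "no tuple from this x-tuple".
xrel = [
    [("t1", 100, 0.4)],                  # single alternative, absent w.p. 0.6
    [("t2", 90, 0.7), ("t3", 50, 0.3)],  # exactly one of t2 / t3 appears
    [("t4", 80, 0.9)],                   # absent w.p. 0.1
]


def possible_worlds(xrel):
    """Yield (world, prob) pairs; x-tuples are assumed independent."""
    per_xtuple = []
    for xt in xrel:
        opts = [((tid, score), p) for tid, score, p in xt]
        leftover = 1.0 - sum(p for _, _, p in xt)
        if leftover > 1e-12:
            opts.append((None, leftover))  # x-tuple contributes nothing
        per_xtuple.append(opts)
    # One choice per x-tuple: the number of worlds is exponential in len(xrel).
    for combo in product(*per_xtuple):
        world = [t for t, _ in combo if t is not None]
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield world, prob


def top_k_ids(world, k):
    """Ids of the k highest-scoring tuples in one world (ties broken by id)."""
    ranked = sorted(world, key=lambda t: (-t[1], t[0]))
    return tuple(tid for tid, _ in ranked[:k])


def u_topk(xrel, k):
    """Most probable top-k vector; worlds with < k tuples are skipped (assumption)."""
    dist = defaultdict(float)
    for world, prob in possible_worlds(xrel):
        ids = top_k_ids(world, k)
        if len(ids) == k:
            dist[ids] += prob
    return max(dist.items(), key=lambda kv: kv[1]) if dist else None


def u_kranks(xrel, k):
    """For each rank 1..k, the tuple most likely to occupy that rank."""
    rank_prob = [defaultdict(float) for _ in range(k)]
    for world, prob in possible_worlds(xrel):
        for i, tid in enumerate(top_k_ids(world, k)):
            rank_prob[i][tid] += prob
    return [max(rp.items(), key=lambda kv: kv[1]) if rp else None
            for rp in rank_prob]


print(u_topk(xrel, 2))
print(u_kranks(xrel, 2))
```

On this toy input the most likely top-2 vector is (t2, t4), with probability about 0.378, and the per-rank winners are t2 (about 0.42) and t4 (about 0.486). Eight worlds are trivial to enumerate, but the count multiplies with every additional x-tuple, which is why approaches that avoid materializing possible worlds matter.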

INDEX TERMS

Database Management, Information Technology and Systems, Database design, modeling and management, Query design and implementation languages, Analysis of Algorithms and Problem Complexity, Theory of Computation, Uncertain Database, Top-k Query, x-Relation Model

CITATION

Ke Yi, Feifei Li, George Kollios, Divesh Srivastava, "Efficient Processing of Top-k Queries in Uncertain Databases with x-Relations",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 20, no. 12, pp. 1669-1682, December 2008, doi:10.1109/TKDE.2008.90

REFERENCES

- [1] A. Halevy, A. Rajaraman, and J. Ordille, “Data Integration: The Teenage Years,” Proc. Int'l Conf. Very Large Data Bases (VLDB), 2006.
- [2] H. Galhardas, D. Florescu, and D. Shasha, “Declarative Data Cleaning: Language, Model, and Algorithms,” Proc. Int'l Conf. Very Large Data Bases (VLDB), 2001.
- [3] S. Chaudhuri, K. Ganjam, V. Ganti, and R. Motwani, “Robust and Efficient Fuzzy Match for Online Data Cleaning,” Proc. ACM SIGMOD, 2003.
- [4] M.A. Hernandez and S.J. Stolfo, “Real-World Data Is Dirty: Data Cleansing and the Merge/Purge Problem,” Data Mining and Knowledge Discovery, vol. 2, no. 1, pp. 9-37, 1998.
- [5] A. Deshpande, C. Guestrin, S. Madden, J. Hellerstein, and W. Hong, “Model-Driven Data Acquisition in Sensor Networks,” Proc. Int'l Conf. Very Large Data Bases (VLDB), 2004.
- [6] R. Cheng, D. Kalashnikov, and S. Prabhakar, “Evaluating Probabilistic Queries over Imprecise Data,” Proc. ACM SIGMOD, 2003.
- [7] S. Abiteboul, P. Kanellakis, and G. Grahne, “On the Representation and Querying of Sets of Possible Worlds,” Proc. ACM SIGMOD, 1987.
- [8] N. Fuhr, “A Probabilistic Framework for Vague Queries and Imprecise Information in Databases,” Proc. Int'l Conf. Very Large Data Bases (VLDB), 1990.
- [12] A. Fuxman, E. Fazli, and R.J. Miller, “ConQuer: Efficient Management of Inconsistent Databases,” Proc. ACM SIGMOD, 2005.
- [13] L. Antova, C. Koch, and D. Olteanu, “$10^{10^{6}}$ Worlds and Beyond: Efficient Representation and Processing of Incomplete Information,” Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2007.
- [14] A.D. Sarma, O. Benjelloun, A. Halevy, and J. Widom, “Working Models for Uncertain Data,” Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2006.
- [15] O. Benjelloun, A.D. Sarma, A. Halevy, and J. Widom, “ULDBs: Databases with Uncertainty and Lineage,” Proc. Int'l Conf. Very Large Data Bases (VLDB), 2006.
- [16] N. Dalvi and D. Suciu, “Efficient Query Evaluation on Probabilistic Databases,” Proc. Int'l Conf. Very Large Data Bases (VLDB), 2004.
- [17] P. Sen and A. Deshpande, “Representing and Querying Correlated Tuples in Probabilistic Databases,” Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2007.
- [18] Y. Tao, R. Cheng, X. Xiao, W.K. Ngai, B. Kao, and S. Prabhakar, “Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions,” Proc. Int'l Conf. Very Large Data Bases (VLDB), 2005.
- [19] S. Singh, C. Mayfield, S. Prabhakar, R. Shah, and S. Hambrusch, “Indexing Uncertain Categorical Data,” Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2007.
- [20] V. Ljosa and A.K. Singh, “APLA: Indexing Arbitrary Probability Distributions,” Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2007.
- [21] L. Antova, C. Koch, and D. Olteanu, “From Complete to Incomplete Information and Back,” Proc. ACM SIGMOD, 2007.
- [22] 1998 World Cup Web Site Access Logs, http://ita.ee.lbl.gov/html/contribWorldCup.html, 1998.
- [23] P. Agrawal, O. Benjelloun, A. Das Sarma, C. Hayworth, S. Nabar, T. Sugihara, and J. Widom, “Trio: A System for Data, Uncertainty, and Lineage,” Proc. Int'l Conf. Very Large Data Bases (VLDB), 2006.
- [24] C. Li, K. Chang, I. Ilyas, and S. Song, “RankSQL: Query Algebra and Optimization for Relational Top-K Queries,” Proc. ACM SIGMOD, 2005.
- [25] M.A. Soliman, I.F. Ilyas, and K.C. Chang, “Top-K Query Processing in Uncertain Databases,” Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2007.
- [26] C. Re, N. Dalvi, and D. Suciu, “Efficient Top-K Query Evaluation on Probabilistic Databases,” Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2007.
- [27] M.A. Soliman, I.F. Ilyas, and K.C. Chang, “Top-K Query Processing in Uncertain Databases,” technical report, Univ. of Waterloo, 2007.
- [28] K. Yi, F. Li, D. Srivastava, and G. Kollios, “Efficient Processing of Top-K Queries in Uncertain Databases (poster),” Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2008.
- [29] M. Hua, J. Pei, W. Zhang, and X. Lin, “Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach,” Proc. ACM SIGMOD, 2008.