Efficient Processing of Top-k Queries in Uncertain Databases with x-Relations
December 2008 (vol. 20 no. 12)
pp. 1669-1682
Ke Yi, Hong Kong University of Science and Technology, Hong Kong
Feifei Li, Florida State University, Tallahassee
George Kollios, Boston University, Boston
Divesh Srivastava, AT&T Labs-Research, Florham Park
This work introduces new algorithms for processing top-k queries in uncertain databases under the widely adopted x-relation model. An x-relation consists of a number of x-tuples, and each x-tuple randomly instantiates into one tuple chosen from one or more alternatives. Soliman et al. [25] first introduced the problem of top-k query processing in uncertain databases and proposed various algorithms to answer such queries. Under the x-relation model, our new results significantly improve on the state of the art in both running time and memory usage. In the single-alternative case, our new algorithms are two to three orders of magnitude faster than the previous algorithms. In the multi-alternative case, the improvement is even more dramatic: while the previous algorithms have exponential complexity in both time and space, our algorithms run in near linear or low polynomial time. Our study covers both types of top-k queries proposed in [25]. We provide both a theoretical analysis and an extensive experimental evaluation demonstrating the superiority of the new approaches over existing solutions.
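To make the x-relation model and the two top-k semantics of [25] concrete (usually called U-Topk, the most probable top-k vector, and U-kRanks, the most probable tuple at each rank), here is a minimal Python sketch that enumerates possible worlds by brute force. It is exponential in the number of x-tuples and only illustrates what the queries ask for; it is not the authors' algorithm, which the abstract reports runs in near linear or low polynomial time. The (tuple_id, score, prob) encoding, the function names, the tie-breaking by id, and ignoring worlds with fewer than k tuples for U-Topk are all hypothetical choices made for this illustration.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy encoding (not the paper's): an x-relation is a list of
# x-tuples, and each x-tuple is a list of alternatives (tuple_id, score, prob).
# Alternatives of one x-tuple are mutually exclusive; their probabilities sum
# to at most 1, and any remainder is the chance the x-tuple yields no tuple.


def possible_worlds(xrel):
    """Yield (world, probability) pairs by brute-force enumeration."""
    choices = []
    for xt in xrel:
        opts = [((tid, score), p) for tid, score, p in xt]
        rest = 1.0 - sum(p for _, _, p in xt)
        if rest > 1e-12:
            opts.append((None, rest))  # this x-tuple is absent
        choices.append(opts)
    # Pick one option per x-tuple; x-tuples are independent of each other.
    for combo in product(*choices):
        prob, world = 1.0, []
        for alt, p in combo:
            prob *= p
            if alt is not None:
                world.append(alt)
        yield world, prob


def ranking(world):
    """Tuple ids by descending score (ties broken by id, an assumption)."""
    return [tid for tid, _ in sorted(world, key=lambda a: (-a[1], a[0]))]


def u_topk(xrel, k):
    """Most probable top-k vector, counting only worlds with >= k tuples."""
    mass = defaultdict(float)
    for world, prob in possible_worlds(xrel):
        r = ranking(world)
        if len(r) >= k:
            mass[tuple(r[:k])] += prob
    return max(mass.items(), key=lambda kv: kv[1]) if mass else None


def u_kranks(xrel, k):
    """For each rank i = 1..k, the tuple most likely to be ranked i-th."""
    at_rank = [defaultdict(float) for _ in range(k)]
    for world, prob in possible_worlds(xrel):
        for i, tid in enumerate(ranking(world)[:k]):
            at_rank[i][tid] += prob
    return [max(d.items(), key=lambda kv: kv[1]) if d else None for d in at_rank]


if __name__ == "__main__":
    xrel = [
        [("t1", 100, 0.4), ("t2", 70, 0.6)],  # two alternatives
        [("t3", 90, 0.6)],                    # absent with probability 0.4
        [("t4", 80, 1.0)],                    # always present
    ]
    print(u_topk(xrel, 2))    # top-2 vector ('t3', 't4'), probability ~0.36
    print(u_kranks(xrel, 2))  # rank 1: 't1' (~0.40), rank 2: 't4' (~0.52)
```

On this toy input the two semantics disagree: the single most probable top-2 vector is (t3, t4), yet t1 is the tuple most likely to be ranked first, so U-kRanks returns t1 followed by t4. The exponential blow-up of this enumeration in the number of x-tuples is what efficient algorithms for these queries must avoid.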

[1] A. Halevy, A. Rajaraman, and J. Ordille, “Data Integration: The Teenage Years,” Proc. Int'l Conf. Very Large Data Bases (VLDB), 2006.
[2] H. Galhardas, D. Florescu, and D. Shasha, “Declarative Data Cleaning: Language, Model, and Algorithms,” Proc. Int'l Conf. Very Large Data Bases (VLDB), 2001.
[3] S. Chaudhuri, K. Ganjam, V. Ganti, and R. Motwani, “Robust and Efficient Fuzzy Match for Online Data Cleaning,” Proc. ACM SIGMOD, 2003.
[4] M.A. Hernandez and S.J. Stolfo, “Real-World Data Is Dirty: Data Cleansing and the Merge/Purge Problem,” Data Mining and Knowledge Discovery, vol. 2, no. 1, pp. 9-37, 1998.
[5] A. Deshpande, C. Guestrin, S. Madden, J. Hellerstein, and W. Hong, “Model-Driven Data Acquisition in Sensor Networks,” Proc. Int'l Conf. Very Large Data Bases (VLDB), 2004.
[6] R. Cheng, D. Kalashnikov, and S. Prabhakar, “Evaluating Probabilistic Queries over Imprecise Data,” Proc. ACM SIGMOD, 2003.
[7] S. Abiteboul, P. Kanellakis, and G. Grahne, “On the Representation and Querying of Sets of Possible Worlds,” Proc. ACM SIGMOD, 1987.
[8] N. Fuhr, “A Probabilistic Framework for Vague Queries and Imprecise Information in Databases,” Proc. Int'l Conf. Very Large Data Bases (VLDB), 1990.
[9] T. Imielinski and W. Lipski, “Incomplete Information in Relational Databases,” J. ACM, vol. 31, no. 4, 1984.
[10] D. Barbara, H. Garcia-Molina, and D. Porter, “The Management of Probabilistic Data,” IEEE Trans. Knowledge and Data Eng., vol. 4, no. 5, pp. 487-502, Oct. 1992.
[11] L.V.S. Lakshmanan, N. Leone, R. Ross, and V.S. Subrahmanian, “ProbView: A Flexible Probabilistic Database System,” ACM Trans. Database Systems, vol. 22, no. 3, pp. 419-469, 1997.
[12] A. Fuxman, E. Fazli, and R.J. Miller, “ConQuer: Efficient Management of Inconsistent Databases,” Proc. ACM SIGMOD, 2005.
[13] L. Antova, C. Koch, and D. Olteanu, “10^(10^6) Worlds and Beyond: Efficient Representation and Processing of Incomplete Information,” Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2007.
[14] A.D. Sarma, O. Benjelloun, A. Halevy, and J. Widom, “Working Models for Uncertain Data,” Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2006.
[15] O. Benjelloun, A.D. Sarma, A. Halevy, and J. Widom, “ULDBs: Databases with Uncertainty and Lineage,” Proc. Int'l Conf. Very Large Data Bases (VLDB), 2006.
[16] N. Dalvi and D. Suciu, “Efficient Query Evaluation on Probabilistic Databases,” Proc. Int'l Conf. Very Large Data Bases (VLDB), 2004.
[17] P. Sen and A. Deshpande, “Representing and Querying Correlated Tuples in Probabilistic Databases,” Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2007.
[18] Y. Tao, R. Cheng, X. Xiao, W.K. Ngai, B. Kao, and S. Prabhakar, “Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions,” Proc. Int'l Conf. Very Large Data Bases (VLDB), 2005.
[19] S. Singh, C. Mayfield, S. Prabhakar, R. Shah, and S. Hambrusch, “Indexing Uncertain Categorical Data,” Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2007.
[20] V. Ljosa and A.K. Singh, “APLA: Indexing Arbitrary Probability Distributions,” Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2007.
[21] L. Antova, C. Koch, and D. Olteanu, “From Complete to Incomplete Information and Back,” Proc. ACM SIGMOD, 2007.
[22] 1998 World Cup Web Site Access Logs, http://ita.ee.lbl.gov/html/contribWorldCup.html, 1998.
[23] P. Agrawal, O. Benjelloun, A. Das Sarma, C. Hayworth, S. Nabar, T. Sugihara, and J. Widom, “Trio: A System for Data, Uncertainty, and Lineage,” Proc. Int'l Conf. Very Large Data Bases (VLDB), 2006.
[24] C. Li, K. Chang, I. Ilyas, and S. Song, “RankSQL: Query Algebra and Optimization for Relational Top-K Queries,” Proc. ACM SIGMOD, 2005.
[25] M.A. Soliman, I.F. Ilyas, and K.C. Chang, “Top-K Query Processing in Uncertain Databases,” Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2007.
[26] C. Re, N. Dalvi, and D. Suciu, “Efficient Top-K Query Evaluation on Probabilistic Databases,” Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2007.
[27] M.A. Soliman, I.F. Ilyas, and K.C. Chang, “Top-K Query Processing in Uncertain Databases,” technical report, Univ. of Waterloo, 2007.
[28] K. Yi, F. Li, D. Srivastava, and G. Kollios, “Efficient Processing of Top-K Queries in Uncertain Databases (poster),” Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2008.
[29] M. Hua, J. Pei, W. Zhang, and X. Lin, “Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach,” Proc. ACM SIGMOD, 2008.

Index Terms:
Database Management, Information Technology and Systems, Database design, modeling and management, Query design and implementation languages, Analysis of Algorithms and Problem Complexity, Theory of Computation, Uncertain Database, Top-k Query, x-Relation Model
Citation:
Ke Yi, Feifei Li, George Kollios, Divesh Srivastava, "Efficient Processing of Top-k Queries in Uncertain Databases with x-Relations," IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 12, pp. 1669-1682, Dec. 2008, doi:10.1109/TKDE.2008.90