Issue No. 12 - Dec. 2012 (vol. 24)
pp. 2143-2155
Davide Martinenghi , Politecnico di Milano, Milano
Marco Tagliasacchi , Politecnico di Milano, Milano
ABSTRACT
In this paper, we address the problem of joining ranked results produced by two or more services on the web. We consider services endowed with two kinds of access that are often available: 1) sorted access, which returns tuples sorted by score; 2) random access, which returns tuples matching a given join attribute value. Rank join operators combine objects of two or more relations and output the k combinations with the highest aggregate score. While the past literature has studied suitable bounding schemes for this setting, in this paper we focus on the definition of a pulling strategy, which determines the order in which the joined services are invoked. We propose the Cost-Aware with Random and Sorted access (CARS) pulling strategy, which is derived at compile time and is oblivious to the query-dependent score distributions. We cast CARS as the solution of an optimization problem based on a small set of parameters characterizing the joined services. We validate the proposed strategy with experiments on both real and synthetic data sets. We show that CARS outperforms prior proposals and that its overall access cost is always within a very small margin of that of an oracle-based optimal strategy. In addition, CARS is shown to be robust with respect to the uncertainty that may characterize the estimated parameters.
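To make the setting concrete, the following is a minimal Python sketch of a two-way rank join that uses sorted access on one service and random access on the other. It is not the CARS strategy from the paper: the pulling order is fixed (always pull the next tuple from the first service, then probe the second), the aggregate is assumed to be the sum of the two scores, and the function name, the unit access costs, and the `max_score2` bound are illustrative assumptions.

```python
import heapq
from collections import defaultdict


def rank_join_top_k(r1, r2, k, max_score2=1.0, cost_sorted=1.0, cost_random=1.0):
    """Illustrative top-k rank join over two lists of (join_key, score) tuples.

    Assumptions (not taken from the paper): the aggregate score is a sum,
    scores in r2 never exceed max_score2, and every access has a fixed cost.
    """
    # Simulate sorted access on r1: tuples arrive by descending score.
    sorted_r1 = sorted(r1, key=lambda t: t[1], reverse=True)

    # Simulate random access on r2: look up tuples by join attribute value.
    index_r2 = defaultdict(list)
    for key, score in r2:
        index_r2[key].append(score)

    top = []    # min-heap holding the best k combinations seen so far
    cost = 0.0  # total access cost paid

    for key, s1 in sorted_r1:
        cost += cost_sorted   # one sorted access on r1
        cost += cost_random   # one random access on r2 for this key
        for s2 in index_r2.get(key, []):
            item = (s1 + s2, key, s1, s2)
            if len(top) < k:
                heapq.heappush(top, item)
            elif item > top[0]:
                heapq.heapreplace(top, item)

        # Any combination not yet formed uses an r1 tuple scoring at most s1,
        # so its aggregate score cannot exceed this bound.
        threshold = s1 + max_score2
        if len(top) == k and top[0][0] >= threshold:
            break

    return sorted(top, reverse=True), cost


if __name__ == "__main__":
    r1 = [("a", 0.9), ("b", 0.8), ("c", 0.3), ("d", 0.1)]
    r2 = [("a", 0.2), ("b", 1.0), ("c", 0.9), ("e", 1.0)]
    result, total_cost = rank_join_top_k(r1, r2, k=1)
    print(result, total_cost)
```

On this toy input the join stops after two sorted accesses and two random accesses, returning the combination on key `b`. A cost-aware pulling strategy would instead choose which service to access next based on the access costs and other service parameters, rather than following a fixed order as this sketch does.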
INDEX TERMS
Aggregates, Upper bound, Nickel, Context, Optimization, Search engines, Relational databases, random access, Top-k, rank join, sorted access
CITATION
Davide Martinenghi, Marco Tagliasacchi, "Cost-Aware Rank Join with Random and Sorted Access", IEEE Transactions on Knowledge & Data Engineering, vol.24, no. 12, pp. 2143-2155, Dec. 2012, doi:10.1109/TKDE.2011.161