Issue No.12 - Dec. (2012 vol.24)
Davide Martinenghi , Politecnico di Milano, Milano
Marco Tagliasacchi , Politecnico di Milano, Milano
DOI: http://doi.ieeecomputersociety.org/10.1109/TKDE.2011.161
In this paper, we address the problem of joining ranked results produced by two or more services on the Web. We consider services endowed with two kinds of access that are often available: 1) sorted access, which returns tuples sorted by score; and 2) random access, which returns tuples matching a given join attribute value. Rank join operators combine objects of two or more relations and output the k combinations with the highest aggregate score. While prior literature has studied suitable bounding schemes for this setting, in this paper we focus on the definition of a pulling strategy, which determines the order in which the joined services are invoked. We propose the Cost-Aware with Random and Sorted access (CARS) pulling strategy, which is derived at compile time and is oblivious to the query-dependent score distributions. We cast CARS as the solution of an optimization problem based on a small set of parameters characterizing the joined services. We validate the proposed strategy with experiments on both real and synthetic data sets. We show that CARS outperforms prior proposals and that its overall access cost is always within a small margin of that of an oracle-based optimal strategy. In addition, CARS is shown to be robust with respect to the uncertainty that may characterize the estimated parameters.
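To make the setting concrete, the following Python sketch implements a basic two-way rank join over services offering sorted and random access, using sum as the aggregation function and tallying the total access cost. It is not CARS: the pulling strategy here is a naive round robin, whereas CARS derives the invocation order from an optimization problem over service parameters, which is not reproduced here. The names `Service`, `rank_join`, and `round_robin`, the per-access costs, and the assumption that scores lie in [0, 1] are all illustrative assumptions.

```python
import heapq
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical tuple layout: (tuple_id, join_key, score), with scores assumed in [0, 1].
Tup = Tuple[str, str, float]


class Service:
    """Hypothetical service exposing sorted and random access, each with a fixed cost."""

    def __init__(self, tuples: List[Tup], sorted_cost: float, random_cost: float):
        self.sorted_list = sorted(tuples, key=lambda t: t[2], reverse=True)
        self.index: Dict[str, List[Tup]] = defaultdict(list)
        for t in tuples:
            self.index[t[1]].append(t)
        self.sorted_cost = sorted_cost
        self.random_cost = random_cost
        self.pos = 0
        self.last_score = 1.0  # upper bound on the score of any tuple not yet seen

    def exhausted(self) -> bool:
        return self.pos >= len(self.sorted_list)

    def sorted_access(self) -> Tup:
        """Return the next tuple in descending score order."""
        t = self.sorted_list[self.pos]
        self.pos += 1
        self.last_score = t[2]
        return t

    def random_access(self, key: str) -> List[Tup]:
        """Return all tuples whose join attribute equals key."""
        return list(self.index.get(key, []))


def round_robin(step: int) -> int:
    """Naive pulling strategy (assumption): alternate between service 0 and service 1."""
    return step % 2


def rank_join(services: List[Service], k: int,
              choose: Callable[[int], int] = round_robin):
    """Top-k join of two services under sum aggregation; returns (results, access cost)."""
    if k <= 0:
        return [], 0.0
    top: List[Tuple[float, str, str]] = []  # min-heap with the k best combinations so far
    produced = set()  # (id from service 0, id from service 1) pairs already joined
    cost = 0.0
    step = 0
    while True:
        # Every pulled tuple is joined at once via random access, so any combination
        # not yet produced consists of two tuples both unseen under sorted access.
        threshold = services[0].last_score + services[1].last_score
        if len(top) == k and top[0][0] >= threshold:
            break
        if services[0].exhausted() or services[1].exhausted():
            break  # every combination has already been produced
        i = choose(step)
        j = 1 - i
        step += 1
        t = services[i].sorted_access()
        cost += services[i].sorted_cost
        matches = services[j].random_access(t[1])
        cost += services[j].random_cost
        for m in matches:
            a, b = (t, m) if i == 0 else (m, t)
            if (a[0], b[0]) in produced:
                continue  # already produced when the partner tuple was pulled
            produced.add((a[0], b[0]))
            entry = (a[2] + b[2], a[0], b[0])
            if len(top) < k:
                heapq.heappush(top, entry)
            elif entry > top[0]:
                heapq.heapreplace(top, entry)
    return sorted(top, reverse=True), cost


if __name__ == "__main__":
    s1 = Service([("a1", "x", 0.9), ("a2", "y", 0.8), ("a3", "z", 0.3)], 1.0, 2.0)
    s2 = Service([("b1", "y", 0.95), ("b2", "x", 0.4), ("b3", "z", 0.2)], 1.0, 2.0)
    top, cost = rank_join([s1, s2], k=1)
    print(top, cost)  # [(1.75, 'a2', 'b1')] 9.0
```

Passing a different `choose` function changes only the order in which the services are invoked, which is the degree of freedom a pulling strategy controls: the returned top-k combinations should be the same (up to ties), but the total access cost can differ.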
Keywords: Aggregates, Upper bound, Nickel, Context, Optimization, Search engines, Relational databases, random access, Top-k, rank join, sorted access
Davide Martinenghi, Marco Tagliasacchi, "Cost-Aware Rank Join with Random and Sorted Access", IEEE Transactions on Knowledge & Data Engineering, vol.24, no. 12, pp. 2143-2155, Dec. 2012, doi:10.1109/TKDE.2011.161