2012 IEEE 28th International Conference on Data Engineering
Parallel Top-K Similarity Join Algorithms Using MapReduce
Arlington, Virginia USA
April 01-April 05
ISBN: 978-0-7695-4747-3
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDE.2012.87
Many applications require finding the top-k most similar pairs of records in a given database. Computing such top-k similarity joins is challenging, however, because a growing number of applications must handle vast amounts of data. For such data-intensive applications, parallel execution of programs on a large cluster of commodity machines using the MapReduce paradigm has recently received a lot of attention. In this paper, we investigate how top-k similarity join algorithms can benefit from the popular MapReduce framework. We first develop divide-and-conquer and branch-and-bound algorithms. We next propose the all pair partitioning and essential pair partitioning methods to minimize the amount of data transferred between the map and reduce functions. Finally, we perform experiments on both synthetic and real-life data sets. Our performance study confirms the effectiveness and scalability of our MapReduce algorithms.
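To make the problem statement concrete, the following is a minimal single-process sketch of a divide-and-conquer top-k similarity join. It is not the authors' algorithm. It assumes records are token sets compared with Jaccard similarity, and it splits the records into groups so that each pair of groups plays the role of one reduce task. The names `jaccard`, `local_top_k`, `top_k_similarity_join`, and the `num_groups` parameter are hypothetical and chosen only for illustration.

```python
import heapq
from itertools import combinations, combinations_with_replacement, product


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed measure; 0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def local_top_k(left, right, k, same_group):
    """Top-k pairs inside one partition, standing in for one reduce task."""
    # Within a single group, pair each record with every other record once;
    # across two different groups, pair every left record with every right one.
    pairs = combinations(left, 2) if same_group else product(left, right)
    scored = (
        (jaccard(x_set, y_set), min(x_id, y_id), max(x_id, y_id))
        for (x_id, x_set), (y_id, y_set) in pairs
    )
    return heapq.nlargest(k, scored)


def top_k_similarity_join(records, k, num_groups=2):
    """records: list of (id, set_of_tokens). Returns the k best (score, id1, id2)."""
    # "Map" step (assumption): assign records to groups round-robin.
    groups = [records[g::num_groups] for g in range(num_groups)]
    partial = []
    # Every unordered pair of groups (including a group with itself) is one partition,
    # so each record pair is examined in exactly one partition.
    for i, j in combinations_with_replacement(range(num_groups), 2):
        partial.extend(local_top_k(groups[i], groups[j], k, i == j))
    # Merge step: the global top-k is contained in the union of the local top-k lists.
    return heapq.nlargest(k, partial)
```

For example, `top_k_similarity_join([(0, {"a", "b"}), (1, {"a", "b"}), (2, {"c"})], k=1)` ranks the pair of records 0 and 1 first. The sketch sends every record to several partitions, which is the kind of data transfer cost the abstract's partitioning methods aim to reduce; it makes no attempt at that optimization or at branch-and-bound pruning.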
Citation:
Younghoon Kim, Kyuseok Shim, "Parallel Top-K Similarity Join Algorithms Using MapReduce," 2012 IEEE 28th International Conference on Data Engineering (ICDE), pp. 510-521, 2012.
