Fifth IEEE International Conference on Data Mining (ICDM'05)
Adaptive Product Normalization: Using Online Learning for Record Linkage in Comparison Shopping
Houston, Texas
November 27-30, 2005
ISBN: 0-7695-2278-5
Authors: Mikhail Bilenko, Sugato Basu, Mehran Sahami
Pages: 58-65
ISSN: 1550-4786
Publisher: IEEE Computer Society, Los Alamitos, CA, USA
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2005.18
The problem of record linkage focuses on determining whether two object descriptions refer to the same underlying entity. Addressing this problem effectively has many practical applications, e.g., elimination of duplicate records in databases and citation matching for scholarly articles. In this paper, we consider a new domain where the record linkage problem is manifested: Internet comparison shopping. We address the resulting linkage setting that requires learning a similarity function between record pairs from streaming data. The learned similarity function is subsequently used in clustering to determine which records are co-referent and should be linked. We present an online machine learning method for addressing this problem, where a composite similarity function based on a linear combination of basis functions is learned incrementally. We illustrate the efficacy of this approach on several real-world datasets from an Internet comparison shopping site, and show that our method is able to effectively learn various distance functions for product data with differing characteristics. We also provide experimental results that show the importance of considering multiple performance measures in record linkage evaluation.
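As a rough illustration of the idea in the abstract, the sketch below learns the weights of a linear combination of basis similarity functions one labeled record pair at a time. It is not the paper's algorithm: the abstract does not state the update rule, so a plain perceptron-style update is used as a stand-in, and the basis functions (token_jaccard, price_closeness), the record fields (name, price), and the sample records are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Sequence

Record = Dict[str, Any]
BasisFunction = Callable[[Record, Record], float]


def token_jaccard(field: str) -> BasisFunction:
    """Build a basis function: Jaccard overlap of lowercase tokens in `field`."""
    def similarity(a: Record, b: Record) -> float:
        tokens_a = set(str(a.get(field, "")).lower().split())
        tokens_b = set(str(b.get(field, "")).lower().split())
        union = tokens_a | tokens_b
        return len(tokens_a & tokens_b) / len(union) if union else 0.0
    return similarity


def price_closeness(a: Record, b: Record) -> float:
    """Basis function: ratio of the smaller price to the larger one."""
    price_a, price_b = a.get("price"), b.get("price")
    if not price_a or not price_b:
        return 0.0
    return min(price_a, price_b) / max(price_a, price_b)


class OnlineLinearSimilarity:
    """Weighted sum of basis similarities, updated one labeled pair at a time."""

    def __init__(self, bases: Sequence[BasisFunction], learning_rate: float = 1.0):
        self.bases = list(bases)
        self.weights: List[float] = [0.0] * len(self.bases)
        self.bias = 0.0
        self.learning_rate = learning_rate

    def features(self, a: Record, b: Record) -> List[float]:
        return [basis(a, b) for basis in self.bases]

    def score(self, a: Record, b: Record) -> float:
        values = self.features(a, b)
        return self.bias + sum(w * v for w, v in zip(self.weights, values))

    def update(self, a: Record, b: Record, same_entity: bool) -> bool:
        """Perceptron-style step (an assumption); returns True if weights changed."""
        label = 1.0 if same_entity else -1.0
        values = self.features(a, b)
        current = self.bias + sum(w * v for w, v in zip(self.weights, values))
        if label * current > 0:
            return False  # pair already on the correct side; leave weights alone
        step = self.learning_rate * label
        self.weights = [w + step * v for w, v in zip(self.weights, values)]
        self.bias += step
        return True


# Hypothetical product offers; the first two describe the same product.
camera_a = {"name": "Canon PowerShot A520 digital camera", "price": 199.0}
camera_b = {"name": "canon powershot a520", "price": 189.0}
other = {"name": "Nikon Coolpix 4600 digital camera", "price": 179.0}

model = OnlineLinearSimilarity([token_jaccard("name"), price_closeness])
stream = [(camera_a, camera_b, True), (camera_a, other, False)]
for _ in range(20):  # replay the tiny stream; a real stream would not repeat
    for left, right, same in stream:
        model.update(left, right, same)

print("co-referent pair score:", model.score(camera_a, camera_b))
print("distinct pair score:", model.score(camera_a, other))
```

In the setting the abstract describes, scores of this kind would then feed a clustering step that decides which records are linked; that step is omitted here.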
Citation:
Mikhail Bilenko, Sugato Basu, Mehran Sahami, "Adaptive Product Normalization: Using Online Learning for Record Linkage in Comparison Shopping," Fifth IEEE International Conference on Data Mining (ICDM'05), pp. 58-65, 2005.
