2009 IEEE International Conference on Data Mining Workshops
Improving Similarity Join Algorithms Using Fuzzy Clustering Technique
Miami, Florida, USA
December 6, 2009
ISBN: 978-0-7695-3902-7
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDMW.2009.50
In this paper, we propose a preprocessing technique that uses fuzzy clustering to improve existing string similarity join algorithms. Our approach first identifies groups of related attributes using fuzzy clustering techniques and then applies existing string similarity join algorithms to those attributes. The approach can be applied to the integration of knowledge bases and databases, and it can handle inconsistent values and naming conventions, incorrect or missing data values, and incomplete information from multiple sources with semi-compatible or homogeneous attributes. In an experimental study, our preprocessing approach improved the precision and recall of existing string similarity join algorithms by about 10 percent.
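The two-stage idea in the abstract (group related attributes first, then join only on those attributes) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the clustering algorithm or the join algorithm, so the sketch assumes the fuzzy clustering step has already produced membership degrees for each attribute, and it substitutes a simple q-gram Jaccard comparison for the "existing string similarity join algorithm". The function names, the cluster labels, and the thresholds `min_degree` and `threshold` are all hypothetical.

```python
from itertools import product


def qgrams(s, q=2):
    """Return the set of padded q-grams of a lowercased string."""
    padded = "#" * (q - 1) + s.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def jaccard(a, b, q=2):
    """Jaccard similarity of the q-gram sets of two strings."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 1.0


def related_attribute_pairs(memberships_left, memberships_right, min_degree=0.5):
    """Pair attributes that share a cluster with degree >= min_degree.

    Each argument maps attribute name -> {cluster label: membership degree}.
    The degrees are assumed to come from a prior fuzzy clustering step.
    """
    pairs = []
    for (la, ldeg), (ra, rdeg) in product(memberships_left.items(),
                                          memberships_right.items()):
        shared = [c for c in ldeg
                  if ldeg[c] >= min_degree and rdeg.get(c, 0.0) >= min_degree]
        if shared:
            pairs.append((la, ra))
    return pairs


def similarity_join(left_rows, right_rows, attr_pairs, threshold=0.6):
    """Naive nested-loop join restricted to the related attribute pairs."""
    matches = []
    for i, left in enumerate(left_rows):
        for j, right in enumerate(right_rows):
            for la, ra in attr_pairs:
                score = jaccard(left[la], right[ra])
                if score >= threshold:
                    matches.append((i, j, la, ra, round(score, 2)))
    return matches
```

Restricting the join to attribute pairs that share a cluster avoids comparing unrelated columns (for example, a name against a city), which is one plausible reading of how such a preprocessing step could improve precision.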
Citation:
Lisa Tan, Farshad Fotouhi, William Grosky, Horia F. Pop, Noureddine Mouaddib, "Improving Similarity Join Algorithms Using Fuzzy Clustering Technique," icdmw, pp.545-550, 2009 IEEE International Conference on Data Mining Workshops, 2009