ASCII Text
Moisés G. de Carvalho, Alberto H.F. Laender, Marcos André Gonçalves, Altigran S. da Silva, "A Genetic Programming Approach to Record Deduplication," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 3, pp. 399-412, March 2012.
BibTeX
@article{ 10.1109/TKDE.2010.234,
  author = {Moisés G. de Carvalho and Alberto H.F. Laender and Marcos André Gonçalves and Altigran S. da Silva},
  title = {A Genetic Programming Approach to Record Deduplication},
  journal = {IEEE Transactions on Knowledge and Data Engineering},
  volume = {24},
  number = {3},
  issn = {1041-4347},
  year = {2012},
  pages = {399-412},
  doi = {http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.234},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
RefWorks, ProCite/RefMan/EndNote
TY - JOUR
JO - IEEE Transactions on Knowledge and Data Engineering
TI - A Genetic Programming Approach to Record Deduplication
IS - 3
SN - 1041-4347
SP - 399
EP - 412
EPD - 399-412
A1 - Moisés G. de Carvalho
A1 - Alberto H.F. Laender
A1 - Marcos André Gonçalves
A1 - Altigran S. da Silva
PY - 2012
KW - Database administration
KW - evolutionary computing and genetic algorithms
KW - database integration
VL - 24
JA - IEEE Transactions on Knowledge and Data Engineering
ER -
[1] M. Wheatley, "Operation Clean Data," CIO Asia Magazine, http://www.cio-asia.com, Aug. 2004.
[2] N. Koudas, S. Sarawagi, and D. Srivastava, "Record Linkage: Similarity Measures and Algorithms," Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 802-803, 2006.
[3] S. Chaudhuri, K. Ganjam, V. Ganti, and R. Motwani, "Robust and Efficient Fuzzy Match for Online Data Cleaning," Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 313-324, 2003.
[4] I. Bhattacharya and L. Getoor, "Iterative Record Linkage for Cleaning and Integration," Proc. Ninth ACM SIGMOD Workshop Research Issues in Data Mining and Knowledge Discovery, pp. 11-18, 2004.
[5] I.P. Fellegi and A.B. Sunter, "A Theory for Record Linkage," J. Am. Statistical Assoc., vol. 64, no. 328, pp. 1183-1210, 1969.
[6] V.S. Verykios, G.V. Moustakides, and M.G. Elfeky, "A Bayesian Decision Model for Cost Optimal Record Matching," The Very Large Databases J., vol. 12, no. 1, pp. 28-40, 2003.
[7] R. Bell and F. Dravis, "Is Your Data Dirty? And Does That Matter?," Accenture White Paper, http://www.accenture.com, 2006.
[8] J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, 1992.
[9] W. Banzhaf, P. Nordin, R.E. Keller, and F.D. Francone, Genetic Programming - An Introduction: On the Automatic Evolution of Computer Programs and Its Applications. Morgan Kaufmann Publishers, 1998.
[10] H.M. de Almeida, M.A. Gonçalves, M. Cristo, and P. Calado, "A Combined Component Approach for Finding Collection-Adapted Ranking Functions Based on Genetic Programming," Proc. 30th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 399-406, 2007.
[11] T.P.C. Silva, E.S. de Moura, J.M.B. Cavalcanti, A.S. da Silva, M.G. de Carvalho, and M.A. Gonçalves, "An Evolutionary Approach for Combining Different Sources of Evidence in Search Engines," Information Systems, vol. 34, no. 2, pp. 276-289, 2009.
[12] B. Zhang, Y. Chen, W. Fan, E.A. Fox, M. Gonçalves, M. Cristo, and P. Calado, "Intelligent GP Fusion from Multiple Sources for Text Classification," Proc. 14th ACM Int'l Conf. Information and Knowledge Management, pp. 477-484, 2005.
[13] R.d.S. Torres, A.X. Falcao, M.A. Gonçalves, J.P. Papa, B. Zhang, W. Fan, and E.A. Fox, "A Genetic Programming Framework for Content-Based Image Retrieval," Pattern Recognition, vol. 42, no. 2, pp. 283-292, 2009.
[14] A. Lacerda, M. Cristo, M.A. Gonçalves, W. Fan, N. Ziviani, and B. Ribeiro-Neto, "Learning to Advertise," Proc. 29th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 549-556, 2006.
[15] M.G. de Carvalho, M.A. Gonçalves, A.H.F. Laender, and A.S. da Silva, "Learning to Deduplicate," Proc. Sixth ACM/IEEE CS Joint Conf. Digital Libraries, pp. 41-50, 2006.
[16] M.G. de Carvalho, A.H.F. Laender, M.A. Gonçalves, and A.S. da Silva, "Replica Identification Using Genetic Programming," Proc. 23rd Ann. ACM Symp. Applied Computing (SAC), pp. 1801-1806, 2008.
[17] M. Bilenko, R. Mooney, W. Cohen, P. Ravikumar, and S. Fienberg, "Adaptive Name Matching in Information Integration," IEEE Intelligent Systems, vol. 18, no. 5, pp. 16-23, Sept./Oct. 2003.
[18] M. Bilenko and R.J. Mooney, "Adaptive Duplicate Detection Using Learnable String Similarity Measures," Proc. Ninth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 39-48, 2003.
[19] S. Lawrence, C.L. Giles, and K.D. Bollacker, "Autonomous Citation Matching," Proc. Third Int'l Conf. Autonomous Agents, pp. 392-393, 1999.
[20] S. Lawrence, L. Giles, and K. Bollacker, "Digital Libraries and Autonomous Citation Indexing," Computer, vol. 32, no. 6, pp. 67-71, June 1999.
[21] A.K. Elmagarmid, P.G. Ipeirotis, and V.S. Verykios, "Duplicate Record Detection: A Survey," IEEE Trans. Knowledge and Data Eng., vol. 19, no. 1, pp. 1-16, Jan. 2007.
[22] R.A. Baeza-Yates and B.A. Ribeiro-Neto, Modern Information Retrieval. ACM Press/Addison-Wesley, 1999.
[23] W.W. Cohen, "Data Integration Using Similarity Joins and a Word-Based Information Representation Language," ACM Trans. Information Systems, vol. 18, no. 3, pp. 288-321, 2000.
[24] J.C.P. Carvalho and A.S. da Silva, "Finding Similar Identities among Objects from Multiple Web Sources," Proc. Fifth ACM Int'l Workshop Web Information and Data Management, pp. 90-93, 2003.
[25] H.B. Newcombe, J.M. Kennedy, S. Axford, and A. James, "Automatic Linkage of Vital Records," Science, vol. 130, no. 3381, pp. 954-959, Oct. 1959.
[26] "Freely Extensible Biomedical Record Linkage," http://sourceforge.net/projects/febrl, 2011.
[27] W.W. Cohen and J. Richman, "Learning to Match and Cluster Large High-Dimensional Data Sets for Data Integration," Proc. Eighth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 475-480, 2002.
[28] S. Tejada, C.A. Knoblock, and S. Minton, "Learning Object Identification Rules for Information Integration," Information Systems, vol. 26, no. 8, pp. 607-633, 2001.
[29] S. Guha, N. Koudas, A. Marathe, and D. Srivastava, "Merging the Results of Approximate Match Operations," Proc. 30th Int'l Conf. Very Large Data Bases, pp. 636-647, 2004.
[30] P.J. Angeline, "Genetic Programming's Continued Evolution," Advances in Genetic Programming, vol. 2, ch. 1, MIT Press, 1996.
[31] S. Tejada, C.A. Knoblock, and S. Minton, "Learning Domain-Independent String Transformation Weights for High Accuracy Object Identification," Proc. Eighth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 350-359, 2002.
[32] P. Christen, "Probabilistic Data Generation for Deduplication and Data Linkage," Intelligent Data Eng. and Automated Learning, pp. 109-116, Springer, 2005.
[33] S. Sarawagi and A. Bhamidipaty, "Interactive Deduplication Using Active Learning," Proc. Eighth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 269-278, 2002.

