Issue No.05 - September/October (2008 vol.14)
pp: 999-1014
Hyunmo Kang, University of Maryland, College Park
Lise Getoor, University of Maryland at College Park, College Park
Ben Shneiderman, University of Maryland at College Park, College Park
Mustafa Bilgic, University of Maryland at College Park, College Park
Louis Licamele, University of Maryland at College Park, College Park
Databases often contain uncertain and imprecise references to real-world entities. Entity resolution, the process of reconciling multiple references to the same underlying real-world entity, is an important data cleaning step that must be completed before the data can be accurately visualized or analyzed. In many cases, in addition to noisy data describing entities, there is data describing the relationships among the entities. This relational data is important during entity resolution: it is useful both for the algorithms that identify database references likely to refer to the same entity and for the visual analytic tools that support the resolution process. In this paper, we introduce a novel user interface, D-Dupe, for interactive entity resolution in relational data. D-Dupe combines relational entity resolution algorithms with a novel network visualization that lets users draw on an entity's relational context when making resolution decisions. Because resolution decisions are often interdependent, D-Dupe helps users understand this complex process through animations that highlight combined inferences and a history mechanism that allows users to inspect chains of resolution decisions. An empirical study with 12 users confirmed the benefits of the relational context visualization for entity resolution tasks in relational data, in terms of task time as well as users' confidence and satisfaction.
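The idea of using relational context alongside noisy attribute data can be illustrated with a minimal sketch. This is not the algorithm used in D-Dupe; it is an assumed, simplified scoring scheme in which attribute similarity is a string ratio from Python's standard `difflib`, relational similarity is the Jaccard overlap of two references' neighbors (for example, co-authors), and the two are mixed with a hypothetical weight `alpha`. All function names and the sample data are illustrative.

```python
from difflib import SequenceMatcher


def attribute_similarity(name_a, name_b):
    """String similarity in [0, 1] between two reference names (case-insensitive)."""
    return SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()


def relational_similarity(neighbors_a, neighbors_b):
    """Jaccard overlap of two neighbor sets; 0.0 when both sets are empty."""
    union = neighbors_a | neighbors_b
    if not union:
        return 0.0
    return len(neighbors_a & neighbors_b) / len(union)


def combined_similarity(ref_a, ref_b, neighbors, alpha=0.5):
    """Weighted mix of attribute and relational similarity.

    `alpha` is an assumed tuning weight: 0 uses names only, 1 uses
    relational context only.
    """
    attr = attribute_similarity(ref_a, ref_b)
    # Exclude the two references themselves from each other's context.
    ctx_a = neighbors.get(ref_a, set()) - {ref_b}
    ctx_b = neighbors.get(ref_b, set()) - {ref_a}
    rel = relational_similarity(ctx_a, ctx_b)
    return (1 - alpha) * attr + alpha * rel


def rank_candidate_pairs(neighbors, alpha=0.5):
    """Return all reference pairs as (score, ref_a, ref_b), best score first."""
    refs = sorted(neighbors)
    pairs = [
        (combined_similarity(a, b, neighbors, alpha), a, b)
        for i, a in enumerate(refs)
        for b in refs[i + 1:]
    ]
    return sorted(pairs, reverse=True)


if __name__ == "__main__":
    # Hypothetical co-authorship data: reference -> set of co-author references.
    coauthors = {
        "L. Getoor": {"H. Kang", "B. Shneiderman"},
        "Lise Getoor": {"H. Kang", "B. Shneiderman", "M. Bilgic"},
        "L. Licamele": {"M. Bilgic"},
    }
    for score, a, b in rank_candidate_pairs(coauthors):
        print(f"{score:.3f}  {a}  <->  {b}")
```

In an interactive setting, a ranked list like this would only propose candidate duplicates; the decision to merge stays with the user, who can inspect the shared neighbors behind each score.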
User interfaces, Human-centered computing, Graphical user interfaces, User-centered design, Information visualization
Hyunmo Kang, Lise Getoor, Ben Shneiderman, Mustafa Bilgic, Louis Licamele, "Interactive Entity Resolution in Relational Data: A Visual Analytic Tool and Its Evaluation", IEEE Transactions on Visualization & Computer Graphics, vol.14, no. 5, pp. 999-1014, September/October 2008, doi:10.1109/TVCG.2008.55