Fuzzy Orders-of-Magnitude-Based Link Analysis for Qualitative Alias Detection
April 2012 (vol. 24 no. 4)
pp. 649-664
Qiang Shen, Aberystwyth University, Ceredigion
Tossapon Boongoen, Royal Thai Air Force Academy, Bangkok
Alias detection has been studied extensively in several application domains, especially intelligence data analysis. Many early methods rely on text-based measures, which are ineffective when terrorists give false names, dates of birth, and addresses. This barrier may be overcome by using the link information present in relationships among objects of interest. Several numerical link-based similarity techniques have proven effective for identifying similar objects in the Internet and publication domains. However, exceptional cases with unduly high measures often lead these methods to generate inaccurate similarity descriptions. Moreover, they are either computationally inefficient or ineffective for alias detection when based on a single-property model. This paper presents a novel orders-of-magnitude-based similarity measure that integrates multiple link properties to refine the estimation process and derive semantically rich similarity descriptions. The approach is based on order-of-magnitude reasoning, blended with fuzzy set theory to provide quantitative semantics for the descriptors and their unambiguous mathematical manipulation. With such an explanatory formalism, analysts can validate the generated results and partly resolve the problem of false positives. Its computing-with-words capability also allows coherent interpretation and communication within a decision-making group. The performance of the approach is evaluated on a terrorism-related data set, with further generalization to publication and email data collections.
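
As a rough illustration of the general idea of turning a numerical link-based similarity into linguistic descriptors, the following Python sketch may help. It is not the measure proposed in the paper: the shared-neighbor ratio (Jaccard coefficient) stands in for a link property, and the labels "small", "medium", and "large", together with their triangular membership functions, are assumptions chosen only for this example.

```python
from typing import Dict, Set, Tuple

# Hypothetical link data: each entity maps to the set of entities it is linked to.
Graph = Dict[str, Set[str]]


def shared_neighbor_ratio(graph: Graph, a: str, b: str) -> float:
    """Jaccard coefficient of the neighbor sets of a and b (a stand-in link property)."""
    na, nb = graph.get(a, set()), graph.get(b, set())
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def triangular(x: float, left: float, peak: float, right: float) -> float:
    """Membership degree of x in a triangular fuzzy set (left, peak, right)."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Assumed linguistic labels over [0, 1]; the shapes are illustrative only.
LABELS: Dict[str, Tuple[float, float, float]] = {
    "small": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "large": (0.5, 1.0, 1.5),
}


def describe(similarity: float) -> Dict[str, float]:
    """Map a numerical similarity to membership degrees of each label."""
    return {label: triangular(similarity, *abc) for label, abc in LABELS.items()}


if __name__ == "__main__":
    # Two name records that share two of four distinct linked entities.
    graph: Graph = {"name_a": {"x", "y", "z"}, "name_b": {"x", "y", "w"}}
    sim = shared_neighbor_ratio(graph, "name_a", "name_b")
    print(sim, describe(sim))
```

A similarity that falls between two label peaks belongs partly to both labels, which is what lets an analyst read the result as a graded qualitative description rather than a bare number.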

Index Terms:
Orders-of-magnitude reasoning, fuzzy set, link analysis, similarity measure, alias detection, intelligence data.
Citation:
Qiang Shen, Tossapon Boongoen, "Fuzzy Orders-of-Magnitude-Based Link Analysis for Qualitative Alias Detection," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 4, pp. 649-664, April 2012, doi:10.1109/TKDE.2010.255