Issue No.08 - August (2008 vol.20)
pp: 1039-1052
ABSTRACT
An important problem in the area of homeland security is to identify suspicious entities in large datasets. Although there are methods from knowledge discovery and data mining (KDD) that focus on finding anomalies in numerical datasets, there has been little work aimed at discovering suspicious instances in large and complex semantic graphs whose nodes are richly connected by many different types of links. In this paper, we describe a novel, domain-independent, and unsupervised framework to identify such instances. Besides discovering suspicious instances, we believe that to complete the process, a system has to convince its users by providing understandable explanations for its findings. Therefore, in the second part of the paper, we describe several explanation mechanisms that automatically generate human-understandable explanations for the discovered results. To evaluate our discovery and explanation systems, we perform experiments on several different semantic graphs. The results show that our discovery system outperforms, by a large margin, the state-of-the-art unsupervised network algorithms used to analyze the 9/11 terrorist network. Additionally, the human study we conducted demonstrates that our explanation system, which provides natural language explanations for its findings, allowed human subjects to perform complex data analysis much more efficiently and accurately.
INDEX TERMS
Data mining, Natural language, Natural Language Processing, Security, Semantic networks, Graphs and networks
CITATION
Shou-de Lin, Hans Chalupsky, "Discovering and Explaining Abnormal Nodes in Semantic Graphs", IEEE Transactions on Knowledge & Data Engineering, vol. 20, no. 8, pp. 1039-1052, August 2008, doi:10.1109/TKDE.2007.190691