Issue No. 08 - August (2008 vol. 20)
An important problem in homeland security is identifying suspicious entities in large datasets. Although methods from knowledge discovery and data mining (KDD) exist for finding anomalies in numerical datasets, little work has addressed discovering suspicious instances in large, complex semantic graphs whose nodes are richly connected by many different types of links. In this paper, we describe a novel, domain-independent, unsupervised framework for identifying such instances. Beyond discovering suspicious instances, we believe that to complete the process a system must also convince its users by providing understandable explanations of its findings. Therefore, in the second part of the paper, we describe several mechanisms that automatically generate human-understandable explanations for the discovered results. To evaluate our discovery and explanation systems, we perform experiments on several different semantic graphs. The results show that our discovery system outperforms, by a large margin, the state-of-the-art unsupervised network algorithms that were used to analyze the 9/11 terrorist network. Additionally, a human study we conducted shows that our explanation system, which provides natural language explanations for its findings, enabled human subjects to perform complex data analysis much more efficiently and accurately.
Data mining, Natural language, Natural Language Processing, Security, Semantic networks, Graphs and networks
H. Chalupsky and S. Lin, "Discovering and Explaining Abnormal Nodes in Semantic Graphs," in IEEE Transactions on Knowledge & Data Engineering, vol. 20, no. 8, pp. 1039-1052, 2008.
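The abstract does not say how the discovery framework scores nodes, so what follows is only a generic sketch of unsupervised anomaly ranking in a graph with typed links. It is not the authors' method. It assumes, purely for illustration, that each node is described by counts of its incoming and outgoing links per relation type. Nodes are then ranked by their mean distance to their k nearest peers, a standard distance-based outlier score. The toy graph, the node names, and the functions `link_profiles`, `distance`, and `knn_outlier_scores` are all hypothetical.

```python
import math
from collections import Counter


def link_profiles(edges):
    """Map each node to a Counter of typed-link features (a hypothetical feature choice).

    edges: iterable of (source, relation, target) triples.
    An outgoing link counts as "relation>", an incoming link as "<relation".
    """
    profiles = {}
    for src, rel, dst in edges:
        profiles.setdefault(src, Counter())[rel + ">"] += 1
        profiles.setdefault(dst, Counter())["<" + rel] += 1
    return profiles


def distance(a, b):
    """Euclidean distance between two sparse count vectors (a Counter returns 0 for missing keys)."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


def knn_outlier_scores(profiles, nodes, k=2):
    """Score each node in `nodes` by its mean distance to its k nearest peers in `nodes`.

    Peers are restricted to `nodes` so that entities of one type (e.g. people) are
    compared only with each other. A higher score means a more unusual link profile.
    """
    scores = {}
    for n in nodes:
        dists = sorted(distance(profiles[n], profiles[m]) for m in nodes if m != n)
        nearest = dists[:k]
        scores[n] = sum(nearest) / len(nearest) if nearest else 0.0
    return scores


# Hypothetical toy semantic graph: four employees of one company, three of whom
# know each other, and one who is linked to several offshore accounts.
edges = [
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
    ("carol", "worksFor", "acme"),
    ("dave", "worksFor", "acme"),
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "alice"),
    ("dave", "wiresMoney", "offshore1"),
    ("dave", "wiresMoney", "offshore2"),
    ("dave", "wiresMoney", "offshore3"),
]
people = ["alice", "bob", "carol", "dave"]
scores = knn_outlier_scores(link_profiles(edges), people)

# Print people from most to least unusual.
for person in sorted(people, key=scores.get, reverse=True):
    print(person, round(scores[person], 3))
```

In this toy graph, `alice`, `bob`, and `carol` have identical profiles and score 0. `dave` is the only person with `wiresMoney` links, so he scores sqrt(11), about 3.317. A score like this flags a node but does not say why it is unusual. Turning it into a readable reason is the explanation problem that the abstract's second part addresses.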