DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2013.33
Massimiliano Albanese, George Mason University, Fairfax
Cristian Molinaro, University of Maryland, College Park
Fabio Persia, Universita di Napoli, Napoli
Antonio Picariello, Universita di Napoli, Napoli
V. S. Subrahmanian, University of Maryland, College Park
There are numerous applications where we want to discover unexpected activities in a sequence of time-stamped observation data; for instance, we may want to detect inexplicable events in transactions at a web site or in video surveillance of an airport tarmac. In this paper, we start with a known set A of activities (both innocuous and dangerous) that we wish to monitor. In addition, we wish to identify "unexplained" subsequences of a sequence of observations, that is, subsequences that are poorly explained by A (e.g., because they may contain occurrences of activities that have never been seen or anticipated before, i.e., activities that are not in A). We formally define the probability that a sequence of observations is totally or partially unexplained w.r.t. A. We develop efficient algorithms to identify the top-k Totally and Partially Unexplained Sequences w.r.t. A. These algorithms leverage a set of theorems that enable us to speed up the search for totally/partially unexplained sequences. We describe experiments using real-world datasets in the video and cyber security domains showing that our approach works well in practice in terms of both running time and accuracy.
Knowledge base management, Computing Methodologies, Artificial Intelligence, Knowledge Representation Formalisms and Methods
A. Picariello, F. Persia, C. Molinaro, M. Albanese and V. S. Subrahmanian, "Discovering the Top-k 'Unexplained' Sequences in Time-Stamped Observation Data," in IEEE Transactions on Knowledge & Data Engineering.