Issue No. 03 - March (2009 vol. 31)
Jorge Silva, Duke University, Durham
Rebecca Willett, Duke University, Durham
This paper addresses the problem of detecting anomalous multivariate co-occurrences using a limited number of unlabeled training observations. A novel method based on a hypergraph representation of the data is proposed to deal with this very high-dimensional problem. Hypergraphs are an important extension of graphs in which an edge can connect more than two vertices simultaneously. A variational Expectation-Maximization algorithm is presented for detecting anomalies directly on the hypergraph domain, without any feature selection or dimensionality reduction. The resulting estimate can be used to calculate a measure of anomalousness based on the False Discovery Rate. The algorithm has $O(np)$ computational complexity, where $n$ is the number of training observations and $p$ is the number of potential participants in each co-occurrence event. This efficiency makes the method well suited to very high-dimensional settings, and the method requires no tuning, bandwidth, or regularization parameters. The proposed approach is validated on both high-dimensional synthetic data and the Enron email database, where $p > 75{,}000$, and it is shown that it can outperform other state-of-the-art methods.
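As a rough illustration of the hypergraph representation mentioned in the abstract, each co-occurrence event can be stored as a hyperedge, that is, a set of vertex indices of arbitrary size. The sketch below is not the authors' algorithm: it covers only the data representation, not the variational Expectation-Maximization step or the False Discovery Rate measure. The function names `to_hyperedges` and `vertex_degrees` and the toy participant labels are hypothetical, chosen for illustration.

```python
from collections import Counter


def to_hyperedges(events):
    """Map each co-occurrence event to a hyperedge.

    `events` is an iterable of iterables of participant labels.
    Returns (vertex_id, hyperedges): a dict from label to integer
    vertex index, and a list of frozensets of vertex indices.
    """
    vertex_id = {}
    hyperedges = []
    for event in events:
        # Assign the next free index the first time a label is seen.
        edge = frozenset(vertex_id.setdefault(v, len(vertex_id)) for v in event)
        hyperedges.append(edge)
    return vertex_id, hyperedges


def vertex_degrees(hyperedges):
    """Count how many hyperedges each vertex participates in."""
    return Counter(v for edge in hyperedges for v in edge)


# Toy data (hypothetical): each inner list is one co-occurrence event,
# e.g. the set of people appearing together on one email.
events = [
    ["alice", "bob"],
    ["alice", "bob", "carol"],
    ["dave"],
    ["alice", "bob"],
]

vertex_id, hyperedges = to_hyperedges(events)
degrees = vertex_degrees(hyperedges)
```

Storing hyperedges as sets of indices, rather than as dense binary vectors of length $p$, keeps memory proportional to the number of participants actually observed in each event, which matters when $p$ is in the tens of thousands.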
Anomaly detection, Co-occurrence data, Unsupervised learning, Variational methods, False Discovery Rate
J. Silva and R. Willett, "Hypergraph-Based Anomaly Detection of High-Dimensional Co-Occurrences," in IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 31, no. 3, pp. 563-569, 2009.