Issue No.03 - March (2009 vol.31)
pp: 563-569
Jorge Silva, Duke University, Durham
Rebecca Willett, Duke University, Durham
ABSTRACT
This paper addresses the problem of detecting anomalous multivariate co-occurrences using a limited number of unlabeled training observations. A novel method based on a hypergraph representation of the data is proposed to deal with this very high-dimensional problem. Hypergraphs are an important extension of graphs in which an edge may connect more than two vertices simultaneously. A variational Expectation-Maximization algorithm is presented that detects anomalies directly on the hypergraph domain, without any feature selection or dimensionality reduction. The resulting estimate can be used to calculate a measure of anomalousness based on the False Discovery Rate. The algorithm has $O(np)$ computational complexity, where $n$ is the number of training observations and $p$ is the number of potential participants in each co-occurrence event. This efficiency makes the method well suited to very high-dimensional settings, and the method requires no tuning, bandwidth, or regularization parameters. The proposed approach is validated on both high-dimensional synthetic data and the Enron email database, where $p > 75,000$, and it is shown that it can outperform other state-of-the-art methods.
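As a rough illustration of the data representation described above, the following Python sketch encodes each co-occurrence event as a hyperedge (a set of participating vertices) and scores new events under a simple independent-Bernoulli baseline. This baseline is not the paper's variational EM algorithm or its False Discovery Rate measure; it is a simplification chosen only to show why one pass over the data costs on the order of $np$ operations. All function names, the toy data, and the smoothing constant are assumptions introduced for the example.

```python
from math import log

def incidence_rows(hyperedges, p):
    """Encode each hyperedge (a set of vertex ids) as a length-p 0/1 row."""
    rows = []
    for edge in hyperedges:
        row = [0] * p
        for v in edge:
            row[v] = 1
        rows.append(row)
    return rows

def vertex_frequencies(rows, p, smoothing=1.0):
    """Estimate each vertex's participation probability in O(n * p) time.

    The additive smoothing (an assumption of this sketch) keeps every
    estimate strictly between 0 and 1 so that logarithms stay finite.
    """
    n = len(rows)
    counts = [0] * p
    for row in rows:
        for j, x in enumerate(row):
            counts[j] += x
    return [(c + smoothing) / (n + 2 * smoothing) for c in counts]

def log_likelihood(row, theta):
    """Log-probability of one event if vertices participate independently."""
    return sum(log(t) if x else log(1.0 - t) for x, t in zip(row, theta))

# Toy training set: 6 co-occurrence events over p = 5 potential participants.
hyperedges = [{0, 1}, {0, 1, 2}, {0, 1}, {1, 2}, {0, 1}, {0, 2}]
p = 5
rows = incidence_rows(hyperedges, p)
theta = vertex_frequencies(rows, p)

# An event resembling the training data versus one involving unseen vertices.
typical = log_likelihood([1, 1, 0, 0, 0], theta)
unusual = log_likelihood([0, 0, 0, 1, 1], theta)
print(typical > unusual)
```

A lower log-likelihood marks an event as less consistent with the training data. Turning such a score into a calibrated anomaly measure, as the abstract describes, would require the mixture model and False Discovery Rate machinery that this sketch omits.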
INDEX TERMS
Anomaly detection, Co-occurrence data, Unsupervised learning, Variational methods, False Discovery Rate
CITATION
Jorge Silva, Rebecca Willett, "Hypergraph-Based Anomaly Detection of High-Dimensional Co-Occurrences", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 31, no. 3, pp. 563-569, March 2009, doi:10.1109/TPAMI.2008.232