Higher Order Naïve Bayes: A Novel Non-IID Approach to Text Classification
July 2011 (vol. 23 no. 7)
pp. 1022-1034
Murat Can Ganiz, Dogus University, Istanbul
Cibin George, Rutgers University, Piscataway
William M. Pottenger, Rutgers University, Piscataway
The underlying assumption in traditional machine learning algorithms is that instances are Independent and Identically Distributed (IID). These critical independence assumptions made in traditional machine learning algorithms prevent them from going beyond instance boundaries to exploit latent relations between features. In this paper, we develop a general approach to supervised learning by leveraging higher order dependencies between features. We introduce a novel Bayesian framework for classification termed Higher Order Naïve Bayes (HONB). Unlike approaches that assume data instances are independent, HONB leverages higher order relations between features across different instances. The approach is validated in the classification domain on widely used benchmark data sets. Results obtained on several benchmark text corpora demonstrate that higher order approaches achieve significant improvements in classification accuracy over the baseline methods, especially when training data is scarce. A complexity analysis also reveals that the space and time complexity of HONB compare favorably with existing approaches.
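
The idea of a "higher order relation between features across different instances" can be illustrated with a small sketch. This is not the HONB algorithm itself, which the abstract does not specify. It is only a toy example, under the assumption that a second-order relation links two terms that never appear in the same document but each co-occur with a common bridging term in different documents. The function names and the sample documents are hypothetical.

```python
from itertools import combinations


def first_order_pairs(docs):
    """Unordered term pairs that appear together in at least one document."""
    pairs = set()
    for terms in docs:
        for a, b in combinations(sorted(set(terms)), 2):
            pairs.add((a, b))
    return pairs


def second_order_pairs(docs):
    """Unordered term pairs linked through a shared term in two different documents.

    Assumed definition for illustration: a -- bridge (in document i) and
    bridge -- c (in document j), with i != j and a != c.
    """
    doc_sets = [set(d) for d in docs]
    linked = set()
    for i, d1 in enumerate(doc_sets):
        for j, d2 in enumerate(doc_sets):
            if i == j:
                continue
            for bridge in d1 & d2:
                for a in d1 - {bridge}:
                    for c in d2 - {bridge}:
                        if a != c:
                            linked.add(tuple(sorted((a, c))))
    return linked


# Hypothetical toy corpus: each document is a list of terms.
docs = [["apple", "fruit"], ["fruit", "banana"], ["car", "engine"]]
first = first_order_pairs(docs)
# Keep only relations that are invisible when documents are treated independently.
second = second_order_pairs(docs) - first
print(second)
```

Here "apple" and "banana" never share a document, yet they are connected through "fruit"; a model that treats instances as independent would not see this relation.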


Index Terms:
Machine learning, statistical relational learning, naïve Bayes, text classification, IID.
Murat Can Ganiz, Cibin George, William M. Pottenger, "Higher Order Naïve Bayes: A Novel Non-IID Approach to Text Classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1022-1034, July 2011, doi:10.1109/TKDE.2010.160