loading...
 This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
To Select or To Weigh: A Comparative Study of Linear Combination Schemes for SuperParent-One-Dependence Estimators
December 2007 (vol. 19 no. 12)
pp. 1652-1665
We conduct a large-scale comparative study on linearly combining superparent-one-dependence estimators (SPODEs), a popular family of semi-naive Bayesian classifiers. Altogether 16 model selection and weighing schemes, 58 benchmark data sets, as well as various statistical tests are employed. This paper?s main contributions are three-fold. First, it formally presents each scheme?s definition, rationale and time complexity; and hence can serve as a comprehensive reference for researchers interested in ensemble learning. Second, it offers bias-variance analysis for each scheme?s classification error performance. Third, it identifies effective schemes that meet various needs in practice. This leads to accurate and fast classification algorithms with immediate and significant impact on real-world applications. Another important feature of our study is using a variety of statistical tests to evaluate multiple learning methods across multiple data sets.

[1] G. Brown, “The Ensemble Learning Bibliography,” http://www.cs.man.ac.uk/~gbrownensemblebib /, 2006.
[2] A.C. Achilles, “The Collection of Computer Science Bibliographies,” http://liinwww.ira.uka.de/bibliography/Neural EnsembleLearning.html, 2006.
[3] R. Caruana, A. Niculescu, G. Crew, and A. Ksikes, “Ensemble Selection from Libraries of Models,” Proc. 21st Int'l Conf. Machine Learning, 2004.
[4] L. Kuncheva, “Switching between Selection and Fusion in Combining Classifiers: An Experiment,” IEEE Trans. Systems, Man, and Cybernetics, vol. 32, no. 2, pp. 146-156, 2002.
[5] R.E. Banfield, L.O. Hall, K.W. Bowyer, and W.P. Kegelmeyer, “A Comparison of Decision Tree Ensemble Creation Techniques,” IEEE Trans. Pattern Analysis and Machine Learning, vol. 29, no. 1, pp. 173-180, Jan. 2007.
[6] E.J. Keogh and M.J. Pazzani, “Learning Augmented Bayesian Classifiers: A Comparison of Distribution-Based and Classification-Based Approaches,” Proc. Int'l Workshop Artificial Intelligence and Statistics, pp. 225-230, 1999.
[7] E.J. Keogh and M.J. Pazzani, “Learning the Structure of Augmented Bayesian Classifiers,” Int'l J. Artificial Intelligence Tools, vol. 11, no. 40, pp. 587-601, 2002.
[8] B. Cestnik, I. Kononenko, and I. Bratko, “Assistant 86: A Knowledge-Elicitation Tool for Sophisticated Users,” Proc. Second European Working Session on Learning (ESWL '87), pp. 31-45, 1987.
[9] P. Clark and T. Niblett, “The CN2 Induction Algorithm,” Machine Learning, vol. 3, pp. 261-283, 1989.
[10] B. Cestnik, “Estimating Probabilities: A Crucial Task in Machine Learning,” Proc. Ninth European Conf. Artificial Intelligence (ECAI '90), pp. 147-149, 1990.
[11] I. Kononenko, “Comparison of Inductive and Naive Bayesian Learning Approaches to Automatic Knowledge Acquisition,” Current Trends in Knowledge Acquisition, chapter, IOS Press, 1990.
[12] P. Langley, W. Iba, and K. Thompson, “An Analysis of Bayesian Classifiers,” Proc. 10th Nat'l Conf. Artificial Intelligence (AAAI '92), pp. 223-228, 1992.
[13] P. Domingos and M.J. Pazzani, “Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier,” Proc. 13th Int'l Conf. Machine Learning (ICML '96), pp. 105-112, 1996.
[14] P. Domingos and M.J. Pazzani, “On the Optimality of the Simple Bayesian Classifier under Zero-One Loss,” Machine Learning, vol. 29, pp. 103-130, 1997.
[15] H. Zhang, C.X. Ling, and Z. Zhao, “The Learnability of Naive Bayes,” Proc. Canadian Artificial Intelligence Conf., pp. 432-441, 2000.
[16] N. Friedman, D. Geiger, and M. Goldszmidt, “Bayesian Network Classifiers,” Machine Learning, vol. 29, no. 2, pp. 131-163, 1997.
[17] J. Kittler, “Feature Selection and Extraction,” Handbook of Pattern Recognition and Image Processing, T.Y. Young and K.S. Fu, eds., 1986.
[18] R. Kohavi, “Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid,” Proc. Second ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '96), pp. 202-207, 1996.
[19] I. Kononenko, “Semi-Naive Bayesian Classifier,” Proc. European Working Session on Machine Learning (EWSL '91), pp. 206-219, 1991.
[20] P. Langley, “Induction of Recursive Bayesian Classifiers,” Proc. European Conf. Machine Learning (ECML '93), pp. 153-164, 1993.
[21] P. Langley and S. Sage, “Induction of Selective Bayesian Classifiers,” Proc. 10th Ann. Conf. Uncertainty in Artificial Intelligence (UAI '94), pp. 399-406, 1994.
[22] M.J. Pazzani, “Constructive Induction of Cartesian Product Attributes,” ISIS: Information, Statistics and Induction in Science, pp. 66-77, 1996.
[23] M. Sahami, “Learning Limited Dependence Bayesian Classifiers,” Proc. Second ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '96), pp. 334-338, 1996.
[24] M. Singh and G.M. Provan, “Efficient Learning of Selective Bayesian Network Classifiers,” Proc. 13th Int'l Conf. Machine Learning (ICML '96), pp. 453-461, 1996.
[25] Z. Xie, W. Hsu, Z. Liu, and M.L. Lee, “SNNB: A Selective Neighborhood Based Naive Bayes for Lazy Learning,” Proc. Sixth Pacific-Asia Conf. Advances in Knowledge Discovery and Data Mining (PAKDD '02), pp. 104-114, 2002.
[26] Z. Zheng and G.I. Webb, “Lazy Learning of Bayesian Rules,” Machine Learning, vol. 41, no. 1, pp. 53-84, 2000.
[27] Z. Zheng, G.I. Webb, and K.M. Ting, “Lazy Bayesian Rules: A Lazy Semi-Naive Bayesian Learning Technique Competitive to Boosting Decision Trees,” Proc. 16th Int'l Conf. Machine Learning (ICML '99), pp. 493-502, 1999.
[28] G.I. Webb, “Candidate Elimination Criteria for Lazy Bayesian Rules,” Proc. 14th Australian Joint Conf. Artificial Intelligence, pp.545-556, 2001.
[29] G.I. Webb and M.J. Pazzani, “Adjusted Probability Naive Bayesian Induction,” Proc. 11th Australian Joint Conf. Artificial Intelligence, pp. 285-295, 1998.
[30] G.I. Webb, J. Boughton, and Z. Wang, “Not So Naive Bayes: Aggregating One-Dependence Estimators,” Machine Learning, vol. 58, no. 1, pp. 5-24, Jan. 2005.
[31] J. Cerquides and R.L. de Mántaras, “Robust Bayesian Linear Classifier Ensembles,” Proc. 16th European Conf. Machine Learning (ECML '05), pp. 72-83, 2005.
[32] L. De Ferrari, “Mining Housekeeping Genes with a Naive Bayes Classifier,” MSc thesis, School of Informatics, Univ. of Edinburgh, 2005.
[33] K. Flikka, L. Martens, J. Vandekerckhove, K. Gevaert, and I. Eidhammeri, “Improving Throughput and Reliability of Peptide Identifications through Spectrum Quality Evaluation,” Proc. Ninth Ann. Int'l Conf. Research in Computational Molecular Biology (RECOMB '05), 2005.
[34] A.P. Nikora, “Classifying Requirements: Towards a More Rigorous Analysis of Natural-Language Specifications,” Proc. 16th IEEE Int'l Symp. Software Reliability Eng. (ISSRE '05), pp. 291-300, 2005.
[35] Y. Yang, K. Korb, K.M. Ting, and G.I. Webb, “Ensemble Selection for Superparent-One-Dependence Estimators,” Proc. 18th Australian Joint Conf. Artificial Intelligence, 2005.
[36] F. Zheng and G.I. Webb, “A Comparative Study of Semi-Naive Bayes Methods in Classification Learning,” Proc. Fourth Australasian Data Mining Conf. (AusDM '05), pp. 141-156, 2005.
[37] S.B. Kotsiantis and P.E. Pintelas, “Logitboost of Simple Bayesian Classifier,” Informatica, vol. 29, pp. 53-59, 2005.
[38] H. Zhang, L. Jiang, and J. Su, “Hidden Naive Bayes,” Proc. 20th Nat'l Conf. Artificial Intelligence (AAAI '05), pp. 919-924, 2005.
[39] F. Birzele and S. Kramer, “A New Representation for Protein Secondary Structure Prediction Based on Frequent Patterns,” Bioinformatics, vol. 22, no. 21, pp. 2628-2634, 2006.
[40] J. Su and H. Zhang, “Representing Conditional Independence Using Decision Trees,” Proc. 20th Nat'l Conf. Artificial Intelligence (AAAI '05)), pp. 874-879, 2005.
[41] H. Zhang, L. Jiang, and J. Su, “Augmenting Naive Bayes for Ranking,” Proc. 22nd Int'l Conf. Machine Learning (ICML '05), pp.1020-1027, 2005.
[42] J. Abellan, “Application of Uncertainty Measures on Credal Sets on the Naive Bayesian Classifier,” Int'l J. General Systems, vol. 35, no. 6, pp. 675-686, 2006.
[43] H. Yin, G. Li, T.Y. Leong, V. Kuralmani, H. Pang, B.T. Ang, K.K. Lee, and I. Ng, “Experimental Analysis on Severe Head Injury Outcome Prediction—A Preliminary Study,” Technical Report TRD9/06, School of Computing, Nat'l Univ. of Singapore.
[44] F. Zheng and G.I. Webb, “Efficient Lazy Elimination for Averaged One-Dependence Estimators,” Proc. 23rd Int'l Conf. Machine Learning (ICML '06), pp. 1113-1120, 2006.
[45] D. Zeng, S. Zhang, Z. Cai, S. Jiang, and L. Jiang, “A Novel One-dependence Estimator Based on Multi-Parents,” Proc. Sixth Int'l Conf. Intelligent Systems Design and Applications (ISDA '06), pp. 639-643, 2006.
[46] M. Ceci, “Naive Bayesian Learning from Structural Data,” PhD dissertation, Dipartimento di Informatica, Univ. of Bari, Italy, 2005.
[47] Y. Yang, G. Webb, J. Cerquides, K. Korb, J. Boughton, and K.M. Ting, “To Select or to Weigh: A Comparative Study of Model Selection and Model Weighing for SPODE Ensembles,” Proc. 17th European Conf. Machine Learning (ECML '06), pp. 533-544, 2006.
[48] H. Akaike, “A New Look at the Statistical Model Identification,” IEEE Trans. Automatic Control, vol. 19, pp. 716-723, 1974.
[49] G. Schwarz, “Estimating the Dimension of a Model,” Annals of Statistics, vol. 6, pp. 461-465, 1978.
[50] J. Suzuki, “Learning Bayesian Belief Networks Based on the MDL Principle: An Efficient Algorithm Using the Branch and Bound Technique,” Proc. 13th Int'l Conf. Machine Learning (ICML '96), pp.463-470, 1996.
[51] K. Korb and A. Nicholson, Bayesian Artificial Intelligence. Chapman & Hall/CRC, 2004.
[52] G.F. Cooper and E. Herskovits, “A Bayesian Method for the Induction of Probabilistic Networks from Data,” Machine Learning, vol. 9, pp. 309-347, 1992.
[53] C.E. Shannon, “A Mathematical Theory of Communication,” Bell System Technical J., vol. 27, no. 3, pp. 379-423, 1948.
[54] J.A. Hoeting, D. Madigan, A.E. Raftery, and C.T. Volinsky, “Bayesian Model Averaging: A Tutorial,” Statistical Science, vol. 14, no. 4, pp. 382-417, 1999.
[55] P. Domingos, “Bayesian Averaging of Classifiers and the Overfitting Problem,” Proc. 17th Int'l Conf. Machine Learning (ICML '00), pp. 223-230, 2000.
[56] P. Pedregal, “Introduction to Optimization,” Texts in Applied Math., Springer, vol. 46, 2004.
[57] M.T. Heath, Scientific Computing: An Introductory Survey, second ed. McGraw-Hill, 2002.
[58] G. McLachlan and T. Krishnan, The EM Algorithm and Extensions. Wiley-Interscience, 1997.
[59] I.H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, second ed. Morgan Kaufmann, 2005.
[60] C.L. Blake and C.J. Merz, “UCI Repository of Machine Learning Databases,” Dept. of Information and Computer Science, Univ. of California, Irvine, http://www.ics.uci.edu/~mlearnmlrepository.html , 1998.
[61] G.I. Webb, “Multiboosting: A Technique for Combining Boosting and Wagging,” Machine Learning, vol. 40, no. 2, pp. 159-196, 2000.
[62] L. Breiman, “Bias, Variance and Arcing Classifiers,” Technical Report 460, Statistics Dept., Univ. of California, Berkeley, 1996.
[63] J.H. Friedman, “On Bias, Variance, 0/1-Loss, and the Curse-of-Dimensionality,” Data Mining and Knowledge Discovery, vol. 1, no. 1, pp. 55-77, 1997.
[64] R. Kohavi and D. Wolpert, “Bias Plus Variance Decomposition for Zero-One Loss Functions,” Proc. 13th Int'l Conf. Machine Learning (ICML '96), pp. 275-283, 1996.
[65] E.B. Kong and T.G. Dietterich, “Error-Correcting Output Coding Corrects Bias and Variance,” Proc. 12th Int'l Conf. Machine Learning (ICML '95), pp. 313-321, 1995.
[66] D.S. Moore and G.P. McCabe, Introduction to the Practice of Statistics, fourth ed. Michelle Julet, 2002.
[67] M. Friedman, “The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of Variance,” J. Am. Statistical Assoc., vol. 32, pp. 675-701, 1937.
[68] M. Friedman, “A Comparison of Alternative Tests of Significance for the Problem of m Rankings,” Annals of Math. Statistics, vol. 11, pp. 86-92, 1940.
[69] J. Demsar, “Statistical Comparisons of Classifiers over Multiple Data Sets,” J. Machine Learning Research, vol. 7, pp. 1-30, 2006.
[70] T. Silander and P. Myllymaki, “A Simple Approach for Finding the Globally Optimal Bayesian Network Structure,” Proc. 22nd Ann. Conf. Uncertainty in Artificial Intelligence (UAI '06), 2006.
[71] N. Friedman and M. Goldszmidt, “Learning Bayesian Networks with Local Structure,” Proc. 12th Ann. Conf. Uncertainty in Artificial Intelligence, pp. 252-262, 1996.
[72] R. O'Donnell, “Learning Flexible Causal Models with MML,” PhD dissertation, Monash Univ., 2007.

Index Terms:
Machine learning, Performance evaluation of algorithms and systems
Citation:
Ying Yang, Geoffrey I. Webb, Jesus Cerquides, Kevin B. Korb, Janice Boughton, Kai Ming Ting, "To Select or To Weigh: A Comparative Study of Linear Combination Schemes for SuperParent-One-Dependence Estimators," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 12, pp. 1652-1665, Aug. 2007, doi:10.1109/TKDE.2007.190650
Usage of this product signifies your acceptance of the Terms of Use.