Rotation Forest: A New Classifier Ensemble Method
October 2006 (vol. 28 no. 10)
pp. 1619-1630
We propose a method for generating classifier ensembles based on feature extraction. To create the training data for a base classifier, the feature set is randomly split into K subsets (K is a parameter of the algorithm) and Principal Component Analysis (PCA) is applied to each subset. All principal components are retained in order to preserve the variability information in the data. Thus, K axis rotations take place to form the new features for a base classifier. The idea of the rotation approach is to encourage individual accuracy and diversity within the ensemble simultaneously. Diversity is promoted through the feature extraction for each base classifier. Decision trees were chosen here because they are sensitive to rotation of the feature axes, hence the name "forest." Accuracy is sought by keeping all principal components and also by using the whole data set to train each base classifier. Using WEKA, we examined the Rotation Forest ensemble on a random selection of 33 benchmark data sets from the UCI repository and compared it with Bagging, AdaBoost, and Random Forest. The results were favorable to Rotation Forest and prompted an investigation into the diversity-accuracy landscape of the ensemble models. Diversity-error diagrams revealed that Rotation Forest ensembles construct individual classifiers which are more accurate than those in AdaBoost and Random Forest, and more diverse than those in Bagging, sometimes more accurate as well.
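The core construction described in the abstract (randomly split the features into K subsets, run PCA on each, keep all components, and assemble the loadings into one rotation matrix per base classifier) can be sketched in NumPy. This is a simplified illustration only, not the authors' implementation: it omits the paper's bootstrap/class-subset sampling before PCA, and the function name `build_rotation_matrix` is a label chosen here, not from the paper.

```python
import numpy as np

def build_rotation_matrix(X, K, rng):
    """Sketch of one Rotation Forest rotation matrix.

    The d features are randomly split into K subsets; PCA (via eigen-
    decomposition of each subset's covariance matrix) is run per subset
    and ALL principal components are retained, as in the paper. The
    per-subset loadings are placed into a permuted block-diagonal
    matrix R so that X @ R yields the rotated training data for one
    base decision tree.
    """
    n, d = X.shape
    perm = rng.permutation(d)            # random split of the feature set
    subsets = np.array_split(perm, K)
    R = np.zeros((d, d))
    for idx in subsets:
        Xi = X[:, idx]
        Xi = Xi - Xi.mean(axis=0)        # center before PCA
        cov = np.atleast_2d(np.cov(Xi, rowvar=False))
        _, vecs = np.linalg.eigh(cov)    # columns = principal directions
        # place this subset's loadings at its own feature indices
        for a, fa in enumerate(idx):
            for b, fb in enumerate(idx):
                R[fa, fb] = vecs[a, b]
    return R

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
R = build_rotation_matrix(X, K=3, rng=rng)
X_rot = X @ R   # rotated features; a decision tree would be trained on X_rot
```

Because each PCA block is orthonormal, the assembled matrix R is itself orthogonal, so the transformation is a pure rotation of the feature axes; repeating this per ensemble member (with a fresh random split) is what injects diversity while the full data set and full component set preserve individual accuracy.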

[1] E.L. Allwein, R.E. Schapire, and Y. Singer, “Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers,” J. Machine Learning Research, vol. 1, pp. 113-141, 2000.
[2] R.E. Banfield, L.O. Hall, K.W. Bowyer, D. Bhadoria, W.P. Kegelmeyer, and S. Eschrich, “A Comparison of Ensemble Creation Techniques,” Proc. Fifth Int'l Workshop Multiple Classifier Systems (MCS '04), 2004.
[3] E. Bauer and R. Kohavi, “An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants,” Machine Learning, vol. 36, nos. 1-2, pp. 105-139, 1999.
[4] C.L. Blake and C.J. Merz, “UCI Repository of Machine Learning Databases,” 1998, http://www.ics.uci.edu/~mlearn/MLRepository.html .
[5] L. Breiman, “Bagging Predictors,” Machine Learning, vol. 24, no. 2, pp. 123-140, 1996.
[6] L. Breiman, “Arcing Classifiers,” Annals of Statistics, vol. 26, no. 3, pp. 801-849, 1998.
[7] L. Breiman, “Random Forests,” Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
[8] T.G. Dietterich, “Ensemble Methods in Machine Learning,” Proc. Conf. Multiple Classifier Systems, pp. 1-15, 2000.
[9] X.Z. Fern and C.E. Brodley, “Random Projection for High Dimensional Data Clustering: A Cluster Ensemble Approach,” Proc. 20th Int'l Conf. Machine Learning (ICML), pp. 186-193, 2003.
[10] J.L. Fleiss, Statistical Methods for Rates and Proportions. John Wiley and Sons, 1981.
[11] D.H. Foley and J.W. Sammon, “An Optimal Set of Discriminant Vectors,” IEEE Trans. Computers, vol. 24, no. 3, pp. 281-289, Mar. 1975.
[12] Y. Freund and R.E. Schapire, “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting,” J. Computer and System Sciences, vol. 55, no. 1, pp. 119-139, 1997.
[13] J. Friedman, T. Hastie, and R. Tibshirani, “Additive Logistic Regression: A Statistical View of Boosting,” Annals of Statistics, vol. 28, no. 2, pp. 337-374, 2000.
[14] K. Fukunaga and W.L.G. Koontz, “Application of the Karhunen-Loeve Expansion to Feature Selection and Ordering,” IEEE Trans. Computers, vol. 19, no. 4, pp. 311-318, Apr. 1970.
[15] J. Han and M. Kamber, Data Mining: Concepts and Techniques. Morgan Kaufmann, 2001.
[16] L.K. Hansen and P. Salamon, “Neural Network Ensembles,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 12, no. 10, pp. 993-1001, Oct. 1990.
[17] T.K. Ho, “The Random Subspace Method for Constructing Decision Forests,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 8, pp. 832-844, Aug. 1998.
[18] T.K. Ho, “A Data Complexity Analysis of Comparative Advantages of Decision Forest Constructors,” Pattern Analysis and Applications, vol. 5, pp. 102-112, 2002.
[19] J. Kittler and P.C. Young, “A New Approach to Feature Selection Based on the Karhunen-Loeve Expansion,” Pattern Recognition, vol. 5, no. 4, pp. 335-352, Dec. 1973.
[20] S. Kolenikov and G. Angeles, “The Use of Discrete Data in PCA: Theory, Simulations, and Applications to Socioeconomic Indices,” Proc. 2004 Joint Statistical Meeting, 2004.
[21] L.I. Kuncheva, Combining Pattern Classifiers. Methods and Algorithms. John Wiley and Sons, 2004.
[22] L.I. Kuncheva and C.J. Whitaker, “Measures of Diversity in Classifier Ensembles,” Machine Learning, vol. 51, pp. 181-207, 2003.
[23] L.I. Kuncheva, “Diversity in Multiple Classifier Systems (editorial),” Information Fusion, vol. 6, no. 1, pp. 3-4, 2004.
[24] L.I. Kuncheva, C.J. Whitaker, C.A. Shipp, and R.P.W. Duin, “Is Independence Good for Combining Classifiers?” Proc. 15th Int'l Conf. Pattern Recognition, vol. 2, pp. 169-171, 2000.
[25] P.M. Long and V.B. Vega, “Boosting and Microarray Data,” Machine Learning, vol. 52, pp. 31-44, 2003.
[26] D.D. Margineantu and T.G. Dietterich, “Pruning Adaptive Boosting,” Proc. 14th Int'l Conf. Machine Learning, pp. 211-218, 1997.
[27] L. Mason, P.L. Bartlet, and J. Baxter, “Improved Generalization through Explicit Optimization of Margins,” Machine Learning, vol. 38, no. 3, pp. 243-255, 2000.
[28] P. Melville, N. Shah, L. Mihalkova, and R.J. Mooney, “Experiments with Ensembles with Missing and Noisy Data,” Proc. Fifth Int'l Workshop Multiple Classifier Systems, pp. 293-302, 2004.
[29] C. Nadeau and Y. Bengio, “Inference for the Generalization Error,” Machine Learning, vol. 52, pp. 239-281, 2003.
[30] N.C. Oza, “Boosting with Averaged Weight Vectors,” Proc. Fourth Int'l Workshop Multiple Classifier Systems (MCS 2003), 2003.
[31] Proc. Sixth Int'l Workshop Multiple Classifier Systems (MCS 2005), N.C. Oza et al., eds., 2005.
[32] J.R. Quinlan, C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[33] Proc. First Int'l Workshop Multiple Classifier Systems (MCS 2000), F. Roli and J. Kittler, eds., 2001.
[34] Proc. Second Int'l Workshop Multiple Classifier Systems (MCS 2001), F. Roli and J. Kittler, eds., 2001.
[35] Proc. Third Int'l Workshop Multiple Classifier Systems (MCS 2002), F. Roli and J. Kittler, eds., 2002.
[36] Proc. Fifth Int'l Workshop Multiple Classifier Systems (MCS 2004), F. Roli et al., eds., 2004.
[37] R.E. Schapire, “Theoretical Views of Boosting,” Proc. Fourth European Conf. Computational Learning Theory, pp. 1-10, 1999.
[38] R.E. Schapire, “The Boosting Approach to Machine Learning: An Overview,” Proc. MSRI Workshop Nonlinear Estimation and Classification, 2002.
[39] R.E. Schapire, Y. Freund, P. Bartlett, and W.S. Lee, “Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods,” Annals of Statistics, vol. 26, no. 5, pp. 1651-1686, 1998.
[40] R.E. Schapire and Y. Singer, “Improved Boosting Algorithms Using Confidence-Rated Predictions,” Machine Learning, vol. 37, no. 3, pp. 297-336, 1999.
[41] M. Skurichina and R.P.W. Duin, “Combining Feature Subsets in Feature Selection,” Proc. Sixth Int'l Workshop Multiple Classifier Systems, (MCS '05), pp. 165-175, 2005.
[42] K. Tumer and N.C. Oza, “Input Decimated Ensembles,” Pattern Analysis Applications, vol. 6, pp. 65-77, 2003.
[43] F. van der Heijden, R.P.W. Duin, D. de Ridder, and D.M.J. Tax, Classification, Parameter Estimation and State Estimation. Wiley, 2004.
[44] A. Webb, Statistical Pattern Recognition. London: Arnold, 1999.
[45] G.I. Webb, “MultiBoosting: A Technique for Combining Boosting and Wagging,” Machine Learning, vol. 40, no. 2, pp. 159-196, 2000.
[46] Proc. Fourth Int'l Workshop Multiple Classifier Systems (MCS 2003), T. Windeatt and F. Roli, eds., 2003.
[47] I.H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, second ed. Morgan Kaufmann, 2005.

Index Terms:
Classifier ensembles, AdaBoost, bagging, random forest, feature extraction, PCA, kappa-error diagrams.
Citation:
Juan J. Rodríguez, Ludmila I. Kuncheva, Carlos J. Alonso, "Rotation Forest: A New Classifier Ensemble Method," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 10, pp. 1619-1630, Oct. 2006, doi:10.1109/TPAMI.2006.211