Issue No. 4 - April 2010 (vol. 22)
pp. 508-522
Rozita A. Dara, University of Waterloo, Waterloo
Masoud Makrehchi, University of Waterloo, Waterloo
Mohamed S. Kamel, University of Waterloo, Waterloo
ABSTRACT
Data partitioning methods such as bagging and boosting have been used extensively in multiple classifier systems and have shown great potential for improving classification accuracy. This study analyzes the distribution of training data and its impact on the performance of multiple classifier systems. Several feature-based and class-based measures are proposed for estimating the statistical characteristics of training partitions. To assess the effectiveness of different types of training partitions, we generated a large number of disjoint training partitions with distinctive distributions and then empirically evaluated these partitions, and their impact on system performance, using the proposed feature-based and class-based measures. Building on the findings of this analysis, we developed a new partitioning method called "Clustering, Declustering, and Selection" (CDS). The study concludes with a comparative analysis of several existing data partitioning methods, including the proposed CDS approach.
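The abstract does not spell out the CDS algorithm, so the following Python sketch is only a rough illustration of the clustering-then-declustering idea behind building disjoint training partitions; it is not the authors' exact method, and the use of scikit-learn's KMeans, the partition count, and the round-robin assignment are all assumptions made for illustration.

# Illustrative sketch only: cluster the training data, then "decluster" by
# dealing each cluster's samples round-robin across disjoint partitions so that
# every partition receives examples from every cluster. This approximates the
# clustering/declustering idea; it is not the authors' CDS method.
import numpy as np
from sklearn.cluster import KMeans

def cluster_decluster_partition(X, n_partitions=4, n_clusters=8, seed=0):
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    partitions = [[] for _ in range(n_partitions)]
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        for i, idx in enumerate(members):
            partitions[i % n_partitions].append(idx)
    return [np.asarray(p) for p in partitions]  # disjoint index sets covering X

# Usage (hypothetical): each returned index set would train one base classifier
# of the ensemble, e.g. parts = cluster_decluster_partition(X_train, n_partitions=4)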
INDEX TERMS
Multiple classifier system, combining method, wrapper-based data partitioning, filter-based data partitioning, distance, feature-based, class-based.
CITATION
Rozita A. Dara, Masoud Makrehchi, Mohamed S. Kamel, "Filter-Based Data Partitioning for Training Multiple Classifier Systems," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 4, pp. 508-522, April 2010, doi:10.1109/TKDE.2009.80
REFERENCES
[1] N. Arshadi and I. Jurisica, "Data Mining for Case-Based Reasoning in High-Dimensional Biological Domains," IEEE Trans. Knowledge and Data Eng., vol. 17, no. 8, pp. 1127-1137, Aug. 2005.
[2] P. Bennett, S. Dumais, and E. Horvitz, "The Combination of Text Classifiers Using Reliability Indicators," Information Retrieval, vol. 8, no. 1, pp. 67-100, 2005.
[3] W.H. Berger and F.L. Parker, "Diversity of Planktonic Foraminifera in Deep Sea Sediments," Science, vol. 168, pp. 1345-1347, 1970.
[4] Benchmark Repository, http://users.rsise.anu.edu.au/~raetsch/data/, 2009.
[5] C. Blake and C. Merz, UCI Repository of Machine Learning Databases, http://www.ics.uci.edu/~mlearn/MLRepository.html, 2009.
[6] A. Blum and P. Langley, "Selection of Relevant Features and Examples in Machine Learning," Artificial Intelligence, vol. 97, nos. 1/2, pp. 245-271, 1997.
[7] L. Breiman, "Random Forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
[8] L. Breiman, "Bagging Predictors," Machine Learning, vol. 24, no. 2, pp. 123-140, 1996.
[9] N. Chawla, T. Moore, L. Hall, K. Bowyer, P. Kegelmeyer, and C. Springer, "Distributed Learning with Bagging-Like Performance," Pattern Recognition Letters, vol. 24, pp. 455-471, 2003.
[10] C.H. Cheng, A.W. Fu, and Y. Zhang, "Entropy-Based Subspace Clustering for Mining Numerical Data," Proc. Int'l Conf. Knowledge Discovery and Data Mining, pp. 84-93, 1999.
[11] R. Dara, M. Makrehchi, and M. Kamel, "Data Partitioning Evaluation Measures for Classifier Ensemble," Proc. Sixth Int'l Workshop Multiple Classifier Systems (MCS), pp. 306-315, 2005.
[12] T. Dietterich, "Ensemble Methods in Machine Learning," Proc. First Int'l Workshop Multiple Classifier Systems (MCS), pp. 1-15, 2000.
[13] T. Dietterich, "An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization," Machine Learning, vol. 40, no. 2, pp. 139-158, 2000.
[14] R. Duda, P. Hart, and D. Stork, Pattern Classification, second ed. John Wiley and Sons, 2000.
[15] R.P. Duin, P. Juszczak, P. Paclik, E. Pekalska, D. de Ridder, and D.M.J. Tax, "PRTools3, a Matlab Toolbox for Pattern Recognition," http://prtools.org, 2003.
[16] ELENA Project, http://www.dice.ucl.ac.be/neural-nets/Research/Projects/ELENA/elena.htm, 2009.
[17] Y. Freund and R.E. Schapire, "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting," J. Computer and System Sciences, vol. 55, pp. 119-139, 1997.
[18] Y. Freund and R. Schapire, "Experiments with a New Boosting Algorithm," Proc. 13th Int'l Conf. Machine Learning, pp. 148-156, 1996.
[19] D. Frosyniotis, A. Stafylopatis, and A. Likas, "A Divide-and-Conquer Method for Multi-Net Classifiers," Pattern Analysis and Applications, vol. 6, pp. 32-40, 2002.
[20] S. Haykin, Neural Networks: A Comprehensive Foundation, second ed. Prentice Hall, 1998.
[21] T. Ho, "Random Subspace Method for Constructing Decision Forests," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 8, pp. 832-844, Aug. 1998.
[22] W. Jiang and M. Tanner, "Hierarchical Mixtures of Experts for Generalized Linear Models," Neural Computation, vol. 11, pp. 1183-1198, 1999.
[23] M. Kamel and N. Wanas, "Data Dependence in Combining Classifiers," Proc. Fourth Int'l Workshop Multiple Classifier Systems (MCS), pp. 1-14, 2003.
[24] "Multiple Classifiers Systems," Proc. Sixth Int'l Workshop Multiple Classifier Systems (MCS), N.C. Oza, R. Polikar, J. Kittler, and F. Roli, eds., 2005.
[25] "Multiple Classifiers Systems," Proc. Fifth Int'l Workshop Multiple Classifier Systems (MCS), J. Kittler, F. Roli, and T. Windeatt, eds., 2004.
[26] J. Kittler, M. Hatef, R. Duin, and J. Matas, "On Combining Classifiers," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226-239, Mar. 1998.
[27] L. Kuncheva, "Using Diversity Measures for Generating Error-Correcting Output Codes in Classifier Ensembles," Pattern Recognition Letters, vol. 26, no. 1, pp. 83-90, 2005.
[28] L.I. Kuncheva, "Diversity in Multiple Classifier Systems (Editorial)," Information Fusion, vol. 6, no. 1, pp. 3-4, 2005.
[29] L. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms. John Wiley and Sons, 2004.
[30] L. Kuncheva, "Switching Between Selection and Fusion in Combining Classifiers: An Experiment," IEEE Trans. Systems, Man, and Cybernetics—Part B, vol. 32, no. 2, pp. 146-156, Apr. 2002.
[31] J. Rodriguez, L. Kuncheva, and C. Alonso, "Rotation Forest: A New Classifier Ensemble Method," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 10, pp. 1619-1630, Oct. 2006.
[32] D. Tao, X. Tang, X. Li, and X. Wu, "Asymmetric Bagging and Random Subspace for Support Vector Machines-Based Relevance Feedback in Image Retrieval," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 7, pp. 1088-1099, July 2006.
[33] K. Tumer and N. Oza, "Input Decimated Ensembles," Pattern Analysis and Applications, vol. 6, no. 1, pp. 65-77, 2003.
[34] N. Wanas, R.A. Dara, and M.S. Kamel, "Adaptive Fusion and Co-Operative Training for Classifier Ensembles," Pattern Recognition, vol. 39, no. 9, pp. 1781-1794, 2006.
[35] K. Woods, W. Kegelmeyer, and K. Bowyer, "Combination of Multiple Classifiers Using Local Accuracy Estimates," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 4, pp. 405-410, Apr. 1997.
[36] G.I. Webb and Z. Zheng, "Multistrategy Ensemble Learning: Reducing Error by Combining Ensemble Learning Techniques," IEEE Trans. Knowledge and Data Eng., vol. 16, no. 8, pp. 980-991, Aug. 2004.