Linear-Time Wrappers to Identify Atypical Points: Two Subset Generation Methods
September 2005 (vol. 17 no. 9)
pp. 1289-1297
The wrapper approach to identifying atypical examples can be preferable to the filter approach (which may not be consistent with the classifier in use), but its running time is prohibitive: the fastest available wrappers are quadratic in the number of examples, far too expensive for atypical detection. This paper presents a linear-time wrapper that, averaged over the 7 classifiers and 20 data sets tested in this research, runs roughly 75 times faster than the quadratic wrappers. Two subset generation methods for the wrapper are also introduced and compared. Atypical points are defined here as the misclassified points that the proposed algorithm, Atypical Sequential Removing (ASR), finds not useful to the classification task; they may include outliers as well as overlapping samples. ASR can identify and rank atypical points in the whole data set without damaging prediction accuracy, and it is general enough to be used with classifiers that lack a reject option. Experiments on benchmark data sets with different classifiers show promising results and confirm that this wrapper method has advantages and can be used for atypical detection.
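The abstract names ASR's ingredients (a wrapper, misclassified candidates, sequential removal with ranking) but not its exact procedure, so the Python sketch below is only an illustrative stand-in, not the paper's algorithm. It uses a toy nearest-centroid classifier, and all function names are hypothetical; note that this naive loop refits the model for every candidate, whereas the real ASR reaches linear time through the subset generation methods described in the paper.

```python
# Hedged sketch of a wrapper for atypical-point removal (NOT the paper's ASR).
# Idea: fit a classifier, treat misclassified points as candidates, and keep
# a removal only if the wrapper's accuracy estimate does not decrease.

def centroids(X, y):
    """Per-class mean vectors for a toy nearest-centroid classifier."""
    sums, counts = {}, {}
    for x, c in zip(X, y):
        s = sums.setdefault(c, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[c] = counts.get(c, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict(cents, x):
    """Assign x to the class with the nearest centroid (squared distance)."""
    def d2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(cents, key=lambda c: d2(cents[c], x))

def accuracy(X, y):
    """Training accuracy of the nearest-centroid model on (X, y)."""
    cents = centroids(X, y)
    return sum(predict(cents, x) == c for x, c in zip(X, y)) / len(X)

def linear_wrapper(X, y):
    """One pass over the misclassified points; each candidate removal is
    accepted only if the accuracy estimate does not drop. Returns the
    indices of removed (atypical) points, in removal order."""
    cents = centroids(X, y)
    suspects = [i for i, (x, c) in enumerate(zip(X, y))
                if predict(cents, x) != c]
    kept = list(range(len(X)))
    atypical = []
    for i in suspects:
        trial = [j for j in kept if j != i]
        Xa, ya = [X[j] for j in kept], [y[j] for j in kept]
        Xt, yt = [X[j] for j in trial], [y[j] for j in trial]
        if accuracy(Xt, yt) >= accuracy(Xa, ya):
            kept = trial
            atypical.append(i)
    return atypical

# Tiny example: the last "a" point sits inside class b's region.
X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (0.9, 0.9)]
y = ["a", "a", "b", "b", "a"]
print(linear_wrapper(X, y))  # prints [4]
```

Because the suspect list is built once from a single fit, each point is considered at most once; a removal is irreversible, which mirrors the "sequential removing" idea and yields a natural ranking (earlier removals are more confidently atypical).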

[1] D.W. Aha, L.A. Breslow, and H. Muñoz-Avila, “Conversational Case-Based Reasoning,” Applied Intelligence, vol. 14, pp. 9-32, 2000.
[2] C.L. Blake and C.J. Merz, UCI Repository of Machine Learning Databases, Irvine, Calif.: Univ. of California, Dept. of Information and Computer Science, 1998.
[3] A.L. Blum, “Selection of Relevant Features and Examples in Machine Learning,” Artificial Intelligence, vol. 97, pp. 245-271, 1997.
[4] L. Breiman, “Arcing Classifiers,” The Annals of Statistics, vol. 26, no. 3, pp. 801-849, 1998.
[5] C.E. Brodley and M.A. Friedl, “Identifying Mislabeled Training Data,” J. Artificial Intelligence Research, vol. 11, pp. 131-167, 1999.
[6] C.K. Chow, “On Optimum Recognition Error and Reject Tradeoff,” IEEE Trans. Information Theory, vol. 16, no. 1, 1970.
[7] M. Dash and H. Liu, “Feature Selection for Classification,” Intelligent Data Analysis, vol. 1, pp. 131-156, 1997.
[8] J. Han and M. Kamber, Data Mining: Concepts and Techniques. New York: Morgan Kaufmann, 2001.
[9] S. Hashemi and T.P. Trappenberg, “Using SVM for Classification in Datasets with Ambiguous Data,” Proc. Sixth World Multiconf. Systemics, Cybernetics, and Informatics, July 2002.
[10] S. Hashemi, “Coverage-Performance Curves for Classification in Datasets with Atypical Data,” Proc. First IEEE Int'l Conf. Machine Learning and Cybernetics, Nov. 2002.
[11] H.G. John, R. Kohavi, and K. Pfleger, “Irrelevant Features and the Subset Selection Problem,” Proc. Int'l Conf. Machine Learning, pp. 121-129, 1994.
[12] H.G. John, “Robust Decision Trees: Removing Outliers from Databases,” Proc. First Int'l Conf. Knowledge Discovery and Data Mining, pp. 174-179, 1995.
[13] C.M. Judd and G.H. McClelland, Data Analysis: A Model-Comparison Approach. Harcourt Brace Jovanovich, 1989.
[14] R. Kohavi and G.H. John, “Wrappers for Feature Subset Selection,” Artificial Intelligence J., special issue on relevance, 1997.
[15] R. Kohavi, “A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection,” Proc. Int'l Joint Conf. Artificial Intelligence, 1995.
[16] J. Komorowski and A. Øhrn, “Modelling Prognostic Power of Cardiac Tests Using Rough Sets,” Artificial Intelligence in Medicine, vol. 15, pp. 167-191, 1999.
[17] A. Krogh and J. Vedelsby, “Neural Network Ensembles, Cross-Validation, and Active Learning,” Advances in Neural Information Processing Systems, vol. 7, pp. 231-238, Cambridge, Mass.: MIT Press, 1995.
[18] A.J. Miller, Subset Selection in Regression. New York: Chapman and Hall, 1990.
[19] D. Opitz and R. Maclin, “Popular Ensemble Methods: An Empirical Study,” J. Artificial Intelligence Research, vol. 11, pp. 169-198, 1999.
[20] D. Opitz and J. Shavlik, “Actively Searching for an Effective Neural-Network Ensemble,” Connection Science, vol. 8, nos. 3/4, pp. 337-353, 1996.
[21] B.D. Ripley, Pattern Recognition and Neural Networks. Cambridge Univ. Press, 1996.
[22] R. Schapire, Y. Freund, P. Bartlett, and W. Lee, “Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods,” The Annals of Statistics, May 1998.
[23] T.P. Trappenberg, A.D. Back, and S.-I. Amari, “A Performance Measure for Classification with Ambiguous Data,” BSIS Technical Reports No. 99-67, May 1999.
[24] D. Wettschereck, D.W. Aha, and T. Mohri, “A Review and Empirical Evaluation of Feature Weighting Methods for a Class of Lazy Learning Algorithms,” Artificial Intelligence Rev., 1997.
[25] D.R. Wilson and T.R. Martinez, “An Integrated Instance-Based Algorithm,” Computational Intelligence, vol. 16, no. 1, pp. 1-28, 2000.
[26] X. Zhu, X. Wu, and Y. Yang, “Error Detection and Impact-Sensitive Instance Ranking in Noisy Datasets,” Proc. 19th Nat'l Conf. Artificial Intelligence (AAAI), 2004.

Index Terms:
Atypical data, outlier detection, overlapping samples, linear wrapper, sample subset selection.
Saeed Hashemi, "Linear-Time Wrappers to Identify Atypical Points: Two Subset Generation Methods," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 9, pp. 1289-1297, Sept. 2005, doi:10.1109/TKDE.2005.150