Issue No. 05 - May 2013 (vol. 35)
pp. 1178-1192
Xindong Wu , Sch. of Comput. Sci. & Inf. Eng., Hefei Univ. of Technol., Hefei, China
Kui Yu , Sch. of Comput. Sci. & Inf. Eng., Hefei Univ. of Technol., Hefei, China
Wei Ding , Dept. of Comput. Sci., Univ. of Massachusetts, Boston, MA, USA
Hao Wang , Sch. of Comput. Sci. & Inf. Eng., Hefei Univ. of Technol., Hefei, China
Xingquan Zhu , Centre for Quantum Comput. & Intell. Syst., Univ. of Technol., Sydney, Sydney, NSW, Australia
ABSTRACT
We propose a new online feature selection framework for applications with streaming features, where the full feature space is unknown in advance. We define streaming features as features that flow in one by one over time, while the number of training examples remains fixed. This contrasts with traditional online learning methods, which deal only with sequentially added observations and pay little attention to streaming features. The critical challenges for Online Streaming Feature Selection (OSFS) include 1) the continuous growth of feature volumes over time, 2) a large feature space, possibly of unknown or infinite size, and 3) the unavailability of the entire feature set before learning starts. In this paper, we present a novel Online Streaming Feature Selection method that selects strongly relevant and nonredundant features on the fly. An efficient Fast-OSFS algorithm is also proposed to improve feature selection performance. The proposed algorithms are evaluated extensively on high-dimensional datasets and in a real-world case study on impact crater detection. Experimental results demonstrate that the algorithms achieve better compactness and higher prediction accuracy than existing streaming feature selection algorithms.
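The core idea of the abstract, keeping each arriving feature only if it is strongly relevant and nonredundant given what has already been selected, can be sketched as follows. This is a minimal, hypothetical illustration only: the actual OSFS and Fast-OSFS algorithms use statistical conditional-independence tests, whereas this sketch approximates the relevance and redundancy decisions with simple Pearson-correlation thresholds (`rel_thresh` and `red_thresh` are assumed placeholder parameters, not from the paper).

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sqrt(sum((x - ma) ** 2 for x in a))
    sb = sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def streaming_select(feature_stream, labels, rel_thresh=0.2, red_thresh=0.95):
    """Process features one by one as they arrive; keep a feature only if
    it is relevant to the labels and not (almost) collinear with a feature
    that was already selected."""
    selected_idx, selected = [], []
    for i, feat in enumerate(feature_stream):
        # Relevance check: discard features weakly correlated with the labels.
        if abs(pearson(feat, labels)) < rel_thresh:
            continue
        # Redundancy check: discard features nearly duplicating a kept one.
        if any(abs(pearson(feat, kept)) > red_thresh for kept in selected):
            continue
        selected_idx.append(i)
        selected.append(feat)
    return selected_idx
```

In the paper's setting the two checks would be conditional-independence tests against the class label, and Fast-OSFS gains its efficiency by reorganizing how the redundancy analysis is performed; the fixed thresholds above are stand-ins for those tests.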
INDEX TERMS
Markov processes, redundancy, algorithm design and analysis, prediction algorithms, training, accuracy, supervised learning, feature selection, streaming features
CITATION
Xindong Wu, Kui Yu, Wei Ding, Hao Wang, Xingquan Zhu, "Online Feature Selection with Streaming Features", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 35, no. 5, pp. 1178-1192, May 2013, doi:10.1109/TPAMI.2012.197
REFERENCES
[1] A. Agresti, Categorical Data Analysis. John Wiley and Sons, 1990.
[2] C.F. Aliferis, A. Statnikov, I. Tsamardinos, S. Mani, and X. Koutsoukos, "Local Causal and Markov Blanket Induction for Causal Discovery and Feature Selection for Classification, Part I: Algorithms and Empirical Evaluation," J. Machine Learning Research, vol. 11, pp. 171-234, 2010.
[3] Y. Aphinyanaphongs, A. Statnikov, and C.F. Aliferis, "A Comparison of Citation Metrics to Machine Learning Filters for the Identification of High Quality Medline Documents," J. Am. Medical Informatics Assoc., vol. 13, no. 4, pp. 446-455, 2006.
[4] G. Bontempi and P.E. Meyer, "Causal Filter Selection in Microarray Data," Proc. Int'l Conf. Machine Learning, 2010.
[5] C. Blake and C. Merz, UCI Repository of Machine Learning Databases, http://www.ics.uci.edu/~mlearn/MLRepository.html, 1998.
[6] Clopinet, Causation and Prediction Challenge (WCCI 2008), http://www.causality.inf.ethz.ch, 2008.
[7] Clopinet, Performance Prediction Challenge (WCCI 2006), http://clopinet.com/isabelle/Projects/modelselect/, 2006.
[8] Clopinet, Feature Selection Challenge (NIPS 2003), http://clopinet.com/isabelle/Projects/NIPS2003/, 2003.
[9] T.P. Conrads et al., "High-Resolution Serum Proteomic Features for Ovarian Cancer Detection," Endocrine Related Cancer, vol. 11, pp. 163-178, 2004.
[10] M. Dash and H. Liu, "Consistency-Based Search in Feature Selection," Artificial Intelligence, vol. 151, nos. 1/2, pp. 155-176, 2003.
[11] W. Ding, T. Stepinski, Y. Mu, L. Bandeira, R. Vilalta, Y. Wu, Z. Lu, T. Cao, and X. Wu, "Sub-Kilometer Crater Discovery with Boosting and Transfer Learning," ACM Trans. Intelligent Systems and Technology, vol. 2, no. 4, pp. 1-22, 2011.
[12] P.S. Dhillon, D. Foster, and L. Ungar, "Feature Selection Using Multiple Streams," Proc. Int'l Conf. Artificial Intelligence and Statistics, 2010.
[13] G. Forman, "An Extensive Empirical Study of Feature Selection Metrics for Text Classification," J. Machine Learning Research, vol. 3, pp. 1289-1305, 2003.
[14] Z. Zhang and N. Ye, "Locality Preserving Multimodal Discriminative Learning for Supervised Feature Selection," Knowledge and Information Systems, vol. 27, no. 3, pp. 473-490, 2011.
[15] K. Glocer, D. Eads, and J. Theiler, "Online Feature Selection for Pixel Classification," Proc. Int'l Conf. Machine Learning, 2005.
[16] I. Guyon, C.F. Aliferis, and A. Elisseeff, "Causal Feature Selection," Computational Methods of Feature Selection. H. Liu and H. Motoda, eds., Chapman and Hall, 2008.
[17] I. Guyon and A. Elisseeff, "An Introduction to Variable and Feature Selection," J. Machine Learning Research, vol. 3, pp. 1157-1182, 2003.
[18] G.H. John, R. Kohavi, and K. Pfleger, "Irrelevant Features and the Subset Selection Problem," Proc. Int'l Conf. Machine Learning, pp. 121-129, 1994.
[19] T. Joachims, Learning to Classify Text Using Support Vector Machines. Kluwer Academic, 2002.
[20] R. Kohavi and G.H. John, "Wrappers for Feature Subset Selection," Artificial Intelligence, vol. 97, pp. 273-324, 1997.
[21] D. Koller and M. Sahami, "Toward Optimal Feature Selection," Proc. Int'l Conf. Machine Learning, pp. 284-292, 1996.
[22] P. Langley, "Selection of Relevant Features in Machine Learning," Proc. AAAI Fall Symp. Relevance, pp. 140-144, 1994.
[23] S. Loscalzo, L. Yu, and C. Ding, "Consensus Group Based Stable Feature Selection," Proc. 15th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 567-576, 2009.
[24] S. Mani and G.F. Cooper, "A Study in Causal Discovery from Population-Based Infant Birth and Death Records," Proc. AMIA Ann. Fall Symp., pp. 315-319, 1999.
[25] R. Neapolitan, Learning Bayesian Networks. Prentice Hall, 2003.
[26] D. Omidiran and M.J. Wainwright, "High-Dimensional Variable Selection with Sparse Random Projections: Measurement Sparsity and Statistical Efficiency," J. Machine Learning Research, vol. 11, pp. 2361-2386, 2010.
[27] S. Perkins and J. Theiler, "Online Feature Selection Using Grafting," Proc. Int'l Conf. Machine Learning, pp. 592-599, 2003.
[28] I. Rodriguez-Lujan, R. Huerta, C. Elkan, and C. Santa Cruz, "Quadratic Programming Feature Selection," J. Machine Learning Research, vol. 11, pp. 1491-1516, 2010.
[29] A. Rosenwald et al., "The Use of Molecular Profiling to Predict Survival after Chemotherapy for Diffuse Large-B-Cell Lymphoma," New England J. Medicine, vol. 346, pp. 1937-1947, 2002.
[30] L. Song, A.J. Smola, A. Gretton, K.M. Borgwardt, and J. Bedo, "Supervised Feature Selection via Dependence Estimation," Proc. Int'l Conf. Machine Learning, pp. 823-830, 2007.
[31] "Spider: A Matlab Machine Learning Tool," 2010.
[32] P. Spirtes, C. Glymour, and R. Scheines, Causation, Prediction, and Search, second ed. MIT Press, 2000.
[33] R. Tibshirani, "Regression Shrinkage and Selection via the Lasso," J. Royal Statistical Soc. B, vol. 58, pp. 267-288, 1996.
[34] E. Tuv, A. Borisov, G.C. Runger, and K. Torkkola, "Feature Selection with Ensembles, Artificial Variables, and Redundancy Elimination," J. Machine Learning Research, vol. 10, pp. 1341-1366, 2009.
[35] Y. Wang et al., "Gene-Expression Profiles to Predict Distant Metastasis of Lymph-Node Negative Primary Breast Cancer," Lancet, vol. 365, pp. 671-679, 2005.
[36] L. Ungar, J. Zhou, D. Foster, and B. Stine, "Streaming Feature Selection Using IIC," Proc. 10th Int'l Workshop Artificial Intelligence and Statistics, 2005.
[37] X. Zhu, W. Ding, P.S. Yu, and C. Zhang, "One-Class Learning and Concept Summarization for Data Streams," Knowledge and Information Systems, vol. 28, no. 3, pp. 523-553, 2011.
[38] J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio, and V. Vapnik, "Feature Selection for SVMs," Proc. Neural Information Processing Systems Conf., pp. 668-674, 2001.
[39] X. Wu, K. Yu, H. Wang, and W. Ding, "Online Streaming Feature Selection," Proc. Int'l Conf. Machine Learning, pp. 1159-1166, 2010.
[40] L. Yu, C. Ding, and S. Loscalzo, "Stable Feature Selection via Dense Feature Groups," Proc. 14th ACM SIGKDD Int'l Conf. Knowledge Discovery in Data Mining, pp. 803-811, 2008.
[41] L. Yu and H. Liu, "Efficient Feature Selection via Analysis of Relevance and Redundancy," J. Machine Learning Research, vol. 5, pp. 1205-1224, 2004.
[42] P. Zhao and B. Yu, "On Model Selection Consistency of Lasso," J. Machine Learning Research, vol. 7, pp. 2541-2567, 2006.
[43] Z. Zhao and H. Liu, "Searching for Interacting Features in Subset Selection," Intelligent Data Analysis, vol. 13, pp. 207-228, 2009.
[44] T. Zhang, "On the Consistency of Feature Selection Using Greedy Least Squares Regression," J. Machine Learning Research, vol. 10, pp. 555-568, 2009.
[45] J. Zhou, D.P. Foster, R.A. Stine, and L.H. Ungar, "Streaming Feature Selection Using Alpha-Investing," Proc. 11th ACM SIGKDD Int'l Conf. Knowledge Discovery in Data Mining, pp. 384-393, 2005.
[46] J. Zhou, D. Foster, R.A. Stine, and L.H. Ungar, "Streamwise Feature Selection," J. Machine Learning Research, vol. 7, pp. 1861-1885, 2006.