Feature Subset Selection and Feature Ranking for Multivariate Time Series
IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 9, pp. 1186-1198, September 2005
Feature subset selection (FSS) is a known technique for preprocessing data before performing data mining tasks such as classification and clustering. FSS provides both cost-effective predictors and a better understanding of the underlying process that generated the data. We propose a family of novel unsupervised methods for feature subset selection from multivariate time series (MTS), based on common principal component analysis and termed CLeVer. Traditional FSS techniques, such as Recursive Feature Elimination (RFE) and the Fisher Criterion (FC), have been applied to MTS data sets, e.g., Brain Computer Interface (BCI) data sets. However, these techniques may lose the correlation information among features, whereas our proposed techniques exploit the properties of principal component analysis to retain that information. To evaluate the effectiveness of our selected subset of features, we employ classification as the target data mining task. Our exhaustive experiments show that CLeVer outperforms RFE, FC, and random selection by up to a factor of two in classification accuracy, while taking up to two orders of magnitude less processing time than RFE and FC.
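The CPCA-based idea in the abstract can be sketched as follows. This is not the authors' exact CLeVer algorithm; the descriptive stack-and-SVD step for common principal components follows Krzanowski [13], and the function names and the loading-norm ranking rule are illustrative assumptions.

```python
import numpy as np

def common_principal_components(mts_items, p):
    """Descriptive common principal components (CPCs) of several MTS items.

    Each item is a (time_steps, n_variables) array; items may differ in length
    but must share the same variables.
    """
    loadings = []
    for x in mts_items:
        # Per-item PCA via SVD of the centered data; rows of vt are loadings.
        _, _, vt = np.linalg.svd(x - x.mean(axis=0), full_matrices=False)
        loadings.append(vt[:p])
    stacked = np.vstack(loadings)            # (n_items * p, n_variables)
    # Dominant right singular vectors of the stacked loadings approximate
    # the components common to all items (Krzanowski's descriptive approach).
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    return vt[:p]                            # (p, n_variables)

def rank_variables(cpcs):
    # Score each variable by the L2 norm of its CPC loadings;
    # larger scores mean the variable contributes more to the common subspace.
    scores = np.linalg.norm(cpcs, axis=0)
    return list(np.argsort(scores)[::-1])

# Toy usage: 3 MTS items, 60 time steps, 5 variables each.
rng = np.random.default_rng(0)
items = [rng.standard_normal((60, 5)) for _ in range(3)]
cpcs = common_principal_components(items, p=2)
order = rank_variables(cpcs)
print(order)  # variables ordered from most to least informative
```

An unsupervised selector would then keep the top-ranked variables (or, as in the paper's clustering variant, one representative per cluster of similarly loaded variables) and pass only those to the downstream classifier.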

[1] I. Guyon and A. Elisseeff, “An Introduction to Variable and Feature Selection,” J. Machine Learning Research, vol. 3, pp. 1157-1182, Mar. 2003.
[2] H. Liu, L. Yu, M. Dash, and H. Motoda, “Active Feature Selection Using Classes,” Proc. Pacific-Asia Conf. Knowledge Discovery and Data Mining, 2003.
[3] A. Tucker, S. Swift, and X. Liu, “Variable Grouping in Multivariate Time Series Via Correlation,” IEEE Trans. Systems, Man, and Cybernetics B, vol. 31, no. 2, 2001.
[4] M.W. Kadous, “Temporal Classification: Extending the Classification Paradigm to Multivariate Time Series,” PhD dissertation, Univ. of New South Wales, 2002.
[5] C. Shahabi, “AIMS: An Immersidata Management System,” Proc. Biennial Conf. Innovative Data Systems Research (CIDR), 2003.
[6] R. Tanawongsuwan and A.F. Bobick, “Performance Analysis of Time-Distance Gait Parameters under Different Speeds,” Proc. Fourth Int'l Conf. Audio and Video-Based Biometric Person Authentication, June 2003.
[7] X.L. Zhang, H. Begleiter, B. Porjesz, W. Wang, and A. Litke, “Event Related Potentials during Object Recognition Tasks,” Brain Research Bull., vol. 38, no. 6, 1995.
[8] C. Winstein and J. Tretriluxana, “Motor Skill Learning after Rehabilitative Therapy: Kinematics of a Reach-Grasp Task,” Soc. Neuroscience, Oct. 2004.
[9] T.N. Lal, M. Schröder, T. Hinterberger, J. Weston, M. Bogdan, N. Birbaumer, and B. Schölkopf, “Support Vector Channel Selection in BCI,” IEEE Trans. Biomedical Eng., vol. 51, no. 6, June 2004.
[10] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, “Gene Selection for Cancer Classification Using Support Vector Machines,” Machine Learning, vol. 46, nos. 1-3, pp. 389-422, Jan. 2002.
[11] J. Weston, A. Elisseeff, B. Schölkopf, and M.E. Tipping, “Use of the Zero-Norm with Linear Models and Kernel Methods,” J. Machine Learning Research, vol. 3, pp. 1439-1461, Nov. 2003.
[12] T.K. Moon and W.C. Stirling, Mathematical Methods and Algorithms for Signal Processing. Prentice Hall, 2000.
[13] W. Krzanowski, “Between-Groups Comparison of Principal Components,” J. Am. Statistical Assoc., vol. 74, no. 367, 1979.
[14] C.M. Bishop, Neural Networks for Pattern Recognition. Oxford Univ. Press, 1996.
[15] S.K. Pal and P. Mitra, Pattern Recognition Algorithms for Data Mining: Scalability, Knowledge Discovery, and Soft Granular Computing. Boca Raton, Fla.: Chapman Hall/CRC Press, May 2004.
[16] R. Kohavi and G.H. John, “Wrappers for Feature Subset Selection,” Artificial Intelligence, vol. 97, nos. 1-2, pp. 273-324, 1997.
[17] J. Han and M. Kamber, Data Mining: Concepts and Techniques, chapter 3, p. 121, Morgan Kaufmann, 2000.
[18] S. Hettich and S.D. Bay, “The UCI KDD Archive,” http://kdd.ics.uci.edu, 1999.
[19] S. Zhong and J. Ghosh, “HMMs and Coupled HMMs for Multi-Channel EEG Classification,” Proc. Int'l Joint Conf. Neural Networks, 2002.
[20] I. Cohen, Q. Tian, X.S. Zhou, and T.S. Huang, “Feature Selection Using Principal Feature Analysis,” Univ. of Illinois at Urbana-Champaign, 2002.
[21] I.T. Jolliffe, Principal Component Analysis. Springer, 2002.
[22] B.N. Flury, “Common Principal Components in k Groups,” J. Am. Statistical Assoc., vol. 79, no. 388, pp. 892-898, 1984.
[23] W. Krzanowski, “Orthogonal Components for Grouped Data: Review and Applications,” Statistics in Transition, vol. 5, no. 5, pp. 759-777, Oct. 2002.
[24] J.R. Schott, “Some Tests for Common Principal Components in Several Groups,” Biometrika, vol. 78, pp. 771-777, 1991.
[25] V. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
[26] C.-C. Chang and C.-J. Lin, “LIBSVM: A Library for Support Vector Machines,” http://www.csie.ntu.edu.tw/~cjlin/libsvm/, 2004.
[27] R. Kohavi, “A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection,” Proc. Int'l Joint Conf. Artificial Intelligence, 1995.
[28] M. Shah and R. Jain, Motion-Based Recognition, chapter 15, Kluwer Academic Publishers, 1997.
[29] H. Sakoe and S. Chiba, “Dynamic Programming Algorithm Optimization for Spoken Word Recognition,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 26, no. 1, 1978.
[30] K. Yang and C. Shahabi, “A PCA-Based Similarity Measure for Multivariate Time Series,” Proc. Second ACM Int'l Workshop Multimedia Databases, 2004.
[31] J. Weston, A. Elisseeff, G. BakIr, and F. Sinz, “Spider: Object-Orientated Machine Learning Library,” http://www.kyb.tuebingen.mpg.de/bs/people/spider/, 2005.
[32] “Marker Configurations for the HumanGait Dataset,” ftp://ftp.cc.gatech.edu/pub/gvu/cpl/walkers/speed_control_datadoc, 2003.
[33] P. Mitra, C. Murthy, and S.K. Pal, “Unsupervised Feature Selection Using Feature Similarity,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 3, pp. 301-312, Mar. 2002.
[34] R. Tibshirani, G. Walther, and T. Hastie, “Estimating the Number of Clusters in a Data Set Via the Gap Statistic,” J. Royal Statistical Soc.: Series B (Statistical Methodology), vol. 63, no. 2, pp. 411-423, 2001.
[35] D. Leibovici and R. Sabatier, “A Singular Value Decomposition of a k-Way Array for a Principal Component Analysis of Multiway Data, PTA-k,” Linear Algebra and Its Applications, 1998.

Index Terms:
Data mining, feature evaluation and selection, feature extraction or construction, time series analysis, feature representation.
Citation:
Hyunjin Yoon, Kiyoung Yang, Cyrus Shahabi, "Feature Subset Selection and Feature Ranking for Multivariate Time Series," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 9, pp. 1186-1198, Sept. 2005, doi:10.1109/TKDE.2005.144