A New Dependency and Correlation Analysis for Features
September 2005 (vol. 17 no. 9)
pp. 1199-1207
The quality of the data being analyzed is a critical factor affecting the accuracy of data mining algorithms. Two aspects of data quality are especially important: relevance and redundancy. Including irrelevant and redundant features in a data mining model results in poor predictions and high computational overhead. This paper presents an efficient method that considers both the relevance of the features and the pairwise correlation between features in order to improve the prediction accuracy of our data mining algorithm. We introduce a new feature correlation metric Q_Y(X_i, X_j) and a feature subset merit measure e(S) to quantify the relevance of, and the correlation among, features with respect to a desired data mining task (e.g., detection of abnormal behavior in a network service due to network attacks). Our approach takes into consideration not only the dependency among the features, but also their dependency with respect to a given data mining task. Our analysis shows that the correlation relationship among features depends on the decision task; thus, features display different behaviors as the decision task changes. We applied our data mining approach to network security and validated it using the DARPA KDD99 benchmark data set. Our results show that, using the new decision-dependent correlation metric, we can efficiently detect rare network attacks such as User to Root (U2R) and Remote to Local (R2L) attacks. The best reported detection rates for U2R and R2L on the KDD99 data sets were 13.2 percent and 8.4 percent, respectively, with a 0.5 percent false alarm rate. For U2R attacks, our approach achieves a 92.5 percent detection rate with a false alarm rate of 0.7587 percent. For R2L attacks, our approach achieves a 92.47 percent detection rate with a false alarm rate of 8.35 percent.
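
The claim that correlation among features depends on the decision task can be illustrated with a toy sketch. The abstract does not define Q_Y(X_i, X_j) or e(S), so the code below does not implement them. As an assumption for illustration only, it uses standard mutual information and conditional mutual information as stand-ins; the function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """I(X;Y) in bits for two equal-length sequences of discrete values."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def conditional_mutual_information(xs, ys, zs):
    """I(X;Y|Z) in bits: mutual information inside each Z group, weighted by P(Z)."""
    n = len(zs)
    total = 0.0
    for z, count in Counter(zs).items():
        idx = [k for k, v in enumerate(zs) if v == z]
        total += (count / n) * mutual_information(
            [xs[k] for k in idx], [ys[k] for k in idx]
        )
    return total


# Toy data (not from the paper): two independent binary features and a
# decision label defined as their XOR.
x_i = [0, 0, 1, 1]
x_j = [0, 1, 0, 1]
y = [a ^ b for a, b in zip(x_i, x_j)]

print(mutual_information(x_i, x_j))                 # pairwise dependency, ignoring the task
print(conditional_mutual_information(x_i, x_j, y))  # pairwise dependency given the task
```

In this toy case the two features are independent when the decision label is ignored, yet each fully determines the other once the label is known. A task-independent correlation measure would therefore miss a relationship that matters for this particular decision.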

Index Terms:
Feature extraction, correlation measure.
Guangzhi Qu, Salim Hariri, Mazin Yousif, "A New Dependency and Correlation Analysis for Features," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 9, pp. 1199-1207, Sept. 2005, doi:10.1109/TKDE.2005.136