Issue No. 03 - March (2010 vol. 22)
DOI: http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.84
Yuexian Hou, Tianjin University, Tianjin and The Hong Kong Polytechnic University, Hong Kong
Peng Zhang, The Robert Gordon University, Aberdeen
Tingxu Yan, Tianjin University, Tianjin
Wenjie Li, The Hong Kong Polytechnic University, Hong Kong
Dawei Song, The Robert Gordon University, Aberdeen
A fundamental goal of unsupervised feature selection is denoising: identifying and removing noisy features that are not discriminative. Because no information about the real classes is available, denoising is a challenging task. Noisy features can distort an otherwise reasonable distance metric and produce unreasonable feature spaces, i.e., feature spaces in which common clustering algorithms cannot effectively find the real classes. To overcome this problem, we make a basic observation: the relevance of a feature is intrinsic and independent of any metric scaling of the feature space. This observation implies that feature selection should be invariant, at least to some extent, with respect to metric scaling. In this paper, we clarify why metric invariance must be considered in unsupervised feature selection and propose a novel model that incorporates it. The model is motivated by the following observation: if the statistic that guides the unsupervised feature selection process is invariant with respect to possible metric scaling, the solution of the model will also be invariant. Hence, if a metric-invariant model can distinguish discriminative features from noisy ones in a reasonable feature space, it will also do so in an unreasonable counterpart obtained from the reasonable space by metric scaling. We give a theoretical justification of the metric invariance of the proposed model, and an empirical evaluation demonstrates its promising performance.
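The invariance argument can be illustrated with a toy example. The sketch below is not the authors' model (their model is information-theoretic, per the index terms); it assumes that "metric scaling" means multiplying each feature by a positive constant, and it swaps in two hypothetical per-feature statistics chosen only for contrast. Variance is not invariant to such scaling, so inflating a noisy feature makes it look "best". Kurtosis is unchanged by rescaling, and a low value hints at two well-separated modes. The helpers `make_data`, `rescale`, `variance_score`, `bimodality_score` and `best_feature` are all illustrative names.

```python
import random
import statistics


def variance_score(values):
    """Per-feature variance: NOT invariant to rescaling the feature."""
    return statistics.pvariance(values)


def kurtosis(values):
    """Population kurtosis m4 / m2**2; unchanged by x -> a*x + b for a != 0."""
    mu = statistics.fmean(values)
    m2 = statistics.fmean((v - mu) ** 2 for v in values)
    m4 = statistics.fmean((v - mu) ** 4 for v in values)
    return m4 / m2 ** 2


def bimodality_score(values):
    """Toy scale-invariant score (assumption): lower kurtosis ranks higher,
    since a balanced two-cluster feature has kurtosis near 1 and Gaussian
    noise has kurtosis near 3."""
    return -kurtosis(values)


def best_feature(columns, score):
    """Index of the feature column with the highest score."""
    return max(range(len(columns)), key=lambda j: score(columns[j]))


def make_data(n=400, seed=0):
    """Hypothetical 'reasonable' space with one discriminative and one noisy feature."""
    rng = random.Random(seed)
    # Feature 0: two well-separated clusters centred at -1 and +1.
    labels = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    informative = [y + rng.gauss(0.0, 0.1) for y in labels]
    # Feature 1: class-independent Gaussian noise with small spread.
    noise = [rng.gauss(0.0, 0.5) for _ in range(n)]
    return [informative, noise]


def rescale(columns, factors):
    """Diagonal metric scaling: multiply feature j by factors[j] > 0."""
    return [[a * v for v in col] for col, a in zip(columns, factors)]


if __name__ == "__main__":
    reasonable = make_data()
    # Inflate the noisy feature to obtain an 'unreasonable' space.
    unreasonable = rescale(reasonable, [1.0, 10.0])
    print(best_feature(reasonable, variance_score))      # 0
    print(best_feature(unreasonable, variance_score))    # 1 (fooled by scaling)
    print(best_feature(reasonable, bimodality_score))    # 0
    print(best_feature(unreasonable, bimodality_score))  # 0 (unchanged)
```

Because the scale-invariant score produces the same ranking in both spaces, any selection it makes in the reasonable space carries over unchanged to the rescaled one. This is the property the abstract asks of the guiding statistic.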
Feature evaluation and selection, information theory, metric invariant.
Y. Hou, D. Song, W. Li, T. Yan and P. Zhang, "Beyond Redundancies: A Metric-Invariant Method for Unsupervised Feature Selection," in IEEE Transactions on Knowledge & Data Engineering, vol. 22, no. 3, pp. 348-364, 2010.