Issue No. 06 - June (2001 vol. 23)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/34.927460
<p><b>Abstract</b>: This paper introduces a novel enhancement for unsupervised learning of conditional Gaussian networks that benefits from feature selection. Our proposal is based on the assumption that, in the absence of labels reflecting the cluster membership of each case of the database, those features that exhibit low correlation with the rest of the features can be considered irrelevant for the learning process. Thus, we suggest performing this process using only the relevant features. Every irrelevant feature is then added to the learned model to obtain an explanatory model for the original database, which is our primary goal. We present a simple, and thus efficient, measure to assess the relevance of the features for the learning process. Additionally, the form of this measure allows us to calculate a relevance threshold to automatically identify the relevant features. The experimental results reported for synthetic and real-world databases show the ability of our proposal to distinguish between relevant and irrelevant features and to accelerate learning while still obtaining good explanatory models for the original database.</p>
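The abstract does not state the relevance measure itself, so the following is only a minimal sketch of one plausible reading, not the paper's definition. It assumes that a feature's relevance is the average edge exclusion likelihood-ratio statistic, -n log(1 - r²), computed from its partial correlations with every other feature, and that features scoring above a threshold are kept. The function names and the `threshold` parameter are hypothetical, and the caller must supply the threshold value.

```python
import numpy as np

def partial_correlations(X):
    """Partial correlation of each feature pair given all remaining features."""
    precision = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(d, d)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

def relevance_scores(X):
    """Assumed measure: mean edge exclusion statistic of each feature."""
    n, p = X.shape
    r2 = partial_correlations(X) ** 2
    # Clip so the logarithm stays finite for near-perfect correlations.
    r2 = np.clip(r2, 0.0, 1.0 - 1e-12)
    stats = -n * np.log(1.0 - r2)
    np.fill_diagonal(stats, 0.0)  # a feature is not compared with itself
    return stats.sum(axis=1) / (p - 1)

def select_relevant(X, threshold):
    """Indices of features whose score exceeds the (assumed) threshold."""
    return np.flatnonzero(relevance_scores(X) > threshold)
```

Under these assumptions, learning would run on `X[:, select_relevant(X, threshold)]`, and the discarded columns would be added back to the learned model afterwards, as the abstract describes.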
Keywords: Data clustering, conditional Gaussian networks, feature selection, edge exclusion tests.
Jose Antonio Lozano, Iñaki Inza, Pedro Larrañaga, Jose Manuel Peña, "Dimensionality Reduction in Unsupervised Learning of Conditional Gaussian Networks", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 23, no. 6, pp. 590-603, June 2001, doi:10.1109/34.927460