Dimensionality Reduction in Unsupervised Learning of Conditional Gaussian Networks
June 2001 (vol. 23 no. 6)
pp. 590-603

Abstract—This paper introduces a novel enhancement for unsupervised learning of conditional Gaussian networks that benefits from feature selection. Our proposal is based on the assumption that, in the absence of labels reflecting the cluster membership of each case of the database, those features that exhibit low correlation with the rest of the features can be considered irrelevant for the learning process. Thus, we suggest performing this process using only the relevant features. Then, every irrelevant feature is added to the learned model to obtain an explanatory model for the original database, which is our primary goal. A simple and, thus, efficient measure to assess the relevance of the features for the learning process is presented. Additionally, the form of this measure allows us to calculate a relevance threshold to automatically identify the relevant features. The experimental results reported for synthetic and real-world databases show the ability of our proposal to distinguish between relevant and irrelevant features and to accelerate learning, while still obtaining good explanatory models for the original database.
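The core idea in the abstract — score each feature by how strongly it correlates with the remaining features, then derive a threshold to split the feature set automatically — can be sketched as follows. This is an illustrative proxy, not the paper's exact measure: here relevance is taken as the mean absolute pairwise correlation, and the threshold defaults to the mean relevance, both of which are assumptions for the sake of the example.

```python
import numpy as np

def feature_relevance(X):
    """Relevance of each feature as its mean absolute correlation
    with the remaining features (illustrative proxy for the
    paper's measure, not the exact formula)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # d x d correlation matrix
    np.fill_diagonal(corr, 0.0)                  # ignore self-correlation
    return corr.sum(axis=1) / (corr.shape[0] - 1)

def split_features(X, threshold=None):
    """Partition feature indices into relevant / irrelevant.
    The default threshold (mean relevance) is an assumption;
    the paper derives its own threshold from the measure's form."""
    r = feature_relevance(X)
    if threshold is None:
        threshold = r.mean()
    relevant = np.where(r >= threshold)[0]
    irrelevant = np.where(r < threshold)[0]
    return relevant, irrelevant

# Example: two strongly correlated features plus one independent
# noise feature, which should be flagged as irrelevant.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
X = np.column_stack([a,
                     a + 0.1 * rng.normal(size=500),
                     rng.normal(size=500)])
relevant, irrelevant = split_features(X)
```

Under the paper's scheme, learning would then run on the relevant columns only, and the irrelevant ones would be reattached to the learned model afterward to form the explanatory model for the full database.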

[1] M.R. Anderberg, Cluster Analysis for Applications. New York: Academic Press, 1973.
[2] J. Banfield and A. Raftery, “Model-Based Gaussian and Non-Gaussian Clustering,” Biometrics, vol. 49, pp. 803-821, 1993.
[3] A. Blum and P. Langley, “Selection of Relevant Features and Examples in Machine Learning,” Artificial Intelligence, vol. 97, nos. 1-2, pp. 245-271, 1997.
[4] L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone, Classification and Regression Trees. Belmont, Calif.: Wadsworth Int'l Group, 1984.
[5] G.F. Cooper and E. Herskovits, “A Bayesian Method for the Induction of Probabilistic Networks from Data,” Machine Learning, vol. 9, pp. 309-347, 1992.
[6] M. Dash, H. Liu, and J. Yao, “Dimensionality Reduction of Unsupervised Data,” Proc. Ninth IEEE Int'l Conf. Tools with Artificial Intelligence (ICTAI '97), pp. 532-539, 1997.
[7] A. Dempster, N. Laird, and D. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm,” J. Royal Statistical Soc. B, vol. 39, pp. 1-38, 1977.
[8] M. Devaney and A. Ram, “Efficient Feature Selection in Conceptual Clustering,” Proc. 14th Int'l Conf. Machine Learning, 1997.
[9] J. Doak, “An Evaluation of Feature Selection Methods and Their Application to Computer Security,” Technical Report CSE-92-18, Dept. of Computer Science, Univ. of California at Davis, 1992.
[10] R. Duda and P. Hart, Pattern Classification and Scene Analysis. New York: John Wiley & Sons, 1973.
[11] D.H. Fisher, “Knowledge Acquisition via Incremental Conceptual Clustering,” Machine Learning, no. 2, pp. 139-172, 1987.
[12] D. Fisher and G. Hapanyengwi, “Database Management and Analysis Tools of Machine Induction,” J. Intelligent Information Systems, vol. 2, pp. 5-38, 1993.
[13] N. Friedman, “The Bayesian Structural EM Algorithm,” Proc. 14th Conf. Uncertainty in Artificial Intelligence, pp. 129-138, 1998.
[14] N. Friedman and M. Goldszmidt, “Building Classifiers Using Bayesian Networks,” Proc. 13th Nat'l Conf. Artificial Intelligence, pp. 1277-1284, 1996.
[15] D. Geiger and D. Heckerman, “Learning Gaussian Networks,” Technical Report MSR-TR-94-10, Microsoft Research, Redmond, Wash., 1994.
[16] D. Geiger and D. Heckerman, “Learning Gaussian Networks,” Proc. 10th Conf. Uncertainty in Artificial Intelligence, pp. 235-243, 1995.
[17] I. Good, “Rational Decisions,” J. Royal Statistical Soc. B, vol. 14, pp. 107-114, 1952.
[18] J.A. Hartigan, Clustering Algorithms. New York: John Wiley & Sons, 1975.
[19] D. Heckerman and D. Geiger, “Likelihoods and Parameter Priors for Bayesian Networks,” Technical Report MSR-TR-95-54, Microsoft Research, Redmond, Wash., 1995.
[20] I. Inza, P. Larrañaga, R. Etxeberria, and B. Sierra, “Feature Subset Selection by Bayesian Networks-Based Optimization,” Artificial Intelligence, vol. 123, pp. 157-184, 2000.
[21] G.H. John, R. Kohavi, and K. Pfleger, “Irrelevant Features and the Subset Selection Problem,” Proc. 11th Int'l Conf. Machine Learning, pp. 121-129, 1994.
[22] L. Kaufman and P. Rousseeuw, Finding Groups in Data. New York: John Wiley & Sons, 1990.
[23] R. Kohavi and G.H. John, “Wrappers for Feature Subset Selection,” Artificial Intelligence, vol. 97, pp. 273-324, 1997.
[24] P. Larrañaga and J.A. Lozano, Estimation of Distribution Algorithms. A New Tool for Evolutionary Computation. Kluwer Academic Publishers, 2001.
[25] S.L. Lauritzen, “Propagation of Probabilities, Means and Variances in Mixed Graphical Association Models,” J. Am. Statistical Assoc., vol. 87, pp. 1098-1108, 1992.
[26] S.L. Lauritzen, Graphical Models. Oxford, U.K.: Clarendon Press, 1996.
[27] S.L. Lauritzen and N. Wermuth, “Graphical Models for Associations between Variables, Some of which Are Qualitative and Some Quantitative,” The Annals of Statistics, vol. 17, pp. 31-57, 1989.
[28] H. Liu and H. Motoda, Feature Selection for Knowledge Discovery and Data Mining. Dordrecht, The Netherlands: Kluwer Academic, 1998.
[29] G.J. McLachlan and T. Krishnan, The EM Algorithm and Extensions. New York: John Wiley & Sons, 1997.
[30] M. Meila, “Learning with Mixtures of Trees,” PhD thesis, Dept. of Electrical Eng. and Computer Science, Massachusetts Inst. of Technology, Cambridge, Mass., 1999.
[31] M. Meila and D. Heckerman, “An Experimental Comparison of Several Clustering and Initialization Methods,” Proc. 14th Conf. Uncertainty in Artificial Intelligence, pp. 386-395, 1998.
[32] M. Meila and M.I. Jordan, “Estimating Dependency Structure as a Hidden Variable,” Neural Information Processing Systems, vol. 10, pp. 584-590, 1998.
[33] C. Merz, P. Murphy, and D. Aha, “UCI Repository of Machine Learning Databases,” Dept. Information and Computer Science, Univ. of California, Irvine, Calif., 1997.
[34] J.M. Peña, J.A. Lozano, and P. Larrañaga, “An Improved Bayesian Structural EM Algorithm for Learning Bayesian Networks for Clustering,” Pattern Recognition Letters, vol. 21, pp. 779-786, 2000.
[35] J.M. Peña, J.A. Lozano, and P. Larrañaga, “Learning Recursive Bayesian Multinets for Data Clustering by Means of Constructive Induction,” Machine Learning, to appear in 2001.
[36] J.M. Peña, J.A. Lozano, and P. Larrañaga, “Performance Evaluation of Compromise Conditional Gaussian Networks for Data Clustering,” Int'l J. Approximate Reasoning, to appear in 2001.
[37] J.M. Peña, J.A. Lozano, and P. Larrañaga, “Learning Conditional Gaussian Networks for Data Clustering via Edge Exclusion Tests,” Pattern Recognition Letters, 2000.
[38] P.W.F. Smith and J. Whittaker, “Edge Exclusion Tests for Graphical Gaussian Models,” Learning in Graphical Models, pp. 555-574, 1998.
[39] L. Talavera, “Feature Selection as a Preprocessing Step for Hierarchical Clustering,” Proc. 16th Int'l Conf. on Machine Learning, pp. 389-397, 1999.
[40] L. Talavera, “Dependency-Based Feature Selection for Clustering Symbolic Data,” Intelligent Data Analysis, vol. 4, pp. 19-28, 2000.
[41] B. Thiesson, C. Meek, D.M. Chickering, and D. Heckerman, “Learning Mixtures of DAG Models,” Proc. 14th Conf. Uncertainty in Artificial Intelligence, pp. 504-513, 1998.
[42] D. Wettschereck and D. Aha, “Weighting Features,” Proc. First Int'l Conf. Case-Based Reasoning, 1995.
[43] J. Whittaker, Graphical Models in Applied Multivariate Statistics. Chichester, U.K.: John Wiley & Sons, 1990.

Index Terms:
Data clustering, conditional Gaussian networks, feature selection, edge exclusion tests.
Jose Manuel Peña, Jose Antonio Lozano, Pedro Larrañaga, Iñaki Inza, "Dimensionality Reduction in Unsupervised Learning of Conditional Gaussian Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 590-603, June 2001, doi:10.1109/34.927460