Dimensionality Reduction of Clustered Data Sets
March 2008 (vol. 30 no. 3)
pp. 535-540
We present a novel probabilistic latent variable model to perform linear dimensionality reduction on data sets which contain clusters. We prove that the maximum likelihood solution of the model is an unsupervised generalisation of linear discriminant analysis. This provides a completely new approach to one of the most established and widely used classification algorithms. The performance of the model is then demonstrated on a number of real and artificial data sets.
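
For orientation, the classical supervised method that the abstract says the model generalises can be sketched in a few lines. This is ordinary Fisher linear discriminant analysis with known class labels, not the paper's latent variable model, whose details are not given on this page. The function name `lda_directions` and its arguments are illustrative assumptions.

```python
import numpy as np


def lda_directions(X, labels, n_components):
    """Classical (supervised) Fisher LDA projection directions.

    Illustrative sketch only: this is the textbook method with known
    labels, not the unsupervised model described in the abstract.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]

    S_w = np.zeros((d, d))  # within-class scatter
    S_b = np.zeros((d, d))  # between-class scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        class_mean = Xc.mean(axis=0)
        centred = Xc - class_mean
        S_w += centred.T @ centred
        diff = (class_mean - overall_mean)[:, None]
        S_b += Xc.shape[0] * (diff @ diff.T)

    # Directions maximising between-class relative to within-class scatter
    # are the leading eigenvectors of S_w^{-1} S_b (S_w assumed invertible).
    evals, evecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(evals.real)[::-1][:n_components]
    W = evecs[:, order].real
    return W / np.linalg.norm(W, axis=0)
```

Projecting the data with `X @ W` gives the reduced representation. An unsupervised variant, as the abstract describes, would have to infer the cluster structure rather than read it from `labels`.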

Index Terms:
dimensionality reduction, clustering, discriminant analysis, probabilistic algorithms
Guido Sanguinetti, "Dimensionality Reduction of Clustered Data Sets," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 3, pp. 535-540, March 2008, doi:10.1109/TPAMI.2007.70819