A Combined Latent Class and Trait Model for the Analysis and Visualization of Discrete Data
August 2001 (vol. 23 no. 8)
pp. 859-872

Abstract—We present a general framework for data analysis and visualization by means of topographic organization and clustering. Imposing distributional assumptions on the assumed underlying latent factors makes the proposed model suitable for both visualization and clustering. The system noise is modeled in parametric form as a member of the exponential family of distributions, which allows different (continuous or discrete) types of observables to be handled in a unified framework. In this paper, we focus on discrete formulations which, in contrast to self-organizing methods for continuous data, imply variants of Bregman divergences as measures of dissimilarity between data and reference points and also define the matching nonlinear relation between latent and observable variables. The trait variant of the model can therefore be seen as a data-driven noisy nonlinear Independent Component Analysis, capable of revealing meaningful structure in multivariate observable data and visualizing it in two dimensions. The class variant of the model, which performs the clustering, carries out data-driven parametric mixture modeling. The combined (trait and class) model, along with the associated estimation procedures, allows us to interpret the visualization result in the sense of a topographic ordering. One important application of this work is the discovery of underlying semantic structure in text-based documents. Experimental results on various subsets of the 20-Newsgroups text corpus and on binary-coded digits data are given by way of demonstration.
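The abstract's claim that exponential-family noise models induce Bregman divergences can be illustrated concretely for the Bernoulli (binary-data) case, where the induced divergence is the binary relative entropy and the matching nonlinear link between natural and mean parameters is the logistic sigmoid. The sketch below is illustrative only and is not the authors' implementation; all function names are hypothetical.

```python
import numpy as np

def sigmoid(a):
    """Logistic link: the canonical inverse link for Bernoulli observables."""
    return 1.0 / (1.0 + np.exp(-a))

def bernoulli_bregman(x, m, eps=1e-12):
    """Bregman divergence induced by the Bernoulli log-partition function:
    the binary relative entropy between data x in {0, 1} and mean parameter m."""
    x = np.clip(x, eps, 1.0 - eps)
    m = np.clip(m, eps, 1.0 - eps)
    return float(np.sum(x * np.log(x / m) + (1 - x) * np.log((1 - x) / (1 - m))))

# A reference point in natural-parameter (log-odds) space maps to the mean
# via the link; the dissimilarity between a binary datum and the reference
# is then the Bregman divergence rather than squared Euclidean distance.
x = np.array([1.0, 0.0, 1.0, 1.0])   # observed binary datum
a = np.array([2.0, -1.5, 0.5, 3.0])  # natural parameters of a reference point
m = sigmoid(a)                        # mean parameters in (0, 1)
d = bernoulli_bregman(x, m)
```

Replacing the Bernoulli log-partition function with that of another exponential-family member (e.g., Poisson for count data) changes both the divergence and the link in lockstep, which is the sense in which the framework handles different observable types uniformly.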

[1] S. Amari, Differential Geometrical Methods in Statistics. Berlin: Springer Verlag, 1985.
[2] O. Barndorff-Nielsen, Information and Exponential Families in Statistical Theory. Chichester: Wiley, 1978.
[3] A. Belouchrani and J.F. Cardoso, “A Maximum Likelihood Source Separation for Discrete Sources,” Proc. European Signal Processing Conf., vol. 2, pp. 768-771, 1994.
[4] C.M. Bishop, Neural Networks for Pattern Recognition. Clarendon Press, 1995.
[5] C.M. Bishop, M. Svensén, and C.K.I. Williams, “Developments of the Generative Topographic Mapping,” Neurocomputing, vol. 21, pp. 203-224, 1998.
[6] C.M. Bishop, M. Svensén, and C.K.I. Williams, “GTM: The Generative Topographic Mapping,” Neural Computation, vol. 10, no. 1, pp. 215-235, 1998.
[7] P. Cheeseman and J. Stutz, “Bayesian Classification (AutoClass): Theory and Results,” Advances in Knowledge Discovery and Data Mining, AAAI Press/MIT Press, pp. 61-83, 1996.
[8] P. Cheeseman, J. Kelly, M. Self, J. Stutz, W. Taylor, and D. Freeman, “AutoClass: A Bayesian Classification System,” Proc. Fifth Int'l Conf. Machine Learning, 1988.
[9] K.W. Church and W. Gale, “Poisson Mixtures,” Natural Language Eng., vol. 1, no. 2, pp. 163-190, 1995.
[10] T.M. Cover and J.A. Thomas, Elements of Information Theory. John Wiley & Sons, 1991.
[11] A.P. Dempster, N.M. Laird, and D.B. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm,” J. Royal Statistical Soc., B, vol. 39, no. 1, pp. 1-38, 1977.
[12] S. Dumais et al., “Inductive Learning Algorithms and Representations for Text Categorization,” Proc. Conf. Information and Knowledge Management, 1998.
[13] A.P. Dunmur and D.M. Titterington, “Analysis of Latent Structure Models with Multidimensional Latent Variables,” Statistics and Neural Networks: Recent Advances at the Interface. J.W. Kay and D.M. Titterington, eds., pp. 165-194, Oxford: Oxford Univ. Press, 1999.
[14] S.T. Roweis and Z. Ghahramani, “A Unifying Review of Linear Gaussian Models,” Neural Computation, vol. 11, no. 2, pp. 305-345, 1999.
[15] M. Girolami, “A Generative Model for Sparse Discrete Binary Data with Non-Uniform Categorical Priors,” Proc. European Symp. Artificial Neural Networks (ESANN'00), pp. 1-6, 2000.
[16] M. Girolami, “Document Representations Based on Generative Multivariate Bernoulli Latent Topic Models,” Proc. BCS-Information Retrieval Specialist Group 22nd Ann. Colloquium Information Retrieval Research, pp. 194-201, 2000.
[17] M. Girolami, A. Cichocki, and S.I. Amari, “A Common Neural Network Model for Exploratory Data Analysis and Independent Component Analysis,” IEEE Trans. Neural Networks, vol. 9, no. 6, pp. 1495-1501, 1998.
[18] M. Girolami, Self-Organising Neural Networks. Springer-Verlag, 1999.
[19] M. Girolami, Advances in Independent Component Analysis. Perspectives in Neural Computation. M. Girolami, ed. Springer Verlag, 2000.
[20] I.J. Good, The Estimation of Probabilities: An Essay on Modern Bayesian Methods. M.I.T. Press, 1965.
[21] A. Hyvärinen, “Fast and Robust Fixed-Point Algorithms for Independent Component Analysis,” IEEE Trans. Neural Networks, vol. 10, pp. 626-634, 1999.
[22] T. Hofmann, “Learning the Similarity of Documents,” Proc. Advances in Neural Information Processing Systems. MIT Press, 2000.
[23] T. Hofmann, “A Probabilistic Approach for Mapping Large Document Collections,” J. Intelligent Data Analysis.
[24] A. Kabán and M. Girolami, “Initialized and Guided EM-Clustering of Sparse Binary Data with Application to Text Based Documents,” Proc. 15th Int'l Conf. Pattern Recognition, vol. 2, pp. 748-751, 2000.
[25] A. Kabán and M. Girolami, “Clustering of Text Documents by Skewness Maximization,” Proc. Second Int'l Workshop Independent Component Analysis and Blind Signal Separation, pp. 435-440, 2000.
[26] T. Kohonen, Self-Organizing Maps. Berlin: Springer-Verlag, 1995.
[27] R. Kohavi, B. Becker, and D. Sommerfield, “Improving Simple Bayes,” Proc. European Conf. Machine Learning, 1997.
[28] D.D. Lee and H.S. Seung, “Learning the Parts of Objects by Non-negative Matrix Factorization,” Nature, vol. 401, pp. 788-791, 1999.
[29] D.D. Lewis, “Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval,” European Conf. Machine Learning, pp. 4-15, 1998.
[30] D.D. Lewis, “Feature Selection and Feature Extraction for Text Categorization,” Proc. Speech and Natural Language Workshop, Defense, Advanced Research Projects Agency, pp. 212-217, 1992.
[31] D.J.C. MacKay, “Density Networks and their Application to Protein Modelling,” Maximum Entropy and Bayesian Methods, pp. 259-268, Kluwer, 1996.
[32] D.J.C. MacKay, “Bayesian Neural Networks and Density Networks,” Nuclear Instruments and Methods in Physics Research, Section A, vol. 354, no. 1, pp. 73-80, 1995.
[33] A. McCallum and K. Nigam, “A Comparison of Event Models for Naive Bayes Text Classification,” Proc. Am. Assoc. Artificial Intelligence/Int'l Conf. Machine Learning-98 Workshop Learning for Text Categorization, Technical Report WS-98-05, pp. 41-48, 1998.
[34] P. McCullagh and J.A. Nelder, Generalized Linear Models. Chapman and Hall, 1985.
[35] M. Meila and D. Heckerman, An Experimental Comparison of Several Clustering and Initialization Methods, Technical Report MSR-TR-98-06, Microsoft Research, Feb. 1998.
[36] F. Mosteller and D. Wallace, Applied Bayesian and Classical Inference. Springer-Verlag, 1984.
[37] K. Nigam, A. McCallum, S. Thrun, and T. Mitchell, “Text Classification from Labeled and Unlabeled Documents Using EM,” Machine Learning, pp. 1-34, 2000.
[38] G.J. McLachlan and T. Krishnan, The EM Algorithm and Extensions. Wiley, 1997.
[39] I. Moustaki, “A Latent Trait and a Latent Class Model for Mixed Observed Variables,” British J. Math. and Statistical Psychology, vol. 49, pp. 313-334, 1996.
[40] P. Pajunen, “Blind Separation of Binary Sources with Less Sensors than Sources,” Proc. IEEE Int'l Conf. Neural Networks, vol. 3, pp. 1994-1997, 1997.
[41] M.A. Peot, “Geometric Implications of the Naïve Bayes Assumption,” Proc. Conf. Uncertainty in Artificial Intelligence (UAI'96), pp. 414-419, 1996.
[42] M. Sahami, “Using Machine Learning to Improve Information Access,” PhD dissertation, Dept. Computer Science, Stanford Univ., 1998.
[43] M.D. Sammel, L.M. Ryan, and J.M. Legler, “Latent Variable Models for Mixed Discrete and Continuous Outcomes,” J. Royal Statistical Soc., Series B, vol. 59, pp. 667-678, 1997.
[44] M.E. Tipping and C.M. Bishop, “Probabilistic Principal Component Analysis,” J. Royal Statistical Soc., Series B, vol. 61, no. 3, pp. 611-622, 1999.
[45] M.E. Tipping, “Probabilistic Visualization of High-Dimensional Binary Data,” Proc. Advances in Neural Information Processing Systems (NIPS*11), pp. 592-598, 1999.
[46] S. Vaithyanathan and B. Dom, “Generalized Model Selection for Unsupervised Learning in High Dimensions,” Proc. Advances in Neural Information Processing Systems (NIPS*99), 1999.
[47] A. Vinokourov and M. Girolami, “A Probabilistic Hierarchical Clustering Method for Organizing Collections of Text Documents,” Proc. Fifth Int'l Conf. Pattern Recognition (ICPR'2000), vol. 2, pp. 182-185, 2000.
[48] H.H. Yang and J. Moody, “Data Visualization and Feature Selection: New Algorithms for Nongaussian Data,” Proc. Advances in Neural Information Processing Systems, pp. 687-693, MIT Press, 2000.
[49] Y. Yang and X. Liu, “A Re-Examination of Text Categorization Methods,” Proc. 21st Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 42-49, 1999.

Index Terms:
Latent trait model, generative model, nonlinear mapping, topographic mapping, independent component analysis, clustering.
Ata Kabán, Mark Girolami, "A Combined Latent Class and Trait Model for the Analysis and Visualization of Discrete Data," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 8, pp. 859-872, Aug. 2001, doi:10.1109/34.946989