Ata Kabán, Mark Girolami, "A Combined Latent Class and Trait Model for the Analysis and Visualization of Discrete Data," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 8, pp. 859-872, Aug. 2001.
Index Terms—Latent trait model, generative model, nonlinear mapping, topographic mapping, independent component analysis, clustering.
Abstract—We present a general framework for data analysis and visualization by means of topographic organization and clustering. Imposing distributional assumptions on the assumed underlying latent factors makes the proposed model suitable for both visualization and clustering. The system noise is modeled in parametric form, as a member of the exponential family of distributions, which allows us to deal with different (continuous or discrete) types of observables in a unified framework. In this paper, we focus on discrete-case formulations which, contrary to self-organizing methods for continuous data, imply variants of Bregman divergences as measures of dissimilarity between data and reference points and also define the matching nonlinear relation between latent and observable variables. Therefore, the trait variant of the model can be seen as a data-driven noisy nonlinear Independent Component Analysis, which is capable of revealing meaningful structure in the multivariate observable data and visualizing it in two dimensions. The class variant of our model, which performs the clustering, carries out data-driven parametric mixture modeling. The combined (trait and class) model, along with the associated estimation procedures, allows us to interpret the visualization result in the sense of a topographic ordering. One important application of this work is the discovery of underlying semantic structure in text-based documents. Experimental results on various subsets of the 20 Newsgroups text corpus and binary-coded digits data are given by way of demonstration.
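The abstract's point that each exponential-family noise model induces a matching Bregman divergence can be illustrated with a small sketch (not from the paper itself; the function names and test values here are illustrative assumptions). The Bregman divergence of a convex function F is d_F(x, y) = F(x) - F(y) - <x - y, ∇F(y)>; choosing F as half the squared norm recovers Euclidean distance (the Gaussian case used by continuous self-organizing methods), while choosing F as the negative binary entropy recovers the binary KL divergence, the appropriate dissimilarity for Bernoulli-distributed 0/1 data.

```python
import numpy as np

def bregman(F, grad_F, x, y):
    """Bregman divergence d_F(x, y) = F(x) - F(y) - <x - y, grad F(y)>."""
    return F(x) - F(y) - np.dot(x - y, grad_F(y))

# Gaussian case: F(v) = ||v||^2 / 2, whose Bregman divergence is
# half the squared Euclidean distance.
F_gauss = lambda v: 0.5 * np.dot(v, v)
g_gauss = lambda v: v

# Bernoulli case: F is the negative binary entropy; its gradient is the
# logit (log-odds) link, and the induced Bregman divergence is the
# coordinate-wise binary KL divergence.
F_bern = lambda v: np.sum(v * np.log(v) + (1 - v) * np.log(1 - v))
g_bern = lambda v: np.log(v) - np.log(1 - v)

x = np.array([0.9, 0.1, 0.8])  # data point (probabilities in (0, 1))
y = np.array([0.5, 0.5, 0.5])  # reference point

d_euc = bregman(F_gauss, g_gauss, x, y)  # = 0.5 * ||x - y||^2
d_kl = bregman(F_bern, g_bern, x, y)     # = sum_i KL(Bern(x_i) || Bern(y_i))
```

Under this view, replacing Euclidean distance with the divergence matched to the noise model is what adapts topographic mapping from continuous to discrete observables.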