Semisupervised Learning of Hierarchical Latent Trait Models for Data Visualization
March 2005 (vol. 17 no. 3)
pp. 384-400
Recently, we have developed the hierarchical Generative Topographic Mapping (HGTM), an interactive method for visualization of large high-dimensional real-valued data sets. In this paper, we propose a more general visualization system by extending HGTM in three ways, allowing the user to visualize a wider range of data sets and better supporting the model development process. 1) We integrate HGTM with noise models from the exponential family of distributions, with the Latent Trait Model (LTM) as the basic building block. This enables us to visualize data of an inherently discrete nature, e.g., collections of documents, in a hierarchical manner. 2) We give the user a choice of initializing the child plots of the current plot in either interactive or automatic mode. In the interactive mode, the user selects “regions of interest,” whereas in the automatic mode, an unsupervised minimum message length (MML)-inspired construction of a mixture of LTMs is employed. The unsupervised construction is particularly useful when high-level plots are covered with dense clusters of highly overlapping data projections, making it difficult to use the interactive mode; such a situation often arises when visualizing large data sets. 3) We derive general formulas for magnification factors in latent trait models. Magnification factors are a useful tool for improving our understanding of the visualization plots, since they can highlight the boundaries between data clusters. We illustrate our approach on a toy example and evaluate it on three more complex real data sets.
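The magnification factors mentioned in point 3) measure how the latent-to-data-space mapping locally stretches or compresses area; large values mark sparse regions of the manifold that often separate data clusters. As a rough illustration (not the paper's derivation), the following sketch computes the standard quantity sqrt(det(JᵀJ)) for a GTM/LTM-style mapping y(x) = W·φ(x) with Gaussian RBF basis functions; all names, dimensions, and parameter values here are hypothetical.

```python
import numpy as np

# Illustrative sketch: magnification factor of a GTM/LTM-style mapping
# y(x) = W @ phi(x) from a 2D latent space to a D-dimensional data space.
# Centres, width, and weights below are arbitrary stand-ins.

rng = np.random.default_rng(0)

# M = 9 Gaussian RBF centres on a 3x3 grid in latent space
centres = np.array([[a, b] for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)])
sigma = 0.7                                  # common RBF width
W = rng.normal(size=(5, len(centres)))       # D=5 data dimensions

def phi(x):
    """Gaussian basis activations at latent point x, shape (M,)."""
    d2 = np.sum((centres - x) ** 2, axis=1)
    return np.exp(-d2 / (2 * sigma ** 2))

def dphi(x):
    """Jacobian of phi with respect to x, shape (M, 2)."""
    return phi(x)[:, None] * (centres - x) / sigma ** 2

def magnification(x):
    """Local area magnification sqrt(det(J^T J)) of y = W @ phi at x."""
    J = W @ dphi(x)                          # (D, 2) Jacobian of the mapping
    return np.sqrt(np.linalg.det(J.T @ J))

mf = magnification(np.array([0.1, -0.2]))
print(float(mf))
```

Plotting `magnification` over a grid of latent points would produce the kind of map the paper uses to highlight cluster boundaries; since JᵀJ is a Gram matrix, the value is nonnegative wherever the Jacobian has full rank.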

[1] F. Aurenhammer, “Voronoi Diagrams—A Survey of a Fundamental Geometric Data Structure,” ACM Computing Surveys, vol. 23, no. 3, pp. 345-405, 1991.
[2] O. Barndorff-Nielsen, Information and Exponential Families in Statistical Theory. Wiley, 1978.
[3] J. Bernardo and A. Smith, Bayesian Theory. Chichester, U.K.: J. Wiley & Sons, 1994.
[4] C.M. Bishop, M. Svensén, and C.K.I. Williams, “GTM: The Generative Topographic Mapping,” Neural Computation, vol. 10, no. 1, pp. 215-235, 1998.
[5] C.M. Bishop, M. Svensén, and C.K.I. Williams, “Developments of the Generative Topographic Mapping,” Neurocomputing, vol. 21, pp. 203-224, 1998.
[6] C.M. Bishop, M. Svensén, and C.K.I. Williams, “Magnification Factors for the GTM Algorithm,” Proc. IEE Fifth Int'l Conf. Artificial Neural Networks, pp. 64-69, 1997.
[7] C.M. Bishop and M.E. Tipping, “A Hierarchical Latent Variable Model for Data Visualization,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 281-293, 1998.
[8] D.M. Boulton and C.S. Wallace, “An Information Measure for Hierarchic Classification,” Computer J., vol. 16, no. 3, pp. 254-261, 1973.
[9] G. Celeux, S. Chrétien, F. Forbes, and A. Mkhadri, “A Component-Wise EM Algorithm for Mixtures,” J. Computational and Graphical Statistics, vol. 10, pp. 699-712, 2001.
[10] A.P. Dempster, N.M. Laird, and D.B. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm,” J. Royal Statistical Soc., Series B, vol. 39, pp. 1-38, 1977.
[11] M. Figueiredo and A.K. Jain, “Unsupervised Learning of Finite Mixture Models,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, pp. 381-396, 2002.
[12] P. Horton and K. Nakai, “A Probabilistic Classification System for Predicting the Cellular Localization Sites of Proteins,” Intelligent System in Molecular Biology, vol. 4, pp. 109-115, 1996.
[13] A. Inselberg and B. Dimsdale, “Parallel Coordinates: A Tool for Visualizing Multidimensional Geometry,” Proc. Visualization '90, pp. 361-378, 1990.
[14] A. Kabán and M. Girolami, “A Combined Latent Class and Trait Model for the Analysis and Visualization of Discrete Data,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 8, pp. 859-872, Aug. 2001.
[15] A. Kabán, P. Tiňo, and M. Girolami, “A General Framework for a Principled Hierarchical Visualisation of Multivariate Data,” Proc. Third Int'l Conf. Intelligent Data Eng. and Automated Learning (IDEAL '02), pp. 17-23, 2002.
[16] D.A. Keim, H.P. Kriegel, and M. Ankerst, “Recursive Pattern: A Technique for Visualizing Very Large Amounts of Data,” Proc. Sixth IEEE Visualization 1995 Conf. (VIS '95), pp. 279-286, 1995.
[17] T. Kohonen, “The Self-Organizing Map,” Proc. IEEE, vol. 78, no. 9, pp. 1464-1479, 1990.
[18] T. Kohonen, Self-Organizing Maps. Berlin: Springer-Verlag, 1999.
[19] Y. Koren and D. Harel, “A Two-Way Visualization Method for Clustered Data,” Proc. Ninth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 589-594, 2003.
[20] J. LeBlanc, M.O. Ward, and N. Wittels, “Exploring n-Dimensional Databases,” Proc. Visualization '90, pp. 230-237, 1990.
[21] P. McCullagh and J.A. Nelder, Generalized Linear Models. Chapman and Hall, 1985.
[22] R. Miikkulainen, “Script Recognition with Hierarchical Feature Maps,” Connection Science, vol. 2, pp. 83-101, 1990.
[23] E. Pampalk, W. Goebl, and G. Widmer, “Visualising Changes in the Structure of Data for Exploratory Feature Selection,” Proc. 2003 ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '03), P. Domingos et al., eds., pp. 157-166, 2003.
[24] W. Ribarsky, E. Ayers, J. Eble, and S. Mukherjea, “Glyphmaker: Creating Customized Visualization of Complex Data,” Computer, vol. 27, no. 7, pp. 57-64, July 1994.
[25] S.J. Roberts, C. Holmes, and D. Denison, “Minimum-Entropy Data Partitioning Using Reversible Jump Markov Chain Monte Carlo,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, pp. 909-914, 2001.
[26] C. Stein, “Approximation of Improper Prior Measures by Proper Probability Measures,” Bernoulli, Bayes, Laplace Festschrift, J. Neyman and L. LeCam, eds., Berlin: Springer, pp. 217-240, 1965.
[27] P. Tiňo and I.T. Nabney, “Hierarchical GTM: Constructing Localized Nonlinear Projection Manifolds in a Principled Way,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, pp. 639-656, 2002.
[28] C.S. Wallace and D.L. Dowe, “Minimum Message Length and Kolmogorov Complexity,” The Computer J., vol. 42, pp. 270-283, 1999.
[29] C.S. Wallace and D.M. Boulton, “An Information Measure for Classification,” The Computer J., vol. 11, no. 2, pp. 185-194, 1968.
[30] C.S. Wallace and D.L. Dowe, “Intrinsic Classification by MML— the Snob Program,” Proc. Seventh Australian Joint Conf. Artificial Intelligence, C. Zhang et al., eds., World Scientific, pp. 37-44, 1994.
[31] C.S. Wallace and D.L. Dowe, “Refinements of MDL and MML Coding,” Computer J., vol. 42, no. 4, pp. 330-337, 1999.
[32] C.S. Wallace and D.L. Dowe, “MML Clustering of Multi-State, Poisson, Von Mises Circular and Gaussian Distributions,” Statistics and Computing, vol. 10, pp. 73-83, 2000.
[33] C.S. Wallace and P.R. Freeman, “Estimation and Inference by Compact Coding,” J. Royal Statistical Soc., series B, vol. 49, pp. 240-265, 1987.
[34] E.J. Wegman, “Hyperdimensional Data Analysis Using Parallel Coordinates,” J. Am. Statistical Assoc., vol. 85, no. 411, pp. 664-675, 1990.
[35] R. Wolke and H. Schwetlick, “Iterative Reweighted Least Squares: Algorithms, Convergence Analysis, and Numerical Comparisons,” SIAM J. Scientific and Statistical Computing, vol. 9, no. 5, pp. 907-921, 1988.
[36] J. Yang, M.O. Ward, and E.A. Rundensteiner, “Interactive Hierarchical Displays: A General Framework for Visualisation and Exploration of Large Multivariate Data Sets,” Computers and Graphics J., vol. 27, pp. 265-283, 2002.

Index Terms:
Hierarchical model, latent trait model, magnification factors, data visualization, document mining.
Ian T. Nabney, Yi Sun, Peter Tiňo, Ata Kabán, "Semisupervised Learning of Hierarchical Latent Trait Models for Data Visualization," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 3, pp. 384-400, March 2005, doi:10.1109/TKDE.2005.49