Greedy Learning of Binary Latent Trees
June 2011 (vol. 33 no. 6)
pp. 1087-1097
Stefan Harmeling, Max Planck Institute for Biological Cybernetics, Tübingen
Christopher K.I. Williams, University of Edinburgh, Edinburgh
Inferring latent structures from observations helps to model, and possibly also to understand, the underlying data-generating processes. A rich class of latent structures is that of latent trees, i.e., tree-structured distributions involving latent variables where the visible variables are leaves. These are also called hierarchical latent class (HLC) models. Zhang and Kočka [22] proposed a search algorithm for learning such models in the spirit of Bayesian network structure learning. While such an approach can find good solutions, it can be computationally expensive. As an alternative, we investigate two greedy procedures: the BIN-G algorithm determines both the structure of the tree and the cardinality of the latent variables in a bottom-up fashion; the BIN-A algorithm first determines the tree structure using agglomerative hierarchical clustering and then determines the cardinality of the latent variables as for BIN-G. We show that, even when restricting ourselves to binary trees, we obtain HLC models of comparable quality to Zhang's solutions (in terms of cross-validated log-likelihood) while being generally faster to compute. This claim is validated by a comprehensive comparison on several data sets. Furthermore, we demonstrate that our methods are able to estimate interpretable latent structures on real-world data with a large number of variables. Applying our method to a restricted version of the 20 newsgroups data, we find that the resulting models are related to topic models, and on data from the PASCAL Visual Object Classes (VOC) 2007 challenge, we show how such tree-structured models help us understand how objects co-occur in images. For reproducibility of all experiments in this paper, all code and data sets (or links to data) are available at http://people.kyb.tuebingen.mpg.de/harmeling/code/ltt-1.4.tar.
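To make the agglomerative structure step of BIN-A concrete, the following Python sketch illustrates one plausible reading of it. This is not the authors' implementation (which is available at the URL above): it assumes pairwise empirical mutual information as the linkage criterion (cf. Kojadinovic [14]) and, as a stand-in for the latent parent whose cardinality the paper's method would learn, simply re-codes the joint state of the two merged children.

    import numpy as np
    from itertools import combinations

    def mutual_information(x, y):
        # Empirical mutual information between two integer-coded 1-D arrays.
        mi = 0.0
        for a in np.unique(x):
            px = np.mean(x == a)
            for b in np.unique(y):
                pxy = np.mean((x == a) & (y == b))
                if pxy > 0:
                    mi += pxy * np.log(pxy / (px * np.mean(y == b)))
        return mi

    def bin_a_structure(data):
        # data: dict mapping variable name -> integer-coded observation array.
        # Repeatedly merge the pair of clusters with the highest mutual
        # information; each merge becomes an internal node of the binary tree.
        nodes = {name: (name, x) for name, x in data.items()}
        while len(nodes) > 1:
            i, j = max(combinations(nodes, 2),
                       key=lambda p: mutual_information(nodes[p[0]][1],
                                                        nodes[p[1]][1]))
            tree_i, xi = nodes.pop(i)
            tree_j, xj = nodes.pop(j)
            # Stand-in for the latent parent: the children's joint state,
            # re-coded as integers (the paper instead introduces a latent
            # variable here and learns its cardinality).
            _, merged = np.unique(np.column_stack([xi, xj]),
                                  axis=0, return_inverse=True)
            nodes["(%s,%s)" % (i, j)] = ((tree_i, tree_j), merged)
        return next(iter(nodes.values()))[0]

On a toy example with a hidden common cause, the sketch groups the dependent variables first, as one would expect:

    rng = np.random.default_rng(0)
    z = rng.integers(0, 2, size=500)                 # hidden cause
    data = {"a": z,
            "b": z ^ (rng.random(500) < 0.1),        # noisy copy of "a"
            "c": rng.integers(0, 2, size=500)}       # independent
    print(bin_a_structure(data))                     # merges "a" and "b" first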

[1] D. Blei, A. Ng, and M. Jordan, “Latent Dirichlet Allocation,” J. Machine Learning Research, vol. 3, pp. 993-1022, 2003.
[2] K.A. Bollen, Structural Equations with Latent Variables. Wiley, 1989.
[3] C.K. Chow and C.N. Liu, “Approximating Discrete Probability Distributions with Dependence Trees,” IEEE Trans. Information Theory, vol. 14, no. 3, pp. 462-467, May 1968.
[4] D. Connolly, “Constructing Hidden Variables in Bayesian Networks via Conceptual Clustering,” Proc. 10th Int'l Conf. Machine Learning, pp. 65-72, 1993.
[5] R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis. Wiley, 1973.
[6] J. Felsenstein, Inferring Phylogenies. Sinauer Assoc., 2004.
[7] X. Feng, C.K.I. Williams, and S.N. Felderhof, “Combining Belief Networks and Neural Networks for Scene Segmentation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 4, pp. 467-483, Apr. 2002.
[8] D.H. Fisher, “Knowledge Acquisition via Incremental Conceptual Clustering,” Machine Learning, vol. 2, pp. 139-172, 1987.
[9] N. Friedman, “PCluster: Probabilistic Agglomerative Clustering of Gene Expression Profiles,” Technical Report 80, Hebrew Univ., 2003.
[10] K. Heller and Z. Ghahramani, “Bayesian Hierarchical Clustering,” Proc. 22nd Int'l Conf. Machine Learning, pp. 297-304, 2005.
[11] G.E. Hinton, S. Osindero, and Y.W. Teh, “A Fast Learning Algorithm for Deep Belief Nets,” Neural Computation, vol. 18, pp. 1527-1554, 2006.
[12] C. Kemp and J.B. Tenenbaum, “The Discovery of Structural Form,” Proc. Nat'l Academy of Sciences, vol. 105, pp. 10687-10692, 2008.
[13] T. Kohlmann and A.K. Formann, “Using Latent Class Models to Analyze Response Patterns in Epidemiologic Mail Surveys,” Applications of Latent Trait and Latent Class Models in the Social Sciences, J. Rost and R. Langeheine, eds., Waxman Verlag, 1997.
[14] I. Kojadinovic, “Agglomerative Hierarchical Clustering of Continuous Variables Based on Mutual Information,” Computational Statistics and Data Analysis, vol. 46, pp. 269-294, 2004.
[15] P.F. Lazarsfeld and N.W. Henry, Latent Structure Analysis. Houghton Mifflin, 1968.
[16] R.M. Neal, “Density Modeling and Clustering Using Dirichlet Diffusion Trees,” Bayesian Statistics, J.M. Bernardo et al., eds., vol. 7, pp. 619-629, Oxford Univ. Press, 2003.
[17] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
[18] Y.W. Teh, H. Daumé III, and D.M. Roy, “Bayesian Agglomerative Clustering with Coalescents,” Advances in Neural Information Processing Systems, J. Platt, D. Koller, Y. Singer, and S. Roweis, eds., vol. 20, MIT Press, 2008.
[19] Y. Wang, N. Zhang, and T. Chen, “Latent Tree Models and Approximate Inference in Bayesian Networks,” J. Artificial Intelligence Research, vol. 32, pp. 879-900, 2008.
[20] C.K.I. Williams, “A MCMC Approach to Hierarchical Mixture Modelling,” Advances in Neural Information Processing Systems, S.A. Solla, T.K. Leen, and K.-R. Müller, eds., vol. 12, MIT Press, 2000.
[21] N.L. Zhang, “Hierarchical Latent Class Models for Cluster Analysis,” J. Machine Learning Research, vol. 5, pp. 697-723, 2004.
[22] N.L. Zhang and T. Kočka, “Efficient Learning of Hierarchical Latent Class Models,” Proc. 16th IEEE Int'l Conf. Tools with AI, 2004.

Index Terms:
Unsupervised learning, latent variable model, hierarchical latent class model, greedy methods.
Citation:
Stefan Harmeling, Christopher K.I. Williams, "Greedy Learning of Binary Latent Trees," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 6, pp. 1087-1097, June 2011, doi:10.1109/TPAMI.2010.145