Issue No. 2 - Feb. 2013 (vol. 35), pp. 398-410
F. Sánchez-Vega, Dept. of Appl. Math. & Stat., Johns Hopkins Univ., Baltimore, MD, USA
J. Eisner, Dept. of Comput. Sci., Johns Hopkins Univ., Baltimore, MD, USA
L. Younes, Dept. of Appl. Math. & Stat., Johns Hopkins Univ., Baltimore, MD, USA
D. Geman, Dept. of Appl. Math. & Stat., Johns Hopkins Univ., Baltimore, MD, USA
ABSTRACT
We present a new framework for learning high-dimensional multivariate probability distributions from estimated marginals. The approach is motivated by compositional models and Bayesian networks, and designed to adapt to small sample sizes. We start with a large, overlapping set of elementary statistical building blocks, or “primitives,” which are low-dimensional marginal distributions learned from data. Each variable may appear in many primitives. Subsets of primitives are combined in a Lego-like fashion to construct a probabilistic graphical model; only a small fraction of the primitives will participate in any valid construction. Since primitives can be precomputed, parameter estimation and structure search are separated. Model complexity is controlled by strong biases: we adapt the primitives to the amount of training data and impose rules that restrict how they may be merged into allowable compositions. The likelihood of the data decomposes into a sum of local gains, one for each primitive in the final structure. We focus on a specific subclass of networks, binary forests, for which structure optimization corresponds to an integer linear program and the maximizing composition can be computed for reasonably large numbers of variables. Performance is evaluated using both synthetic data and real datasets from natural language processing and computational biology.
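To make the selection step concrete, the following is a minimal, hypothetical sketch of the competitive-assembly idea described in the abstract: primitives carry precomputed local likelihood gains, and structure search chooses a compatible subset that maximizes the summed gain. The paper formulates this as an integer linear program over binary-forest compositions; the toy Python below instead brute-forces a simplified variant in which selected primitives may not share variables. That disjointness rule, and all names and numbers, are illustrative assumptions, not the paper's actual constraints.

    from itertools import combinations

    # Each primitive: (frozenset of variable indices, precomputed local
    # likelihood gain). The gains are made-up numbers standing in for
    # statistics estimated from training data.
    primitives = [
        (frozenset({0, 1}), 2.3),
        (frozenset({1, 2}), 1.9),
        (frozenset({2, 3}), 2.7),
        (frozenset({0, 3}), 0.8),
        (frozenset({4, 5}), 1.4),
    ]

    def best_composition(prims):
        """Enumerate all subsets of primitives and keep the highest-gain
        subset whose members are pairwise variable-disjoint (a stand-in
        for the paper's richer rules on allowable merges)."""
        best_gain, best_subset = 0.0, ()
        for r in range(1, len(prims) + 1):
            for subset in combinations(prims, r):
                var_sets = [v for v, _ in subset]
                union = set().union(*var_sets)
                # Pairwise disjoint iff no variable is counted twice.
                if len(union) == sum(len(v) for v in var_sets):
                    gain = sum(g for _, g in subset)
                    if gain > best_gain:
                        best_gain, best_subset = gain, subset
        return best_gain, best_subset

    gain, chosen = best_composition(primitives)
    print(f"best total gain: {gain:.1f}")  # 6.4 for the toy data above
    for variables, g in chosen:
        print(sorted(variables), g)        # e.g. [0, 1] 2.3

A real implementation would replace the enumeration, which is exponential in the number of primitives, with an ILP solver and would encode the paper's composition rules as linear constraints.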
INDEX TERMS
Bayesian methods, assembly, computational modeling, probability distributions, object-oriented modeling, connectors, joints, linear programming, graphs and networks, statistical models, machine learning
CITATION
F. Sánchez-Vega, J. Eisner, L. Younes, and D. Geman, "Learning Multivariate Distributions by Competitive Assembly of Marginals," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 2, pp. 398-410, Feb. 2013, doi: 10.1109/TPAMI.2012.96