Learning Multivariate Distributions by Competitive Assembly of Marginals
Feb. 2013 (vol. 35 no. 2)
pp. 398-410
F. Sánchez-Vega, Dept. of Appl. Math. & Stat., Johns Hopkins Univ., Baltimore, MD, USA
J. Eisner, Dept. of Comput. Sci., Johns Hopkins Univ., Baltimore, MD, USA
L. Younes, Dept. of Appl. Math. & Stat., Johns Hopkins Univ., Baltimore, MD, USA
D. Geman, Dept. of Appl. Math. & Stat., Johns Hopkins Univ., Baltimore, MD, USA
We present a new framework for learning high-dimensional multivariate probability distributions from estimated marginals. The approach is motivated by compositional models and Bayesian networks, and designed to adapt to small sample sizes. We start with a large, overlapping set of elementary statistical building blocks, or “primitives,” which are low-dimensional marginal distributions learned from data. Each variable may appear in many primitives. Subsets of primitives are combined in a Lego-like fashion to construct a probabilistic graphical model; only a small fraction of the primitives will participate in any valid construction. Since primitives can be precomputed, parameter estimation and structure search are separated. Model complexity is controlled by strong biases: we adapt the primitives to the amount of training data and impose rules that restrict how they can be merged into allowable compositions. The likelihood of the data decomposes into a sum of local gains, one for each primitive in the final structure. We focus on a specific subclass of networks, namely binary forests. Structure optimization corresponds to an integer linear program, and the maximizing composition can be computed for reasonably large numbers of variables. Performance is evaluated using both synthetic data and real datasets from natural language processing and computational biology.
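As a rough illustration of the statement that the likelihood decomposes into a sum of local gains, the sketch below uses a deliberately simplified setting that is not the paper's construction: primitives are only pairwise empirical marginals, the allowed structures are plain forests, and selection is done greedily (Kruskal-style, as in Chow-Liu trees) rather than with the integer linear program the abstract describes. The names `pair_gain`, `assemble_forest`, and the `min_gain` threshold are hypothetical and introduced here only for the example.

```python
import math
from collections import Counter
from itertools import combinations

def pair_gain(data, i, j):
    """Log-likelihood gain (nats) of modeling columns i and j jointly
    instead of independently: n times their empirical mutual information."""
    n = len(data)
    ci = Counter(row[i] for row in data)
    cj = Counter(row[j] for row in data)
    cij = Counter((row[i], row[j]) for row in data)
    return sum(c * math.log(c * n / (ci[a] * cj[b]))
               for (a, b), c in cij.items())

def assemble_forest(data, min_gain=0.0):
    """Greedily pick pairwise primitives that form a forest.

    Assumption: min_gain is a stand-in for complexity control; pairs
    whose gain does not exceed it are never used.
    """
    d = len(data[0])
    gains = {(i, j): pair_gain(data, i, j)
             for i, j in combinations(range(d), 2)}

    parent = list(range(d))  # union-find to keep the structure acyclic

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    chosen = []
    for (i, j), g in sorted(gains.items(), key=lambda kv: -kv[1]):
        if g <= min_gain:
            break  # remaining candidates add no useful gain
        ri, rj = find(i), find(j)
        if ri != rj:  # adding this pair does not create a cycle
            parent[ri] = rj
            chosen.append(((i, j), g))
    return chosen

# Toy data: columns 0 and 1 are identical, column 2 is independent of both.
data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
chosen = assemble_forest(data)
total_gain = sum(g for _, g in chosen)
```

In this toy case only the pair (0, 1) is selected, and the total gain over the fully independent model is the sum of the gains of the chosen primitives. The greedy step is adequate here only because the simplified constraint (acyclicity) is a matroid; richer merging rules such as those in the paper would call for a different optimizer.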
Index Terms:
statistical distributions, belief networks, integer programming, learning (artificial intelligence), linear programming, computational biology, competitive marginal assembly, high-dimensional multivariate probability distribution learning, estimated marginals, Bayesian networks, elementary statistical building blocks, low-dimensional marginal distributions, Lego-like fashion, probabilistic graphical model, parameter estimation, structure search, integer linear program, maximizing composition, natural language processing, Bayesian methods, Assembly, Computational modeling, Probability distribution, Object oriented modeling, Connectors, Joints, Graphs and networks, statistical models, machine learning
F. Sánchez-Vega, J. Eisner, L. Younes, D. Geman, "Learning Multivariate Distributions by Competitive Assembly of Marginals," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 2, pp. 398-410, Feb. 2013, doi:10.1109/TPAMI.2012.96