A Guide to the Literature on Learning Probabilistic Networks from Data
April 1996 (vol. 8 no. 2)
pp. 195-210

Abstract—This literature review discusses different methods under the general rubric of learning Bayesian networks from data, and includes some overlapping work on more general probabilistic networks. Connections are drawn between the statistical, neural network, and uncertainty communities, and between the different methodological communities, such as Bayesian, description length, and classical statistics. Basic concepts of learning and of Bayesian networks are introduced, and methods are then reviewed. Methods are discussed for learning the parameters of a probabilistic network, for learning its structure, and for learning hidden variables. The presentation avoids formal definitions and theorems, as these are plentiful in the literature, and instead illustrates key concepts with simplified examples.
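As a toy illustration of the parameter-learning problem the review surveys, the sketch below estimates one conditional probability table of a discrete Bayesian network by counting, smoothed with a symmetric Dirichlet prior. The variables (Cloudy, Rain) and the data are hypothetical, chosen only to keep the example small.

```python
from collections import Counter

# Hypothetical complete data over two binary variables: (Cloudy, Rain).
data = [
    ("yes", "yes"), ("yes", "yes"), ("yes", "no"),
    ("no", "no"), ("no", "no"), ("no", "yes"),
]

def learn_cpt(records, alpha=1.0):
    """Estimate P(Rain | Cloudy) from joint and parent counts,
    smoothed by a symmetric Dirichlet prior with parameter alpha
    (alpha = 0 recovers the maximum-likelihood estimate)."""
    joint = Counter(records)                 # N(cloudy, rain)
    parent = Counter(c for c, _ in records)  # N(cloudy)
    child_values = sorted({r for _, r in records})
    return {
        (c, r): (joint[(c, r)] + alpha) / (parent[c] + alpha * len(child_values))
        for c in sorted({c for c, _ in records})
        for r in child_values
    }

cpt = learn_cpt(data)
for (c, r), p in sorted(cpt.items()):
    print(f"P(Rain={r} | Cloudy={c}) = {p:.2f}")
```

With alpha = 1 this gives, for example, P(Rain=yes | Cloudy=yes) = (2 + 1)/(3 + 2) = 0.6; each row of the table sums to one by construction. Structure learning and hidden-variable methods, the other two topics of the review, build on this same counting machinery.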

[1] D. Heckerman, A. Mamdani, and M. Wellman, "Real-World Applications of Bayesian Networks: Introduction," Comm. ACM, vol. 38, no. 3, 1995.
[2] T. Verma and J. Pearl, "Equivalence and Synthesis of Causal Models," Proc. Sixth Conf. Uncertainty in Artificial Intelligence, Boston, Mass., pp. 220-227, 1990.
[3] P. Spirtes, C. Glymour, and R. Scheines, Causation, Prediction, and Search. New York: Springer-Verlag, 1993.
[4] D. Heckerman and R. Shachter, "A Definition and Graphical Representation for Causality," Besnard and Hanks [158].
[5] J. Pearl, "Graphical Models, Causality, and Intervention," Statistical Science, vol. 8, no. 3, pp. 266-273, 1993.
[6] S.L. Lauritzen and D.J. Spiegelhalter, "Local Computations with Probabilities on Graphical Structures and their Application to Expert Systems (with discussion)," J. Royal Statistical Society B, vol. 50, no. 2, pp. 240-265, 1988.
[7] S. Wright, "Correlation and Causation," J. Agricultural Research, vol. 20, pp. 557-585, 1921.
[8] J. Pearl, Probabilistic Reasoning in Intelligent Systems. San Mateo, Calif.: Morgan Kaufmann, 1988.
[9] R.A. Howard and J.E. Matheson, "Influence Diagrams," Principles and Applications of Decision Analysis, R.A. Howard and J.E. Matheson, eds. Strategic Decisions Group, Menlo Park, Calif. 1981.
[10] C. Glymour, R. Scheines, P. Spirtes, and K. Kelly, Discovering Causal Structure. San Diego, Calif.: Academic Press, 1987.
[11] S. Mishra and P.P. Shenoy, "Attitude Formation Models: Insights from TETRAD," Cheeseman and Oldford [159], pp. 223-232.
[12] R. Scheines, "Inferring Causal Structure Among Unmeasured Variables," Cheeseman and Oldford [159], pp. 197-204.
[13] "Network Methods in Statistics," Probability, Statistics and Optimization, F.P. Kelly, ed., pp. 241-253. New York: Wiley & Sons, 1994.
[14] J. Hertz, A. Krogh, and R.G. Palmer, Introduction to the Theory of Neural Computation. Addison-Wesley, 1991.
[15] J.R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, Calif.: Morgan Kaufmann, 1992.
[16] F. Hayes-Roth, D.A. Waterman, and D.B. Lenat, Building Expert Systems. Reading, Mass.: Addison-Wesley, 1983.
[17] D. Michie, "Current Developments in Expert Systems," Applications of Expert Systems, J.R. Quinlan, ed. London: Addison Wesley, 1987.
[18] J.R. Quinlan, P.J. Compton, K.A. Horn, and L. Lazarus, "Inductive Knowledge Acquisition: A Case Study," Applications of Expert Systems, J.R. Quinlan, ed., pp. 157-173, Sydney, Australia: Addison-Wesley, 1987.
[19] M. Henrion and D.R. Cooley, "An Experimental Comparison of Knowledge Engineering for Expert Systems and for Decision Analysis," Sixth Nat'l Conf. Artificial Intelligence, pp. 471-476, Seattle, 1987, American Assoc. Artificial Intelligence.
[20] D. Heckerman, "Probabilistic Similarity Networks," Networks, vol. 20, pp. 607-636, 1990.
[21] R.M. Neal, "Connectionist Learning of Belief Networks," Artificial Intelligence, vol. 56, pp. 71-113, 1992.
[22] L.K. Saul, T. Jaakkola, and M.I. Jordan, "Mean Field Theory for Sigmoid Belief Networks," Technical Report 9501, Computational Cognitive Science, MIT, 1995.
[23] W. Buntine, "Operations for Learning with Graphical Models," J. Artificial Intelligence Research, vol. 2, pp. 159-225, 1994.
[24] M.A. Tanner, Tools for Statistical Inference. New York: Springer-Verlag, second edition, 1993.
[25] R.E. Kass and A.E. Raftery, "Bayes Factors and Model Uncertainty," J. American Statistical Assoc., vol. 90, pp. 773-795, 1995.
[26] J.M. Bernardo and A.F.M. Smith, Bayesian Theory. Chichester: John Wiley, 1994.
[27] W.L. Buntine, "Graphical Models for Discovering Knowledge," Advances in Knowledge Discovery and Data Mining, U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R.S. Uthurasamy, eds. MIT Press, 1995.
[28] R. Almond, "Software for Belief Networks," http://bayes., 1995 (current on Mar. 9, 1995).
[29] AUAI, Association for Uncertainty in Artificial Intelligence, Home Page, sited at Thinkbank, Berkeley, 1995, http://www. (Current on Apr. 2, 1996.)
[30] D. Michie, D.J. Spiegelhalter, and C.C. Taylor, eds., Machine Learning, Neural and Statistical Classification. Hertfordshire, England: Ellis Horwood, 1994.
[31] S.L. Lauritzen, B. Thiesson, and D.J. Spiegelhalter, "Diagnostic Systems Created by Model Selection Methods: A Case Study," Cheeseman and Oldford [159], pp. 143-152.
[32] R.R. Bouckaert, "Properties of Bayesian Belief Network Learning Algorithms," de Mantaras and Poole [160].
[33] M. Singh and M. Valtorta, "An Algorithm for the Construction of Bayesian Network Structures From Data," Heckerman and Mamdani [161], pp. 259-265.
[34] C.F. Aliferis and G.F. Cooper, "An Evaluation of an Algorithm for Inductive Learning of Bayesian Belief Networks Using Simulated Data Sets," de Mantaras and Poole [160], pp. 8-14.
[35] G.F. Cooper and E.H. Herskovits, "A Bayesian Method for the Induction of Probabilistic Networks from Data," Report SMI-91-01, Section of Medical Informatics, Univ. of Pittsburgh, Jan. 1991.
[36] R.R. Bouckaert, "Bayesian Belief Networks: from Inference to Construction," PhD thesis, Faculteit Wiskunde en Informatica, Utrecht Univ., June 1995.
[37] D. Heckerman, D. Geiger, and D.M. Chickering, “Learning Bayesian Networks: The Combination of Knowledge and Statistical Data,” Machine Learning, vol. 20, pp. 197–243, 1995.
[38] S. Højsgaard and B. Thiesson, "BIFROST—Block Recursive Models Induced From Relevant Knowledge, Observations, and Statistical Techniques," Computational Statistics and Data Analysis, vol. 19, no. 2, pp. 155-175, 1995.
[39] R.D. Shachter and D. Heckerman, "Thinking Backwards for Knowledge Acquisition," AI Magazine, vol. 8, pp. 55-61, Fall 1987.
[40] E. Charniak, “Bayesian Networks without Tears,” AI Magazine, pp. 50-63, 1991.
[41] M. Henrion, J.S. Breese, and E.J. Horvitz, "Decision Analysis and Expert Systems," AI Magazine, vol. 12, no. 4, pp. 64-91, 1991.
[42] J. Whittaker, Graphical Models in Applied Multivariate Statistics. Wiley, 1990.
[43] D. Edwards, Introduction to Graphical Modelling. Springer-Verlag, 1995.
[44] D. Heckerman, "Bayesian Networks for Knowledge Discovery," Advances in Knowledge Discovery and Data Mining, U.M. Fayyad et al., eds., AAAI/MIT Press, 1996.
[45] B.D. Ripley, Spatial Statistics. New York: Wiley, 1981.
[46] S.L. Lauritzen, A.P. Dawid, B.N. Larsen, and H.-G. Leimer, "Independence Properties of Directed Markov Fields," Networks, vol. 20, pp. 491-505, 1990.
[47] S. Lauritzen and N. Wermuth, "Graphical Models for Associations Between Variables, Some of Which Are Qualitative and Some Quantitative," Annals of Statistics, vol. 17, pp. 31-57, 1989.
[48] W.L. Buntine, "Chain Graphs for Learning," Besnard and Hanks [158].
[49] P. Tino, B.G. Horne, C.L. Giles, and P.C. Collingwood, "Finite State Machines and Recurrent Neural Networks - Automata and Dynamical Systems Approaches," Technical Report UMIACS-TR-95-1, Inst. for Advanced Computer Studies, Univ. of Maryland, 1995. To be published in Progress in Neural Networks special volume on "Temporal Dynamics and Time-Varying Pattern Recognition," J.E. Dayhoff and O. Omidvar, eds., Ablex Publishing.
[50] N. Wermuth and S.L. Lauritzen, "On Substantive Research Hypotheses, Conditional Independence Graphs and Graphical Chain Models," J. Royal Statistical Society B, vol. 51, no. 3, 1989.
[51] R. Hanson, J. Stutz, and P. Cheeseman, "Bayesian Classification with Correlation and Inheritance," IJCAI91 [162].
[52] T.L. Dean and M.P. Wellman, Planning and Control, San Mateo, Calif.: Morgan Kaufmann, 1991.
[53] W.B. Poland, "Decision Analysis with Continuous and Discrete Variables: A Mixture Distribution Approach," PhD thesis, Dept. of Eng. Economic Systems, Stanford Univ., 1994.
[54] P. Dagum, A. Galper, E. Horvitz, and A. Seiver, "Uncertain Reasoning and Forecasting," Int'l J. Forecasting, 1994, to appear.
[55] J. Pearl, "Causal Diagrams for Empirical Research," Technical Report R-218-L, Cognitive Systems Laboratory, Computer Science Dept., Univ. of California Los Angeles, 1994, to appear in Biometrika.
[56] J. Pearl and T.S. Verma, "A Theory of Inferred Causation," Principles of Knowledge Representation and Reasoning: Proc. Second Int'l Conf., J.A. Allen, R. Fikes, and E. Sandewall, eds., pp. 441-452. San Mateo, Calif.: Morgan Kaufmann, 1991.
[57] J. Pearl, "On the Identification of Nonparametric Structural Equations," Technical Report R-207, Cognitive Systems Laboratory, Computer Science Dept., Univ. of California, Los Angeles, Mar. 1994.
[58] D. Heckerman, "A Bayesian Approach to Learning Causal Networks," Besnard and Hanks [158].
[59] P. Spirtes, C. Meek, and T. Richardson, "Causal Inference in the Presence of Latent Variables and Selection Bias," Besnard and Hanks [158], pp. 499-506.
[60] A.P. Dawid and S.L. Lauritzen, "Hyper Markov Laws in the Statistical Analysis of Decomposable Graphical Models," Annals of Statistics, vol. 21, no. 3, pp. 1,272-1,317, 1993.
[61] W. Lam and F. Bacchus, "Using Causal Information and Local Measures to Learn Bayesian Networks," Heckerman and Mamdani [161], pp. 243-250.
[62] D. Heckerman and D. Geiger, "Learning Bayesian Networks: A Unification for Discrete and Gaussian Domains," Besnard and Hanks [158].
[63] O.E. Barndorff-Nielsen, Information and Exponential Families in Statistical Theory. New York: John Wiley and Sons, 1978.
[64] L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone, Classification and Regression Trees. Belmont, Calif.: Wadsworth, 1984.
[65] H. Linhart and W. Zucchini, Model Selection. Wiley, 1986.
[66] S.L. Sclove, "Small-Sample and Large-Sample Statistical Model Selection Criteria," Cheeseman and Oldford [159], pp. 31-39.
[67] A.E. Raftery, "Bayesian Model Selection in Social Research (with Discussion by Gelman & Rubin, and Hauser, and a Rejoinder)," Sociological Methodology 1995, P.V. Marsden, ed. Cambridge, Mass.: Blackwells, 1995.
[68] D. Madigan and A.E. Raftery, "Model Selection and Accounting for Model Uncertainty in Graphical Models Using Occam's Window," J. American Statistical Assoc., vol. 89, pp. 1,535-1,546, 1994.
[69] D. Haussler, M. Kearns, H.S. Seung, and N. Tishby, "Rigorous Learning Curve Bounds From Statistical Mechanics," Proc. Seventh ACM Conf. Computational Learning Theory, M. Warmuth, ed., pp. 76-87, Morgan Kaufmann, 1994.
[70] D. Haussler, "Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications," Information and Computation, vol. 100, no. 1, pp. 78-150, Sept. 1992.
[71] D.M. Chickering, "Learning Bayesian Networks Is NP-Complete," 1995, to appear.
[72] K.-U. Höffgen, "Learning and Robust Learning of Product Distributions," Research Report Nr. 464, revised May 1993, Fachbereich Informatik, Univ. Dortmund, 1993.
[73] J. Suzuki, "On An Efficient MDL Learning Procedure Using Branch and Bound Technique," Technical Report COMP95-27 (1995-06), Inst. of Electronics, Information, and Communication Engineers, 1995.
[74] D. Edwards, "Hierarchical Interaction Models," J. Royal Statistical Society B, vol. 51, no. 3, 1989.
[75] M. Kaelbling and D. Ogle, “Minimizing Monitoring Costs: Choosing Between Tracing and Sampling,” Proc. 23rd Int’l Conf. System Sciences, IEEE CS Press, Los Alamitos, Calif., Jan. 1990, pp. 314‐320.
[76] S.L. Lauritzen, "The EM Algorithm For Graphical Association Models with Missing Data," Computational Statistics and Data Analysis, vol. 19, no. 2, pp. 191-201, 1995.
[77] A.P. Dempster, N.M. Laird, and D.B. Rubin, "Maximum Likelihood from Incomplete Data Via the EM Algorithm," J. Royal Statistical Society B, vol. 39, pp. 1-38, 1977.
[78] G. Casella and R.L. Berger, Statistical Inference. Belmont, Calif.: Wadsworth & Brooks/Cole, 1990.
[79] D. Heckerman, "A Tutorial on Learning Bayesian Networks," Technical Report MSR-TR-95-06, Microsoft Research, Advanced Technology Division, Mar. 1995.
[80] R.A. Howard, "Decision Analysis: Perspectives on Inference, Decision, and Experimentation," Proc. IEEE, vol. 58, no. 5, 1970.
[81] R. Musick, J. Catlett, and S. Russell, "Decision Theoretic Subsampling for Induction on Large Databases," Machine Learning: Proc. 10th Int'l Conf., Amherst, Mass.: Morgan Kaufmann, 1993.
[82] A. Azevedo-Filho and R.D. Shachter, "Laplace's Method Approximations for Probabilistic Inference in Belief Networks with Continuous Variables," de Mantaras and Poole [160], pp. 28-36.
[83] B. Thiesson, "Accelerated Quantification of Bayesian Networks with Incomplete Data," Proc. First Int'l Conf. Knowledge Discovery and Data Mining, U.M. Fayyad and R. Uthurusamy, eds., 1995, to appear.
[84] Z. Ghahramani, "Factorial Learning and the EM Algorithm," Advances in Neural Information Processing Systems 7 (NIPS*94), G. Tesauro, D.S. Touretzky, and T.K. Leen, eds. Morgan Kaufmann, 1994.
[85] J. York and D. Madigan, "Markov Chain Monte Carlo Methods for Hierarchical Bayesian Expert Systems," Cheeseman and Oldford [159], pp. 445-452.
[86] W.R. Gilks, A. Thomas, and D.J. Spiegelhalter, "A Language and Program for Complex Bayesian Modelling," The Statistician, vol. 43, pp. 169-178, 1993.
[87] R.M. Neal, "Probabilistic Inference Using Markov Chain Monte Carlo Methods," Technical Report CRG-TR-93-1, Univ. of Toronto, 1993.
[88] R.M. Neal, "Bayesian Learning for Neural Networks," PhD thesis, Graduate Dept. of Computer Science, Univ. of Toronto, Oct. 1994.
[89] P. McCullagh and J.A. Nelder, Generalized Linear Models. London: Chapman and Hall, second edition, 1989.
[90] H. Robbins and S. Monro, "A Stochastic Approximation Method," Annals of Mathematical Statistics, vol. 22, pp. 400-407, 1951.
[91] M.F. Møller, "Efficient Training of Feed-Forward Neural Networks," PhD Thesis, Aarhus Univ., Aarhus, Denmark, 1993.
[92] D.E. Rumelhart and J.L. McClelland, Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Cambridge, Mass.: MIT Press, 1986.
[93] D.J. Spiegelhalter and S.L. Lauritzen, "Sequential Updating of Conditional Probabilities on Directed Graphical Structures," Networks, vol. 20, pp. 579-605, 1990.
[94] K.G. Olesen, S.L. Lauritzen, and F.V. Jensen, "aHUGIN: A System Creating Adaptive Causal Probabilistic Networks," Uncertainty in Artificial Intelligence, D. Dubois, M.P. Wellman, B. D'Ambrosio, and P. Smets, eds., pp. 223-229. San Mateo, Calif.: Morgan Kaufmann, 1992.
[95] F.J. Diez, "Parameter Adjustment in Bayesian Networks: The Generalized Noisy OR-Gate," Heckerman and Mamdani [161], pp. 99-105.
[96] G.M. Provan, "Tradeoffs in Constructing and Evaluating Temporal Influence Diagrams," Heckerman and Mamdani [161], pp. 40-47.
[97] S. Ben-David and M. Jacovi, "On Learning in the Limit and Non-Uniform (ε, δ)-Learning," Proc. Sixth ACM Workshop on Computational Learning Theory, L. Pitt, ed., pp. 209-217. Morgan Kaufmann, 1993.
[98] P. Spirtes and T. Verma, "Equivalence of Causal Models with Latent Variables," Report CMU-PHIL-33, Philosophy, Carnegie Mellon Univ., 1992.
[99] T. Verma and J. Pearl, "An Algorithm For Deciding If a Set of Observed Independencies Has a Causal Explanation," Dubois et al. [163].
[100] M. Frydenberg, "The Chain Graph Markov Property," Scandinavian J. Statistics, vol. 17, pp. 333-353, 1990.
[101] D. Geiger, T. Verma, and J. Pearl, "Identifying Independence in Bayesian Networks," Networks, vol. 20, pp. 507-534, 1990.
[102] S.A. Andersson, D. Madigan, and M.D. Perlman, "On the Markov Equivalence of Chain Graphs, Undirected Graphs, and Acyclic Digraphs," Technical Report #281, Dept. of Statistics, Univ. of Washington, Seattle, Dec. 1994.
[103] P. Spirtes and C. Glymour, "An Algorithm for Fast Recovery of Sparse Causal Graphs," Social Science Computing Reviews, vol. 9, no. 1, pp. 62-72, 1991.
[104] R.M. Fung and S.L. Crawford, "A System for Induction of Probabilistic Models," Eighth Nat'l Conf. Artificial Intelligence, pp. 762-779. Boston, 1990, American Assoc. Artificial Intelligence.
[105] D. Geiger and D. Heckerman, "A Characterization of the Dirichlet Distribution with Application to Learning Bayesian Networks," Besnard and Hanks [158].
[106] R.L. Winkler, "The Quantification of Judgment: Some Methodological Suggestions," J. American Statistical Assoc., vol. 62, pp. 1,105-1,120, 1967.
[107] D. Spiegelhalter, R.C.G. Franklin, and K. Bull, "Assessment, Criticism, and Improvement of Imprecise Subjective Probabilities for a Medical Expert System," Uncertainty in Artificial Intelligence, M. Henrion, R. Shachter, L. Kanal, and J. Lemmer, eds., North-Holland, vol. 5, pp. 285-294, 1990.
[108] M.G. Morgan and M. Henrion, Uncertainty: A Guide to Dealing with Uncertainty in Quantitative Risk and Policy Analysis. Cambridge Univ. Press, 1990.
[109] D. Kahneman, P. Slovic, and A. Tversky, Judgement under Uncertainty: Heuristics and Biases. Cambridge: Cambridge Univ. Press, 1982.
[110] P. Langley and H.A. Simon, "Applications of Machine Learning and Rule Induction," CACM, 1995, to appear.
[111] D.J. Spiegelhalter, A.P. Dawid, S.L. Lauritzen, and R.G. Cowell, "Bayesian Analysis in Expert Systems," Statistical Science, vol. 8, no. 3, pp. 219-283, 1993.
[112] D.J. Spiegelhalter and R.G. Cowell, "Learning in Probabilistic Expert Systems," Bernardo et al. [165], pp. 447-465.
[113] R.G. Cowell, A.P. Dawid, and D.J. Spiegelhalter, "Sequential Model Criticism in Probabilistic Expert Systems," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, pp. 209-219, 1993.
[114] K.B. Laskey, "Sensitivity Analysis for Probability Assessments in Bayesian Networks," Heckerman and Mamdani [161], pp. 136-142.
[115] C.K. Chow and C.N. Liu, "Approximating Discrete Probability Distributions with Dependence Trees," IEEE Trans. Information Theory, vol. 14, no. 3, pp. 462-467, May 1968.
[116] E. Herskovitz and G.F. Cooper, "Kutato: An Entropy-Driven System for Construction of Probabilistic Expert Systems from Databases," Uncertainty in Artificial Intelligence 6, P.P. Bonnisone, M. Henrion, L.N. Kanal, and J.F. Lemmer, eds., North-Holland, Amsterdam, pp. 117-125, 1991.
[117] S. Srinivas, S. Russell, and A. Agogino, "Automated Construction of Sparse Bayesian Networks from Unstructured Probabilistic Models and Domain Information," Uncertainty in Artificial Intelligence 5, North-Holland, pp. 295-308, 1990.
[118] W.L. Buntine, "Learning Classification Trees," Hand [166], pp. 182-201.
[119] J. Rissanen, Stochastic Complexity in Statistical Inquiry. World Scientific Series in Computer Science, vol. 15, 1989.
[120] C.S. Wallace and J.D. Patrick, "Coding Decision Trees," Machine Learning, vol. 11, pp. 7-22, 1993.
[121] Y. Zorian and A. Ivanov,"EEODM: An Effective BIST Scheme for ROMs," Int'l Test Conf., pp. 871-879, 1990.
[122] W.L. Buntine, "Classifiers: A Theoretical and Empirical Study," IJCAI91 [162].
[123] J.R. Quinlan, "Unknown Attribute Values in Induction," Proc. Sixth Int'l Conf. Machine Learning, 1989.
[124] U.M. Fayyad and K.B. Irani, "Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning," Int'l Joint Conf. Artificial Intelligence, IJCAI, pp. 1,022-1,027, Chambery, France, 1993.
[125] R. Kohavi, G. John, R. Long, D. Manley, and K. Pfleger, "MLC++: A Machine Learning Library in C++," Tools with Artificial Intelligence, pp. 740-743, IEEE CS Press, 1994.
[126] S. Kullback, Information Theory and Statistics. New York: John Wiley & Sons, 1959.
[127] D. Geiger, "An Entropy-Based Learning Algorithm of Bayesian Conditional Trees," Uncertainty in Artificial Intelligence, D. Dubois, M.P. Wellman, B. D'Ambrosio, and P. Smets, eds., pp. 92-97. San Mateo, Calif.: Morgan Kaufmann, 1992.
[128] D. Edwards and T. Havránek, "A Fast Model Selection Procedure for Large Families of Models," J. American Statistical Assoc., vol. 82, no. 397, pp. 205-211, 1987.
[129] R.B. Poland and R.D. Shachter, "Three Approaches to Probability Model Selection," de Mantaras and Poole [160], pp. 478-483.
[130] J. Rissanen, "Stochastic Complexity," J. Royal Statistical Society B, vol. 49, no. 3, pp. 223-239, 1987.
[131] C.S. Wallace and P.R. Freeman, "Estimation and Inference By Compact Encoding," J. Royal Statistical Society B, vol. 49, no. 3, pp. 240-265, 1987.
[132] A.R. Barron and T.M. Cover, "Minimum Complexity Density Estimation," IEEE Trans. Information Theory, vol. 37, no. 4, 1991.
[133] J.J. Oliver and R.A. Baxter, "MML and Bayesianism: Similarities and Differences," Technical Report 206, Monash Univ., Melbourne, 1994.
[134] P. Smyth, "Admissible Stochastic Complexity Models for Classification Problems," Hand [166], pp. 335-347.
[135] W. Lam and F. Bacchus, "Learning Bayesian Belief Networks: An Approach Based on the MDL Principle," Computational Intelligence, vol. 10, no. 4, 1994.
[136] J. Suzuki, "A Construction of Bayesian Networks from DataBases Based on an MDL Scheme," Heckerman and Mamdani [161], pp. 266-273.
[137] B. Efron and R. Tibshirani, "Statistical Data Analysis in the Computer Age," Science, vol. 253, pp. 390-395, 1991.
[138] R. Kohavi, "A Study of Cross Validation and Bootstrap for Accuracy Estimation and Model Selection," Int'l Joint Conf. Artificial Intelligence, IJCAI, Montreal, 1995.
[139] W.L. Buntine, "Prior Probabilities," http://www.Thinkbank.com/wray/refs.html#tutes (current on Apr. 2, 1996), 1994.
[140] G.F. Cooper and E. Herskovits, “A Bayesian Method for the Induction of Probabilistic Networks from Data,” Machine Learning, vol. 9, pp. 309–347, 1992.
[141] R.D. Shachter, D.M. Eddy, and V. Hasselblad, "An Influence Diagram Approach to Medical Technology Assessment," Influence Diagrams, Belief Nets, and Decision Analysis, R.M. Oliver and J.Q. Smith, eds., pp. 321-350. Wiley, 1990.
[142] G. Consonni and P. Giudici, "Learning in Probabilistic Expert Systems," Workshop on Probabilistic Expert Systems, R. Scozzafava, ed., pp. 57-78, Rome, Oct. 1993.
[143] J.C. York, "Bayesian Methods for the Analysis of Misclassified and Incomplete Multivariate Discrete Data," PhD thesis, Univ. of Washington, Seattle, 1992.
[144] D. Madigan and J. York, "Bayesian Graphical Models for Discrete Data," Technical Report #259, Dept. of Statistics, Univ. of Washington, Seattle, Nov. 1993, submitted to Int'l Statistical Review.
[145] D. Madigan, A.E. Raftery, J.C. York, J.M. Bradshaw, and R.G. Almond, "Strategies for Graphical Model Selection," Cheeseman and Oldford [159], pp. 91-100.
[146] D. Madigan, J. Gavrin, and A.E. Raftery, "Eliciting Prior Information to Enhance the Predictive Performance of Bayesian Graphical Models," Comm. Statistics, to appear, 1995.
[147] J. York, D. Madigan, I. Heuch, and R.T. Lie, "Estimation of the Proportion of Congenital Malformations Using Double Sampling: Incorporating Covariates and Accounting for Model Uncertainty," Applied Statistics, vol. 44, pp. 227-242, 1995.
[148] R. Musick, "Minimal Assumption Distribution Propagation in Belief Networks," Heckerman and Mamdani [161], pp. 251-258.
[149] B.D. Ripley, Stochastic Simulation. New York: Wiley, 1987.
[150] D. Heckerman, D. Geiger, and D. Chickering, "Learning Bayesian Networks: The Combination of Knowledge and Statistical Data," de Mantaras and Poole [160].
[151] G.F. Cooper, "A Method for Learning Belief Networks that Contain Hidden Variables," J. Intelligent Information Systems, 1994, to appear. Also in Proc. Workshop on Knowledge Discovery in Databases, pp. 112-124, 1993.
[152] D.M. Titterington, A.F.M. Smith, and U.E. Makov, Statistical Analysis of Finite Mixture Distributions. Chichester: John Wiley & Sons, 1985.
[153] P. Cheeseman, M. Self, J. Kelly, W. Taylor, D. Freeman, and J. Stutz, "Bayesian Classification," Seventh Nat'l Conf. Artificial Intelligence, Saint Paul, Minn., 1988, American Assoc. for Artificial Intelligence, pp. 607-611.
[154] A. Thomas, D.J. Spiegelhalter, and W.R. Gilks, "BUGS: A Program to Perform Bayesian Inference Using Gibbs Sampling," Bernardo et al. [165], pp. 837-842.
[155] W.R. Gilks, D.G. Clayton, D.J. Spiegelhalter, N.G. Best, A.J. McNeil, L.D. Sharples, and A.J. Kirby, "Modelling complexity: Applications of Gibbs Sampling in Medicine," J. Royal Statistical Society B, vol. 55, pp. 39-102, 1993.
[156] W.L. Buntine, "Networks for Learning," 50th Session of the Int'l Statistical Inst., Beijing, China, 1995, invited lecture.
[157] P. Bonissone, ed., Proc. Sixth Conf. Uncertainty in Artificial Intelligence, Cambridge, Mass., 1990.
[158] P. Besnard and S. Hanks, eds., Uncertainty in Artificial Intelligence: Proc. Eleventh Conf., Montreal, 1995.
[159] P. Cheeseman and R.W. Oldford, eds., Selecting Models from Data: Artificial Intelligence and Statistics IV, Springer-Verlag, 1994.
[160] R. Lopez de Mantaras and D. Poole, eds., Uncertainty in Artificial Intelligence: Proc. Tenth Conf., Seattle, Wash., 1994.
[161] D. Heckerman and A. Mamdani, eds., Uncertainty in Artificial Intelligence: Proc. Ninth Conf., Washington, D.C., 1993.
[162] IJCAI91, ed., Int'l Joint Conf. Artificial Intelligence. Sydney: Morgan Kaufmann, 1991.
[163] D. Dubois, M.P. Wellman, B. D'Ambrosio, and P. Smets, eds., Uncertainty in Artificial Intelligence: Proc. Eighth Conf., Stanford, Calif., 1992.
[164] M. Henrion, R. Shachter, L.N. Kanal, and J. Lemmer, eds., Uncertainty in Artificial Intelligence 5. Amsterdam: Elsevier Science Publishers, 1991.
[165] J.M. Bernardo, J.O. Berger, A.P. Dawid, and A.F.M. Smith, eds., Bayesian Statistics 4. Oxford Univ. Press, 1992.
[166] D.J. Hand, ed., Artificial Intelligence Frontiers in Statistics. London: Chapman & Hall, 1991.

Index Terms:
Bayesian networks, graphical models, hidden variables, learning, learning structure, probabilistic networks, knowledge discovery.
Wray Buntine, "A Guide to the Literature on Learning Probabilistic Networks from Data," IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 2, pp. 195-210, April 1996, doi:10.1109/69.494161