
Stephen Della Pietra, Vincent Della Pietra, John Lafferty, "Inducing Features of Random Fields," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 4, pp. 380-393, April 1997, doi: 10.1109/34.588021.
Abstract—We present a technique for constructing random fields from a set of training samples. The learning paradigm builds increasingly complex fields by allowing potential functions, or features, that are supported by increasingly large subgraphs. Each feature has a weight that is trained by minimizing the Kullback-Leibler divergence between the model and the empirical distribution of the training data. A greedy algorithm determines how features are incrementally added to the field, and an iterative scaling algorithm is used to estimate the optimal values of the weights. The random field models and techniques introduced in this paper differ from those common to much of the computer vision literature in that the underlying random fields are non-Markovian and have a large number of parameters that must be estimated. Relations to other learning approaches, including decision trees, are given. As a demonstration of the method, we describe its application to the problem of automatic word classification in natural language processing.

Index Terms—Random field, Kullback-Leibler divergence, iterative scaling, maximum entropy, EM algorithm, statistical learning, clustering, word morphology, natural language processing.
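As a concrete illustration of the parameter-estimation step, the sketch below implements the classical generalized iterative scaling update of Darroch and Ratcliff [12] for a small log-linear model p(x) proportional to exp(sum_j w_j f_j(x)) over a finite sample space, iterating until the model's feature expectations match the empirical ones. This is a minimal sketch in Python, not the paper's own improved iterative scaling variant or its feature-induction loop; the sample space, binary features, and empirical distribution are hypothetical toy inputs.

    import numpy as np

    def gis(F, empirical, n_iters=500):
        """Generalized iterative scaling for p(x) ~ exp(F @ w).

        F:         (n_points, n_features) 0/1 matrix with F[x, j] = f_j(x).
        empirical: length-n_points vector of empirical probabilities.
        Assumes every feature fires on at least one observed point.
        """
        F = np.asarray(F, dtype=float)
        # GIS requires each point to activate the same total number of
        # features; append a slack feature padding every row sum to C.
        C = F.sum(axis=1).max()
        F = np.hstack([F, (C - F.sum(axis=1))[:, None]])
        w = np.zeros(F.shape[1])
        target = empirical @ F            # empirical feature expectations

        def model():                      # normalized model distribution
            s = F @ w
            p = np.exp(s - s.max())
            return p / p.sum()

        for _ in range(n_iters):
            # Multiplicative update: raise weights of under-predicted
            # features, lower weights of over-predicted ones.
            w += np.log(target / (model() @ F)) / C
        return w, model()

    # Hypothetical example: four outcomes, two binary features (the bits
    # of x). GIS converges to the maximum-entropy distribution matching
    # the two single-bit marginals, here [0.12, 0.18, 0.28, 0.42].
    F = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    w, p = gis(F, np.array([0.1, 0.2, 0.3, 0.4]))

In the paper's setting the sample space is far too large to enumerate, so the feature expectations in this update must be approximated, and the greedy induction step chooses each candidate feature by the estimated reduction in Kullback-Leibler divergence it would yield.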
[1] M. Almeida and B. Gidas, "A Variational Method for Estimating the Parameters of MRF from Complete or Incomplete Data," Annals of Applied Probability, vol. 3, no. 1, pp. 103-136, 1993.
[2] N. Balram and J. Moura, "Noncausal Gauss Markov Random Fields: Parameter Structure and Estimation," IEEE Transactions on Information Theory, vol. 39, no. 4, pp. 1,333-1,343, July 1993.
[3] A. Berger, V. Della Pietra, and S. Della Pietra, "A Maximum Entropy Approach to Natural Language Processing," Computational Linguistics, vol. 22, no. 1, pp. 39-71, 1996.
[4] L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees. Belmont, Calif.: Wadsworth, 1984.
[5] D. Brown, "A Note on Approximations to Discrete Probability Distributions," Information and Control, vol. 2, pp. 386-392, 1959.
[6] P. Brown, V. Della Pietra, P. de Souza, J. Lai, and R. Mercer, "Class-Based n-Gram Models of Natural Language," Computational Linguistics, vol. 18, no. 4, pp. 467-479, 1992.
[7] P.F. Brown, J. Cocke, V. Della Pietra, S. Della Pietra, J.D. Lafferty, R.L. Mercer, and P.S. Roossin, "A Statistical Approach to Machine Translation," Computational Linguistics, vol. 16, no. 2, pp. 79-85, 1990.
[8] B. Chalmond, "An Iterative Gibbsian Technique for Reconstruction of m-Ary Images," Pattern Recognition, vol. 22, no. 6, pp. 747-761, 1989.
[9] I. Csiszár, "I-Divergence Geometry of Probability Distributions and Minimization Problems," Annals of Probability, vol. 3, no. 1, pp. 146-158, 1975.
[10] I. Csiszár, "A Geometric Interpretation of Darroch and Ratcliff's Generalized Iterative Scaling," Annals of Statistics, vol. 17, no. 3, pp. 1,409-1,413, 1989.
[11] I. Csiszár and G. Tusnády, "Information Geometry and Alternating Minimization Procedures," Statistics & Decisions, Supplement Issue, vol. 1, pp. 205-237, 1984.
[12] J. Darroch and D. Ratcliff, "Generalized Iterative Scaling for Log-Linear Models," Ann. Math. Statist., vol. 43, pp. 1,470-1,480, 1972.
[13] A.P. Dempster, N.M. Laird, and D.B. Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm," J. Royal Statistical Soc. B, vol. 39, pp. 1-38, 1977.
[14] P. Diaconis and D. Ylvisaker, "Conjugate Priors for Exponential Families," Ann. Statist., vol. 7, pp. 269-281, 1979.
[15] P. Ferrari, A. Frigessi, and R. Schonmann, "Convergence of Some Partially Parallel Gibbs Samplers with Annealing," Annals of Applied Probability, vol. 3, no. 1, pp. 137-152, 1993.
[16] A. Frigessi, C. Hwang, and L. Younes, "Optimal Spectral Structure of Reversible Stochastic Matrices, Monte Carlo Methods and the Simulation of Markov Random Fields," Annals of Applied Probability, vol. 2, no. 3, pp. 610-628, 1992.
[17] S. Geman and D. Geman, "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images," IEEE Trans. Pattern Anal. Machine Intell., vol. 6, pp. 721-741, 1984.
[18] C. Geyer and E. Thompson, "Constrained Monte Carlo Maximum Likelihood for Dependent Data (with Discussion)," J. Royal Stat. Soc., vol. B54, pp. 657-699, 1992.
[19] E.T. Jaynes, Papers on Probability, Statistics, and Statistical Physics, R. Rosenkrantz, ed. Dordrecht, Holland: D. Reidel Publishing Co., 1983.
[20] J. Lafferty and R. Mercer, "Automatic Word Classification Using Features of Spellings," Proc. Ninth Annual Conf. Univ. of Waterloo Centre for the New OED and Text Research. Oxford, England: Oxford Univ. Press, 1993.
[21] G.G. Potamianos and J.K. Goutsias, "Partition Function Estimation of Gibbs Random Field Images Using Monte Carlo Simulations," IEEE Trans. Information Theory, vol. 39, pp. 1,322-1,332, 1993.
[22] L. Younes, "Estimation and Annealing for Gibbsian Fields," Ann. Inst. H. Poincaré Probab. Statist., vol. 24, no. 2, pp. 269-294, 1988.