Pages: pp. 785-786
The last 10 years have witnessed rapid growth in the popularity of graphical models, most notably Bayesian networks, as a tool for representing, learning, and computing with complex probability distributions. Graphical models provide an explicit representation of the statistical dependencies between the components of a complex probability model, effectively marrying probability theory and graph theory. As Jordan puts it in [2], graphical models are "a natural tool for dealing with two problems that occur throughout applied mathematics and engineering—uncertainty and complexity—and, in particular, they are playing an increasingly important role in the design and analysis of machine learning algorithms."
Graphical models provide powerful computational support for the Bayesian approach to computer vision, which has become a standard framework for addressing vision problems. Many familiar tools from the vision literature, such as Markov random fields, hidden Markov models, and the Kalman filter, are instances of graphical models. More importantly, the graphical models formalism makes it possible to generalize these tools and develop novel statistical representations and associated algorithms for inference and learning.
The history of graphical models in computer vision closely follows that of graphical models in general. Research by Pearl [3] and Lauritzen [4] in the late 1980s played a seminal role in introducing this formalism to AI and statistical learning. Not long after, the formalism spread to fields such as statistics, systems engineering, information theory, pattern recognition, and computer vision, among others. One of the earliest occurrences of graphical models in the vision literature was a paper by Binford et al. [1]. The paper described the use of Bayesian inference in a hierarchical probability model to match 3D object models to groupings of curves in a single image. The following year marked the publication of Pearl's influential book [3] on graphical models. Since then, many technical papers have been published in IEEE journals and conference proceedings that address different aspects and applications of graphical models in computer vision.
Our goal in organizing this special section was to demonstrate the breadth of applicability of the graphical models formalism to vision problems. Our call for papers in February 2002 produced 16 submissions. After a careful review process, we selected six papers for publication: five regular papers and one short paper. These papers reflect the state of the art in the use of graphical models for vision problems ranging from low-level image understanding to high-level scene interpretation. We believe these papers will appeal both to vision researchers who are actively engaged in the use of graphical models and to machine learning researchers looking for a challenging application domain.
The first paper in this section is "Stereo Matching Using Belief Propagation" by J. Sun, N.-N. Zheng, and H.-Y. Shum. The authors describe a new stereo algorithm based on loopy belief propagation, a powerful inference technique for complex graphical models in which exact inference is intractable. They formulate the dense stereo matching problem as MAP estimation on coupled Markov random fields and obtain promising results on standard test data sets. One of the benefits of this formulation, as the authors demonstrate, is the ease with which it can be extended to handle multiview stereo matching.
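To make the idea of MAP stereo by loopy belief propagation more concrete, the following is a minimal sketch of min-sum loopy belief propagation on a single 4-connected disparity field. It is not the authors' algorithm: their formulation couples several Markov random fields, whereas this sketch uses only one. The truncated absolute-difference data cost, the truncated-linear smoothness cost, the synchronous message schedule, and the function name `stereo_bp` with its parameters are all illustrative assumptions.

```python
from itertools import product


def stereo_bp(left, right, num_disp, tau=20.0, lam=5.0, trunc=2, iters=10):
    """Hypothetical sketch: min-sum loopy BP for dense stereo on a 4-connected grid.

    A left pixel (y, x) with disparity d is matched to right pixel (y, x - d).
    """
    h, w = len(left), len(left[0])
    labels = range(num_disp)

    # Assumed data term: truncated absolute intensity difference;
    # matches that fall outside the right image cost tau.
    def data_cost(y, x, d):
        if x - d < 0:
            return tau
        return min(abs(left[y][x] - right[y][x - d]), tau)

    data = {(y, x): [data_cost(y, x, d) for d in labels]
            for y, x in product(range(h), range(w))}

    # Assumed truncated-linear smoothness penalty between neighbouring labels.
    def smooth(d1, d2):
        return lam * min(abs(d1 - d2), trunc)

    def neighbours(y, x):
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                yield (ny, nx)

    # msgs[(p, q)][d]: message from pixel p to neighbour q about label d.
    msgs = {(p, q): [0.0] * num_disp for p in data for q in neighbours(*p)}

    for _ in range(iters):
        new_msgs = {}
        for (p, q) in msgs:
            # Data cost at p plus every incoming message except the one from q.
            h_p = [data[p][d] + sum(msgs[(s, p)][d] for s in neighbours(*p) if s != q)
                   for d in labels]
            m = [min(h_p[dp] + smooth(dp, dq) for dp in labels) for dq in labels]
            lo = min(m)
            new_msgs[(p, q)] = [v - lo for v in m]  # normalise so values do not drift
        msgs = new_msgs  # synchronous ("flooding") update schedule

    # Belief = data cost + all incoming messages; keep the minimising disparity.
    disp = [[0] * w for _ in range(h)]
    for (y, x) in data:
        belief = [data[(y, x)][d] + sum(msgs[(s, (y, x))][d] for s in neighbours(y, x))
                  for d in labels]
        disp[y][x] = min(labels, key=lambda d: belief[d])
    return disp


# Synthetic pair: the left image is the right image shifted by 2 pixels.
W, H = 8, 4
right = [[(37 * x + 11 * y) % 97 for x in range(W)] for y in range(H)]
left = [[right[y][x - 2] if x >= 2 else right[y][x] for x in range(W)] for y in range(H)]
disp = stereo_bp(left, right, num_disp=4)
```

In this toy pair every interior pixel has a zero-cost match only at disparity 2, so the recovered field should be 2 wherever the shifted pixel exists. Pixels at the left border, which have no valid match at that disparity, may settle on a different label.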
In their paper "Statistical Cue Integration of DAG Deformable Models," S.K. Goldenstein, C. Vogler, and D. Metaxas describe a scheme for combining different sources of information into estimates of the parameters of a deformable model. They use a directed acyclic graph (DAG) to represent the interdependencies between the nodes of a deformable model. This framework supports the efficient integration of information from edges and other cues, using affine arithmetic to propagate uncertainties. They present experimental results for a face-tracking application.
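For readers unfamiliar with affine arithmetic, here is a minimal sketch of the textbook technique. It is not the authors' cue-integration framework, and the class name `Affine` and its methods are hypothetical. An affine form x0 + Σ xᵢεᵢ, with each noise symbol εᵢ ranging over [-1, 1], records which sources of uncertainty a quantity depends on. Correlated quantities can therefore partly cancel, which plain interval arithmetic cannot capture.

```python
import itertools
from dataclasses import dataclass, field

_fresh = itertools.count(1)  # source of new noise-symbol ids


@dataclass
class Affine:
    """x0 + sum_i coeffs[i] * eps_i, where each eps_i is an unknown in [-1, 1]."""
    x0: float
    coeffs: dict = field(default_factory=dict)

    @classmethod
    def from_interval(cls, lo, hi):
        # A fresh noise symbol stands for an independent source of uncertainty.
        return cls((lo + hi) / 2, {next(_fresh): (hi - lo) / 2})

    def radius(self):
        return sum(abs(c) for c in self.coeffs.values())

    def interval(self):
        r = self.radius()
        return (self.x0 - r, self.x0 + r)

    def __add__(self, other):
        keys = self.coeffs.keys() | other.coeffs.keys()
        return Affine(self.x0 + other.x0,
                      {k: self.coeffs.get(k, 0.0) + other.coeffs.get(k, 0.0) for k in keys})

    def __neg__(self):
        return Affine(-self.x0, {k: -c for k, c in self.coeffs.items()})

    def __sub__(self, other):
        return self + (-other)

    def __mul__(self, other):
        # The linear part is exact; the quadratic remainder is bounded by
        # radius(self) * radius(other) and assigned to a new noise symbol.
        keys = self.coeffs.keys() | other.coeffs.keys()
        coeffs = {k: self.x0 * other.coeffs.get(k, 0.0) + other.x0 * self.coeffs.get(k, 0.0)
                  for k in keys}
        err = self.radius() * other.radius()
        if err:
            coeffs[next(_fresh)] = err
        return Affine(self.x0 * other.x0, coeffs)


x = Affine.from_interval(1.0, 3.0)
y = Affine.from_interval(0.0, 2.0)
print((x - x).interval())  # (0.0, 0.0); interval arithmetic would give (-2.0, 2.0)
print((x + y).interval())  # (1.0, 5.0)
print((x * y).interval())  # a guaranteed, though not tight, enclosure of [0, 6]
```

The design choice worth noting is the shared noise symbol. Because `x - x` reuses the same εᵢ, the uncertainty cancels exactly. Nonlinear operations, by contrast, must introduce new symbols and can overestimate the true range.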
Y. Song, L. Goncalves, and P. Perona describe, in their paper "Unsupervised Learning of Human Motion," a method for learning probabilistic models of human motion from video sequences in cluttered scenes. Two key advantages of their method are its unsupervised nature, which reduces the need for tedious hand labeling of data, and its use of graphical model constraints to reduce the search space when fitting a human figure model.
M.J. Beal, N. Jojic, and H. Attias present a graphical model approach to the multimodal problem of audiovisual tracking in "A Graphical Model for Audiovisual Object Tracking." Their framework exploits the composability of graphical models in order to combine audio and video cues at the signal level. One benefit of their approach is automatic calibration of the sensors through parameter learning. Their paper reflects a growing trend in the vision community of fusing information from multiple modalities in order to tackle challenging sensing problems such as robust object tracking.
The last regular paper is an invited submission by T.O. Binford and T.S. Levitt entitled "Evidential Reasoning for Object Recognition." This paper summarizes the authors' early pioneering work in applying graphical models to 3D object recognition from a single image. It contains a great deal of material of historical interest from the late 1980s that has never before appeared in archival form.
The final paper of this special section is by M. Marengoni, A. Hanson, S. Zilberstein, and E. Riseman. "Decision Making and Uncertainty Management in a 3D Reconstruction System" describes an application of Bayesian networks coupled with utility theory to the problem of aerial image interpretation. This framework supports optimal decision making, exemplified here by the selection of the most important pieces of visual evidence during classification.
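One standard way to combine a probability model with utility theory when choosing which evidence to gather is myopic value of information (VOI). VOI is the expected gain in utility from observing a variable before acting. The sketch below illustrates that general idea only and is not the authors' system. The class labels, feature names, probabilities, 0/1 utility, and the function names `expected_utility` and `value_of_information` are all hypothetical.

```python
def expected_utility(dist, utility):
    """Expected utility of the best action under a distribution over classes.

    utility[a][c] is the payoff of action a when the true class is c.
    """
    return max(sum(dist[c] * utility[a][c] for c in dist) for a in utility)


def value_of_information(prior, likelihood, utility):
    """Myopic VOI of observing one evidence variable F before acting.

    likelihood[c][f] = P(F = f | C = c).
    """
    values = next(iter(likelihood.values())).keys()
    eu_with = 0.0
    for f in values:
        p_f = sum(prior[c] * likelihood[c][f] for c in prior)
        if p_f == 0:
            continue
        posterior = {c: prior[c] * likelihood[c][f] / p_f for c in prior}  # Bayes' rule
        eu_with += p_f * expected_utility(posterior, utility)
    return eu_with - expected_utility(prior, utility)


# Hypothetical two-class aerial-image example with a 0/1 utility for labeling.
prior = {"building": 0.5, "road": 0.5}
utility = {"building": {"building": 1, "road": 0}, "road": {"building": 0, "road": 1}}
height = {"building": {"high": 0.9, "low": 0.1}, "road": {"high": 0.2, "low": 0.8}}
texture = {"building": {"smooth": 0.6, "rough": 0.4}, "road": {"smooth": 0.5, "rough": 0.5}}

voi = {name: value_of_information(prior, lik, utility)
       for name, lik in (("height", height), ("texture", texture))}
best = max(voi, key=voi.get)  # evidence worth acquiring first
```

With these made-up numbers, height is far more informative (VOI 0.35) than texture (VOI 0.05), so a VOI-driven system would acquire it first. In practice the VOI of each candidate would be weighed against the cost of extracting that evidence from the image.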
The papers in this special section demonstrate the wide applicability of graphical modeling techniques to problems in computer vision. The graphical models formalism has the potential to unify many seemingly disparate approaches within the vision literature by placing them in a common framework. It also provides a connection to a rich literature on learning, inference, and modeling techniques. We look forward to continued rapprochement between the computer vision and machine learning communities and the growth of graphical modeling techniques within the vision literature.
We would like to extend our thanks to the authors and the reviewers for their efforts on behalf of the special section. We also thank the staff of TPAMI and, especially, Hilda Hosillos for her help and guidance throughout the publication process.
James M. Rehg
Thomas S. Huang
William T. Freeman