Context-Based Segmentation of Image Sequences
March 2006 (vol. 28 no. 3)
pp. 463-468
We describe an algorithm for context-based segmentation of visual data. New frames in an image sequence (video) are segmented based on the prior segmentation of earlier frames in the sequence. The segmentation is performed by adapting a probabilistic model learned on previous frames, according to the content of the new frame. We utilize the maximum a posteriori version of the EM algorithm to segment the new image. The Gaussian mixture distribution that is used to model the current frame is transformed into a conjugate-prior distribution for the parametric model describing the segmentation of the new frame. This semisupervised method improves the segmentation quality and consistency and enables a propagation of segments along the segmented images. The performance of the proposed approach is illustrated on both simulated and real image data.
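The abstract describes adapting a Gaussian mixture model learned on earlier frames via MAP-EM, with the previous frame's mixture serving as a conjugate prior for the new frame. A minimal sketch of this idea, assuming diagonal covariances and a relevance factor `tau` controlling the prior's pull (function name and parameterization are illustrative, not the paper's notation, and only the means and weights are adapted here):

```python
import numpy as np

def map_adapt_gmm(X, means0, covs0, weights0, n_iter=10, tau=10.0):
    """MAP-EM adaptation of a diagonal-covariance GMM (illustrative sketch).

    means0 / covs0 / weights0: mixture parameters learned on the previous
    frame, acting as the conjugate prior. tau is a relevance factor: large
    tau anchors the new estimates to the prior, small tau trusts the data.
    """
    K, d = means0.shape
    means, covs, weights = means0.copy(), covs0.copy(), weights0.copy()
    for _ in range(n_iter):
        # E-step: per-pixel responsibilities under the current parameters
        log_r = np.empty((X.shape[0], K))
        for k in range(K):
            diff = X - means[k]
            log_r[:, k] = (np.log(weights[k])
                           - 0.5 * np.sum(np.log(2 * np.pi * covs[k]))
                           - 0.5 * np.sum(diff**2 / covs[k], axis=1))
        log_r -= log_r.max(axis=1, keepdims=True)   # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step with MAP smoothing toward the prior parameters
        n_k = r.sum(axis=0)
        for k in range(K):
            alpha = n_k[k] / (n_k[k] + tau)          # data-vs-prior weight
            x_bar = (r[:, k:k + 1] * X).sum(axis=0) / max(n_k[k], 1e-12)
            means[k] = alpha * x_bar + (1 - alpha) * means0[k]
        weights = (n_k + tau * weights0) / (n_k.sum() + tau)
    return means, covs, weights
```

Segmenting the new frame then amounts to assigning each pixel feature to the component with the highest responsibility; because each adapted component descends from a specific prior component, segment labels propagate across frames.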

[1] S. Agarwal and S. Belongie, “Segmentation by Example,” technical report, Computer Science and Eng. Dept., Univ. of California, San Diego, 2002.
[2] E. Borenstein and S. Ullman, “Class-Specific, Top-Down Segmentation,” Proc. European Conf. Computer Vision, pp. 109-122, 2002.
[3] C. Bregler, “Learning and Recognizing Human Dynamics in Video Sequences,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, June 1997.
[4] C. Carson, S. Belongie, H. Greenspan, and J. Malik, “Blobworld: Image Segmentation Using Expectation-Maximization and Its Application to Image Querying,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 8, pp. 1026-1038, Aug. 2002.
[5] S.-F. Chang, W. Chen, H. Meng, H. Sundaram, and D. Zhong, “A Fully Automated Content-Based Video Search Engine Supporting Spatiotemporal Queries,” IEEE Trans. Circuits and Systems for Video Technology, vol. 8, no. 5, pp. 602-615, 1998.
[6] Y. Deng and B.S. Manjunath, “NeTra-V: Toward an Object-Based Video Representation,” IEEE Trans. Circuits and Systems for Video Technology, vol. 8, no. 5, pp. 616-627, 1998.
[7] B. Duc, P. Schroeter, and J. Bigun, “Spatio-Temporal Robust Motion Estimation and Segmentation,” Proc. Sixth Int'l Conf. Computer Analysis of Images and Patterns, pp. 238-245, 1995.
[8] P. Duygulu, K. Barnard, J.F.G. de Freitas, and D.A. Forsyth, “Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary,” Proc. European Conf. Computer Vision, 2002.
[9] C. Fowlkes, S. Belongie, F. Chung, and J. Malik, “Spectral Grouping Using the Nyström Method,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 2, pp. 214-225, Feb. 2004.
[10] J.L. Gauvain and C.H. Lee, “Maximum A Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains,” IEEE Trans. Speech and Audio Processing, pp. 291-298, 1994.
[11] H. Greenspan, G. Dvir, and Y. Rubner, “Context Dependent Image Segmentation and Image Matching via EMD Flow,” J. Computer Vision and Image Understanding, vol. 93, no. 1, pp. 86-109, 2004.
[12] H. Greenspan, J. Goldberger, and L. Ridel, “A Continuous Probabilistic Framework for Image Matching,” J. Computer Vision and Image Understanding, vol. 84, pp. 384-406, 2001.
[13] B. Horn and B. Schunck, “Determining Optical Flow,” Artificial Intelligence, vol. 17, pp. 185-203, 1981.
[14] G. Iyengar and A.B. Lippman, “Videobook: An Experiment in Characterization of Video,” Proc. IEEE Int'l Conf. Image Processing, vol. 3, pp. 855-858, 1996.
[15] P. KaewTraKulPong and R. Bowden, “An Improved Adaptive Background Mixture Model for Real-Time Tracking with Shadow Detection,” Proc. European Workshop Advanced Video Based Surveillance Systems, 2001.
[16] S. Khan and M. Shah, “Object Based Segmentation of Video Using Color, Motion and Spatial Information,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 2, pp. 746-751, 2001.
[17] S. McKenna and H. Nait-Charif, “Learning Spatial Context from Tracking Using Penalized Likelihoods,” Proc. Int'l Conf. Pattern Recognition, pp. 138-141, 2004.
[18] S. McKenna, Y. Raja, and S. Gong, “Object Tracking Using Adaptive Colour Mixture Models,” Proc. Asian Conf. Computer Vision, pp. 615-622, 1998.
[19] R. Megret and D. DeMenthon, “A Survey of Spatio-Temporal Grouping Techniques,” Technical Report UMIACS-2002-83, 2002.
[20] D.A. Reynolds, T.F. Quatieri, and R.B. Dunn, “Speaker Verification Using Adapted Gaussian Mixture Models,” Digital Signal Processing, vol. 10, pp. 19-41, 2000.
[21] J. Shi and J. Malik, “Normalized Cuts and Image Segmentation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, pp. 888-905, 2000.
[22] J.Y. Wang and E.H. Adelson, “Spatio-Temporal Segmentation of Video Data,” Proc. SPIE, vol. 2182, pp. 120-131, 1994.
[23] Y. Weiss, “Segmentation Using Eigenvectors: A Unifying View,” Proc. Int'l Conf. Computer Vision, 1999.

Index Terms: Image-sequence analysis, video segmentation, model adaptation, conjugate prior, MAP, context-based segmentation.
Jacob Goldberger, Hayit Greenspan, "Context-Based Segmentation of Image Sequences," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 3, pp. 463-468, March 2006, doi:10.1109/TPAMI.2006.47