Automatic Caption Localization in Compressed Video
April 2000 (vol. 22 no. 4)
pp. 385-392

Abstract: We present a method to automatically localize captions in JPEG compressed images and the I-frames of MPEG compressed videos. Caption text regions are segmented from background images using their distinguishing texture characteristics. Unlike previously published methods, which fully decompress the video sequence before extracting the text regions, this method locates candidate caption text regions directly in the DCT compressed domain using the intensity variation information encoded in the DCT coefficients. Therefore, only a very small amount of decoding is required. The proposed algorithm takes about 0.006 seconds to process a $240 \times 350$ image and achieves a recall rate of 99.17 percent while falsely accepting about 1.87 percent of nontext DCT blocks on a variety of MPEG compressed videos containing more than 2,300 I-frames.
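To make the idea of reading intensity variation straight from DCT coefficients concrete, the following is a minimal sketch and not the authors' algorithm. It assumes that the horizontal texture energy of an 8 x 8 block can be approximated by summing the absolute values of the first-row AC coefficients, and that a block is a caption candidate when this energy exceeds a threshold. The function names, the choice of coefficients, and the threshold value of 100.0 are hypothetical. In a real JPEG or MPEG stream the coefficients would come from the entropy-decoded and dequantized data; here `dct2` only simulates them from pixel values.

```python
import math

N = 8  # DCT block size used by JPEG and MPEG I-frames


def dct2(block):
    """Orthonormal 2-D DCT-II of an 8x8 block given as block[row][col].

    Stands in for coefficients that a decoder would read from the bitstream.
    """
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency index
        for v in range(N):      # horizontal frequency index
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


def horizontal_energy(coeffs):
    """Sum of |AC| coefficients in the first row (assumed texture measure)."""
    return sum(abs(coeffs[0][v]) for v in range(1, N))


def is_candidate(coeffs, threshold=100.0):
    """Flag a block as a caption candidate; the threshold is hypothetical."""
    return horizontal_energy(coeffs) > threshold
```

Under these assumptions a uniform block has no AC energy and is rejected, while a block whose intensity alternates from column to column, as it does across character strokes, has large first-row coefficients and is accepted. Because only a few coefficients per block are inspected, no inverse DCT is needed, which is consistent with the small decoding cost the abstract describes.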

Index Terms:
Caption extraction, text location, texture, compressed video, segmentation, multimedia.
Yu Zhong, Hongjiang Zhang, Anil K. Jain, "Automatic Caption Localization in Compressed Video," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 4, pp. 385-392, April 2000, doi:10.1109/34.845381