2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '03) - Volume 2
A Bayesian Framework for Fusing Multiple Word Knowledge Models in Videotext Recognition
Madison, Wisconsin
June 18-June 20
ISBN: 0-7695-1900-8
DongQing Zhang, Columbia University
Shih-Fu Chang, Columbia University
Videotext recognition is challenging due to low resolution, diverse fonts and styles, and cluttered backgrounds. Past methods improved recognition through multiple-frame averaging, image interpolation, and lexicon correction, but recognition using multi-modality language models has not been explored. In this paper, we present a formal Bayesian framework for videotext recognition that fuses multiple word knowledge models through mixture models, and describe a learning approach based on Expectation-Maximization (EM). To handle unseen words, a back-off smoothing approach derived from the Bayesian model is also presented. We developed a prototype that fuses a model derived from closed captions with one derived from the British National Corpus. The closed-caption model is based on a unique time distance distribution model of videotext words and closed caption words. Our method achieves a significant performance gain, with a word recognition rate of 76.8% and a character recognition rate of 86.7%. A proposed post-processing method also improves videotext detection significantly, with precision of 91.8% and recall of 95.6%.
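As a rough illustration of how a mixture of word models can be weighted with EM, the sketch below estimates mixture weights for fixed component models and applies a simple floor for words that no component has seen. It is not the paper's formulation: the function names, the dictionary-based word models, and the constant floor used in place of the paper's back-off smoothing are all assumptions made for this example.

```python
def em_mixture_weights(component_probs, n_iter=50):
    """Estimate mixture weights for fixed component models with EM.

    component_probs: one row per observed word; row[k] is the probability
    that component model k assigns to that word (hypothetical input format).
    """
    k = len(component_probs[0])
    weights = [1.0 / k] * k  # start from uniform weights
    for _ in range(n_iter):
        totals = [0.0] * k
        for row in component_probs:
            mix = sum(w * p for w, p in zip(weights, row))
            if mix == 0.0:
                continue  # no component explains this word; skip it
            # E-step: responsibility of each component for this word
            for j in range(k):
                totals[j] += weights[j] * row[j] / mix
        norm = sum(totals)
        if norm == 0.0:
            break
        # M-step: new weights are the normalized responsibilities
        weights = [t / norm for t in totals]
    return weights


def mixture_prob(word, models, weights, floor=1e-8):
    """Mixture probability of a word, with a constant floor for unseen words.

    The floor is an assumed stand-in for back-off smoothing, not the
    smoothing scheme derived in the paper.
    """
    p = sum(w * m.get(word, 0.0) for w, m in zip(weights, models))
    return p if p > 0.0 else floor


if __name__ == "__main__":
    # Hypothetical toy models: one from closed captions, one from a corpus.
    caption_model = {"news": 0.5, "weather": 0.3}
    corpus_model = {"news": 0.1, "weather": 0.1, "video": 0.2}
    observed = ["news", "weather", "news"]
    rows = [[caption_model.get(w, 0.0), corpus_model.get(w, 0.0)] for w in observed]
    weights = em_mixture_weights(rows)
    print(weights, mixture_prob("video", [caption_model, corpus_model], weights))
```

The component models stay fixed here and only the weights are learned, which keeps the example short; a fuller treatment would also account for the recognizer's character-level evidence.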
Index Terms:
Videotext recognition, Video OCR, Video indexing, Information Fusing, Multimodal Recognition
Citation:
DongQing Zhang, Shih-Fu Chang, "A Bayesian Framework for Fusing Multiple Word Knowledge Models in Videotext Recognition," 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '03), vol. 2, p. 528, 2003