Automatic Script Identification From Document Images Using Cluster-Based Templates
February 1997 (vol. 19 no. 2)
pp. 176-181

Abstract—We describe an automated script identification system for typeset document images. Templates for each script are created by clustering textual symbols from a training set. Symbols from new images are compared to the templates to find the best script. Our current system processes thirteen scripts with minimal preprocessing and high accuracy.
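
The two stages named in the abstract, building templates by clustering training symbols and then matching new symbols against each script's templates, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: symbols are represented as equal-length numeric feature vectors, clustering is a simple greedy "leader" scheme with a hypothetical `radius` threshold, similarity is Euclidean distance, and the winning script is the one with the lowest mean nearest-template distance. The function names are likewise hypothetical.

```python
def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def build_templates(symbols, radius):
    """Greedy leader clustering (an assumed stand-in for the paper's method).

    A symbol joins the nearest existing cluster if that cluster's centroid
    lies within `radius`; otherwise it starts a new cluster. The centroid
    of each final cluster is returned as one template.
    """
    sums, counts = [], []
    for sym in symbols:
        centroids = [[s / n for s in total] for total, n in zip(sums, counts)]
        best = min(
            range(len(centroids)),
            key=lambda i: distance(centroids[i], sym),
            default=None,
        )
        if best is not None and distance(centroids[best], sym) <= radius:
            sums[best] = [a + b for a, b in zip(sums[best], sym)]
            counts[best] += 1
        else:
            sums.append(list(sym))
            counts.append(1)
    return [tuple(s / n for s in total) for total, n in zip(sums, counts)]


def identify_script(symbols, templates_by_script):
    """Return (best_script, scores); a lower mean distance is a better match."""
    scores = {}
    for script, templates in templates_by_script.items():
        nearest = [min(distance(sym, t) for t in templates) for sym in symbols]
        scores[script] = sum(nearest) / len(nearest)
    return min(scores, key=scores.get), scores


# Toy two-dimensional "symbols"; real input would be features extracted
# from segmented, size-normalized symbol images.
training = {
    "script_a": [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0)],
    "script_b": [(5.0, 5.0), (5.1, 5.0)],
}
templates = {name: build_templates(syms, radius=0.5) for name, syms in training.items()}
best, scores = identify_script([(0.2, 0.1), (0.9, 1.1)], templates)
print(best)
```

In this toy run the two nearby `script_a` training symbols merge into one template while the distant one forms its own, so `script_a` ends up with two templates and `script_b` with one. How the actual system segments symbols, normalizes them, and scores matches is not stated in the abstract.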

Index Terms:
Script identification, document analysis, optical character recognition.
Citation:
Judith Hochberg, Patrick Kelly, Timothy Thomas, Lila Kerns, "Automatic Script Identification From Document Images Using Cluster-Based Templates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 2, pp. 176-181, Feb. 1997, doi:10.1109/34.574802