Automatic Script Identification From Document Images Using Cluster-Based Templates
February 1997 (vol. 19 no. 2)
pp. 176-181

Abstract—We describe an automated script identification system for typeset document images. Templates for each script are created by clustering textual symbols from a training set. Symbols from new images are compared to the templates to find the best script. Our current system processes thirteen scripts with minimal preprocessing and high accuracy.
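
The two steps named in the abstract (cluster training symbols into per-script templates, then match symbols from a new image against them) can be illustrated with a minimal Python sketch. The abstract does not specify the symbol representation, the clustering method, or the matching score, so everything here is an assumption made for illustration: symbols are flattened, size-normalized bitmaps of equal length, clustering is a greedy single pass with a fixed distance threshold, and the chosen script is the one whose templates have the lowest mean best-match distance. The names `distance`, `build_templates`, and `identify_script` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

# Assumed representation: a flattened, size-normalized symbol bitmap.
Symbol = Sequence[float]


def distance(a: Symbol, b: Symbol) -> float:
    """Mean absolute pixel difference between two equal-length symbols."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def build_templates(symbols: List[Symbol], threshold: float = 0.25) -> List[List[float]]:
    """Greedy one-pass clustering (an assumed method, not the paper's).

    Each symbol joins the nearest existing cluster if it lies within
    `threshold`; otherwise it starts a new cluster. The running mean of
    each cluster serves as that cluster's template.
    """
    templates: List[List[float]] = []
    counts: List[int] = []
    for sym in symbols:
        if templates:
            best = min(range(len(templates)), key=lambda i: distance(sym, templates[i]))
            if distance(sym, templates[best]) <= threshold:
                n = counts[best]
                # Update the running mean of the matched cluster.
                templates[best] = [(t * n + s) / (n + 1) for t, s in zip(templates[best], sym)]
                counts[best] = n + 1
                continue
        templates.append([float(v) for v in sym])
        counts.append(1)
    return templates


def identify_script(symbols: List[Symbol], templates_by_script: Dict[str, List[List[float]]]) -> str:
    """Return the script whose templates match the symbols best on average.

    Assumes `symbols` is non-empty and every script has at least one template.
    """
    def mean_best_distance(templates: List[List[float]]) -> float:
        return sum(min(distance(s, t) for t in templates) for s in symbols) / len(symbols)

    return min(templates_by_script, key=lambda name: mean_best_distance(templates_by_script[name]))


# Toy usage with made-up 4-pixel "symbols" for two hypothetical scripts.
training = {
    "script_a": [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0]],
    "script_b": [[0, 0, 1, 1], [0, 1, 1, 1]],
}
templates = {name: build_templates(syms) for name, syms in training.items()}
guess = identify_script([[1, 1, 0, 0], [1, 0, 0, 0]], templates)
```

A real system would first have to segment and normalize symbols from the page image; that step is outside the scope of this sketch.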

Index Terms:
Script identification, document analysis, optical character recognition.

Citation:
Judith Hochberg, Patrick Kelly, Timothy Thomas, Lila Kerns, "Automatic Script Identification From Document Images Using Cluster-Based Templates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 2, pp. 176-181, Feb. 1997, doi:10.1109/34.574802
