2009 10th International Conference on Document Analysis and Recognition (2009)
July 26, 2009 to July 29, 2009
A two-stage approach for word-wise identification of English (Roman), Devnagari and Bengali (Bangla) scripts is proposed. This approach balances the tradeoff between recognition accuracy and processing speed. The 1st stage allows identifying scripts with high speed, yet less accuracy when dealing with noisy data. The advanced 2nd stage processes only those samples that yield low recognition confidence in the first stage. For both stages a rough character segmentation is performed and features are computed on segmented character components. Features used in the 1st stage are a 64-dimensional chain-code-histogram feature, while 400-dimensional gradient features are used in the 2nd stage. Final classification of a word to a particular script is done via majority voting of each recognized character component of the word. Extensive experiments with various confidence scores were conducted and reported here. The overall recognition accuracy and speed is remarkable. Correct classification of 98.51% on 11,123 test words is achieved, even when the recognition-confidence is as high as 95% at both stages.
S. Chanda, S. Pal, U. Pal and K. Franke, "Two-stage Approach for Word-wise Script Identification," 2009 10th International Conference on Document Analysis and Recognition(ICDAR), Barcelona, Spain, 2009, pp. 926-930.