17th International Conference on Pattern Recognition (ICPR'04) - Volume 1
Identification of Embedded Mathematical Expressions in Scanned Documents
Cambridge, UK, August 23-26, 2004
ISBN: 0-7695-2128-2
Efficient extraction of mathematical expressions is considered an important pre-processing step when applying existing OCR systems to convert scientific papers into electronic format. In this correspondence, a technique for extracting embedded (in-line) expressions is presented. The proposed method first invokes an existing OCR system to recognize the input document. Several features, including word n-grams, are computed at sentence level to spot sentences containing expressions. The n-gram feature is motivated by a statistical analysis of a corpus of scientific documents, which reveals that the word-level n-gram profile of sentences containing embedded expressions is quite different from that of sentences without any expression. Expression zones are then pinpointed by exploiting the OCR system's inability to handle expressions and by using common typographical conventions followed when typing mathematical expressions. Experimental results on a dataset of considerable size show the high efficiency of the proposed technique.
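The abstract does not specify the exact n-gram features or decision rule, so the following is only a minimal sketch of the sentence-spotting idea, assuming a simple two-class word-bigram model with add-one smoothing: one profile is trained on sentences known to contain embedded expressions, another on sentences without them, and a sentence is flagged when it is more likely under the first profile. The names `BigramProfile` and `make_classifier`, the smoothing choice, and the toy training sentences are all hypothetical illustrations, not the authors' method or data.

```python
import math
from collections import Counter


def bigrams(sentence):
    # Hypothetical tokenization: lowercase, split on whitespace, and pad with
    # boundary markers so sentence-initial and sentence-final words count too.
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return list(zip(tokens, tokens[1:]))


class BigramProfile:
    """Add-one smoothed word-bigram model trained on one class of sentences (assumed model)."""

    def __init__(self, sentences):
        self.bigram_counts = Counter()
        self.unigram_counts = Counter()  # counts of each token as the left word of a bigram
        for s in sentences:
            for prev, cur in bigrams(s):
                self.bigram_counts[(prev, cur)] += 1
                self.unigram_counts[prev] += 1
        self.vocab = {w for pair in self.bigram_counts for w in pair}

    def log_prob(self, sentence, vocab_size):
        # Sum of log P(cur | prev) with add-one (Laplace) smoothing.
        total = 0.0
        for prev, cur in bigrams(sentence):
            num = self.bigram_counts[(prev, cur)] + 1
            den = self.unigram_counts[prev] + vocab_size
            total += math.log(num / den)
        return total


def make_classifier(with_expr, without_expr):
    pos = BigramProfile(with_expr)
    neg = BigramProfile(without_expr)
    # A shared vocabulary size (+1 for unseen words) keeps the two scores comparable.
    vocab_size = len(pos.vocab | neg.vocab) + 1

    def contains_expression(sentence):
        # Likelihood-ratio decision: flag the sentence if the expression profile fits better.
        return pos.log_prob(sentence, vocab_size) > neg.log_prob(sentence, vocab_size)

    return contains_expression


# Hypothetical toy training data standing in for OCR output of a real corpus.
WITH_EXPR = [
    "let x be a real number such that x > 0",
    "where n denotes the number of samples",
    "the error is given by e = y - f",
    "let f be a continuous function",
]
WITHOUT_EXPR = [
    "the results are shown in the table",
    "this paper presents a new method",
    "we thank the reviewers for their comments",
    "the results are discussed in the next section",
]

if __name__ == "__main__":
    clf = make_classifier(WITH_EXPR, WITHOUT_EXPR)
    print(clf("let y be a real number"))                     # True
    print(clf("the results are shown in the next section"))  # False
```

In practice the sentences would come from the OCR pass over the scanned page, and this single bigram score would be just one feature among several; locating the expression zone inside a flagged sentence is a separate step not covered by this sketch.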
Citation:
Utpal Garain, B. B. Chaudhuri, A. Ray Chaudhuri, "Identification of Embedded Mathematical Expressions in Scanned Documents," in 17th International Conference on Pattern Recognition (ICPR'04), vol. 1, pp. 384-387, 2004.