17th International Conference on Pattern Recognition (ICPR'04) - Volume 1
Identification of Embedded Mathematical Expressions in Scanned Documents
Cambridge UK
August 23-August 26
ISBN: 0-7695-2128-2
Utpal Garain, Indian Statistical Institute, India
B. B. Chaudhuri, Indian Statistical Institute, India
A. Ray Chaudhuri, Jadavpur University, India
Efficient extraction of mathematical expressions is considered an important pre-processing step for applying existing OCR systems to convert scientific papers into electronic format. This paper presents a technique for extracting embedded (in-line) expressions. The proposed method first invokes an existing OCR system to recognize the input document. Several features, including word n-grams, are then computed at the sentence level to spot sentences containing expressions. The n-gram feature is motivated by a statistical analysis of a corpus of scientific documents, which reveals that the word-level n-gram profile of sentences containing embedded expressions differs markedly from that of sentences without any expression. Expression zones are pinpointed by exploiting the OCR system's inability to handle expressions and by using common typographical conventions followed in typesetting mathematical expressions. Experimental results on a sizable dataset show high efficiency of the proposed technique.
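The abstract does not give the exact n-gram features or how they are combined, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes word bigrams, add-one smoothing, and a log-likelihood-ratio score; the function names and the tiny example corpora are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def bigrams(sentence):
    """Split a sentence into lowercase word bigrams."""
    words = sentence.lower().split()
    return list(zip(words, words[1:]))


def build_profile(sentences):
    """Count word bigrams over a list of sentences (the 'n-gram profile')."""
    counts = Counter()
    for s in sentences:
        counts.update(bigrams(s))
    return counts


def log_ratio_score(sentence, with_expr, without_expr):
    """Sum of log(P(bigram | with) / P(bigram | without)), add-one smoothed.

    A positive score means the sentence looks more like the sentences
    that contain embedded expressions.
    """
    vocab = len(set(with_expr) | set(without_expr)) + 1  # +1 for unseen bigrams
    total_with = sum(with_expr.values())
    total_without = sum(without_expr.values())
    score = 0.0
    for bg in bigrams(sentence):
        p_with = (with_expr[bg] + 1) / (total_with + vocab)
        p_without = (without_expr[bg] + 1) / (total_without + vocab)
        score += math.log(p_with / p_without)
    return score


# Hypothetical toy corpora (assumption: real profiles would come from a
# labeled corpus of OCR'd scientific documents).
WITH_EXPR = [
    "let x be a real number such that",
    "where n denotes the number of samples",
    "for all x such that the sum is finite",
]
WITHOUT_EXPR = [
    "the results are shown in the table",
    "we describe the method in the next section",
    "the paper is organized as follows",
]

with_profile = build_profile(WITH_EXPR)
without_profile = build_profile(WITHOUT_EXPR)

if __name__ == "__main__":
    for text in ["x such that the sum", "the results are shown in the next section"]:
        print(text, "->", log_ratio_score(text, with_profile, without_profile))
```

In a sketch like this, sentences scoring above some threshold would be passed on to a later stage that localizes the expression zone; the threshold and that later stage are outside what the abstract specifies.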
Citation:
Utpal Garain, B. B. Chaudhuri, A. Ray Chaudhuri, "Identification of Embedded Mathematical Expressions in Scanned Documents," 17th International Conference on Pattern Recognition (ICPR'04) - Volume 1, vol. 1, pp. 384-387, 2004