Issue No. 08 - August (1994 vol. 16)
DOI: http://doi.ieeecomputersociety.org/10.1109/34.308482
<p>An algorithm for robust machine recognition of keywords embedded in a poorly printed document is presented. For each keyword, two statistical models, called pseudo 2-D hidden Markov models, are created: one representing the actual keyword and one representing all other extraneous words. Dynamic programming is then used to match an unknown input word against the two models and to make a maximum likelihood decision. Although the models are pseudo 2-D in the sense that they are not fully connected 2-D networks, they are shown to be general enough to characterize printed words efficiently. These models provide an "elastic matching" property in both the horizontal and vertical directions, which makes the recognizer not only independent of size and slant but also tolerant of highly deformed and noisy words. The system is evaluated on a synthetically created database of about 26,000 words. The authors currently achieve a recognition accuracy of 99% when words in the testing and training sets are of the same font size, and 96% when they differ in size. In the latter case, the conventional 1-D HMM achieves only 70% accuracy.</p>
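The abstract does not spell out how the models are parameterized, so the following is only a minimal Python sketch under assumptions of our own. It shows how a pseudo 2-D HMM can be scored with nested dynamic programming and how the keyword decision reduces to comparing two likelihoods. Assumptions not taken from the paper: binary pixel observations with a per-state Bernoulli probability, strictly left-to-right topologies, superstates running over image columns with each superstate embedding a top-to-bottom 1-D HMM, and the hypothetical names `ColumnHMM`, `Pseudo2DHMM`, `p2d_loglik`, `is_keyword` and the toy parameters.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence

NEG_INF = float("-inf")


def _log(p: float) -> float:
    # Natural log that maps probability 0 to -inf instead of raising.
    return math.log(p) if p > 0.0 else NEG_INF


def viterbi_left_right(n_states: int, stay_prob: Sequence[float],
                       frame_loglik: Callable[[int, int], float],
                       n_frames: int) -> float:
    """Best-path log-likelihood of a left-to-right chain that must start in
    state 0 and end in the last state; each step either stays or advances by one.
    frame_loglik(i, t) is the log-likelihood of frame t in state i."""
    delta = [NEG_INF] * n_states
    delta[0] = frame_loglik(0, 0)
    for t in range(1, n_frames):
        new = [NEG_INF] * n_states
        for i in range(n_states):
            stay = delta[i] + _log(stay_prob[i])
            move = delta[i - 1] + _log(1.0 - stay_prob[i - 1]) if i > 0 else NEG_INF
            best = max(stay, move)
            if best > NEG_INF:
                new[i] = best + frame_loglik(i, t)
        delta = new
    return delta[-1]


@dataclass
class ColumnHMM:
    # Hypothetical embedded 1-D HMM over the pixels of one column (top to bottom).
    on_prob: List[float]    # P(pixel is black | state), an assumed Bernoulli model
    stay_prob: List[float]  # self-loop probability of each state


@dataclass
class Pseudo2DHMM:
    # Assumed layout: superstates run left to right over columns.
    superstates: List[ColumnHMM]
    super_stay: List[float]


def column_loglik(hmm: ColumnHMM, column: Sequence[int]) -> float:
    # Vertical elastic matching: align one pixel column to the embedded states.
    def emit(i: int, y: int) -> float:
        p = hmm.on_prob[i]
        return _log(p) if column[y] else _log(1.0 - p)
    return viterbi_left_right(len(hmm.on_prob), hmm.stay_prob, emit, len(column))


def p2d_loglik(model: Pseudo2DHMM, image: Sequence[Sequence[int]]) -> float:
    width = len(image[0])
    columns = [[row[x] for row in image] for x in range(width)]
    # Inner pass: score every column under every superstate's embedded HMM.
    table = [[column_loglik(s, col) for col in columns] for s in model.superstates]
    # Outer pass: horizontal elastic matching of columns to superstates.
    return viterbi_left_right(len(model.superstates), model.super_stay,
                              lambda s, x: table[s][x], width)


def is_keyword(keyword: Pseudo2DHMM, filler: Pseudo2DHMM,
               image: Sequence[Sequence[int]]) -> bool:
    # Maximum-likelihood decision between the keyword and extraneous-word models.
    return p2d_loglik(keyword, image) > p2d_loglik(filler, image)


# Toy usage (hypothetical): the "keyword" is one vertical stroke between blank margins.
blank_col = ColumnHMM(on_prob=[0.05], stay_prob=[0.9])
stroke_col = ColumnHMM(on_prob=[0.95], stay_prob=[0.9])
KEYWORD = Pseudo2DHMM([blank_col, stroke_col, blank_col], [0.5, 0.5, 0.5])
FILLER = Pseudo2DHMM([ColumnHMM(on_prob=[0.3], stay_prob=[0.9])], [0.9])


def stroke_image(col: int, height: int = 4, width: int = 5) -> List[List[int]]:
    # Binary image whose only black pixels form a vertical stroke at column `col`.
    return [[1 if x == col else 0 for x in range(width)] for _ in range(height)]


if __name__ == "__main__":
    for c in (1, 2, 3):
        print(c, is_keyword(KEYWORD, FILLER, stroke_image(c)))  # stroke position varies
    print(is_keyword(KEYWORD, FILLER, [[0] * 5 for _ in range(4)]))  # blank image
```

In this toy setup the keyword model accepts the stroke wherever it falls among columns 1 to 3, because the outer Viterbi pass lets each superstate absorb a variable number of columns, while the blank image is assigned to the filler model. The two-pass structure (score every column under every embedded HMM, then run a 1-D Viterbi over those scores) is what keeps the cost close to that of two nested 1-D searches rather than a full 2-D alignment.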
hidden Markov models; dynamic programming; optical character recognition; statistical models; maximum likelihood estimation; decision theory; document image processing; keyword spotting; poorly printed documents; pseudo 2-D hidden Markov models; robust machine recognition; maximum likelihood decision; elastic matching; recognition accuracy; testing; training sets
S. Kuo and O. Agazzi, "Keyword Spotting in Poorly Printed Documents using Pseudo 2-D Hidden Markov Models," in IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 16, no. 8, pp. 842-848, 1994.