Document Image Decoding Using Markov Source Models
June 1994 (vol. 16 no. 6)
pp. 602-617

Document image decoding (DID) is a communication theory approach to document image recognition. In DID, a document recognition problem is viewed as consisting of three elements: an image generator, a noisy channel and an image decoder. A document image generator is a Markov source (stochastic finite-state automaton) that combines a message source with an imager. The message source produces a string of symbols, or text, that contains the information to be transmitted. The imager is modeled as a finite-state transducer that converts the 1D message string into an ideal 2D bitmap. The channel transforms the ideal image into a noisy observed image. The decoder estimates the message, given the observed image, by finding the a posteriori most probable path through the combined source and channel models using a Viterbi-like dynamic programming algorithm. The proposed approach is illustrated on the problem of decoding scanned telephone yellow pages to extract names and numbers from the listings. A finite-state model for yellow page columns was constructed and used to decode a database of scanned column images containing about 1100 individual listings.
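The decoding step described above can be illustrated with a deliberately simplified sketch. It rests on several assumptions that are not taken from the paper. The page is reduced to a single text line instead of a full 2D image. Each transition of the Markov source draws one glyph template and advances the pen by that template's width. The channel is modeled as an asymmetric bit-flip channel with hypothetical parameters `ALPHA0` and `ALPHA1`. The names `Transition`, `template_score` and `decode` are illustrative only and are not the authors' implementation. Each placed template is scored as a log-likelihood ratio against an all-white background. The best path through the (state, column) lattice is then found with a Viterbi-style recurrence.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed bit-flip channel parameters (hypothetical values):
ALPHA0 = 0.95  # P(observed white | ideal white)
ALPHA1 = 0.90  # P(observed black | ideal black)

Bitmap = List[List[int]]  # rows of 0/1 pixels, 1 = black


@dataclass
class Transition:
    src: str          # source state of the Markov source
    dst: str          # destination state
    label: str        # message symbol emitted on this transition
    template: Bitmap  # glyph drawn by the imager; its width is the x-displacement
    log_prob: float   # log transition probability


def template_score(obs: Bitmap, template: Bitmap, x: int) -> float:
    """Log-likelihood ratio of drawing `template` at column x versus leaving
    those pixels white, under the (assumed) asymmetric bit-flip channel."""
    black_gain = math.log(ALPHA1 / (1 - ALPHA0))  # ideal black, observed black
    white_gain = math.log((1 - ALPHA1) / ALPHA0)  # ideal black, observed white
    score = 0.0
    for r, row in enumerate(template):
        for c, bit in enumerate(row):
            if bit:
                score += black_gain if obs[r][x + c] else white_gain
    return score


def decode(obs: Bitmap, transitions: List[Transition],
           start: str, final: str) -> Tuple[Optional[str], float]:
    """Viterbi-style search for the best path from (start, 0) to (final, W)."""
    width = len(obs[0])
    # best[(state, x)] = (score, back-pointer); back-pointer is (prev_node, label)
    best = {(start, 0): (0.0, None)}
    for x in range(width + 1):
        # Every transition advances x by >= 1, so all nodes at column x are final here.
        for t in transitions:
            node = (t.src, x)
            if node not in best:
                continue
            w = len(t.template[0])
            assert w >= 1, "zero-width transitions are not handled in this sketch"
            if x + w > width:
                continue
            s = best[node][0] + t.log_prob + template_score(obs, t.template, x)
            nxt = (t.dst, x + w)
            if nxt not in best or s > best[nxt][0]:
                best[nxt] = (s, (node, t.label))
    end = (final, width)
    if end not in best:
        return None, float("-inf")
    labels, node = [], end
    while best[node][1] is not None:  # follow back-pointers to the start node
        node, label = best[node][1]
        labels.append(label)
    return "".join(reversed(labels)), best[end][0]


# Toy source: one state with self-loops for "T", "L" and a 1-pixel space.
T = [[1, 1, 1], [0, 1, 0], [0, 1, 0]]
L = [[1, 0, 0], [1, 0, 0], [1, 1, 1]]
SPACE = [[0], [0], [0]]
lp = math.log(1 / 3)
source = [Transition("s", "s", "T", T, lp),
          Transition("s", "s", "L", L, lp),
          Transition("s", "s", " ", SPACE, lp)]
# Ideal image of "TL", with one pixel (row 1, column 0) flipped to black by noise.
observed = [[1, 1, 1, 1, 0, 0],
            [1, 1, 0, 1, 0, 0],
            [0, 1, 0, 1, 1, 1]]
text, score = decode(observed, source, "s", "s")
print(text)  # TL
```

Because every transition in this toy model advances the column by at least one pixel, the (state, x) lattice is acyclic and a single left-to-right pass is enough. The paper's imager places templates on a 2D plane, so a faithful decoder would track 2D positions rather than a single column.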

Index Terms:
document image processing; hidden Markov models; dynamic programming; image coding; document image decoding; Markov source models; communication theory; document image recognition; stochastic finite state automaton; message source; 1D message string; 2D bitmap; decoder; channel models; Viterbi-like dynamic programming; finite state model
G.E. Kopec, P.A. Chou, "Document Image Decoding Using Markov Source Models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 6, pp. 602-617, June 1994, doi:10.1109/34.295905