Issue No.03 - March (2010 vol.32)
pp: 431-447
Christian Wolf, Université de Lyon, CNRS, and INSA-Lyon, France
ABSTRACT
We present a new method for blind removal of ink bleed-through in document images. The method applies separate Markov Random Field (MRF) regularization to the recto and verso sides, with the separate priors derived from the full graph. Segmentation is based on Bayesian Maximum a Posteriori (MAP) estimation. This separate treatment has two advantages: the prior is adapted to the content creation process (e.g., superimposing two handwritten pages), and the estimate of the recto pixels improves because the verso pixels covered by recto pixels are estimated as well. Moreover, the formulation as a binary labeling problem with two hidden labels per pixel leads naturally to an efficient optimization method based on minimum cut/maximum flow in a graph. The proposed method is evaluated on scanned document images from the 18th century and improves character recognition results compared to other restoration methods.
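To illustrate how a binary MRF labeling problem can be solved exactly by minimum cut/maximum flow, here is a minimal Python sketch. It is not the method of the paper: it uses a single hidden label per site rather than two, a 1D chain instead of an image grid, squared-error data costs, a Potts smoothness weight `beta`, and a plain Edmonds-Karp max-flow. All names (`min_cut_labels`, `energy`, `obs`, `beta`) and the sample values are assumptions made for illustration.

```python
from collections import deque


def min_cut_labels(unary, edges, beta):
    """Minimize sum_i unary[i][x_i] + beta * #(edges with x_i != x_j).

    unary[i] = (cost of label 0, cost of label 1), all non-negative.
    edges    = list of (i, j) neighbor pairs; beta >= 0.
    """
    n = len(unary)
    s, t = n, n + 1  # source and sink terminals
    cap = [dict() for _ in range(n + 2)]

    def add(u, v, c):
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)  # make sure the residual arc exists

    for i, (c0, c1) in enumerate(unary):
        add(s, i, c1)  # cut when i ends on the sink side (label 1)
        add(i, t, c0)  # cut when i stays on the source side (label 0)
    for i, j in edges:
        add(i, j, beta)  # cut when the two neighbors disagree
        add(j, i, beta)

    def bfs():
        # Breadth-first search over arcs with remaining capacity.
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = bfs()
        if t not in parent:
            break  # no augmenting path left: the flow is maximal
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow

    # Sites still reachable from the source form the label-0 side of the cut.
    return [0 if i in parent else 1 for i in range(n)]


def energy(labels, unary, edges, beta):
    """Energy of a labeling: data term plus Potts smoothness term."""
    data = sum(unary[i][x] for i, x in enumerate(labels))
    smooth = beta * sum(labels[i] != labels[j] for i, j in edges)
    return data + smooth


# Hypothetical noisy observations on a chain; label means assumed to be 0 and 1.
obs = [0.9, 0.8, 0.2, 0.9, 0.1, 0.2, 0.1]
unary = [((o - 0.0) ** 2, (o - 1.0) ** 2) for o in obs]
edges = [(i, i + 1) for i in range(len(obs) - 1)]
labels = min_cut_labels(unary, edges, beta=0.5)
print(labels, energy(labels, unary, edges, 0.5))
```

The value of the cut equals the energy of the corresponding labeling, so a minimum cut yields a global minimizer. This holds here because the pairwise term is submodular (it penalizes only disagreeing neighbors with a non-negative weight).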
INDEX TERMS
Markov random fields, Bayesian estimation, graph cuts, document image restoration.
CITATION
Christian Wolf, "Document Ink Bleed-Through Removal with Two Hidden Markov Random Fields and a Single Observation Field," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 32, no. 3, pp. 431-447, March 2010, doi:10.1109/TPAMI.2009.33