Restoration of Archival Documents Using a Wavelet Technique
October 2002 (vol. 24 no. 10)
pp. 1399-1404

Abstract: This paper addresses the problem of restoring handwritten archival documents whose contents are obscured by interfering handwriting from the reverse side, caused by ink seeping through the page. We present a novel method that first matches both sides of a document so that each interfering stroke is mapped to the corresponding stroke originating from the reverse side. This facilitates the identification of the foreground and interfering strokes. A wavelet reconstruction process then iteratively enhances the foreground strokes and smears the interfering strokes, strengthening the ability of an improved Canny edge detector to discriminate against the interfering strokes. The method has been shown to restore the documents effectively, with average precision and recall rates for foreground text extraction of 84 percent and 96 percent, respectively.
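The enhance-and-smear idea can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a one-level Haar transform on a 1D row of intensities, a precomputed foreground/interfering label per coefficient (standing in for the output of the side-matching step), and arbitrary gain values. The names `haar_forward`, `haar_inverse`, `rescale_details`, `enhance`, and `smear` are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal):
    """One-level Haar transform of an even-length sequence.

    Returns (approx, detail), each half the length of the input.
    """
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, rebuilding the full-length sequence."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def rescale_details(signal, is_foreground, enhance=2.0, smear=0.25):
    """Amplify detail coefficients labeled foreground, attenuate the rest.

    is_foreground holds one flag per detail coefficient. In this sketch the
    labels are simply given; the gains are illustrative assumptions.
    """
    approx, detail = haar_forward(signal)
    scaled = [d * (enhance if fg else smear) for d, fg in zip(detail, is_foreground)]
    return haar_inverse(approx, scaled)


# Two identical edges: the first labeled foreground, the second interfering.
row = [10.0, 50.0, 10.0, 50.0]
restored = rescale_details(row, [True, False])
# The foreground edge contrast grows from 40 to about 80, while the
# interfering edge shrinks from 40 to about 10. Values may leave the
# original intensity range and would need clipping before display.
```

After such a step the two edges differ sharply in gradient magnitude, which is what would let a gradient-based edge detector keep one and discard the other. The abstract describes the process as iterative, which this single pass does not capture.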

Index Terms:
Document image analysis, wavelet enhancement, wavelet smearing, Canny edge detector, text extraction, image segmentation, bleed-through, show-through, noise cancellation, denoising.
Citation:
Chew Lim Tan, Ruini Cao, Peiyi Shen, "Restoration of Archival Documents Using a Wavelet Technique," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 10, pp. 1399-1404, Oct. 2002, doi:10.1109/TPAMI.2002.1039211