Issue No. 11 - November (2008 vol. 30)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPAMI.2008.89
Chew Lim Tan , National University of Singapore, Singapore
Linlin Li , National University of Singapore, Singapore
Shijian Lu , A*STAR, Singapore
This paper presents a document retrieval technique that is capable of searching document images without OCR (optical character recognition). The proposed technique retrieves document images by a new word shape coding scheme, which captures the document content through annotating each word image by a word shape code. In particular, we annotate word images by using a set of topological shape features including character ascenders/descenders, character holes, and character water reservoirs. With the annotated word shape codes, document images can be retrieved by either query keywords or a query document image. Experimental results show that the proposed document image retrieval technique is fast, efficient, and tolerant to various types of document degradation.
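The coding idea in the abstract can be illustrated with a toy sketch. This is not the authors' method: the paper extracts features from word images, whereas the sketch below works on plain text, omits water reservoirs entirely, and uses hand-picked letter sets as stand-ins for ascender, descender, and hole detection. The names `word_shape_code`, `build_index`, and `search` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict

# Assumption: hand-picked lowercase letter sets stand in for image-based
# feature detection. The paper's actual features come from word images.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")
HOLES = set("abdegopq")


def char_code(ch):
    """Code one character: 'A' ascender, 'D' descender, 'x' otherwise,
    with a trailing 'o' when the letter contains a hole."""
    if ch in ASCENDERS:
        base = "A"
    elif ch in DESCENDERS:
        base = "D"
    else:
        base = "x"
    return base + ("o" if ch in HOLES else "")


def word_shape_code(word):
    """Return the shape code of a word as a tuple of per-character codes."""
    return tuple(char_code(ch) for ch in word.lower())


def build_index(docs):
    """Map each word shape code to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word_shape_code(word)].add(doc_id)
    return index


def search(index, keyword):
    """Return the sorted ids of documents with a word matching the keyword's code."""
    return sorted(index.get(word_shape_code(keyword), set()))


docs = {"a": "the dog ran", "b": "a big cat"}
index = build_index(docs)
print(word_shape_code("dog"))
print(search(index, "dog"))
print(search(index, "cat"))
```

In this toy scheme distinct words can share a code: "dog" and "bag" both map to the same tuple, so a keyword query returns candidate documents rather than exact matches.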
Image/video retrieval, Shape, Text processing, Document analysis, Document Capture, Document and Text Processing, Computing Methodologies, Vision and Scene Understanding, Artificial Intelligence
Chew Lim Tan, Linlin Li, Shijian Lu, "Document Image Retrieval through Word Shape Coding", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 30, no. 11, pp. 1913-1918, November 2008, doi:10.1109/TPAMI.2008.89