IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 30, no. 11, November 2008
Linlin Li, National University of Singapore, Singapore
Shijian Lu, A*STAR, Singapore
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPAMI.2008.89
This paper presents a document retrieval technique that is capable of searching document images without optical character recognition (OCR). The proposed technique retrieves document images using a new word shape coding scheme, which captures the document content by annotating each word image with a word shape code. In particular, we annotate word images using a set of topological shape features, including character ascenders/descenders, character holes, and character water reservoirs. With the annotated word shape codes, document images can be retrieved by either query keywords or a query document image. Experimental results show that the proposed document image retrieval technique is fast, efficient, and tolerant to various types of document degradation.
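The idea of retrieving by shape codes rather than recognized characters can be illustrated with a minimal sketch. The code below is not the authors' coding scheme: it works on plain text instead of word images, the character classes `ASCENDERS`, `DESCENDERS`, and `HOLES` are hypothetical sets chosen for illustration, water reservoirs are omitted entirely, and the function names (`char_code`, `word_shape_code`, `build_index`, `search`) are assumptions. It only shows how words might be annotated with a code and then looked up by a query keyword.

```python
from collections import defaultdict

# Hypothetical character classes for lowercase letters, for illustration only.
# A real system would derive these features from word images, not from text.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")
HOLES = set("abdegopq")


def char_code(ch):
    """Map one lowercase letter to a toy shape symbol."""
    if ch in ASCENDERS:
        base = "A"  # stroke rising above the x-height
    elif ch in DESCENDERS:
        base = "D"  # stroke falling below the baseline
    else:
        base = "x"  # letter confined to the x-height band
    # Append "o" when the letter encloses a hole.
    return base + ("o" if ch in HOLES else "")


def word_shape_code(word):
    """Concatenate the toy shape symbols of the letters in a word."""
    return "".join(char_code(ch) for ch in word.lower() if ch.isalpha())


def build_index(docs):
    """Build an inverted index from word shape code to document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            code = word_shape_code(word)
            if code:
                index[code].add(doc_id)
    return index


def search(index, keyword):
    """Return the sorted ids of documents containing a word with the keyword's code."""
    return sorted(index.get(word_shape_code(keyword), set()))
```

For example, after `index = build_index({"d1": "the dog barks", "d2": "a cat sleeps"})`, calling `search(index, "dog")` returns `["d1"]`. In this toy scheme "dog" and "bog" receive the same code, which shows why a coarse shape code can be ambiguous and why a practical scheme needs enough features to keep such collisions rare.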
Image/video retrieval, Shape, Text processing, Document analysis, Document Capture, Document and Text Processing, Computing Methodologies, Vision and Scene Understanding, Artificial Intelligence
Linlin Li, Shijian Lu, "Document Image Retrieval through Word Shape Coding", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 30, no. 11, pp. 1913-1918, November 2008, doi:10.1109/TPAMI.2008.89