Issue No.06 - June (2002 vol.24)
pp: 838-844
ABSTRACT
<p>We propose a method for text retrieval from document images without the use of OCR. Documents are segmented into character objects, and two image features, the Vertical Traverse Density (VTD) and the Horizontal Traverse Density (HTD), are extracted from each object. An n-gram based document vector is constructed for each document from these features. Text similarity between documents is then measured by calculating the dot product of the document vectors. Testing with seven corpora of imaged textual documents in English and Chinese, as well as images from the UW1 database, confirms the validity of the proposed method.</p>
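The pipeline named in the abstract (traverse-density features, n-gram document vectors, dot-product similarity) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that traverse density means the number of background-to-stroke transitions along each scan line of a binary character bitmap, that each character object has already been mapped to some hashable class code, and that document vectors are normalised to unit length before the dot product. The function names `traverse_density`, `document_vector`, and `similarity` are hypothetical.

```python
from collections import Counter
from math import sqrt


def traverse_density(bitmap):
    """Return (vtd, htd) for a binary bitmap given as a list of rows of 0/1.

    Assumption: traverse density is the count of 0 -> 1 transitions
    (stroke crossings) along each vertical or horizontal scan line.
    """
    def crossings(line):
        prev, count = 0, 0
        for px in line:
            if px and not prev:
                count += 1
            prev = px
        return count

    vtd = tuple(crossings(col) for col in zip(*bitmap))  # one value per column
    htd = tuple(crossings(row) for row in bitmap)        # one value per row
    return vtd, htd


def document_vector(char_codes, n=3):
    """Build a unit-length n-gram frequency vector from character class codes.

    Assumption: char_codes is a sequence of hashable codes, one per
    segmented character object; how codes are derived from the VTD/HTD
    features is left out of this sketch.
    """
    grams = Counter(
        tuple(char_codes[i:i + n]) for i in range(len(char_codes) - n + 1)
    )
    norm = sqrt(sum(c * c for c in grams.values()))
    return {g: c / norm for g, c in grams.items()} if norm else {}


def similarity(u, v):
    """Dot product of two sparse document vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector
    return sum(w * v.get(g, 0.0) for g, w in u.items())
```

With unit-length vectors the dot product lies between 0 (no shared n-grams) and 1 (identical n-gram distributions), so documents can be ranked against a query document by this score.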
INDEX TERMS
Document image analysis, document vector, text similarity, text retrieval.
CITATION
Chew Lim Tan, Weihua Huang, Zhaohui Yu, Yi Xu, "Imaged Document Text Retrieval Without OCR", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 24, no. 6, pp. 838-844, June 2002, doi:10.1109/TPAMI.2002.1008389