Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Vol 2
Word image based latent semantic indexing for conceptual querying in document image databases
Curitiba, Parana, Brazil
September 23-September 26
ISBN: 0-7695-2822-8
In this paper we present an application of latent semantic analysis (LSA) to the indexing and retrieval of document images containing text. The query is specified as a set of word images, and the documents that best match the query representation in the latent semantic space are retrieved. We show through extensive experiments on a large database that using LSA for document images improves retrieval precision, as it does for electronic text documents.
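The retrieval step described above can be illustrated with a minimal sketch of standard LSA. It is not the authors' implementation. It assumes that each word image has already been assigned to a word-class index (for example by clustering word-image features, a step the abstract does not detail), so that a collection becomes a term-by-document count matrix. The function names, the toy matrix, and the choice of k are all hypothetical.

```python
import numpy as np

def build_lsa_index(term_doc, k):
    """Truncated SVD of a term-by-document matrix (rows: word classes, columns: documents)."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k largest singular triplets; each row of the last array is one document.
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in_query(query_counts, U_k, s_k):
    """Project a query count vector into the latent space: q^T U_k S_k^{-1}."""
    return (query_counts @ U_k) / s_k

def cosine_scores(query_vec, doc_vecs, eps=1e-12):
    """Cosine similarity between the folded-in query and every document vector."""
    num = doc_vecs @ query_vec
    den = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + eps
    return num / den

# Toy data (assumed): 6 word classes, 4 documents, two disjoint vocabularies.
TERM_DOC = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 3],
    [0, 0, 0, 1],
], dtype=float)

U_k, s_k, doc_vecs = build_lsa_index(TERM_DOC, k=2)

# Query made of a single word class (index 2) that occurs only in document 1.
query = np.zeros(6)
query[2] = 1.0

scores = cosine_scores(fold_in_query(query, U_k, s_k), doc_vecs)
ranking = np.argsort(-scores)  # document indices, best match first
print(ranking, np.round(scores, 3))
```

In this toy example the query word class occurs only in document 1, yet document 0 scores just as highly because it shares the other word classes of the same group, while documents 2 and 3 score near zero. This is the kind of conceptual matching that LSA is meant to provide beyond exact term matching.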
Citation:
Sameek Banerjee, Gaurav Harit, Santanu Chaudhury, "Word image based latent semantic indexing for conceptual querying in document image databases," Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 2, pp. 1208-1212, 2007