18th International Conference on Pattern Recognition (ICPR'06) Volume 3
Summarization of JBIG2 Compressed Indian Language Textual Images
Hong Kong
August 20-August 24
ISBN: 0-7695-2521-0
Utpal Garain, Indian Statistical Institute, 203, B. T. Road, Kolkata 700108, India
Alok K. Datta, Indian Statistical Institute, 203, B. T. Road, Kolkata 700108, India
U. Bhattacharya, Indian Statistical Institute, 203, B. T. Road, Kolkata 700108, India
S. K. Parui, Indian Statistical Institute, 203, B. T. Road, Kolkata 700108, India
This paper presents a method for automatically summarizing JBIG2-coded textual images without optical character recognition (OCR). The compressed images are only partially decompressed (less than 10% of the uncompressed image size), and text lines and words are marked. A few features are computed for each sentence, and based on the feature values each sentence is marked as a summary sentence or not. The system outputs the selected sentences as the summary and ranks them within it. The experiments use Indian-language text images. Test results show a sentence selection efficiency of about 56% when judged against summaries produced by humans. A nonparametric (distribution-free) rank statistic gives a correlation coefficient of 0.28 as a measure of the (minimum) strength of association between the machine and human sentence rankings.
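The two evaluation figures quoted above can be illustrated with a minimal sketch. The abstract does not define "sentence selection efficiency" or name the rank statistic, so both are assumptions here: efficiency is taken as the fraction of human-selected sentences that the machine also selects, and Spearman's rho stands in for the unnamed nonparametric statistic. The function names and the sample data are hypothetical.

```python
def selection_efficiency(machine, human):
    """Fraction of human-selected sentence ids also selected by the machine.

    Assumed definition; the abstract does not spell out the measure.
    """
    human_set = set(human)
    if not human_set:
        raise ValueError("human summary must contain at least one sentence")
    return len(set(machine) & human_set) / len(human_set)


def spearman_rho(ranks_a, ranks_b):
    """Spearman's rank correlation for two tie-free rankings of the same items.

    Assumed stand-in for the paper's unnamed distribution-free statistic.
    ranks_a[i] and ranks_b[i] are the ranks given to sentence i by each judge.
    """
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("need two rankings of the same length, n >= 2")
    # Sum of squared rank differences, then the closed form valid without ties.
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


if __name__ == "__main__":
    # Hypothetical sentence ids chosen by the machine and by a human judge.
    print(selection_efficiency([0, 2, 5, 7], [0, 3, 5, 9]))
    # Hypothetical ranks assigned to the same five summary sentences.
    print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

Under these assumptions, a coefficient near 1 means the machine orders the summary sentences much as the human does, while a value such as the reported 0.28 indicates a weak positive agreement.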
Citation:
Utpal Garain, Alok K. Datta, U. Bhattacharya, S. K. Parui, "Summarization of JBIG2 Compressed Indian Language Textual Images," 18th International Conference on Pattern Recognition (ICPR'06), vol. 3, pp. 344-347, 2006