Fourth International Conference Document Analysis and Recognition (ICDAR'97)
Extracting Text from WWW Images
Ulm, Germany
August 18-August 20
ISBN: 0-8186-7898-4
In this paper, we examine the problem of locating and extracting text from in-line images of World Wide Web pages. We describe a text detection algorithm based on color clustering and connected component analysis. The algorithm first quantizes the color space of the input image into a number of color classes using a parameter-free clustering procedure. It then identifies text-like connected components in each color class based on their shapes. Finally, a post-processing procedure aligns the text-like components into text lines. The experimental results show that our text extraction algorithm works well on a variety of test images.
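The three stages named in the abstract (color quantization, shape-based filtering of connected components, and grouping into text lines) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the parameter-free clustering procedure, so uniform color binning stands in for it here, and every function name, threshold (`levels`, `min_size`, `max_frac`, `max_aspect`), and the greedy line-grouping rule are assumptions made purely for illustration.

```python
from collections import deque

def quantize(image, levels=4):
    """Map each (r, g, b) pixel to a coarse color class by uniform binning.
    Assumption: a stand-in for the paper's parameter-free clustering."""
    step = 256 // levels
    return [[(r // step, g // step, b // step) for (r, g, b) in row] for row in image]

def connected_components(labels):
    """Yield (color_class, pixels) for each 8-connected region of one class."""
    h, w = len(labels), len(labels[0])
    seen = [[False] * w for _ in range(h)]
    for y0 in range(h):
        for x0 in range(w):
            if seen[y0][x0]:
                continue
            cls = labels[y0][x0]
            seen[y0][x0] = True
            queue, pixels = deque([(y0, x0)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and not seen[ny][nx] and labels[ny][nx] == cls):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            yield cls, pixels

def bounding_box(pixels):
    """Return (top, left, bottom, right) of a pixel list, inclusive."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return min(ys), min(xs), max(ys), max(xs)

def is_text_like(box, image_h, image_w, min_size=2, max_frac=0.5, max_aspect=5.0):
    """Hypothetical shape test: reject tiny, huge, or very elongated regions."""
    top, left, bottom, right = box
    bh, bw = bottom - top + 1, right - left + 1
    if bh < min_size or bh > max_frac * image_h or bw > max_frac * image_w:
        return False
    return max(bh, bw) / min(bh, bw) <= max_aspect

def group_into_lines(components):
    """Greedily merge same-class boxes whose vertical extents overlap."""
    lines = []
    for cls, box in sorted(components, key=lambda c: c[1][1]):  # left to right
        top, _, bottom, _ = box
        for line in lines:
            if line["cls"] == cls and top <= line["bottom"] and bottom >= line["top"]:
                line["boxes"].append(box)
                line["top"] = min(line["top"], top)
                line["bottom"] = max(line["bottom"], bottom)
                break
        else:
            lines.append({"cls": cls, "top": top, "bottom": bottom, "boxes": [box]})
    return lines

def extract_text_lines(image):
    """Run the three stages on an image given as rows of (r, g, b) tuples."""
    h, w = len(image), len(image[0])
    candidates = []
    for cls, pixels in connected_components(quantize(image)):
        box = bounding_box(pixels)
        if is_text_like(box, h, w):
            candidates.append((cls, box))
    return group_into_lines(candidates)
```

Calling `extract_text_lines` on an image represented as a list of rows of `(r, g, b)` tuples returns a list of dictionaries, one per candidate text line, each holding the color class and the bounding boxes assigned to that line. A real system would likely need more careful clustering and layout analysis than this sketch provides.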
Index Terms:
text detection, information retrieval, World Wide Web.
Citation:
Jiangying Zhou, Daniel Lopresti, "Extracting Text from WWW Images," icdar, pp.248, Fourth International Conference Document Analysis and Recognition (ICDAR'97), 1997