Pattern Recognition, International Conference on (2002)
Quebec City, QC, Canada
Aug. 11, 2002 to Aug. 15, 2002
ISSN: 1051-4651
ISBN: 0-7695-1695-X
pp: 30057
Yue Lu, National University of Singapore
Chew Lim Tan, National University of Singapore
This paper proposes an approach to searching for user-specified words and phrases in Chinese document images without requiring layout analysis. Bounding boxes of the Chinese character images are first determined using connected component analysis. Next, a suitable character from the user-specified word/phrase is chosen as the initial character and used to search for a matching candidate in the document. Once a matching candidate is found, its adjacent characters in the horizontal and vertical directions are compared with the corresponding characters of the user-specified word/phrase, subject to constraints on positional relation and size similarity. Character matching is done in two stages: coarse matching based on stroke density features, followed by a second matching phase that uses a proposed weighted Hausdorff distance (WHD). Experimental results show that the proposed method can effectively find user-specified Chinese words/phrases in horizontal or vertical text lines of document images.
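
The two-stage character matching described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, so everything here is an assumption: stroke density is approximated as the fraction of foreground pixels in each cell of a grid, the WHD is taken as the larger of two weighted averages of nearest-neighbour distances with uniform weights by default, and the function names, grid size, and thresholds are hypothetical. It is not the authors' implementation.

```python
from math import hypot, inf

def stroke_density(img, grid=2):
    """Assumed coarse feature: foreground fraction in each grid cell."""
    h, w = len(img), len(img[0])
    feats = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            cell = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feats.append(sum(cell) / len(cell))
    return feats

def foreground_points(img):
    """Coordinates (x, y) of all foreground (non-zero) pixels."""
    return [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]

def directed_whd(a_pts, b_pts, weight):
    """Weighted average distance from each point of a_pts to its nearest point in b_pts."""
    if not a_pts or not b_pts:
        return inf
    total = sum(weight(p) for p in a_pts)
    acc = sum(
        weight(p) * min(hypot(p[0] - q[0], p[1] - q[1]) for q in b_pts)
        for p in a_pts
    )
    return acc / total

def whd(a_pts, b_pts, weight=lambda p: 1.0):
    """Symmetric distance: the larger of the two directed values.
    The weighting function is a placeholder; the paper's weights are not given here."""
    return max(directed_whd(a_pts, b_pts, weight), directed_whd(b_pts, a_pts, weight))

def match(query, candidate, coarse_tol=0.25, fine_tol=0.25):
    """Stage 1 rejects on stroke density; stage 2 decides with the WHD.
    Both tolerances are illustrative values, not taken from the paper."""
    fq, fc = stroke_density(query), stroke_density(candidate)
    if max(abs(a - b) for a, b in zip(fq, fc)) > coarse_tol:
        return False
    return whd(foreground_points(query), foreground_points(candidate)) <= fine_tol

# Toy 4x4 binary "characters": a vertical bar, a horizontal bar, a filled top half.
VERTICAL = [[0, 1, 1, 0]] * 4
HORIZONTAL = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
TOP_HALF = [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
```

In this toy setup the vertical and horizontal bars have identical grid densities, so only the second stage separates them, while the filled top half is already rejected by the coarse stage. That is the motivation for a cheap filter followed by a more discriminating distance.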
