Eighth International Conference on Document Analysis and Recognition (ICDAR'05)
Rule-based Middle-level Character Detection for Simplifying Thai Document Layout Analysis
Seoul, Korea, August 31-September 01
ISBN: 0-7695-2420-6
Although machine-printed Thai character recognition has been an intense research area in the past decade, only a few results are available for Thai document layout analysis. In addition, methods proposed for other languages cannot be applied directly to Thai documents, since Thai documents have a unique characteristic: Thai characters can be placed at four different levels. This paper proposes an approach that eliminates this characteristic by removing nonmiddle-level characters from the image, based on heuristic rules derived from properties of the Thai language: nonmiddle-level characters are usually smaller than middle-level characters, and the gap between levels is smaller than the gap between two consecutive lines. After their removal, any existing method can be used with Thai documents without modification. The experimental results show that the proposed method effectively removes nonmiddle-level characters from 200 test images with 99.46% accuracy, even when the image contains various font sizes.
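The two heuristics named in the abstract can be illustrated with a minimal sketch. This is not the authors' rule set, which the abstract does not spell out; it only assumes that connected-component bounding boxes are already available, and the names `Box`, `split_levels`, `size_ratio`, and `gap_ratio`, as well as the threshold values, are hypothetical choices made for illustration.

```python
from statistics import median
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    """Bounding box of one connected component (hypothetical helper type)."""
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


def overlaps_x(a: Box, b: Box) -> bool:
    """True if the two boxes share some horizontal extent."""
    return a.x < b.x + b.w and b.x < a.x + a.w


def vertical_gap(a: Box, b: Box) -> int:
    """Vertical distance between two boxes (0 if they overlap vertically)."""
    return max(0, max(a.y, b.y) - min(a.y + a.h, b.y + b.h))


def split_levels(
    boxes: List[Box], size_ratio: float = 0.6, gap_ratio: float = 0.5
) -> Tuple[List[Box], List[Box]]:
    """Split components into (middle, nonmiddle) using two assumed rules.

    Size rule: a component much shorter than the median height is a
    candidate nonmiddle-level character.
    Gap rule: the candidate is accepted only if it sits close to a larger
    component in the same column, i.e. closer than a typical line gap.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    if not boxes:
        return [], []
    ref_h = median(b.h for b in boxes)
    large = [b for b in boxes if b.h >= size_ratio * ref_h]
    middle, nonmiddle = list(large), []
    for b in boxes:
        if b.h >= size_ratio * ref_h:
            continue  # already counted as middle-level
        near_large = any(
            overlaps_x(b, m) and vertical_gap(b, m) <= gap_ratio * ref_h
            for m in large
        )
        (nonmiddle if near_large else middle).append(b)
    return middle, nonmiddle


# Toy line: three middle-level characters, one mark above, one mark below.
boxes = [
    Box(0, 20, 15, 20),
    Box(20, 20, 15, 20),
    Box(40, 20, 15, 20),
    Box(20, 10, 12, 6),   # small component just above the second character
    Box(40, 43, 10, 5),   # small component just below the third character
]
middle, nonmiddle = split_levels(boxes)
```

A single global median height would not cope with the mixed font sizes the abstract mentions; a fuller version would presumably estimate the reference height locally, which this sketch leaves out.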
Citation:
Chaiyakorn Yingsaeree, Asanee Kawtrakul, "Rule-based Middle-level Character Detection for Simplifying Thai Document Layout Analysis," icdar, pp. 888-892, Eighth International Conference on Document Analysis and Recognition (ICDAR'05), 2005