Second International Conference on Document Image Analysis for Libraries (DIAL'06)
Multilingual document recognition research and its application in China
Lyon, France
April 27-April 28
ISBN: 0-7695-2531-8
This paper presents research on multilingual document recognition technology and its application in China, which is useful for building multilingual digital libraries. The key technologies and general system framework of multilingual OCR (optical character recognition) are summarized, drawing on previous research for Chinese, Japanese, Korean, and English and on recent advances for Tibetan, Uighur, Kazakh, Kirghiz, Arabic, and Mongolian. The key technologies include statistical character recognition, structural analysis for discriminating similar characters, character segmentation, script identification, and post-processing. Applying multilingual document recognition systems to digital libraries and website content construction will help people who use various languages to retrieve knowledge.
Citation:
Liangrui Peng, Changsong Liu, Xiaoqing Ding, Hua Wang, "Multilingual document recognition research and its application in China," Second International Conference on Document Image Analysis for Libraries (DIAL'06), pp. 126-132, 2006