In this paper, we propose a novel method for building a language model for open-vocabulary Korean word recognition. Because of the complex morphology of Korean, lexicons based on linguistic entities such as words and morphemes are inappropriate for open-vocabulary domains. Instead, we build the lexicon by collecting variable-length character sequences from raw text using a dynamic Bayesian network model of the language.
In simulated word recognition experiments, the proposed language model found the correct words from lattices of character candidates in 94.3% of cases, increasing the word recognition rate by 20.9%.
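To make the lattice-decoding setting concrete, the following is a minimal sketch of how a character-level language model can select a word from a lattice of character candidates. It is not the proposed model: a smoothed character bigram model stands in for the dynamic Bayesian network and its variable-length character sequences, and the toy corpus, the recognizer scores, and the names `train_bigram`, `log_prob`, and `decode` are all assumptions made for illustration.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count character bigrams in raw words; '^' marks the word start."""
    context_counts, bigram_counts = Counter(), Counter()
    for word in corpus:
        prev = "^"
        for ch in word:
            context_counts[prev] += 1
            bigram_counts[(prev, ch)] += 1
            prev = ch
    vocab = {ch for word in corpus for ch in word}
    return context_counts, bigram_counts, vocab

def log_prob(prev, ch, model):
    """Add-one smoothed log P(ch | prev); one extra slot covers unseen characters."""
    context_counts, bigram_counts, vocab = model
    return math.log((bigram_counts[(prev, ch)] + 1)
                    / (context_counts[prev] + len(vocab) + 1))

def decode(lattice, model):
    """Viterbi search; lattice is a list of [(char, recognizer_log_score), ...]."""
    best = {"^": (0.0, "")}  # last character -> (best log score, best path)
    for candidates in lattice:
        new_best = {}
        for ch, rec_score in candidates:
            new_best[ch] = max(
                (score + rec_score + log_prob(prev, ch, model), path + ch)
                for prev, (score, path) in best.items()
            )
        best = new_best
    return max(best.values())[1]

# Toy data (hypothetical): the recognizer slightly prefers the wrong second character.
model = train_bigram(["한국어", "한국", "국어", "한글"])
lattice = [
    [("한", math.log(0.6)), ("힌", math.log(0.4))],
    [("극", math.log(0.55)), ("국", math.log(0.45))],
    [("어", math.log(0.7)), ("이", math.log(0.3))],
]
top1 = "".join(max(cands, key=lambda c: c[1])[0] for cands in lattice)
decoded = decode(lattice, model)
```

In this toy case, taking the top recognizer candidate at each position gives "한극어", whereas decoding with the language model gives "한국어", because the bigram statistics outweigh the small recognizer preference at the second position.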