A Cache-Based Natural Language Model for Speech Recognition
June 1990 (vol. 12 no. 6)
pp. 570-583

Speech-recognition systems must often decide between competing ways of breaking up the acoustic input into strings of words. Since the possible strings may be acoustically similar, a language model is required; given a word string, the model returns its linguistic probability. Several Markov language models are discussed. A novel kind of language model which reflects short-term patterns of word use by means of a cache component (analogous to cache memory in hardware terminology) is presented. The model also contains a 3-gram component of the traditional type. The combined model and a pure 3-gram model were tested on samples drawn from the Lancaster-Oslo/Bergen (LOB) corpus of English text. The relative performance of the two models is examined, and suggestions for future improvements are made.
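The core idea of the abstract can be sketched in a few lines: the probability of a word is a weighted interpolation of a static corpus estimate and a "cache" estimate based on its frequency in the recently observed text, so that words in current use are temporarily boosted. The sketch below is illustrative only; the interpolation weight, cache size, unigram static component, and smoothing are assumptions for demonstration, not the paper's actual parameters (the paper combines the cache with a 3-gram model).

```python
from collections import Counter, deque

class CacheLM:
    """Toy cache-based language model: interpolates a static corpus
    estimate with the relative frequency of the word in a sliding
    window of recently observed words.  All constants are illustrative."""

    def __init__(self, static_counts, vocab_size, cache_size=200, lam=0.3):
        self.static = static_counts            # Counter over a training corpus
        self.total = sum(static_counts.values())
        self.vocab = vocab_size
        self.cache = deque(maxlen=cache_size)  # most recent words seen
        self.lam = lam                         # weight on the cache component

    def prob(self, word):
        # static component, add-one smoothed so unseen words get mass
        p_static = (self.static[word] + 1) / (self.total + self.vocab)
        # cache component: relative frequency in the recent window
        p_cache = self.cache.count(word) / len(self.cache) if self.cache else 0.0
        return self.lam * p_cache + (1 - self.lam) * p_static

    def observe(self, word):
        # update the short-term cache as text is processed
        self.cache.append(word)

# A word absent from training data gains probability once it appears
# in the recent context, which is the effect the cache is meant to model.
lm = CacheLM(Counter("the cat sat on the mat".split()), vocab_size=100)
p_before = lm.prob("zebra")
for _ in range(10):
    lm.observe("zebra")
p_after = lm.prob("zebra")
```

After the word has been observed several times, `p_after` exceeds `p_before`, illustrating the short-term adaptation the cache component provides over a purely static model.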

[1] A. M. Derouault and B. Mérialdo, "Natural language modeling for phoneme-to-text transcription," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-8, no. 6, pp. 742-749, 1986.
[2] A. M. Derouault and B. Mérialdo, "Language modeling at the syntactic level," in Proc. 7th Int. Conf. Pattern Recognition, vol. II, Montreal, Aug. 1984, pp. 1373-1375.
[3] F. Jelinek, "The development of an experimental discrete dictation recognizer," Proc. IEEE, vol. 73, no. 11, pp. 1616-1624, Nov. 1985.
[4] F. Jelinek, R. L. Mercer, and L. R. Bahl, "A maximum likelihood approach to continuous speech recognition," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-5, pp. 179-190, Mar. 1983.
[5] F. Jelinek and R. L. Mercer, "Interpolated estimation of Markov source parameters from sparse data," in Pattern Recognition in Practice, E. S. Gelsema and L. N. Kanal, Eds., 1981, pp. 381-397.
[6] S. Johansson, E. Atwell, R. Garside, and G. Leech, The Tagged LOB Corpus User's Manual, Norwegian Computing Centre for the Humanities, Bergen, Norway, 1986.
[7] S. Johansson, "Some observations on word frequencies in three corpora of present-day English texts," ITL Rev. Appl. Linguistics, vol. 67-68, pp. 117-126, 1985.
[8] S. Johansson, "Word frequency and text type: Some observations based on the LOB corpus of British English texts," Comput. Humanities, vol. 19, pp. 23-36, 1985.
[9] S. Katz, "Recursive M-gram modeling via a smoothing of Turing's formula," forthcoming paper.
[10] E. M. Muckstein, "A natural language parser with statistical applications," IBM Res. Rep. RC7516 (38450), Mar. 1981.
[11] A. Nadas, "Estimation of probabilities in the language model of the IBM speech recognition system," IEEE Trans. Acoust., Speech, Signal Processing, vol. 32, pp. 859-861, Aug. 1984.
[12] J. L. Peterson and A. Silberschatz, Operating System Concepts. Reading, MA: Addison-Wesley, 1985.
[13] L. R. Rabiner and B. H. Juang, "An introduction to hidden Markov models," IEEE ASSP Mag., pp. 4-16, Jun. 1986.

Index Terms:
cache-based natural language model; speech recognition; word string; linguistic probability; Markov language models; Lancaster-Oslo/Bergen; English text; Markov processes; natural languages; probability
R. Kuhn, R. De Mori, "A Cache-Based Natural Language Model for Speech Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 6, pp. 570-583, June 1990, doi:10.1109/34.56193