Language Model Based on Word Order Sensitive Matrix Representation in Latent Semantic Analysis for Speech Recognition
Computer Science and Information Engineering, World Congress on (2009)
Los Angeles, California USA
Mar. 31, 2009 to Apr. 2, 2009
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/CSIE.2009.353
This paper investigates matrix representations in the latent semantic analysis (LSA) framework for language modeling. In LSA, a word-document matrix is usually used to represent a corpus. However, this matrix ignores word order within a sentence. We propose several word co-occurrence matrices that preserve word order for use in LSA. To support these matrices, we define a context dependent class (CDC) language model, which distinguishes classes according to their context in the sentences. Experiments on the Wall Street Journal (WSJ) corpus show that the proposed method achieves better performance than the original LSA with a word-document matrix.
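To make the contrast in the abstract concrete, the following is a minimal sketch of a word-document matrix next to one possible word order sensitive co-occurrence matrix. The abstract does not define the proposed matrices, so the left-neighbor counting used here is an assumption chosen for illustration. The function names (`word_document_matrix`, `ordered_cooccurrence_matrix`, `lsa_project`) are hypothetical, and the CDC language model is not implemented.

```python
import numpy as np


def word_document_matrix(docs, vocab):
    """Rows are words, columns are documents; entries are raw counts."""
    index = {w: i for i, w in enumerate(vocab)}
    m = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc:
            m[index[w], j] += 1
    return m


def ordered_cooccurrence_matrix(docs, vocab):
    """Assumed variant: entry [cur, prev] counts how often `prev`
    appears immediately before `cur`, so the matrix is order sensitive."""
    index = {w: i for i, w in enumerate(vocab)}
    m = np.zeros((len(vocab), len(vocab)))
    for doc in docs:
        for prev, cur in zip(doc, doc[1:]):
            m[index[cur], index[prev]] += 1
    return m


def lsa_project(matrix, rank):
    """Truncated SVD: map each row (word) to a `rank`-dimensional vector."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :rank] * s[:rank]


# Two toy "documents" with the same words in a different order.
docs = [["dog", "bites", "man"], ["man", "bites", "dog"]]
vocab = sorted({w for d in docs for w in d})

wd = word_document_matrix(docs, vocab)
co_first = ordered_cooccurrence_matrix([docs[0]], vocab)
co_second = ordered_cooccurrence_matrix([docs[1]], vocab)

# The word-document columns are identical, so word order is lost.
same_columns = np.array_equal(wd[:, 0], wd[:, 1])
# The order-sensitive matrices differ between the two documents.
order_visible = not np.array_equal(co_first, co_second)

# Low-rank word vectors from the co-occurrence matrix of the whole corpus.
word_vectors = lsa_project(ordered_cooccurrence_matrix(docs, vocab), rank=2)
```

In this toy corpus the two documents are indistinguishable in the word-document matrix, while the neighbor-based matrix separates them. That is the limitation the abstract attributes to the standard LSA representation.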
Language model, Latent semantic analysis, Word co-occurrence matrix
Masatoshi Tsuchiya, Seiichi Nakagawa, Welly Naptali, "Language Model Based on Word Order Sensitive Matrix Representation in Latent Semantic Analysis for Speech Recognition", Computer Science and Information Engineering, World Congress on, vol. 07, pp. 252-256, 2009, doi:10.1109/CSIE.2009.353