Issue No. 1 - January 2009 (vol. 21)

pp. 66-77

Deyu Zhou , The University of Reading, Reading

Yulan He , The University of Reading, Reading

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2008.95

ABSTRACT

In this paper, we discuss how discriminative training can be applied to the Hidden Vector State (HVS) model in different task domains. The HVS model is a discrete Hidden Markov Model (HMM) in which each HMM state represents the state of a push-down automaton with a finite stack size. In previous applications, Maximum Likelihood Estimation (MLE) was used to derive the parameters of the HVS model. However, MLE makes a number of assumptions, some of which do not hold in practice. Discriminative training, which avoids these assumptions, can improve the performance of the HVS model. Experiments have been conducted in two domains: the travel domain, for the semantic parsing task using the DARPA Communicator data and the ATIS data, and the bioinformatics domain, for the information extraction task using the GENIA corpus. The results demonstrate modest improvements in the performance of the HVS model with discriminative training. In the travel domain, discriminative training of the HVS model gives a relative error reduction rate in F-measure of 31% on the DARPA Communicator data and 9% on the ATIS data when compared with MLE. In the bioinformatics domain, a relative error reduction rate of 4% in F-measure is achieved on the GENIA corpus.
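The bounded-stack state described in the abstract can be illustrated with a short sketch (illustrative only, not from the paper; the concept labels `SS`, `FLIGHT`, and `TOLOC` are hypothetical names in the style of travel-domain semantic tags). Each vector state is a stack of semantic concepts, and a transition pops some number of concepts and then pushes exactly one new one, subject to the finite stack bound:

```python
from typing import List

MAX_STACK_DEPTH = 4  # hypothetical finite stack size for the sketch


def hvs_transition(stack: List[str], n_pop: int, concept: str) -> List[str]:
    """One HVS-style vector-state update: pop n_pop semantic concepts
    off the stack, then push the new concept, keeping the stack within
    the finite depth bound."""
    if n_pop > len(stack):
        raise ValueError("cannot pop more concepts than the stack holds")
    new_stack = stack[: len(stack) - n_pop] + [concept]
    if len(new_stack) > MAX_STACK_DEPTH:
        raise ValueError("stack would exceed the finite depth bound")
    return new_stack


# Example walk through a travel-domain utterance (hypothetical tags):
state = ["SS"]                               # sentence-start concept
state = hvs_transition(state, 0, "FLIGHT")   # ["SS", "FLIGHT"]
state = hvs_transition(state, 0, "TOLOC")    # ["SS", "FLIGHT", "TOLOC"]
state = hvs_transition(state, 2, "RETURN")   # ["SS", "RETURN"]
```

Constraining every transition to a bounded pop-and-push of this kind is what lets the model approximate a push-down automaton while remaining a tractable discrete HMM.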

INDEX TERMS

Language parsing and understanding, Machine learning, Parameter learning

CITATION

Deyu Zhou, Yulan He, "Discriminative Training of the Hidden Vector State Model for Semantic Parsing," *IEEE Transactions on Knowledge & Data Engineering*, vol. 21, no. 1, pp. 66-77, January 2009, doi:10.1109/TKDE.2008.95.
