Steve Lawrence, C. Lee Giles, and Sandiway Fong, "Natural Language Grammatical Inference with Recurrent Neural Networks," IEEE Transactions on Knowledge and Data Engineering, vol. 12, no. 1, pp. 126-140, Jan./Feb. 2000.
Keywords: Recurrent neural networks, natural language processing, grammatical inference, government-and-binding theory, gradient descent, simulated annealing, principles-and-parameters framework, automata extraction.
Abstract—This paper examines the inductive inference of a complex grammar with neural networks. Specifically, the task considered is that of training a network to classify natural language sentences as grammatical or ungrammatical, thereby exhibiting the same kind of discriminatory power provided by the Principles and Parameters linguistic framework, or Government-and-Binding theory. Neural networks are trained, without the division into learned vs. innate components assumed by Chomsky, in an attempt to produce the same judgments as native speakers on sharply grammatical/ungrammatical data. How a recurrent neural network could possess linguistic capability, and the properties of various common recurrent neural network architectures, are discussed. The problem exhibits training behavior that is often not present with smaller grammars, and training was initially difficult. However, after implementing several techniques aimed at improving the convergence of the gradient descent backpropagation-through-time training algorithm, significant learning was possible. It was found that certain architectures are better able to learn an appropriate grammar. The operation of the networks and their training is analyzed. Finally, the extraction of rules in the form of deterministic finite state automata is investigated.
[1] R.B. Allen, “Sequential Connectionist Networks for Answering Simple Questions about a Microworld,” Fifth Ann. Proc. Cognitive Science Soc., pp. 489-495, 1983.
[2] E. Barnard and E.C. Botha, “Back-Propagation Uses Prior Information Efficiently,” IEEE Trans. Neural Networks, vol. 4, no. 5, pp. 794-802, Sept. 1993.
[3] E. Barnard and D. Casasent, “A Comparison between Criterion Functions for Linear Classifiers, with an Application to Neural Nets,” IEEE Trans. Systems, Man, and Cybernetics, vol. 19, no. 5, pp. 1,030-1,041, 1989.
[4] E.B. Baum and F. Wilczek, “Supervised Learning of Probability Distributions by Neural Networks,” Neural Information Processing Systems, D.Z. Anderson, ed., pp. 52-61, New York: Am. Inst. of Physics, 1988.
[5] M.P. Casey, “The Dynamics of Discrete-Time Computation, with Application to Recurrent Neural Networks and Finite State Machine Extraction,” Neural Computation, vol. 8, no. 6, pp. 1,135-1,178, 1996.
[6] N.A. Chomsky, “Three Models for the Description of Language,” IRE Trans. Information Theory, vol. 2, pp. 113-124, 1956.
[7] N.A. Chomsky, Lectures on Government and Binding. Foris Publications, 1981.
[8] N.A. Chomsky, Knowledge of Language: Its Nature, Origin, and Use. Praeger, 1986.
[9] A. Cleeremans, D. Servan-Schreiber, and J.L. McClelland, “Finite State Automata and Simple Recurrent Networks,” Neural Computation, vol. 1, no. 3, pp. 372-381, 1989.
[10] C. Darken and J.E. Moody, “Note on Learning Rate Schedules for Stochastic Optimization,” Advances in Neural Information Processing Systems, R.P. Lippmann, J.E. Moody, and D.S. Touretzky, eds., vol. 3, pp. 832-838, San Mateo, Calif.: Morgan Kaufmann, 1991.
[11] C. Darken and J.E. Moody, “Towards Faster Stochastic Gradient Search,” Neural Information Processing Systems 4, pp. 1,009-1,016, San Mateo, Calif.: Morgan Kaufmann, 1992.
[12] J.L. Elman, “Structured Representations and Connectionist Models,” Sixth Ann. Proc. Cognitive Science Soc., pp. 17-25, 1984.
[13] J.L. Elman, “Distributed Representations, Simple Recurrent Networks, and Grammatical Structure,” Machine Learning, vol. 7, pp. 195–225, 1991.
[14] P. Frasconi and M. Gori, “Computational Capabilities of Local-Feedback Recurrent Networks Acting as Finite-State Machines,” IEEE Trans. Neural Networks, vol. 7, no. 6, pp. 1,521-1,524, 1996.
[15] P. Frasconi, M. Gori, M. Maggini, and G. Soda, “Unified Integration of Explicit Knowledge and Learning by Example in Recurrent Networks,” IEEE Trans. Knowledge and Data Eng., vol. 7, no. 2, pp. 340–346, Apr. 1995.
[16] P. Frasconi, M. Gori, and G. Soda, “Local Feedback Multilayered Networks,” Neural Computation, vol. 4, no. 1, pp. 120-130, 1992.
[17] K.S. Fu, Syntactic Pattern Recognition and Applications. Englewood Cliffs, N.J.: Prentice Hall, 1982.
[18] M. Gasser and C. Lee, “Networks That Learn Phonology,” technical report, Computer Science Dept., Indiana Univ., 1990.
[19] C.L. Giles, C.B. Miller, D. Chen, H.H. Chen, G.Z. Sun, and Y.C. Lee, “Learning and Extracting Finite State Automata with Second-Order Recurrent Neural Networks,” Neural Computation, vol. 4, no. 3, pp. 393-405, 1992.
[20] C.L. Giles, C.B. Miller, D. Chen, G.Z. Sun, H.H. Chen, and Y.C. Lee, “Extracting and Learning an Unknown Grammar with Recurrent Neural Networks,” Advances in Neural Information Processing Systems 4, J.E. Moody, S.J. Hanson, and R.P. Lippmann, eds., pp. 317-324, San Mateo, Calif.: Morgan Kaufmann, 1992.
[21] C.L. Giles, G.Z. Sun, H.H. Chen, Y.C. Lee, and D. Chen, “Higher Order Recurrent Networks and Grammatical Inference,” Advances in Neural Information Processing Systems 2, D.S. Touretzky, ed., pp. 380-387, San Mateo, Calif.: Morgan Kaufmann, 1990.
[22] M. Hare, “The Role of Similarity in Hungarian Vowel Harmony: A Connectionist Account,” Technical Report CRL-9004, Center for Research in Language, Univ. of California, San Diego, 1990.
[23] M. Hare, D. Corina, and G.W. Cottrell, “Connectionist Perspective on Prosodic Structure,” Technical Report CRL Newsletter, vol. 3, no. 2, Center for Research in Language, Univ. of California, San Diego, 1989.
[24] C.L. Harris and J.L. Elman, “Representing Variable Information with Simple Recurrent Networks,” Sixth Ann. Proc. Cognitive Science Soc., pp. 635-642, 1984.
[25] M.H. Harrison, Introduction to Formal Language Theory. Reading, Mass.: Addison-Wesley, 1978.
[26] S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan College Press, New York, 1994.
[27] J. Hertz, A. Krogh, and R.G. Palmer, Introduction to the Theory of Neural Computation. Addison-Wesley, 1991.
[28] J. Hopcroft and J. Ullman, Introduction to Automata Theory, Languages and Computation, pp. 22-24. Addison-Wesley, 1979.
[29] J. Hopfield, “Learning Algorithms and Probability Distributions in Feed-Forward and Feed-Back Networks,” Proc. Nat'l Academy of Science, vol. 84, pp. 8,429-8,433, 1987.
[30] B.G. Horne and C.L. Giles, “An Experimental Comparison of Recurrent Neural Networks,” Advances in Neural Information Processing Systems 7, G. Tesauro, D. Touretzky, and T. Leen, eds., pp. 697-704, MIT Press, 1995.
[31] L. Ingber, “Very Fast Simulated Re-Annealing,” Math. Computer Modelling, vol. 12, pp. 967-973, 1989.
[32] L. Ingber, “Adaptive Simulated Annealing (ASA),” technical report, Lester Ingber Research, McLean, Va., 1993.
[33] M.I. Jordan, “Attractor Dynamics and Parallelism in a Connectionist Sequential Machine,” Proc. Ninth Ann. Conf. Cognitive Science Soc., pp. 531-546, 1986.
[34] M.I. Jordan, “Serial Order: A Parallel Distributed Processing Approach,” Technical Report ICS Report 8604, Inst. for Cognitive Science, Univ. of California, San Diego, May 1986.
[35] S. Kirkpatrick and G.B. Sorkin, “Simulated Annealing,” The Handbook of Brain Theory and Neural Networks, M.A. Arbib, ed., pp. 876-878, Cambridge, Mass.: MIT Press, 1995.
[36] S. Kullback, Information Theory and Statistics. New York: Wiley, 1959.
[37] H. Lasnik and J. Uriagereka, A Course in GB Syntax: Lectures on Binding and Empty Categories. Cambridge, Mass.: MIT Press, 1988.
[38] S. Lawrence, S. Fong, and C.L. Giles, “Natural Language Grammatical Inference: A Comparison of Recurrent Neural Networks and Machine Learning Methods,” Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing, S. Wermter, E. Riloff, and G. Scheler, eds., 1996.
[39] Y. Le Cun, “Efficient Learning and Second Order Methods,” Tutorial presented at Neural Information Processing Systems 5, 1993.
[40] L.R. Leerink and M. Jabri, “Learning the Past Tense of English Verbs Using Recurrent Neural Networks,” Proc. Australian Conf. Neural Networks, P. Bartlett, A. Burkitt, and R. Williamson, eds., pp. 222-226, 1996.
[41] B. MacWhinney, J. Leinbach, R. Taraban, and J. McDonald, “Language Learning: Cues or Rules?” J. Memory and Language, vol. 28, pp. 255-277, 1989.
[42] R. Miikkulainen and M. Dyer, “Encoding Input/Output Representations in Connectionist Cognitive Systems,” Proc. 1988 Connectionist Models Summer School, D.S. Touretzky, G.E. Hinton, and T.J. Sejnowski, eds., pp. 188-195, 1989.
[43] M.C. Mozer, “A Focused Backpropagation Algorithm for Temporal Pattern Recognition,” Complex Systems, vol. 3, no. 4, pp. 349-381, Aug. 1989.
[44] K.S. Narendra and K. Parthasarathy, “Identification and Control of Dynamical Systems Using Neural Networks,” IEEE Trans. Neural Networks, vol. 1, no. 1, pp. 4-27, Mar. 1990.
[45] C.W. Omlin and C.L. Giles, “Constructing Deterministic FiniteState Automata in Recurrent Neural Networks,” J. ACM, vol. 43, no. 6, pp. 937–972, 1996.
[46] C.W. Omlin and C.L. Giles, “Extraction of Rules from Discrete-Time Recurrent Neural Networks,” Neural Networks, vol. 9, no. 1, pp. 41-52, 1996.
[47] C.W. Omlin and C.L. Giles, “Rule Revision with Recurrent Neural Networks,” IEEE Trans. Knowledge and Data Eng., vol. 8, no. 1, pp. 183-188, 1996.
[48] F. Pereira and Y. Schabes, “Inside-Outside Re-Estimation from Partially Bracketed Corpora,” Proc. 30th Ann. Meeting ACL, pp. 128-135, 1992.
[49] D.M. Pesetsky, “Paths and Categories,” PhD thesis, MIT, 1982.
[50] J. Pollack, “The Induction of Dynamical Recognizers,” Machine Learning, vol. 7, nos. 2/3, pp. 227-252, 1991.
[51] D.E. Rumelhart and J.L. McClelland, “On Learning the Past Tenses of English Verbs,” Parallel Distributed Processing; Volume 2: Psychological and Biological Models, J.L. McClelland, D.E. Rumelhart, and the PDP Research Group, eds., Cambridge, Mass.: MIT Press, pp. 216–271, 1986.
[52] J.W. Shavlik, “A Framework for Combining Symbolic and Neural Learning,” Machine Learning, vol. 14, no. 3, pp. 321-331, 1994.
[53] H.T. Siegelmann, “Computation Beyond the Turing Limit,” Science, vol. 268, pp. 545-548, 1995.
[54] H.T. Siegelmann, B.G. Horne, and C.L. Giles, “Computational Capabilities of Recurrent NARX Neural Networks,” IEEE Trans. Systems, Man, and Cybernetics, Part B, vol. 27, no. 2, p. 208, 1997.
[55] H.T. Siegelmann and E.D. Sontag, “On the Computational Power of Neural Nets,” J. Computer and System Sciences, vol. 50, no. 1, pp. 132-150, 1995.
[56] P. Simard, M.B. Ottaway, and D.H. Ballard, “Analysis of Recurrent Backpropagation,” Proc. 1988 Connectionist Models Summer School, D. Touretzky, G. Hinton, and T. Sejnowski, eds., pp. 103-112, 1989.
[57] S.A. Solla, E. Levin, and M. Fleisher, “Accelerated Learning in Layered Neural Networks,” Complex Systems, vol. 2, pp. 625-639, 1988.
[58] M.F. St. John and J.L. McClelland, “Learning and Applying Contextual Constraints in Sentence Comprehension,” Artificial Intelligence, vol. 46, pp. 217–257, 1990.
[59] A. Stolcke, “Learning Feature-Based Semantics with Simple Recurrent Networks,” Technical Report TR-90-015, Int'l Computer Science Inst., Berkeley, Calif., Apr. 1990.
[60] M. Tomita, “Dynamic Construction of Finite-State Automata from Examples Using Hill-Climbing,” Proc. Fourth Ann. Cognitive Science Conf., pp. 105-108, 1982.
[61] D.S. Touretzky, “Rules and Maps in Connectionist Symbol Processing,” Technical Report CMU-CS-89-158, Dept. of Computer Science, Carnegie Mellon Univ., Pittsburgh, Pa., 1989.
[62] D.S. Touretzky, “Towards a Connectionist Phonology: The 'Many Maps' Approach to Sequence Manipulation,” Proc. 11th Ann. Conf. Cognitive Science Soc., pp. 188-195, 1989.
[63] A.C. Tsoi and A.D. Back, “Locally Recurrent Globally Feedforward Networks: A Critical Review of Architectures,” IEEE Trans. Neural Networks, vol. 5, no. 2, pp. 229-239, 1994.
[64] R.L. Watrous and G.M. Kuhn, “Induction of Finite State Languages Using Second-Order Recurrent Networks,” Advances in Neural Information Processing Systems 4, J.E. Moody, S.J. Hanson, and R.P. Lippmann, eds., pp. 309-316, San Mateo, Calif.: Morgan Kaufmann, 1992.
[65] R. Watrous and G. Kuhn, “Induction of Finite-State Languages Using Second-Order Recurrent Networks,” Neural Computation, vol. 4, no. 3, p. 406, 1992.
[66] R.J. Williams and J. Peng, “An Efficient Gradient-Based Algorithm for On-Line Training of Recurrent Network Trajectories,” Neural Computation, vol. 2, no. 4, pp. 490-501, 1990.
[67] R.J. Williams and D. Zipser, “A Learning Algorithm for Continually Running Fully Recurrent Neural Networks,” Neural Computation, vol. 1, no. 2, pp. 270-280, 1989.
[68] Z. Zeng, R.M. Goodman, and P. Smyth, “Learning Finite State Machines with Self-Clustering Recurrent Networks,” Neural Computation, vol. 5, no. 6, pp. 976-990, 1993.
[69] S. Lawrence, I. Burns, A.D. Back, A.C. Tsoi, and C.L. Giles, “Neural Network Classification and Unequal Prior Classes,” Tricks of the Trade, G. Orr, K.R. Müller, and R. Caruana, eds., pp. 299-314, Springer-Verlag, 1998.