Acquisition of Linguistic Patterns for Knowledge-Based Information Extraction
October 1995 (vol. 7 no. 5)
pp. 713-724

Abstract: This paper presents a method for the automatic acquisition of linguistic patterns that can be used for knowledge-based information extraction from texts. In the knowledge-based approach to information extraction, linguistic patterns play a central role in the recognition and classification of input texts. Although the knowledge-based approach has proven effective for information extraction in limited domains, constructing a large number of domain-specific linguistic patterns is difficult. Manual creation of patterns is time-consuming and error-prone, even for a small application domain. To solve the scalability and portability problems, patterns must be acquired automatically. In this paper, we present the PALKA (Parallel Automatic Linguistic Knowledge Acquisition) system, which acquires linguistic patterns from a set of domain-specific training texts and their desired outputs. A specialized representation of patterns, called FP-structures, has been defined. Patterns are constructed in the form of FP-structures from training texts, and the acquired patterns are tuned further through the generalization of semantic constraints. An inductive learning mechanism is applied in the generalization step. The PALKA system has been used to generate patterns for our information extraction system developed for the fourth Message Understanding Conference (MUC-4), an ARPA-sponsored competitive evaluation of text analysis systems. Experimental results with a set of news articles from MUC-4 are discussed.
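
The abstract does not give the FP-structure format or the exact generalization algorithm, so the following is only a toy sketch of one common form of inductive generalization over semantic constraints: replacing the constraint on a pattern slot with the most specific concept in an is-a hierarchy that covers all positive training examples. The hierarchy, the concept names, and the functions `ancestors`, `generalize`, and `covers` are all hypothetical and are not taken from PALKA.

```python
# Toy is-a hierarchy (hypothetical, not from the paper): child -> parent.
PARENT = {
    "terrorist": "human",
    "soldier": "human",
    "human": "animate",
    "dog": "animate",
    "animate": "thing",
    "building": "thing",
}


def ancestors(concept):
    """Return the chain from `concept` up to the root, most specific first."""
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def generalize(positive_examples):
    """Return the most specific concept that subsumes every positive example.

    Returns None if the examples share no ancestor in the hierarchy.
    """
    common = ancestors(positive_examples[0])
    for example in positive_examples[1:]:
        seen = set(ancestors(example))
        # Keep only concepts that also subsume this example, preserving order.
        common = [c for c in common if c in seen]
    return common[0] if common else None


def covers(constraint, concept):
    """True if `concept` is the constraint itself or one of its descendants."""
    return constraint in ancestors(concept)


# Slot fillers observed in training, generalized to a single constraint.
constraint = generalize(["terrorist", "soldier"])
print(constraint)                      # human
print(covers(constraint, "building"))  # False
```

In this sketch, generalization only ever moves upward in the hierarchy. A fuller inductive learner would also use negative examples to specialize a constraint that has become too broad.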

Index Terms:
Knowledge-based natural language processing, information extraction, linguistic knowledge acquisition, inductive learning.
Jun-Tae Kim, Dan I. Moldovan, "Acquisition of Linguistic Patterns for Knowledge-Based Information Extraction," IEEE Transactions on Knowledge and Data Engineering, vol. 7, no. 5, pp. 713-724, Oct. 1995, doi:10.1109/69.469825