Issue No. 05 - October (1995 vol. 7)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/69.469825
<p><it>Abstract</it>: This paper presents the automatic acquisition of <it>linguistic patterns</it> that can be used for knowledge-based information extraction from texts. In the knowledge-based approach to information extraction, linguistic patterns play a central role in the recognition and classification of input texts. Although the knowledge-based approach has proven effective for information extraction in limited domains, constructing a large number of domain-specific linguistic patterns is difficult. Manual creation of patterns is time-consuming and error-prone, even for a small application domain. To solve the scalability and portability problems, <it>automatic acquisition</it> of patterns must be provided. In this paper, we present the PALKA (Parallel Automatic Linguistic Knowledge Acquisition) system, which acquires linguistic patterns from a set of domain-specific training texts and their desired outputs. A specialized representation of patterns called <it>FP-structures</it> has been defined. Patterns are constructed in the form of FP-structures from training texts, and the acquired patterns are then tuned further through the generalization of semantic constraints. An inductive learning mechanism is applied in the generalization step. The PALKA system has been used to generate patterns for our information extraction system developed for the fourth Message Understanding Conference (MUC-4). MUC-4 was an ARPA-sponsored competitive evaluation of text analysis systems. Experimental results with a set of news articles from MUC-4 are discussed.</p>
Index Terms: Knowledge-based natural language processing, information extraction, linguistic knowledge acquisition, inductive learning.
Dan I. Moldovan, Jun-Tae Kim, "Acquisition of Linguistic Patterns for Knowledge-Based Information Extraction", IEEE Transactions on Knowledge and Data Engineering, vol. 7, no. 5, pp. 713-724, October 1995, doi:10.1109/69.469825