Issue No. 06 - June (1977 vol. 26)
F.J. Maryanski, Department of Computer Science, Kansas State University
The inference of finite-state probabilistic grammars is studied from two points of view. First, the theoretical aspects of grammatical inference are considered. Among the topics investigated are the structural and statistical properties of probabilistic grammars, methods for assigning probability measures to the rewrite rules of probabilistic grammars, and statistical measures for determining how well an inferred probabilistic grammar approximates a sample set. The second concern of the study is the development and implementation of an algorithm for the inference of finite-state probabilistic grammars. This finite-state inference procedure produces a deterministic finite-state probabilistic grammar whose language approximates the sample set within a user-supplied acceptance region under the chi-square test. The procedure is enumerative, and heuristic tree-searching techniques are used to improve its efficiency. The convergence of the procedure to an acceptable grammar is demonstrated, and the steps of the procedure are theoretically justified. Test results of a PL/I implementation are presented. The inference procedure provides a means of synthesizing a probabilistic model of both physical and abstract systems from samples of their behavior.
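The abstract does not spell out the algorithm, but two of the ingredients it names, assigning probabilities to the rules of a deterministic finite-state grammar from a sample and measuring fit with a chi-square statistic, can be sketched as follows. This is a minimal Python sketch under our own assumptions, not the paper's procedure: the grammar is encoded automaton-style as a hypothetical `delta` dictionary mapping (state, symbol) to the next state, a hypothetical `STOP` event marks where a derivation ends, rule probabilities are relative-frequency (maximum-likelihood) estimates, and strings absent from the sample are pooled into a single "other" cell. All function names are illustrative.

```python
from collections import Counter

# Hypothetical encoding (not the paper's notation): delta maps
# (state, symbol) -> next state, and STOP is the event "derivation ends here".
STOP = "<stop>"

def derivation(delta, start, string):
    """Unique list of (state, event) uses for `string`, or None if not generated.

    Because the grammar is deterministic, each generated string has exactly
    one derivation, so rule uses can be counted unambiguously.
    """
    state, used = start, []
    for symbol in string:
        if (state, symbol) not in delta:
            return None
        used.append((state, symbol))
        state = delta[(state, symbol)]
    used.append((state, STOP))
    return used

def estimate_probabilities(delta, start, sample):
    """Relative-frequency estimate of each event's probability, per state."""
    counts, totals = Counter(), Counter()
    for string in sample:
        path = derivation(delta, start, string)
        if path is None:
            raise ValueError(f"sample string {string!r} is not generated")
        for state, event in path:
            counts[(state, event)] += 1
            totals[state] += 1
    return {key: c / totals[key[0]] for key, c in counts.items()}

def string_probability(delta, start, probs, string):
    """Product of the event probabilities along the derivation (0.0 if none)."""
    path = derivation(delta, start, string)
    if path is None:
        return 0.0
    p = 1.0
    for key in path:
        p *= probs.get(key, 0.0)
    return p

def chi_square(delta, start, probs, sample):
    """Pearson chi-square statistic: sample string counts vs. grammar probabilities.

    Strings never seen in the sample form one pooled cell whose expected
    probability is the leftover mass, so expected counts sum to the sample size.
    """
    observed = Counter(sample)
    n = len(sample)
    stat, mass = 0.0, 0.0
    for string, obs in observed.items():
        p = string_probability(delta, start, probs, string)
        if p == 0.0:
            return float("inf")  # an observed string the grammar cannot produce
        mass += p
        expected = n * p
        stat += (obs - expected) ** 2 / expected
    other = n * max(0.0, 1.0 - mass)
    stat += other  # pooled cell has observed count 0: (0 - E)^2 / E = E
    return stat
```

For example, with `delta = {("S", "a"): "S", ("S", "b"): "F"}` and the sample `["b", "ab", "ab", "aab", "b", "b"]`, the estimates are 0.4 for `a` and 0.6 for `b` in state `S`, and the statistic is about 1.01. Deciding acceptance would then mean comparing the statistic with a chi-square critical value at a user-chosen level (for instance via `scipy.stats.chi2.ppf`); choosing the degrees of freedom is left out here, and Pearson's test is unreliable when expected counts are as small as in this toy sample.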
Deterministic grammars, finite-state grammars, grammatical inference, Markov process, probabilistic grammars, statistical estimation.
T. Booth and F. Maryanski, "Inference of Finite-State Probabilistic Grammars," in IEEE Transactions on Computers, vol. 26, no. 6, pp. 521–536, June 1977.