Inference of k-Testable Languages in the Strict Sense and Application to Syntactic Pattern Recognition
September 1990 (vol. 12 no. 9)
pp. 920-925

The inductive inference of the class of k-testable languages in the strict sense (k-TSSL) is considered. A k-TSSL is essentially defined by a finite set of substrings of length k that are permitted to appear in the strings of the language. Given a positive sample R of strings of an unknown language, a deterministic finite-state automaton that recognizes the smallest k-TSSL containing R is obtained. The inferred automaton is shown to have a number of transitions bounded by O(m), where m is the number of substrings defining this k-TSSL, and the inference algorithm runs in O(kn log m) time, where n is the sum of the lengths of all the strings in R. The proposed methods are illustrated through syntactic pattern recognition experiments in which strings generated by ten given (source) non-k-TSSL grammars are used to infer ten stochastic k-TSSL automata, which are then used to classify new strings generated by the same source grammars. The results of these experiments are consistent with the theory and show the ability of (stochastic) k-TSSLs to approximate other classes of regular languages.
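
To make the idea concrete, the following is a minimal sketch of k-TSSL inference from a positive sample, under stated assumptions rather than a reproduction of the paper's algorithm. It assumes one common formulation in which a language is described by its allowed prefixes and suffixes of length k-1 and its allowed substrings of length k, and in which strings shorter than k are accepted only if they occur in the sample; the paper's exact treatment of short strings may differ. The names `KTSS`, `infer_ktss`, and `accepts` are hypothetical, and membership is tested directly on sets instead of building the deterministic automaton, so the O(kn log m) bound is not claimed for this code.

```python
from typing import Iterable, NamedTuple


class KTSS(NamedTuple):
    """Hypothetical container for an inferred k-testable description."""
    k: int
    prefixes: frozenset  # allowed prefixes of length k-1
    suffixes: frozenset  # allowed suffixes of length k-1
    segments: frozenset  # allowed substrings of length k
    short: frozenset     # sample strings shorter than k (assumption, see lead-in)


def infer_ktss(sample: Iterable[str], k: int) -> KTSS:
    """Collect the prefixes, suffixes, and length-k substrings seen in the sample."""
    if k < 2:
        raise ValueError("this sketch assumes k >= 2")
    prefixes, suffixes, segments, short = set(), set(), set(), set()
    for s in sample:
        if len(s) < k:
            short.add(s)
            continue
        prefixes.add(s[:k - 1])
        suffixes.add(s[-(k - 1):])
        for i in range(len(s) - k + 1):
            segments.add(s[i:i + k])
    return KTSS(k, frozenset(prefixes), frozenset(suffixes),
                frozenset(segments), frozenset(short))


def accepts(model: KTSS, s: str) -> bool:
    """Accept s if every local piece of it was observed in the sample."""
    k = model.k
    if len(s) < k:
        return s in model.short
    return (
        s[:k - 1] in model.prefixes
        and s[-(k - 1):] in model.suffixes
        and all(s[i:i + k] in model.segments for i in range(len(s) - k + 1))
    )


model = infer_ktss(["ab", "abab"], k=2)
print(accepts(model, "ababab"))  # generalizes beyond the sample
print(accepts(model, "abba"))    # contains the unseen substring "bb"
```

Every string in the sample is accepted by construction, and any string assembled only from observed pieces is accepted as well, which is where the generalization comes from. Larger values of k admit fewer unseen strings.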

Index Terms:
k-testable languages; syntactic pattern recognition; inductive inference; strings; deterministic finite-state automaton; inference algorithm; grammars; computational complexity; finite automata; formal languages; inference mechanisms; pattern recognition
P. Garcia, E. Vidal, "Inference of k-Testable Languages in the Strict Sense and Application to Syntactic Pattern Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 9, pp. 920-925, Sept. 1990, doi:10.1109/34.57687