Synthesis and Recognition of Sequences
December 1991 (vol. 13 no. 12)
pp. 1245-1255

A string or sequence is a linear array of symbols drawn from an alphabet. Because of unknown substitutions, insertions, and deletions of symbols, a sequence cannot be treated like a vector or a tuple of a fixed number of variables. The synthesis of an ensemble of sequences is a sequence of random elements that specify the probabilities of occurrence of the different symbols at the corresponding sites of the sequences. The synthesis is determined by a hierarchical sequence synthesis procedure (HSSP). The HSSP returns the taxonomic hierarchy of the whole ensemble of sequences. At each level of the hierarchy, it also returns the alignment and the synthesis of each group of sequences (a subset of the ensemble). The HSSP does not require the ensemble to be presented as a tabulated array of data. It also needs no hierarchical information about the data and no assumption of a stochastic process. The authors present the concept of sequence synthesis and show how the HSSP can be applied both as a supervised and as an unsupervised classification procedure.
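The abstract defines a synthesis as a sequence of random elements, one per site, each giving the probability of every symbol at that site. The sketch below illustrates that idea under stated assumptions. It builds such a table from a group of sequences that are assumed to be already aligned. The HSSP itself, which produces the alignment and the hierarchy, is not reproduced here. The sketch then uses the table for supervised classification by scoring an unaligned query against each class synthesis. That scoring uses a generic profile-alignment dynamic program, which is a swapped-in technique and not the authors' own scoring rule. Several pieces are illustrative assumptions rather than the paper's formulation: the gap symbol `-`, the additive pseudocount, the fixed insertion penalty `insert_logp`, and the helper names `synthesize`, `log_likelihood`, and `classify`.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

GAP = "-"  # assumed gap symbol used in the pre-aligned input
Synthesis = List[Dict[str, float]]  # one probability table ("random element") per site


def _log(p: float) -> float:
    """Natural log that maps a zero probability to -infinity."""
    return math.log(p) if p > 0 else -math.inf


def synthesize(aligned: Sequence[str], alphabet: str,
               pseudocount: float = 1.0) -> Synthesis:
    """Estimate per-site symbol probabilities (gap included) from an
    already-aligned group of equal-length sequences (assumption: alignment given)."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must already be aligned to equal length")
    symbols = alphabet + GAP
    total = len(aligned) + pseudocount * len(symbols)
    synthesis: Synthesis = []
    for i in range(length):
        counts = Counter(s[i] for s in aligned)
        synthesis.append({a: (counts[a] + pseudocount) / total for a in symbols})
    return synthesis


def log_likelihood(seq: str, synthesis: Synthesis,
                   insert_logp: float = math.log(0.05)) -> float:
    """Best-path log-probability of an unaligned sequence against a synthesis.
    Moves: site i emits seq[j], site i is skipped (its gap probability),
    or seq[j] is inserted between sites at the hypothetical fixed cost insert_logp."""
    n, m = len(synthesis), len(seq)
    dp = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    # Row-major order guarantees each cell is final before it is extended.
    for i in range(n + 1):
        for j in range(m + 1):
            cur = dp[i][j]
            if cur == -math.inf:
                continue
            if i < n and j < m:  # site i emits symbol seq[j]
                score = cur + _log(synthesis[i].get(seq[j], 0.0))
                dp[i + 1][j + 1] = max(dp[i + 1][j + 1], score)
            if i < n:  # site i is a deletion relative to seq
                dp[i + 1][j] = max(dp[i + 1][j], cur + _log(synthesis[i][GAP]))
            if j < m:  # seq[j] is an insertion relative to the synthesis
                dp[i][j + 1] = max(dp[i][j + 1], cur + insert_logp)
    return dp[n][m]


def classify(seq: str, syntheses: Dict[str, Synthesis]) -> str:
    """Supervised use: assign seq to the class whose synthesis scores it highest."""
    return max(syntheses, key=lambda label: log_likelihood(seq, syntheses[label]))


if __name__ == "__main__":
    classes = {
        "A": synthesize(["ACGT", "ACG-", "ACGT"], "ACGT"),
        "B": synthesize(["TTGA", "TTCA", "T-GA"], "ACGT"),
    }
    for query in ["ACGT", "TTGA", "ACG"]:
        print(query, "->", classify(query, classes))
    # expected: ACGT -> A, TTGA -> B, ACG -> A
```

Insertions and deletions are absorbed by the alignment step rather than by fixed positions. As a result, the query `ACG` can be scored against four-site syntheses. This mirrors the abstract's point that sequences cannot be handled as fixed-length vectors.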

Index Terms:
sequences synthesis; sequences recognition; alphabet; hierarchical sequence synthesis procedure; taxonomic hierarchy; alignment; supervised classification; unsupervised classification procedure; pattern recognition; probability
Citation:
S.C. Chan, A.K.C. Wong, "Synthesis and Recognition of Sequences," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 12, pp. 1245-1255, Dec. 1991, doi:10.1109/34.106998