On the Pattern Recognition of Noisy Subsequence Trees
September 2001 (vol. 23 no. 9)
pp. 929-946

In this paper, we consider the problem of recognizing ordered labeled trees by processing their noisy subsequence-trees, which are “patched-up” noisy portions of their fragments. We assume that we are given H, a finite dictionary of ordered labeled trees. $\rm X^*$ is an unknown element of H, and U is an arbitrary subsequence-tree of $\rm X^*$. We consider the problem of estimating $\rm X^*$ by processing Y, a noisy version of U. The solution we present is, to our knowledge, the first reported solution to the problem. We solve the problem by sequentially comparing Y with every element X of H, the basis of comparison being a new dissimilarity measure between two trees that implicitly captures the properties of the corrupting mechanism (the “channel”) that noisily garbles U into Y. The algorithm that incorporates this constraint has been used to test our pattern recognition system, which achieved high accuracy. Experimental results involving manually constructed trees of between 25 and 35 nodes, containing an average of 21.8 errors per tree, demonstrate that the scheme has about 92.8 percent accuracy. Similar experiments with randomly generated trees yielded an accuracy of 86.4 percent.
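
The recognition step described above, comparing Y sequentially with every X in H and choosing the least dissimilar one, can be illustrated with a minimal Python sketch. It does not implement the paper's dissimilarity measure: as a stand-in it uses a simple top-down (Selkow-style) tree edit distance with unit costs. The tuple representation of trees and the names `size`, `top_down_distance`, and `recognize` are assumptions made only for this illustration.

```python
from functools import lru_cache

# Assumed representation: a tree is (label, (child, child, ...)),
# i.e. nested hashable tuples with children in left-to-right order.

def size(tree):
    """Number of nodes in the tree."""
    return 1 + sum(size(child) for child in tree[1])

@lru_cache(maxsize=None)
def top_down_distance(a, b):
    """Stand-in dissimilarity: top-down tree edit distance, unit costs.

    Roots are always matched (relabel cost 0 or 1); child subtrees are
    aligned like strings, where inserting or deleting a whole subtree
    costs its node count. This is NOT the measure from the paper.
    """
    relabel = 0 if a[0] == b[0] else 1
    ka, kb = a[1], b[1]
    d = [[0] * (len(kb) + 1) for _ in range(len(ka) + 1)]
    for i in range(1, len(ka) + 1):
        d[i][0] = d[i - 1][0] + size(ka[i - 1])
    for j in range(1, len(kb) + 1):
        d[0][j] = d[0][j - 1] + size(kb[j - 1])
    for i in range(1, len(ka) + 1):
        for j in range(1, len(kb) + 1):
            d[i][j] = min(
                d[i - 1][j] + size(ka[i - 1]),   # delete subtree of a
                d[i][j - 1] + size(kb[j - 1]),   # insert subtree of b
                d[i - 1][j - 1] + top_down_distance(ka[i - 1], kb[j - 1]),
            )
    return relabel + d[-1][-1]

def recognize(y, dictionary):
    """Return the dictionary tree X that minimizes the dissimilarity to Y."""
    return min(dictionary, key=lambda x: top_down_distance(x, y))

# Toy dictionary H and a noisy observation Y (hypothetical data).
t1 = ("a", (("b", ()), ("c", (("d", ()),))))
t2 = ("x", (("y", ()), ("z", ())))
H = [t1, t2]
y = ("a", (("b", ()), ("c", ())))  # t1 with the leaf "d" removed
estimate = recognize(y, H)
```

Any other tree dissimilarity can be passed through the same `min` loop. The point of the sketch is only the dictionary-scan structure, whose cost is one tree comparison per element of H.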

B.J. Oommen, R.K.S. Loke, "On the Pattern Recognition of Noisy Subsequence Trees," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 9, pp. 929-946, Sept. 2001, doi:10.1109/34.955108