Numerical Similarity and Dissimilarity Measures Between Two Trees
December 1996 (vol. 45 no. 12)
pp. 1426-1434
BibTeX:
@article{10.1109/12.545972,
  author    = {B.J. Oommen and K. Zhang and W. Lee},
  title     = {Numerical Similarity and Dissimilarity Measures Between Two Trees},
  journal   = {IEEE Transactions on Computers},
  volume    = {45},
  number    = {12},
  issn      = {0018-9340},
  year      = {1996},
  pages     = {1426-1434},
  doi       = {http://doi.ieeecomputersociety.org/10.1109/12.545972},
  publisher = {IEEE Computer Society},
  address   = {Los Alamitos, CA, USA},
}

Abstract: Quantifying the similarity between two trees is a problem of intrinsic importance in the study of algorithms and data structures, with applications in computational molecular biology, structural/syntactic pattern recognition, and data management. In this paper we define and formulate an abstract measure of comparison, Ω(T1, T2), between two trees T1 and T2, expressed in terms of a set of elementary intersymbol measures ω(·, ·) and two abstract operators ⊕ and ⊗. By appropriately choosing concrete values for these two operators and for ω(·, ·), this measure can be used to define various quantities, including 1) the edit distance between two trees, 2) the size of their largest common subtree, 3) Prob(T2 | T1), the probability of receiving T2 given that T1 was transmitted across a channel causing independent substitution and deletion errors, and 4) the a posteriori probability of T1 being the transmitted tree given that T2 is the received tree containing independent substitution, insertion, and deletion errors. The recursive properties of Ω(T1, T2) are derived, and a single generic iterative dynamic programming scheme that computes all of the above quantities is developed. The time and space complexities of the algorithm are analyzed, and the implications of the results for both theoretical and applied fields are discussed.
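To make the role of ω, ⊕, and ⊗ concrete, here is a minimal Python sketch. It is not the paper's iterative scheme: it uses the standard rightmost-root forest recursion for ordered labelled trees, with memoization, and treats the two operators as parameters. The tree encoding `(label, children)`, the helper names `make_measure`, `unit_cost`, and `match_score`, and the particular choices of ω are all assumptions made for illustration. With ⊕ = min and ⊗ = + it yields a unit-cost edit distance. With ⊕ = max and ⊗ = + it counts the identically labelled nodes preserved by the best mapping, which is only one plausible reading of "largest common subtree".

```python
from functools import lru_cache
from operator import add


def make_measure(omega, oplus, otimes, unit):
    """Build Omega(T1, T2) from omega, the two operators, and the unit of otimes.

    A tree is (label, children) with children a tuple of trees (hypothetical
    encoding); omega(a, None) is a deletion, omega(None, b) an insertion.
    """

    @lru_cache(maxsize=None)
    def forest(f, g):
        if not f and not g:
            return unit
        candidates = []
        if f:
            v_label, v_kids = f[-1]
            # Delete the rightmost root of f; its children take its place.
            candidates.append(
                otimes(forest(f[:-1] + v_kids, g), omega(v_label, None)))
        if g:
            w_label, w_kids = g[-1]
            # Insert the rightmost root of g.
            candidates.append(
                otimes(forest(f, g[:-1] + w_kids), omega(None, w_label)))
        if f and g:
            # Pair the two rightmost roots, then compare what is below them
            # and what is to their left.
            paired = otimes(forest(v_kids, w_kids), forest(f[:-1], g[:-1]))
            candidates.append(otimes(paired, omega(v_label, w_label)))
        result = candidates[0]
        for c in candidates[1:]:
            result = oplus(result, c)
        return result

    return lambda t1, t2: forest((t1,), (t2,))


def unit_cost(a, b):
    # Assumed omega for edit distance: every mismatch, insert, or delete costs 1.
    return 0 if a == b else 1


def match_score(a, b):
    # Assumed omega for common size: only identical labels may be paired.
    if a is None or b is None:
        return 0
    return 1 if a == b else float("-inf")


edit_distance = make_measure(unit_cost, min, add, 0)
common_size = make_measure(match_score, max, add, 0)

t1 = ("a", (("b", ()), ("c", ())))
t2 = ("a", (("b", ()),))
print(edit_distance(t1, t2))  # 1: delete the leaf "c"
print(common_size(t1, t2))    # 2: nodes "a" and "b" are shared
```

Only the arguments to `make_measure` change between the two quantities; the recursion is shared. The probabilistic quantities in the abstract would need their own choices of ω and operators, which this sketch does not attempt.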

Index Terms:
Tree comparison metrics, tree comparison algorithms, algorithms for structure comparison, generic strategies for tree comparisons.
Citation:
B.J. Oommen, K. Zhang, W. Lee, "Numerical Similarity and Dissimilarity Measures Between Two Trees," IEEE Transactions on Computers, vol. 45, no. 12, pp. 1426-1434, Dec. 1996, doi:10.1109/12.545972