A System for Approximate Tree Matching
August 1994 (vol. 6 no. 4)
pp. 559-571

Ordered, labeled trees are trees in which each node has a label and the left-to-right order of its children (if it has any) is fixed. Such trees have many applications in vision, pattern recognition, molecular biology, programming compilation, and natural language processing. Many of these applications involve comparing trees or retrieving and extracting information from a repository of trees. Examples include classification of unknown patterns, analysis of newly sequenced RNA structures, semantic taxonomy for dictionary definitions, generation of interpreters for nonprocedural programming languages, and automatic error recovery and correction for programming languages. Previous systems use exact matching (or generalized regular expression matching) for tree comparison. This paper presents a system, called approximate-tree-by-example (ATBE), which allows inexact matching of trees. The ATBE system interacts with the user through a simple but powerful query language; graphical devices are provided to facilitate entering the queries. The paper describes the architecture of ATBE, illustrates its use, and describes some aspects of its implementation. We also discuss the underlying algorithms and provide some sample applications.

Index Terms:
trees (mathematics); tree data structures; search problems; query languages; database theory; approximate tree matching; labeled trees; vision; pattern recognition; molecular biology; programming compilation; natural language processing; RNA structures; semantic taxonomy; dictionary definitions; automatic error recovery; approximate-tree-by-example; query language
J.T.-L. Wang, K. Zhang, K. Jeong, D. Shasha, "A System for Approximate Tree Matching," IEEE Transactions on Knowledge and Data Engineering, vol. 6, no. 4, pp. 559-571, Aug. 1994, doi:10.1109/69.298173