| | This Article | |
| |
| |
| | Share | |
| |
| |
| | Bibliographic References | |
| |
| |
| | Add to: | |
| |
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
| |
| | Search | |
| |
| |
| | |
Markov Edit Distance
March 2004 (vol. 26 no. 3)
pp. 311-321
Abstract—Edit distance was originally developed by Levenstein several decades ago to measure the distance between two strings. It was found that this distance can be computed by an elegant dynamic programming procedure. The edit distance has played important roles in a wide array of applications due to its representational efficacy and computational efficiency. To effect a more reasonable distance measure, the normalized edit distance was proposed. Many algorithms and studies have been dedicated along this line with impressive performances in recent years. There is, however, a fundamental problem with the original definition of edit distance that has remained elusive: its context-free nature. In determining the p ossible actions, i.e., insertion, deletion, and substitution, no systematic consideration was given to the local behaviors of the string/pattern in question that indeed encompass great amount of useful information regarding its content. In this paper, inspired by the success of the Markov Random Field theory, a new edit distance called Markov edit distance (MED) within the dynamic programming framework is proposed to take full advantage of the local statistical dependencies in the pattern in order to arrive at enhanced matching performance. Within this framework, two specialized distance measures are developed: The reshuffling MED to handle cases where a subpattern in the target pattern is the reshuffles of that in the source pattern, and the coherence MED which is able to incur local content based substitution, insertion, and deletion. The applications based on these two MEDs in string matching are then explored, whereof encouraging empirical results have been observed.
[1] 311 R. Alferez and Y-F. Wang, “Geometric and Illumination Invariants for Object Recognition,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 21, no. 6, pp. 505-536, June 1999.[2] A. Arslan and O. Egecioglu, Efficient Algorithms for Normalized Edit Distance J. Discrete Algorithms, vol. 1, no. 1, pp. 3-20, 2000.[3] R. Bellman, Dynamic Programming. Princeton Univ. Press, 1957.[4] J. Besag, Spatial Interaction and the Statistical Analysis of Lattice Systems J. Royal Statistical Soc. B, vol. 36, no. 2, pp. 192-236, 1974.[5] R. Chellappa and A.K. Jain, Markov Random Fields. Academic Press, 1993.[6] T. Cormen, C. Leiserson, and R. Rivest, Introduction to Algorithms. The MIT Press, 1990.[7] T.M. Cover and J.A. Thomas, Elements of Information Theory, second ed. Wiley, 1991.[8] G.R. Cross and A.K. Jain, Markov Random Field Texture Models IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 5, no. 1, pp. 25-39, 1983.[9] M.S. Drew, J. Wei, and Z.N. Li, Illumination-Invariance Image Retrieval and Video Segmentation Pattern Recognition, vol. 32, no. 8, pp. 1369-1388, 1999.[10] R. Duda, P. Hart, and D. Stork, Pattern Classification, second ed. Wiley, 2001.[11] K. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge Univ. Press, 1997.[12] K. Fukunaga, Statistical Pattern Recognition. Morgan Kaufmannm 1990.[13] S. Geman and D. Geman, Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 6, no. 6, pp. 721-741, 1984.[14] J.M. Hammersley and P. Clifford, Markov Field on Finite Graphs and Lattices unpublished, 1971.[15] X. Huang, A. Acero, and H. Hon, Spoken Language Processing. Prentice Hall, 2001.[16] A.K. Jain and K. Karu, “Learning Texture Discrimination Masks,” IEEE Trans. Pattern Analysis Machine Intelligence, vol. 18, no. 2, pp. 195-205, Feb. 1996.[17] M.W. Koch and R.L. Kashyap, Matching Polygon Fragments Pattern Recognition Letters, vol. 10, pp. 297-308, 1989.[18] A. Levenstein, Binary Codes Capable of Correcting Deletions, Insertions and Reversals Soviet Physics-Doklandy, vol. 10, 1966.[19] S.Z. Li, Markov Random Field Modeling in Computer Vision. Springer-Verlag, 1995.[20] R. Lowrance and R.A. Wagner, An Extension of the String-to-String Correction Problem J. ACM, vol. 23, no. 2, pp. 177-183, 1975.[21] D.L. MacAdam, Sources of Color Sciences. MIT Press, 1970.[22] A. Marzal and E. Vidal, "Computation of Normalized Edit Distance and Applications," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 15, pp. 926-932, 1993.[23] T. Okuda, E. Tanaka, and T. Kasai, A Method of Correction of Garbled Words Based on the Levenstein Metric IEEE Trans. Computers, pp. 172-177, 1976.[24] B.A. Olshausen and D.J. Field, Emergence of Simple-Cell Receptive Field Properties by Learning a Sparse Code for Natural Images Nature, vol. 381, pp. 607-609, 1996.[25] B.J. Oommen, Constrained String Editing Information Sciences, vol. 40, pp. 267-284, 1986.[26] B.J. Oommen, Recognition of Noisy Subsequences Using Constrained Edit Distances IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 9, pp. 676-685, 1987.[27] B.J. Oommen and R. Loke, Pattern Recognition of Strings with Substitutions, Insertions, Deletions and Generalized Transpositions technical report, Dept. of Computer Science, Carleton Univ. 1995.[28] B.J. Oommen and K. Zhang, “The Normalized String Editing Problem Revisited,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, pp. 669-672, 1996.[29] L. Rabiner and B.H. Juang, Fundamentals of Speech Recognition. Prentice-Hall, 1993.[30] L. Raiha, A Study with Histograms Pattern Recognition, vol. 12, no. 1, 1990.[31] N. Ratha, K. Karu, S. Chen, and A.K. Jain, "A Real-Time Matching System for Large Fingerprint Databases," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 799-813, Aug. 1996.[32] G. Seni, V. Kripasundar, and R. Srihari, Generalizing Edit Distance to Incorporate Domain Information: Handwritten Text Recognition as a Case Study Pattern Recognition, vol. 29, no. 3, pp. 405-414, 1996.[33] M. Swain, Color Indexing technical report, Univ. of Rochester, 1990.[34] M.J. Swain and D.H. Ballard, Color Indexing Int'l J. Computer Vision, vol. 7, no. 1, pp. 11-32, 1991.[35] S. Theodoridis and K. Koutroumbas, Pattern Recognition. Academic Press, 1999.[36] Y. Tsay and W. Tsai, “Attributed String Matching by Split and Merge for On-Line Chinese Character Recognition,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, pp. 180-185, 1993.[37] E. Vidal, A. Marzal, and P. Aibar, “Fast Computation of Normalized Edit Distances,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 17, pp. 899-902, 1995.[38] R.A. Wagner and M.J. Fisher, The String to String Correction Problem J. ACM, vol. 21, p. 168, 1974.[39] Y. Wang and T. Pavlidis, “Optimal Correspondence of String Subsequences,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 12, pp. 1,080-1,086, 1990.[40] J. Wei, Color Object Indexing and Recognition in Digital Libraries IEEE Trans. Image Processing, vol. 11, no. 8, pp. 912-922, 2002.[41] J. Ziv and A. Lempel, "A Universal Algorithm for Sequential Data Compression," IEEE Trans. Information Theory, vol. 23, no. 3, pp. 337-343, 1977.
Index Terms:
Edit distance, Markov Random Field, dynamic programming, statistical dependency, text processing, pattern recognition.
Citation:
Jie Wei, "Markov Edit Distance," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 3, pp. 311-321, Mar. 2004, doi:10.1109/TPAMI.2004.1262315