String Processing and Information Retrieval, International Symposium on (1999)
Cancun, Mexico
Sept. 21, 1999 to Sept. 24, 1999
ISBN: 0-7695-0268-7
pp: 8
Abdullah N. Arslan , University of California at Santa Barbara
Omer Egecioglu , University of California at Santa Barbara
ABSTRACT
A common model for computing the similarity of two strings X and Y, of lengths m and n respectively with m ≥ n, is to transform X into Y through a sequence of three types of edit operations: insertion, deletion, and substitution. The model assumes a given cost function that assigns a non-negative real weight to each edit operation. The amortized weight of a given edit sequence is the ratio of its weight to its length, and the minimum of this ratio over all edit sequences is the normalized edit distance. Existing algorithms for normalized edit distance computation with proven complexity bounds require O(mn^2) time in the worst case. We give an O(mn log n)-time algorithm for the problem when the cost function is uniform, i.e., the weight of each edit operation is constant within the same type, except that substitutions can have different weights depending on whether they are matching or non-matching.
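To make the definition concrete, here is a minimal Python sketch of the normalized edit distance under a uniform cost function. It is a brute-force dynamic program that tracks the number of operations on each edit sequence, not the algorithm of the paper, and it makes no attempt to match either complexity bound quoted above. The function name, the parameter names (w_ins, w_del, w_match, w_sub), and the default weights are assumptions chosen for illustration. So are two conventions: a matching substitution counts as one operation toward the sequence length, and the distance between two empty strings is taken to be 0. Weights are assumed to be non-negative integers or Fractions so that the ratios stay exact.

```python
from fractions import Fraction

INF = float("inf")


def normalized_edit_distance(x, y, w_ins=1, w_del=1, w_match=0, w_sub=1):
    """Minimum of weight/length over all edit sequences turning x into y.

    Illustrative baseline only: best[i][j][k] holds the minimum weight of an
    edit sequence that turns x[:i] into y[:j] using exactly k operations.
    """
    m, n = len(x), len(y)
    if m + n == 0:
        return Fraction(0)  # assumed convention for two empty strings
    max_len = m + n  # longest sequence: delete all of x, insert all of y
    best = [[[INF] * (max_len + 1) for _ in range(n + 1)] for _ in range(m + 1)]
    best[0][0][0] = 0
    for i in range(m + 1):
        for j in range(n + 1):
            for k in range(1, i + j + 1):
                cost = INF
                if i > 0:  # delete x[i-1]
                    cost = min(cost, best[i - 1][j][k - 1] + w_del)
                if j > 0:  # insert y[j-1]
                    cost = min(cost, best[i][j - 1][k - 1] + w_ins)
                if i > 0 and j > 0:  # matching or non-matching substitution
                    w = w_match if x[i - 1] == y[j - 1] else w_sub
                    cost = min(cost, best[i - 1][j - 1][k - 1] + w)
                best[i][j][k] = cost
    # Take the best weight-to-length ratio over every feasible length.
    return min(
        Fraction(best[m][n][k], k)
        for k in range(1, max_len + 1)
        if best[m][n][k] != INF
    )
```

For example, with the default weights, turning "ab" into "a" is best done by one match followed by one deletion, a sequence of weight 1 and length 2, so the sketch returns 1/2.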
INDEX TERMS
Edit distance, normalized edit distance, algorithm, dynamic programming, fractional programming, ratio minimization
CITATION
Abdullah N. Arslan, Omer Egecioglu, "An Efficient Uniform-Cost Normalized Edit Distance Algorithm", String Processing and Information Retrieval, International Symposium on, vol. 00, no. , pp. 8, 1999, doi:10.1109/SPIRE.1999.796572