This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
The Noisy Substring Matching Problem
May 1983 (vol. 9 no. 3)
pp. 365-370
R.L. Kashyap, School of Electrical Engineering, Purdue University
Let T(U) be the set of words in the dictionary H which contains U as a substring. The problem considered here is the estimation of the set T(U) when U is not known, but Y, a noisy version of U is available. The suggested set estimate S*(Y) of T(U) is a proper subset of H such that its every element contains at least one substring which resembles Y most according to the Levenshtein metric. The proposed algorithm for-the computation of S*(Y) requires cubic time. The algorithm uses the recursively computable dissimilarity measure Dk(X, Y), termed as the kth distance between two strings X and Y which is a dissimilarity measure between Y and a certain subset of the set of contiguous substrings of X. Another estimate of T(U), namely SM(Y) is also suggested. The accuracy of SM(Y) is only slightly less than that of S*(Y), but the computation time of SM(Y) is substantially less than that of S*(Y). Experimental results involving 1900 noisy substrings and dictionaries which are subsets of 1023 most common English words [11] indicate that the accuracy of the estimate S*(Y) is around 99 percent and that of SM(Y) is about 98 percent.
Index Terms:
text editing, Error correction in strings, Levenshtein metric, noisy substring matching, string dissimilarity in terms of the dissimilarity of their substrings, string set estimation
Citation:
R.L. Kashyap, B.J. Oommen, "The Noisy Substring Matching Problem," IEEE Transactions on Software Engineering, vol. 9, no. 3, pp. 365-370, May 1983, doi:10.1109/TSE.1983.237018
Usage of this product signifies your acceptance of the Terms of Use.