
Let T(U) be the set of words in the dictionary H that contain U as a substring. The problem considered here is the estimation of the set T(U) when U is not known but Y, a noisy version of U, is available. The suggested set estimate S*(Y) of T(U) is a proper subset of H such that every element of it contains at least one substring that most closely resembles Y according to the Levenshtein metric. The proposed algorithm for the computation of S*(Y) requires cubic time. The algorithm uses the recursively computable dissimilarity measure Dk(X, Y), termed the kth distance between two strings X and Y, which is a dissimilarity measure between Y and a certain subset of the set of contiguous substrings of X. Another estimate of T(U), namely SM(Y), is also suggested. The accuracy of SM(Y) is only slightly less than that of S*(Y), but the computation time of SM(Y) is substantially less than that of S*(Y). Experimental results involving 1900 noisy substrings and dictionaries that are subsets of the 1023 most common English words [11] indicate that the accuracy of the estimate S*(Y) is around 99 percent and that of SM(Y) is about 98 percent.
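The idea of picking the dictionary words whose best-matching substring is closest to Y can be illustrated with a minimal Python sketch. This is not the Dk(X, Y) recursion or the cubic-time algorithm of the paper, whose details are not given in the abstract. It substitutes the standard approximate-substring dynamic program with unit edit costs, and the names `min_substring_distance` and `estimate_word_set` are hypothetical, chosen only for illustration. Unlike S*(Y), the sketch does not guarantee a proper subset of H: if every word ties, it returns the whole dictionary.

```python
from typing import Iterable, Set


def min_substring_distance(x: str, y: str) -> int:
    """Smallest unit-cost Levenshtein distance between y and any
    contiguous substring of x (illustrative helper, not from the paper)."""
    # prev[j] holds the best distance between the current prefix of y and
    # some substring of x that ends at position j.
    # Row 0 is all zeros: the empty prefix of y matches the empty
    # substring at any position in x, so a match may start anywhere.
    prev = [0] * (len(x) + 1)
    for i, yc in enumerate(y, start=1):
        # Matching y[:i] against an empty substring costs i edits.
        curr = [i] + [0] * len(x)
        for j, xc in enumerate(x, start=1):
            curr[j] = min(
                prev[j - 1] + (xc != yc),  # match or substitution
                prev[j] + 1,               # character of y left unmatched
                curr[j - 1] + 1,           # extra character taken from x
            )
        prev = curr
    # The match may end anywhere in x, so take the minimum over the last row.
    return min(prev)


def estimate_word_set(dictionary: Iterable[str], y: str) -> Set[str]:
    """Words whose best-matching substring is closest to y
    (an assumed stand-in for the set estimate described above)."""
    scores = {word: min_substring_distance(word, y) for word in dictionary}
    if not scores:
        return set()
    best = min(scores.values())
    return {word for word, dist in scores.items() if dist == best}
```

For example, with the small assumed dictionary `["computer", "matching", "string", "estimate"]` and the noisy input `"strng"`, only `"string"` attains the minimum distance, so the sketch returns that single word. Each word costs time proportional to the product of its length and the length of Y.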
Index Terms:
text editing, error correction in strings, Levenshtein metric, noisy substring matching, string dissimilarity in terms of the dissimilarity of their substrings, string set estimation
Citation:
R.L. Kashyap, B.J. Oommen, "The Noisy Substring Matching Problem," IEEE Transactions on Software Engineering, vol. 9, no. 3, pp. 365-370, May 1983, doi:10.1109/TSE.1983.237018