Let T(U) be the set of words in the dictionary H that contain U as a substring. The problem considered here is the estimation of the set T(U) when U is not known but Y, a noisy version of U, is available. The suggested set estimate S*(Y) of T(U) is a proper subset of H such that every element of it contains at least one substring that resembles Y most closely according to the Levenshtein metric. The proposed algorithm for the computation of S*(Y) requires cubic time. The algorithm uses the recursively computable dissimilarity measure Dk(X, Y), termed the kth distance between two strings X and Y, which is a dissimilarity measure between Y and a certain subset of the set of contiguous substrings of X. Another estimate of T(U), namely SM(Y), is also suggested. The accuracy of SM(Y) is only slightly lower than that of S*(Y), but its computation time is substantially shorter. Experimental results involving 1900 noisy substrings and dictionaries that are subsets of the 1023 most common English words [11] indicate that the accuracy of the estimate S*(Y) is around 99 percent and that of SM(Y) is about 98 percent.
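The idea of selecting the dictionary words that contain a substring closest to Y can be illustrated with a minimal Python sketch. This is not the paper's algorithm: it does not compute Dk(X, Y) and makes no claim about matching the reported accuracy or running time. Instead it swaps in the standard dynamic program for approximate substring matching under unit-cost Levenshtein edits, and the names `min_substring_distance` and `estimate_word_set` are hypothetical, chosen only for this illustration.

```python
def min_substring_distance(x: str, y: str) -> int:
    """Smallest Levenshtein distance between y and any contiguous substring of x.

    Assumption: unit cost for insertion, deletion, and substitution.
    """
    # prev[i] = distance between y[:i] and the best substring of x ending
    # at the previous position of x. Before any character of x is read,
    # only the empty substring is available, so the cost is i.
    prev = list(range(len(y) + 1))
    best = prev[-1]
    for cx in x:
        # cur[0] = 0 lets a candidate substring start at any position of x.
        cur = [0]
        for i, cy in enumerate(y, start=1):
            cur.append(min(
                prev[i] + 1,               # cx has no counterpart in y
                cur[i - 1] + 1,            # cy has no counterpart in x
                prev[i - 1] + (cx != cy),  # match or substitution
            ))
        best = min(best, cur[-1])
        prev = cur
    return best


def estimate_word_set(dictionary, y):
    """Words of the dictionary whose best-matching substring is closest to y."""
    words = list(dictionary)
    if not words:
        return set()
    scores = {w: min_substring_distance(w, y) for w in words}
    best = min(scores.values())
    return {w for w, d in scores.items() if d == best}


# Illustrative dictionary and noisy substring (made up for this sketch).
H = ["information", "reform", "table", "matching"]
print(estimate_word_set(H, "formt"))
```

In this toy run the noisy string "formt" is within one edit of a substring of both "information" and "reform", so both words are returned, while "table" and "matching" are excluded. Each word costs time proportional to the product of its length and the length of Y in this sketch.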
Index Terms:
text editing, Error correction in strings, Levenshtein metric, noisy substring matching, string dissimilarity in terms of the dissimilarity of their substrings, string set estimation
Citation:
R.L. Kashyap, B.J. Oommen, "The Noisy Substring Matching Problem," IEEE Transactions on Software Engineering, vol. 9, no. 3, pp. 365-370, May 1983, doi:10.1109/TSE.1983.237018

