An Approach to Designing Very Fast Approximate String Matching Algorithms
August 1994 (vol. 6 no. 4)
pp. 620-633

An approach to designing very fast algorithms for approximate string matching in a dictionary is proposed. Multiple spelling errors corresponding to insert, delete, change, and transpose operations on character strings are considered in the fault model. The design of very fast approximate string matching algorithms through a four-step reduction procedure is described. The final and most effective step uses hashing techniques to avoid comparing the given word with words at large distances. The technique has been applied to a library book catalog textbase. The experiments show that performing approximate string matching for a large dictionary in real time on an ordinary sequential computer under our multiple fault model is feasible.
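The fault model described above can be illustrated with a short sketch. This is not the paper's four-step reduction procedure or its hashing scheme, neither of which is detailed in the abstract. It is a standard dynamic-programming edit distance that counts insert, delete, change, and adjacent-transpose operations, paired with a simple length-bucket index standing in for the idea of skipping words that are too far from the query. The names `osa_distance`, `build_length_index`, `lookup`, and the `max_errors` parameter are assumptions made for this illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insert, delete, change, and adjacent transpose."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete
                d[i][j - 1] + 1,         # insert
                d[i - 1][j - 1] + cost,  # change (or match)
            )
            # Transpose two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def build_length_index(words):
    """Group dictionary words by length (an illustrative stand-in for hashing)."""
    index = {}
    for w in words:
        index.setdefault(len(w), []).append(w)
    return index


def lookup(query, index, max_errors=2):
    """Return sorted (distance, word) pairs within max_errors of query."""
    matches = []
    # A length gap larger than max_errors already implies a larger distance,
    # so those buckets are never compared against the query.
    for length in range(len(query) - max_errors, len(query) + max_errors + 1):
        for w in index.get(length, []):
            dist = osa_distance(query, w)
            if dist <= max_errors:
                matches.append((dist, w))
    return sorted(matches)
```

The length filter is safe because a change or a transpose leaves the word length unchanged, while each insert or delete alters it by exactly one, so the length difference is a lower bound on the distance.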

Index Terms:
error correction; search problems; database theory; query processing; fast approximate string matching algorithms; dictionary; spelling errors; character strings; fault model; four-step reduction procedure; hashing techniques; library book catalog textbase; large dictionary; sequential computer; multiple fault model; nearest neighbor search; information retrieval
Citation:
M.-W. Du, S.C. Chang, "An Approach to Designing Very Fast Approximate String Matching Algorithms," IEEE Transactions on Knowledge and Data Engineering, vol. 6, no. 4, pp. 620-633, Aug. 1994, doi:10.1109/69.298177