A Probabilistic Approach to String Transformation
Issue No.05 - May (2014 vol.26)
pp: 1063-1075
Ziqi Wang , Sch. of EECS, Peking Univ., Beijing, China
Gu Xu , Microsoft Bing, Bellevue, WA, USA
Hang Li , Huawei Noah's Ark Lab., Shatin, China
Ming Zhang , Sch. of EECS, Peking Univ., Beijing, China
ABSTRACT
Many problems in natural language processing, data mining, information retrieval, and bioinformatics can be formalized as string transformation: given an input string, the system generates the k most likely output strings corresponding to it. This paper proposes a novel probabilistic approach to string transformation that is both accurate and efficient. The approach comprises a log-linear model, a method for training the model, and an algorithm for generating the top k candidates, with or without a predefined dictionary. The log-linear model is defined as a conditional probability distribution of an output string and a rule set for the transformation, conditioned on an input string. The learning method employs maximum likelihood estimation for parameter estimation. The string generation algorithm, which is based on pruning, is guaranteed to generate the optimal top k candidates. The proposed method is applied to the correction of spelling errors in queries and to the reformulation of queries in web search. Experimental results on large-scale data show that the proposed approach is very accurate and efficient, improving upon existing methods in both respects across different settings.
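The idea of scoring a candidate by the rules used to produce it, and of generating the best k candidates without enumerating all of them, can be illustrated with a small sketch. This is not the paper's algorithm. It is a plain uniform-cost search written under stated assumptions: each rule is a hypothetical triple (alpha, beta, weight) that rewrites substring alpha to beta, every weight is non-positive, and a candidate's score is the sum of the weights of the rules applied. Because the normalizer of a log-linear model is constant for a fixed input, ranking by this unnormalized score gives the same order as ranking by probability. The function name and the example rules are invented for illustration, and no dictionary is used.

```python
import heapq


def top_k_transformations(source, rules, k):
    """Return up to k (output, score) pairs, best score first.

    Assumptions (illustrative, not taken from the paper):
    - rules is a list of (alpha, beta, weight) with alpha non-empty
    - every weight is <= 0, so a partial score never improves later
    - score = sum of weights of applied rules (unnormalized log score)
    """
    for alpha, _beta, weight in rules:
        if not alpha or weight > 0:
            raise ValueError("rules need a non-empty left side and weight <= 0")

    # Heap entries: (cost, position in source, output built so far),
    # where cost = -score, so the smallest cost is the best candidate.
    heap = [(0.0, 0, "")]
    expanded = set()
    results = []

    while heap and len(results) < k:
        cost, pos, out = heapq.heappop(heap)
        if (pos, out) in expanded:
            continue  # already reached this state at an equal or lower cost
        expanded.add((pos, out))

        if pos == len(source):
            # Costs pop in nondecreasing order, so this is the next best output.
            results.append((out, 0.0 - cost))
            continue

        # Option 1: copy the current character unchanged (no rule, no cost).
        heapq.heappush(heap, (cost, pos + 1, out + source[pos]))

        # Option 2: apply any rule whose left side matches at this position.
        for alpha, beta, weight in rules:
            if source.startswith(alpha, pos):
                heapq.heappush(heap, (cost - weight, pos + len(alpha), out + beta))

    # Pruning: the loop stops once k outputs are found; every state still
    # in the heap has a cost at least as high as the last result.
    return results


if __name__ == "__main__":
    example_rules = [("eh", "he", -0.5), ("t", "th", -2.0)]
    for candidate, score in top_k_transformations("teh", example_rules, 3):
        print(candidate, score)
```

The non-positive weight assumption is what makes early stopping safe in this sketch: once k complete outputs have been popped, no unexplored state can lead to a better one. With positive weights the search would need a different bound.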
INDEX TERMS
query processing, learning (artificial intelligence), maximum likelihood estimation, natural language processing, parameter estimation, probability, Web search queries, probabilistic approach, string transformation, natural language processing, data mining, information retrieval, bioinformatics, input string, output strings, conditional probability distribution, learning method, maximum likelihood estimation, parameter estimation, string generation algorithm, Dictionaries, Accuracy, Error correction, Indexes, Probabilistic logic, Training data, Data mining, Log Linear Model, Computing Methodologies, Document and Text Processing, Document and Text Editing, Spelling, Information Technology and Systems, Information Storage and Retrieval, Information Search and Retrieval, Query formulation, String Transformation, query reformulation, String transformation, log linear model, spelling error correction
CITATION
Ziqi Wang, Gu Xu, Hang Li, Ming Zhang, "A Probabilistic Approach to String Transformation", IEEE Transactions on Knowledge & Data Engineering, vol. 26, no. 5, pp. 1063-1075, May 2014, doi:10.1109/TKDE.2013.11