Language Edit Distance and Maximum Likelihood Parsing of Stochastic Grammars: Faster Algorithms and Connection to Fundamental Graph Problems
2015 IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS)
Berkeley, CA, USA
Oct. 17, 2015 to Oct. 20, 2015
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/FOCS.2015.17
Given a context-free grammar G over alphabet Σ and a string s, the language edit distance problem seeks the minimum number of edits (insertions, deletions, and substitutions) required to convert s into a valid member of the language L(G). The well-known dynamic programming algorithm solves this problem in cubic time in the string length [Aho, Peterson 1972; Myers 1985]. Despite its numerous applications, to date there exists no algorithm that computes the exact or approximate language edit distance in truly sub-cubic time. In this paper we give the first such truly sub-cubic algorithm, which computes language edit distance almost optimally. We further solve the local alignment problem: for all substrings of s, we can estimate their language edit distances near-optimally in the same time, with high probability. Next, we design the first sub-cubic algorithm that, given an arbitrary stochastic context-free grammar and a string, returns a nearly optimal maximum likelihood parsing of that string. Stochastic context-free grammars significantly generalize hidden Markov models; they lie at the foundation of statistical natural language processing and have found widespread applications in many other fields. To complement our upper bound, we show that exact computation of maximum likelihood parsing of stochastic grammars, or of language edit distance, in truly sub-cubic time would imply a truly sub-cubic algorithm for all-pairs shortest paths, a long-standing open question. This would be a breakthrough for a large range of problems on graphs and matrices due to sub-cubic equivalences. By a known lower bound [Lee 2002] and a recent development [Abboud et al. 2015], even the much simpler problem of parsing a context-free grammar requires fast matrix multiplication time. Therefore, nontrivial multiplicative approximation algorithms running faster than matrix multiplication are unlikely to exist for either of the two problems.
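To make the cubic baseline concrete, below is a minimal sketch of the classic dynamic program (in the spirit of Aho-Peterson) — not the paper's sub-cubic algorithm. It assumes the grammar is in Chomsky normal form; the function name, rule encoding, and the example grammar for {a^n b^n} are illustrative choices, not taken from the paper:

```python
INF = float("inf")

def language_edit_distance(binary, terminal, start, s):
    """Minimum insertions/deletions/substitutions turning s into a
    member of L(G), for G in Chomsky normal form.
    binary:   list of (A, B, C) rules, meaning A -> B C
    terminal: list of (A, a) rules, meaning A -> a
    """
    # Collect the nonterminals appearing in the grammar.
    nts = set()
    for A, B, C in binary:
        nts |= {A, B, C}
    for A, _ in terminal:
        nts.add(A)

    # ins[A]: cost of deriving the shortest string in L(A) from the
    # empty string (each inserted character costs 1); fixed point.
    ins = {A: INF for A in nts}
    changed = True
    while changed:
        changed = False
        for A, _ in terminal:
            if ins[A] > 1:
                ins[A], changed = 1, True
        for A, B, C in binary:
            if ins[B] + ins[C] < ins[A]:
                ins[A], changed = ins[B] + ins[C], True

    n = len(s)
    # d[i][j][A]: min edits turning s[i:j] into a string derivable from A.
    d = [[{A: INF for A in nts} for _ in range(n + 1)] for _ in range(n + 1)]
    for i in range(n + 1):
        for A in nts:
            d[i][i][A] = ins[A]  # empty span: pure insertion cost

    for span in range(1, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = d[i][j]
            # Match or substitute a single character.
            if span == 1:
                for A, a in terminal:
                    cell[A] = min(cell[A], 0 if s[i] == a else 1)
            # Delete a character at either end of the span (interior
            # deletions are captured through the binary splits below).
            for A in nts:
                cell[A] = min(cell[A], d[i + 1][j][A] + 1, d[i][j - 1][A] + 1)
            # Binary rules A -> B C over all split points; iterate to a
            # fixed point because a split at k == i or k == j feeds the
            # current cell back into itself via the empty-span costs.
            changed = True
            while changed:
                changed = False
                for A, B, C in binary:
                    for k in range(i, j + 1):
                        cost = d[i][k][B] + d[k][j][C]
                        if cost < cell[A]:
                            cell[A], changed = cost, True
    return d[0][n][start]

# Illustrative CNF grammar for the language {a^n b^n : n >= 1}.
BINARY = [("S", "A", "B"), ("S", "A", "T"), ("T", "S", "B")]
TERMINAL = [("A", "a"), ("B", "b")]
```

The three nested loops over (i, j, k) give the O(n^3) running time (times a grammar-dependent factor) that this paper's approximation algorithm is the first to beat. For example, `language_edit_distance(BINARY, TERMINAL, "S", "aab")` returns 1, since one deletion (or one insertion) turns "aab" into a member of the language.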
Grammar, Approximation algorithms, Context, Hidden Markov models, Approximation methods, Data models, Stochastic processes
B. Saha, "Language Edit Distance and Maximum Likelihood Parsing of Stochastic Grammars: Faster Algorithms and Connection to Fundamental Graph Problems," 2015 IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), Berkeley, CA, USA, 2015, pp. 118-135.