Issue No. 07 - July (1996 vol. 29)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/2.511967
<p>Speech recognition performance has come a long way in the past 10 years. Present technology permits speaker-independent, continuous-speech, large-vocabulary dictation systems with word error rates of about 10 percent. Machine translation has also improved, but merely combining these technologies cannot produce good speech translation. Providing useful speech translation means attempting more than a sentence-by-sentence translation: It means interpreting an utterance or extracting its main intent. This often involves summarizing, which requires semantic and pragmatic interpretation within a domain of discourse. The Janus-II system described in this article not only extracts intent but also deals with problems such as ill-formed sentences, noise, and speech recognition errors. It does so by successively applying all sources of knowledge--from acoustic to discourse--to narrow the search for the most plausible translation. The system's main modules are speech recognition, parsing, discourse processing, and generation. Each is language-independent, consisting of a general processor that can be loaded with language-specific knowledge. Deriving an Interlingua--a language-independent representation of meaning--is key to system versatility. The Janus-II team is experimenting with spoken-language interpretation in an interactive videoconferencing environment, portable speech translation, and simultaneous dialogue translation. Questions concerning the human factors of interactive spoken-language translation await further study in actual field use. </p>
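The modular architecture the abstract describes, language-independent processors that are loaded with language-specific knowledge and communicate through an Interlingua, can be sketched as follows. This is a hypothetical illustration only; the class names, knowledge tables, and toy concept frames are not from the article and drastically simplify what Janus-II's parsing and generation components actually do.

```python
# Hypothetical sketch of a Janus-II-style design: general processors
# parameterized by language-specific knowledge, meeting at an interlingua
# (a language-independent representation of meaning).

class Module:
    """A language-independent processor loaded with language-specific knowledge."""
    def __init__(self, knowledge):
        self.knowledge = knowledge

class Parser(Module):
    def to_interlingua(self, utterance):
        # Toy semantic extraction: keep only words the domain knowledge
        # recognizes, producing a language-independent concept frame.
        concepts = [self.knowledge[w] for w in utterance.split()
                    if w in self.knowledge]
        return {"intent": concepts[0] if concepts else "unknown",
                "args": concepts[1:]}

class Generator(Module):
    def from_interlingua(self, frame):
        # Realize the concept frame in the target language.
        words = [self.knowledge.get(c, c)
                 for c in [frame["intent"], *frame["args"]]]
        return " ".join(words)

# The same processor classes serve any language pair; only the loaded
# knowledge differs (here: English parsing, German generation).
en_parse = Parser({"book": "RESERVE", "flight": "FLIGHT", "tomorrow": "DATE+1"})
de_gen = Generator({"RESERVE": "buchen", "FLIGHT": "Flug", "DATE+1": "morgen"})

frame = en_parse.to_interlingua("please book a flight tomorrow")
print(frame)                          # interlingua concept frame
print(de_gen.from_interlingua(frame))  # → buchen Flug morgen
```

Note how the parser ignores words outside its domain knowledge ("please", "a"), which loosely mirrors the abstract's point that interpretation within a domain of discourse extracts intent rather than translating sentence by sentence.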
A. Waibel, "Interactive Translation of Conversational Speech," in Computer, vol. 29, no. 7, pp. 41-48, July 1996.