Architecture, User Interface, and Enabling Technology in Windows Vista's Speech Systems
September 2007 (vol. 56 no. 9)
pp. 1156-1168
Existing speech recognition systems have claimed high accuracy for specific tasks such as dictation. What is new in Windows Speech Recognition for Vista is the combination of high accuracy and high usability across the end-to-end speech experience. This paper describes the architecture, user interface, and key technologies that make up the speech system incorporated in Microsoft Windows Vista. It outlines some of the challenges encountered in providing a speech-based interface to a system as complex and extensible as the modern desktop PC, as well as the technology developments that have made this possible. In particular, the paper describes key elements of the speech user interface and how they maintain the user's ability to control the system despite limitations in the underlying recognition technology. The paper also explains how feedback and adaptation systems are used to tailor the experience to each user and to that user's particular style of speaking and use of language.

Index Terms:
Adaptation, Operating Systems, Speech recognition and synthesis, User interfaces
