15th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'03)
Using a Text-to-Speech Synthesizer to Generate a Reverse Turing Test
Sacramento, California, USA
November 3-5, 2003
ISBN: 0-7695-2038-3
Speech produced by a diphone synthesizer is thought to be easy for a machine to recognize because it varies so little. In this paper, we report recognition rates for synthesized utterances in a noisy environment. Our experiments show that an HMM recognizer performs reasonably well even in the presence of background noise, with recognition results that approach human performance. Thus, although humans and computers appear to differ in their ability to understand synthesized speech with background noise, our results discourage using this gap to build an audio-based CAPTCHA (i.e., a reverse Turing test that can tell computers and humans apart). We also explore the possible use of a classification and regression tree to control the hardness of our CAPTCHA.