Emotions are an inseparable part of spoken language and play a substantial role in human-human conversation. They convey information about a person's needs, how one feels about the objectives of a conversation, the trustworthiness of one's verbal communication, and more. Accordingly, substantial efforts have been made to generate affective text and speech for conversational AI, artificial storytelling, machine translation, and more. Similarly, there is a push for converting the affect in text and speech, ideally in real time and while fully preserving intelligibility, for example to hide one's emotion, for creative applications and entertainment, or to augment training data for affect-analyzing AI.
The rapid development of deep neural networks has increased the ability of computers to produce natural speech and language in many languages. Novel methodologies, including attention-based and sequence-to-sequence Text-to-Speech (TTS), have shown promise in synthesizing high-quality speech directly from text inputs. However, most TTS systems do not convey the emotional context that is omnipresent in human-human interaction, and this lack of emotion in the generated speech is plausibly a major reason for the low perceived likeability of such systems. Conversely, generative models such as WaveNet, which operate on raw audio waveforms rather than on text input alone, can help to condition the emotion of the produced speech. Further, variants of generative adversarial networks (GANs), such as StarGANs or StyleGANs, have been successfully applied to speech-based emotion conversion and generation. Similarly, in affective natural language generation and conversion, deep-learning approaches have considerably changed the landscape and opened up new capabilities based on massive language corpora and models. Yet applications featuring human-like, real-time generation and conversion of affect in spoken and written language have still to appear at scale.
Research in this field is still in its infancy and calls for a new perspective on the design of neural speech and language synthesis, generation, and conversion models: one that considers human affect, enabling more natural human-AI interaction and a wide range of further applications.
This special issue seeks contributions on affective speech and language synthesis, generation, and conversion that expand research on current methodologies in this field and on novel applications integrating such technology. We welcome work focusing on theoretical and practical perspectives as well as applications. Topics of interest for this special issue include, but are not limited to:
- Affective speech synthesis methods
- Affective natural language generation methods
- Affect conversion methods for spoken and written language
- Integration of affective speech and language in conversational AI
- Evaluation methods and user studies for the above
- Databases for affective speech and language synthesis, generation, and conversion
- Applications of affective speech and language synthesis, generation, and conversion
Submission Deadline: 31 March 2022
Reviews Due: 1 May 2022
Revision Deadline: 15 July 2022
Final Decision: 1 September 2022
Publication: September 2022
For author information and guidelines on submission criteria, please visit the TAC Author Information page. Please submit papers through the ScholarOne system, and be sure to select the special-issue name. Manuscripts must not have been published or be currently under consideration for publication elsewhere. Please submit only full papers intended for review, not abstracts, to the ScholarOne portal.
Contact the guest editors at firstname.lastname@example.org.
Shahin Amiriparian, University of Augsburg, Germany
Björn Schuller, Imperial College London, UK
Nabiha Asghar, Microsoft, USA
Heiga Zen, Google Research, Japan
Felix Burkhardt, audEERING / Technical University of Berlin, Germany