ETRI-Knowledge Sharing Platform

Emotional Speech Synthesis Based on Style Embedded Tacotron2 Framework
Cited 17 times in Scopus
Authors
Ohsung Kwon, Inseon Jang, ChungHyun Ahn, Hong-Goo Kang
Issue Date
2019-06
Citation
International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC) 2019, pp.344-347
Publisher
IEEE
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.1109/ITC-CSCC.2019.8793393
Abstract
In this paper, we propose a speech synthesis system that effectively generates multiple types of emotional speech using the concept of global style tokens (GST), in which emotion-related style information is represented by an additional style embedding vector. Although the GST is not a new idea, no one has previously applied it to emotional speech synthesis. We explicitly combine the GST idea with the Tacotron2 framework to implement an emotional text-to-speech system. The analysis results demonstrate that the proposed GST structure successfully transfers various types of emotional information to the synthesized speech. Subjective listening tests evaluating the naturalness and emotional expression of the synthesized speech are conducted to verify the superiority of the proposed algorithm.
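The core GST step can be illustrated with a minimal sketch. It assumes a reference-encoder summary vector as the attention query, a small bank of learnable style tokens, single-head scaled dot-product attention, and concatenation of the resulting style embedding onto every Tacotron2 text-encoder output. These are simplifying assumptions: the original GST work uses multi-head attention, and the abstract does not state how this paper injects the style vector. The function names and toy values below are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def style_embedding(query: Vector, tokens: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention over a style-token bank.

    query:  summary vector of the (emotional) reference audio (assumed).
    tokens: learnable global style tokens, same dimension as query (assumed).
    Returns the attention-weighted sum of the tokens (the style embedding).
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, tok)) / math.sqrt(d) for tok in tokens]
    weights = softmax(scores)
    return [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(d)]


def condition_encoder_outputs(encoder_outputs: List[Vector], style: Vector) -> List[Vector]:
    # Broadcast the single utterance-level style vector to every encoder
    # time step by concatenation (one common choice; addition is another).
    return [frame + style for frame in encoder_outputs]


# Toy usage with hypothetical values: 3 tokens of dimension 2,
# and 2 encoder time steps of dimension 3.
tokens = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
query = [2.0, 0.0]
style = style_embedding(query, tokens)
enc = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
conditioned = condition_encoder_outputs(enc, style)
print(len(conditioned[0]))  # 5: three encoder features plus two style features
```

Because the style vector is computed once per utterance and shared across all time steps, the decoder sees the same emotional conditioning throughout the sentence. This is the "additional style embedding vector" role the abstract describes.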
KSP Keywords
Emotional expression, Synthesized speech, Text-to-Speech (TTS), emotional speech, speech synthesis system, subjective listening tests, text-to-speech system