ETRI-Knowledge Sharing Platform





Conference paper: Emotional Speech Synthesis with Rich and Granularized Control
Cited 64 times in Scopus
Se-Yun Um (엄세연), Sangshin Oh (오상신), Kyungguen Byun (변경근), Inseon Jang (장인선), ChungHyun Ahn (안충현), Hong-Goo Kang (강홍구)
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2020, pp.7254-7258
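20HH4500, Development of an emotional expression service to support broadcast viewing for people with visual and hearing impairments, ChungHyun Ahn (안충현)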
This paper proposes an effective emotion control method for an end-to-end text-to-speech (TTS) system. To flexibly control the distinct characteristic of a target emotion category, it is essential to determine embedding vectors representing the TTS input. We introduce an inter-to-intra emotional distance ratio algorithm to the embedding vectors that can minimize the distance to the target emotion category while maximizing its distance to the other emotion categories. To further enhance the expressiveness of a target speech, we also introduce an effective interpolation technique that enables the intensity of a target emotion to be gradually changed to that of neutral speech. Subjective evaluation results in terms of emotional expressiveness and controllability show the superiority of the proposed algorithm to the conventional methods.
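The two ideas in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so everything below is an assumption made for illustration: embeddings are plain lists of floats, distances are Euclidean, the ratio is taken as mean distance to other-category embeddings divided by mean distance to same-category embeddings, and the intensity control is a simple linear interpolation toward a neutral embedding. The function names (`i2i_ratio`, `pick_representative`, `interpolate`) are hypothetical and do not come from the paper.

```python
import math


def i2i_ratio(candidate, own, others):
    """Assumed inter-to-intra ratio: larger means the candidate is far from
    other emotion categories and close to its own category."""
    inter = sum(math.dist(candidate, v) for v in others) / len(others)
    intra = sum(math.dist(candidate, v) for v in own) / len(own)
    # Guard against a zero intra-distance (e.g. a single-sample category).
    return inter / max(intra, 1e-12)


def pick_representative(own, others):
    """Choose the same-category embedding with the highest assumed ratio."""
    return max(own, key=lambda c: i2i_ratio(c, own, others))


def interpolate(neutral, emotion, intensity):
    """Linear blend: intensity 0.0 gives neutral, 1.0 gives the full emotion."""
    return [(1 - intensity) * n + intensity * e for n, e in zip(neutral, emotion)]


# Toy 2-D embeddings, invented for this example only.
happy = [[1.0, 0.0], [1.2, 0.1], [0.0, 0.0]]
other = [[-1.0, 0.0], [-1.1, 0.2]]
neutral = [0.0, 0.0]

target = pick_representative(happy, other)
for intensity in (0.0, 0.5, 1.0):
    print(intensity, interpolate(neutral, target, intensity))
```

In a real TTS system the blended vector would be fed to the synthesizer as its emotion embedding; this sketch covers only the vector arithmetic and says nothing about how the paper's model consumes it.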
KSP suggested keywords
Conventional methods, Emotion control, End to End(E2E), Text-To-Speech(TTS), control method, emotional speech, interpolation technique, neutral speech, speech synthesis, subjective evaluation