ETRI Knowledge Sharing Platform: A Study on Acoustic Parameter Selection Strategies to Improve Deep Learning-Based Speech Synthesis

Conference Paper: A Study on Acoustic Parameter Selection Strategies to Improve Deep Learning-Based Speech Synthesis

Cited 0 times in Scopus

Authors: Hyeonjoo Kang, Young-Sun Joo, Inseon Jang, Chunghyun Ahn, Hong-Goo Kang

Citation: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA-ASC) 2019, pp.618-622

Abstract: In this paper, we investigate how the performance of a deep learning-based speech synthesis (DLSS) system varies with the configuration of its output acoustic parameters. Our method is mainly applicable to vocoding-based statistical parametric speech synthesis (SPSS), which has advantages in low-resource scenarios. Given the source-filter model's assumption that the spectral and fundamental frequency (F0) parameters are independent, we propose a reliable network architecture for training the acoustic parameters. In particular, the F0 parameter suffers from high fluctuation and an extremely low number of dimensions. To relieve these problems, we introduce a context-window approach. Furthermore, we apply data augmentation to the proposed structure to overcome the lack of training data, a frequent issue in multi-speaker TTS systems. Experimental results confirm the superiority of the proposed algorithm over conventional ones in both single-speaker and multi-speaker TTS setups.
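The abstract does not spell out how the F0 context window is built. As a hedged illustration only, the Python sketch here assumes one plausible reading: each frame's scalar F0 target is widened into a vector holding the F0 values of its ±k neighbouring frames (raising the output dimensionality from 1 to 2k+1), and at synthesis time the overlapping per-frame predictions are averaged back to one value per frame (smoothing fluctuations). The function names `expand_context` and `collapse_context`, the parameter `half_width`, and the edge-clamping choice are hypothetical and are not taken from the paper.

```python
from typing import List


def expand_context(f0: List[float], half_width: int) -> List[List[float]]:
    """Hypothetical helper: turn a 1-D F0 track into per-frame context vectors.

    Frame t's target becomes [f0[t-k], ..., f0[t], ..., f0[t+k]] with
    k = half_width. Indices outside the utterance are clamped to the first
    or last frame (an assumed edge-padding choice).
    """
    n = len(f0)
    out = []
    for t in range(n):
        window = []
        for offset in range(-half_width, half_width + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp to valid frame range
            window.append(f0[idx])
        out.append(window)
    return out


def collapse_context(windows: List[List[float]], half_width: int) -> List[float]:
    """Hypothetical helper: recover one F0 value per frame.

    Every window position that refers to a valid frame contributes to that
    frame; the contributions are averaged. Positions that fall outside the
    utterance (the clamped padding) are ignored.
    """
    n = len(windows)
    sums = [0.0] * n
    counts = [0] * n
    for t, window in enumerate(windows):
        for j, value in enumerate(window):
            idx = t + j - half_width  # frame this window position refers to
            if 0 <= idx < n:
                sums[idx] += value
                counts[idx] += 1
    # Each frame is covered at least by its own window centre, so counts >= 1.
    return [s / c for s, c in zip(sums, counts)]
```

For example, `expand_context([100.0, 110.0, 120.0, 130.0], 1)` yields `[[100.0, 100.0, 110.0], [100.0, 110.0, 120.0], [110.0, 120.0, 130.0], [120.0, 130.0, 130.0]]`, and `collapse_context` maps that back to the original track. When a network predicts the windows with noise, the averaging step smooths each frame over several predictions; a weighted or median merge would be equally plausible under this assumption.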

KSP Keywords: Data Augmentation, Fundamental Frequency (F0), Learning-based, Low-Resource, Network Architecture, Selection strategy, Statistical Parametric Speech Synthesis (SPSS), acoustic parameters, deep learning (DL), parameter selection, source-filter model

218 Gajeong-ro, Yuseong-gu, Daejeon, 34129, KOREA, Contact: sh.kim@etri.re.kr
