ETRI-Knowledge Sharing Platform

A Study on Acoustic Parameter Selection Strategies to Improve Deep Learning-Based Speech Synthesis
Cited 0 times in Scopus
Authors
Hyeonjoo Kang, Young-Sun Joo, Inseon Jang, Chunghyun Ahn, Hong-Goo Kang
Issue Date
2019-11
Citation
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA-ASC) 2019, pp.618-622
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.1109/APSIPAASC47483.2019.9023146
Abstract
In this paper, we investigate the variation in the performance of a deep learning-based speech synthesis (DLSS) system based on the configuration of output acoustic parameters. Our method is mainly applicable for vocoding-based statistical parametric speech synthesis (SPSS), which has advantages in low-resource scenarios. Given the independence assumption of the source-filter model for the spectral and fundamental frequency F0 parameters, we propose a reliable network architecture for training acoustic parameters. Particularly, the F0 parameter suffers from high fluctuation and an extremely low number of dimensions. To relieve these problems, we introduce a context-window approach. Furthermore, we apply data augmentation to the proposed structure to overcome a lack of training data, which is a frequent issue with multi-speaker TTS systems. Experimental results confirm the superiority of the proposed algorithm over conventional ones in both single-speaker and multi-speaker TTS setups.
KSP Keywords
Data Augmentation, Fundamental Frequency (F0), Learning-based, Network Architecture, Selection strategy, Statistical Parametric Speech Synthesis (SPSS), acoustic parameters, deep learning (DL), low resource, parameter selection, source-filter model