ETRI-Knowledge Sharing Platform





Conference paper: A Study on Acoustic Parameter Selection Strategies to Improve Deep Learning-Based Speech Synthesis
강현주, 주영선, 장인선, 안충현, 강홍구
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA-ASC) 2019, pp.618-622
19HR4400, Development of an Emotion-Expression Service to Support Broadcast Viewing for People with Visual and Hearing Impairments, 안충현
In this paper, we investigate the variation in the performance of a deep learning-based speech synthesis (DLSS) system based on the configuration of output acoustic parameters. Our method is mainly applicable for vocoding-based statistical parametric speech synthesis (SPSS), which has advantages in low-resource scenarios. Given the independence assumption of the source-filter model for the spectral and fundamental frequency F0 parameters, we propose a reliable network architecture for training acoustic parameters. Particularly, the F0 parameter suffers from high fluctuation and an extremely low number of dimensions. To relieve these problems, we introduce a context-window approach. Furthermore, we apply data augmentation to the proposed structure to overcome a lack of training data, which is a frequent issue with multi-speaker TTS systems. Experimental results confirm the superiority of the proposed algorithm over conventional ones in both single-speaker and multi-speaker TTS setups.
KSP Suggested Keywords
Data Augmentation, Fundamental Frequency, Learning-based, Network Architecture, Selection strategy, Statistical Parametric Speech Synthesis (SPSS), acoustic parameters, deep learning (DL), low-resource, parameter selection, source-filter model