ETRI-Knowledge Sharing Platform

Details

Conference: A Pitch-Synchronous Speech Analysis and Synthesis Method for DNN-SPSS System
Cited 1 time in Scopus
Authors
김진섭, 주영선, 강홍구, 장인선, 안충현, 서정일
Publication date
2016-10
Source
International Conference on Digital Signal Processing (DSP) 2016, pp.408-411
DOI
https://dx.doi.org/10.1109/ICDSP.2016.7868589
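Funded project
16MR3200, Development of Digital Captioning and Audio Description Service Technology to Improve Broadcast Accessibility for People with Visual and Hearing Impairments, 안충현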
Abstract
This paper proposes a pitch-synchronous deep neural network (DNN)-based statistical parametric speech synthesis (SPSS) system. Speech parameters are extracted from pitch-synchronous frames defined by the locations of glottal closure instants (GCIs), which significantly reduces coupling effects between vocal tract and excitation signals. As a result, the distribution of spectral parameters within the same phonetic context class becomes more uniform, which improves model trainability, especially for a small-scale DNN framework. Although the effectiveness of the pitch-synchronous approach has been proven in other applications, it is not trivial to integrate the method into typical DNN-based SPSS systems that have regularized structures, i.e., a fixed frame rate and a fixed feature dimension. In this paper, we design a new DNN-based SPSS system that pitch-synchronously trains and generates speech parameters. Objective and subjective test results verify the superiority of the proposed system over the conventional approach.
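As a rough illustration of what "pitch-synchronous frames defined by the locations of GCIs" can look like in practice, the sketch below cuts one frame per interior GCI, spanning from the previous GCI to the next one. It is not the authors' implementation: the two-period span, the Hann window, the function name `pitch_synchronous_frames`, and the toy GCI positions are all assumptions made for this example, and the GCIs are taken as given rather than detected.

```python
import math


def pitch_synchronous_frames(signal, gcis):
    """Return (center_gci, windowed_frame) pairs, one per interior GCI.

    Assumption for this sketch: each frame covers the samples from the
    previous GCI to the next GCI (about two pitch periods) and is tapered
    with a Hann window placed over that whole span.
    """
    frames = []
    for prev, cur, nxt in zip(gcis, gcis[1:], gcis[2:]):
        segment = signal[prev:nxt + 1]
        n = len(segment)
        # Symmetric Hann window over the segment; when the two periods
        # differ in length its peak is not exactly at the center GCI.
        window = [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]
        frames.append((cur, [s * w for s, w in zip(segment, window)]))
    return frames


# Toy input: a 100 Hz sine at 16 kHz and hypothetical GCI sample indices
# whose spacing shrinks, imitating a rising pitch.
fs = 16000
signal = [math.sin(2 * math.pi * 100 * t / fs) for t in range(1000)]
gcis = [0, 160, 320, 470, 620]

frames = pitch_synchronous_frames(signal, gcis)
frame_lengths = [len(frame) for _, frame in frames]
```

In this toy case the frame length changes with the local pitch period, unlike fixed-rate framing. That variability is the kind of mismatch with fixed-frame-rate, fixed-dimension DNN inputs that the abstract describes as non-trivial to integrate.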
KSP suggested keywords
Coupling effects, Deep neural network(DNN), Excitation signals, Fixed dimension, Frame rate, Speech analysis, Statistical Parametric Speech Synthesis(SPSS), Subjective test, Synchronous approach, Synthesis Method, analysis and synthesis