ETRI Knowledge Sharing Platform: A Pitch-Synchronous Speech Analysis and Synthesis Method for DNN-SPSS System

Conference Paper: A Pitch-Synchronous Speech Analysis and Synthesis Method for DNN-SPSS System

Cited 1 time in Scopus

Authors: Jin-Seob Kim, Young-Sun Joo, Hong-Goo Kang, Inseon Jang, ChungHyun Ahn, Jeongil Seo

Citation: International Conference on Digital Signal Processing (DSP) 2016, pp. 408-411

Abstract: This paper proposes a pitch-synchronous deep neural network (DNN)-based statistical parametric speech synthesis (SPSS) system. Speech parameters are extracted from pitch-synchronous frames whose boundaries are defined by the locations of glottal closure instants (GCIs), which significantly reduces the coupling effects between the vocal tract and excitation signals. As a result, spectral parameters that share the same phonetic context are distributed more uniformly, which improves model trainability, especially for a small-scale DNN framework. Although the effectiveness of the pitch-synchronous approach has been demonstrated in other applications, integrating it into typical DNN-based SPSS systems is not trivial, because these systems have regularized structures, i.e., a fixed frame rate and a fixed feature dimension. In this paper, we design a new DNN-based SPSS system that trains and generates speech parameters pitch-synchronously. Objective and subjective test results verify the superiority of the proposed system over the conventional approach.
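The pitch-synchronous framing mentioned in the abstract can be illustrated with a generic sketch. It assumes the GCI sample positions are already known (for example, from a separate GCI detector, not shown) and cuts a two-period frame around each interior GCI, weighted by an asymmetric Hann window that peaks at that GCI, a common choice in pitch-synchronous processing such as PSOLA. The function names `asymmetric_hann` and `pitch_synchronous_frames` are hypothetical; this is not the authors' implementation, and the paper's actual windowing and parameterization may differ.

```python
import math
from typing import List, Sequence, Tuple


def asymmetric_hann(left: int, right: int) -> List[float]:
    """Window of length left + right + 1: rises over `left` samples,
    equals 1.0 on the centre sample, then falls over `right` samples."""
    rise = [0.5 - 0.5 * math.cos(math.pi * i / left) for i in range(left + 1)]
    fall = [0.5 + 0.5 * math.cos(math.pi * j / right) for j in range(1, right + 1)]
    return rise + fall


def pitch_synchronous_frames(
    signal: Sequence[float], gcis: Sequence[int]
) -> List[Tuple[int, List[float]]]:
    """Return (centre_gci, windowed_frame) for every interior GCI.

    Frame k spans samples gcis[k-1]..gcis[k+1] inclusive, so its length
    follows the local pitch periods instead of a fixed hop size.
    (Hypothetical helper for illustration only.)
    """
    if any(b <= a for a, b in zip(gcis, gcis[1:])):
        raise ValueError("GCI positions must be strictly increasing")
    if gcis and (gcis[0] < 0 or gcis[-1] >= len(signal)):
        raise ValueError("GCI positions must lie inside the signal")
    frames = []
    for k in range(1, len(gcis) - 1):
        start, centre, end = gcis[k - 1], gcis[k], gcis[k + 1]
        segment = signal[start : end + 1]
        window = asymmetric_hann(centre - start, end - centre)
        frames.append((centre, [s * w for s, w in zip(segment, window)]))
    return frames


if __name__ == "__main__":
    # Toy example: a constant signal with GCIs at samples 0, 4, 10 and 16.
    for centre, frame in pitch_synchronous_frames([1.0] * 20, [0, 4, 10, 16]):
        print(centre, len(frame))  # prints "4 11", then "10 13"
```

Because each frame's length follows the local pitch period, both the number of frames per second and the number of samples per frame vary with F0. This is the irregularity that, as the abstract notes, must be reconciled with the fixed frame rate and fixed feature dimension of a typical DNN-based SPSS system.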

KSP Keywords: Coupling effects, Deep neural network (DNN), Excitation signals, Fixed dimension, Glottal Closure Instants, Speech analysis, Statistical Parametric Speech Synthesis (SPSS), Subjective test, Synchronous approach, Synthesis Method, analysis and synthesis

