ETRI-Knowledge Sharing Platform



Conference Paper A Pitch-Synchronous Speech Analysis and Synthesis Method for DNN-SPSS System
Cited 1 time in Scopus
Authors
Jin-Seob Kim, Young-Sun Joo, Hong-Goo Kang, Inseon Jang, ChungHyun Ahn, Jeongil Seo
Issue Date
2016-10
Citation
International Conference on Digital Signal Processing (DSP) 2016, pp.408-411
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.1109/ICDSP.2016.7868589
Abstract
This paper proposes a pitch-synchronous deep neural network (DNN)-based statistical parametric speech synthesis (SPSS) system. Speech parameters are extracted from pitch-synchronous frames defined by the locations of glottal closure instants (GCIs), which significantly reduces the coupling between the vocal tract and excitation signals. As a result, spectral parameters within the same phonetic context are distributed more uniformly, which improves model trainability, especially for a small-scale DNN framework. Although the effectiveness of the pitch-synchronous approach has been proven in other applications, integrating it into typical DNN-based SPSS systems is not trivial, because those systems have a regularized structure, i.e., a fixed frame rate and a fixed feature dimension. In this paper, we design a new DNN-based SPSS system that trains and generates speech parameters pitch-synchronously. Objective and subjective test results show that the proposed system outperforms the conventional approach.
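The abstract does not give the analysis details, so the following is only a minimal sketch, under our own assumptions, of the two ideas it names: cutting analysis frames at GCIs instead of at a fixed hop, and mapping each variable-length frame to a feature vector of fixed dimension. GCI locations are assumed to be given (GCI detection is not shown). The two-period Hann window and the magnitude spectrum sampled at a fixed set of frequencies are illustrative choices, not the paper's parameterization, and all names (`PitchSyncFrame`, `pitch_synchronous_analysis`, `fixed_dim_magnitude`) are hypothetical.

```python
import cmath
import math
from typing import List, NamedTuple, Sequence


class PitchSyncFrame(NamedTuple):
    center: int            # sample index of the GCI this frame is centred on
    period: int            # local pitch period in samples (distance to the next GCI)
    spectrum: List[float]  # magnitude spectrum with a fixed number of bins


def hann(n: int) -> List[float]:
    """Symmetric Hann window of length n."""
    if n < 2:
        return [1.0] * n
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def fixed_dim_magnitude(frame: Sequence[float], dim: int) -> List[float]:
    """Evaluate the DTFT magnitude of `frame` at `dim` evenly spaced frequencies
    from 0 to Nyquist, so the output length does not depend on len(frame)."""
    if dim < 2:
        raise ValueError("dim must be at least 2")
    mags = []
    for k in range(dim):
        f = 0.5 * k / (dim - 1)  # normalised frequency in cycles/sample
        acc = sum(x * cmath.exp(-2j * math.pi * f * n) for n, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def pitch_synchronous_analysis(
    signal: Sequence[float], gcis: Sequence[int], dim: int = 32
) -> List[PitchSyncFrame]:
    """Cut one two-period, Hann-windowed frame per interior GCI (from the previous
    GCI to the next one) and describe it with a fixed-length magnitude spectrum.
    Assumption: `gcis` is a sorted list of sample indices within `signal`."""
    frames = []
    for k in range(1, len(gcis) - 1):
        start, end = gcis[k - 1], gcis[k + 1]
        segment = signal[start:end]
        window = hann(len(segment))
        windowed = [s * w for s, w in zip(segment, window)]
        frames.append(
            PitchSyncFrame(
                center=gcis[k],
                period=gcis[k + 1] - gcis[k],
                spectrum=fixed_dim_magnitude(windowed, dim),
            )
        )
    return frames


if __name__ == "__main__":
    # Toy signal with hand-picked, unevenly spaced "GCIs".
    sig = [math.sin(2 * math.pi * n / 40) for n in range(400)]
    gcis = [0, 100, 180, 270, 350]
    for fr in pitch_synchronous_analysis(sig, gcis, dim=8):
        print(fr.center, fr.period, len(fr.spectrum))
```

Every frame yields a vector of the same length even though the frames span 160 to 170 samples, which addresses the fixed-dimension constraint. The frames still arrive at irregular times, which is why `center` and `period` are kept: a fixed-rate DNN would need them to resample onto a regular grid, whereas the system described above instead trains and generates parameters pitch-synchronously.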
KSP Keywords
Coupling effects, Deep neural network (DNN), Excitation signals, Fixed dimension, Glottal Closure Instants, Speech analysis, Statistical Parametric Speech Synthesis (SPSS), Subjective test, Synchronous approach, Synthesis Method, analysis and synthesis