ETRI-Knowledge Sharing Platform

A Pitch-Synchronous Speech Analysis and Synthesis Method for DNN-SPSS System
Cited 1 time in Scopus
Authors
Jin-Seob Kim, Young-Sun Joo, Hong-Goo Kang, Inseon Jang, ChungHyun Ahn, Jeongil Seo
Issue Date
2016-10
Citation
International Conference on Digital Signal Processing (DSP) 2016, pp.408-411
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.1109/ICDSP.2016.7868589
Abstract
This paper proposes a pitch-synchronous deep neural network (DNN)-based statistical parametric speech synthesis (SPSS) system. Speech parameters are extracted from pitch-synchronous frames defined by the locations of glottal closure instants (GCIs), which significantly reduces coupling effects between the vocal tract and excitation signals. As a result, the distribution of spectral parameters within the same context of phonetic classes becomes more uniform, which improves model trainability, especially for a small-scale DNN framework. Although the effectiveness of the pitch-synchronous approach has been proven in other applications, it is not trivial to integrate the method into typical DNN-based SPSS systems, which have regularized structures, i.e., a fixed frame rate and a fixed feature dimension. In this paper, we design a new DNN-based SPSS system that trains and generates speech parameters pitch-synchronously. Objective and subjective test results verify the superiority of the proposed system over the conventional approach.
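The pitch-synchronous framing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `pitch_synchronous_frames`, the two-period frame span (previous GCI to next GCI), the symmetric Hann window, and the toy GCI positions are all assumptions made for illustration. GCI detection itself is assumed to have been done elsewhere.

```python
import math


def pitch_synchronous_frames(signal, gcis):
    """Cut one windowed frame per interior GCI (hypothetical helper).

    Assumption: each frame spans from the previous GCI to the next GCI,
    i.e. roughly two pitch periods centered on the current GCI.
    `gcis` must be strictly increasing sample indices into `signal`.
    """
    frames = []
    for prev, cur, nxt in zip(gcis, gcis[1:], gcis[2:]):
        segment = signal[prev:nxt + 1]
        n = len(segment)
        # Symmetric Hann window (an assumption; other windows are possible).
        window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
        frames.append((cur, [s * w for s, w in zip(segment, window)]))
    return frames


# Toy input: a 100 Hz sine at 16 kHz with made-up, slightly irregular GCIs.
fs = 16000
signal = [math.sin(2 * math.pi * 100 * t / fs) for t in range(1600)]
gcis = [0, 160, 320, 470, 620, 800]

frames = pitch_synchronous_frames(signal, gcis)
centers = [c for c, _ in frames]
lengths = [len(f) for _, f in frames]
print(centers)  # frame centers follow the GCIs, not a fixed hop
print(lengths)  # frame lengths vary with the local pitch period
```

In this sketch both the frame spacing and the frame length change with the local pitch period, which is the property that makes integration with a fixed-frame-rate, fixed-dimension DNN pipeline non-trivial, as the abstract notes.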