ETRI Knowledge Sharing Platform : Fast Speaker Adaptation using Extended Diagonal Linear Transformation for Deep Neural Networks

Journal Article: Fast Speaker Adaptation using Extended Diagonal Linear Transformation for Deep Neural Networks

Cited 2 times in Scopus

Abstract: This paper explores new techniques, based on a hidden-layer linear transformation, for fast speaker adaptation in deep neural networks (DNNs). Conventional methods that use affine transformations are ineffective for fast adaptation because they require a relatively large number of parameters. Methods that employ singular-value decomposition (SVD) are used instead because they are effective at reducing the number of adaptation parameters. However, a matrix decomposition is computationally expensive for online services. We propose an extended diagonal linear transformation method that minimizes the number of adaptation parameters without SVD and improves performance for tasks that require smaller degrees of adaptation. In Korean large vocabulary continuous speech recognition (LVCSR) tasks, the proposed method shows significant improvements, with error-reduction rates of 8.4% and 17.1% when adapting on five and 50 conversational sentences, respectively. Compared with adaptation methods that use SVD, the proposed method achieves higher recognition performance with fewer parameters.
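To make the parameter argument in the abstract concrete, the sketch below compares the number of speaker-dependent parameters in a full affine hidden-layer transformation with the number in a plain diagonal (element-wise) one. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not define the "extended" part of the proposed transformation, so only the plain diagonal case is shown, and the layer width and all function names are hypothetical.

```python
# Illustrative sketch only: compares the number of speaker-dependent
# parameters for a full affine vs. a plain diagonal hidden-layer transform.
# The layer width and all names are assumptions, not taken from the paper.

def affine_param_count(n):
    """Full affine transform h' = A h + b: n*n weights plus n biases."""
    return n * n + n

def diagonal_param_count(n):
    """Diagonal transform h' = a * h + b (element-wise): n scales plus n biases."""
    return 2 * n

def apply_diagonal(h, scale, bias):
    """Apply the element-wise transform to one hidden activation vector."""
    return [s * x + b for x, s, b in zip(h, scale, bias)]

n = 2048  # assumed hidden-layer width
print(affine_param_count(n))    # 4196352
print(diagonal_param_count(n))  # 4096

# Identity initialisation (scale 1, bias 0) leaves the activations unchanged,
# so adaptation can start from the speaker-independent network.
h = [0.5, -1.0, 2.0]
assert apply_diagonal(h, [1.0] * 3, [0.0] * 3) == h
```

With only a handful of adaptation sentences, the smaller parameter set is far easier to estimate reliably, which is the motivation the abstract gives for moving away from full affine transformations.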

KSP Keywords: Adaptive parameter, Affine transformation, Computationally expensive, Conventional methods, Deep neural network (DNN), Linear transformation, Matrix decomposition, Online services, Performance levels, Recognition performance, Transformation method

This work is distributed under the terms of the Korea Open Government License (KOGL)
(Type 4: Type 1 + Commercial Use Prohibition + Change Prohibition)