ETRI-Knowledge Sharing Platform



Category: SCI


Journal article: Rank-weighted Reconstruction Feature for a Robust Deep Neural Network-based Acoustic Model
Cited 3 times in Scopus. Downloaded 9 times.
정훈, 박전규, 정호영
ETRI Journal, v.41 no.2, pp.235-241
Electronics and Telecommunications Research Institute (ETRI)
18ZS1100, Core Technology Research for Self-Improving Artificial Intelligence System, 이윤근
In this paper, we propose a rank-weighted reconstruction feature to improve the robustness of a feed-forward deep neural network (FFDNN)-based acoustic model. In an FFDNN-based acoustic model, an input feature is constructed by vectorizing a submatrix that is created by slicing the feature vectors of the frames within a context window. In this type of feature construction, the context window size is important because it determines how much trivial information (such as redundancy) and how much discriminative information (such as temporal context) the input features carry. However, we questioned whether a single parameter is sufficient to control the quantity of information. Therefore, we investigated input feature construction from the perspectives of rank and nullity, and we propose a rank-weighted reconstruction feature that retains speech information components and reduces trivial components. The proposed method was evaluated on the TIMIT phone recognition and Wall Street Journal (WSJ) domains. It reduced the phone error rate on TIMIT from 18.4% to 18.0% and the word error rate on WSJ from 4.70% to 4.43%.
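The feature construction described in the abstract can be illustrated with a minimal NumPy sketch. It shows context-window slicing, vectorization of the resulting submatrix, and the rank and nullity of that submatrix. The function names (splice_frames, rank_and_nullity, weighted_reconstruction), the edge padding, and the per-component weights are all assumptions made for illustration. In particular, the SVD-based reweighting is a generic stand-in and is not the rank-weighting formula from the paper, which the abstract does not state.

```python
import numpy as np


def splice_frames(features, left, right):
    """Slice a context window around every frame and vectorize it.

    features: array of shape (num_frames, feat_dim).
    Edge frames are padded by repeating the first/last frame
    (an assumption; the abstract does not specify edge handling).
    Returns (submatrices, spliced):
      submatrices: (num_frames, left + right + 1, feat_dim)
      spliced:     (num_frames, (left + right + 1) * feat_dim)
    """
    features = np.asarray(features, dtype=float)
    padded = np.pad(features, ((left, right), (0, 0)), mode="edge")
    width = left + right + 1
    num_frames = features.shape[0]
    submatrices = np.stack([padded[t:t + width] for t in range(num_frames)])
    return submatrices, submatrices.reshape(num_frames, -1)


def rank_and_nullity(submatrix):
    """Numerical rank of the submatrix and its nullity (columns - rank)."""
    rank = int(np.linalg.matrix_rank(submatrix))
    return rank, submatrix.shape[1] - rank


def weighted_reconstruction(submatrix, weights):
    """Rebuild the submatrix from its SVD with reweighted components.

    weights: one hypothetical scale factor per singular component.
    This is a generic illustration, NOT the paper's weighting scheme.
    """
    u, s, vt = np.linalg.svd(submatrix, full_matrices=False)
    return (u * (s * np.asarray(weights, dtype=float))) @ vt


# Toy input: 5 frames of 4-dimensional features, one frame of context per side.
features = np.arange(20, dtype=float).reshape(5, 4)
subs, spliced = splice_frames(features, left=1, right=1)
rank0, nullity0 = rank_and_nullity(subs[0])

# Keep the strongest component, attenuate the others (arbitrary weights).
reweighted = weighted_reconstruction(subs[2], weights=[1.0, 0.5, 0.1])
print(subs.shape, spliced.shape, rank0, nullity0)
```

With all weights set to 1.0, weighted_reconstruction returns the original submatrix, so the weights act only as a knob for attenuating selected components. A wider context window adds rows to each submatrix without necessarily raising its rank, which is one way to read the abstract's point that window size alone may not control how much information the input carries.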
KSP Suggested Keywords
Deep neural network (DNN), Discriminative information, Feature Vector, Feed-forward Deep Neural Network, Input features, Phone Error Rate, Single parameter, Speech information, Wall Street, Window Size, Acoustic model