ETRI-Knowledge Sharing Platform

Details

Journal article: Combining Multiple Acoustic Models in GMM Spaces for Robust Speech Recognition
Cited 5 times in Scopus. Downloaded 6 times.
Authors
Byung Ok Kang (강병옥), Oh-Wook Kwon (권오욱)
Publication date
March 2016
Source
IEICE Transactions on Information and Systems, v.E99.D no.3, pp.724-730
ISSN
1745-1361
Publisher
Japan, Institute of Electronics, Information and Communication Engineers (IEICE)
DOI
https://dx.doi.org/10.1587/transinf.2015EDP7252
Contract project
15MS9500, Development of Core Technology for Free-Speech Spoken Dialogue Processing for Language Learning, Yun-Keun Lee (이윤근)
Abstract
We propose a new method to combine multiple acoustic models in Gaussian mixture model (GMM) spaces for robust speech recognition. Although large vocabulary continuous speech recognition (LVCSR) systems have recently become widespread, they often make egregious recognition errors resulting from the unavoidable mismatch of speaking styles or environments between training and real conditions. To handle this problem, a multi-style training approach has conventionally been used to train a large acoustic model on a large speech database covering various speaking styles and kinds of environment noise. In this work, by contrast, we combine multiple sub-models trained for different speaking styles or environment noise into a large acoustic model by maximizing the log-likelihood of the sub-model states sharing the same phonetic context and position. The combined acoustic model is then used in a new target system, making it robust to variation in speaking style and to diverse environment noise. Experimental results show that the proposed method significantly outperforms the conventional methods in two tasks: non-native English speech recognition for second-language learning systems and noise-robust point-of-interest (POI) recognition for car navigation systems.
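The abstract gives no formulas, so the following is only a minimal sketch of the general idea of merging sub-model states in a GMM space, not the authors' algorithm. It assumes one-dimensional features, pools the Gaussian components of sub-model states that share the same phonetic context and position, and re-estimates only the mixture weights by EM so that the log-likelihood of some target data does not decrease. The names `combine_states`, `gmm_loglik`, and the `(weight, mean, variance)` tuple layout are hypothetical choices made for illustration.

```python
import math


def gauss_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def component_logs(x, gmm):
    """Per-component log(weight * density) for one observation."""
    return [math.log(w) + gauss_logpdf(x, m, v) for w, m, v in gmm]


def gmm_loglik(x, gmm):
    """Log-likelihood of x under a GMM given as [(weight, mean, var), ...]."""
    logs = component_logs(x, gmm)
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))


def combine_states(sub_gmms, data, n_iter=20):
    """Pool the components of several sub-model states into one GMM,
    then re-estimate only the mixture weights by EM on `data`.
    (Assumption: means and variances are kept fixed for simplicity.)"""
    k = len(sub_gmms)
    # Start from an equal-prior pooling of all sub-model components.
    pooled = [(w / k, m, v) for gmm in sub_gmms for w, m, v in gmm]
    for _ in range(n_iter):
        counts = [0.0] * len(pooled)
        for x in data:
            logs = component_logs(x, pooled)
            top = max(logs)
            post = [math.exp(l - top) for l in logs]
            total = sum(post)
            for i, p in enumerate(post):
                counts[i] += p / total  # E-step: component posteriors
        # M-step: new weights, floored to avoid log(0), then renormalized.
        raw = [max(c / len(data), 1e-12) for c in counts]
        norm = sum(raw)
        pooled = [(r / norm, m, v) for r, (_, m, v) in zip(raw, pooled)]
    return pooled


# Two hypothetical sub-model states for the same phonetic context.
clean_state = [(1.0, 0.0, 1.0)]
noisy_state = [(1.0, 5.0, 1.0)]
target_data = [0.1, -0.2, 0.3, 4.8, 5.1, 5.3, 4.9]

combined = combine_states([clean_state, noisy_state], target_data)
total_ll = sum(gmm_loglik(x, combined) for x in target_data)
```

In this toy setup the combined state keeps both the clean and the noisy component and shifts weight toward whichever one explains more of the target data. A real system would work with multivariate Gaussians over tied HMM states and would likely re-estimate more than the weights.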
KSP suggested keywords
Car navigation, Conventional methods, Gaussian mixture model (GMM), Language learning, Learning system, Log-likelihood, Point of interest, Speech database, Sub-models, Acoustic model, Environment noise