ETRI Knowledge Sharing Platform

Frame-Level Selective Decoding Using Native and Non-native Acoustic Models for Robust Speech Recognition to Native and Non-native Speech
Authors
Yoo Rhee, Hoon Chung, Jeom-ja Kang, Yun Keun Lee
Issue Date
2012-11
Citation
International Workshop on Spoken Dialogue Systems (IWSDS) 2012, pp.269-274
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.1007/978-1-4614-8280-2_24
Abstract
This paper proposes a frame-level selective decoding method that uses both native and non-native acoustic models (AMs) for speech recognition that is robust to non-native as well as native speech. Two kinds of well-trained AMs are assumed: (a) AMs trained with native speech (native AMs) and (b) AMs trained with non-native speech (non-native AMs). A speech feature vector is decoded with the native and non-native AMs in parallel, and the more appropriate AMs are then selected based on the likelihoods of the two. The selected AMs are used to decode the next M frames of speech feature vectors, where M is a pre-defined parameter. The selection and decoding procedures are repeated until the utterance ends. Speech recognition experiments show that an automatic speech recognition (ASR) system employing the proposed method reduces the average word error rate (WER) by 16.6% and 41.3% for English spoken by Koreans and by native English speakers, respectively, compared to an ASR system employing utterance-level selective decoding.
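The selection loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the likelihood functions and per-frame features are hypothetical stand-ins (a real ASR decoder scores HMM states against a lattice, not whole frames), and `M` plays the role of the pre-defined block-length parameter.

```python
def frame_level_selective_decoding(frames, native_ll, nonnative_ll, M=10):
    """Decode `frames`, re-selecting the acoustic model every M frames.

    native_ll / nonnative_ll: hypothetical callables mapping a feature
    frame to a log-likelihood under the corresponding AM set.
    Returns one "native"/"nonnative" label per frame.
    """
    selections = []
    i = 0
    while i < len(frames):
        # Score the current frame with both AMs in parallel and select
        # the better-matching one based on likelihood.
        frame = frames[i]
        current = "native" if native_ll(frame) >= nonnative_ll(frame) else "nonnative"
        # Use the selected AMs for the next M frames, then re-select;
        # repeat until the utterance ends.
        block = frames[i:i + M]
        selections.extend([current] * len(block))
        i += M
    return selections
```

For example, with a toy scorer that favors the native AMs only early in the utterance, the loop switches models at the first selection point after the likelihoods cross, illustrating how the choice is revisited only every M frames rather than per frame or per utterance.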
KSP Keywords
Frame-level, Robust Speech Recognition, Word Error Rate, acoustic model, automatic speech recognition(ASR), decoding method, non-native speech, speech feature vectors