ETRI-Knowledge Sharing Platform


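Conference Paper: Improving English Speech Recognition for Korean Children through Transfer Learning on Speech Log Data (음성 로그데이터 기반 전이학습을 통한 한국인 아동 영어 음성인식 성능 향상)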
Authors
김윤형, 강병옥, 송화전
Issue Date
2023-11
Citation
Institute of Electronics and Information Engineers (대한전자공학회) Fall Conference 2023, pp. 552-555
Publisher
Institute of Electronics and Information Engineers (대한전자공학회)
Language
Korean
Type
Conference Paper
Abstract
One of the major drawbacks of data-driven end-to-end automatic speech recognition (ASR) models is their inherent vulnerability to domain shift, such as differences in fluency and accent. Transfer learning on human-labeled data from the target domain could be a solution, but manual labeling is time-consuming and incurs extra cost. In this paper, we propose a transfer learning curriculum that utilizes unlabeled speech log data acquired by our application service. The log data are English utterances spoken by Korean children, whose speaking style differs from that of native English speakers in fluency and accent. First, we assign pseudo labels to the log data. Second, we select and balance the pseudo-labeled dataset to avoid overfitting. Experimental results show that our approach reduces ASR errors by 9.2% compared with our baseline model, surpassing the state-of-the-art pretrained ASR model.
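The two steps named in the abstract, pseudo labeling followed by selection and balancing, can be illustrated with a minimal sketch. The abstract does not say how utterances are selected or balanced, so everything here is an assumption made for illustration: the `PseudoLabeled` record, the `confidence` score, the `min_confidence` threshold, and the per-transcript cap `max_per_text` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PseudoLabeled:
    utt_id: str        # identifier of a logged utterance
    text: str          # pseudo label produced by an existing ASR model
    confidence: float  # hypothetical score in [0, 1]; not defined in the abstract


def select_and_balance(items, min_confidence=0.9, max_per_text=2):
    """Keep confident pseudo labels, then cap how often each transcript appears."""
    # Selection (assumed criterion): drop utterances whose label is likely wrong.
    confident = [it for it in items if it.confidence >= min_confidence]

    # Balancing (assumed criterion): group by transcript and keep only the most
    # confident few per group, so repeated sentences do not dominate fine-tuning.
    by_text = defaultdict(list)
    for it in confident:
        by_text[it.text].append(it)

    balanced = []
    for text in sorted(by_text):
        group = sorted(by_text[text], key=lambda it: (-it.confidence, it.utt_id))
        balanced.extend(group[:max_per_text])
    return balanced


if __name__ == "__main__":
    log = [
        PseudoLabeled("a", "hello", 0.95),
        PseudoLabeled("b", "hello", 0.99),
        PseudoLabeled("c", "hello", 0.92),
        PseudoLabeled("d", "good morning", 0.91),
        PseudoLabeled("e", "thank you", 0.50),
    ]
    for item in select_and_balance(log):
        print(item.utt_id, item.text, item.confidence)
```

The retained subset would then serve as training data for fine-tuning the baseline ASR model. Capping each transcript is only one possible way to limit overfitting to frequently logged sentences.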
KSP Keywords
Baseline model, Data-driven, End-to-end (E2E), Log data, Pseudo labels, Target domain, Transfer learning, Application services, Automatic speech recognition (ASR), Labeled data, Speaking style