ETRI Knowledge Sharing Platform

An empirical study on semi-supervised transfer learning schemes for out-of-domain application of wav2vec 2.0
Authors
Yoonhyung Kim, Hyeong Bae Jeon, Byung Ok Kang, Hoon Chung
Issue Date
2022-10
Citation
International Congress on Acoustics (ICA) 2022, pp.1-6
Language
English
Type
Conference Paper
Abstract
Pre-trained speech models such as wav2vec 2.0 show decent performance in in-domain transfer learning scenarios. However, those models are not robust to domain discrepancy between the pre-training and fine-tuning corpora. Pre-training a new in-domain model would be a naive solution, but it requires a huge amount of speech data and imposes a heavy computational burden. Thus, how to conduct transfer learning with off-the-shelf pre-trained models and restricted amounts of out-of-domain data becomes a crucial issue for enhancing the usability of pre-trained models. Based on this motivation, in this paper we present an extensive comparative study on out-of-domain and resource-scarce (i.e., semi-supervised) fine-tuning setups using the wav2vec 2.0 model. In addition, we present self-training results for a small in-domain corpus and a large out-of-domain corpus. We consider three native-to-nonnative (i.e., pre-train-to-fine-tune) corpora for the automatic speech recognition (ASR) task: English spoken by Korean, Japanese, and Indian speakers. Comparative evaluation results show that ASR accuracy with 100 hours of in-domain data (10 hours labeled) is better than with 60k hours of out-of-domain data (960 hours labeled). Our experimental results should serve as a useful benchmark for researchers who are interested in utilizing pre-trained speech models in practice.
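The abstract describes two ingredients: CTC fine-tuning of an off-the-shelf wav2vec 2.0 model on a small labeled out-of-domain (non-native English) set, and self-training on the remaining unlabeled audio. The following is a minimal sketch of both steps, assuming the HuggingFace transformers implementation of wav2vec 2.0; the checkpoint name and the single-utterance interface are illustrative only and do not reproduce the authors' actual training code.

```python
# Sketch of CTC fine-tuning and pseudo-labeling with an off-the-shelf
# wav2vec 2.0 checkpoint (HuggingFace transformers); illustrative only.
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
model.freeze_feature_encoder()  # keep the convolutional feature encoder frozen

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def fine_tune_step(waveform, transcript):
    """One CTC fine-tuning step on a single labeled 16 kHz utterance."""
    inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")
    # This checkpoint's character vocabulary is uppercase.
    labels = processor.tokenizer(transcript.upper(), return_tensors="pt").input_ids
    loss = model(input_values=inputs.input_values, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

def pseudo_label(waveform):
    """Greedy CTC decoding of an unlabeled utterance; in self-training the
    returned transcript would be fed back into fine_tune_step."""
    inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(input_values=inputs.input_values).logits
    pred_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(pred_ids)[0]
```

In the resource-scarce setup compared in the paper, the labeled in-domain portion is on the order of 10 hours, with the remaining audio pseudo-labeled and reused for further fine-tuning.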
KSP Keywords
Comparative Evaluation, Domain transfer, Empirical study, Learning scenarios, Off-the-shelf, Pre-Training, Pre-trained model, Semi-supervised, Speech corpus, Training results, Transfer learning