ETRI Knowledge Sharing Platform: Semi-supervised Training for Sequence-to-Sequence Speech Recognition Using Reinforcement Learning

Conference Paper: Semi-supervised Training for Sequence-to-Sequence Speech Recognition Using Reinforcement Learning

Cited 10 times in Scopus

Citation: International Joint Conference on Neural Networks (IJCNN) 2020, pp.1-6

Abstract: This paper proposes a reinforcement learning based semi-supervised training approach for sequence-to-sequence automatic speech recognition (ASR) systems. Most recent semi-supervised training approaches are based on multiple loss functions, such as a cross-entropy loss for paired speech-text data and a reconstruction loss for unpaired speech-text data. Although these approaches show promising results, some issues remain: (a) different loss functions are used separately for paired and unpaired data, even though both serve the same purpose of improving classification accuracy, and (b) several methods need auxiliary networks that increase the complexity of the semi-supervised training process. To address these issues, a reinforcement learning based approach is proposed. The proposed approach rewards the ASR model for generating more correct sentences for both paired and unpaired speech data. The approach is evaluated on the Wall Street Journal task domain. The experimental results show that the proposed method is effective, reducing the character error rate from 10.4% to 8.7%.
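The abstract says only that the ASR model is rewarded for generating more correct sentences; it does not state the reward function, how a reward is obtained for unpaired speech, or which policy-gradient estimator is used. As a rough illustration of the general idea, the sketch below shows a generic REINFORCE loss with a sample-mean baseline, where the reward is assumed to be 1 - CER against a reference transcript. That reward needs a reference, so it only covers paired data, and it is not necessarily the paper's objective. All function names (`edit_distance`, `cer`, `reinforce_loss`) are hypothetical.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    return edit_distance(list(ref), list(hyp)) / max(len(ref), 1)


def reinforce_loss(ref: str, hyps: List[str], log_probs: List[float]) -> float:
    """REINFORCE loss with a sample-mean baseline for one utterance.

    hyps: transcripts sampled from the ASR model.
    log_probs: the model's log-probability of each sampled transcript.
    Assumed reward: 1 - CER against the reference (paired data only).
    """
    rewards = [1.0 - cer(ref, h) for h in hyps]
    baseline = sum(rewards) / len(rewards)
    # Minimizing this loss raises the log-probability of samples that score
    # above the baseline and lowers it for samples that score below it.
    return -sum((r - baseline) * lp
                for r, lp in zip(rewards, log_probs)) / len(hyps)


loss = reinforce_loss("the cat", ["the cat", "the bat"], [-1.0, -2.0])
print(loss)  # -1/28, about -0.0357
```

In a real training loop, `log_probs` would be autograd tensors produced by the sequence-to-sequence decoder, so the loss could be backpropagated. The rewards are treated as constants. Subtracting the baseline does not bias the gradient, but it reduces its variance.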

KSP Keywords: Auxiliary networks, Cross entropy, Entropy loss, Paired data, Reinforcement learning(RL), Speech-To-Text(STT), Wall Street, accuracy improvement, automatic speech recognition(ASR), classification accuracy, error rate

218 Gajeong-ro, Yuseong-gu, Daejeon, 34129, KOREA, Contact: sh.kim@etri.re.kr