ETRI-Knowledge Sharing Platform

Details

Journal article: Fast Offline Transformer-based End-to-end Automatic Speech Recognition for Real-world Applications
Cited 4 times in Scopus. Downloaded 233 times.
Authors
오유리, 박기영, 박전규
Publication date
2022-06
Source
ETRI Journal, v.44 no.3, pp.476-490
ISSN
1225-6463
Publisher
Electronics and Telecommunications Research Institute (ETRI)
DOI
https://dx.doi.org/10.4218/etrij.2021-0106
Funded project
21HS3400, Development of Conversational Speech Recognition Technology for Multiple Speakers, 박전규
Abstract
With recent advances in technology, automatic speech recognition (ASR) has become widely used in real-world applications. Converting large amounts of speech into text accurately and efficiently with limited resources has become more important than ever. In this study, we propose a method to rapidly recognize a large speech database via a transformer-based end-to-end model. Transformers have improved the state-of-the-art performance in many fields; however, they are not easy to use for long sequences. We propose and test various techniques to accelerate the recognition of real-world speech, including decoding via multiple-utterance-batched beam search, detecting the end of speech based on connectionist temporal classification (CTC), restricting the CTC-prefix score, and splitting long speech into short segments. Experiments are conducted with the LibriSpeech dataset and real-world Korean ASR tasks to verify the proposed methods. In the experiments, the proposed system converts 8 h of speech spoken at real-world meetings into text in less than 3 min with a 10.73% character error rate, which is 27.1% lower in relative terms than that of conventional systems.
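The headline figures in the abstract can be unpacked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, the 3 min figure is treated as an upper bound because the abstract says "less than 3 min", and the baseline error rate is not stated in the abstract. It is inferred here under the assumption that "relatively lower" means (baseline - new) / baseline.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; lower means faster."""
    return processing_seconds / audio_seconds


def baseline_from_relative_reduction(new_rate: float, relative_reduction: float) -> float:
    """Invert new = baseline * (1 - reduction) to recover the baseline rate.

    Assumes the reduction is measured relative to the baseline.
    """
    return new_rate / (1.0 - relative_reduction)


# Figures quoted in the abstract.
audio_seconds = 8 * 3600        # 8 h of meeting speech
processing_seconds = 3 * 60     # "less than 3 min", so this is an upper bound
reported_cer = 10.73            # character error rate, in percent
relative_reduction = 0.271      # 27.1% relative reduction

# Upper bound on the real-time factor and lower bound on the speed-up.
rtf_bound = real_time_factor(processing_seconds, audio_seconds)
speedup_bound = audio_seconds / processing_seconds

# Baseline CER implied by the assumption above (not a figure from the abstract).
implied_baseline_cer = baseline_from_relative_reduction(reported_cer, relative_reduction)

print(f"real-time factor below {rtf_bound:.5f}")
print(f"speed-up above {speedup_bound:.0f}x real time")
print(f"implied baseline CER about {implied_baseline_cer:.2f}%")
```

Under these assumptions the system runs more than 160 times faster than real time, and the conventional systems it is compared against would have a character error rate of roughly 14.7%.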
KSP suggested keywords
Art performance, Beam search, End-to-end (E2E), Limited resources, Real-world applications, Speech database, Automatic speech recognition (ASR), Connectionist temporal classification (CTC), Error rate, Long sequence, State-of-the-art
This work may be used under the terms of the Korea Open Government License (KOGL) Type 4: Attribution + No Commercial Use + No Derivatives.