ETRI-Knowledge Sharing Platform


Fast offline transformer-based end-to-end automatic speech recognition for real-world applications
Cited 4 times in Scopus; downloaded 353 times
Authors
Yoo Rhee Oh, Kiyoung Park, Jeon Gue Park
Issue Date
2022-06
Citation
ETRI Journal, v.44, no.3, pp.476-490
ISSN
1225-6463
Publisher
Electronics and Telecommunications Research Institute (ETRI)
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.4218/etrij.2021-0106
Abstract
With recent advances in technology, automatic speech recognition (ASR) has been widely used in real-world applications. Converting large amounts of speech into text accurately and efficiently with limited resources has become more vital than ever. In this study, we propose a method to rapidly recognize a large speech database using a transformer-based end-to-end model. Transformers have improved state-of-the-art performance in many fields; however, they are not easy to use for long sequences. We propose and test various techniques to accelerate the recognition of real-world speech, including decoding via multiple-utterance-batched beam search, detecting the end of speech based on connectionist temporal classification (CTC), restricting the CTC-prefix score, and splitting long speech into short segments. Experiments are conducted on the LibriSpeech dataset and on real-world Korean ASR tasks to verify the proposed methods. In the experiments, the proposed system converted 8 h of speech from real-world meetings into text in less than 3 min with a 10.73% character error rate (CER), a 27.1% relative reduction compared with conventional systems.
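The headline figures in the abstract can be sanity-checked with a few lines of arithmetic. This is a minimal back-of-envelope sketch under two assumptions. First, "27.1% relatively lower" is read as a relative CER reduction, (baseline − proposed) / baseline. Second, the real-time factor (RTF) is taken as processing time divided by audio duration. The helper names `implied_baseline_cer` and `real_time_factor` are hypothetical and not part of the authors' system. The baseline CER below is derived from the quoted numbers; it is not a value reported in the abstract.

```python
# Back-of-envelope check of the figures quoted in the abstract.
# Assumption: "27.1% relatively lower" means a relative reduction,
# i.e. (baseline_cer - proposed_cer) / baseline_cer = 0.271.
# Assumption: RTF = processing time / audio duration (same time units).


def implied_baseline_cer(proposed_cer: float, relative_reduction: float) -> float:
    """Hypothetical helper: baseline CER implied by a relative error reduction."""
    return proposed_cer / (1.0 - relative_reduction)


def real_time_factor(processing_minutes: float, audio_hours: float) -> float:
    """Hypothetical helper: processing time divided by audio duration."""
    return processing_minutes / (audio_hours * 60.0)


if __name__ == "__main__":
    baseline = implied_baseline_cer(10.73, 0.271)
    # "less than 3 min" for 8 h of audio, so this RTF is an upper bound.
    rtf_upper_bound = real_time_factor(3.0, 8.0)

    print(f"implied conventional CER: {baseline:.2f}%")
    print(f"RTF upper bound: {rtf_upper_bound:.5f}")
    print(f"speed-up over real time: at least {1 / rtf_upper_bound:.0f}x")
```

Running it prints an implied conventional-system CER of about 14.72% and an RTF bound of 0.00625. That bound corresponds to decoding at least 160 times faster than real time.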
KSP Keywords
State-of-the-art performance, Beam Search, End-to-End (E2E), Limited resources, Long sequence, Real-world applications, Speech Database, automatic speech recognition (ASR), connectionist temporal classification (CTC), error rate, state-of-the-art
This work is distributed under the terms of the Korea Open Government License (KOGL)
(Type 4: Type 1 + Commercial Use Prohibition + Change Prohibition)