ETRI-Knowledge Sharing Platform


Conference Paper On-device Streaming Transformer-based End-to-End Speech Recognition
Cited 1 time in Scopus
Authors
Yoo Rhee Oh, Kiyoung Park
Issue Date
2021-08
Citation
International Speech Communication Association (INTERSPEECH) 2021, pp.1-2
Publisher
ISCA
Language
English
Type
Conference Paper
Abstract
This work is the first attempt to run streaming Transformer-based end-to-end speech recognition on embedded-scale IoT systems. Recently, there has been much research on online Transformer-based speech recognition, such as a contextual block encoder [1] and a block-wise synchronous beam search [2]. Building on these, we designed a novel fully streaming end-to-end speech recognition method using the Transformer. By efficiently using a connectionist temporal classification (CTC) network to detect symbol and sentence boundaries, we make the decoder operate in a streaming manner. Moreover, with an optimized model structure, the proposed method can be deployed on a low-power edge device such as the Raspberry Pi 4B with high accuracy and low latency. In experiments on the LibriSpeech corpus, the methods achieved word error rates of 3.76% and 9.25%, respectively. Recognition speed is measured in two ways: the real-time factor and the user-perceived latency. The system achieves 0.84 xRT and an average latency of 0.75±0.62 seconds on the Raspberry Pi 4B.
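The two speed measures named in the abstract can be illustrated with a minimal sketch. It assumes the usual definition of the real-time factor (processing time divided by audio duration, so a value below 1 means faster than real time) and treats the reported latency as a mean plus or minus a standard deviation over utterances. The function names, the sample numbers, and the choice of population standard deviation are assumptions for illustration and are not taken from the paper.

```python
from statistics import mean, pstdev


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration (assumed definition of xRT)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def latency_summary(latencies: list[float]) -> tuple[float, float]:
    """Mean and population standard deviation of per-utterance latencies.

    Whether the paper used the population or the sample standard deviation
    is not stated in the abstract; pstdev is an assumption.
    """
    if not latencies:
        raise ValueError("latencies must not be empty")
    return mean(latencies), pstdev(latencies)


if __name__ == "__main__":
    # Hypothetical measurement: 10 s of audio decoded in 8.4 s.
    rtf = real_time_factor(processing_seconds=8.4, audio_seconds=10.0)
    print(f"real-time factor: {rtf:.2f} xRT")

    # Hypothetical per-utterance latencies in seconds.
    avg, std = latency_summary([0.5, 1.0])
    print(f"latency: {avg:.2f} ± {std:.2f} s")
```

Under this reading, 0.84 xRT would mean that one second of audio takes about 0.84 seconds to process on the device.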
KSP Keywords
Beam Search, Edge devices, End to End(E2E), End-to-End Speech Recognition, High accuracy, IoT systems, Low-Power, Model structure, Optimized model, Perceived latency, Real-Time