ETRI-Knowledge Sharing Platform

Journal Article Joint streaming model for backchannel prediction and automatic speech recognition
Authors
Yong-Seok Choi, Jeong-Uk Bang, Seung Hi Kim
Issue Date
2024-02
Citation
ETRI Journal, v.46, no.1, pp.118-126
ISSN
1225-6463
Publisher
Electronics and Telecommunications Research Institute (ETRI)
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.4218/etrij.2023-0358
Abstract
In human conversations, listeners often utilize brief backchannels such as “uh‐huh” or “yeah.” Timely backchannels are crucial to understanding and increasing trust among conversational partners. In human–machine conversation systems, users can engage in natural conversations when a conversational agent generates backchannels like a human listener. We propose a method that simultaneously predicts backchannels and recognizes speech in real time. We use a streaming transformer and adopt multitask learning for concurrent backchannel prediction and speech recognition. The experimental results demonstrate the superior performance of our method compared with previous works while maintaining a similar single‐task speech recognition performance. Owing to the extremely imbalanced training data distribution, the single‐task backchannel prediction model fails to predict any of the backchannel categories, and the proposed multitask approach substantially enhances the backchannel prediction performance. Notably, in the streaming prediction scenario, the performance of backchannel prediction improves by up to 18.7% compared with existing methods.
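The multitask setup described in the abstract can be illustrated with a minimal sketch of a joint training objective. This is not the authors' implementation: the abstract does not state how the two losses are combined, so the weighted sum, the `bc_weight` parameter, the three-class label layout, and all function names below are assumptions made for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability of the target class for one frame."""
    return -math.log(softmax(logits)[target])


def joint_loss(asr_loss, bc_logits, bc_targets, bc_weight=0.5):
    """Hypothetical multitask objective: ASR loss plus a weighted
    mean backchannel classification loss over the frames.

    asr_loss   -- scalar loss from the speech recognition branch (assumed given)
    bc_logits  -- per-frame logits over backchannel classes
    bc_targets -- per-frame class indices (0 = no backchannel, by assumption)
    bc_weight  -- assumed interpolation weight, not taken from the paper
    """
    frame_losses = [cross_entropy(l, t) for l, t in zip(bc_logits, bc_targets)]
    bc_loss = sum(frame_losses) / len(frame_losses)
    return asr_loss + bc_weight * bc_loss


if __name__ == "__main__":
    # Two frames, three assumed classes; most frames carry no backchannel,
    # which mirrors the imbalance the abstract mentions.
    logits = [[2.0, 0.1, -1.0], [0.3, 1.5, 0.2]]
    targets = [0, 1]
    print(joint_loss(asr_loss=1.8, bc_logits=logits, bc_targets=targets))
```

Because both terms are backpropagated through a shared encoder in a setup like this, the speech recognition signal can shape the representation even when backchannel labels are rare. That is one plausible reading of why the abstract reports the multitask model succeeding where the single-task backchannel model fails.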
KSP Keywords
Conversational agents, Data distribution, Natural conversations, Real-time, Recognition performance, Streaming model, Automatic speech recognition (ASR), Multi-task learning, Prediction model, Prediction performance, Superior performance
This work is distributed under the terms of the Korea Open Government License (KOGL)
(Type 4: Type 1 + Commercial Use Prohibition + Change Prohibition)