ETRI-Knowledge Sharing Platform





Conference Paper: Improving End-to-End Speech Translation Model with BERT-Based Contextual Information
Cited 1 time in Scopus. Downloaded 12 times.
방정욱, 이민규, 윤승, 김상훈
International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022, pp. 6227-6231
21ZS1100, Core Technology Research for Self-Improving Integrated Artificial Intelligence System, 송화전
This paper proposes an end-to-end speech translation system that utilizes contextual information. Contextual information helps clarify the meaning of utterances. However, conventional end-to-end speech translation (E2E-ST) is primarily designed to handle a single utterance at a time. Thus, we introduce a context encoder that extracts contextual information from previous translation results. The context encoder obtains high-quality contextual information by adopting the BERT model. We then combine it with speech information extracted from speech signals to generate translation results. On the widely used TED-based speech translation corpus, we show that the contextual E2E-ST model performs significantly better than the single-utterance E2E-ST model. Furthermore, we demonstrate that contextual information helps with unclearly spoken utterances as well as with ambiguity caused by pronouns and homophones.
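The abstract does not say how the context encoder output is combined with the speech information, so the following is only a minimal sketch of one plausible scheme, not the authors' method. It assumes the two sets of encoder states share a dimension and are concatenated along the time axis so that a decoder query can attend over both. The names `fuse` and `attend` and the toy vectors are hypothetical stand-ins for a real speech encoder and a BERT-based context encoder.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, memory):
    """Scaled dot-product attention of one query vector over memory vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, vec)) / scale for vec in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    summary = [sum(w * vec[i] for w, vec in zip(weights, memory)) for i in range(dim)]
    return summary, weights


def fuse(speech_states, context_states):
    """Assumed fusion: concatenate along the time axis (hypothetical choice)."""
    return speech_states + context_states


# Toy stand-ins: real states would come from a speech encoder and from
# BERT applied to the previous translation result.
speech_states = [[1.0, 0.0], [0.0, 1.0]]
context_states = [[1.0, 1.0]]

memory = fuse(speech_states, context_states)
decoder_query = [1.0, 1.0]
summary, weights = attend(decoder_query, memory)
```

In this toy setup the decoder query places some of its attention weight on the context state, which illustrates how information from a previous translation could influence the current output. Other fusion schemes, such as gating or separate cross-attention layers, would fit the abstract equally well.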
KSP Suggested Keywords
Contextual information, End-to-End (E2E), High-quality, Speech signals, Speech information, Speech translation, Translation model, Translation system