ETRI Knowledge Sharing Platform

Spoken-to-written text conversion with Large Language Model
Cited 0 times in Scopus
Authors
HyunJung Choi, Muyeol Choi, Yohan Lim, Minkyu Lee, Seonhui Kim, Seung Yun, Donghyun Kim, SangHun Kim
Issue Date
2024-09
Citation
International Speech Communication Association (INTERSPEECH) 2024, pp.2410-2414
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.21437/Interspeech.2024-376
Abstract
Improvements in end-to-end speech recognition systems have enhanced the readability of their output, making texts easier for users to understand and reducing translation errors. Korean uses both written and spoken forms, so standardizing pronunciation notation is crucial for high readability. Inverse Text Normalization (ITN) technology, which converts pronunciation into readable written form, can be applied when preprocessing training corpora or post-processing speech recognition output. Recent Korean ITN research uses transformer models trained on data containing both notations, and suffers performance degradation due to data scarcity. This paper proposes using Large Language Models for ITN to address this issue, overcoming the performance decline caused by limited data. The proposed method achieved a 12.6% Error Reduction Rate (ERR).
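To illustrate the spoken-to-written conversion task that ITN performs (this is a toy rule-based sketch for one numeral pattern, not the paper's LLM-based method), the snippet below converts spoken-form Sino-Korean numerals such as "삼백이십오" into written digits; the character tables are an assumption for illustration only:

```python
# Toy inverse text normalization (ITN) sketch: converts spoken-form
# Sino-Korean numerals to written digits. NOT the paper's LLM approach;
# it only illustrates what spoken-to-written conversion does.

SPOKEN_DIGITS = {"영": 0, "일": 1, "이": 2, "삼": 3, "사": 4,
                 "오": 5, "육": 6, "칠": 7, "팔": 8, "구": 9}
SPOKEN_UNITS = {"십": 10, "백": 100, "천": 1000}

def spoken_number_to_int(word: str) -> int:
    """Convert a spoken-form Sino-Korean numeral (e.g. '삼백이십오') to an int."""
    total, digit = 0, 0
    for ch in word:
        if ch in SPOKEN_DIGITS:
            digit = SPOKEN_DIGITS[ch]
        elif ch in SPOKEN_UNITS:
            # A bare unit ('십오' = 15) implies an understood leading 1.
            total += (digit if digit else 1) * SPOKEN_UNITS[ch]
            digit = 0
        else:
            raise ValueError(f"unexpected character: {ch}")
    return total + digit

print(spoken_number_to_int("삼백이십오"))  # 325
print(spoken_number_to_int("십오"))        # 15
```

Real ITN must also handle dates, currency, ordinals, and context-dependent ambiguity, which is where a data-driven model (or, as proposed here, an LLM) replaces hand-written rules.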
KSP Keywords
Data scarcity, End-to-End (E2E), End-to-End Speech Recognition, Error reduction, Language Model, Limited data, Post-Processing, Text normalization, Translation errors, Performance degradation, Reduction rate