ETRI Knowledge Sharing Platform

Spoken-to-written text conversion with Large Language Model
Cited 0 times in Scopus
Authors
HyunJung Choi, Muyeol Choi, Yohan Lim, Minkyu Lee, Seonhui Kim, Seung Yun, Donghyun Kim, SangHun Kim
Issue Date
2024-09
Citation
International Speech Communication Association (INTERSPEECH) 2024, pp.2410-2414
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.21437/Interspeech.2024-376
Abstract
Improvements in end-to-end speech recognition systems have enhanced the readability of their output, making texts easier for users to understand and reducing translation errors. Korean uses both written and spoken forms, so standardizing pronunciation notation is crucial for high readability. Inverse Text Normalization (ITN) technology, which converts pronunciation into readable written form, can be applied when preprocessing training corpora or post-processing speech recognition output. Recent Korean ITN research uses transformer models trained on data containing both notations, and suffers performance degradation due to data scarcity. This paper proposes using Large Language Models for ITN to address this issue, overcoming the performance decline caused by limited data. The proposed method achieved a 12.6% Error Reduction Rate (ERR).
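To illustrate the spoken-to-written conversion task that ITN performs (this is a toy rule-based sketch for one numeral pattern, not the paper's LLM-based method), the snippet below converts spoken-form Sino-Korean numerals such as "삼백이십오" into written digits; the character tables are an assumption for illustration only:

```python
# Toy inverse text normalization (ITN) sketch: converts spoken-form
# Sino-Korean numerals to written digits. NOT the paper's LLM approach;
# it only illustrates what spoken-to-written conversion does.

SPOKEN_DIGITS = {"영": 0, "일": 1, "이": 2, "삼": 3, "사": 4,
                 "오": 5, "육": 6, "칠": 7, "팔": 8, "구": 9}
SPOKEN_UNITS = {"십": 10, "백": 100, "천": 1000}

def spoken_number_to_int(word: str) -> int:
    """Convert a spoken-form Sino-Korean numeral (e.g. '삼백이십오') to an int."""
    total, digit = 0, 0
    for ch in word:
        if ch in SPOKEN_DIGITS:
            digit = SPOKEN_DIGITS[ch]
        elif ch in SPOKEN_UNITS:
            # A bare unit ('십오' = 15) implies an understood leading 1.
            total += (digit if digit else 1) * SPOKEN_UNITS[ch]
            digit = 0
        else:
            raise ValueError(f"unexpected character: {ch}")
    return total + digit

print(spoken_number_to_int("삼백이십오"))  # 325
print(spoken_number_to_int("십오"))        # 15
```

Real ITN must also handle dates, currency, ordinals, and context-dependent ambiguity, which is where a data-driven model (or, as proposed here, an LLM) replaces hand-written rules.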
KSP Keywords
Data scarcity, End-to-End (E2E), End-to-End Speech Recognition, Error reduction, Language Model, Limited data, Post-Processing, Text normalization, Translation errors, Performance degradation, Reduction rate