ETRI Knowledge Sharing Platform

Bidirectional Spoken-Written Text Conversion with Large Language Models
Authors
Muyeol Choi, HyunJung Choi, Yohan Lim, Jeonguk Bang, Minkyu Lee, Seonhui Kim, Seung Yun, Donghyun Kim, Minsoo Kim, SangHun Kim
Issue Date
2025-08
Citation
International Speech Communication Association (INTERSPEECH) 2025, pp.5088-5092
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.21437/Interspeech.2025-1610
Abstract
Traditional ASR systems normalize transcriptions into spoken form for training, which leads to spoken-form outputs. In contrast, modern Transformer-based models map speech directly to text, preserving the written form. However, existing speech databases mainly contain spoken-form transcriptions, causing inconsistencies in recognition results. Addressing this requires Inverse Text Normalization (ITN), and dual transcription data is crucial for effective training; constructing such datasets, however, is costly and time-consuming. This study proposes a data augmentation method that leverages LLMs to automatically generate dual transcription data with minimal effort, employing iterative learning to expand the dataset through supervised and semi-supervised methods. Additionally, the bidirectional text conversion (BTC) model supports both ITN and TN within a unified framework. Experimental results show that the proposed method achieved ERRs of 13.4% and 4.7%, outperforming prior approaches.
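
To make the TN/ITN distinction concrete, below is a minimal Python sketch that converts between spoken and written forms with a hand-written substitution table. The rule pairs, the function names itn and tn, and the examples are hypothetical illustrations only; the paper's BTC model performs both directions with an LLM in a unified framework, not with rules like these.

# Toy illustration of Text Normalization (TN) and Inverse Text
# Normalization (ITN). Hypothetical rule table; not the paper's method.

# (spoken form, written form) pairs
PAIRS = [
    ("twenty twenty five", "2025"),
    ("three point one four", "3.14"),
    ("fifty percent", "50%"),
]

def itn(spoken: str) -> str:
    # ITN: spoken form -> written form (the direction ASR output needs)
    for s, w in PAIRS:
        spoken = spoken.replace(s, w)
    return spoken

def tn(written: str) -> str:
    # TN: written form -> spoken form (e.g., for TTS-style front ends)
    for s, w in PAIRS:
        written = written.replace(w, s)
    return written

print(itn("the model reached fifty percent by twenty twenty five"))
# -> "the model reached 50% by 2025"
print(tn("pi is roughly 3.14"))
# -> "pi is roughly three point one four"

A rule table like this breaks down quickly on context-dependent cases (dates vs. quantities, abbreviations), which is the motivation the abstract gives for training a model on dual transcription data instead.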
KSP Keywords
ASR Systems, Augmentation method, Data Augmentation, Effective training, Semi-supervised, Speech-to-Text (STT), Text normalization, iterative learning, language models, supervised method, transformer-based