ETRI-Knowledge Sharing Platform




Details

Journal article: Improving Transformer-Based Speech Recognition Performance Using Data Augmentation by Local Frame Rate Changes
Cited 0 times in Scopus; downloaded 99 times
Authors
임성수, 강병옥, 권오욱
Publication date
March 2022
Source
The Journal of the Acoustical Society of Korea, v.41 no.2, pp.122-129
ISSN
2287-3775
Publisher
The Acoustical Society of Korea
DOI
https://dx.doi.org/10.7776/ASK.2022.41.2.122
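Funded project
21HS2800, Development of core technology for semi-supervised language intelligence and a Korean-language tutoring service for foreigners based on it, 이윤근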
Abstract
In this paper, we propose a method to improve the performance of Transformer-based speech recognizers using data augmentation that locally changes the frame rate. First, the start time and length of the segment to be augmented are randomly selected from the original speech data. Then, the frame rate of the selected segment is changed to a new frame rate using linear interpolation. Experiments on the Wall Street Journal and LibriSpeech speech databases showed that, although the proposed method took longer to converge than the baseline, it improved recognition accuracy in most cases. To further improve performance, parameters such as the length and speed of the selected segments were optimized. The proposed method achieved relative performance improvements of 11.8% and 14.9% over the baseline on the Wall Street Journal and LibriSpeech databases, respectively.
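The augmentation steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the augmentation operates on a sequence of acoustic feature frames (for example, log-mel filterbank vectors) rather than on the raw waveform, warps exactly one segment per utterance, and uses hypothetical function names, parameter names, and default ranges (`resample_frames`, `local_frame_rate_augment`, `max_seg_len`, `min_rate`, `max_rate`) that do not come from the paper.

```python
import random
from typing import List, Sequence

# A feature frame, e.g. one log-mel filterbank vector (assumed representation).
Frame = List[float]


def resample_frames(frames: Sequence[Sequence[float]], new_len: int) -> List[Frame]:
    """Linearly interpolate a sequence of frames onto `new_len` evenly spaced points.

    The first and last output frames coincide with the first and last input frames.
    """
    old_len = len(frames)
    if old_len == 0 or new_len <= 0:
        return []
    if old_len == 1 or new_len == 1:
        # Nothing to interpolate between: repeat the first frame.
        return [list(frames[0]) for _ in range(new_len)]
    out: List[Frame] = []
    for i in range(new_len):
        # Fractional index into the input; integer product first keeps endpoints exact.
        pos = i * (old_len - 1) / (new_len - 1)
        lo = int(pos)                  # left neighbour (pos >= 0, so int() floors)
        hi = min(lo + 1, old_len - 1)  # right neighbour, clamped at the last frame
        w = pos - lo                   # weight given to the right neighbour
        out.append([(1.0 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def local_frame_rate_augment(
    frames: Sequence[Sequence[float]],
    rng: random.Random,
    max_seg_len: int = 50,   # hypothetical upper bound on segment length (frames)
    min_rate: float = 0.8,   # hypothetical speed range; a rate > 1 speeds the segment up
    max_rate: float = 1.2,
) -> List[Frame]:
    """Change the frame rate of one randomly chosen segment (illustrative sketch)."""
    total = len(frames)
    if total < 2:
        return [list(f) for f in frames]
    # 1. Randomly choose the segment length and its start time.
    seg_len = rng.randint(2, max(2, min(max_seg_len, total)))
    start = rng.randint(0, total - seg_len)
    # 2. Pick a new rate and resample the segment by linear interpolation.
    rate = rng.uniform(min_rate, max_rate)
    new_len = max(1, round(seg_len / rate))
    warped = resample_frames(frames[start:start + seg_len], new_len)
    # 3. Splice the warped segment back between the untouched frames.
    head = [list(f) for f in frames[:start]]
    tail = [list(f) for f in frames[start + seg_len:]]
    return head + warped + tail
```

For example, `local_frame_rate_augment(utterance, random.Random(0))` returns a new frame list in which one segment has been stretched or compressed. A rate above 1 shortens the segment (faster speech) and a rate below 1 lengthens it, so the utterance length changes while frames outside the segment are copied unchanged. Because the warped segment keeps its original first and last frames, the splice introduces no new discontinuity at the segment boundaries.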
KSP suggested keywords
Data Augmentation, Frame rate, Recognition Accuracy, Relative Performance, Voice Data, Wall Street, convergence time, linear interpolation, performance improvement, speech recognizers, transformer-based
This work may be used under the terms of the Creative Commons Attribution-NonCommercial (CC BY-NC) license.