ETRI-Knowledge Sharing Platform


Improving Transformer-Based Speech Recognition Performance Using Data Augmentation by Local Frame Rate Changes
Cited 0 times in Scopus; downloaded 149 times
Authors
임성수, 강병옥, 권오욱
Issue Date
2022-03
Citation
The Journal of the Acoustical Society of Korea, v.41, no.2, pp.122-129
ISSN
2287-3775
Publisher
The Acoustical Society of Korea
Language
Korean
Type
Journal Article
DOI
https://dx.doi.org/10.7776/ASK.2022.41.2.122
Abstract
In this paper, we propose a method to improve the performance of Transformer-based speech recognizers using data augmentation that locally adjusts the frame rate. First, the start time and length of the segment to be augmented are randomly selected from the original speech data. Then, the frame rate of the selected segment is changed to a new frame rate using linear interpolation. Experiments on the Wall Street Journal and LibriSpeech speech databases showed that training took longer to converge than the baseline, but recognition accuracy improved in most cases. To further improve performance, various parameters, such as the length and speed of the selected segments, were optimized. The proposed method achieved relative performance improvements of 11.8% and 14.9% over the baseline on the Wall Street Journal and LibriSpeech speech databases, respectively.
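The abstract describes the augmentation in two steps: pick a random start time and length, then resample that span to a new frame rate with linear interpolation. The following is a minimal Python sketch of that idea, not the authors' implementation. It assumes the input is a sequence of per-frame feature vectors (for example, filterbank frames) and that "speed" is the ratio of original to new frame count in the span. The function names `resample_linear` and `local_frame_rate_augment`, the 20% maximum segment length, and the 0.8 to 1.2 speed range are hypothetical placeholders, not the values tuned in the paper.

```python
import random
from typing import List, Optional, Tuple

Frame = List[float]


def resample_linear(segment: List[Frame], new_len: int) -> List[Frame]:
    """Resample a sequence of frames to new_len frames by linear interpolation along time."""
    old_len = len(segment)
    if old_len == 0 or new_len <= 0:
        return []
    if old_len == 1:
        return [list(segment[0]) for _ in range(new_len)]
    if new_len == 1:
        return [list(segment[0])]
    out: List[Frame] = []
    for j in range(new_len):
        # Map output index j onto the continuous input time axis [0, old_len - 1].
        t = j * (old_len - 1) / (new_len - 1)
        i = int(t)
        if i >= old_len - 1:
            out.append(list(segment[-1]))  # last output frame equals last input frame
            continue
        w = t - i  # fractional position between input frames i and i + 1
        a, b = segment[i], segment[i + 1]
        out.append([(1.0 - w) * x + w * y for x, y in zip(a, b)])
    return out


def local_frame_rate_augment(
    frames: List[Frame],
    max_len_ratio: float = 0.2,                   # hypothetical: max span as a fraction of the utterance
    speed_range: Tuple[float, float] = (0.8, 1.2),  # hypothetical: speed > 1 shortens, < 1 lengthens
    rng: Optional[random.Random] = None,
) -> List[Frame]:
    """Change the frame rate of one randomly chosen span; frames outside it keep their timing."""
    rng = rng or random.Random()
    n = len(frames)
    if n < 2:
        return [list(f) for f in frames]
    # Step 1: randomly select the length and start time of the span to augment.
    seg_len = min(n, rng.randint(2, max(2, int(n * max_len_ratio))))
    start = rng.randint(0, n - seg_len)
    # Step 2: pick a new rate and resample the span to the corresponding frame count.
    speed = rng.uniform(*speed_range)
    new_len = max(1, round(seg_len / speed))
    segment = frames[start:start + seg_len]
    return frames[:start] + resample_linear(segment, new_len) + frames[start + seg_len:]
```

Because only one span is resampled, the rest of the utterance keeps its original timing, which is what distinguishes this from perturbing the speed of the whole utterance. The sketch leaves the transcript untouched, so it fits sequence-level targets that need no frame-level alignment.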
KSP Keywords
Data Augmentation, Frame rate, Recognition Accuracy, Relative Performance, Voice Data, Wall Street, convergence time, linear interpolation, performance improvement, speech recognizers, transformer-based
This work is distributed under the terms of the Creative Commons License (CC BY-NC).