ETRI Knowledge Sharing Platform

Details

Journal Article: Multimodal Unsupervised Speech Translation for Recognizing and Evaluating Second Language Speech
Cited 4 times in Scopus; downloaded 87 times.
Authors
이윤경, 박전규
Publication Date
March 2021
Source
Applied Sciences, v.11 no.6, pp.1-17
ISSN
2076-3417
Publisher
MDPI
DOI
https://dx.doi.org/10.3390/app11062642
Funded Project
21HS2800, Development of semi-supervised-learning-based language intelligence technology and a Korean tutoring service for foreigners based on it, 이윤근
Abstract
This paper addresses automatic proficiency evaluation and speech recognition for second language (L2) speech. The proposed method recognizes the speech uttered by an L2 speaker, measures a variety of fluency scores, and evaluates the proficiency of the speaker's spoken English. Stress and rhythm scores are among the important factors used to evaluate fluency in spoken English; they are computed by comparing the L2 speaker's stress patterns and rhythm distributions to those of native speakers. In order to compute the stress and rhythm scores even when the phonemic sequence of the L2 speaker's English sentence differs from the native speaker's, we align the phonemic sequences using a dynamic time warping approach. We also improve the performance of the speech recognition system for non-native speakers, and compute fluency features more accurately, by augmenting the non-native training dataset and training an acoustic model with the augmented dataset. In this work, we augment the non-native speech by converting some speech signal characteristics (style) while preserving its linguistic information. The proposed variational autoencoder (VAE)-based speech conversion network trains the conversion model by decomposing the spectral features of the speech into a speaker-invariant content factor and a speaker-specific style factor, so as to estimate diverse and robust speech styles. Experimental results show that the proposed method effectively measures the fluency scores and generates diverse output signals. Moreover, in the proficiency evaluation and speech recognition tests, the proposed method improves the proficiency score performance and speech recognition accuracy for all proficiency areas compared to a method employing conventional acoustic models.
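The dynamic time warping (DTW) alignment mentioned in the abstract can be sketched as follows. This is a minimal illustrative implementation, not the paper's own code: the function name, the unit substitution cost, and the example ARPAbet-style phoneme sequences are all assumptions made for demonstration.

```python
# Sketch of DTW alignment between an L2 learner's recognized phoneme
# sequence and a native reference, so that stress/rhythm features can
# later be compared position by position. Names are illustrative only.

def dtw_align(ref, hyp, cost=lambda a, b: 0 if a == b else 1):
    """Return (total_cost, path), where path is a list of
    (ref_index, hyp_index) pairs along the optimal warping path."""
    n, m = len(ref), len(hyp)
    INF = float("inf")
    # D[i][j] = minimal cost of aligning ref[:i] with hyp[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cost(ref[i - 1], hyp[j - 1])
            D[i][j] = c + min(D[i - 1][j],      # ref phoneme stretched
                              D[i][j - 1],      # hyp phoneme stretched
                              D[i - 1][j - 1])  # match / substitution
    # Backtrack from (n, m) to recover the optimal path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, (i, j) = min((D[i - 1][j - 1], (i - 1, j - 1)),
                        (D[i - 1][j], (i - 1, j)),
                        (D[i][j - 1], (i, j - 1)))
    path.reverse()
    return D[n][m], path

# Hypothetical example: a native "AE K T ER" vs. a learner's
# insertion-heavy "AH K AH T ER".
total, alignment = dtw_align(["AE", "K", "T", "ER"],
                             ["AH", "K", "AH", "T", "ER"])
```

With the unit cost above, the alignment absorbs the learner's inserted vowel by stretching one reference phoneme, which is what lets stress and rhythm be compared even when the phonemic sequences differ in length.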
KSP Suggested Keywords
English sentence, Native speakers, Proficiency evaluation, Signal characteristics, Speech Signals, Speech recognition accuracy, Speech recognition system, Speech translation, acoustic model, conversion model, linguistic information
This work is available under the Creative Commons Attribution (CC BY) license.