ETRI Knowledge Sharing Platform

Multimodal Unsupervised Speech Translation for Recognizing and Evaluating Second Language Speech
Cited 4 times in Scopus
Authors
Yun Kyung Lee, Jeon Gue Park
Issue Date
2021-03
Citation
Applied Sciences, v.11, no.6, pp.1-17
ISSN
2076-3417
Publisher
MDPI
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.3390/app11062642
Abstract
This paper addresses automatic proficiency evaluation and speech recognition for second language (L2) speech. The proposed method recognizes the speech uttered by the L2 speaker, measures a variety of fluency scores, and evaluates the proficiency of the speaker's spoken English. Stress and rhythm scores are among the important factors used to evaluate fluency in spoken English and are computed by comparing the stress patterns and the rhythm distributions to those of native speakers. In order to compute the stress and rhythm scores even when the phonemic sequence of the L2 speaker's English sentence differs from that of the native speaker, we align the phonemic sequences based on a dynamic time-warping approach. We also improve the performance of the speech recognition system for non-native speakers and compute fluency features more accurately by augmenting the non-native training dataset and training an acoustic model with the augmented dataset. In this work, we augment the non-native speech by converting some speech signal characteristics (style) while preserving its linguistic information. The proposed variational autoencoder (VAE)-based speech conversion network trains the conversion model by decomposing the spectral features of the speech into a speaker-invariant content factor and a speaker-specific style factor to estimate diverse and robust speech styles. Experimental results show that the proposed method effectively measures the fluency scores and generates diverse output signals. Also, in the proficiency evaluation and speech recognition tests, the proposed method improves the proficiency score performance and speech recognition accuracy for all proficiency areas compared to a method employing conventional acoustic models.
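The dynamic time-warping alignment mentioned above can be illustrated with a minimal sketch. The paper aligns phonemic sequences of L2 and native utterances; the implementation below is not the authors' code and assumes a simple 0/1 substitution cost over phoneme symbols, whereas the actual system would operate on richer acoustic or phonetic features. The example phoneme strings are purely illustrative.

```python
# Minimal DTW sketch: align two phoneme sequences so that stress/rhythm
# features can be compared position by position. Assumes a 0/1 cost
# (illustrative only; the paper's cost function is not specified here).

def dtw_align(seq_a, seq_b, cost=lambda a, b: 0 if a == b else 1):
    n, m = len(seq_a), len(seq_b)
    # dp[i][j] = minimal cumulative cost aligning seq_a[:i] with seq_b[:j]
    dp = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = cost(seq_a[i - 1], seq_b[j - 1]) + min(
                dp[i - 1][j],      # step down: extra phoneme in seq_a
                dp[i][j - 1],      # step right: extra phoneme in seq_b
                dp[i - 1][j - 1],  # diagonal: match or substitution
            )
    # Backtrack from (n, m) to recover the aligned index pairs.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, (i, j) = min((dp[i - 1][j - 1], (i - 1, j - 1)),
                        (dp[i - 1][j], (i - 1, j)),
                        (dp[i][j - 1], (i, j - 1)))
    return dp[n][m], path[::-1]

# Hypothetical example: a learner's phonemes vs. a native reference.
native  = ["DH", "AH", "K", "AE", "T"]
learner = ["D",  "AH", "K", "AH", "T"]
total_cost, alignment = dtw_align(native, learner)
```

Once the path is recovered, stress and rhythm features measured on the learner's phonemes can be compared against the native reference at each aligned pair, even when the two phoneme sequences differ in content or length.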
KSP Keywords
English sentence, Linguistic information, Proficiency evaluation, Signal characteristics, Speech Signals, Speech recognition accuracy, Speech recognition system, Speech translation, acoustic model, conversion model, native speakers
This work is distributed under the terms of the Creative Commons License (CC BY).