ETRI Knowledge Sharing Platform

Details

Journal Article: Multimodal Unsupervised Speech Translation for Recognizing and Evaluating Second Language Speech
Cited 4 times in Scopus; downloaded 87 times.
Authors
이윤경, 박전규
Publication Date
March 2021
Source
Applied Sciences, v.11 no.6, pp.1-17
ISSN
2076-3417
Publisher
MDPI
DOI
https://dx.doi.org/10.3390/app11062642
Funded Project
21HS2800, Development of semi-supervised-learning-based language intelligence technology and a Korean tutoring service for foreigners based on it, 이윤근
Abstract
This paper addresses automatic proficiency evaluation and speech recognition for second language (L2) speech. The proposed method recognizes the speech uttered by an L2 speaker, measures a variety of fluency scores, and evaluates the proficiency of the speaker's spoken English. Stress and rhythm scores are among the important factors used to evaluate fluency in spoken English; they are computed by comparing the L2 speaker's stress patterns and rhythm distributions to those of native speakers. In order to compute the stress and rhythm scores even when the phonemic sequence of the L2 speaker's English sentence differs from the native speaker's, we align the phonemic sequences using a dynamic time warping approach. We also improve the performance of the speech recognition system for non-native speakers, and compute fluency features more accurately, by augmenting the non-native training dataset and training an acoustic model with the augmented dataset. In this work, we augment the non-native speech by converting some speech signal characteristics (style) while preserving its linguistic information. The proposed variational autoencoder (VAE)-based speech conversion network trains the conversion model by decomposing the spectral features of the speech into a speaker-invariant content factor and a speaker-specific style factor, so as to estimate diverse and robust speech styles. Experimental results show that the proposed method effectively measures the fluency scores and generates diverse output signals. Moreover, in the proficiency evaluation and speech recognition tests, the proposed method improves the proficiency score performance and speech recognition accuracy for all proficiency areas compared to a method employing conventional acoustic models.
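The dynamic time warping (DTW) alignment mentioned in the abstract can be sketched as follows. This is a minimal illustrative implementation, not the paper's own code: the function name, the unit substitution cost, and the example ARPAbet-style phoneme sequences are all assumptions made for demonstration.

```python
# Sketch of DTW alignment between an L2 learner's recognized phoneme
# sequence and a native reference, so that stress/rhythm features can
# later be compared position by position. Names are illustrative only.

def dtw_align(ref, hyp, cost=lambda a, b: 0 if a == b else 1):
    """Return (total_cost, path), where path is a list of
    (ref_index, hyp_index) pairs along the optimal warping path."""
    n, m = len(ref), len(hyp)
    INF = float("inf")
    # D[i][j] = minimal cost of aligning ref[:i] with hyp[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cost(ref[i - 1], hyp[j - 1])
            D[i][j] = c + min(D[i - 1][j],      # ref phoneme stretched
                              D[i][j - 1],      # hyp phoneme stretched
                              D[i - 1][j - 1])  # match / substitution
    # Backtrack from (n, m) to recover the optimal path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, (i, j) = min((D[i - 1][j - 1], (i - 1, j - 1)),
                        (D[i - 1][j], (i - 1, j)),
                        (D[i][j - 1], (i, j - 1)))
    path.reverse()
    return D[n][m], path

# Hypothetical example: a native "AE K T ER" vs. a learner's
# insertion-heavy "AH K AH T ER".
total, alignment = dtw_align(["AE", "K", "T", "ER"],
                             ["AH", "K", "AH", "T", "ER"])
```

With the unit cost above, the alignment absorbs the learner's inserted vowel by stretching one reference phoneme, which is what lets stress and rhythm be compared even when the phonemic sequences differ in length.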
KSP Suggested Keywords
English sentence, Native speakers, Proficiency evaluation, Signal characteristics, Speech Signals, Speech recognition accuracy, Speech recognition system, Speech translation, acoustic model, conversion model, linguistic information
This work is available under the Creative Commons Attribution (CC BY) license.