ETRI Knowledge Sharing Platform : Multimodal audiovisual speech recognition architecture using a three‐feature multi‐fusion method for noise‐robust systems

Journal Article: Multimodal audiovisual speech recognition architecture using a three‐feature multi‐fusion method for noise‐robust systems

Cited 4 times in Scopus

Downloaded 294 times

Abstract: Exposure to varied noisy environments impairs the recognition performance of artificial intelligence‐based speech recognition technologies. Degraded‐performance services can be utilized as limited systems that assure good performance in certain environments, but impair the general quality of speech recognition services. This study introduces an audiovisual speech recognition (AVSR) model robust to various noise settings, mimicking human dialogue recognition elements. The model converts word embeddings and log‐Mel spectrograms into feature vectors for audio recognition. A dense spatial–temporal convolutional neural network model extracts features from log‐Mel spectrograms, transformed for visual‐based recognition. This approach exhibits improved aural and visual recognition capabilities. We assess the signal‐to‐noise ratio in nine synthesized noise environments, with the proposed model exhibiting lower average error rates. The error rate for the AVSR model using a three‐feature multi‐fusion method is 1.711%, compared to the general 3.939% rate. This model is applicable in noise‐affected environments owing to its enhanced stability and recognition rate.
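The abstract refers to synthesized noise environments evaluated by signal‐to‐noise ratio (SNR), but it does not say how the noisy inputs were built. As a minimal sketch of one common approach, and not necessarily the authors' procedure, the snippet below scales a noise recording so that the mixture reaches a requested SNR in dB. The function names `mix_at_snr` and `measured_snr_db`, the 440 Hz tone standing in for speech, the Gaussian noise, and the SNR values are all hypothetical choices made for illustration.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus scaled noise so the mixture has the requested SNR (dB)."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    # Average power of each signal (mean of squared samples).
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("signals must have non-zero power")
    # SNR_dB = 10 * log10(p_speech / (gain**2 * p_noise)); solve for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixture):
    """SNR of a mixture, given the clean speech it was built from."""
    residual = [m - s for m, s in zip(mixture, speech)]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_resid = sum(r * r for r in residual) / len(residual)
    return 10 * math.log10(p_speech / p_resid)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical stand-ins: a 440 Hz tone as "speech", Gaussian noise as the environment.
    speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
    for snr in (20, 10, 0, -5):
        noisy = mix_at_snr(speech, noise, snr)
        print(snr, round(measured_snr_db(speech, noisy), 3))
```

Scaling the noise rather than the speech keeps the clean signal's level fixed across conditions, so only the interference changes from one SNR setting to the next.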

KSP Keywords: Audio recognition, Audiovisual speech recognition, Average error, Convolution neural network (CNN), Feature vector, Fusion method, Neural network model, Proposed model, Recognition rate, Recognition performance, Robust systems

This work is distributed under the terms of the Korea Open Government License (KOGL)
(Type 4: Type 1 + Commercial Use Prohibition + Change Prohibition)
