ETRI-Knowledge Sharing Platform

Multimodal Audiovisual Speech Recognition Architecture Using a Three-Feature Multi-Fusion Method for Noise-Robust Systems
Cited 1 time in Scopus; downloaded 90 times
Authors
Sanghun Jeon, Jieun Lee, Dohyeon Yeo, Yong-Ju Lee, SeungJun Kim
Issue Date
2024-02
Citation
ETRI Journal, v.46, no.1, pp.22-34
ISSN
1225-6463
Publisher
Electronics and Telecommunications Research Institute (ETRI)
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.4218/etrij.2023-0266
Project Code
23HS4400, Development of AI Autonomy and Knowledge Enhancement for AI Agent Collaboration, Lee Yong-Ju
Abstract
Exposure to varied noisy environments impairs the recognition performance of artificial intelligence‐based speech recognition technologies. Degraded‐performance services can be utilized as limited systems that assure good performance in certain environments, but impair the general quality of speech recognition services. This study introduces an audiovisual speech recognition (AVSR) model robust to various noise settings, mimicking human dialogue recognition elements. The model converts word embeddings and log‐Mel spectrograms into feature vectors for audio recognition. A dense spatial–temporal convolutional neural network model extracts features from log‐Mel spectrograms, transformed for visual‐based recognition. This approach exhibits improved aural and visual recognition capabilities. We assess the signal‐to‐noise ratio in nine synthesized noise environments, with the proposed model exhibiting lower average error rates. The error rate for the AVSR model using a three‐feature multi‐fusion method is 1.711%, compared to the general 3.939% rate. This model is applicable in noise‐affected environments owing to its enhanced stability and recognition rate.
KSP Keywords
Audio recognition, Audiovisual speech recognition, Average error, Convolution neural network (CNN), Feature vector, Proposed model, Recognition rate, Robust systems, Word embedding, Artificial intelligence, Enhanced stability
This work is distributed under the terms of the Korea Open Government License (KOGL)
(Type 4: Type 1 + Commercial Use Prohibition + Change Prohibition)