ETRI Knowledge Sharing Platform


Detailed Information

Conference Paper: Development of Recognition System Using Fusion of Natural Gesture/Speech
Cited 3 times in Scopus
Authors
정영규, 한문성, 박준석, 이상조
Publication Date
January 2008
Source
International Conference on Consumer Electronics (ICCE) 2008, pp.1-2
DOI
https://dx.doi.org/10.1109/ICCE.2008.4588016
Research Project
07MH1900, Development of Wearable Personal Station, 한동원
Abstract
A multimodal interface can achieve more natural and effective human-computer interaction. In this paper, we present an isolated-word recognizer using a fusion of speech and natural visual gestures. The fusion of audio and visual signals can be carried out either at the class level or the feature level. Our system incorporates fusion at the feature level and supports 10 natural gestures. One of the most difficult problems in feature-level fusion is synchronization between audio and visual features. To solve this problem, we propose a modified Time Delay Neural Network (TDNN) architecture with a dedicated fusion layer and optimize the parameters of this recognition model. Experimental results show that this system yields a performance improvement over Automatic Speech Recognition (ASR) alone under various Signal-to-Noise Ratio (SNR) conditions. ©2008 IEEE.
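The abstract describes feature-level fusion of audio and visual features through a modified TDNN with a dedicated fusion layer. The following is a minimal sketch of what such an architecture could look like; the use of PyTorch, all feature dimensions, layer sizes, and the time-alignment strategy are illustrative assumptions and are not taken from the paper.

# Sketch of a feature-level audio-visual fusion TDNN (illustrative only).
import torch
import torch.nn as nn

class FusionTDNN(nn.Module):
    def __init__(self, audio_dim=39, visual_dim=20, num_words=10):
        super().__init__()
        # Time-delay (1-D convolution over time) layers for each modality.
        self.audio_tdnn = nn.Sequential(
            nn.Conv1d(audio_dim, 64, kernel_size=5), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, dilation=2), nn.ReLU(),
        )
        self.visual_tdnn = nn.Sequential(
            nn.Conv1d(visual_dim, 64, kernel_size=5), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, dilation=2), nn.ReLU(),
        )
        # Dedicated fusion layer: mixes the two feature streams so that
        # audio-visual alignment is learned rather than hard-coded.
        self.fusion = nn.Sequential(nn.Conv1d(128, 128, kernel_size=3), nn.ReLU())
        self.classifier = nn.Linear(128, num_words)

    def forward(self, audio, visual):
        # audio:  (batch, audio_dim, T)  frame-level acoustic features
        # visual: (batch, visual_dim, T) frame-level gesture features
        a = self.audio_tdnn(audio)
        v = self.visual_tdnn(visual)
        # Crop both streams to a common length before feature-level fusion.
        t = min(a.shape[-1], v.shape[-1])
        fused = self.fusion(torch.cat([a[..., :t], v[..., :t]], dim=1))
        # Pool over time and classify into one of the isolated words/gestures.
        return self.classifier(fused.mean(dim=-1))

if __name__ == "__main__":
    model = FusionTDNN()
    audio = torch.randn(2, 39, 100)   # e.g. 100 frames of MFCC-like features
    visual = torch.randn(2, 20, 100)  # e.g. 100 frames of gesture features
    print(model(audio, visual).shape)  # torch.Size([2, 10])

This sketch only illustrates where a fusion layer sits in a two-stream TDNN; the paper's actual layer configuration, features, and synchronization mechanism are not specified in the abstract.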
KSP Suggested Keywords
Fusion layer, Multimodal interface, Recognition System, Recognition model, Signal-to-Noise, Time delay neural network, Visual features, Visual signals, automatic speech recognition(ASR), class level, feature level fusion