ETRI-Knowledge Sharing Platform





Conference paper: Development of Recognition System Using Fusion of Natural Gesture/Speech
Cited 3 times in Scopus
정영규, 한문성, 박준석, 이상조
International Conference on Consumer Electronics (ICCE) 2008, pp.1-2
Project: 07MH1900, Development of Wearable Personal Station, 한동원
A multimodal interface can achieve more natural and effective human-computer interaction. In this paper, we present an isolated-word recognizer using a fusion of speech and natural visual gestures. The fusion of audio and visual signals can be carried out either at the class level or the feature level. Our system incorporates a fusion system at the feature level which supports 10 natural gestures. One of the most difficult problems in feature-level fusion is synchronization between audio and visual features. To solve this problem, we propose a modified Time Delay Neural Network (TDNN) architecture with a dedicated fusion layer and optimize the parameters of this recognition model. Experimental results show that this system yields a performance improvement over Automatic Speech Recognition (ASR) under various Signal-to-Noise Ratio (SNR) conditions. ©2008 IEEE.
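The distinction the abstract draws between class-level and feature-level fusion can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the layer sizes, the feature dimensions (13 audio, 6 visual), the 10-class output, the random weights, and every function name below are assumptions chosen for illustration, and the two streams are assumed to be already resampled to a common frame rate. Each modality passes through its own time-delay layer, a fusion layer then applies a further time-delay window to the concatenated hidden features, and a class-level combination of per-modality scores is shown for contrast.

```python
import numpy as np


def time_delay_layer(x, weights, bias):
    """Time-delay layer: each output frame sees K consecutive input frames.

    x: (T, D) frames, weights: (K, D, H), bias: (H,). Returns (T - K + 1, H).
    """
    k = weights.shape[0]
    t = x.shape[0]
    # Stack sliding windows of K frames: shape (T - K + 1, K, D).
    windows = np.stack([x[i:i + k] for i in range(t - k + 1)])
    return np.tanh(np.einsum("tkd,kdh->th", windows, weights) + bias)


def init_params(rng, audio_dim, video_dim, hidden=8, fused=12, classes=10):
    """Random weights for the sketch (hypothetical sizes, untrained)."""
    def w(*shape):
        return rng.normal(scale=0.1, size=shape)
    return {
        "wa": w(3, audio_dim, hidden), "ba": np.zeros(hidden),
        "wv": w(3, video_dim, hidden), "bv": np.zeros(hidden),
        "wf": w(5, 2 * hidden, fused), "bf": np.zeros(fused),
        "wo": w(fused, classes), "bo": np.zeros(classes),
    }


def feature_level_fusion(audio, video, p):
    """Fuse hidden audio and visual features before classification."""
    a = time_delay_layer(audio, p["wa"], p["ba"])
    v = time_delay_layer(video, p["wv"], p["bv"])
    n = min(len(a), len(v))  # assumption: trim to the shorter stream
    joint = np.concatenate([a[:n], v[:n]], axis=1)
    # Fusion layer: its time window lets the network tolerate small
    # offsets between the audio and visual streams.
    f = time_delay_layer(joint, p["wf"], p["bf"])
    # Average over time, then score each class.
    return f.mean(axis=0) @ p["wo"] + p["bo"]


def class_level_fusion(audio_scores, video_scores, audio_weight=0.5):
    """Combine the outputs of two independently run classifiers."""
    return audio_weight * audio_scores + (1.0 - audio_weight) * video_scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(rng, audio_dim=13, video_dim=6)
    audio = rng.normal(size=(20, 13))  # 20 frames of audio features
    video = rng.normal(size=(20, 6))   # 20 frames of visual features
    print(feature_level_fusion(audio, video, params).shape)
```

In this sketch the fusion layer spans five frames, so a gesture cue and a speech cue that are slightly offset in time can still contribute to the same fused unit. That is one plausible reading of how a time-delay structure helps with the synchronization problem the abstract mentions.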
KSP Suggested Keywords
Fusion layer, Multimodal interface, Recognition system, Recognition model, Signal-to-noise, Time delay neural network, Visual features, Visual signals, Automatic speech recognition (ASR), Class level, Feature level fusion