ETRI-Knowledge Sharing Plaform



논문 검색
구분 SCI
연도 ~ 키워드


학술대회 Learning Multi-modal Attentional Consensus in Action Recognition
Cited 1 time in scopus Download 6 time Share share facebook twitter linkedin kakaostory
김형민, 김도형, 김재홍
International Conference on Ubiquitous Robots (UR) 2021, pp.308-313
21HS1500, 고령 사회에 대응하기 위한 실환경 휴먼케어 로봇 기술 개발, 이재연
This paper addresses a practical action recognition method for elderly-care robots. Multi-stream based models are one of the promising approaches for solving the complexity of real-world environments. While multi-modal action recognition have been actively studied, there is a lack of research on models that effectively combine features of different modalities. This paper proposes a new mid-level feature fusion method for two-stream based action recognition network. In multi-modal approaches, extracting complementary information between different modalities is an essential task. Our network model is designed to fuse features at an intermediate level of feature extraction, which leverages a whole feature map from each modality. Consensus feature map and consensus attention mechanism are proposed as effective ways to extract information from two different modalities: RGB data and motion features. We also introduce ETRI-Activity3D-LivingLab, a real-world RGB-D dataset for robots to recognize daily activities of the elderly. It is the first 3D action recognition dataset obtained in a variety of home environments where the elderly actually reside. We expect our new dataset to contribute to the practical study of action recognition with the previously released ETRI-Activity3D dataset. To prove the effectiveness of the method, extensive experiments are performed on NTU RGB+D, ETRI-Activity3D and, ETRI-Activity3D-LivingLab dataset. Our mid-level fusion method achieves competitive performance in various experimental settings, especially for domain-changing situations.
KSP 제안 키워드
3D action recognition, Attention mechanism, Competitive performance, Daily activities, Feature Fusion, Feature Map, Feature extractioN, Multi-modal, Multi-stream, Network model, RGB-D