ETRI-Knowledge Sharing Plaform

KOREAN
논문 검색
Type SCI
Year ~ Keyword

Detail

Conference Paper Toward Effective Deep Reinforcement Learning for 3D Robotic Manipulation: Multimodal End-to-End Reinforcement Learning from Visual and Proprioceptive Feedback
Cited - time in scopus Share share facebook twitter linkedin kakaostory
Authors
Samyeul Noh, Hyun Myung
Issue Date
2022-12
Citation
Conference on Neural Information Processing Systems (NeurIPS) 2022, pp.1-18
Language
English
Type
Conference Paper
Abstract
Sample-efficient reinforcement learning (RL) methods capable of learning directly from raw sensory data without the use of human-crafted representations would open up real-world applications in robotics and control. Recent advances in visual RL have shown that learning a latent representation together with existing RL algorithms closes the gap between state-based and image-based training. However, image-based training is still significantly sample-inefficient with respect to learning in 3D continuous control problems (for example, robotic manipulation) compared to state-based training. In this study, we propose an effective model-free off-policy RL method for 3D robotic manipulation that can be trained in an end-to-end manner from multimodal raw sensory data obtained from a vision camera and a robot’s joint encoders, without the need for human-crafted representations. Notably, our method is capable of learning a latent multimodal representation and a policy in an efficient, joint, and end-to-end manner from multimodal raw sensory data. Our method, which we dub MERL: Multimodal End-to-end Reinforcement Learning, results in a simple but effective approach capable of significantly outperforming both current state-of-the-art visual RL and state-based RL methods with respect to sample efficiency, learning performance, and training stability in relation to 3D robotic manipulation tasks from DeepMind Control.
KSP Keywords
Continuous control, Control problems, Current state, Deep reinforcement learning, Effective model, End to End(E2E), Image-based, Learning performance, Model-free, Multimodal representation, Real-world applications