ETRI Knowledge Sharing Platform: Coupling Vision and Proprioception for Sample-Efficient, Object-Occlusion-Robust Robotic Manipulation

Conference Paper: Coupling Vision and Proprioception for Sample-Efficient, Object-Occlusion-Robust Robotic Manipulation

Citation: International Conference on Computer Vision Workshops (ICCVW) 2023, pp.1-9

Abstract: Recent advances in visual reinforcement learning (visual RL), which learns directly from image pixels, have bridged the gap between state-based and image-based training. These advances have been achieved through the effective combination of traditional RL algorithms with latent representation learning. However, visual RL still struggles with solving robotic manipulation tasks involving object occlusions; for example, lifting an object occluded by an obstacle. In this paper, we propose a multimodal RL method that is sample-efficient and robust to object occlusions by effectively coupling two different types of raw sensory data: vision and proprioception. Our method is able to jointly learn multimodal latent representations and a policy in a fully end-to-end manner from vision and proprioception without the need for pre-training. We show that our method outperforms both current state-of-the-art visual RL and state-based RL methods on robotic manipulation tasks involving object occlusions in terms of sample efficiency and task performance, without any prior knowledge such as pre-defined coordinate states or pre-trained representations.
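
Illustrative sketch (not from the paper): the abstract describes coupling vision and proprioception into a joint latent representation that feeds a policy, but it does not give the architecture. The Python below is a minimal sketch under stated assumptions: each modality gets its own encoder, the two latents are fused by concatenation, and a single policy head maps the fused latent to an action. The class name `FusionPolicy`, the dimensions, the tanh layers, and the concatenation rule are all hypothetical, and no training loop is shown.

```python
import math
import random


def linear(in_dim, out_dim, rng):
    """Return a randomly initialised weight matrix (out_dim rows, in_dim columns)."""
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.uniform(-scale, scale) for _ in range(in_dim)]
            for _ in range(out_dim)]


def apply(weights, x):
    """Matrix-vector product followed by a tanh non-linearity."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


class FusionPolicy:
    """Hypothetical two-branch encoder with a shared policy head.

    Assumption: the fusion rule is plain concatenation. The paper's actual
    coupling mechanism is not specified in the abstract.
    """

    def __init__(self, image_dim, proprio_dim, latent_dim, action_dim, seed=0):
        rng = random.Random(seed)
        self.w_image = linear(image_dim, latent_dim, rng)      # vision branch
        self.w_proprio = linear(proprio_dim, latent_dim, rng)  # proprioception branch
        self.w_policy = linear(2 * latent_dim, action_dim, rng)

    def encode(self, image, proprio):
        # Joint latent: vision latent followed by proprioception latent.
        return apply(self.w_image, image) + apply(self.w_proprio, proprio)

    def act(self, image, proprio):
        # Actions are bounded in [-1, 1] by the final tanh.
        return apply(self.w_policy, self.encode(image, proprio))


policy = FusionPolicy(image_dim=16, proprio_dim=4, latent_dim=8, action_dim=3)
blank_image = [0.0] * 16  # stand-in for a view that carries no information
action = policy.act(blank_image, [0.1, -0.2, 0.3, 0.0])
```

With an all-zero image the vision half of the latent is zero, yet the action still varies with the proprioceptive input. That is the intuition behind using a second modality when the camera view is occluded. In an end-to-end setting all three weight matrices would be updated from the RL objective rather than left at their random initial values.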

KSP Keywords: Current state, End to End(E2E), Image-based, Latent representations, Pre-Training, Reinforcement learning(RL), Representation learning, Robotic Manipulation, image pixels, need for, prior knowledge
