ETRI Knowledge Sharing Platform : [In progress] Stochastic Policy Optimization with Heuristic Information for Robot Learning

Conference Paper [In progress] Stochastic Policy Optimization with Heuristic Information for Robot Learning

Cited 0 times in Scopus

Abstract: Stochastic policy-based deep reinforcement learning (RL) approaches have been remarkably successful on continuous control tasks. However, applying these methods to manipulation tasks remains a challenge because the actuators of a robot manipulator require high-dimensional continuous action spaces. In this paper, we propose exploration-bounded exploration actor-critic (EBE-AC), a novel deep RL approach that combines stochastic policy optimization with interpretable human knowledge. The human knowledge is defined as heuristic information based on both the physical relationships between a robot and objects and binary signals indicating whether the robot has reached certain states. EBE-AC combines an off-policy actor-critic algorithm with entropy maximization based on the heuristic information. On a robotic manipulation task, we demonstrate that EBE-AC outperforms prior state-of-the-art off-policy actor-critic deep RL algorithms in terms of sample efficiency. In addition, we found that EBE-AC can be easily combined with latent information, which further improves sample efficiency and robustness.
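
The abstract does not say how the heuristic information enters the entropy term, so the following is only a rough sketch of the general idea, not the EBE-AC algorithm itself. It shows the one-step critic target used by generic maximum-entropy off-policy actor-critic methods, plus a purely hypothetical rule that shrinks the entropy weight as the gripper nears an object or once a binary subgoal flag is set. The function names, the distance-based scaling, and the 0.5 factor are all assumptions made for illustration.

```python
import math


def gaussian_entropy(log_std):
    """Differential entropy (in nats) of a diagonal Gaussian policy."""
    # Each dimension contributes 0.5 * log(2 * pi * e) + log(std).
    return sum(0.5 * math.log(2.0 * math.pi * math.e) + ls for ls in log_std)


def soft_td_target(reward, done, next_q, next_log_prob, alpha, gamma=0.99):
    """One-step target for an entropy-regularized critic.

    target = r + gamma * (1 - done) * (Q(s', a') - alpha * log pi(a' | s'))
    """
    soft_value = next_q - alpha * next_log_prob
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * soft_value


def heuristic_alpha(base_alpha, distance, subgoal_reached, scale=1.0):
    """HYPOTHETICAL heuristic-dependent entropy weight (not from the paper).

    Assumption: explore less when the robot is close to the object
    (small distance) or has already reached a designated state.
    """
    proximity = min(1.0, max(0.0, distance / scale))
    flag_factor = 0.5 if subgoal_reached else 1.0
    return base_alpha * proximity * flag_factor


if __name__ == "__main__":
    alpha = heuristic_alpha(base_alpha=0.2, distance=0.5, subgoal_reached=False)
    target = soft_td_target(reward=1.0, done=False, next_q=2.0,
                            next_log_prob=-1.0, alpha=alpha)
    print(f"alpha={alpha:.3f}, target={target:.3f}")
    print(f"entropy of a unit-variance 2-D Gaussian: {gaussian_entropy([0.0, 0.0]):.3f}")
```

With a rule of this kind, the entropy bonus in the target shrinks as the heuristic signals indicate progress. Whether EBE-AC modulates the entropy term this way is not stated in the abstract.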

KSP Keywords: Actor-critic algorithm, Continuous action, Continuous control, Deep reinforcement learning, Entropy maximization, Heuristic information, High-dimensional, Human knowledge, Policy optimization, Reinforcement learning (RL), Robot learning
