ETRI Knowledge Sharing Platform

Uncertainty-Driven Pessimistic Q-Ensemble for Offline-to-Online Reinforcement Learning
Authors
Ingook Jang, Seonghyun Kim
Issue Date
2022-12
Citation
Conference on Neural Information Processing Systems (NeurIPS) 2022 Workshop, pp. 1-5
Language
English
Type
Conference Paper
Abstract
Reusing existing offline reinforcement learning (RL) agents is an emerging topic for reducing the dominant computational cost of exploration in many settings. To effectively fine-tune pre-trained offline policies, both offline samples and online interactions may be leveraged. In this paper, we propose incorporating a pessimistic Q-ensemble and an uncertainty quantification technique to effectively fine-tune offline agents. To stabilize online Q-function estimates during fine-tuning, the proposed method uses uncertainty estimation as a penalization over a replay buffer that mixes online interactions from the ensemble agent with offline samples from the behavioral policies. On various robotic tasks from the D4RL benchmark, we show that our method outperforms state-of-the-art algorithms in terms of average return and sample efficiency.
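The abstract describes penalizing Q-targets with an ensemble-based uncertainty estimate during offline-to-online fine-tuning. The sketch below is an illustration of that general idea, not the authors' implementation: the network sizes, the penalty coefficient `beta`, and the `pessimistic_target` helper are assumptions made for the example; the mixed offline/online replay buffer is only indicated in the usage comment.

```python
# Minimal sketch of an uncertainty-penalized Bellman target from a Q-ensemble.
# Hyperparameters and architecture are illustrative, not from the paper.
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """One member of the Q-ensemble: a small MLP over (state, action)."""

    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))


def pessimistic_target(q_ensemble, reward, next_obs, next_act,
                       gamma=0.99, beta=1.0):
    """Bellman target penalized by ensemble disagreement (epistemic uncertainty)."""
    with torch.no_grad():
        # Stack predictions from all ensemble members: (n_ensemble, batch, 1)
        next_qs = torch.stack([q(next_obs, next_act) for q in q_ensemble], dim=0)
        mean_q = next_qs.mean(dim=0)
        std_q = next_qs.std(dim=0)  # uncertainty estimate across the ensemble
        # Pessimism: subtract the scaled uncertainty from the mean target value.
        return reward + gamma * (mean_q - beta * std_q)


# Usage sketch: a replay buffer mixing offline samples with new online
# transitions would supply (obs, act, reward, next_obs, next_act); each
# ensemble member then regresses its Q-value toward the shared pessimistic
# target with an MSE loss.
```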
KSP Keywords
Fine-tuning, Q-function, Reinforcement learning (RL), Uncertainty quantification, Computational cost, Online interactions, State-of-the-art, Uncertainty estimation