ETRI Knowledge Sharing Platform: On Analysis of Clipped Critic Loss in Proximal Policy Optimization


Journal Article On Analysis of Clipped Critic Loss in Proximal Policy Optimization

Cited 0 times in Scopus

Downloaded 118 times

Abstract: Proximal policy optimization (PPO) is a widely used reinforcement learning algorithm valued for its robustness and sample efficiency. Its success is often attributed to the actor's clipped loss, which keeps policy updates within a trust region. In contrast, the critic's clipped loss has received relatively little attention, leaving its consistency with the trust-region principle unclear. To bridge this gap, we analyze the critic's clipped loss, show its misalignment, and propose a refined loss that enforces trust-region compliance by construction. Experiments on continuous-control tasks confirm that the proposed method improves adherence to the trust region.
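For context, the two clipped losses the abstract contrasts can be sketched per sample as follows. This is a minimal sketch of the form found in widely used PPO codebases, not the refined loss proposed in the paper, which this record does not specify. The function names, the clip range `eps`, and the 0.5 scaling are assumptions chosen for illustration.

```python
def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def actor_clipped_objective(ratio, advantage, eps=0.2):
    """PPO actor surrogate for one sample (to be maximized).

    ratio is pi_new(a|s) / pi_old(a|s); eps is an assumed clip range.
    """
    return min(ratio * advantage, clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)


def critic_clipped_loss(v_new, v_old, ret, eps=0.2):
    """Commonly used clipped critic loss for one sample (to be minimized).

    v_new: current value prediction, v_old: prediction before the update,
    ret: return target. This is the conventional form, NOT the paper's
    refined loss.
    """
    # Restrict the new prediction to a band of width eps around the old one.
    v_clipped = v_old + clip(v_new - v_old, -eps, eps)
    loss_unclipped = (v_new - ret) ** 2
    loss_clipped = (v_clipped - ret) ** 2
    # Take the larger (more pessimistic) of the two squared errors.
    return 0.5 * max(loss_unclipped, loss_clipped)
```

Here the clipped branch is constant in `v_new` once `v_new` leaves `[v_old - eps, v_old + eps]`, so whenever that branch is the maximum the sample contributes no gradient. Whether this behavior is consistent with the actor's trust-region clipping is the question the abstract raises.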

KSP Keywords: Policy optimization, Reinforcement learning (RL), Reinforcement learning algorithm, Artificial intelligence, Trust region

This work is distributed under the terms of a Creative Commons License (CC BY-NC-ND).

