ETRI Knowledge Sharing Platform


An Efficient Policy Improvement in Human Interactive Learning Using Entropy
Authors
Sung-Yun Park, Dae-Wook Kim, Sang-Kwang Lee, Seong-Il Yang
Issue Date
2021-10
Citation
International Conference on Information and Communication Technology Convergence (ICTC) 2021, pp.283-286
Publisher
IEEE
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.1109/ICTC52510.2021.9620856
Abstract
Human knowledge can be used in reinforcement learning (RL) to reduce the time a learning agent needs to achieve its goal. The TAMER (Training an Agent Manually via Evaluative Reinforcements) algorithm allows a human to provide a reward to an autonomous agent through a manual interface while watching the agent perform actions. Because the agent's policy is updated from human rewards, it approximates how a human trainer rewards the agent. For the policy update, events that occurred during learning are selected; in selecting them, the temporal distance from each event to the human reward is considered, so only events that occurred within a certain time interval before the human trainer gives a reward are selected. However, this approach, which considers only the time factor, demands a large number of human rewards, and such a costly policy update exhausts the human trainer. Therefore, we propose a new event-selection method that considers the entropy value over the distribution of Q-values in addition to the time factor. In our proposed method, events are reused for the policy update despite a long temporal distance from the human reward, provided that the corresponding human reward is negative and the entropy value over the distribution of Q-values is low. To compare the effectiveness of the proposed method with classic TAMER, we ran an experiment with the policy initialized to incorrect weights. The results show that the TAMER algorithm with our proposed event selection improves the policy efficiently.
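The event-selection rule described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the softmax mapping from Q-values to an action distribution, and the window and entropy thresholds are all assumptions for the sake of the example.

```python
import math

def entropy_of_q(q_values, temperature=1.0):
    """Shannon entropy of a softmax distribution over Q-values.

    A low entropy means the agent strongly prefers one action, which the
    paper's method combines with a negative human reward to reuse old events.
    """
    m = max(q / temperature for q in q_values)          # for numerical stability
    exps = [math.exp(q / temperature - m) for q in q_values]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_events(events, reward_time, reward_value,
                  window=2.0, entropy_threshold=0.5):
    """Select events to credit with a human reward.

    events: list of (timestamp, q_values) pairs observed during learning.
    An event is selected if it falls within `window` seconds before the
    reward (the classic time-only rule), or -- per the proposed extension --
    if the reward is negative and the entropy over that event's Q-value
    distribution is low, even at a long temporal distance.
    """
    selected = []
    for t, q_values in events:
        within_window = 0.0 <= reward_time - t <= window
        low_entropy = (reward_value < 0
                       and entropy_of_q(q_values) < entropy_threshold)
        if within_window or low_entropy:
            selected.append((t, q_values))
    return selected
```

With a negative reward, a confident (low-entropy) event far outside the time window is still selected; with a positive reward, only events inside the window are kept.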
KSP Keywords
Entropy value, Event selection, Human knowledge, Interactive Learning, Policy update, Reinforcement Learning (RL), Selection method, Time factor, Time interval, autonomous agent, learning agent