ETRI-Knowledge Sharing Platform


Details

Journal article: QSOD: Hybrid Policy Gradient for Deep Multi-agent Reinforcement Learning
Cited 6 times in Scopus. Downloaded 95 times.
Authors
Hafiz Muhammad Raza Ur Rehman, 온병원, Devarani Devi Ningombam, 이성원, 최규상
Publication date
2021-09
Source
IEEE Access, v.9, pp.129728-129741
ISSN
2169-3536
Publisher
IEEE
DOI
https://dx.doi.org/10.1109/ACCESS.2021.3113350
Abstract
When individuals interact with one another to accomplish specific goals, they learn from others' experiences to achieve the tasks at hand. The same holds for learning in virtual environments, such as video games. Deep multiagent reinforcement learning shows promising results in terms of completing many challenging tasks. To demonstrate its viability, most algorithms use value decomposition for multiple agents. To guide each agent, behavior value decomposition is utilized to decompose the combined Q-value of the agents into individual agent Q-values. A different mixing method can be utilized, using a monotonicity assumption based on value decomposition algorithms such as QMIX and QVMix. However, this method selects individual agent actions through a greedy policy. The agents, which require large numbers of training trials, are not addressed. In this paper, we propose a novel hybrid policy for the action selection of an individual agent known as Q-value Selection using Optimization and DRL (QSOD). A grey wolf optimizer (GWO) is used to determine the choice of individuals' actions. As in GWO, there is proper attention among the agents facilitated through the agents' coordination with one another. We used the StarCraft 2 Learning Environment to compare our proposed algorithm with the state-of-the-art algorithms QMIX and QVMix. Experimental results demonstrate that our algorithm outperforms QMIX and QVMix in all scenarios and requires fewer training trials.
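The monotonicity assumption mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's implementation: it stands in for the QMIX mixer with a fixed linear mixer that has strictly positive weights (QMIX itself uses a state-conditioned mixing network), and every name in it (`mix`, `greedy_joint_action`, `brute_force_joint_action`) is hypothetical. It shows why, under such a mixer, letting each agent greedily pick the action with its own highest Q-value also maximizes the combined Q-value.

```python
import itertools
import random


def mix(q_values, weights, bias):
    """Toy monotonic mixer (assumption): positive weights make the
    combined value non-decreasing in every agent's individual Q-value."""
    return sum(w * q for w, q in zip(weights, q_values)) + bias


def greedy_joint_action(agent_qs):
    """Each agent independently picks the argmax of its own Q-values."""
    return tuple(max(range(len(qs)), key=qs.__getitem__) for qs in agent_qs)


def brute_force_joint_action(agent_qs, weights, bias):
    """Search every joint action for the highest combined value."""
    joint_actions = itertools.product(*(range(len(qs)) for qs in agent_qs))
    return max(
        joint_actions,
        key=lambda acts: mix(
            [qs[a] for qs, a in zip(agent_qs, acts)], weights, bias
        ),
    )


rng = random.Random(0)
# Three agents with four actions each; the Q-values are made up.
agent_qs = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(3)]
# Strictly positive mixing weights enforce monotonicity.
weights = [abs(rng.gauss(0.0, 1.0)) + 0.1 for _ in range(3)]
bias = rng.uniform(-1.0, 1.0)

greedy = greedy_joint_action(agent_qs)
best = brute_force_joint_action(agent_qs, weights, bias)
print(greedy == best)
```

The brute-force search grows exponentially with the number of agents, while the per-agent argmax stays linear, which is the practical appeal of value decomposition. How QSOD replaces the greedy step with a grey wolf optimizer is not specified in this record, so it is not sketched here.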
KSP suggested keywords
Decomposition algorithm, Large numbers, Learning Environment, Learning in virtual environments, Multiple Agents, Policy Gradient, Reinforcement Learning(RL), StarCraft 2, action selection, grey Wolf optimizer, mixing method
This work may be used under the terms of the Creative Commons Attribution (CC BY) license.