ETRI-Knowledge Sharing Platform


Detail

QSOD: Hybrid Policy Gradient for Deep Multi-agent Reinforcement Learning
Cited 8 times in Scopus. Downloaded 153 times.
Authors
Hafiz Muhammad Raza Ur Rehman, Byung-Won On, Devarani Devi Ningombam, Sungwon Y, Gyu Sang Choi
Issue Date
2021-09
Citation
IEEE Access, v.9, pp.129728-129741
ISSN
2169-3536
Publisher
IEEE
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.1109/ACCESS.2021.3113350
Abstract
When individuals interact with one another to accomplish specific goals, they learn from others' experiences to achieve the tasks at hand. The same holds for learning in virtual environments, such as video games. Deep multiagent reinforcement learning shows promising results in terms of completing many challenging tasks. To demonstrate its viability, most algorithms use value decomposition for multiple agents. To guide each agent, behavior value decomposition is utilized to decompose the combined Q-value of the agents into individual agent Q-values. A different mixing method can be utilized, using a monotonicity assumption based on value decomposition algorithms such as QMIX and QVMix. However, this method selects individual agent actions through a greedy policy. The agents, which require large numbers of training trials, are not addressed. In this paper, we propose a novel hybrid policy for the action selection of an individual agent known as Q-value Selection using Optimization and DRL (QSOD). A grey wolf optimizer (GWO) is used to determine the choice of individuals' actions. As in GWO, there is proper attention among the agents facilitated through the agents' coordination with one another. We used the StarCraft 2 Learning Environment to compare our proposed algorithm with the state-of-the-art algorithms QMIX and QVMix. Experimental results demonstrate that our algorithm outperforms QMIX and QVMix in all scenarios and requires fewer training trials.
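The abstract's point about value decomposition with a monotonicity assumption can be illustrated with a small sketch. It is not the paper's code and does not implement QSOD or its grey wolf optimizer step. As a simplifying assumption, a fixed weighted sum with positive weights stands in for the learned mixing network of methods such as QMIX, and all function names and numbers are hypothetical. It shows why a per-agent greedy policy is enough to recover the best joint action when the mixer is monotonic.

```python
import itertools
import random


def mix(q_values, weights, bias=0.0):
    """Toy monotonic mixer (assumption): positive weights make the joint
    value non-decreasing in every individual agent Q-value."""
    return sum(w * q for w, q in zip(weights, q_values)) + bias


def greedy_joint_action(per_agent_q):
    """Each agent independently picks the argmax of its own Q-values."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in per_agent_q)


def brute_force_joint_action(per_agent_q, weights):
    """Search every joint action for the one that maximises the mixed value."""
    action_ranges = [range(len(q)) for q in per_agent_q]
    return max(
        itertools.product(*action_ranges),
        key=lambda joint: mix(
            [q[a] for q, a in zip(per_agent_q, joint)], weights
        ),
    )


# Hypothetical setup: 3 agents with 4 actions each and random Q-values.
rng = random.Random(0)
per_agent_q = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
weights = [rng.uniform(0.1, 2.0) for _ in range(3)]

greedy = greedy_joint_action(per_agent_q)
best = brute_force_joint_action(per_agent_q, weights)
print(greedy == best)
```

The brute-force search grows exponentially with the number of agents, while the per-agent argmax stays linear, which is the practical appeal of decomposing the joint Q-value. According to the abstract, QSOD replaces this purely greedy per-agent selection with a GWO-based choice; that step is not modeled here.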
KSP Keywords
Decomposition algorithm, Large numbers, Learning Environment, Learning in virtual environments, Multiple Agents, Policy Gradient, Reinforcement Learning (RL), StarCraft 2, Action selection, Grey wolf optimizer, Mixing method
This work is distributed under the terms of the Creative Commons License (CC BY).