ETRI-Knowledge Sharing Platform

SA-MARL: Novel Self-Attention-Based Multi-Agent Reinforcement Learning With Stochastic Gradient Descent
Cited 1 time in Scopus · Downloaded 36 times
Authors
Rabbiya Younas, Hafiz Muhammad Raza Ur Rehman, Ingyu Lee, Byung-Won On, Sungwon Yi, Gyu Sang Choi
Issue Date
2025-02
Citation
IEEE Access, v.13, pp.35674-35687
ISSN
2169-3536
Publisher
Institute of Electrical and Electronics Engineers Inc.
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.1109/ACCESS.2025.3544961
Abstract
In the rapidly advancing Reinforcement Learning (RL) field, Multi-Agent Reinforcement Learning (MARL) has emerged as a key approach to solving complex real-world challenges. A pivotal development in this realm is the introduction of the mixing network, representing a significant leap forward in the capabilities of multi-agent systems. Drawing inspiration from the COMA and VDN methodologies, the mixing network overcomes limitations in extracting combined Q-values from joint state-action interactions. Previous approaches such as COMA and VDN could not fully exploit the state information available during training, limiting their effectiveness. QMIX and QVMinMax addressed this issue by employing neural networks to convert centralized states into weights for a second neural network, akin to hypernetworks. However, these solutions introduced challenges such as computational intensity and susceptibility to local minima. To overcome these hurdles, our proposed methodology makes three key contributions. First, we introduce the state-fusion network, a self-attention-based alternative to the traditional mixing network. Second, to address the local-optima problem in MARL algorithms, we leverage the Grey Wolf Optimizer for weight and bias selection, adding a stochastic element for improved optimization. Finally, we provide a comprehensive comparison with QMIX, evaluating performance under two optimization methods: gradient descent and a stochastic optimizer. Using the StarCraft II Learning Environment (SC2LE) as our experimental platform, our results demonstrate the superiority of our methodology over QMIX, QVMinMax, and QSOD in absolute performance, particularly when operating under resource constraints. Our proposed methodology contributes to the ongoing evolution of MARL techniques, showcasing advancements in attention mechanisms and optimization strategies for enhanced multi-agent system capabilities.
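The record does not include the paper's code, so as a rough illustration only, the sketch below shows one way a self-attention "state-fusion" step over per-agent Q-values and the global state might look in PyTorch. All names, shapes, and design choices here (the StateFusionAttention class, a single attention head, mean pooling to a scalar joint Q-value) are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class StateFusionAttention(nn.Module):
    """Hypothetical sketch: fuse per-agent Q-values with the global state via
    single-head self-attention, in place of a QMIX-style mixing network."""

    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        # Embed the global state and each agent's Q-value into a shared space.
        self.state_embed = nn.Linear(state_dim, embed_dim)
        self.q_embed = nn.Linear(1, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=1, batch_first=True)
        self.out = nn.Linear(embed_dim, 1)  # scalar joint Q-value estimate

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        q_tokens = self.q_embed(agent_qs.unsqueeze(-1))   # (batch, n_agents, embed)
        s_token = self.state_embed(state).unsqueeze(1)    # (batch, 1, embed)
        tokens = torch.cat([s_token, q_tokens], dim=1)    # (batch, 1 + n_agents, embed)
        fused, _ = self.attn(tokens, tokens, tokens)      # self-attention over all tokens
        return self.out(fused.mean(dim=1))                # (batch, 1) joint Q-value
```

Likewise, a minimal Grey Wolf Optimizer step over a population of candidate weight vectors could look like the following. The gwo_step function, the use of a loss value as fitness, and the leader-averaging update follow the standard GWO formulation; they are not taken from the paper and may differ from its exact stochastic selection procedure.

```python
import numpy as np

def gwo_step(wolves: np.ndarray, fitness: np.ndarray, a: float) -> np.ndarray:
    """One standard Grey Wolf Optimizer update. `wolves` is (pop, dim),
    `fitness` is (pop,) with lower = better, and `a` decays from 2 to 0."""
    order = np.argsort(fitness)
    alpha, beta, delta = wolves[order[:3]]        # three best candidates lead the pack
    new_wolves = np.empty_like(wolves)
    for i, w in enumerate(wolves):
        pulls = []
        for leader in (alpha, beta, delta):
            r1, r2 = np.random.rand(*w.shape), np.random.rand(*w.shape)
            A, C = 2 * a * r1 - a, 2 * r2         # stochastic coefficients
            d = np.abs(C * leader - w)            # distance to the leader
            pulls.append(leader - A * d)          # candidate position toward the leader
        new_wolves[i] = np.mean(pulls, axis=0)    # average of the three pulls
    return new_wolves
```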
KSP Keywords
Attention mechanism, Grey Wolf optimizer, Local minima, Multi-agent system(MAS), Optimization methods, Optimization strategies, Q-value, Real-world, Reinforcement learning(RL), StarCraft II, Stochastic Gradient Descent
This work is distributed under the terms of the Creative Commons License (CC BY).