ETRI Knowledge Sharing Platform

Single-anchored Multi-modal Dense Video Captioning for Esports Broadcasts Commentaries
Authors
Ari Yu, Jinwoo Hyun, Hyeong-Gyu Jang, Sung-Yun Park, Sang-Kwang Lee
Issue Date
2025-10
Citation
International ACM Workshop on Multimedia Content Analysis in Sports (MMSports) 2025, pp.31-38
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.1145/3728423.3759412
Abstract
The popularity and industrial scale of esports broadcasting are expanding rapidly. However, research on automated commentary generation tailored to esports remains in its early stages compared to traditional sports such as soccer and baseball. Processing esports videos directly with existing dense video captioning models is challenging due to the integration of real-time scoreboards and complex game mechanics. To address this challenge, we present a model optimized for esports commentary generation. We constructed and analyzed the esports broadcast commentaries dataset, which comprises 703 matches and 25,262 timestamped commentaries from the 2022 and 2023 League of Legends Champions Korea regular seasons. Based on this analysis, we propose a two-stage framework, termed the Single-anchored Multi-modal Dense Video Captioning model. In the first stage, a spotting sub-model detects game events by processing scoreboard time-series data using optical character recognition at 1 frame per second. Through rule-based noise correction, this stage achieves high temporal precision, attaining an F1-score of 99.7%. In the second stage, a captioning sub-model pools visual features centered on the detected anchor and generates fluent, contextually relevant commentary using an LSTM-based decoder. Experimental results demonstrate that the proposed model outperforms existing esports captioning baselines. Furthermore, qualitative evaluations indicate that the model effectively captures dynamic in-game scenes, producing commentary closely aligned with that of professional broadcasts. Therefore, our model holds significant practical potential as an automated commentary solution that can be readily deployed in amateur matches without human commentators.
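The first stage described above reads the on-screen scoreboard with OCR at 1 fps and applies rule-based noise correction before spotting events. As an illustration only, the sketch below shows one plausible such rule: majority-vote smoothing of a per-second kill-count series to suppress single-frame OCR misreads, with event anchors placed wherever the corrected count increases. The function names, the smoothing rule, and the data are all hypothetical; the paper's actual correction rules are not specified in this abstract.

```python
from collections import Counter

def correct_ocr_noise(readings, window=3):
    """Majority-vote smoothing over a sliding window to suppress
    single-frame OCR misreads in a 1 fps scoreboard time series.
    (Hypothetical rule; stands in for the paper's rule-based correction.)"""
    smoothed = []
    for i in range(len(readings)):
        lo = max(0, i - window // 2)
        hi = min(len(readings), i + window // 2 + 1)
        smoothed.append(Counter(readings[lo:hi]).most_common(1)[0][0])
    return smoothed

def spot_events(kill_counts):
    """Anchor an event at each second where the corrected count rises."""
    return [t for t in range(1, len(kill_counts))
            if kill_counts[t] > kill_counts[t - 1]]

# Example: a misread spike at t=5 (7 instead of 2) is smoothed away,
# leaving genuine increments at t=2 and t=8 as event anchors.
raw = [1, 1, 2, 2, 2, 7, 2, 2, 3, 3, 3]
clean = correct_ocr_noise(raw)    # [1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3]
anchors = spot_events(clean)      # [2, 8]
```

In the second stage, each detected anchor would serve as the center for pooling visual features that the LSTM-based decoder conditions on when generating the commentary sentence.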
KSP Keywords
Early stages, F1-score, First stage, Industrial scale, League of Legends, Multi-modal, Noise correction, Optical Character Recognition, Proposed model, Real-time, Rule-based
This work is distributed under the terms of the Creative Commons License (CCL): CC BY-NC-ND.