ETRI Knowledge Sharing Platform


Multimodal understanding with GPT-4o to enhance generalizable pedestrian behavior prediction
Cited 0 times in Scopus. Downloaded 267 times.
Authors
Je-Seok Ham, Jia Huang, Peng Jiang, Jinyoung Moon, Yongjin Kwon, Srikanth Saripalli, Changick Kim
Issue Date
2026-01
Citation
Computers and Electrical Engineering, v.129, no.Part A, pp.1-22
ISSN
0045-7906
Publisher
Elsevier
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.1016/j.compeleceng.2025.110741
Abstract
Pedestrian behavior prediction is one of the most critical tasks in urban driving scenarios, playing a key role in ensuring road safety. Traditional learning-based methods have relied on vision models for pedestrian behavior prediction. However, fully understanding pedestrians' behaviors in advance is very challenging due to complex driving environments and the multifaceted interactions between pedestrians and road elements. Additionally, these methods often show a limited understanding of driving environments not included in the training data. The emergence of Multimodal Large Language Models (MLLMs) provides an innovative approach to addressing these challenges through advanced reasoning capabilities. This paper presents OmniPredict, the first study to apply GPT-4o(mni), a state-of-the-art MLLM, to pedestrian behavior prediction in urban driving scenarios. We assessed the model on the JAAD and WiDEVIEW datasets, which are widely used for pedestrian behavior analysis. Our method utilizes multiple contextual modalities and achieves 67% accuracy in a zero-shot setting without any task-specific training, surpassing the latest MLLM baselines by 10%. Furthermore, when incorporating additional contextual information, the experimental results demonstrate a significant increase in prediction accuracy across four behavior types (crossing, occlusion, action, and look). We also validated the model's generalization ability by comparing its responses across various road-environment scenarios. OmniPredict exhibits strong generalization, demonstrating robust decision-making in diverse, rare, and previously unseen driving scenarios. These findings highlight the potential of MLLMs to enhance pedestrian behavior prediction, paving the way for safer and more informed decision-making in road environments.
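The abstract describes a zero-shot prompting setup: GPT-4o receives a driving-scene frame plus textual context and is asked to predict the four behavior attributes (crossing, occlusion, action, look). The sketch below is not the authors' released code; it is a minimal illustration using the standard OpenAI Python SDK, where the frame path, prompt wording, and JSON answer schema are assumptions made for the example. The paper's actual prompts, context encoding, and evaluation protocol may differ.

```python
# Minimal zero-shot sketch (illustrative, not the OmniPredict implementation):
# send one driving-scene frame to GPT-4o and request the four behavior
# attributes studied in the paper as a fixed JSON answer.
import base64
from openai import OpenAI  # official OpenAI Python SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def encode_frame(path: str) -> str:
    """Base64-encode an image file for the vision input."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


frame_b64 = encode_frame("frame_0001.jpg")  # hypothetical JAAD/WiDEVIEW frame

# Invented prompt and schema; the paper's prompts may be structured differently.
prompt = (
    "You are assisting an autonomous vehicle. For the pedestrian in the "
    "red bounding box, answer in JSON with boolean keys "
    '"crossing", "occluded", "walking", "looking", '
    'and a short string key "reason".'
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```

Requesting a fixed JSON schema makes the model's free-form answer easy to score against dataset labels; a real evaluation loop would also need response parsing, validation, and retries.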
KSP Keywords
Behavior Prediction, Behavior analysis, Contextual information, Critical task, Innovative approach, Key role, Multimodal understanding, Prediction accuracy, Road safety, Robust decision-making, Task-specific
This work is distributed under the terms of the Creative Commons License (CCL): CC BY-NC-ND.