ETRI Knowledge Sharing Platform

Beyond-Labels: Advancing Open-Vocabulary Segmentation With Vision-Language Models
Cited 1 time in Scopus
Authors
Muhammad Atta ur Rahman, Dooseop Choi, Seung-Ik Lee, KyoungWook Min
Issue Date
2025-07
Citation
International Conference on Advanced Computational Intelligence (ICACI) 2025, pp.231-236
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.1109/ICACI65340.2025.11096230
Abstract
Open-vocabulary semantic segmentation attempts to classify and outline objects in an image using arbitrary text labels, including those unseen during training. Self-supervised learning, when trained effectively, resolves numerous visual and linguistic processing problems. This study investigates simple yet efficient methods for adapting pre-trained foundation models to open-vocabulary semantic segmentation. We propose 'Beyond-Labels,' a lightweight transformer-based fusion module that uses a small amount of image segmentation data to fuse frozen visual representations with language concepts. This strategy allows the model to harness the extensive knowledge of pre-trained models without requiring expensive retraining, making it data-efficient and scalable. Furthermore, we capture positional information in images using Fourier embeddings, improving generalization and yielding smooth, consistent spatial encoding. We perform thorough ablation studies to investigate the major components of our proposed method. On the standard PASCAL-5i benchmark, the method performs better despite being trained on frozen vision and language features.
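The abstract names two concrete components, a Fourier positional embedding and a lightweight transformer fusion module over frozen vision and language features, but gives no implementation details. The PyTorch sketch below is only an illustration of how such components might be wired together; every class name, dimension, and design choice here (fixed log-spaced frequency bands, a single cross-attention block, similarity-based per-pixel logits) is an assumption for exposition, not the authors' code.

# Illustrative sketch only: names, dimensions, and structure are assumptions
# inferred from the abstract, not the paper's implementation.
import math
import torch
import torch.nn as nn

class FourierPositionalEmbedding(nn.Module):
    """Maps normalized (x, y) grid coordinates to sin/cos Fourier features."""
    def __init__(self, num_frequencies: int = 16):
        super().__init__()
        # Fixed (non-learned) log-spaced frequency bands.
        self.register_buffer("freqs", 2.0 ** torch.arange(num_frequencies) * math.pi)

    def forward(self, height: int, width: int) -> torch.Tensor:
        device = self.freqs.device
        ys = torch.linspace(-1.0, 1.0, height, device=device)
        xs = torch.linspace(-1.0, 1.0, width, device=device)
        grid = torch.stack(torch.meshgrid(ys, xs, indexing="ij"), dim=-1)  # (H, W, 2)
        angles = grid.unsqueeze(-1) * self.freqs                           # (H, W, 2, F)
        feats = torch.cat([angles.sin(), angles.cos()], dim=-1)           # (H, W, 2, 2F)
        return feats.flatten(2)                                            # (H, W, 4F)

class FusionModule(nn.Module):
    """Hypothetical fusion block: frozen patch features attend to text embeddings."""
    def __init__(self, vision_dim: int, text_dim: int, hidden_dim: int = 256,
                 num_frequencies: int = 16, num_heads: int = 8):
        super().__init__()
        self.pos_embed = FourierPositionalEmbedding(num_frequencies)
        self.vision_proj = nn.Linear(vision_dim + 4 * num_frequencies, hidden_dim)
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)

    def forward(self, patch_feats: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # patch_feats: (B, H, W, Dv) from a frozen vision encoder.
        # text_embeds: (B, K, Dt) from a frozen text encoder, one row per class label.
        b, h, w, _ = patch_feats.shape
        pos = self.pos_embed(h, w).unsqueeze(0).expand(b, -1, -1, -1)     # (B, H, W, 4F)
        tokens = self.vision_proj(torch.cat([patch_feats, pos], dim=-1))
        tokens = tokens.flatten(1, 2)                                      # (B, H*W, D)
        labels = self.text_proj(text_embeds)                               # (B, K, D)
        fused, _ = self.cross_attn(tokens, labels, labels)                 # (B, H*W, D)
        # Per-pixel class logits via similarity to the projected label embeddings.
        logits = torch.einsum("bnd,bkd->bnk", fused, labels)
        return logits.view(b, h, w, -1)                                    # (B, H, W, K)

In this sketch the open-vocabulary behavior comes from the label set being whatever text embeddings are supplied at inference time, and only the small fusion module would be trained, which is consistent with the frozen-backbone, data-efficient setup the abstract describes.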
KSP Keywords
Language Model, Language characteristics, Linguistic processing, Model Data, Semantic segmentation, Spatial encoding, Visual representations, image segmentation, positional information, self-supervised learning, transformer-based