ETRI Knowledge Sharing Platform : Beyond-Labels: Advancing Open-Vocabulary Segmentation With Vision-Language Models

Conference Paper Beyond-Labels: Advancing Open-Vocabulary Segmentation With Vision-Language Models

Cited 1 time in Scopus

Authors: Muhammad Atta ur Rahman, Dooseop Choi, Seung-Ik Lee, KyoungWook Min

Citation: International Conference on Advanced Computational Intelligence (ICACI) 2025, pp.231-236

Abstract: Open-vocabulary semantic segmentation aims to classify and outline objects in an image using arbitrary text labels, including those unseen during training. Self-supervised learning, when trained effectively, addresses numerous visual and linguistic processing problems. This study investigates simple yet efficient methods for adapting pre-trained foundation models to open-vocabulary semantic segmentation. We propose 'Beyond-Labels,' a lightweight transformer-based fusion module that uses a small amount of image segmentation data to fuse frozen visual representations with language concepts. This strategy lets the model exploit the extensive knowledge in pre-trained models without extensive retraining, making it data-efficient and scalable. Furthermore, we capture positional information in images efficiently using Fourier embeddings, which improves generalization and yields a smooth, consistent spatial encoding. We perform thorough ablation studies to investigate the major components of the proposed method. On the standard PASCAL-5i benchmark, the method performs better despite being trained on frozen vision and language features.
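
As an illustration of the Fourier positional embeddings mentioned in the abstract, the sketch below maps normalized image coordinates to sine and cosine features. It is a generic example and not the authors' implementation: the function names `fourier_position_features` and `grid_features`, the `num_bands` parameter, and the octave-spaced frequencies are all assumptions made here for illustration, since the abstract does not specify the formulation.

```python
import math


def fourier_position_features(x, y, num_bands=4):
    """Map a normalized coordinate (x, y) in [0, 1] to sine/cosine features.

    Hypothetical formulation: each coordinate is encoded at num_bands
    frequencies, giving 2 coords * num_bands * 2 (sin, cos) values.
    """
    features = []
    for coord in (x, y):
        for k in range(num_bands):
            freq = 2.0 ** k  # assumed octave-spaced frequencies: 1, 2, 4, ...
            angle = 2.0 * math.pi * freq * coord
            features.append(math.sin(angle))
            features.append(math.cos(angle))
    return features


def grid_features(height, width, num_bands=4):
    """Compute features for the center of every cell in a height x width grid."""
    return [
        [
            fourier_position_features(
                (col + 0.5) / width, (row + 0.5) / height, num_bands
            )
            for col in range(width)
        ]
        for row in range(height)
    ]


if __name__ == "__main__":
    grid = grid_features(height=2, width=3, num_bands=4)
    print(len(grid), len(grid[0]), len(grid[0][0]))
```

Because every feature is a bounded, smooth function of position, nearby locations receive similar encodings, which is one plausible reading of the "smooth and consistent spatial encoding" the abstract describes.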

KSP Keywords: Language Model, Language characteristics, Linguistic processing, Model Data, Semantic segmentation, Spatial encoding, Visual representations, image segmentation, positional information, self-supervised learning, transformer-based
