ETRI-Knowledge Sharing Platform

Focus Where It Matters: LLM-Guided Regional Identification for Instruction-based Image Editing
Authors
Minho Park, Young Joo Jo, Jae-Hyeok Lee, Ji Yong Lee, Dong-oh Kang, Yong Man Ro
Issue Date
2025-10
Citation
International Conference on Multimedia (MM) 2025, pp. 9277-9286
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.1145/3746027.3754711
Abstract
Instruction-based image editing enables intuitive modifications of images through natural language descriptions. However, existing models often struggle to accurately identify the target region, which refers to the area that should be modified. As a result, unintended changes may occur in non-target areas, where the original image should remain unchanged. To address this issue, we propose FoRE, an MLLM-guided framework that identifies the target region based on the given edit instruction and performs image editing using region-aware embeddings. Within FoRE, the Region-guided Edit Adapter projects these embeddings from the MLLM domain to the diffusion condition space. Subsequently, the Region-guided Refinement Module refines the projected features to enhance spatial accuracy prior to guiding the diffusion process. Through comprehensive evaluations, we demonstrate that FoRE significantly improves localization accuracy and instruction fidelity compared to existing approaches. By explicitly incorporating region-aware conditioning, our framework effectively bridges the gap between instruction comprehension and spatially precise image modifications, advancing the capabilities of instruction-based image editing.
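The failure mode the abstract describes, unintended changes in non-target areas, can be made concrete with a small measurement. The sketch below is illustrative only and is not the evaluation protocol or any component of FoRE: the function name `region_change`, the grayscale list-of-lists image representation, and the toy 2x2 inputs are all assumptions introduced here. Given an original image, an edited image, and a binary target mask, it reports the mean absolute pixel change inside and outside the target region.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel intensities (assumed format)
Mask = List[List[int]]     # 1 = target region (may change), 0 = non-target region


def region_change(original: Image, edited: Image, mask: Mask) -> Tuple[float, float]:
    """Return (mean abs change inside target, mean abs change outside target)."""
    inside_sum = outside_sum = 0.0
    inside_n = outside_n = 0
    for row_o, row_e, row_m in zip(original, edited, mask):
        for o, e, m in zip(row_o, row_e, row_m):
            diff = abs(e - o)
            if m:
                inside_sum += diff
                inside_n += 1
            else:
                outside_sum += diff
                outside_n += 1
    # Guard against empty regions so the function never divides by zero.
    inside = inside_sum / inside_n if inside_n else 0.0
    outside = outside_sum / outside_n if outside_n else 0.0
    return inside, outside


# Toy data (hypothetical): only the top-left pixel belongs to the target region.
original = [[0.0, 0.0], [0.0, 0.0]]
mask = [[1, 0], [0, 0]]
well_localized = [[1.0, 0.0], [0.0, 0.0]]  # edit confined to the target
leaky = [[1.0, 0.5], [0.5, 0.5]]           # edit spills into non-target pixels

print(region_change(original, well_localized, mask))
print(region_change(original, leaky, mask))
```

A well-localized edit keeps the second value near zero while the first reflects the requested change; a larger second value indicates that the edit leaked outside the target region.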
KSP Keywords
Diffusion process, Existing approaches, Image editing, Localization accuracy, Natural language, Region-based, Spatial accuracy
This work is distributed under the terms of the Creative Commons License (CC BY).