ETRI Knowledge Sharing Platform



Details

Journal Article
Improving Visual Relationship Detection using Linguistic and Spatial Cues
Cited 3 times in Scopus · Downloaded 22 times
Authors
정재원, 박종열
Publication Date
June 2020
Source
ETRI Journal, v.42 no.3, pp.399-410
ISSN
1225-6463
Publisher
Electronics and Telecommunications Research Institute (ETRI)
DOI
https://dx.doi.org/10.4218/etrij.2019-0093
Funded Project
19ZS1100, Core Fundamental Technology Research for Self-Improving AI, 송화전
Abstract
Detecting visual relationships in an image is important for image understanding. It enables higher-level tasks such as predicting the next scene and understanding what occurs in an image. A visual relationship comprises a subject, a predicate, and an object, and is related to visual, language, and spatial cues. The predicate describes the relationship between the subject and object and falls into different categories, such as prepositions and verbs. A large visual gap can exist even among visual relationships that share the same predicate. This study improves upon a previous approach, which uses language cues with two losses and a spatial cue containing only individual object information, by adding relative spatial information about the subject and object. An architectural limitation of the earlier model is demonstrated and overcome so that all zero-shot visual relationships can be detected. A new problem is also identified, together with an explanation of how it degrades performance. Experiments on the VRD and VG datasets show a significant improvement over previous results.
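The abstract describes a visual relationship as a subject–predicate–object triple scored from three kinds of cues (visual, language, spatial). The sketch below illustrates that structure in Python; all names, and the simple weighted-sum fusion, are illustrative assumptions and are not taken from the paper's actual model.

```python
from dataclasses import dataclass

# A visual relationship is a <subject, predicate, object> triple.
# All names and the fusion scheme here are hypothetical illustrations,
# not the paper's implementation.

@dataclass
class Relationship:
    subject: str    # e.g. "person"
    predicate: str  # e.g. "ride" (verb) or "on" (preposition)
    object: str     # e.g. "horse"

def score_relationship(visual: float, language: float, spatial: float,
                       weights: tuple = (0.5, 0.3, 0.2)) -> float:
    """Fuse per-cue confidences with a simple weighted sum
    (a placeholder for the paper's learned combination)."""
    wv, wl, ws = weights
    return wv * visual + wl * language + ws * spatial

r = Relationship("person", "ride", "horse")
print(r.subject, r.predicate, r.object, score_relationship(0.9, 0.8, 0.7))
```

A detector would produce many candidate triples per image and rank them by such a fused score; the paper's contribution concerns how the language and relative spatial cues feed into that ranking.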
Keywords
deep learning, image retrieval, image understanding, predicate, visual relationship
KSP Suggested Keywords
Image retrieval, Relative information, Spatial cue, Visual relationship detection, Zero-shot, deep learning(DL), image understanding
This work is available under the Korea Open Government License (KOGL) Type 4: source attribution + no commercial use + no modification.