ETRI Knowledge Sharing Platform

Weakly Paired Associative Learning for Sound and Image Representations via Bimodal Associative Memory
Cited 5 times in Scopus
Authors
Sangmin Lee, Hyung-Il Kim, Yong Man Ro
Issue Date
2022-06
Citation
Conference on Computer Vision and Pattern Recognition (CVPR) 2022, pp.10534-10543
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.1109/CVPR52688.2022.01028
Abstract
Data representation learning without labels has attracted increasing attention because it does not require human annotation. Recently, representation learning has been extended to bimodal data, especially sound and image, which are closely related to basic human senses. Existing sound and image representation learning methods necessarily require a large number of corresponding sound-image pairs. Therefore, it is difficult to ensure their effectiveness in the weakly paired condition, which lacks paired bimodal data. In fact, according to human cognitive studies, the cognitive functions in the human brain for a certain modality can be enhanced by receiving other modalities, even ones that are not directly paired. Based on this observation, we propose a new problem to deal with the weakly paired condition: how to boost a certain modal representation even by using other unpaired modal data. To address this issue, we introduce a novel bimodal associative memory (BMA-Memory) with key-value switching. It enables building a sound-image association with a small amount of paired bimodal data and boosting the built association with the easily obtainable large amount of unpaired data. Through the proposed associative learning, it is possible to reinforce the representation of a certain modality (e.g., sound) even by using other unpaired modal data (e.g., images).
KSP Keywords
Associative learning, Cognitive function, Data representation, Human annotation, Human senses, Key-Value, Learning methods, Modal data, Modal representation, Representation learning, associative memory
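
Illustrative sketch
The abstract describes BMA-Memory only at a high level. The code below is a minimal, hedged sketch of a key-value memory with modality switching, written in PyTorch; it is an assumption of how such a module might be structured, not the authors' implementation. The class name BMAMemorySketch and parameters such as num_slots and dim are hypothetical.

# Minimal sketch (not the authors' implementation) of a key-value associative
# memory addressable from either modality. Names like BMAMemorySketch,
# num_slots, and dim are illustrative assumptions, not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BMAMemorySketch(nn.Module):
    """Toy bimodal key-value memory with key-value switching.

    Two learnable slot banks hold sound and image prototypes. For a query
    from one modality, its own bank acts as keys and the other bank acts as
    values, so an unpaired sound (or image) feature can retrieve a
    pseudo-paired feature of the other modality.
    """

    def __init__(self, num_slots: int = 64, dim: int = 128):
        super().__init__()
        self.sound_slots = nn.Parameter(torch.randn(num_slots, dim) * 0.02)
        self.image_slots = nn.Parameter(torch.randn(num_slots, dim) * 0.02)

    def read(self, query: torch.Tensor, key_modality: str) -> torch.Tensor:
        """Address the memory with query features from key_modality
        ('sound' or 'image') and return features of the other modality."""
        keys, values = (
            (self.sound_slots, self.image_slots)
            if key_modality == "sound"
            else (self.image_slots, self.sound_slots)
        )
        # Scaled dot-product addressing over memory slots.
        attn = F.softmax(query @ keys.t() / keys.size(-1) ** 0.5, dim=-1)
        return attn @ values  # retrieved cross-modal features


if __name__ == "__main__":
    memory = BMAMemorySketch()
    sound_feat = torch.randn(8, 128)   # e.g., unpaired sound embeddings
    pseudo_image = memory.read(sound_feat, key_modality="sound")
    print(pseudo_image.shape)          # torch.Size([8, 128])

In this sketch, paired data would be used to train the slot banks so that corresponding sound and image prototypes align, after which unpaired data of one modality can query the memory to obtain surrogate features of the other.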