ETRI-Knowledge Sharing Platform

High-level Visual Representation via Perceptual Representation Learning
Cited 0 times in Scopus
Authors
Donghun Lee, Samyeul Noh, Ingook Jang, Seonghyun Kim, Heechul Bae
Issue Date
2023-10
Citation
International Conference on Information and Communication Technology Convergence (ICTC) 2023, pp.1793-1795
Publisher
IEEE
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.1109/ICTC58733.2023.10393558
Abstract
Recent advances in representation learning and video prediction have shown that accurately anticipating future states can improve manipulation and control strategies across a range of applications. However, the complex dynamics of real-world data make such representations difficult to learn. Autoregressive models, which feed each generated frame back as input for predicting the next one, suffer from compounding errors, excessive memory usage, and long training times, because the state must be reconstructed from the latent vector at every iteration. To address these limitations, recent studies have introduced State Space Models (SSMs) that forecast directly in the latent space, which makes it possible to predict states further into the future. These methods, however, are often limited in their ability to extract object-centric representations. More recent object-centric approaches focus on features closely tied to the input data, but their ability to capture higher-level representations remains limited. In this paper, we propose integrating a perceptual network into the slot attention mechanism to extract and separate high-level representations. Using a pre-trained perceptual network, we derive higher-level, object-oriented representations for each perceptual layer and align them with the corresponding slots. These representations, rich in object-centric information, have the potential to improve understanding of the current state and to guide accurate prediction of future states.
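The abstract builds on slot attention, where a fixed set of slot vectors compete to explain the input features. The competition comes from a softmax taken over the slots rather than over the inputs. The code below is a simplified, dependency-free sketch of that general mechanism, following Locatello et al. (2020). It is not the authors' implementation. It makes several assumptions: identity maps replace the learned key/query/value projections, the GRU + MLP slot update is replaced by the attention-weighted mean itself, and slots are initialised deterministically instead of being sampled. `slots_per_layer` and its dictionary of per-layer features are hypothetical. The abstract does not say how perceptual layers are aligned with slots, so this function simply runs slot attention on each layer. It also assumes every layer's features have already been projected to the slot dimension.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def slot_attention(
    inputs: List[Vector], slots: List[Vector], iters: int = 3, eps: float = 1e-8
) -> Tuple[List[Vector], List[Vector]]:
    """Simplified slot attention (after Locatello et al., 2020).

    Assumptions: identity maps stand in for the learned key/query/value
    projections, and the GRU + MLP update is replaced by the weighted mean.
    Returns the final slots and the attention of the last iteration.
    """
    dim = len(inputs[0])
    scale = 1.0 / math.sqrt(dim)
    attn: List[Vector] = []
    for _ in range(iters):
        # Softmax over SLOTS (not inputs): slots compete to explain each feature.
        attn = [softmax([scale * dot(x, s) for s in slots]) for x in inputs]
        new_slots = []
        for k in range(len(slots)):
            weights = [row[k] + eps for row in attn]
            total = sum(weights)
            # Slot k becomes the weighted mean of the features it claims.
            new_slots.append(
                [sum(w * x[d] for w, x in zip(weights, inputs)) / total
                 for d in range(dim)]
            )
        slots = new_slots
    return slots, attn


def slots_per_layer(
    layer_features: Dict[str, List[Vector]], init_slots: List[Vector]
) -> Dict[str, List[Vector]]:
    """Hypothetical: run slot attention separately on each perceptual layer.

    Assumes each layer's features are already projected to the slot dimension.
    """
    return {
        name: slot_attention(feats, init_slots)[0]
        for name, feats in layer_features.items()
    }


# Toy usage: two clusters of 2-D "features"; each slot should settle near
# one cluster centre, (5, 0) or (0, 5).
toy = [[4.0, 0.0], [5.0, 0.0], [6.0, 0.0], [0.0, 4.0], [0.0, 5.0], [0.0, 6.0]]
slots, attn = slot_attention(toy, [[1.0, 0.0], [0.0, 1.0]])
print([[round(v, 3) for v in s] for s in slots])
```

Because the softmax is normalised over slots, each feature's attention row sums to one. Each slot therefore ends up claiming a distinct group of features. This competition is what gives slot attention its object-centric decomposition.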
KSP Keywords
Attention mechanism, Autoregressive models, Control strategy, Latent space, Object-centric, Real-world data, Representation learning, State Prediction, Video prediction, Visual Representation, dynamic nature