ETRI-Knowledge Sharing Platform

Details

Journal article: SMaTE: A Segment-Level Feature Mixing and Temporal Encoding Framework for Facial Expression Recognition
Cited 1 time in Scopus. Downloaded 88 times.
Authors
김나연, 조숙희, 배병준
Publication date
2022-08
Source
Sensors, v.22 no.15, pp.1-19
ISSN
1424-8220
Publisher
MDPI
DOI
https://dx.doi.org/10.3390/s22155753
Funded project
22HH6500, Development of High-Quality Conversion Technology for SD/HD-Grade Low-Quality Media, 조숙희
Abstract
Despite advanced machine learning methods, the implementation of emotion recognition systems based on real-world video content remains challenging. Videos may contain data such as images, audio, and text. However, the application of multimodal models using two or more types of data to real-world video media (CCTV, illegally filmed content, etc.) lacking sound or subtitles is difficult. Although facial expressions in image sequences can be utilized in emotion recognition, the diverse identities of individuals in real-world content limit computational models of relationships between facial expressions. This study proposed a transformation model which employed a video vision transformer to focus on facial expression sequences in videos. It effectively understood and extracted facial expression information from the identities of individuals, instead of fusing multimodal models. The design entailed capture of higher-quality facial expression information through mixed-token embedding of facial expression sequences augmented via various methods into a single data representation, and comprised two modules: spatial and temporal encoders. Further, temporal position embedding, focusing on relationships between video frames, was proposed and subsequently applied to the temporal encoder module. The performance of the proposed algorithm was compared with that of conventional methods on two emotion recognition datasets of video content, with results demonstrating its superiority.
KSP suggested keywords
Computational Model, Conventional methods, Data representation, Emotion recognition, Facial Expression Recognition(FER), Image sequence, Machine Learning Methods, Real-world, Spatial and temporal, Temporal Encoding, Video contents
This work may be used under the terms of the Creative Commons Attribution (CC BY) license.