ETRI Knowledge Sharing Platform : SMaTE: A Segment-Level Feature Mixing and Temporal Encoding Framework for Facial Expression Recognition


Journal Article SMaTE: A Segment-Level Feature Mixing and Temporal Encoding Framework for Facial Expression Recognition

Cited 10 times in Scopus

Downloaded 273 times

Authors: Nayeon Kim, Sukhee Cho, Byungjun Bae

Issue Date: 2022-08

Citation: Sensors, v.22, no.15, pp.1-19

ISSN: 1424-8220

Publisher: MDPI

Language: English

Type: Journal Article

DOI: https://dx.doi.org/10.3390/s22155753

Abstract: Despite advances in machine learning methods, implementing emotion recognition systems for real-world video content remains challenging. Videos may contain several types of data, such as images, audio, and text. However, multimodal models that use two or more of these data types are difficult to apply to real-world video media that lack sound or subtitles (e.g., CCTV footage or illegally filmed content). Although facial expressions in image sequences can be used for emotion recognition, the diverse identities of individuals in real-world content limit computational models of the relationships between facial expressions. This study proposed a transformer-based model that employed a video vision transformer to focus on facial expression sequences in videos. Instead of fusing multimodal models, it effectively understood and extracted facial expression information independently of the individuals' identities. The design captured higher-quality facial expression information through mixed-token embedding, which combines facial expression sequences augmented via various methods into a single data representation, and comprised two modules: a spatial encoder and a temporal encoder. Further, a temporal position embedding that focuses on relationships between video frames was proposed and applied to the temporal encoder module. The proposed algorithm was compared with conventional methods on two video-content emotion recognition datasets, and the results demonstrated its superiority.
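The abstract describes the architecture only at a high level, so the following pure-Python sketch is an illustration under stated assumptions, not the authors' implementation. It shows the general shape of a factorized pipeline: several augmented views of one clip are mixed segment by segment into a single sequence, each frame is reduced to one token by a "spatial" stage, a temporal position embedding is added, and a "temporal" stage attends across frames. The round-robin mixing rule, the sinusoidal form of the position embedding, the use of mean pooling as a placeholder spatial encoder, the identity-projection self-attention, and all function names (`mix_segments`, `temporal_position_embedding`, `self_attention`, `encode_clip`) are hypothetical choices made for this example.

```python
import math
from typing import List, TypeVar

T = TypeVar("T")
Vector = List[float]


def mix_segments(views: List[List[T]], num_segments: int) -> List[T]:
    """Hypothetical segment-level mixing: split every augmented view of the
    same clip into equal-length temporal segments, then take segment i from
    view i % len(views) so that one sequence combines all views."""
    num_frames = len(views[0])
    if any(len(view) != num_frames for view in views):
        raise ValueError("all views must have the same number of frames")
    if num_frames % num_segments != 0:
        raise ValueError("number of frames must be divisible by num_segments")
    seg_len = num_frames // num_segments
    mixed: List[T] = []
    for i in range(num_segments):
        source = views[i % len(views)]
        mixed.extend(source[i * seg_len:(i + 1) * seg_len])
    return mixed


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a list of equal-length vectors element-wise."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def temporal_position_embedding(num_frames: int, dim: int) -> List[Vector]:
    """Sinusoidal encoding indexed by frame number (the standard Transformer
    form, used here only as a stand-in for the paper's temporal embedding)."""
    table = []
    for t in range(num_frames):
        row = []
        for k in range(dim):
            angle = t / (10000 ** (2 * (k // 2) / dim))
            row.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention with identity Q/K/V
    projections; a trained encoder would use learned weight matrices."""
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([sum(w * value[j] for w, value in zip(weights, tokens)) / total
                        for j in range(dim)])
    return outputs


def encode_clip(views: List[List[List[Vector]]], num_segments: int) -> Vector:
    """Toy factorized pipeline: mix augmented views, encode each frame
    spatially, add temporal positions, then encode across frames."""
    mixed = mix_segments(views, num_segments)
    # Placeholder spatial encoder: pool each frame's patch tokens to one vector.
    frame_vectors = [mean_pool(patches) for patches in mixed]
    pe = temporal_position_embedding(len(frame_vectors), len(frame_vectors[0]))
    tokens = [[x + p for x, p in zip(vec, row)] for vec, row in zip(frame_vectors, pe)]
    # Placeholder temporal encoder: attention across frame tokens, then pooling.
    return mean_pool(self_attention(tokens))


if __name__ == "__main__":
    # Two hypothetical augmented views of a 4-frame clip: 2 patches/frame, dim 4.
    view_a = [[[1.0, 0.0, 0.0, 0.0] for _ in range(2)] for _ in range(4)]
    view_b = [[[0.0, 1.0, 0.0, 0.0] for _ in range(2)] for _ in range(4)]
    print(encode_clip([view_a, view_b], num_segments=2))
```

Factorizing the work this way keeps attention over patches within a frame separate from attention over frames. Because of that separation, a position embedding indexed by frame number only needs to be applied at the temporal stage, which matches the abstract's statement that the temporal position embedding is applied to the temporal encoder module.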

KSP Keywords: Computational model, Conventional methods, Data representation, Machine learning methods, Real-world, Spatial and temporal, Temporal encoding, Video contents, Emotion recognition, Facial expression recognition, Image sequence

This work is distributed under the terms of the Creative Commons Attribution (CC BY) license.

ETRI

218 Gajeong-ro, Yuseong-gu, Daejeon, 34129, KOREA, Contact: sh.kim@etri.re.kr


