ETRI Knowledge Sharing Platform





Conference Paper: A Tool for Extracting 3D Avatar-Ready Gesture Animations from Monocular Videos
Cited 2 times in Scopus
Andrew Feng, Samuel Shin, Youngwoo Yoon
ACM SIGGRAPH Conference on Motion, Interaction and Games (MIG) 2022, pp.1-7
21HS1500, Development of Real-Environment Human-Care Robot Technology for the Aging Society, Jaeyeon Lee
Modeling and generating realistic human gesture animations from speech audio has a great impact on creating believable virtual humans that can interact with human users and mimic real-world face-to-face communication. Large-scale datasets are essential for data-driven research, but creating multi-modal gesture datasets with 3D gesture motions and corresponding speech audio is either expensive via traditional workflows such as mocap or yields subpar results via pose estimation from in-the-wild videos. As a result of these limitations, existing gesture datasets suffer from either short duration or low animation quality, making them less than ideal for training gesture synthesis models. Motivated by the key limitations of previous datasets and by recent progress in human mesh recovery (HMR), we developed a tool for extracting avatar-ready gesture motions from monocular videos with improved animation quality. The tool utilizes a variational autoencoder (VAE) to refine raw gesture motions. The resulting gestures use a unified pose representation that includes both body and finger motions and can be readily applied to a virtual avatar via online motion retargeting. We validated the proposed tool on existing datasets and created the refined dataset TED-SMPLX by re-processing videos from the original TED dataset. The new dataset is available at
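The refinement step described above can be sketched as a small VAE that encodes noisy per-frame pose estimates into a latent space and decodes a cleaned-up pose. This is a minimal illustrative sketch only, not the paper's actual model: the network sizes, the 165-value flattened SMPL-X pose dimension, and the use of the posterior mean at inference are all assumptions introduced here for illustration.

```python
import torch
import torch.nn as nn

class MotionVAE(nn.Module):
    """Toy per-frame pose VAE. Dimensions are illustrative assumptions,
    not the architecture used in the paper."""
    def __init__(self, pose_dim=165, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(pose_dim, 256), nn.ReLU(),
            nn.Linear(256, 2 * latent_dim),   # outputs mean and log-variance
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, pose_dim),
        )

    def forward(self, pose):
        mu, logvar = self.encoder(pose).chunk(2, dim=-1)
        # Reparameterization trick: sample z while keeping gradients
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decoder(z), mu, logvar

def refine(model, noisy_pose):
    """Denoise estimated poses by passing them through the (trained) VAE.
    At inference we decode the posterior mean for deterministic output."""
    model.eval()
    with torch.no_grad():
        mu, _ = model.encoder(noisy_pose).chunk(2, dim=-1)
        return model.decoder(mu)

vae = MotionVAE()
frames = torch.randn(8, 165)   # 8 frames of noisy pose estimates
refined = refine(vae, frames)  # refined poses, same shape as input
print(tuple(refined.shape))
```

In a real pipeline the VAE would be trained on clean mocap poses so that projecting noisy HMR estimates through the latent space pulls them toward plausible human poses; the untrained model here only demonstrates the data flow.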
KSP Suggested Keywords
3D avatar, 3D gestures, Data-driven research, Face-to-face, Gesture datasets, Gesture synthesis, In-the-wild, Large-scale datasets, Motion Retargeting, Multi-modal, Real-world