ETRI Knowledge Sharing Platform





Conference Paper: A Tool for Extracting 3D Avatar-Ready Gesture Animations from Monocular Videos
Cited 2 times in Scopus
Andrew Feng, Samuel Shin, Youngwoo Yoon
ACM SIGGRAPH Conference on Motion, Interaction and Games (MIG) 2022, pp.1-7
21HS1500, Development of Real-Environment Human-Care Robot Technology for the Aging Society, Jaeyeon Lee
Modeling and generating realistic human gesture animations from speech audio has a great impact on creating believable virtual humans that can interact with human users and mimic real-world face-to-face communication. Large-scale datasets are essential for data-driven research, but creating multi-modal gesture datasets with 3D gesture motions and corresponding speech audio is either expensive via traditional workflows such as mocap or yields subpar results via pose estimation from in-the-wild videos. As a result of these limitations, existing gesture datasets suffer from either short duration or low animation quality, making them less than ideal for training gesture synthesis models. Motivated by the key limitations of previous datasets and by recent progress in human mesh recovery (HMR), we developed a tool for extracting avatar-ready gesture motions from monocular videos with improved animation quality. The tool utilizes a variational autoencoder (VAE) to refine raw gesture motions. The resulting gestures use a unified pose representation that includes both body and finger motions and can be readily applied to a virtual avatar via online motion retargeting. We validated the proposed tool on existing datasets and created the refined dataset TED-SMPLX by re-processing videos from the original TED dataset. The new dataset is available at
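The refinement step described above can be sketched as a small VAE that encodes noisy per-frame pose estimates into a latent space and decodes a cleaned-up pose. This is a minimal illustrative sketch only, not the paper's actual model: the network sizes, the 165-value flattened SMPL-X pose dimension, and the use of the posterior mean at inference are all assumptions introduced here for illustration.

```python
import torch
import torch.nn as nn

class MotionVAE(nn.Module):
    """Toy per-frame pose VAE. Dimensions are illustrative assumptions,
    not the architecture used in the paper."""
    def __init__(self, pose_dim=165, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(pose_dim, 256), nn.ReLU(),
            nn.Linear(256, 2 * latent_dim),   # outputs mean and log-variance
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, pose_dim),
        )

    def forward(self, pose):
        mu, logvar = self.encoder(pose).chunk(2, dim=-1)
        # Reparameterization trick: sample z while keeping gradients
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decoder(z), mu, logvar

def refine(model, noisy_pose):
    """Denoise estimated poses by passing them through the (trained) VAE.
    At inference we decode the posterior mean for deterministic output."""
    model.eval()
    with torch.no_grad():
        mu, _ = model.encoder(noisy_pose).chunk(2, dim=-1)
        return model.decoder(mu)

vae = MotionVAE()
frames = torch.randn(8, 165)   # 8 frames of noisy pose estimates
refined = refine(vae, frames)  # refined poses, same shape as input
print(tuple(refined.shape))
```

In a real pipeline the VAE would be trained on clean mocap poses so that projecting noisy HMR estimates through the latent space pulls them toward plausible human poses; the untrained model here only demonstrates the data flow.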
KSP Suggested Keywords
3D avatar, 3D gestures, Data-driven research, Face-to-face, Gesture datasets, Gesture synthesis, In-the-wild, Large-scale datasets, Motion Retargeting, Multi-modal, Real-world