GamePose: Self-Supervised 3D Human Pose Estimation from Multi-View Game Videos
dc.contributor.author | Zhou, Yang | en_US |
dc.contributor.author | Guo, Tianze | en_US |
dc.contributor.author | Xu, Hao | en_US |
dc.contributor.author | Wei, Xilei | en_US |
dc.contributor.author | Xu, Lang | en_US |
dc.contributor.author | Tang, Xiangjun | en_US |
dc.contributor.author | Yang, Sipeng | en_US |
dc.contributor.author | Kou, Qilong | en_US |
dc.contributor.author | Jin, Xiaogang | en_US |
dc.contributor.editor | Chen, Renjie | en_US |
dc.contributor.editor | Ritschel, Tobias | en_US |
dc.contributor.editor | Whiting, Emily | en_US |
dc.date.accessioned | 2024-10-13T18:05:30Z | |
dc.date.available | 2024-10-13T18:05:30Z | |
dc.date.issued | 2024 | |
dc.description.abstract | Recovering 3D character animations from published games is crucial when the original animation assets are lost. One solution is to use 3D human pose estimation with single or multiple views. Our insight is to preserve the ease of use of single-view estimation while enhancing its accuracy by leveraging information from multi-view videos. This is a difficult task: it requires explicitly modelling the correlation of the multi-view input to achieve superior accuracy, and then converting the multi-view correlation model into a single-view model without sacrificing that accuracy, both of which remain unresolved problems. To this end, we propose a novel self-supervised 3D pose estimation framework that models the correlation of multi-view input during training and produces highly accurate estimates for single-view input. Our framework consists of two main components: the Single-View Module (SM) and the Cross-View Module (CM). The SM predicts approximate 3D poses and extracts features from a single viewpoint, while the CM enhances the learning process by modelling correlations across multiple viewpoints. This design facilitates effective self-distillation, improving the accuracy of single-view estimations. As a result, our method supports highly accurate inference with both multi-view and single-view data. We validate our method on 3D human pose estimation benchmarks and create a new dataset using Mixamo assets to demonstrate its applicability in gaming scenarios. Extensive experiments show that our approach outperforms state-of-the-art methods in self-supervised learning scenarios. | en_US |
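The SM/CM self-distillation scheme described in the abstract can be sketched as follows. This is a minimal PyTorch-style sketch under stated assumptions only: it assumes 2D keypoints as input, an MLP encoder for the SM, attention-based fusion for the CM, and an MSE distillation loss. All class names, dimensions, and loss terms are hypothetical illustrations and are not taken from the paper's implementation.

    # Minimal sketch of the SM/CM self-distillation idea; internals are assumed, not the authors' code.
    import torch
    import torch.nn as nn

    class SingleViewModule(nn.Module):
        """Predicts an approximate 3D pose and a feature vector from one view's 2D keypoints."""
        def __init__(self, num_joints=17, feat_dim=256):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(num_joints * 2, feat_dim), nn.ReLU(),
                nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            )
            self.pose_head = nn.Linear(feat_dim, num_joints * 3)

        def forward(self, kpts_2d):                      # kpts_2d: (B, J, 2)
            feat = self.encoder(kpts_2d.flatten(1))      # per-view feature
            pose_3d = self.pose_head(feat).view(kpts_2d.shape[0], -1, 3)
            return pose_3d, feat

    class CrossViewModule(nn.Module):
        """Fuses per-view features with self-attention to refine the 3D pose (used during training)."""
        def __init__(self, num_joints=17, feat_dim=256, num_heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
            self.pose_head = nn.Linear(feat_dim, num_joints * 3)

        def forward(self, view_feats):                   # view_feats: (B, V, D)
            fused, _ = self.attn(view_feats, view_feats, view_feats)
            pose_3d = self.pose_head(fused.mean(dim=1))  # aggregate over views
            return pose_3d.view(view_feats.shape[0], -1, 3)

    def training_step(sm, cm, multi_view_kpts_2d):
        """multi_view_kpts_2d: (B, V, J, 2). The fused CM output distils into the SM."""
        B, V, J, _ = multi_view_kpts_2d.shape
        poses_sm, feats = [], []
        for v in range(V):
            pose, feat = sm(multi_view_kpts_2d[:, v])
            poses_sm.append(pose)
            feats.append(feat)
        pose_cm = cm(torch.stack(feats, dim=1))          # cross-view refined pose
        # Self-distillation: pull each single-view prediction toward the fused estimate.
        distill = sum(nn.functional.mse_loss(p, pose_cm.detach()) for p in poses_sm) / V
        return distill

    # Usage with random data: 8 clips, 4 views, 17 joints.
    sm, cm = SingleViewModule(), CrossViewModule()
    loss = training_step(sm, cm, torch.randn(8, 4, 17, 2))

In such a setup only the SingleViewModule would be needed at inference time, which mirrors the abstract's claim that the method keeps the ease of use of single-view estimation while benefiting from multi-view correlations during training.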
dc.description.sectionheaders | Human I | |
dc.description.seriesinformation | Pacific Graphics Conference Papers and Posters | |
dc.identifier.doi | 10.2312/pg.20241316 | |
dc.identifier.isbn | 978-3-03868-250-9 | |
dc.identifier.pages | 13 pages | |
dc.identifier.uri | https://doi.org/10.2312/pg.20241316 | |
dc.identifier.uri | https://diglib.eg.org/handle/10.2312/pg20241316 | |
dc.publisher | The Eurographics Association | en_US |
dc.rights | Attribution 4.0 International License | |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
dc.subject | CCS Concepts: Computing methodologies → Motion capture | |
dc.title | GamePose: Self-Supervised 3D Human Pose Estimation from Multi-View Game Videos | en_US |