GamePose: Self-Supervised 3D Human Pose Estimation from Multi-View Game Videos

dc.contributor.author: Zhou, Yang [en_US]
dc.contributor.author: Guo, Tianze [en_US]
dc.contributor.author: Xu, Hao [en_US]
dc.contributor.author: Wei, Xilei [en_US]
dc.contributor.author: Xu, Lang [en_US]
dc.contributor.author: Tang, Xiangjun [en_US]
dc.contributor.author: Yang, Sipeng [en_US]
dc.contributor.author: Kou, Qilong [en_US]
dc.contributor.author: Jin, Xiaogang [en_US]
dc.contributor.editor: Chen, Renjie [en_US]
dc.contributor.editor: Ritschel, Tobias [en_US]
dc.contributor.editor: Whiting, Emily [en_US]
dc.date.accessioned: 2024-10-13T18:05:30Z
dc.date.available: 2024-10-13T18:05:30Z
dc.date.issued: 2024
dc.description.abstract: Recovering 3D character animations from published games is crucial when the original animation assets are lost. One way to recover such assets is 3D human pose estimation from single or multiple views. Our insight is to preserve the ease of use of single-view estimation while improving its accuracy by leveraging information from multi-view videos. This is a difficult task: it requires explicitly modelling the correlation of multi-view input to achieve superior accuracy, and then converting the multi-view correlation model to a single-view model without loss of accuracy. Both problems remain unresolved. To this end, we propose a novel self-supervised 3D pose estimation framework that models the correlation of multi-view input during training and produces highly accurate estimates for single-view input. Our framework consists of two main components: the Single-View Module (SM) and the Cross-View Module (CM). The SM predicts approximate 3D poses and extracts features from a single viewpoint, while the CM enhances the learning process by modelling correlations across multiple viewpoints. This design facilitates effective self-distillation, improving the accuracy of single-view estimation. As a result, our method supports highly accurate inference with both multi-view and single-view data. We validate our method on 3D human pose estimation benchmarks and create a new dataset using Mixamo assets to demonstrate its applicability in gaming scenarios. Extensive experiments show that our approach outperforms state-of-the-art methods in self-supervised learning scenarios. [en_US]
dc.description.sectionheaders: Human I
dc.description.seriesinformation: Pacific Graphics Conference Papers and Posters
dc.identifier.doi: 10.2312/pg.20241316
dc.identifier.isbn: 978-3-03868-250-9
dc.identifier.pages: 13 pages
dc.identifier.uri: https://doi.org/10.2312/pg.20241316
dc.identifier.uri: https://diglib.eg.org/handle/10.2312/pg20241316
dc.publisher: The Eurographics Association [en_US]
dc.rights: Attribution 4.0 International License
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: CCS Concepts: Computing methodologies → Motion capture
dc.subject: Computing methodologies → Motion capture
dc.title: GamePose: Self-Supervised 3D Human Pose Estimation from Multi-View Game Videos [en_US]
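
The abstract describes self-distillation from a Cross-View Module into a Single-View Module but gives no formulas. The sketch below only illustrates that general idea and is not the authors' implementation. All names (`fuse_views`, `distillation_loss`, `self_distillation_step`) are hypothetical, the plain per-joint average is an assumed stand-in for the learned CM, and every per-view pose is assumed to be already expressed in a shared world frame.

```python
from statistics import fmean

# A pose is a list of (x, y, z) joint positions; all views share one world frame (assumption).
Pose = list[tuple[float, float, float]]


def fuse_views(per_view_poses: list[Pose]) -> Pose:
    """Assumed stand-in for a cross-view module: average each joint over all views."""
    n_joints = len(per_view_poses[0])
    return [
        tuple(fmean(pose[j][axis] for pose in per_view_poses) for axis in range(3))
        for j in range(n_joints)
    ]


def distillation_loss(single_view_pose: Pose, fused_pose: Pose) -> float:
    """Mean squared joint distance between a single-view estimate and the fused target."""
    return fmean(
        sum((s - t) ** 2 for s, t in zip(s_joint, t_joint))
        for s_joint, t_joint in zip(single_view_pose, fused_pose)
    )


def self_distillation_step(per_view_poses: list[Pose]) -> float:
    """Average distillation loss over views, using the fused pose as the teacher signal."""
    fused = fuse_views(per_view_poses)
    return fmean(distillation_loss(pose, fused) for pose in per_view_poses)
```

In a real training loop the fused target would come from a learned module and the loss would be back-propagated into the single-view network; here the loss is only computed, so that the teacher/student relationship is visible.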
Files
Original bundle
Name: pg20241316.pdf
Size: 6.97 MB
Format: Adobe Portable Document Format