EmoDiffGes: Emotion-Aware Co-Speech Holistic Gesture Generation with Progressive Synergistic Diffusion

| Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Li, Xinru | en_US |
| dc.contributor.author | Lin, Jingzhong | en_US |
| dc.contributor.author | Zhang, Bohao | en_US |
| dc.contributor.author | Qi, Yuanyuan | en_US |
| dc.contributor.author | Wang, Changbo | en_US |
| dc.contributor.author | He, Gaoqi | en_US |
| dc.contributor.editor | Christie, Marc | en_US |
| dc.contributor.editor | Pietroni, Nico | en_US |
| dc.contributor.editor | Wang, Yu-Shuen | en_US |
| dc.date.accessioned | 2025-10-07T05:03:04Z | |
| dc.date.available | 2025-10-07T05:03:04Z | |
| dc.date.issued | 2025 | |
| dc.description.abstract | Co-speech gesture generation, driven by emotional expression and synergistic bodily movements, is essential for applications such as virtual avatars and human-robot interaction. Existing co-speech gesture generation methods face two fundamental limitations: (1) they produce inexpressive gestures because they ignore the temporal evolution of emotion; (2) they generate incoherent and unnatural motions as a result of either oversimplifying the whole body or modeling its parts independently. To address these limitations, we propose EmoDiffGes, a diffusion-based framework grounded in embodied emotion theory that unifies dynamic emotion conditioning and part-aware synergistic modeling. Specifically, a Dynamic Emotion-Alignment Module (DEAM) is first applied to extract dynamic emotional cues and inject emotion guidance into the generation process. Then, a Progressive Synergistic Gesture Generator (PSGG) iteratively refines region-specific latent codes while maintaining full-body coordination, leveraging a Body Region Prior for part-specific encoding and a Progressive Inter-Region Synergistic Flow for global motion coherence. Extensive experiments validate the effectiveness of our method, showcasing its potential for generating expressive, coordinated, and emotionally grounded human gestures. | en_US |
| dc.description.number | 7 | |
| dc.description.sectionheaders | Digital Human | |
| dc.description.seriesinformation | Computer Graphics Forum | |
| dc.description.volume | 44 | |
| dc.identifier.doi | 10.1111/cgf.70261 | |
| dc.identifier.issn | 1467-8659 | |
| dc.identifier.pages | 13 pages | |
| dc.identifier.uri | https://doi.org/10.1111/cgf.70261 | |
| dc.identifier.uri | https://diglib.eg.org/handle/10.1111/cgf70261 | |
| dc.publisher | The Eurographics Association and John Wiley & Sons Ltd. | en_US |
| dc.subject | CCS Concepts: Computing methodologies → Computer graphics; Animation; Motion processing | |
| dc.title | EmoDiffGes: Emotion-Aware Co-Speech Holistic Gesture Generation with Progressive Synergistic Diffusion | en_US |
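
The abstract above outlines a two-stage design: DEAM turns the speech signal into per-frame emotion guidance, and PSGG denoises region-specific latent codes one body region at a time while passing coordination information between regions. As a rough illustration only, here is a minimal PyTorch-style sketch of how such a pipeline could be wired together; the record links no code, and every class name, tensor dimension, and the body-region split below are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the pipeline described in the abstract:
# DEAM maps frame-level speech features to per-frame emotion embeddings,
# and PSGG refines per-region latent codes under that guidance.
# All names, dimensions, and the region split are illustrative guesses.
import torch
import torch.nn as nn

REGIONS = ["face", "upper_body", "hands", "lower_body"]  # assumed body-region split


class DynamicEmotionAlignment(nn.Module):
    """Maps frame-level speech features to a per-frame emotion embedding (assumed design)."""

    def __init__(self, audio_dim=128, emo_dim=64):
        super().__init__()
        self.rnn = nn.GRU(audio_dim, emo_dim, batch_first=True)

    def forward(self, audio_feats):          # (B, T, audio_dim)
        emo_seq, _ = self.rnn(audio_feats)   # (B, T, emo_dim): dynamic emotion cues
        return emo_seq


class RegionDenoiser(nn.Module):
    """Per-region denoiser conditioned on audio, emotion, and a cross-region context."""

    def __init__(self, latent_dim=32, audio_dim=128, emo_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + audio_dim + emo_dim + latent_dim, 256),
            nn.SiLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, z, audio, emo, context):
        return self.net(torch.cat([z, audio, emo, context], dim=-1))


class ProgressiveSynergisticGenerator(nn.Module):
    """Refines region latents one region at a time, passing a running context
    between regions so full-body coordination is preserved (assumed reading
    of the Progressive Inter-Region Synergistic Flow)."""

    def __init__(self, latent_dim=32, audio_dim=128, emo_dim=64):
        super().__init__()
        self.denoisers = nn.ModuleDict(
            {r: RegionDenoiser(latent_dim, audio_dim, emo_dim) for r in REGIONS}
        )

    def forward(self, region_latents, audio, emo):
        context = torch.zeros_like(next(iter(region_latents.values())))
        refined = {}
        for r in REGIONS:                                    # progressive, region by region
            noise_pred = self.denoisers[r](region_latents[r], audio, emo, context)
            refined[r] = region_latents[r] - noise_pred      # simplified denoising update
            context = refined[r]                             # synergistic flow to next region
        return refined


if __name__ == "__main__":
    B, T = 2, 50
    audio = torch.randn(B, T, 128)
    deam = DynamicEmotionAlignment()
    psgg = ProgressiveSynergisticGenerator()
    emo = deam(audio)                                        # (B, T, 64)
    latents = {r: torch.randn(B, T, 32) for r in REGIONS}    # noisy region latents
    out = psgg(latents, audio, emo)
    print({r: tuple(v.shape) for r, v in out.items()})
```

In this sketch the synergistic flow is reduced to passing each refined region's latent as context to the next region's denoiser, and the diffusion schedule is collapsed into a single residual update; the paper's actual inter-region mechanism, Body Region Prior, and sampler are not specified in this record.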