GazeMoDiff: Gaze-guided Diffusion Model for Stochastic Human Motion Prediction

dc.contributor.author: Yan, Haodong (en_US)
dc.contributor.author: Hu, Zhiming (en_US)
dc.contributor.author: Schmitt, Syn (en_US)
dc.contributor.author: Bulling, Andreas (en_US)
dc.contributor.editor: Chen, Renjie (en_US)
dc.contributor.editor: Ritschel, Tobias (en_US)
dc.contributor.editor: Whiting, Emily (en_US)
dc.date.accessioned: 2024-10-13T18:05:24Z
dc.date.available: 2024-10-13T18:05:24Z
dc.date.issued: 2024
dc.description.abstract: Human motion prediction is important for many virtual and augmented reality (VR/AR) applications such as collision avoidance and realistic avatar generation. Existing methods have synthesised body motion only from observed past motion, despite the fact that human eye gaze is known to correlate strongly with body movements and is readily available in recent VR/AR headsets. We present GazeMoDiff, a novel gaze-guided denoising diffusion model for generating stochastic human motions. Our method first uses a gaze encoder and a motion encoder to extract gaze and motion features, respectively, then employs a graph attention network to fuse these features, and finally injects the fused gaze-motion features into a noise prediction network via a cross-attention mechanism to progressively generate multiple plausible future human motions. Extensive experiments on the MoGaze and GIMO datasets demonstrate that our method outperforms state-of-the-art methods by a large margin in terms of multi-modal final displacement error (by 17.3% on MoGaze and 13.3% on GIMO). A further human study (N=21) confirmed that the motions generated by our method were perceived as both more precise and more realistic than those of prior methods. Taken together, these results reveal the significant information content available in eye gaze for stochastic human motion prediction as well as the effectiveness of our method in exploiting this information. (en_US)
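The abstract outlines a conditioning pipeline: encode past motion and eye gaze separately, fuse the two feature streams with attention, and inject the fused condition into the noise prediction network via cross-attention. The PyTorch sketch below illustrates that data flow only; all module choices, dimensions, and names (GazeGuidedDenoiser, pose_dim, d_model, etc.) are illustrative assumptions rather than the authors' implementation, and a plain multi-head attention layer stands in for the paper's graph attention network.

```python
import torch
import torch.nn as nn

class GazeGuidedDenoiser(nn.Module):
    """Minimal sketch of a gaze-conditioned noise prediction network.

    Hypothetical stand-in for the architecture described in the abstract;
    not the authors' code.
    """
    def __init__(self, pose_dim=66, gaze_dim=3, d_model=128, n_heads=4):
        super().__init__()
        # Separate encoders for observed past motion and eye gaze directions.
        self.motion_enc = nn.Linear(pose_dim, d_model)
        self.gaze_enc = nn.Linear(gaze_dim, d_model)
        # Attention-based fusion of gaze and motion features (stand-in for
        # the paper's graph attention network).
        self.fuse = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Cross-attention injects the fused condition into the denoiser.
        self.cross = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.in_proj = nn.Linear(pose_dim, d_model)
        self.out_proj = nn.Linear(d_model, pose_dim)
        self.t_embed = nn.Embedding(1000, d_model)  # diffusion-step embedding

    def forward(self, noisy_future, t, past_motion, gaze):
        # noisy_future: (B, T_future, pose_dim); past_motion: (B, T_past, pose_dim)
        # gaze: (B, T_past, gaze_dim); t: (B,) integer diffusion timesteps
        m = self.motion_enc(past_motion)
        g = self.gaze_enc(gaze)
        cond, _ = self.fuse(g, m, m)          # fuse gaze with motion features
        x = self.in_proj(noisy_future) + self.t_embed(t).unsqueeze(1)
        x, _ = self.cross(x, cond, cond)      # condition denoiser on fused features
        return self.out_proj(x)               # predicted noise, same shape as input

# Usage: one denoising step for a batch of 2 (shapes are illustrative).
net = GazeGuidedDenoiser()
x_t = torch.randn(2, 30, 66)                  # noisy future motion, 30 frames
t = torch.randint(0, 1000, (2,))              # random diffusion steps
eps = net(x_t, t, torch.randn(2, 10, 66), torch.randn(2, 10, 3))
assert eps.shape == x_t.shape
```

Under this reading, the "stochastic" aspect falls out of the diffusion framework for free: running the reverse process from different noise seeds with the same gaze-motion condition yields multiple plausible future motions.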
dc.description.sectionheaders: Human I
dc.description.seriesinformation: Pacific Graphics Conference Papers and Posters
dc.identifier.doi: 10.2312/pg.20241315
dc.identifier.isbn: 978-3-03868-250-9
dc.identifier.pages: 12 pages
dc.identifier.uri: https://doi.org/10.2312/pg.20241315
dc.identifier.uri: https://diglib.eg.org/handle/10.2312/pg20241315
dc.publisher: The Eurographics Association (en_US)
dc.rights: Attribution 4.0 International License
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: CCS Concepts: Computing methodologies → Machine learning; Machine learning approaches; Neural networks; Human-centred computing → Human computer interaction; Virtual reality; Interaction paradigms
dc.title: GazeMoDiff: Gaze-guided Diffusion Model for Stochastic Human Motion Prediction (en_US)
Files
Original bundle (showing 3 of 3):
- pg20241315.pdf (1.27 MB, Adobe Portable Document Format)
- paper1106_mm.mp4 (15.16 MB, Video MP4)
- paper1106_mm.pdf (1.28 MB, Adobe Portable Document Format)