Text-Guided Diffusion with Spectral Convolution for 3D Human Pose Estimation

dc.contributor.author: Shi, Liyuan (en_US)
dc.contributor.author: Wu, Suping (en_US)
dc.contributor.author: Yang, Sheng (en_US)
dc.contributor.author: Qiu, Weibin (en_US)
dc.contributor.author: Qiang, Dong (en_US)
dc.contributor.author: Zhao, Jiarui (en_US)
dc.contributor.editor: Christie, Marc (en_US)
dc.contributor.editor: Pietroni, Nico (en_US)
dc.contributor.editor: Wang, Yu-Shuen (en_US)
dc.date.accessioned: 2025-10-07T05:03:10Z
dc.date.available: 2025-10-07T05:03:10Z
dc.date.issued: 2025
dc.description.abstract: Although significant progress has been made in monocular video-based 3D human pose estimation, existing methods lack guidance from fine-grained, high-level prior knowledge such as action semantics and camera viewpoints. As a result, pose reconstruction accuracy suffers when visual features are severely missing, for example under complex occlusion. We observe that 3D human pose estimation is fundamentally a canonical inverse problem and propose a motion-semantics-based diffusion (MS-Diff) framework that combines high-level motion semantics with spectral feature regularization to suppress interference noise in complex scenes and improve estimation accuracy. Specifically, we design a Multimodal Diffusion Interaction (MDI) module that incorporates motion semantics, including action categories and camera viewpoints, into the diffusion process, establishing semantic-visual feature alignment through a cross-modal mechanism to resolve pose ambiguities and handle occlusions. Additionally, we use a Spectral Convolutional Regularization (SCR) module that applies adaptive filtering in the frequency domain to selectively suppress noise components. Extensive experiments on the large-scale public datasets Human3.6M and MPI-INF-3DHP demonstrate that our method achieves state-of-the-art performance. (en_US)
dc.description.number: 7
dc.description.sectionheaders: Digital Human
dc.description.seriesinformation: Computer Graphics Forum
dc.description.volume: 44
dc.identifier.doi: 10.1111/cgf.70263
dc.identifier.issn: 1467-8659
dc.identifier.pages: 10 pages
dc.identifier.uri: https://doi.org/10.1111/cgf.70263
dc.identifier.uri: https://diglib.eg.org/handle/10.1111/cgf70263
dc.publisher: The Eurographics Association and John Wiley & Sons Ltd. (en_US)
dc.subject: CCS Concepts: Computing methodologies → Activity recognition and understanding
dc.title: Text-Guided Diffusion with Spectral Convolution for 3D Human Pose Estimation (en_US)
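
The abstract describes adaptive filtering in the frequency domain to suppress noise components but gives no implementation details in this record. The following is only a minimal, generic sketch of frequency-domain filtering on a single joint coordinate over time, and it is not the authors' SCR module. All names (`dft`, `idft`, `spectral_filter`, `low_pass_gains`) are hypothetical, and the per-frequency gains are hand-set here, whereas an adaptive filter would presumably learn them.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns a complex sequence."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def spectral_filter(trajectory, gains):
    """Scale each frequency bin of a 1D trajectory by a per-bin gain.

    `gains` stands in for weights a network might learn (an assumption);
    here they are hand-set. Gains should be symmetric, gains[k] == gains[n - k],
    so that the filtered signal stays real.
    """
    spectrum = dft(trajectory)
    filtered = [g * c for g, c in zip(gains, spectrum)]
    return [value.real for value in idft(filtered)]


def low_pass_gains(n, cutoff):
    """Hypothetical hand-set gains: keep bins whose frequency index <= cutoff."""
    return [1.0 if min(k, n - k) <= cutoff else 0.0 for k in range(n)]


# Toy data: one joint coordinate over 16 frames, slow motion plus fast jitter.
frames = 16
clean = [math.sin(2 * math.pi * t / frames) for t in range(frames)]
noisy = [
    c + 0.3 * math.cos(2 * math.pi * 7 * t / frames) for t, c in enumerate(clean)
]

# The jitter sits at frequency index 7, above the cutoff, so it is removed.
smoothed = spectral_filter(noisy, low_pass_gains(frames, cutoff=2))
max_error = max(abs(s - c) for s, c in zip(smoothed, clean))
```

In this toy case the motion and the jitter occupy separate frequency bins, so a fixed mask recovers the clean signal up to floating-point error. Real pose sequences would not separate this cleanly, which is presumably why the abstract speaks of adaptive rather than fixed filtering.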
Files
Original bundle
Name: cgf70263.pdf
Size: 1.35 MB
Format: Adobe Portable Document Format