44-Issue 7

Permanent URI for this collection

https://diglib.eg.org/handle/10.2312/3607235


Text-Guided Diffusion with Spectral Convolution for 3D Human Pose Estimation
(The Eurographics Association and John Wiley & Sons Ltd., 2025) Shi, Liyuan; Wu, Suping; Yang, Sheng; Qiu, Weibin; Qiang, Dong; Zhao, Jiarui; Christie, Marc; Pietroni, Nico; Wang, Yu-Shuen
Although significant progress has been made in monocular video-based 3D human pose estimation, existing methods lack guidance from fine-grained, high-level prior knowledge such as action semantics and camera viewpoints. This limits pose reconstruction accuracy when visual features are severely missing, as in complex occlusion scenarios. We identify that the 3D human pose estimation task fundamentally constitutes a canonical inverse problem, and propose a motion-semantics-based diffusion (MS-Diff) framework that addresses this issue by combining high-level motion semantics with spectral feature regularization to eliminate interference noise in complex scenes and improve estimation accuracy. Specifically, we design a Multimodal Diffusion Interaction (MDI) module that incorporates motion semantics, including action categories and camera viewpoints, into the diffusion process, establishing semantic-visual feature alignment through a cross-modal mechanism to resolve pose ambiguities and handle occlusions effectively. Additionally, we leverage a Spectral Convolutional Regularization (SCR) module that implements adaptive filtering in the frequency domain to selectively suppress noise components. Extensive experiments on the large-scale public datasets Human3.6M and MPI-INF-3DHP demonstrate that our method achieves state-of-the-art performance.
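To illustrate the general idea of filtering in the frequency domain mentioned in the abstract, the following is a minimal sketch and not the authors' SCR module. It assumes a single joint coordinate tracked over time, and it uses a fixed low-pass mask where the paper describes adaptive filtering. All function names (`dft`, `idft`, `low_pass_gains`, `spectral_filter`) are hypothetical and chosen for this example only.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT; returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def low_pass_gains(n, cutoff):
    """Fixed per-bin gains: keep bins whose frequency index is <= cutoff.

    Assumption: a learned, input-dependent gain vector would replace this
    fixed mask in an adaptive filter.
    """
    return [1.0 if min(k, n - k) <= cutoff else 0.0 for k in range(n)]


def spectral_filter(signal, gains):
    """Scale each frequency bin by its gain, then transform back to time."""
    spectrum = dft(signal)
    return idft([g * c for g, c in zip(gains, spectrum)])


# Hypothetical trajectory: a slow motion component plus high-frequency jitter.
n = 32
clean = [math.sin(2 * math.pi * t / n) for t in range(n)]
noisy = [c + 0.3 * math.sin(2 * math.pi * 12 * t / n) for t, c in enumerate(clean)]
smoothed = spectral_filter(noisy, low_pass_gains(n, cutoff=3))
```

Because the gains are symmetric around the Nyquist bin, the filtered signal stays real-valued. In this toy setup the jitter sits entirely in a bin above the cutoff, so `smoothed` matches `clean` up to floating-point error; real pose sequences would not separate this cleanly.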


Browsing 44-Issue 7 by Subject "CCS Concepts: Computing methodologies → Activity recognition and understanding"
