44-Issue 7
Browsing 44-Issue 7 by Subject "CCS Concepts: Computing methodologies → Activity recognition and understanding"
Text-Guided Diffusion with Spectral Convolution for 3D Human Pose Estimation
(The Eurographics Association and John Wiley & Sons Ltd., 2025)
Shi, Liyuan; Wu, Suping; Yang, Sheng; Qiu, Weibin; Qiang, Dong; Zhao, Jiarui; Christie, Marc; Pietroni, Nico; Wang, Yu-Shuen

Although significant progress has been made in monocular video-based 3D human pose estimation, existing methods lack guidance from fine-grained high-level prior knowledge such as action semantics and camera viewpoints, leading to significant challenges for pose reconstruction accuracy under scenarios with severely missing visual features, i.e., complex occlusion situations. We identify that the 3D human pose estimation task fundamentally constitutes a canonical inverse problem, and propose a motion-semantics-based diffusion (MS-Diff) framework to address this issue by incorporating high-level motion semantics with spectral feature regularization to eliminate interference noise in complex scenes and improve estimation accuracy. Specifically, we design a Multimodal Diffusion Interaction (MDI) module that incorporates motion semantics including action categories and camera viewpoints into the diffusion process, establishing semantic-visual feature alignment through a cross-modal mechanism to resolve pose ambiguities and effectively handle occlusions. Additionally, we leverage a Spectral Convolutional Regularization (SCR) module that implements adaptive filtering in the frequency domain to selectively suppress noise components. Extensive experiments on large-scale public datasets Human3.6M and MPI-INF-3DHP demonstrate that our method achieves state-of-the-art performance.
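
The abstract gives no implementation details for the SCR module, so the following is only a minimal sketch of the general idea of frequency-domain filtering on a pose sequence, not the authors' method. It assumes a pose tensor of shape (frames, joints, 3) and uses a fixed low-pass mask along the time axis, whereas the paper describes an adaptive filter. The function name `spectral_filter` and the parameter `keep_ratio` are hypothetical.

```python
import numpy as np

def spectral_filter(poses, keep_ratio=0.25):
    """Low-pass filter a pose sequence along the time axis.

    poses: array of shape (T, J, 3), i.e. frames x joints x xyz.
    keep_ratio: fraction of the lowest temporal frequency bins to keep
                (hypothetical fixed setting; a learned module would
                predict the mask instead).
    """
    num_frames = poses.shape[0]
    # Real FFT over time: result has shape (T // 2 + 1, J, 3).
    spectrum = np.fft.rfft(poses, axis=0)
    num_bins = spectrum.shape[0]
    num_keep = max(1, int(np.ceil(keep_ratio * num_bins)))
    # Binary mask: 1 for the retained low-frequency bins, 0 elsewhere.
    mask = np.zeros(num_bins)
    mask[:num_keep] = 1.0
    filtered = spectrum * mask[:, None, None]
    # Back to the time domain, restoring the original frame count.
    return np.fft.irfft(filtered, n=num_frames, axis=0)

# Usage: a slow sinusoidal motion corrupted by frame-to-frame jitter.
t = np.arange(64)
smooth = np.sin(2 * np.pi * t / 64)[:, None, None] * np.ones((64, 17, 3))
jitter = 0.5 * ((-1.0) ** t)[:, None, None] * np.ones((64, 17, 3))
denoised = spectral_filter(smooth + jitter)
```

In this toy case the jitter sits entirely in the highest frequency bin, so the fixed mask removes it while leaving the slow motion intact. Real occlusion noise is not so cleanly separated, which is presumably why the paper opts for adaptive rather than fixed filtering.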