PG2024 Conference Papers and Posters

Permanent URI for this collection

https://diglib.eg.org/handle/10.2312/3607048

Fast Wavelet-domain Smoke Guiding
(The Eurographics Association, 2024) Lyu, Luan; Ren, Xiaohua; Wu, Enhua; Yang, Zhi-Xin; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
We propose a simple and efficient wavelet-based method to guide smoke simulation with specific velocity fields. This method primarily uses wavelets to combine low-resolution velocities with high-resolution details for smoke guiding. Due to the natural ability of wavelets to divide data into different frequency bands, we can merge low and high-resolution velocities by replacing wavelet coefficients. Compared to Fourier methods, the wavelet transform can use wavelets with shorter, compact supports, making the transformation faster and more adaptable to various boundary conditions. The method has a time complexity of O(n) and a memory complexity of O(n). Additionally, wavelets are compactly supported, which allows us to locally filter out or retain details by editing the wavelet coefficients. This enables us to locally edit smoke. Moreover, to accelerate the performance of wavelet transforms on GPUs, we propose a technique implemented in CUDA called in-kernel warp-level wavelet transform computation. This technique utilizes warp-level CUDA intrinsic functions to reduce data read times during computations, thus enhancing the efficiency of the wavelet transform. The experiments demonstrate that our proposed wavelet-based method achieves an approximate 5x speedup in 3D on GPUs compared to the Fourier methods, resulting in an overall improvement of around 40% in the smoke-guided simulation.
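The coefficient-replacement idea in this abstract can be illustrated with a minimal 1D sketch. It assumes a single-level, unnormalized Haar transform (pairwise averages and half-differences); the paper's actual wavelet, dimensionality, and CUDA implementation are not given here, and the function names `haar_forward`, `haar_inverse`, and `guide` are hypothetical.

```python
def haar_forward(x):
    """One Haar level: pairwise averages (low band) and half-differences (high band)."""
    approx = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward: each (average, half-difference) pair yields two samples."""
    out = []
    for a, d in zip(approx, detail):
        out.extend((a + d, a - d))
    return out


def guide(high_res, low_res_guide):
    """Keep the high-frequency detail of high_res, replace its low band with the guide."""
    _, detail = haar_forward(high_res)
    return haar_inverse(low_res_guide, detail)


high_res = [1.0, 3.0, 2.0, 6.0]   # detailed velocity samples (illustrative values)
low_res_guide = [10.0, 20.0]      # target low-resolution velocities (illustrative values)
guided = guide(high_res, low_res_guide)
```

Each output sample depends on only one coefficient pair, which is why editing coefficients has a local effect, and both passes touch every sample once, consistent with the linear cost stated in the abstract.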
"Yunluo Journey": A VR Cultural experience for the Chinese Musical Instrument
(The Eurographics Association, 2024) Wang, Yuqiu; Guo, Wenchen; He, Zhiting; Fan, Min; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
The sustainability of the cultural heritage of traditional musical instruments requires integrating musical culture into people's daily lives. However, the Yunluo, a traditional Chinese musical instrument, is too large and expensive to be easily incorporated into everyday life. To promote the sustainability and dissemination of Yunluo culture, we designed a VR Yunluo cultural experience that allows people to engage in the creation and performance of Yunluo, as well as learn about its historical and cultural significance through a Yunluo experience. This embodied, gamified, and contextualized VR experience aims to enhance participants' interest in Yunluo culture and improve their understanding and appreciation of the related knowledge.
Enhancing Human Optical Flow via 3D Spectral Prior
(The Eurographics Association, 2024) Mao, Shiwei; Sun, Mingze; Huang, Ruqi; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
In this paper, we consider the problem of human optical flow estimation, which is critical in a series of human-centric computer vision tasks. Recent deep learning-based optical flow models have achieved considerable accuracy and generalization by incorporating various kinds of priors. However, the majority either rely on large-scale 2D annotations or rigid priors, overlooking the 3D non-rigid nature of human articulations. To this end, we advocate enhancing human optical flow estimation via 3D spectral prior-aware pretraining, which is based on the well-known functional maps formulation in 3D shape matching. Our pretraining can be performed with synthetic human shapes. More specifically, we first render shapes to images and then leverage the natural inclusion maps from images to shapes to lift 2D optical flow into 3D correspondences, which are further encoded as functional maps. This lifting operation allows us to inject the intrinsic geometric features encoded in the spectral representations into optical flow learning, leading to improvement of the latter, especially in the presence of non-rigid deformations. In practice, we establish a pretraining pipeline tailored for triangular meshes, which is general with respect to the target optical flow network. It is worth noting that it does not introduce any additional learning parameters but only requires pre-computed eigendecompositions on the meshes. For RAFT and GMA, our pretraining task achieves improvements of 12.8% and 4.9% in AEPE on the SHOF benchmark, respectively.
Single Image 3D Reconstruction of Creased Documents Using Shape-from-Shading with Template-Based Error Correction
(The Eurographics Association, 2024) Wang, Linqin; Bo, Pengbo; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
We present a method for reconstructing 3D models from single images of creased documents by enhancing the linear shape-from-shading (SFS) technique with a template-based error correction mechanism. This mechanism is based on a mapping function established using precise data from a spherical surface modeled with linearized Lambertian shading. The error correction mapping is integrated into an algorithm that refines reconstructed depth values during the image scanning process. To resolve the inherent concave/convex ambiguities in SFS, we identify specific conditions based on assumed lighting and the geometric characteristics of creased documents, effectively improving reconstruction even in less controlled lighting environments. Our approach captures intricate geometric details on non-smooth surfaces. Comparative results demonstrate that our method provides superior accuracy and efficiency in reconstructing complex features such as creases and wrinkles.
Computational Mis-Drape Detection and Rectification
(The Eurographics Association, 2024) Shin, Hyeon-Seung; Ko, Hyeong-Seok; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
For various reasons, mis-drapes occur in physically-based clothing simulation. Therefore, when developing a virtual try-on system that works without any human operators, a technique to algorithmically detect and rectify mis-drapes has to be developed. This paper makes a first attempt in that direction, by defining two mis-drape determinants, namely, the Gaussian and crease mis-drape determinants. In experiments performed on various avatar-garment combinations, the proposed determinants identify mis-drapes fairly accurately. This paper also proposes a treatment that can be applied to rectify the mis-drapes. The proposed treatment successfully resolves the mis-drapes without unnecessarily destroying the original drape.
Simulating Viscous Fluid Using Free Surface Lattice Boltzmann Method
(The Eurographics Association, 2024) Sun, Dakun; Gao, Yang; Xie, Xueguang; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
High viscosity fluid simulation remains a significant area of interest within the graphics field. However, there are few discussions about simulating viscous fluids in computer graphics with the Lattice Boltzmann Method (LBM). In this study, we demonstrate the feasibility of using LBM for viscous fluid simulation and show a caveat regarding external forces. Previous methods for viscous fluids (such as FLIP, MPM, and SPH) are mainly based on the Navier-Stokes (NS) equations, where the external forces are independent of viscosity in the governing equation. Therefore, the decision to neglect the external force solely depends on its magnitude. However, in the context of the Lattice Boltzmann Equation (LBE), external forces are intertwined with viscosity within the collision term, making the choice to ignore the external force term dependent on both the viscosity and the force's magnitude. This dependence has not been mentioned in previous studies, and we show its importance through comparison experiments.
Label Name is Mantra: Unifying Point Cloud Segmentation across Heterogeneous Datasets
(The Eurographics Association, 2024) Liang, Yixun; He, Hao; Xiao, Shishi; Lu, Hao; Chen, Yingcong; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Point cloud segmentation is a fundamental task in 3D vision that serves a wide range of applications. Despite recent advancements, its practical usability is still limited by the availability of training data. The prevalent methodologies cannot optimally exploit multiple datasets due to the inconsistency of labels across datasets. In this work, we introduce a robust method that accommodates learning from diverse datasets with differing label sets. We leverage a pre-trained language model to map discrete labels into a continuous latent space using their semantic names. This harmonizes labels across datasets, facilitating concurrent training. Conversely, when classifying points within the continuous 3D space via their linguistic tokens, our model exhibits superior generalizability compared to existing methods with fixed decoder structures. Further, our approach incorporates prompt learning to alleviate data shifts across sources. Comprehensive evaluations attest that our model markedly surpasses current benchmarks.
SPDD-YOLO for Small Object Detection in UAV Images
(The Eurographics Association, 2024) Xue, Xiang; Ji, Ya Tu; Liu, Yang; Xu, H. T.; Ren, Q. D. E. J.; Shi, B.; Wu, N. E.; Lu, M.; Zhuang, X. F.; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Aerial images captured by drones often suffer from blurriness and low resolution, which is particularly problematic for small targets. In such scenarios, the YOLO object detection algorithm tends to confuse or misidentify targets like bicycles and tricycles due to the complex features and local similarities. To address these issues, this paper proposes a SPDD-YOLO model based on YOLOv8. Firstly, the model enhances its ability to extract local features of small targets by introducing the Spatial-to-Depth Module (SPDM). Secondly, addressing the issue that SPDM reduces the receptive field, leading the model to overly focus on local features, we introduce Deep Separable Dilated Convolution (DSDC), which expands the receptive field while reducing parameters and forms the Deep Dilated Module (DDM) together with SPDM. Experiments on the VisDrone2019 dataset demonstrate that the proposed model improved precision, recall, and mAP50 by 5.8%, 5.7%, and 6.4%, respectively.
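As a point of reference for the Spatial-to-Depth Module, the sketch below shows the standard space-to-depth rearrangement, which moves each spatial block of pixels into the channel dimension so that downsampling discards no pixel values. Whether SPDM uses exactly this operation, this block size, or this channel ordering is an assumption; the abstract does not say, and the function name `space_to_depth` is hypothetical.

```python
def space_to_depth(image, block=2):
    """Rearrange an H x W grid of per-pixel channel lists into an
    (H/block) x (W/block) grid with block*block times as many channels."""
    height, width = len(image), len(image[0])
    if height % block or width % block:
        raise ValueError("height and width must be divisible by block")
    out = []
    for r in range(0, height, block):
        row = []
        for c in range(0, width, block):
            channels = []
            # Concatenate the channels of every pixel in the block, row-major.
            for dr in range(block):
                for dc in range(block):
                    channels.extend(image[r + dr][c + dc])
            row.append(channels)
        out.append(row)
    return out


# A 2 x 4 single-channel image becomes a 1 x 2 image with 4 channels per pixel.
tiny = [
    [[1], [2], [5], [6]],
    [[3], [4], [7], [8]],
]
packed = space_to_depth(tiny)
```

Because every input value survives in some output channel, fine detail from small targets is kept at the lower resolution, at the cost of a smaller spatial extent per feature, which matches the receptive-field concern the abstract raises.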
PVP-SSD: Point-Voxel Fusion with Partitioned Point Cloud Sampling for Anchor-Free Single-Stage Small 3D Object Detection
(The Eurographics Association, 2024) Wu, Xinlin; Tian, Yibin; Pan, Yin; Zhang, Zhiyuan; Wu, Xuesong; Wang, Ruisheng; Zeng, Zhi; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Single-stage object detection from 3D point clouds in autonomous driving faces significant challenges, particularly in accurately detecting small objects. To address this issue, we propose a novel method called Point-Voxel dual-branch feature extraction with Partitioned point cloud sampling for anchor-free Single-Stage Detection of 3D objects (PVP-SSD). The network comprises two branches: a point branch and a voxel branch. In the point branch, a partitioned point cloud sampling strategy leverages axial features to divide the point cloud. Then, it assigns different sampling weights to various segments to enhance the sampling accuracy. Additionally, a local feature enhancement module explicitly calculates the correlation between key points and query points, improving the extraction of local features. In the voxel branch, we use 3D sparse convolution to extract instance structural features efficiently. The point-voxel dual-branch fusion dynamically integrates instance features extracted from both branches using a self-attention mechanism, which contains not only the category information of the detected object but also the spatial dimensions and heading angle. Consequently, PVP-SSD achieves a certain balance between preserving detailed information and maintaining structural integrity. Experimental results on the KITTI and ONCE datasets demonstrate that PVP-SSD excels in multi-category small 3D object detection.
SLGDiffuser: Stroke-level Guidance Diffusion Model for Complex Scene Text Editing
(The Eurographics Association, 2024) Liu, Xiao Le; Wu, Lei; Wang, Chang Shuo; Dong, Pei; Meng, Xiang Xu; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Scene Text Editing (STE) focuses on replacing text in images while preserving style and background. Existing methods often grapple with simultaneously learning different transformation rules for text and background, especially in complex scenes. This leads to several notable challenges, such as low accuracy in content, ineffective extraction of text styles, and suboptimal background reconstruction. To address these challenges, we introduce SLGDiffuser, a stroke-level guidance diffusion model specifically designed for complex scene text editing. SLGDiffuser features a stroke-level guidance text conversion module that processes target text through character encoding and utilizes ContourLoss with stroke features to improve text accuracy. It also benefits from the proposed stroke-enhanced strategy, which enhances text integrity by leveraging detailed stroke information. Furthermore, we introduce a unified instruction-based background reconstruction module that fine-tunes a pre-trained diffusion model. It enables the application of a standardized instruction prompt to reconstruct a variety of complex scenes effectively. Tested extensively, our model outperforms existing methods across diverse real-world datasets. We release code and model weights at https://github.com/lxlde/SLGDiffuser
PhysHand: A Hand Simulation Model with Physiological Geometry, Physical Deformation, and Accurate Contact Handling
(The Eurographics Association, 2024) Sun, Mingyang; Kou, Dongliang; Yuan, Ruisheng; Yang, Dingkang; Zhai, Peng; Zhao, Xiao; Jiang, Yang; Li, Xiong; Li, Jingchen; Zhang, Lihua; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
In virtual Hand-Object Interaction (HOI) scenarios, the authenticity of the hand's deformation is important to immersive experience, such as natural manipulation or tactile feedback. Unrealistic deformation arises from simplified hand geometry, neglect of the different physics attributes of the hand, and penetration due to imprecise contact handling. To address these problems, we propose PhysHand, a novel hand simulation model, which enhances the realism of deformation in HOI. First, we construct a physiologically plausible geometry, a layered mesh with a "skin-flesh-skeleton" structure. Second, to satisfy the distinct physics features of different soft tissues, a constraint-based dynamics framework is adopted with carefully designed layer-corresponding constraints to keep the flesh attached and the skin smooth. Finally, we employ an SDF-based method to eliminate the penetration caused by contacts and enhance its accuracy by introducing a novel multi-resolution querying strategy. Extensive experiments have been conducted to demonstrate the outstanding performance of PhysHand in calculating deformations and handling contacts. Compared to existing methods, our PhysHand: 1) can compute both physiologically and physically plausible deformation; 2) significantly reduces the depth and count of penetration in HOI.
Convex Hull Computation in a Grid Space: A GPU Accelerated Parallel Filtering Approach
(The Eurographics Association, 2024) Antony, Joms; Mukundan, Manoj Kumar; Thomas, Mathew; Muthuganapathy, Ramanathan; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Many real-world applications demand the computation of a convex hull (CH) when the input points originate from structured configurations such as two-dimensional (2D) or three-dimensional (3D) grids. Convex hull in grid space has found applications in geographic information systems, medical data analysis, path planning for robots/autonomous vehicles, etc. Conventional as well as existing GPU-accelerated algorithms available for CH computation cannot operate directly on 2D or 3D grids represented in matrix format and do not exploit the inherent sequential ordering in such rasterized representations. This work introduces novel filtering algorithms, initially developed for a 2D grid space and subsequently extended to 3D to speed up the hull computation. They are further extended as GPU-CPU hybrid algorithms and are implemented and evaluated on a commercial NVIDIA GPU. For a 2D grid, the number of contributing pixels is always restricted to ≤ 2n for an (n×n) grid. Moreover, they are extracted in lexicographic order, ensuring an efficient O(n) computation of CH. Similarly, in 3D, the number of contributing voxels is always limited to ≤ 2n² for an (n×n×n) voxel matrix. Additionally, 2D CH filtering is enabled across all slices of the 3D grid in parallel, leading to a further reduction in the number of contributing voxels to be fed to the 3D CH computation procedure. Comparison with the state of the art indicated that our method is superior, especially for large and sparse point clouds.
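The 2D bound can be made concrete with a small CPU-only sketch. It assumes the filter keeps the leftmost and rightmost occupied pixel of each row, which yields at most 2n candidates already in lexicographic order; this is one filter consistent with the abstract, not necessarily the authors' algorithm. Andrew's monotone chain is swapped in for the hull step, and the names `filter_grid` and `convex_hull_sorted` are hypothetical.

```python
def filter_grid(grid):
    """Keep at most two occupied pixels per row: the leftmost and the rightmost.
    Row-by-row scanning returns them already sorted by (row, col)."""
    candidates = []
    for r, row in enumerate(grid):
        cols = [c for c, value in enumerate(row) if value]
        if not cols:
            continue
        candidates.append((r, cols[0]))
        if cols[-1] != cols[0]:
            candidates.append((r, cols[-1]))
    return candidates


def cross(o, a, b):
    """Cross product of vectors o->a and o->b; the sign gives the turn direction."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull_sorted(points):
    """Andrew's monotone chain on lexicographically sorted points.
    Linear time because no sort is needed; collinear points are dropped."""
    if len(points) <= 2:
        return list(points)
    lower, upper = [], []
    for p in points:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(points):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


grid = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
candidates = filter_grid(grid)
hull = convex_hull_sorted(candidates)
```

Interior pixels of a row lie between that row's two extremes and so cannot be hull vertices, which is why discarding them leaves the hull unchanged.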
Learning-based Self-Collision Avoidance in Retargeting using Body Part-specific Signed Distance Fields
(The Eurographics Association, 2024) Lee, Junwoo; Kim, Hoimin; Kwon, Taesoo; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Motion retargeting is a technique for applying the motion of one character to a new character. Differences in shapes and proportions between characters can cause self-collisions during the retargeting process. To address this issue, we propose a new collision resolution strategy comprising three key components: a collision detection module, a self-collision resolution model, and a training strategy for the collision resolution model. The collision detection module generates collision information based on changes in posture. The self-collision resolution model, which is based on a neural network, uses this collision information to resolve self-collisions. The proposed training strategy enhances the performance of the self-collision resolution model. Compared to previous studies, our self-collision resolution process demonstrates superior performance in terms of accuracy and generalization. Our model reduces the average penetration depth across the entire body by 56%, which is 28% better than the previous studies. Additionally, the minimum distance from the end-effectors to the skin averaged 2.65cm, which is more than 0.8cm smaller than in the previous studies. Furthermore, it takes an average of 7.9ms to solve one frame, enabling online real-time self-collision resolution.
GazeMoDiff: Gaze-guided Diffusion Model for Stochastic Human Motion Prediction
(The Eurographics Association, 2024) Yan, Haodong; Hu, Zhiming; Schmitt, Syn; Bulling, Andreas; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Human motion prediction is important for many virtual and augmented reality (VR/AR) applications such as collision avoidance and realistic avatar generation. Existing methods have synthesised body motion only from observed past motion, despite the fact that human eye gaze is known to correlate strongly with body movements and is readily available in recent VR/AR headsets. We present GazeMoDiff - a novel gaze-guided denoising diffusion model to generate stochastic human motions. Our method first uses a gaze encoder and a motion encoder to extract the gaze and motion features respectively, then employs a graph attention network to fuse these features, and finally injects the gaze-motion features into a noise prediction network via a cross-attention mechanism to progressively generate multiple reasonable human motions in the future. Extensive experiments on the MoGaze and GIMO datasets demonstrate that our method outperforms the state-of-the-art methods by a large margin in terms of multi-modal final displacement error (17.3% on MoGaze and 13.3% on GIMO). We further conducted a human study (N=21) and validated that the motions generated by our method were perceived as both more precise and more realistic than those of prior methods. Taken together, these results reveal the significant information content available in eye gaze for stochastic human motion prediction as well as the effectiveness of our method in exploiting this information.
DViTGAN: Training ViTGANs with Diffusion
(The Eurographics Association, 2024) Tong, Mengjun; Rao, Hong; Yang, Wenji; Chen, Shengbo; Zuo, Fang; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Recent research findings indicate that injecting noise using diffusion can effectively improve the stability of GANs for image generation tasks. Although ViTGAN, which is based on the Vision Transformer, has certain performance advantages over traditional GANs, it still suffers from issues such as unstable training and insufficiently rich generated image details. Therefore, in this paper, we propose a novel model, DViTGAN, which leverages the diffusion model to generate instance noise facilitating ViTGAN training. Specifically, we employ forward diffusion to progressively generate noise that follows a Gaussian mixture distribution, and then introduce the generated noise into the input image of the discriminator. The generator incorporates the discriminator's feedback by backpropagating through the forward diffusion process to improve its performance. In addition, we observe that the ViTGAN generator lacks positional information, leading to a decreased context modeling ability and slower convergence. To this end, we introduce Fourier embedding and relative positional encoding to enhance the model's expressive ability. Experiments on multiple popular benchmarks have demonstrated the effectiveness of our proposed model.
Semantics-Augmented Quantization-Aware Training for Point Cloud Classification
(The Eurographics Association, 2024) Huang, Liming; Qin, Yunchuan; Li, Ruihui; Wu, Fan; Li, Kenli; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Point cloud classification is a pivotal procedure in 3D computer vision, and its deployment in practical applications is often constrained by limited computational and memory resources. To address these issues, we introduce a Semantics-Augmented Quantization-Aware Training (SAQAT) framework designed for efficient and precise classification of point cloud data. The SAQAT framework incorporates a point importance prediction semantic module as a side output, which assists in identifying crucial points, along with a point importance evaluation algorithm (PIEA). The semantics module leverages point importance prediction to skillfully select quantization levels based on local geometric properties and semantic context. This approach reduces errors by retaining essential information. In synergy, the PIEA acts as the cornerstone, providing an additional layer of refinement to the SAQAT framework. Furthermore, we integrate a loss function that mitigates classification loss, quantization error, and point importance prediction loss, thereby fostering a reliable representation of the quantized data. The SAQAT framework is designed for seamless integration with existing point cloud models, enhancing their efficiency while maintaining high levels of accuracy. Testing on benchmark datasets demonstrates that our SAQAT framework surpasses contemporary quantization methods in classification accuracy while simultaneously economizing on memory and computational resources. Given these advantages, our SAQAT framework holds enormous potential for a wide spectrum of applications within the rapidly evolving domain of 3D computer vision. Our code is released: https://github.com/h-liming/SAQAT.
PointJEM: Self-supervised Point Cloud Understanding for Reducing Feature Redundancy via Joint Entropy Maximization
(The Eurographics Association, 2024) Cao, Xin; Xia, Huan; Wang, Haoyu; Su, Linzhi; Zhou, Ping; Li, Kang; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Most deep learning methods for point cloud processing are supervised and require extensive labeled data. However, labeling point cloud data is a tedious and time-consuming task. Self-supervised representation learning can solve this problem by extracting robust and generalized features from unlabeled data. Yet, the features from representation learning are often redundant. Current methods typically reduce redundancy by imposing linear correlation constraints. In this paper, we introduce PointJEM, a self-supervised representation learning method for point clouds. It includes an embedding scheme that divides the vector into parts, each learning a unique feature. To minimize redundancy, PointJEM maximizes joint entropy between parts, making the features pairwise independent. We tested PointJEM on various datasets and found it significantly reduces redundancy beyond linear correlation. Additionally, PointJEM performs well in downstream tasks like classification and segmentation.
Audio-Driven Speech Animation with Text-Guided Expression
(The Eurographics Association, 2024) Jung, Sunjin; Chun, Sewhan; Noh, Junyong; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
We introduce a novel method for generating expressive speech animations of a 3D face, driven by both audio and text descriptions. Many previous approaches focused on generating facial expressions using pre-defined emotion categories. In contrast, our method is capable of generating facial expressions from text descriptions unseen during training, without limitations to specific emotion classes. Our system employs a two-stage approach. In the first stage, an auto-encoder is trained to disentangle content and expression features from facial animations. In the second stage, two transformer-based networks predict the content and expression features from audio and text inputs, respectively. These features are then passed to the decoder of the pre-trained auto-encoder, yielding the final expressive speech animation. By accommodating diverse forms of natural language, such as emotion words or detailed facial expression descriptions, our method offers an intuitive and versatile way to generate expressive speech animations. Extensive quantitative and qualitative evaluations, including a user study, demonstrate that our method can produce natural expressive speech animations that correspond to the input audio and text descriptions.
Mesh Slicing Along Isolines of Surface-Based Functions
(The Eurographics Association, 2024) Wang, Lei; Wang, Xudong; Wang, Wensong; Chen, Shuangmin; Xin, Shiqing; Tu, Changhe; Wang, Wenping; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
There are numerous practical scenarios where the surface of a 3D object is equipped with varying properties. The process of slicing the surface along the isoline of the property field is a widely utilized operation. While the geometry of the 3D object can typically be approximated with a piecewise linear triangle mesh, the property field f might be too intricate to be linearly approximated at the same resolution. Arbitrarily reducing the isoline within a triangle into a straight-line segment could result in noticeable artifacts. In this paper, we delve into the precise extraction of the isoline of a surface-based function f for slicing the surface apart, allowing the extracted isoline to be curved within a triangle. Our approach begins by adequately sampling Steiner points on mesh edges. Subsequently, for each triangle, we categorize the Steiner points into two groups based on the signs of their function values. We then trace the bisector between these two groups of Steiner points by simply computing a 2D power diagram of all Steiner points. It's worth noting that the weight setting of the power diagram is derived from the first-order approximation of f. Finally, we refine the polygonal bisector by adjusting each vertex to the closest point on the actual isoline. Each step of our algorithm is fully parallelizable on a triangle level, making it highly efficient. Additionally, we provide numerous examples to illustrate its practical applications.
Biophysically-based Simulation of Sun-induced Skin Appearance Changes
(The Eurographics Association, 2024) He, Xueyan; Huang, Minghao; Fu, Ruoyu; Guo, Jie; Yuan, Junping; Wang, Yanghai; Guo, Yanwen; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Skin appearance modeling plays a crucial role in various fields such as healthcare, cosmetics and entertainment. However, the structure of the skin and its interaction with environmental factors like ultraviolet radiation are very complex and require further detailed modeling. In this paper, we propose a biophysically-based model to illustrate the changes in skin appearance under ultraviolet radiation exposure. It takes ultraviolet doses and specific biophysical parameters as inputs, leading to variations in melanin and blood concentrations, as well as the growth rate of skin cells. These changes alter light scattering, which is simulated with a random walk method, and result in observable erythema and tanning. We showcase effects of various skin tones, comparisons across different body parts, and images illustrating the impact of occlusion. It demonstrates superior quality to the commonly used method with more convincing skin details and bridges biological insights with visual simulations.
