High-Performance Graphics 2023
Browsing High-Performance Graphics 2023 by Subject "Computing methodologies"
Edge-Friend: Fast and Deterministic Catmull-Clark Subdivision Surfaces
(The Eurographics Association and John Wiley & Sons Ltd., 2023) Kuth, Bastian; Oberberger, Max; Chajdas, Matthäus; Meyer, Quirin; Bikker, Jacco; Gribble, Christiaan
We present edge-friend, a data structure for quad meshes with access to neighborhood information required for Catmull-Clark subdivision surface refinement. Edge-friend enables efficient real-time subdivision surface rendering. In particular, the resulting algorithm is deterministic, does not require hardware support for atomic floating-point arithmetic, and is optimized for efficient rendering on GPUs. Edge-friend exploits the fact that after one subdivision step, two edges can be uniquely and implicitly assigned to each quad. Additionally, edge-friend is a compact data structure, adding little overhead. Our algorithm is simple to implement in a single compute shader kernel, and requires minimal synchronization, which makes it particularly suited for asynchronous execution. We easily extend our kernel to support relevant Catmull-Clark subdivision surface features, including semi-smooth creases, boundaries, animation and attribute interpolation. In case of topology changes, our data structure requires little preprocessing, making it amenable to a variety of applications, including real-time editing and animations. Our method can process and render billions of triangles per second on modern GPUs. For a sample mesh, our algorithm generates and renders 2.9 million triangles in 0.58 ms on an AMD Radeon RX 7900 XTX GPU.

Generative Adversarial Shaders for Real-Time Realism Enhancement
(The Eurographics Association and John Wiley & Sons Ltd., 2023) Salmi, Arturo; Cséfalvay, Szabolcs; Imber, James; Bikker, Jacco; Gribble, Christiaan
Application of realism enhancement methods, particularly in real-time and resource-constrained settings, has been frustrated by the expense of existing methods.
These achieve high-quality results only at the cost of long runtimes and high bandwidth, memory, and power requirements. We present an efficient alternative: a high-performance, generative shader-based approach that adapts machine learning techniques to real-time applications, even in resource-constrained settings such as embedded and mobile GPUs. The proposed learnable shader pipeline comprises differentiable functions that can be trained in an end-to-end manner using an adversarial objective, allowing for faithful reproduction of the appearance of a target image set without manual tuning. The shader pipeline is optimized for highly efficient execution on the target device, providing temporally stable, faster-than-real-time results with quality competitive with many neural network-based methods.

GPU-Accelerated LOD Generation for Point Clouds
(The Eurographics Association and John Wiley & Sons Ltd., 2023) Schütz, Markus; Kerbl, Bernhard; Klaus, Philip; Wimmer, Michael; Bikker, Jacco; Gribble, Christiaan
About: We introduce a GPU-accelerated LOD construction process that creates a hybrid voxel-point-based variation of the widely used layered point cloud (LPC) structure for LOD rendering and streaming. The massive performance improvements provided by the GPU allow us to improve the quality of lower LODs via color filtering while still increasing construction speed compared to the non-filtered, CPU-based state of the art. Background: LOD structures are required to render hundreds of millions to trillions of points, but constructing them takes time. Results: LOD structures suitable for rendering and streaming are constructed at rates of about 1 billion points per second (with color filtering) to 4 billion points per second (sample-picking/random sampling, state of the art) on an RTX 3090 - an improvement of a factor of 80 to 400 times over the CPU-based state of the art (12 million points per second).
Because construction is in-core, model sizes are limited to about 500 million points per 24 GB of memory. Discussion: Our method currently focuses on maximizing in-core construction speed on the GPU. Issues such as out-of-core construction of arbitrarily large data sets are not addressed, but we expect it to be suitable as a component of bottom-up out-of-core LOD construction schemes.

Real-Time Rendering of Glinty Appearances using Distributed Binomial Laws on Anisotropic Grids
(The Eurographics Association and John Wiley & Sons Ltd., 2023) Deliot, Thomas; Belcour, Laurent; Bikker, Jacco; Gribble, Christiaan
In this work, we render glittery materials caused by discrete flakes on the surface in real time. To achieve this, one has to count the number of flakes reflecting the light towards the camera within every texel covered by a given pixel footprint. To do so, we derive a counting method for arbitrary footprints that, unlike previous work, outputs the correct statistics. We combine this counting method with an anisotropic parameterization of the texture space that reduces the number of texels falling under a pixel footprint. This allows our method to run with stable performance and 1.5× to 5× faster than the state of the art.

Sampling Visible GGX Normals with Spherical Caps
(The Eurographics Association and John Wiley & Sons Ltd., 2023) Dupuy, Jonathan; Benyoub, Anis; Bikker, Jacco; Gribble, Christiaan
Importance sampling the distribution of visible GGX normals requires sampling those of a hemisphere. In this work, we introduce a novel method for sampling such visible normals. Our method builds upon the insight that a hemispherical mirror reflects parallel light rays uniformly within a solid angle shaped as a spherical cap. This spherical cap has the same apex as the hemispherical mirror, and its aperture is given by the angle formed by the orientation of that apex and the direction of incident light rays.
Based on this insight, we sample GGX visible normals as halfway vectors between a given incident direction and directions drawn from its associated spherical cap. Our resulting implementation is even simpler than that of Heitz and leads to up to 39% speed-ups in our benchmarks.
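
The spherical-cap construction described in this abstract can be sketched in a few lines of Python. This is a minimal illustration written from the abstract's description, not the authors' reference implementation. The function names are hypothetical, and the stretch/unstretch step by the roughness parameters `alpha_x`, `alpha_y` is the usual convention for reducing GGX visible-normal sampling to the hemisphere case, assumed here. Directions are 3-tuples in a local frame whose z axis is the surface normal, and `u1`, `u2` are uniform random numbers in [0, 1).

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])
    return (v[0] / length, v[1] / length, v[2] / length)


def sample_vndf_hemisphere(u1, u2, wi):
    """Visible normal of a unit hemisphere for a unit incident direction wi.

    The result is returned unnormalized; the caller normalizes it.
    """
    phi = 2.0 * math.pi * u1
    # Uniform sampling of a spherical cap is uniform in height z.
    # The cap spans z in (-wi.z, 1], i.e. its apex is the +z axis.
    z = (1.0 - u2) * (1.0 + wi[2]) - wi[2]
    sin_theta = math.sqrt(min(max(1.0 - z * z, 0.0), 1.0))
    c = (sin_theta * math.cos(phi), sin_theta * math.sin(phi), z)
    # Halfway vector between the incident direction and the cap direction.
    return (c[0] + wi[0], c[1] + wi[1], c[2] + wi[2])


def sample_vndf_ggx(u1, u2, wi, alpha_x, alpha_y):
    """Sample a GGX visible normal (assumed stretch/unstretch convention)."""
    # Stretch the incident direction into the hemisphere configuration.
    wi_std = normalize((wi[0] * alpha_x, wi[1] * alpha_y, wi[2]))
    # Sample a visible normal of the hemisphere via the spherical cap.
    wm_std = sample_vndf_hemisphere(u1, u2, wi_std)
    # Unstretch back to the GGX configuration and normalize.
    return normalize((wm_std[0] * alpha_x, wm_std[1] * alpha_y, wm_std[2]))
```

For example, `sample_vndf_ggx(0.25, 0.5, normalize((0.6, 0.0, 0.8)), 0.3, 0.3)` returns a unit vector in the upper hemisphere. Because `u2 < 1` keeps the cap height strictly above `-wi.z`, the halfway vector always has a positive z component, so no rejection step is needed.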