DViTGAN: Training ViTGANs with Diffusion
dc.contributor.author | Tong, Mengjun | en_US |
dc.contributor.author | Rao, Hong | en_US |
dc.contributor.author | Yang, Wenji | en_US |
dc.contributor.author | Chen, Shengbo | en_US |
dc.contributor.author | Zuo, Fang | en_US |
dc.contributor.editor | Chen, Renjie | en_US |
dc.contributor.editor | Ritschel, Tobias | en_US |
dc.contributor.editor | Whiting, Emily | en_US |
dc.date.accessioned | 2024-10-13T18:04:53Z | |
dc.date.available | 2024-10-13T18:04:53Z | |
dc.date.issued | 2024 | |
dc.description.abstract | Recent research indicates that injecting instance noise via diffusion can effectively stabilize GAN training for image generation. Although the Vision-Transformer-based ViTGAN holds certain performance advantages over traditional GANs, it still suffers from unstable training and insufficiently detailed generated images. In this paper we therefore propose DViTGAN, a novel model that leverages a diffusion process to generate instance noise for ViTGAN training. Specifically, we employ forward diffusion to progressively produce noise following a Gaussian mixture distribution, and inject this noise into the discriminator's input images. The generator incorporates the discriminator's feedback by backpropagating through the forward diffusion process, improving its performance. In addition, we observe that the ViTGAN generator lacks positional information, which weakens its context-modeling ability and slows convergence. To this end, we introduce Fourier embedding and relative positional encoding to enhance the model's expressive ability. Experiments on multiple popular benchmarks demonstrate the effectiveness of the proposed model. | en_US |
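The abstract's core mechanism — forward diffusion producing Gaussian-mixture instance noise that is injected into the discriminator's input — can be sketched as follows. This is a minimal illustration, not the authors' code: the schedule parameters, function names, and the use of NumPy are assumptions for exposition, following the standard DDPM-style forward process where marginalising over a random timestep yields a mixture of Gaussians.

```python
import numpy as np

def make_alpha_bars(T=100, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule
    (illustrative hyperparameters, not the paper's)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def diffuse(x, alpha_bars, rng):
    """Forward-diffuse an image x at a uniformly sampled timestep t:
        y = sqrt(alpha_bar_t) * x + sqrt(1 - alpha_bar_t) * eps.
    Because t is random, the marginal distribution of y is a Gaussian
    mixture -- this noisy y is what the discriminator sees."""
    t = rng.integers(len(alpha_bars))
    eps = rng.standard_normal(x.shape)
    a = alpha_bars[t]
    return np.sqrt(a) * x + np.sqrt(1.0 - a) * eps, t

rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars()
x = rng.standard_normal((3, 32, 32))      # stand-in for one image (C, H, W)
y, t = diffuse(x, alpha_bars, rng)        # noisy input for the discriminator
```

Since the diffusion step is differentiable in `x`, gradients from the discriminator can flow back through it to the generator, which is how the generator incorporates the discriminator's feedback in this scheme.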
dc.description.sectionheaders | Image Synthesis | |
dc.description.seriesinformation | Pacific Graphics Conference Papers and Posters | |
dc.identifier.doi | 10.2312/pg.20241305 | |
dc.identifier.isbn | 978-3-03868-250-9 | |
dc.identifier.pages | 10 pages | |
dc.identifier.uri | https://doi.org/10.2312/pg.20241305 | |
dc.identifier.uri | https://diglib.eg.org/handle/10.2312/pg20241305 | |
dc.publisher | The Eurographics Association | en_US |
dc.rights | Attribution 4.0 International License | |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
dc.subject | CCS Concepts: Computing methodologies → Collision detection; Hardware → Sensors and actuators; PCB design and layout | |
dc.title | DViTGAN: Training ViTGANs with Diffusion | en_US |