DViTGAN: Training ViTGANs with Diffusion
dc.contributor.author | Tong, Mengjun | en_US |
dc.contributor.author | Rao, Hong | en_US |
dc.contributor.author | Yang, Wenji | en_US |
dc.contributor.author | Chen, Shengbo | en_US |
dc.contributor.author | Zuo, Fang | en_US |
dc.contributor.editor | Chen, Renjie | en_US |
dc.contributor.editor | Ritschel, Tobias | en_US |
dc.contributor.editor | Whiting, Emily | en_US |
dc.date.accessioned | 2024-10-13T18:04:53Z | |
dc.date.available | 2024-10-13T18:04:53Z | |
dc.date.issued | 2024 | |
dc.description.abstract | Recent research indicates that injecting instance noise via diffusion can effectively stabilize GAN training for image generation. Although the Vision-Transformer-based ViTGAN holds certain performance advantages over traditional GANs, it still suffers from unstable training and insufficiently detailed generated images. In this paper we therefore propose DViTGAN, a novel model that leverages a diffusion process to generate instance noise for ViTGAN training. Specifically, we employ forward diffusion to progressively produce noise following a Gaussian mixture distribution, and inject this noise into the discriminator's input images. The generator incorporates the discriminator's feedback by backpropagating through the forward diffusion process, improving its performance. In addition, we observe that the ViTGAN generator lacks positional information, which weakens its context-modeling ability and slows convergence. To this end, we introduce Fourier embedding and relative positional encoding to enhance the model's expressive ability. Experiments on multiple popular benchmarks demonstrate the effectiveness of the proposed model. | en_US |
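The abstract's core mechanism — forward diffusion producing Gaussian-mixture instance noise that is injected into the discriminator's input — can be sketched as follows. This is a minimal illustration, not the authors' code: the schedule parameters, function names, and the use of NumPy are assumptions for exposition, following the standard DDPM-style forward process where marginalising over a random timestep yields a mixture of Gaussians.

```python
import numpy as np

def make_alpha_bars(T=100, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule
    (illustrative hyperparameters, not the paper's)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def diffuse(x, alpha_bars, rng):
    """Forward-diffuse an image x at a uniformly sampled timestep t:
        y = sqrt(alpha_bar_t) * x + sqrt(1 - alpha_bar_t) * eps.
    Because t is random, the marginal distribution of y is a Gaussian
    mixture -- this noisy y is what the discriminator sees."""
    t = rng.integers(len(alpha_bars))
    eps = rng.standard_normal(x.shape)
    a = alpha_bars[t]
    return np.sqrt(a) * x + np.sqrt(1.0 - a) * eps, t

rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars()
x = rng.standard_normal((3, 32, 32))      # stand-in for one image (C, H, W)
y, t = diffuse(x, alpha_bars, rng)        # noisy input for the discriminator
```

Since the diffusion step is differentiable in `x`, gradients from the discriminator can flow back through it to the generator, which is how the generator incorporates the discriminator's feedback in this scheme.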
dc.description.sectionheaders | Image Synthesis | |
dc.description.seriesinformation | Pacific Graphics Conference Papers and Posters | |
dc.identifier.doi | 10.2312/pg.20241305 | |
dc.identifier.isbn | 978-3-03868-250-9 | |
dc.identifier.pages | 10 pages | |
dc.identifier.uri | https://doi.org/10.2312/pg.20241305 | |
dc.identifier.uri | https://diglib.eg.org/handle/10.2312/pg20241305 | |
dc.publisher | The Eurographics Association | en_US |
dc.rights | Attribution 4.0 International License | |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
dc.subject | CCS Concepts: Computing methodologies → Collision detection; Hardware → Sensors and actuators; PCB design and layout | |
dc.title | DViTGAN: Training ViTGANs with Diffusion | en_US |