DViTGAN: Training ViTGANs with Diffusion

dc.contributor.author: Tong, Mengjun [en_US]
dc.contributor.author: Rao, Hong [en_US]
dc.contributor.author: Yang, Wenji [en_US]
dc.contributor.author: Chen, Shengbo [en_US]
dc.contributor.author: Zuo, Fang [en_US]
dc.contributor.editor: Chen, Renjie [en_US]
dc.contributor.editor: Ritschel, Tobias [en_US]
dc.contributor.editor: Whiting, Emily [en_US]
dc.date.accessioned: 2024-10-13T18:04:53Z
dc.date.available: 2024-10-13T18:04:53Z
dc.date.issued: 2024
dc.description.abstract: Recent research indicates that injecting noise through diffusion can effectively improve the training stability of GANs for image generation tasks. Although ViTGAN, which is based on the Vision Transformer, has certain performance advantages over traditional GANs, it still suffers from unstable training and generated images that lack rich detail. Therefore, in this paper, we propose a novel model, DViTGAN, which uses a diffusion model to generate instance noise that facilitates ViTGAN training. Specifically, we employ forward diffusion to progressively generate noise that follows a Gaussian mixture distribution, and then inject the generated noise into the discriminator's input images. The generator incorporates the discriminator's feedback by backpropagating through the forward diffusion process to improve its performance. In addition, we observe that the ViTGAN generator lacks positional information, which reduces its context modeling ability and slows convergence. To address this, we introduce Fourier embedding and relative positional encoding to enhance the model's expressive ability. Experiments on multiple popular benchmarks demonstrate the effectiveness of the proposed model. [en_US]
dc.description.sectionheaders: Image Synthesis
dc.description.seriesinformation: Pacific Graphics Conference Papers and Posters
dc.identifier.doi: 10.2312/pg.20241305
dc.identifier.isbn: 978-3-03868-250-9
dc.identifier.pages: 10 pages
dc.identifier.uri: https://doi.org/10.2312/pg.20241305
dc.identifier.uri: https://diglib.eg.org/handle/10.2312/pg20241305
dc.publisher: The Eurographics Association [en_US]
dc.rights: Attribution 4.0 International License
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: CCS Concepts: Computing methodologies → Collision detection; Hardware → Sensors and actuators; PCB design and layout
dc.subject: Computing methodologies → Collision detection
dc.subject: Hardware → Sensors and actuators
dc.subject: PCB design and layout
dc.title: DViTGAN: Training ViTGANs with Diffusion [en_US]
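
The noise-injection step described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: it assumes the standard forward-diffusion form y = sqrt(ᾱ_t)·x + sqrt(1 − ᾱ_t)·ε with a linear β schedule and a uniformly sampled timestep per image, none of which is specified in this record. The function names `make_alpha_bars` and `diffuse` are hypothetical. Because the timestep is random, the injected noise is a mixture of Gaussians with different variances.

```python
import numpy as np


def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for a linear beta schedule.

    The schedule and its endpoints are assumptions, not values from the paper.
    """
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def diffuse(images, alpha_bars, rng, max_step=None):
    """Noise each image at an independently sampled diffusion timestep.

    images: float array of shape (batch, H, W, C).
    Returns the noised images and the sampled timesteps.
    """
    max_step = len(alpha_bars) if max_step is None else max_step
    # One timestep per image; mixing over timesteps yields Gaussian-mixture noise.
    t = rng.integers(0, max_step, size=images.shape[0])
    a = alpha_bars[t].reshape(-1, 1, 1, 1)
    eps = rng.standard_normal(images.shape)
    noised = np.sqrt(a) * images + np.sqrt(1.0 - a) * eps
    return noised, t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bars = make_alpha_bars(num_steps=100)
    batch = rng.uniform(-1.0, 1.0, size=(8, 32, 32, 3))
    noised, steps = diffuse(batch, alpha_bars, rng)
    print(noised.shape, steps)
```

In a GAN training loop, both real and generated images would pass through `diffuse` before reaching the discriminator. The operation is affine in `images`, so in an automatic-differentiation framework the discriminator's gradient would flow back through it to the generator, which is the behaviour the abstract describes.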
Files
Original bundle
Name: pg20241305.pdf
Size: 1.99 MB
Format: Adobe Portable Document Format