SLGDiffuser : Stroke-level Guidance Diffusion Model for Complex Scene Text Editing
dc.contributor.author | Liu, Xiao Le | en_US |
dc.contributor.author | Wu, Lei | en_US |
dc.contributor.author | Wang, Chang Shuo | en_US |
dc.contributor.author | Dong, Pei | en_US |
dc.contributor.author | Meng, Xiang Xu | en_US |
dc.contributor.editor | Chen, Renjie | en_US |
dc.contributor.editor | Ritschel, Tobias | en_US |
dc.contributor.editor | Whiting, Emily | en_US |
dc.date.accessioned | 2024-10-13T18:05:02Z | |
dc.date.available | 2024-10-13T18:05:02Z | |
dc.date.issued | 2024 | |
dc.description.abstract | Scene Text Editing (STE) focuses on replacing text in images while preserving style and background. Existing methods often struggle to learn different transformation rules for text and background simultaneously, especially in complex scenes. This leads to several notable challenges: low content accuracy, ineffective extraction of text styles, and suboptimal background reconstruction. To address these challenges, we introduce SLGDiffuser, a stroke-level guidance diffusion model specifically designed for complex scene text editing. SLGDiffuser features a stroke-level guidance text conversion module that processes target text through character encoding and uses ContourLoss with stroke features to improve text accuracy. It also benefits from the proposed stroke-enhanced strategy, which improves text integrity by leveraging detailed stroke information. Furthermore, we introduce a unified instruction-based background reconstruction module that fine-tunes a pre-trained diffusion model, allowing a single standardized instruction prompt to reconstruct a variety of complex scenes effectively. In extensive tests, our model outperforms existing methods across diverse real-world datasets. We release code and model weights at https://github.com/lxlde/SLGDiffuser | en_US |
dc.description.sectionheaders | Image Processing and Filtering II | |
dc.description.seriesinformation | Pacific Graphics Conference Papers and Posters | |
dc.identifier.doi | 10.2312/pg.20241308 | |
dc.identifier.isbn | 978-3-03868-250-9 | |
dc.identifier.pages | 12 pages | |
dc.identifier.uri | https://doi.org/10.2312/pg.20241308 | |
dc.identifier.uri | https://diglib.eg.org/handle/10.2312/pg20241308 | |
dc.publisher | The Eurographics Association | en_US |
dc.rights | Attribution 4.0 International License | |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
dc.subject | CCS Concepts: Imaging → Image/Video Editing; Image Processing; Methods and Applications → Artificial Intelligence | |
dc.subject | Imaging → Image/Video Editing | |
dc.subject | Image Processing | |
dc.subject | Methods and Applications → Artificial Intelligence | |
dc.title | SLGDiffuser : Stroke-level Guidance Diffusion Model for Complex Scene Text Editing | en_US |