SLGDiffuser : Stroke-level Guidance Diffusion Model for Complex Scene Text Editing

dc.contributor.author: Liu, Xiao Le [en_US]
dc.contributor.author: Wu, Lei [en_US]
dc.contributor.author: Wang, Chang Shuo [en_US]
dc.contributor.author: Dong, Pei [en_US]
dc.contributor.author: Meng, Xiang Xu [en_US]
dc.contributor.editor: Chen, Renjie [en_US]
dc.contributor.editor: Ritschel, Tobias [en_US]
dc.contributor.editor: Whiting, Emily [en_US]
dc.date.accessioned: 2024-10-13T18:05:02Z
dc.date.available: 2024-10-13T18:05:02Z
dc.date.issued: 2024
dc.description.abstract: Scene Text Editing (STE) focuses on replacing text in images while preserving style and background. Existing methods often grapple with simultaneously learning different transformation rules for text and background, especially in complex scenes. This leads to several notable challenges, such as low accuracy in content, ineffective extraction of text styles, and suboptimal background reconstruction. To address these challenges, we introduce SLGDiffuser, a stroke-level guidance diffusion model specifically designed for complex scene text editing. SLGDiffuser features a stroke-level guidance text conversion module that processes target text through character encoding and utilizes ContourLoss with stroke features to improve text accuracy. It also benefits from the proposed stroke-enhanced strategy, which enhances text integrity by leveraging detailed stroke information. Furthermore, we introduce a unified instruction-based background reconstruction module that fine-tunes a pre-trained diffusion model. It enables the application of a standardized instruction prompt to reconstruct a variety of complex scenes effectively. Tested extensively, our model outperforms existing methods across diverse real-world datasets. We release code and model weights at https://github.com/lxlde/SLGDiffuser [en_US]
dc.description.sectionheaders: Image Processing and Filtering II
dc.description.seriesinformation: Pacific Graphics Conference Papers and Posters
dc.identifier.doi: 10.2312/pg.20241308
dc.identifier.isbn: 978-3-03868-250-9
dc.identifier.pages: 12 pages
dc.identifier.uri: https://doi.org/10.2312/pg.20241308
dc.identifier.uri: https://diglib.eg.org/handle/10.2312/pg20241308
dc.publisher: The Eurographics Association [en_US]
dc.rights: Attribution 4.0 International License
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: CCS Concepts: Imaging → Image/Video Editing; Image Processing; Methods and Applications → Artificial Intelligence
dc.subject: Imaging → Image/Video Editing
dc.subject: Image Processing
dc.subject: Methods and Applications → Artificial Intelligence
dc.title: SLGDiffuser : Stroke-level Guidance Diffusion Model for Complex Scene Text Editing [en_US]
Files
Original bundle
Name: pg20241308.pdf
Size: 2.62 MB
Format: Adobe Portable Document Format