SLGDiffuser: Stroke-level Guidance Diffusion Model for Complex Scene Text Editing

Liu, Xiao Le; Wu, Lei; Wang, Chang Shuo; Dong, Pei; Meng, Xiang Xu

dc.contributor.author	Liu, Xiao Le	en_US
dc.contributor.author	Wu, Lei	en_US
dc.contributor.author	Wang, Chang Shuo	en_US
dc.contributor.author	Dong, Pei	en_US
dc.contributor.author	Meng, Xiang Xu	en_US
dc.contributor.editor	Chen, Renjie	en_US
dc.contributor.editor	Ritschel, Tobias	en_US
dc.contributor.editor	Whiting, Emily	en_US
dc.date.accessioned	2024-10-13T18:05:02Z
dc.date.available	2024-10-13T18:05:02Z
dc.date.issued	2024
dc.description.abstract	Scene Text Editing (STE) aims to replace text in images while preserving the original text style and background. Existing methods often struggle to learn different transformation rules for text and background at the same time, especially in complex scenes. This causes several notable problems: low content accuracy, ineffective extraction of text styles, and suboptimal background reconstruction. To address these challenges, we introduce SLGDiffuser, a stroke-level guidance diffusion model designed specifically for complex scene text editing. SLGDiffuser features a stroke-level guidance text conversion module that processes the target text through character encoding and uses ContourLoss with stroke features to improve text accuracy. It also benefits from the proposed stroke-enhanced strategy, which improves text integrity by leveraging detailed stroke information. Furthermore, we introduce a unified instruction-based background reconstruction module that fine-tunes a pre-trained diffusion model, so that a standardized instruction prompt can be applied to reconstruct a variety of complex scenes effectively. In extensive experiments, our model outperforms existing methods on diverse real-world datasets. We release code and model weights at https://github.com/lxlde/SLGDiffuser	en_US
dc.description.sectionheaders	Image Processing and Filtering II
dc.description.seriesinformation	Pacific Graphics Conference Papers and Posters
dc.identifier.doi	10.2312/pg.20241308
dc.identifier.isbn	978-3-03868-250-9
dc.identifier.pages	12 pages
dc.identifier.uri	https://doi.org/10.2312/pg.20241308
dc.identifier.uri	https://diglib.eg.org/handle/10.2312/pg20241308
dc.publisher	The Eurographics Association	en_US
dc.rights	Attribution 4.0 International License
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	CCS Concepts: Imaging → Image/Video Editing; Image Processing; Methods and Applications → Artificial Intelligence
dc.subject	Imaging → Image/Video Editing
dc.subject	Image Processing
dc.subject	Methods and Applications → Artificial Intelligence
dc.title	SLGDiffuser : Stroke-level Guidance Diffusion Model for Complex Scene Text Editing	en_US

Files

Original bundle

Name: pg20241308.pdf
Size: 2.62 MB
Format: Adobe Portable Document Format

Collections

PG2024 Conference Papers and Posters