Modeling Sketches both Semantically and Structurally for Zero-Shot Sketch-Based Image Retrieval is Better

Jing, Jiansen; Liu, Yujie; Li, Mingyue; Xiao, Qian; Chai, Shijie

dc.contributor.author	Jing, Jiansen	en_US
dc.contributor.author	Liu, Yujie	en_US
dc.contributor.author	Li, Mingyue	en_US
dc.contributor.author	Xiao, Qian	en_US
dc.contributor.author	Chai, Shijie	en_US
dc.contributor.editor	Chen, Renjie	en_US
dc.contributor.editor	Ritschel, Tobias	en_US
dc.contributor.editor	Whiting, Emily	en_US
dc.date.accessioned	2024-10-13T18:05:06Z
dc.date.available	2024-10-13T18:05:06Z
dc.date.issued	2024
dc.description.abstract	A sketch, as a representation of human thought, is abstract, yet it is also structured because it is presented as a two-dimensional image. Modeling sketches from both semantic and structural perspectives is therefore reasonable and effective. For semantic capture, we compare the performance of two mainstream pre-trained models on the Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) task and propose a new model, Semantic Net (SNET), which builds on Contrastive Language-Image Pre-training (CLIP) with a more effective fine-tuning strategy and a Semantic Preservation Module. Furthermore, we propose three lightweight modules, Channels Fusion (CF), Layers Fusion (LF), and Semantic Structure Fusion (SSF), that give SNET a stronger ability to capture structure. Finally, we supervise the entire training process with a contrastive-learning-based classification loss and a bidirectional triplet loss based on the cosine distance metric. We call the final model Semantic Structure Net (SSNET). Quantitative experiments show that both SNET and the enhanced SSNET achieve new state-of-the-art (SOTA) results, including a 16% retrieval boost on QuickDraw Ext, the most difficult dataset. Visualization experiments also indirectly support our view of sketch modeling.	en_US
dc.description.sectionheaders	Image Processing and Filtering II
dc.description.seriesinformation	Pacific Graphics Conference Papers and Posters
dc.identifier.doi	10.2312/pg.20241309
dc.identifier.isbn	978-3-03868-250-9
dc.identifier.pages	12 pages
dc.identifier.uri	https://doi.org/10.2312/pg.20241309
dc.identifier.uri	https://diglib.eg.org/handle/10.2312/pg20241309
dc.publisher	The Eurographics Association	en_US
dc.rights	Attribution 4.0 International License
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	CCS Concepts: Computing methodologies → Visual content-based indexing and retrieval
dc.subject	Computing methodologies → Visual content-based indexing and retrieval
dc.title	Modeling Sketches both Semantically and Structurally for Zero-Shot Sketch-Based Image Retrieval is Better	en_US
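The abstract names two training objectives, a contrastive-learning-based classification loss and a bidirectional triplet loss on cosine distance, without giving their exact form. The Python sketch below shows one common way such losses, and cosine-similarity retrieval, can be written; it is an illustration under stated assumptions, not the authors' SSNET code. The margin (0.3), the temperature (0.07), the use of one text embedding per class, summing the two triplet directions, and all function names (cosine_distance, bidirectional_triplet_loss, contrastive_classification_loss, retrieve) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a: Vector, b: Vector) -> float:
    """Cosine distance d(a, b) = 1 - cos(a, b), which lies in [0, 2]."""
    return 1.0 - cosine_similarity(a, b)


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector, margin: float) -> float:
    """Hinge triplet loss max(0, d(a, p) - d(a, n) + margin) under cosine distance."""
    return max(0.0, cosine_distance(anchor, positive) - cosine_distance(anchor, negative) + margin)


def bidirectional_triplet_loss(
    sketch: Vector,
    photo_pos: Vector,
    sketch_neg: Vector,
    photo_neg: Vector,
    margin: float = 0.3,  # hypothetical margin value
) -> float:
    """Sum of sketch->photo and photo->sketch triplet terms (an assumed formulation)."""
    # Sketch as anchor: its matching photo should be closer than a non-matching photo.
    sketch_to_photo = triplet_loss(sketch, photo_pos, photo_neg, margin)
    # Photo as anchor: its matching sketch should be closer than a non-matching sketch.
    photo_to_sketch = triplet_loss(photo_pos, sketch, sketch_neg, margin)
    return sketch_to_photo + photo_to_sketch


def contrastive_classification_loss(
    embedding: Vector,
    class_text_embeddings: List[Vector],
    label: int,
    temperature: float = 0.07,  # hypothetical CLIP-style temperature
) -> float:
    """Cross-entropy over temperature-scaled cosine similarities to one text embedding per class."""
    logits = [cosine_similarity(embedding, t) / temperature for t in class_text_embeddings]
    # Numerically stable log-sum-exp before taking the negative log-probability of the label.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]


def retrieve(query: Vector, gallery: List[Vector], k: int) -> List[int]:
    """Indices of the k gallery embeddings most similar (by cosine) to the sketch query."""
    ranked = sorted(
        range(len(gallery)),
        key=lambda i: cosine_similarity(query, gallery[i]),
        reverse=True,
    )
    return ranked[:k]
```

With toy 2-D vectors, bidirectional_triplet_loss([1, 0], [0, 1], [1, 0], [1, 0]) evaluates to about 1.6: the sketch is closer to its negative photo than to its positive one, so both directions incur a penalty. In a real model these vectors would be encoder outputs (for example, CLIP image features) rather than hand-written lists.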

Files

Original bundle

Name:: pg20241309.pdf
Size:: 2.66 MB
Format:: Adobe Portable Document Format

Name:: pg2024_ssnet_supply.pdf
Size:: 4.44 MB
Format:: Adobe Portable Document Format

Collections

PG2024 Conference Papers and Posters