Modeling Sketches both Semantically and Structurally for Zero-Shot Sketch-Based Image Retrieval is Better

dc.contributor.author | Jing, Jiansen | en_US
dc.contributor.author | Liu, Yujie | en_US
dc.contributor.author | Li, Mingyue | en_US
dc.contributor.author | Xiao, Qian | en_US
dc.contributor.author | Chai, Shijie | en_US
dc.contributor.editor | Chen, Renjie | en_US
dc.contributor.editor | Ritschel, Tobias | en_US
dc.contributor.editor | Whiting, Emily | en_US
dc.date.accessioned | 2024-10-13T18:05:06Z
dc.date.available | 2024-10-13T18:05:06Z
dc.date.issued | 2024
dc.description.abstract | A sketch, as a representation of human thought, is abstract, yet it is also structured because it is presented as a two-dimensional image. Modeling sketches from both semantic and structural perspectives is therefore reasonable and effective. In this paper, to capture semantics, we compare the performance of two mainstream pre-trained models on the Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) task and propose a new model, Semantic Net (SNET), based on Contrastive Language-Image Pre-training (CLIP), with a more effective fine-tuning strategy and a Semantic Preservation Module. Furthermore, we propose three lightweight modules, Channels Fusion (CF), Layers Fusion (LF), and Semantic Structure Fusion (SSF), that give SNET a stronger ability to capture structure. Finally, we supervise the entire training process with a classification loss based on contrastive learning and a bidirectional triplet loss based on the cosine distance metric. We call the final model Semantic Structure Net (SSNET). Quantitative experimental results show that both the proposed SNET and its enhanced version SSNET achieve new state-of-the-art (SOTA) results (a 16% retrieval boost on QuickDraw Ext, the most difficult dataset). Visualization experiments also indirectly support our view of sketch modeling. | en_US
dc.description.sectionheaders | Image Processing and Filtering II
dc.description.seriesinformation | Pacific Graphics Conference Papers and Posters
dc.identifier.doi | 10.2312/pg.20241309
dc.identifier.isbn | 978-3-03868-250-9
dc.identifier.pages | 12 pages
dc.identifier.uri | https://doi.org/10.2312/pg.20241309
dc.identifier.uri | https://diglib.eg.org/handle/10.2312/pg20241309
dc.publisher | The Eurographics Association | en_US
dc.rights | Attribution 4.0 International License
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/
dc.subject | CCS Concepts: Computing methodologies → Visual content-based indexing and retrieval
dc.subject | Computing methodologies → Visual content-based indexing and retrieval
dc.title | Modeling Sketches both Semantically and Structurally for Zero-Shot Sketch-Based Image Retrieval is Better | en_US
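The abstract names two supervision terms: a classification loss based on contrastive learning and a bidirectional triplet loss based on the cosine distance metric. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes the standard hinge-style triplet formulation applied in both retrieval directions (sketch to photo and photo to sketch) with cosine distance defined as 1 − cos, and a CLIP-style classification loss (cross-entropy over temperature-scaled cosine similarities to per-class text embeddings). The function names, the margin of 0.2, and the temperature of 0.07 are hypothetical choices for illustration; the paper's actual losses (negative sampling, weighting, batching) may differ.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a: Vector, b: Vector) -> float:
    """Cosine distance, assumed here to be 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


def triplet_term(anchor: Vector, positive: Vector, negative: Vector, margin: float) -> float:
    """Hinge term: the positive should be closer to the anchor than the negative by at least `margin`."""
    return max(0.0, margin + cosine_distance(anchor, positive) - cosine_distance(anchor, negative))


def bidirectional_triplet_loss(sketch: Vector, photo_pos: Vector, photo_neg: Vector,
                               sketch_neg: Vector, margin: float = 0.2) -> float:
    """Hypothetical bidirectional triplet loss (margin value is an assumption)."""
    # Sketch -> photo: the sketch is the anchor, photos are the candidates.
    s2p = triplet_term(sketch, photo_pos, photo_neg, margin)
    # Photo -> sketch: the matching photo is the anchor, sketches are the candidates.
    p2s = triplet_term(photo_pos, sketch, sketch_neg, margin)
    return s2p + p2s


def contrastive_classification_loss(feature: Vector, class_text_embeddings: Sequence[Vector],
                                    label: int, temperature: float = 0.07) -> float:
    """Hypothetical CLIP-style loss: cross-entropy over scaled cosine similarities to class text embeddings."""
    logits = [cosine_similarity(feature, t) / temperature for t in class_text_embeddings]
    # Numerically stable log-sum-exp; the loss is -log softmax(logits)[label].
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]
```

For example, `bidirectional_triplet_loss([1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.2])` evaluates to zero because both positives are already well within the margin, while swapping the positive and negative photos produces a positive loss. Applying the hinge in both directions reflects the "bidirectional" wording: the embedding is pushed to rank correctly whether a sketch or a photo is used as the query.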
Files
Original bundle
pg20241309.pdf (2.66 MB, Adobe Portable Document Format)
pg2024_ssnet_supply.pdf (4.44 MB, Adobe Portable Document Format)