Modeling Sketches both Semantically and Structurally for Zero-Shot Sketch-Based Image Retrieval is Better
dc.contributor.author | Jing, Jiansen | en_US |
dc.contributor.author | Liu, Yujie | en_US |
dc.contributor.author | Li, Mingyue | en_US |
dc.contributor.author | Xiao, Qian | en_US |
dc.contributor.author | Chai, Shijie | en_US |
dc.contributor.editor | Chen, Renjie | en_US |
dc.contributor.editor | Ritschel, Tobias | en_US |
dc.contributor.editor | Whiting, Emily | en_US |
dc.date.accessioned | 2024-10-13T18:05:06Z | |
dc.date.available | 2024-10-13T18:05:06Z | |
dc.date.issued | 2024 | |
dc.description.abstract | A sketch, as a representation of human thought, is abstract but also structured, because it is presented as a two-dimensional image. Modeling it from both semantic and structural perspectives is therefore reasonable and effective. In this paper, for semantic capture, we compare the performance of two mainstream pre-trained models on the Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) task and propose a new model, Semantic Net (SNET), based on Contrastive Language-Image Pre-training (CLIP) with a more effective fine-tuning strategy and a Semantic Preservation Module. Furthermore, we propose three lightweight modules, Channels Fusion (CF), Layers Fusion (LF), and Semantic Structure Fusion (SSF), to give SNET a stronger ability to capture structure. Finally, we supervise the entire training process with a classification loss based on contrastive learning and a bidirectional triplet loss based on the cosine distance metric. We call the final model Semantic Structure Net (SSNET). The quantitative experimental results show that both the proposed SNET and the enhanced SSNET achieve new state-of-the-art (SOTA) results (a 16% retrieval boost on the most difficult dataset, QuickDraw Ext). The visualization experiments also indirectly support our view of sketch modeling. | en_US |
dc.description.sectionheaders | Image Processing and Filtering II | |
dc.description.seriesinformation | Pacific Graphics Conference Papers and Posters | |
dc.identifier.doi | 10.2312/pg.20241309 | |
dc.identifier.isbn | 978-3-03868-250-9 | |
dc.identifier.pages | 12 pages | |
dc.identifier.uri | https://doi.org/10.2312/pg.20241309 | |
dc.identifier.uri | https://diglib.eg.org/handle/10.2312/pg20241309 | |
dc.publisher | The Eurographics Association | en_US |
dc.rights | Attribution 4.0 International License | |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
dc.subject | CCS Concepts: Computing methodologies → Visual content-based indexing and retrieval | |
dc.subject | Computing methodologies → Visual content-based indexing and retrieval | |
dc.title | Modeling Sketches both Semantically and Structurally for Zero-Shot Sketch-Based Image Retrieval is Better | en_US |