arXiv Paper Express

2025-08-25 13:39
Snapshot: 20250825_1339
DesignCLIP: Multimodal Learning with CLIP for Design Patent Understanding
Authors: Zhu Wang, Homaira Huda Shomee, Sathya N. Ravi, Sourav Medya
Venue: EMNLP 2025
First: 2025-08-21T06:36:24+00:00 · Latest: 2025-08-21T06:36:24+00:00
Comments: Accepted by EMNLP 2025. 22 pages, 14 figures
Abstract
In the field of design patent analysis, traditional tasks such as patent classification and patent image retrieval depend heavily on image data. However, patent images -- typically sketches showing abstract and structural elements of an invention -- often fall short in conveying comprehensive visual context and semantic information. This inadequacy can lead to ambiguities in evaluation during prior art searches. Recent advancements in vision-language models, such as CLIP, offer promising opportunities for more reliable and accurate AI-driven patent analysis. In this work, we leverage CLIP models to develop a unified framework, DesignCLIP, for design patent applications, using a large-scale dataset of U.S. design patents. To address the unique characteristics of patent data, DesignCLIP incorporates class-aware classification and contrastive learning, utilizing generated detailed captions for patent images and multi-view image learning. We validate the effectiveness of DesignCLIP across various downstream tasks, including patent classification and patent retrieval. Additionally, we explore multimodal patent retrieval, which has the potential to enhance creativity and innovation in design by offering more diverse sources of inspiration. Our experiments show that DesignCLIP consistently outperforms baseline and SOTA models in the patent domain on all tasks. Our findings underscore the promise of multimodal approaches in advancing patent analysis. The codebase is available here: https://anonymous.4open.science/r/PATENTCLIP-4661/README.md.
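The abstract names contrastive learning over patent images and generated captions but does not spell out the objective. As a rough illustration only, the sketch below shows the standard CLIP-style symmetric contrastive (InfoNCE) loss on a batch of matched image and caption embeddings. It is not the authors' implementation: the function names, the toy embeddings, and the temperature of 0.07 are all assumptions made for this example, and the class-aware and multi-view components mentioned in the abstract are not modeled.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss; pair i is (image_embs[i], text_embs[i]).

    Hypothetical helper for illustration, not taken from the DesignCLIP codebase.
    """
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    n = len(images)

    # logits[i][j]: cosine similarity of image i and caption j, divided by temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]

    # Image-to-text direction: row i should concentrate on column i.
    image_to_text = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Text-to-image direction: column j should concentrate on row j.
    columns = [list(col) for col in zip(*logits)]
    text_to_image = -sum(log_softmax(columns[j])[j] for j in range(n)) / n

    return (image_to_text + text_to_image) / 2


# Toy batch: two orthogonal "image" embeddings with matching or swapped "captions".
aligned = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the loss is close to zero when each image is paired with its own caption and much larger when the captions are swapped, which is the signal that pulls matching image and text embeddings together during training.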
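Summary
DesignCLIP adapts CLIP to U.S. design patents by combining class-aware classification, contrastive learning, generated detailed captions, and multi-view image learning. The authors report that it outperforms baseline and SOTA patent-domain models on patent classification and retrieval.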