arXiv Paper Digest

Flow Matching-Based Generative Modeling for Efficient and Scalable Data Assimilation

Authors: Taos Transue, Bohan Chen, So Takao, Bao Wang

First: 2025-08-18T19:00:45+00:00 · Latest: 2025-08-22T15:54:49+00:00

Comments: correcting authorship footnote, reformatting figures

Abstract

Data assimilation (DA) is the problem of sequentially estimating the state of a dynamical system from noisy observations. Recent advances in generative modeling have inspired new approaches to DA in high-dimensional nonlinear settings, especially the ensemble score filter (EnSF). However, these come at a significant computational burden due to slow sampling. In this paper, we introduce a new filtering framework based on flow matching (FM), called the ensemble flow filter (EnFF), to accelerate sampling and enable flexible design of probability paths. EnFF, a training-free DA approach, integrates Monte Carlo (MC) estimators for the marginal FM vector field (VF) with a localized guidance to assimilate observations. EnFF samples faster and offers more flexibility in VF design than existing generative-modeling approaches to DA. Theoretically, we show that EnFF encompasses classical filtering methods such as the bootstrap particle filter and the ensemble Kalman filter as special cases. Experiments on high-dimensional filtering benchmarks demonstrate improved cost-accuracy tradeoffs and the ability to leverage larger ensembles than prior methods. Our results highlight the promise of FM as a scalable tool for filtering in high-dimensional applications, enabling the use of large ensembles.
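
As a rough illustration of what an MC estimator of the marginal FM vector field can look like, the sketch below assumes the linear path x_t = (1 - t) x0 + t x1 with a standard Gaussian source x0 and the forecast ensemble as an empirical target for x1. Observations are folded in by reweighting ensemble members with a log-likelihood (`log_weights`), which is a swapped-in simplification and not the paper's localized guidance; the names `marginal_vf` and `flow_sample` are hypothetical.

```python
import numpy as np


def marginal_vf(x, t, ensemble, log_weights=None):
    """MC estimate of the marginal FM vector field u_t(x).

    Assumed path (not necessarily the paper's): x_t = (1 - t) x0 + t x1,
    x0 ~ N(0, I), x1 ~ sum_i pi_i delta(ensemble[i]) with log pi = log_weights.
    """
    sigma = 1.0 - t
    if log_weights is None:
        log_weights = np.zeros(len(ensemble))
    # x_t | x1_i ~ N(t * x1_i, sigma^2 I): posterior weight of each member given x
    diff = x[None, :] - t * ensemble
    logw = log_weights - 0.5 * np.sum(diff**2, axis=1) / sigma**2
    w = np.exp(logw - logw.max())  # numerically stable softmax
    w /= w.sum()
    # Conditional vector field of member i: (x1_i - x) / (1 - t)
    cond = (ensemble - x[None, :]) / sigma
    return w @ cond


def flow_sample(ensemble, n_samples, log_weights=None, n_steps=100,
                t_end=0.999, seed=None):
    """Euler-integrate dx/dt = u_t(x) from t = 0 to t_end (< 1 to avoid 1/(1 - t))."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, ensemble.shape[1]))
    ts = np.linspace(0.0, t_end, n_steps + 1)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        v = np.stack([marginal_vf(xi, t0, ensemble, log_weights) for xi in x])
        x = x + (t1 - t0) * v
    return x


if __name__ == "__main__":
    # Hypothetical analysis step: 2-D prior ensemble, noisy observation of component 0.
    rng = np.random.default_rng(0)
    prior = rng.normal(size=(200, 2))
    y, obs_std = 1.0, 0.5
    loglik = -0.5 * (y - prior[:, 0]) ** 2 / obs_std**2
    posterior = flow_sample(prior, n_samples=100, log_weights=loglik, seed=1)
    print(posterior.mean(axis=0))
```

Because the target here is a weighted empirical measure, samples settle onto ensemble members as t approaches 1, so this toy version behaves roughly like likelihood-weighted resampling; it does not reproduce the EnKF-like special case or the localization described in the abstract.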

Summary

Data assimilation (DA) is the problem of sequentially estimating the state of a dynamical system from noisy observations.

Modular Embedding Recomposition for Incremental Learning

Authors: Aniello Panariello, Emanuele Frascaroli, Pietro Buzzega, Lorenzo Bonicelli, Angelo Porrello, Simone Calderara

First: 2025-08-22T15:25:40+00:00 · Latest: 2025-08-22T15:25:40+00:00

Comments: Accepted to the 36th British Machine Vision Conference (BMVC 2025), Sheffield, UK

Abstract

The advent of pre-trained Vision-Language Models (VLMs) has significantly transformed Continual Learning (CL), mainly due to their zero-shot classification abilities. Such proficiency makes VLMs well-suited for real-world applications, enabling robust performance on novel unseen classes without requiring adaptation. However, fine-tuning remains essential when downstream tasks deviate significantly from the pre-training domain. Prior CL approaches primarily focus on preserving the zero-shot capabilities of VLMs during incremental fine-tuning on a downstream task. We take a step further by devising an approach that transforms preservation into enhancement of the zero-shot capabilities of VLMs. Our approach, named MoDular Embedding Recomposition (MoDER), introduces a modular framework that trains multiple textual experts, each specialized in a single seen class, and stores them in a foundational hub. At inference time, for each unseen class, we query the hub and compose the retrieved experts to synthesize a refined prototype that improves classification. We show the effectiveness of our method across two popular zero-shot incremental protocols, Class-IL and MTIL, comprising a total of 14 datasets. The codebase is available at https://github.com/aimagelab/mammoth.
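
The abstract does not specify how experts are retrieved from the hub or how they are composed. Purely as an illustration, the sketch below assumes each seen class stores a key embedding and a trained textual-expert embedding, retrieves the top-k experts by cosine similarity to an unseen class's zero-shot text embedding, and blends their softmax-weighted mix with that embedding to form the prototype. `compose_prototype`, `classify`, `temperature`, and `alpha` are hypothetical names, not part of the mammoth codebase.

```python
import numpy as np


def l2_normalize(v, axis=-1):
    """Scale vectors to unit L2 norm along `axis`."""
    return v / np.linalg.norm(v, axis=axis, keepdims=True)


def compose_prototype(query, hub_keys, hub_experts, k=3, temperature=0.1, alpha=0.5):
    """Hypothetical recomposition: blend the zero-shot text embedding `query`
    of an unseen class with a similarity-weighted mix of its top-k hub experts."""
    q = l2_normalize(query)
    sims = l2_normalize(hub_keys) @ q            # cosine similarity to each seen class
    top = np.argsort(sims)[::-1][:k]             # indices of the k most similar experts
    w = np.exp(sims[top] / temperature)          # softmax weights over retrieved experts
    w /= w.sum()
    mixed = w @ l2_normalize(hub_experts[top])   # weighted composition of experts
    return l2_normalize(alpha * q + (1.0 - alpha) * mixed)


def classify(image_emb, prototypes):
    """Zero-shot prediction: index of the prototype with highest cosine similarity."""
    return int(np.argmax(l2_normalize(prototypes) @ l2_normalize(image_emb)))
```

Keeping a share `alpha` of the original zero-shot embedding is one way to avoid discarding the pre-trained knowledge the abstract aims to preserve; the real method may weight or select experts differently.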

Summary

The advent of pre-trained Vision-Language Models (VLMs) has significantly transformed Continual Learning (CL), mainly due to their zero-shot classification abilities.

PediatricsMQA: a Multi-modal Pediatrics Question Answering Benchmark

Authors: Adil Bahaj, Mounir Ghogho

First: 2025-08-22T14:50:55+00:00 · Latest: 2025-08-22T14:50:55+00:00