MV-RAG: Retrieval Augmented Multiview Diffusion
Authors: Yosef Dayani, Omer Benishu, Sagie Benaim
First: 2025-08-22T17:59:40+00:00 · Latest: 2025-08-22T17:59:40+00:00
Comments: Project page: https://yosefdayani.github.io/MV-RAG
Abstract
Text-to-3D generation approaches have advanced significantly by leveraging
pretrained 2D diffusion priors, producing high-quality and 3D-consistent
outputs. However, they often fail to produce out-of-domain (OOD) or rare
concepts, yielding inconsistent or inaccurate results. To this end, we propose
MV-RAG, a novel text-to-3D pipeline that first retrieves relevant 2D images
from a large in-the-wild 2D database and then conditions a multiview diffusion
model on these images to synthesize consistent and accurate multiview outputs.
Training such a retrieval-conditioned model is achieved via a novel hybrid
strategy bridging structured multiview data and diverse 2D image collections.
This involves training on multiview data using augmented conditioning views
that simulate retrieval variance for view-specific reconstruction, alongside
training on sets of retrieved real-world 2D images using a distinctive held-out
view prediction objective: the model predicts the held-out view from the other
views to infer 3D consistency from 2D data. To facilitate a rigorous OOD
evaluation, we introduce a new collection of challenging OOD prompts.
Experiments against state-of-the-art text-to-3D, image-to-3D, and
personalization baselines show that our approach significantly improves 3D
consistency, photorealism, and text adherence for OOD/rare concepts, while
maintaining competitive performance on standard benchmarks.
中文标题/摘要
标题:MV-RAG:检索增强的多视角扩散模型
文本到3D生成方法通过利用预训练的2D扩散先验取得了显著进展,能产生高质量且3D一致的结果。然而,这些方法在处理域外(OOD)或罕见概念时往往表现不佳,导致生成结果不一致或不准确。为此,我们提出MV-RAG——一种新颖的文本到3D流程:首先从大规模真实世界2D数据库中检索相关图像,随后以这些图像为条件驱动多视角扩散模型,合成具有一致性和准确性的多视角输出。通过创新性混合策略,该模型实现了结构化多视角数据与多样化2D图像集的协同训练:一方面使用模拟检索差异的增强条件视图进行多视角数据训练以实现视角特异性重建,另一方面通过独特留出视角预测目标对检索到的真实2D图像集进行训练——模型根据其他视角预测留出视角,从而从2D数据推断3D一致性。为促进严格OOD评估,我们构建了具有挑战性的OOD提示词集合。与最先进的文本到3D、图像到3D及个性化基线方法的对比实验表明,我们的方法在保持标准基准竞争力的同时,显著提升了OOD/罕见概念的3D一致性、照片真实感和文本遵循度。
Summary / 总结
To address the limitations of text-to-3D generation methods in handling out-of-domain or rare concepts, which often result in inconsistent or inaccurate outputs, this work introduces MV-RAG, a retrieval-augmented pipeline. The method first retrieves relevant 2D images from a large database and conditions a multiview diffusion model on these images, using a hybrid training strategy that combines structured multiview data with diverse 2D collections via view-specific reconstruction and held-out view prediction. Experimental results demonstrate that MV-RAG significantly enhances 3D consistency, photorealism, and text adherence for challenging OOD prompts, outperforming state-of-the-art text-to-3D, image-to-3D, and personalization baselines while maintaining competitive performance on standard benchmarks.
针对文本到3D生成方法在处理域外或罕见概念时经常产生不一致或不准确结果的问题,本文提出了MV-RAG,一种检索增强的多视角扩散流程。该方法首先从大规模二维图像数据库中检索相关图像,并以此作为条件训练多视角扩散模型,采用混合策略结合结构化多视角数据和多样化二维图像集,通过视角特定重建和保留视角预测来推断三维一致性。实验结果表明,MV-RAG在处理挑战性域外概念时显著提升了三维一致性、照片真实感和文本遵循性,同时在标准基准上保持了竞争力。
Benchmarking Training Paradigms, Dataset Composition, and Model Scaling for Child ASR in ESPnet
Authors: Anyu Ying, Natarajan Balaji Shankar, Chyi-Jiunn Lin, Mohan Shi, Pu Wang, Hye-jin Shim, Siddhant Arora, Hugo Van hamme, Abeer Alwan, Shinji Watanabe
First: 2025-08-22T17:59:35+00:00 · Latest: 2025-08-22T17:59:35+00:00
Comments: 5 pages, 3 figures, presented at WOCCI 2025 (Workshop on Child
Computer Interaction), satellite workshop of Interspeech 2025
Abstract
Despite advancements in ASR, child speech recognition remains challenging due
to acoustic variability and limited annotated data. While fine-tuning adult ASR
models on child speech is common, comparisons with flat-start training remain
underexplored. We compare flat-start training across multiple datasets, SSL
representations (WavLM, XEUS), and decoder architectures. Our results show that
SSL representations are biased toward adult speech, with flat-start training on
child speech mitigating these biases. We also analyze model scaling, finding
consistent improvements up to 1B parameters, beyond which performance plateaus.
Additionally, age-related ASR and speaker verification analysis highlights the
limitations of proprietary models like Whisper, emphasizing the need for
open-data models for reliable child speech research. All investigations are
conducted using ESPnet, and our publicly available benchmark provides insights
into training strategies for robust child speech processing.
中文标题/摘要
标题:ESPnet中儿童语音识别的训练范式、数据集构成与模型规模扩展基准研究
尽管自动语音识别(ASR)技术有所进步,但由于声学变异性和标注数据有限,儿童语音识别仍具挑战。虽然通常采用成人ASR模型对儿童语音进行微调,但与从零开始的平面训练方式的对比研究仍不足。我们比较了跨多个数据集的平面训练、自监督学习表示(WavLM, XEUS)及解码器架构。结果显示,SSL表示存在对成人语音的偏好,而采用儿童语音的平面训练可缓解这种偏差。模型规模分析表明,参数增至10亿时性能持续提升,之后趋于稳定。此外,基于年龄的ASR和说话人验证分析揭示了如Whisper等专有模型的局限性,强调需要开放数据模型以支持可靠的儿童语音研究。所有研究均基于ESPnet框架,公开的基准为鲁棒的儿童语音处理训练策略提供了见解。
Summary / 总结
Child speech recognition remains difficult due to acoustic variability and scarce annotated data, motivating a comparison of training paradigms and model scaling. The study benchmarks flat-start training against fine-tuning using multiple datasets, self-supervised learning representations (WavLM, XEUS), and varied decoder architectures within ESPnet. Key findings reveal that SSL representations exhibit adult speech bias, mitigated by flat-start training on child data; model performance improves consistently up to 1B parameters before plateauing, and proprietary models like Whisper show limitations in age-related tasks, underscoring the value of open-data models for child speech research.
儿童语音识别因声学多变性和标注数据稀缺而面临挑战,研究旨在比较训练范式和模型缩放策略。该工作基于ESPnet框架,对比了多种数据集上的从头训练与微调方法,并评估了自监督学习表示(如WavLM和XEUS)及解码器架构。实验发现自监督表示存在成人语音偏差,而儿童数据的从头训练可缓解此问题;模型性能在10亿参数内持续提升后趋于饱和,且Whisper等私有模型在年龄相关任务中表现受限,突显了开放数据模型对儿童语音研究的重要性。
A Disease-Centric Vision-Language Foundation Model for Precision Oncology in Kidney Cancer
Authors: Yuhui Tao, Zhongwei Zhao, Zilong Wang, Xufang Luo, Feng Chen, Kang Wang, Chuanfu Wu, Xue Zhang, Shaoting Zhang, Jiaxi Yao, Xingwei Jin, Xinyang Jiang, Yifan Yang, Dongsheng Li, Lili Qiu, Zhiqiang Shao, Jianming Guo, Nengwang Yu, Shuo Wang, Ying Xiong
First: 2025-08-22T17:48:19+00:00 · Latest: 2025-08-22T17:48:19+00:00
Abstract
The non-invasive assessment of increasingly incidentally discovered renal
masses is a critical challenge in urologic oncology, where diagnostic
uncertainty frequently leads to the overtreatment of benign or indolent tumors.
In this study, we developed and validated RenalCLIP, a vision-language
foundation model for the characterization, diagnosis, and prognosis of renal
masses, using a dataset of 27,866 CT scans from 8,809 patients across nine
Chinese medical centers and the public TCIA cohort. The model was developed via
a two-stage
pre-training strategy that first enhances the image and text encoders with
domain-specific knowledge before aligning them through a contrastive learning
objective, to create robust representations for superior generalization and
diagnostic precision. RenalCLIP achieved better performance and superior
generalizability across 10 core tasks spanning the full clinical workflow of
kidney cancer, including anatomical assessment, diagnostic classification, and
survival prediction, compared with other state-of-the-art general-purpose CT
foundation models. In particular, for complicated tasks such as recurrence-free
survival prediction in the TCIA cohort, RenalCLIP achieved a C-index of 0.726,
representing a substantial improvement of approximately 20% over the leading
baselines. Furthermore, RenalCLIP's pre-training imparted remarkable data
efficiency; in the diagnostic classification task, it needs only 20% of the
training data to match the peak performance of all baseline models, even after
they were fully fine-tuned on 100% of the data. Additionally, it achieved superior
performance in report generation, image-text retrieval and zero-shot diagnosis
tasks. Our findings establish that RenalCLIP provides a robust tool with the
potential to enhance diagnostic accuracy, refine prognostic stratification, and
personalize the management of patients with kidney cancer.
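The C-index quoted above measures how well a predicted risk score ranks patients by time to recurrence. As a minimal sketch, Harrell's concordance index can be computed in plain Python as follows; the `times`, `events`, and `risks` values are made-up toy data, not figures from the study, and the function name is an assumption for illustration.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    times  -- follow-up time per patient
    events -- 1 if recurrence was observed, 0 if censored
    risks  -- predicted risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in risk count as half
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one censored.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risks)
print(c_index)  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported 0.726.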
中文标题/摘要
标题:面向肾癌精准肿瘤学的疾病中心化视觉-语言基础模型
对日益多发的偶发性肾占位进行无创评估是泌尿系肿瘤学的关键挑战,诊断不确定性常导致良性或惰性肿瘤的过度治疗。本研究利用来自中国九家医疗中心和公共TCIA队列的8,809名患者的27,866次CT扫描数据集,开发并验证了视觉-语言基础模型RenalCLIP,用于肾占位的表征、诊断和预后预测。该模型通过两阶段预训练策略开发:首先用领域知识增强图像与文本编码器,再通过对比学习目标对齐二者,以创建具有卓越泛化能力和诊断精度的鲁棒表征。相较于其他最先进的通用CT基础模型,RenalCLIP在涵盖肾癌全临床工作流的10项核心任务(包括解剖评估、诊断分类和生存预测)中表现出更优的性能和泛化能力。特别是在TCIA队列中无复发生存预测这类复杂任务上,RenalCLIP取得了0.726的C指数,较领先基线提升约20%。此外,该模型的预训练赋予其显著的数据效率——在诊断分类任务中,仅需20%的训练数据即可达到所有基线模型使用100%数据充分微调后的峰值性能。同时,在报告生成、图文检索和零样本诊断任务中也实现了卓越性能。本研究证实RenalCLIP为提升诊断准确性、优化预后分层及实现肾癌患者个体化管理提供了强有力的工具。
Summary / 总结
To address the challenge of non-invasive assessment and diagnostic uncertainty in renal masses that often leads to overtreatment, this study developed RenalCLIP, a vision-language foundation model for kidney cancer. The method employed a two-stage pre-training strategy that first enhanced image and text encoders with domain-specific knowledge and then aligned them via contrastive learning to create robust representations. Experimental results demonstrated superior performance across 10 clinical tasks, including a 20% improvement in recurrence-free survival prediction (C-index 0.726), exceptional data efficiency requiring only 20% of training data to match baseline performance, and strong capabilities in report generation, retrieval, and zero-shot diagnosis.
该研究针对肾脏肿瘤非侵入性评估中过度治疗良性肿瘤的挑战,开发了用于肾癌的视觉-语言基础模型RenalCLIP。方法采用两阶段预训练策略:先通过领域知识增强图像和文本编码器,再通过对比学习对齐它们以构建泛化性强的表征。实验结果显示,模型在10项临床任务中性能优越,包括无复发生存预测指标提升20%(C-index 0.726)、数据效率高(仅用20%训练数据达到基线100%数据性能),并在报告生成和零样本诊断中表现突出。
Closer to Reality: Practical Semi-Supervised Federated Learning for Foundation Model Adaptation
Authors: Guangyu Sun, Jingtao Li, Weiming Zhuang, Chen Chen, Chen Chen, Lingjuan Lyu
First: 2025-08-22T17:47:02+00:00 · Latest: 2025-08-22T17:47:02+00:00
Abstract
Foundation models (FMs) exhibit remarkable generalization but require
adaptation to downstream tasks, particularly in privacy-sensitive applications.
Due to data privacy regulations, cloud-based FMs cannot directly access private
edge data, limiting their adaptation. Federated learning (FL) provides a
privacy-aware alternative, but existing FL approaches overlook the constraints
imposed by edge devices -- namely, limited computational resources and the
scarcity of labeled data. To address these challenges, we introduce Practical
Semi-Supervised Federated Learning (PSSFL), where edge devices hold only
unlabeled, low-resolution data, while the server has limited labeled,
high-resolution data. In this setting, we propose the Federated Mixture of
Experts (FedMox), a novel framework that enhances FM adaptation in FL. FedMox
tackles computational and resolution mismatch challenges via a sparse
Mixture-of-Experts architecture, employing a spatial router to align features
across resolutions and a Soft-Mixture strategy to stabilize semi-supervised
learning. We take object detection as a case study, and experiments on
real-world autonomous driving datasets demonstrate that FedMox effectively
adapts FMs under PSSFL, significantly improving performance with constrained
memory costs on edge devices. Our work paves the way for scalable and
privacy-preserving FM adaptation in federated scenarios.
中文标题/摘要
标题:更贴近现实:面向基础模型适配的实用半监督联邦学习
基础模型(FMs)展现出卓越的泛化能力,但需针对下游任务进行适配,尤其在隐私敏感应用中。由于数据隐私法规,云端基础模型无法直接访问私有边缘数据,限制了其适配能力。联邦学习(FL)提供了隐私保护的替代方案,但现有FL方法忽视了边缘设备带来的约束——即有限的计算资源和标注数据稀缺。为应对这些挑战,我们提出实用半监督联邦学习(PSSFL),其中边缘设备仅持有未标注的低分辨率数据,而服务器拥有有限标注的高分辨率数据。在此设定下,我们提出联邦专家混合模型(FedMox),这一新颖框架增强FL中的FM适配能力。FedMox通过稀疏专家混合架构应对计算与分辨率失配挑战,采用空间路由器对齐跨分辨率特征,并通过软混合策略稳定半监督学习。以目标检测为案例研究,在真实自动驾驶数据集上的实验表明,FedMox在PSSFL下有效适配FMs,在边缘设备有限内存成本下显著提升性能。我们的工作为联邦场景中可扩展且隐私保护的基础模型适配开辟了道路。
Summary / 总结
Foundation models require adaptation to downstream tasks while respecting data privacy, but existing federated learning methods overlook edge devices' computational constraints and scarcity of labeled data. The authors propose Practical Semi-Supervised Federated Learning (PSSFL) and introduce Federated Mixture of Experts (FedMox), which uses a sparse Mixture-of-Experts architecture with a spatial router to handle resolution mismatches and a Soft-Mixture strategy to stabilize semi-supervised learning. Experiments on autonomous driving datasets show that FedMox effectively adapts foundation models under PSSFL, achieving significant performance improvements with constrained memory costs on edge devices.
基础模型需要适应下游任务,但隐私法规限制了云端对边缘私有数据的直接访问,而现有联邦学习方法忽略了边缘设备的计算资源有限和标注数据稀缺问题。提出的实用半监督联邦学习(PSSFL)框架——联邦专家混合(FedMox),通过稀疏专家混合架构、空间路由器实现分辨率对齐,以及软混合策略稳定半监督学习,来解决这些挑战。在自动驾驶数据集上的实验表明,FedMox能有效适应基础模型,在边缘设备内存受限下显著提升性能。
Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders
Authors: David Chanin, Adrià Garriga-Alonso
First: 2025-08-22T17:26:33+00:00 · Latest: 2025-08-22T17:26:33+00:00
Abstract
Sparse Autoencoders (SAEs) extract features from LLM internal activations,
meant to correspond to single concepts. A core SAE training hyperparameter is
L0: how many features should fire per token on average. Existing work compares
SAE algorithms using sparsity--reconstruction tradeoff plots, implying L0 is a
free parameter with no single correct value. In this work we study the effect
of L0 on BatchTopK SAEs, and show that if L0 is not set precisely, the SAE
fails to learn the underlying features of the LLM. If L0 is too low, the SAE
will mix correlated features to improve reconstruction. If L0 is too high, the
SAE finds degenerate solutions that also mix features. Further, we demonstrate
a method to determine the correct L0 value for an SAE on a given training
distribution, which finds the true L0 in toy models and coincides with peak
sparse probing performance in LLMs. We find that most commonly used SAEs have
an L0 that is too low. Our work shows that, to train SAEs with correct
features, practitioners must set L0 correctly.
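To make the role of L0 concrete, here is a toy sketch of a BatchTopK-style selection in plain Python: across a batch of tokens, only the `k * batch_size` largest positive activations are kept, so the average number of active features per token is `k` while individual tokens may fire more or fewer. This is an illustrative assumption about the selection step only, not the authors' training code; the function names and the toy activations are hypothetical.

```python
def batch_topk(acts, k):
    """Keep the k * len(acts) largest positive activations across the batch."""
    n_keep = k * len(acts)
    # Flatten to (value, token index, feature index) and sort by value.
    flat = sorted(
        ((v, i, j) for i, row in enumerate(acts) for j, v in enumerate(row) if v > 0),
        reverse=True,
    )
    keep = {(i, j) for _, i, j in flat[:n_keep]}
    return [
        [v if (i, j) in keep else 0.0 for j, v in enumerate(row)]
        for i, row in enumerate(acts)
    ]


def l0_per_token(acts):
    """Number of non-zero features for each token."""
    return [sum(1 for v in row if v != 0) for row in acts]


# Toy pre-activations (hypothetical): 3 tokens, 4 features.
acts = [
    [0.9, 0.1, 0.0, 0.8],
    [0.2, 0.0, 0.0, 0.0],
    [0.7, 0.6, 0.5, 0.3],
]
sparse = batch_topk(acts, k=2)
counts = l0_per_token(sparse)
mean_l0 = sum(counts) / len(counts)
print(counts, mean_l0)
```

In this toy batch the per-token counts differ while the mean stays at `k`, which is why L0 is described as an average over tokens.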
中文标题/摘要
标题:稀疏但错误:错误的L0导致稀疏自编码器中的特征提取错误
稀疏自编码器(SAEs)从大语言模型内部激活中提取特征,这些特征本应对应单一概念。SAE训练的核心超参数L0表示每个令牌平均应激活的特征数量。现有研究通过稀疏度-重构权衡图比较SAE算法,暗示L0是可自由调节的参数。本研究探讨了L0对BatchTopK SAEs的影响,证明若L0设置不精确,SAE将无法学习大语言模型的底层特征:L0过低会使SAE混合相关特征以改善重构;L0过高则会导致退化解并混合特征。我们进一步提出确定SAE在给定训练分布下正确L0值的方法,该方法在玩具模型中能找到真实L0值,且与大语言模型中稀疏探测性能峰值吻合。研究发现常用SAEs的L0普遍偏低。本工作表明,要训练出具有正确特征的SAEs,必须准确设置L0参数。
Summary / 总结
This research investigates the critical role of the L0 hyperparameter in Sparse Autoencoders (SAEs), motivated by the observation that existing work treats L0 as a free parameter without a single correct value, potentially leading to incorrect feature learning. The authors study BatchTopK SAEs and demonstrate that incorrect L0 values cause feature mixing: too low L0 merges correlated features for better reconstruction, while too high L0 yields degenerate solutions. They propose a method to determine the correct L0 for a given training distribution, validated on toy models and LLMs where it aligns with peak sparse probing performance. Experimental results reveal that most commonly used SAEs have excessively low L0, emphasizing the necessity of precise L0 setting for accurate feature extraction.
稀疏自编码器(SAE)旨在从大语言模型激活中提取可解释的特征,但现有方法将L0稀疏性超参数视为没有单一正确值的自由变量。本研究探讨了BatchTopK SAE中不正确的L0设置如何导致特征混合:L0过低会使相关特征合并以改善重构,而L0过高则会产生同样混合特征的退化解。实验表明,精确的L0值至关重要,它在玩具模型中与真实值一致,并在大语言模型中实现稀疏探测性能峰值,揭示大多数常用SAE的L0设置过低。
Time-Aware One Step Diffusion Network for Real-World Image Super-Resolution
Authors: Tainyi Zhang, Zheng-Peng Duan, Peng-Tao Jiang, Bo Li, Ming-Ming Cheng, Chun-Le Guo, Chongyi Li
First: 2025-08-22T17:23:49+00:00 · Latest: 2025-08-22T17:23:49+00:00
Abstract
Diffusion-based real-world image super-resolution (Real-ISR) methods have
demonstrated impressive performance. To achieve efficient Real-ISR, many works
employ Variational Score Distillation (VSD) to distill a pre-trained
Stable Diffusion (SD) model for one-step SR with a fixed timestep. However, SD
exhibits different generative priors at different noise-injection timesteps.
A fixed timestep therefore makes it difficult for these methods
to fully leverage the generative priors in SD, leading to suboptimal
performance. To address this, we propose a Time-Aware one-step Diffusion
Network for Real-ISR (TADSR). We first introduce a Time-Aware VAE Encoder,
which projects the same image into different latent features based on
timesteps. Through joint dynamic variation of timesteps and latent features,
the student model can better align with the input pattern distribution of the
pre-trained SD, thereby enabling more effective utilization of SD's generative
capabilities. To better activate the generative prior of SD at different
timesteps, we propose a Time-Aware VSD loss that bridges the timesteps of the
student model and those of the teacher model, thereby producing more consistent
generative prior guidance conditioned on timesteps. Additionally, by
utilizing the generative prior in SD at different timesteps, our method can
naturally achieve controllable trade-offs between fidelity and realism by
changing the timestep condition. Experimental results demonstrate that our
method achieves both state-of-the-art performance and controllable SR results
with only a single step.
中文标题/摘要
标题:面向真实世界图像超分辨率的时序感知单步扩散网络
基于扩散模型的真实图像超分辨率(Real-ISR)方法已展现出卓越性能。为实现高效Real-ISR,许多研究采用变分分数蒸馏(VSD)技术,以固定时间步长蒸馏预训练稳定扩散(SD)模型实现单步超分。但由于不同噪声注入时间步会导致SD生成先验的差异,固定时间步长难以充分利用SD的生成先验,导致性能次优。为此,我们提出时序感知单步扩散网络(TADSR)。首先引入时序感知VAE编码器,根据时间步将同一图像映射为不同潜在特征。通过时间步与潜在特征的联合动态变化,学生模型能更好对齐预训练SD的输入模式分布,从而更有效利用其生成能力。为在不同时间步激活SD生成先验,提出时序感知VSD损失函数,桥接学生模型与教师模型的时间步,产生更符合时间步条件的生成先验指导。此外,通过利用SD在不同时间步的生成先验,本方法可通过改变时间步条件自然实现保真度与真实感的可控权衡。实验结果表明,我们的方法仅需单步即可同时实现最先进性能和可控超分结果。
Summary / 总结
Motivated by the suboptimal performance of existing one-step diffusion-based real-world image super-resolution methods that use a fixed timestep, which fails to fully leverage the varying generative priors of Stable Diffusion across different noise injection timesteps, this work proposes a Time-Aware one-step Diffusion Network (TADSR). The method introduces a Time-Aware VAE Encoder that projects images into different latent features based on timesteps, enabling better alignment with the pre-trained SD's input distribution, and a Time-Aware VSD loss that bridges student and teacher model timesteps for more consistent generative prior guidance. Experimental results show that TADSR achieves state-of-the-art performance with a single inference step and enables controllable trade-offs between fidelity and realism by adjusting the timestep condition.
本研究针对现有基于扩散的真实图像超分辨率方法在变分分数蒸馏中使用固定时间步的局限性,该限制无法充分利用预训练稳定扩散模型在不同噪声注入时间步的不同生成先验。提出的时间感知一步扩散网络(TADSR)引入了时间感知VAE编码器,根据时间步将图像投影到不同的潜在特征,并设计了时间感知VSD损失来对齐师生模型的时间步以更好地激活生成先验。实验结果表明,TADSR仅需单步推理即可实现最先进的性能,同时通过时间步调整实现保真度与真实感之间的可控权衡。
Transfer Learning via Lexical Relatedness: A Sarcasm and Hate Speech Case Study
Authors: Angelly Cabrera, Linus Lei, Antonio Ortega
First: 2025-08-22T17:23:08+00:00 · Latest: 2025-08-22T17:23:08+00:00
Abstract
Detecting hate speech in non-direct forms, such as irony, sarcasm, and
innuendos, remains a persistent challenge for social networks. Although sarcasm
and hate speech are regarded as distinct expressions, our work explores whether
integrating sarcasm as a pre-training step improves implicit hate speech
detection and, by extension, explicit hate speech detection. Incorporating
samples from ETHOS, Sarcasm on Reddit, and Implicit Hate Corpus, we devised two
training strategies to compare the effectiveness of sarcasm pre-training on a
CNN+LSTM and BERT+BiLSTM model. The first strategy is a single-step training
approach, where a model trained only on sarcasm is then tested on hate speech.
The second strategy uses sequential transfer learning to fine-tune models for
sarcasm, implicit hate, and explicit hate. Our results show that sarcasm
pre-training improved the BERT+BiLSTM's recall by 9.7%, AUC by 7.8%, and
F1-score by 6% on ETHOS. On the Implicit Hate Corpus, precision increased by
7.8% when tested only on implicit samples. By incorporating sarcasm into the
training process, we show that models can more effectively detect both implicit
and explicit hate.
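For readers less familiar with the metrics quoted above, the following minimal sketch shows how precision, recall, and F1-score follow from confusion counts. The counts are invented for illustration and are unrelated to the paper's results; the abstract does not say whether the reported gains are absolute or relative, so no attempt is made to reproduce them.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for a binary hate-speech classifier.
precision, recall, f1 = precision_recall_f1(tp=40, fp=10, fn=20)
print(round(precision, 4), round(recall, 4), round(f1, 4))
```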
中文标题/摘要
标题:基于词汇相关性的迁移学习:讽刺与仇恨言论案例研究
检测非直接形式的仇恨言论,如反讽、讽刺和影射,仍是社交媒体面临的持续挑战。尽管讽刺与仇恨言论被视为不同的表达方式,本研究探讨将讽刺检测作为预训练步骤是否能提升隐式仇恨言论检测效果,并进而改善显式仇恨言论检测。通过整合ETHOS、Reddit讽刺语料和隐式仇恨语料库的样本,我们设计了两种训练策略来比较讽刺预训练在CNN+LSTM和BERT+BiLSTM模型上的效果。第一种是单步训练策略,即仅在讽刺数据上训练的模型直接测试仇恨言论检测;第二种采用序列迁移学习,依次对讽刺、隐式仇恨和显式仇恨进行模型微调。实验结果表明:在ETHOS数据集上,讽刺预训练使BERT+BiLSTM模型的召回率提升9.7%,AUC提高7.8%,F1分数增长6%;在隐式仇恨语料库中,仅测试隐式样本时精确度上升7.8%。通过将讽刺纳入训练过程,我们证明模型能更有效地检测隐性与显性仇恨言论。
Summary / 总结
This study addresses the challenge of detecting non-direct forms of hate speech, such as irony and sarcasm, on social media. The authors investigate whether sarcasm pre-training can enhance implicit and explicit hate speech detection by employing two training strategies: single-step training and sequential transfer learning, using CNN+LSTM and BERT+BiLSTM models. Experimental results demonstrate that sarcasm pre-training significantly improves the BERT+BiLSTM model's performance, with recall increasing by 9.7%, AUC by 7.8%, and F1-score by 6% on the ETHOS dataset, and precision rising by 7.8% on the Implicit Hate Corpus, indicating that sarcasm integration aids in more effective hate speech detection.
本研究针对社交媒体上讽刺和反语等非直接仇恨言论检测的挑战,探讨了讽刺预训练是否能提升仇恨言论检测模型的效果。作者采用两种训练策略——单步训练和顺序迁移学习,结合CNN+LSTM和BERT+BiLSTM架构,并使用ETHOS、Reddit讽刺语料和隐式仇恨语料库进行实验。结果表明,讽刺预训练显著提高了BERT+BiLSTM模型的性能,在ETHOS数据集上召回率提升9.7%、AUC提升7.8%、F1分数提升6%,而在隐式仇恨语料库上精确度增加7.8%,证实了讽刺整合有助于检测隐式和显式仇恨言论。
Machine Learning Time Propagators for Time-Dependent Density Functional Theory Simulations
Authors: Karan Shah, Attila Cangi
First: 2025-08-22T17:22:24+00:00 · Latest: 2025-08-22T17:22:24+00:00
Comments: 20 pages, 5 figures
Abstract
Time-dependent density functional theory (TDDFT) is a widely used method to
investigate electron dynamics under external time-dependent perturbations such
as laser fields. In this work, we present a novel approach to accelerate
electron dynamics simulations based on real time TDDFT using autoregressive
neural operators as time-propagators for the electron density. By leveraging
physics-informed constraints and featurization, and high-resolution training
data, our model achieves superior accuracy and computational speed compared to
traditional numerical solvers. We demonstrate the effectiveness of our model on
a class of one-dimensional diatomic molecules under the influence of a range of
laser parameters. This method has potential in enabling real-time, on-the-fly
modeling of laser-irradiated molecules and materials with varying experimental
parameters.
中文标题/摘要
标题:机器学习时间传播子用于含时密度泛函理论模拟
含时密度泛函理论(TDDFT)是研究外场时变扰动(如激光场)下电子动力学的常用方法。本研究提出一种创新方法,通过使用自回归神经算子作为电子密度的时间传播子,加速基于实时TDDFT的电子动力学模拟。通过结合物理约束、特征化处理及高分辨率训练数据,我们的模型在精度和计算速度上均优于传统数值求解器。我们在受不同激光参数影响的一维双原子分子体系上验证了模型的有效性。该方法有望实现对激光辐照分子与材料的实时动态建模,并适应变化的实验参数。
Summary / 总结
This research aims to accelerate electron dynamics simulations in time-dependent density functional theory (TDDFT), which is computationally intensive when modeling systems under time-dependent perturbations like laser fields. The method employs autoregressive neural operators as time-propagators for electron density, incorporating physics-informed constraints and featurization along with high-resolution training data. Experimental results on one-dimensional diatomic molecules subjected to various laser parameters show that the model achieves higher accuracy and computational speed compared to traditional numerical solvers, demonstrating potential for real-time modeling of laser-irradiated systems.
本研究旨在加速含时密度泛函理论(TDDFT)中计算密集的电子动力学模拟,该理论用于研究激光场等含时微扰下的系统。方法采用自回归神经算子作为电子密度的时间传播子,结合物理信息约束和高分辨率训练数据以提高精度和速度。在一维双原子分子受不同激光参数作用的实验中,该模型相比传统数值求解器展现出更高的精度和计算效率,显示出对激光辐照系统进行实时建模的潜力。
TinyML Towards Industry 4.0: Resource-Efficient Process Monitoring of a Milling Machine
Authors: Tim Langer, Matthias Widra, Volkhard Beyer
First: 2025-08-22T17:21:56+00:00 · Latest: 2025-08-22T17:21:56+00:00
Comments: 10 pages, 5 figures, 1 table
Abstract
In the context of industry 4.0, long-serving industrial machines can be
retrofitted with process monitoring capabilities for future use in a smart
factory. One possible approach is the deployment of wireless monitoring
systems, which can benefit substantially from the TinyML paradigm. This work
presents a complete TinyML flow from dataset generation, to machine learning
model development, up to implementation and evaluation of a full preprocessing
and classification pipeline on a microcontroller. After a short review on
TinyML in industrial process monitoring, the creation of the novel MillingVibes
dataset is described. The feasibility of a TinyML system for
structure-integrated process quality monitoring is demonstrated by the
development of an 8-bit-quantized convolutional neural network (CNN) model with
12.59 KiB of parameter storage. A test accuracy of 100.0% was reached at
15.4 ms inference time and 1.462 mJ per quantized CNN inference on an ARM Cortex
M4F microcontroller, serving as a reference for future TinyML process
monitoring solutions.
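The 8-bit quantization mentioned above can be sketched as follows. The abstract does not state which quantization scheme was used, so the symmetric per-tensor scheme here is an assumption, and the weights are toy values. The last lines derive the average power implied by the reported figures, assuming the energy value covers the same interval as the inference time.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to signed 8-bit integers."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map 8-bit integers back to approximate float values."""
    return [v * scale for v in q]


# Toy weights (hypothetical), not taken from the MillingVibes model.
weights = [0.50, -1.27, 0.03, 0.99]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, max_error)

# Average power implied by the reported 1.462 mJ per 15.4 ms inference.
energy_mj = 1.462
time_ms = 15.4
avg_power_mw = energy_mj / time_ms * 1000.0  # mJ / ms = W, times 1000 gives mW
print(round(avg_power_mw, 1))
```

Each quantized weight occupies one byte instead of four for a 32-bit float, which is what makes a parameter footprint in the low kilobyte range feasible on a microcontroller.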
中文标题/摘要
标题:TinyML迈向工业4.0:铣床资源高效型过程监控
在工业4.0背景下,可通过为长期服役的工业机器加装过程监控功能,使其适应智能工厂的未来需求。部署无线监控系统是一种可行方案,该方案能显著受益于TinyML范式。本研究展示了完整的TinyML流程:从数据集生成、机器学习模型开发,到在微控制器上实现并评估完整的预处理与分类流水线。在简要回顾工业过程监控中的TinyML应用后,详细介绍了新型MillingVibes数据集的创建过程。通过开发参数量存储仅12.59kiB的8位量化卷积神经网络(CNN)模型,验证了结构集成式过程质量监控的TinyML系统可行性。在ARM Cortex M4F微控制器上实现了100.0%的测试准确率,单次量化CNN推理耗时15.4毫秒,能耗1.462毫焦,为未来TinyML过程监控方案提供了参考基准。
Summary / 总结
This research aims to retrofit legacy industrial machines with process monitoring capabilities for Industry 4.0 by leveraging TinyML to enable efficient, wireless smart factory applications. The method involves generating a novel dataset (MillingVibes), developing an 8-bit quantized convolutional neural network, and implementing a full preprocessing and classification pipeline on a microcontroller. Experimental results demonstrate 100% test accuracy with an inference time of 15.4 ms and energy consumption of 1.462 mJ per inference on an ARM Cortex M4F, using a model requiring only 12.59 KiB of parameter storage.
本研究旨在通过TinyML为传统工业机械赋予过程监控能力,以支持工业4.0中的资源高效型无线智能工厂解决方案。方法包括创建新的MillingVibes数据集、开发8位量化卷积神经网络,以及在微控制器上部署完整的预处理和分类流程。实验结果表明,在ARM Cortex M4F微控制器上实现了100%的测试准确率,推理时间为15.4毫秒,每次量化推理能耗为1.462毫焦,模型参数存储仅需12.59kiB。
RL Is Neither a Panacea Nor a Mirage: Understanding Supervised vs. Reinforcement Learning Fine-Tuning for LLMs
Authors: Hangzhan Jin, Sicheng Lv, Sifan Wu, Mohammad Hamdaqa
First: 2025-08-22T17:10:37+00:00 · Latest: 2025-08-22T17:10:37+00:00
Abstract
Training large language models (LLMs) from scratch is increasingly
impractical, making post-training methods such as supervised fine-tuning (SFT)
and reinforcement-learning fine-tuning (RL-FT, e.g., PPO) central to modern
practice. Using an out-of-distribution (OOD) variant of the 24-point card game
and new spectrum-based diagnostics, we revisit how these two stages reshape
model representation and OOD performance. Our key findings are: (1) RL-FT can
restore much of the OOD performance loss from SFT (e.g., Llama-11B 8.97% to
15.38%, Qwen-7B 17.09% to 19.66%). But when SFT induces severe overfitting and
a clear distribution shift, RL-FT cannot fully recover OOD performance. (2)
Direction shifts of singular vectors matter more than singular value
magnitudes. These shifts concentrate on directions linked to the largest and
smallest singular values, leaving the bulk spectrum intact. (3) Low-rank and
shallow recovery is effective: restoring singular vector directions for the top
20% of values or first 25% of layers recovers 70-80% of OOD performance. (4)
Stronger SFT checkpoints enable better recovery by RL, while overfitted ones
resist restoration. These results reconcile prior reports of RL's superior OOD
performance: RL primarily counteracts SFT-induced directional drift rather than
finding new solutions. Our spectrum-aware analysis highlights inexpensive
recovery knobs, namely low-rank UV merging and shallow-layer resets, that
practitioners can use before costly RL fine-tuning.
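The idea of restoring singular vector directions can be illustrated with a small NumPy sketch. The abstract does not specify how low-rank UV merging is implemented, so this is only one plausible reading: keep the fine-tuned singular values, but replace the singular vectors belonging to the top fraction of singular values with those of the reference weights. The function name, the `frac` parameter, and the random matrices are assumptions for illustration.

```python
import numpy as np


def restore_top_directions(w_ref, w_ft, frac=0.2):
    """Swap the top-`frac` singular vectors of w_ft for those of w_ref.

    Singular values are kept from the fine-tuned matrix; only directions change.
    """
    u_ref, _, vt_ref = np.linalg.svd(w_ref, full_matrices=False)
    u_ft, s_ft, vt_ft = np.linalg.svd(w_ft, full_matrices=False)
    k = max(1, int(round(frac * len(s_ft))))
    u = u_ft.copy()
    vt = vt_ft.copy()
    u[:, :k] = u_ref[:, :k]    # restore left singular directions
    vt[:k, :] = vt_ref[:k, :]  # restore right singular directions
    return (u * s_ft) @ vt     # same as u @ diag(s_ft) @ vt


# Toy matrices (hypothetical): a reference layer and a drifted copy.
rng = np.random.default_rng(0)
w_ref = rng.normal(size=(8, 6))
w_ft = w_ref + 0.1 * rng.normal(size=(8, 6))

partial = restore_top_directions(w_ref, w_ft, frac=0.2)
full = restore_top_directions(w_ref, w_ft, frac=1.0)
print(partial.shape)
```

With `frac=1.0` the result has the reference directions and the fine-tuned singular values, which isolates the effect of directions from that of magnitudes.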
中文标题/摘要
标题:强化学习既非万能亦非幻影:理解监督学习与强化学习在大型语言模型微调中的作用
从头训练大型语言模型(LLMs)日益不切实际,使得监督微调(SFT)和强化学习微调(RL-FT,如PPO)成为现代实践的核心。通过采用24点卡牌游戏的分布外(OOD)变体及新型频谱诊断方法,我们重新审视了这两个阶段如何重塑模型表示与OOD性能。主要发现包括:(1)RL-FT可大幅恢复SFT造成的OOD性能损失(如Llama-11B从8.97%升至15.38%,Qwen-7B从17.09%升至19.66%),但当SFT导致严重过拟合和明显分布偏移时,RL-FT无法完全恢复;(2)奇异向量方向偏移比奇异值幅度更重要,这些偏移集中在最大和最小奇异值相关方向,而主体频谱保持稳定;(3)低秩浅层恢复有效:恢复前20%数值或前25%层的奇异向量方向可挽回70-80%的OOD性能;(4)强SFT检查点更利于RL恢复,而过拟合检查点难以修复。这些结果调和了先前关于RL优越OOD性能的报告:RL主要抵消SFT引发的方向漂移而非寻找新解决方案。我们的频谱感知分析揭示了低成本恢复手段——低秩UV合并和浅层重置,可供实践者在昂贵RL微调前采用。
Summary / 总结
This study investigates the comparative effectiveness of supervised fine-tuning (SFT) and reinforcement-learning fine-tuning (RL-FT) for adapting large language models, motivated by the impracticality of training LLMs from scratch and the need to understand how these methods affect out-of-distribution (OOD) generalization. Using an OOD variant of the 24-point card game and spectrum-based diagnostics, the authors analyze how SFT and RL-FT reshape model representations. Key findings show that RL-FT can recover much of the OOD performance loss from SFT (e.g., Llama-11B improved from 8.97% to 15.38%), but struggles when SFT causes severe overfitting; recovery is attributed to correcting directional shifts in singular vectors rather than discovering new solutions, with low-rank and shallow interventions restoring 70-80% of performance.
本研究探讨了监督微调(SFT)和强化学习微调(RL-FT)在适应大语言模型时的相对有效性,动机在于从头训练LLMs不切实际,且需理解这些方法如何影响分布外(OOD)泛化。作者使用24点纸牌游戏的OOD变体和基于频谱的诊断方法,分析SFT和RL-FT如何重塑模型表示。关键发现表明,RL-FT可恢复SFT导致的大部分OOD性能损失(如Llama-11B从8.97%提升至15.38%),但当SFT引发严重过拟合时恢复有限;恢复主要通过低秩和浅层干预实现,关注奇异向量方向而非数值大小,且初始SFT检查点越强恢复效果越好。
Parameter-Free Logit Distillation via Sorting Mechanism
Authors: Stephen Ekaputra Limantoro
First: 2025-08-22T17:09:38+00:00 · Latest: 2025-08-22T17:09:38+00:00
Comments: Accepted in IEEE Signal Processing Letters 2025
Abstract
Knowledge distillation (KD) transfers knowledge from a larger teacher model
to a smaller student model via soft labels, yielding an efficient neural
network. In general, the performance of a model is determined by accuracy,
which is measured with labels. However, existing KD approaches usually use the
teacher's original output distribution, neglecting the possibility of incorrect
predictions. This may contradict the motivation of hard-label learning through
cross-entropy loss, which may lead to sub-optimal knowledge distillation on
certain samples. To address this issue, we propose a novel logit processing
scheme via a sorting mechanism. Specifically, our method pursues two goals at
once: (1) fixing the teacher's incorrect predictions based on the labels and (2)
reordering the distribution in a natural way according to priority rank.
As an easy-to-use, plug-and-play pre-processing step, our sorting method can be
effectively applied to existing logit-based KD methods. Extensive experiments
on the CIFAR-100 and ImageNet datasets demonstrate the effectiveness of our
method.
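One way to picture a sort-based correction of teacher logits is sketched below. The abstract does not give the exact procedure, so this is an assumed reading rather than the author's algorithm: the logit values are sorted in descending order, the ground-truth class is given first priority, and the remaining classes keep their original relative order. No learnable parameter is involved, and the set of logit values is unchanged. The function name and example logits are hypothetical.

```python
def sort_correct_logits(logits, label):
    """Reassign sorted logit values so the labelled class ranks first.

    Other classes keep their original relative order. If the teacher is
    already correct, the logits are returned unchanged.
    """
    order = sorted(range(len(logits)), key=lambda c: logits[c], reverse=True)
    if order[0] == label:
        return list(logits)
    values = [logits[c] for c in order]  # logit values, descending
    priority = [label] + [c for c in order if c != label]
    fixed = [0.0] * len(logits)
    for cls, val in zip(priority, values):
        fixed[cls] = val
    return fixed


# Toy teacher logits (hypothetical): the teacher predicts class 1.
teacher_logits = [2.0, 5.0, 1.0, 3.0]
fixed_for_3 = sort_correct_logits(teacher_logits, label=3)
fixed_for_2 = sort_correct_logits(teacher_logits, label=2)
unchanged = sort_correct_logits(teacher_logits, label=1)
print(fixed_for_3, fixed_for_2, unchanged)
```

Because the output is still a plain logit vector, it could be passed to any logit-based distillation loss in place of the raw teacher output.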
中文标题/摘要
标题:基于排序机制的无参数Logit蒸馏
知识蒸馏(KD)旨在通过软标签将教师(较大)模型的知识传递给学生(较小)模型,以实现高效的神经网络。通常,模型性能由基于标签的准确率决定。然而,现有KD方法多直接采用教师模型的原始分布,忽略了错误预测的潜在影响。这可能与通过交叉熵损失进行硬标签学习的动机相矛盾,导致在某些样本上产生次优的知识蒸馏效果。为解决此问题,我们提出了一种通过排序机制的新型logit处理方案。具体而言,我们的方法具有双重目标:(1)基于标签修正教师模型的错误预测;(2)按优先级排名一次性自然重排分布。作为一种即插即用的预处理方法,我们的排序技术可有效应用于现有基于logit的KD方法。在CIFAR-100和ImageNet数据集上的大量实验证明了该方法的有效性。
Summary / 总结
This work addresses a limitation in knowledge distillation where teacher models may propagate incorrect predictions through their original soft-label distributions, potentially conflicting with cross-entropy learning objectives. The authors propose a parameter-free logit processing method that corrects teacher errors using ground truth labels and reorders the distribution via a sorting mechanism to prioritize correct class rankings. Experimental results on CIFAR-100 and ImageNet show consistent performance improvements when this method is integrated with existing logit-based distillation approaches.
本研究针对知识蒸馏中教师模型可能传播错误预测的问题,提出了一种无需参数的排序机制对数处理方案。该方法通过基于真实标签纠正教师错误预测,并按照优先级自然重排分布,可作为即插即用的预处理模块应用于现有基于logit的蒸馏方法。在CIFAR-100和ImageNet数据集上的实验表明,该方法能有效提升蒸馏性能。
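The abstract does not spell out the sorting rule, so the following is only a minimal sketch of one way the two stated goals could be met at once. It assumes (hypothetically) that the teacher's sorted logit values are reused unchanged, that the labelled class is promoted to rank 1, and that all other classes keep their original relative order. The function name `sort_correct_logits` is illustrative and not taken from the paper.

```python
def sort_correct_logits(teacher_logits, label):
    """Return teacher logits with the labelled class moved to rank 1.

    Hypothetical illustration: the multiset of logit values is preserved;
    only the assignment of values to classes changes.
    """
    n = len(teacher_logits)
    # Class indices ordered by the teacher's original preference, best first.
    order = sorted(range(n), key=lambda c: teacher_logits[c], reverse=True)
    # Promote the labelled class; the other classes keep their relative order.
    new_order = [label] + [c for c in order if c != label]
    # Logit values in descending order, handed out rank by rank.
    values = sorted(teacher_logits, reverse=True)
    corrected = [0.0] * n
    for rank, cls in enumerate(new_order):
        corrected[cls] = values[rank]
    return corrected
```

Under this assumption, a teacher output that already ranks the labelled class first is returned unchanged, so the step acts purely as pre-processing before any logit-based KD loss is computed.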
Explainable AI in Deep Learning-Based Prediction of Solar Storms
Authors: Adam O. Rawashdeh, Jason T. L. Wang, Katherine G. Herbert
First: 2025-08-22T17:09:00+00:00 · Latest: 2025-08-22T17:09:00+00:00
Comments: 6 pages, 8 figures
Abstract
A deep learning model is often considered a black-box model, as its internal
workings tend to be opaque to the user. Because of the lack of transparency, it
is challenging to understand the reasoning behind the model's predictions.
Here, we present an approach to making a deep learning-based solar storm
prediction model interpretable, where solar storms include solar flares and
coronal mass ejections (CMEs). This deep learning model, built based on a long
short-term memory (LSTM) network with an attention mechanism, aims to predict
whether an active region (AR) on the Sun's surface that produces a flare within
24 hours will also produce a CME associated with the flare. The crux of our
approach is to model data samples in an AR as time series and use the LSTM
network to capture the temporal dynamics of the data samples. To make the
model's predictions accountable and reliable, we leverage post hoc
model-agnostic techniques, which help elucidate the factors contributing to the
predicted output for an input sequence and provide insights into the model's
behavior across multiple sequences within an AR. To our knowledge, this is the
first time that interpretability has been added to an LSTM-based solar storm
prediction model.
中文标题/摘要
标题:基于深度学习的太阳风暴预测中可解释性人工智能
深度学习模型常被视为黑箱模型,因其内部机制对用户而言往往不透明。这种透明度的缺失使得理解模型预测背后的逻辑变得困难。本文提出一种方法,使基于深度学习的太阳风暴预测模型具有可解释性,其中太阳风暴包括太阳耀斑和日冕物质抛射(CMEs)。该深度学习模型基于带有注意力机制的长短期记忆(LSTM)网络构建,旨在预测太阳表面活动区(AR)在24小时内产生耀斑的同时是否会引发与之相关的CME。我们的方法核心是将AR中的数据样本建模为时间序列,并利用LSTM网络捕捉数据样本的时序动态特性。为确保模型预测的可追溯性和可靠性,我们采用事后模型无关技术,这些技术有助于阐明输入序列中对预测输出产生影响的因素,并揭示模型在AR内多个序列中的行为模式。据我们所知,这是首次在基于LSTM的太阳风暴预测模型中引入可解释性。
Summary / 总结
Deep learning models for solar storm prediction are often opaque, making it difficult to understand their decision processes. To address this, the authors add interpretability to a long short-term memory (LSTM) network with an attention mechanism that predicts whether an active region producing a solar flare within 24 hours will also produce a coronal mass ejection (CME) associated with that flare. The method treats active region data as time series to capture temporal dynamics and applies post hoc model-agnostic techniques to explain which factors contribute to the prediction for an input sequence and how the model behaves across multiple sequences within an active region. According to the authors, this is the first time interpretability has been added to an LSTM-based solar storm prediction model.
本研究针对深度学习模型在太阳风暴预测中缺乏透明度的问题,提出了一种可解释的方法。作者利用带有注意力机制的长短期记忆(LSTM)网络,预测在24小时内产生太阳耀斑的活动区是否还会引发与该耀斑相关的日冕物质抛射(CME)。该方法将活动区数据建模为时间序列,由LSTM网络捕捉时序动态,并应用事后模型无关技术解释影响预测的因素;据作者所知,这是首次为基于LSTM的太阳风暴预测模型引入可解释性。
Escaping Saddle Points via Curvature-Calibrated Perturbations: A Complete Analysis with Explicit Constants and Empirical Validation
Authors: Faruk Alpay, Hamdi Alakkad
First: 2025-08-22T17:06:28+00:00 · Latest: 2025-08-22T17:06:28+00:00
Comments: 16 pages. Perturbed gradient descent with fully explicit constants
for escaping saddle points, validated empirically
Abstract
We present a comprehensive theoretical analysis of first-order methods for
escaping strict saddle points in smooth non-convex optimization. Our main
contribution is a Perturbed Saddle-escape Descent (PSD) algorithm with fully
explicit constants and a rigorous separation between gradient-descent and
saddle-escape phases. For a function $f:\mathbb{R}^d\to\mathbb{R}$ with
$\ell$-Lipschitz gradient and $\rho$-Lipschitz Hessian, we prove that PSD finds
an $(\epsilon,\sqrt{\rho\epsilon})$-approximate second-order stationary point
with high probability using at most $O(\ell\Delta_f/\epsilon^2)$ gradient
evaluations for the descent phase plus
$O((\ell/\sqrt{\rho\epsilon})\log(d/\delta))$ evaluations per escape episode,
with at most $O(\ell\Delta_f/\epsilon^2)$ episodes needed. We validate our
theoretical predictions through extensive experiments across both synthetic
functions and practical machine learning tasks, confirming the logarithmic
dimension dependence and the predicted per-episode function decrease. We also
provide complete algorithmic specifications including a finite-difference
variant (PSD-Probe) and a stochastic extension (PSGD) with robust mini-batch
sizing.
中文标题/摘要
标题:通过曲率校准扰动逃离鞍点:含显式常数与实证验证的完整分析
本文对光滑非凸优化中逃离严格鞍点的一阶方法进行了全面理论分析。核心贡献是提出了具有完全显式常数的扰动鞍点逃离下降算法(PSD),明确区分梯度下降阶段与鞍点逃离阶段。对于梯度Lipschitz常数为ℓ、Hessian矩阵Lipschitz常数为ρ的函数f:ℝᵈ→ℝ,我们证明PSD能以高概率找到(ε,√(ρε))-近似二阶稳定点:下降阶段至多使用O(ℓΔ_f/ε²)次梯度计算,每次逃离事件需O((ℓ/√(ρε))log(d/δ))次计算,且最多需要O(ℓΔ_f/ε²)次逃离事件。通过合成函数和实际机器学习任务的广泛实验,我们验证了理论预测,确认了对数维度依赖性和预测的每次事件函数下降量。同时提供了完整算法规范,包括有限差分变体(PSD-Probe)和具有鲁棒小批量大小的随机扩展(PSGD)。
Summary / 总结
This work addresses the challenge of escaping strict saddle points in non-convex optimization, a critical issue for ensuring convergence to meaningful local minima. The authors propose the Perturbed Saddle-escape Descent (PSD) algorithm, which employs curvature-calibrated perturbations and rigorously separates gradient descent phases from saddle-escape phases with fully explicit constants. Experimental validation on synthetic functions and machine learning tasks confirms the theoretical predictions, including logarithmic dimension dependence and the expected function decrease per escape episode, while variants for finite-difference and stochastic settings are also provided.
本研究解决了非凸优化中一阶方法逃离严格鞍点的关键挑战。作者提出了扰动鞍点逃离下降(PSD)算法,该算法具有显式常数,并通过曲率校准的扰动清晰区分梯度下降和鞍点逃离阶段。理论分析证明PSD能以高概率找到近似二阶稳定点,下降阶段使用O(ℓΔ_f/ε²)次梯度计算,每次逃离事件使用O((ℓ/√(ρε))log(d/δ))次计算。在合成函数和机器学习任务上的大量实验验证了对数维度依赖性和每次事件的预测函数下降,同时提供了包括有限差分变体和随机扩展的完整算法规范。
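The separation between descent and saddle-escape phases can be illustrated with a generic perturbed gradient descent loop. This is a sketch under stated assumptions, not the paper's PSD algorithm: the toy objective, the step size, the perturbation radius, the episode length, and the decrease threshold are all arbitrary illustrative choices, and the curvature-calibrated constants from the paper are not reproduced.

```python
import math
import random


def f(p):
    """Toy objective: strict saddle at (0, 0), minima at (0, 1) and (0, -1)."""
    x, y = p
    return 0.5 * x * x - 0.5 * y * y + 0.25 * y ** 4


def grad_f(p):
    x, y = p
    return (x, -y + y ** 3)


def gd_step(p, step):
    g = grad_f(p)
    return (p[0] - step * g[0], p[1] - step * g[1])


def perturbed_descent(p, step=0.1, eps=1e-3, radius=1e-2,
                      escape_steps=200, min_decrease=1e-4,
                      max_iters=5000, seed=0):
    """Gradient descent with a perturbation episode whenever the gradient is small."""
    rng = random.Random(seed)
    iters = 0
    while iters < max_iters:
        if math.hypot(*grad_f(p)) > eps:
            # Descent phase: plain gradient step while the gradient is large.
            p = gd_step(p, step)
            iters += 1
            continue
        # Escape episode: perturb uniformly inside a small disk, then descend.
        r = radius * math.sqrt(rng.random())
        theta = 2.0 * math.pi * rng.random()
        q = (p[0] + r * math.cos(theta), p[1] + r * math.sin(theta))
        for _ in range(escape_steps):
            q = gd_step(q, step)
        iters += escape_steps
        if f(q) < f(p) - min_decrease:
            p = q      # sufficient decrease: the episode left a saddle region
        else:
            return p   # no decrease: treat p as an approximate local minimum
    return p
```

Started from `(1.0, 0.0)`, plain gradient descent on this toy function stays on the line `y = 0` and stalls at the saddle, which is what setting `radius=0.0` reproduces; with a nonzero radius the iterate is expected to reach one of the two minima near `y = 1` or `y = -1`.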
Quality control in sublinear time: a case study via random graphs
Authors: Cassandra Marcussen, Ronitt Rubinfeld, Madhu Sudan
First: 2025-08-22T16:54:18+00:00 · Latest: 2025-08-22T16:54:18+00:00
Comments: 70 pages
Abstract
Many algorithms are designed to work well on average over inputs. When
running such an algorithm on an arbitrary input, we must ask: Can we trust the
algorithm on this input? We identify a new class of algorithmic problems
addressing this, which we call "Quality Control Problems." These problems are
specified by a (positive, real-valued) "quality function" $\rho$ and a
distribution $D$ such that, with high probability, a sample drawn from $D$ is
"high quality," meaning its $\rho$-value is near $1$. The goal is to accept
inputs $x \sim D$ and reject potentially adversarially generated inputs $x$
with $\rho(x)$ far from $1$. The objective of quality control is thus weaker
than either component problem: testing for "$\rho(x) \approx 1$" or testing if
$x \sim D$, and offers the possibility of more efficient algorithms.
In this work, we consider the sublinear version of the quality control
problem, where $D \in \Delta(\{0,1\}^N)$ and the goal is to solve the $(D
,\rho)$-quality problem with $o(N)$ queries and time. As a case study, we
consider random graphs, i.e., $D = G_{n,p}$ (and $N = \binom{n}2$), and the
$k$-clique count function $\rho_k := C_k(G)/\mathbb{E}_{G' \sim
G_{n,p}}[C_k(G')]$, where $C_k(G)$ is the number of $k$-cliques in $G$. Testing
if $G \sim G_{n,p}$ with one sample, let alone with sublinear query access to
the sample, is of course impossible. Testing if $\rho_k(G)\approx 1$ requires
$p^{-\Omega(k^2)}$ samples. In contrast, we show that the quality control
problem for $G_{n,p}$ (with $n \geq p^{-ck}$ for some constant $c$) with
respect to $\rho_k$ can be tested with $p^{-O(k)}$ queries and time, showing
quality control is provably superpolynomially more efficient in this setting.
More generally, for a motif $H$ of maximum degree $\Delta(H)$, the respective
quality control problem can be solved with $p^{-O(\Delta(H))}$ queries and
running time.
中文标题/摘要
标题:亚线性时间中的质量控制:随机图案例研究
许多算法被设计为在输入上平均表现良好。当在任意输入上运行此类算法时,我们必须问:我们能信任该算法在此输入上的表现吗?我们识别出一类新的算法问题来解决这一点,称之为“质量控制问题”。这些问题由一个(正的、实值的)“质量函数”$\rho$和一个分布$D$指定,使得从$D$中抽取的样本以高概率是“高质量的”,即其$\rho$值接近1。目标是接受输入$x \sim D$并拒绝可能由对抗生成的、$\rho(x)$远离1的输入$x$。因此,质量控制的目标弱于任一组件问题:测试“$\rho(x) \approx 1$”或测试$x \sim D$,并提供了更高效算法的可能性。
在这项工作中,我们考虑质量控制问题的亚线性版本,其中$D \in \Delta(\{0,1\}^N)$,目标是以$o(N)$查询和时间解决$(D,\rho)$-质量问题。作为案例研究,我们考虑随机图,即$D = G_{n,p}$(且$N = \binom{n}2$)和$k$-团计数函数$\rho_k := C_k(G)/\mathbb{E}_{G' \sim G_{n,p}}[C_k(G')]$,其中$C_k(G)$是$G$中$k$-团的数量。用一个样本测试$G \sim G_{n,p}$,更不用说对样本进行亚线性查询访问,当然是不可能的。测试$\rho_k(G)\approx 1$需要$p^{-\Omega(k^2)}$样本。相比之下,我们展示了对于$G_{n,p}$(其中$n \geq p^{-ck}$,$c$为某常数)关于$\rho_k$的质量控制问题可以用$p^{-O(k)}$查询和时间进行测试,表明在此设置中质量控制被证明是超多项式更高效的。更一般地,对于最大度为$\Delta(H)$的模体$H$,相应的质量控制问题可以用$p^{-O(\Delta(H))}$查询和运行时间解决。
Summary / 总结
The research is motivated by the need to trust algorithms designed for average-case performance when applied to arbitrary inputs, leading to the introduction of Quality Control Problems. The method involves defining a quality function ρ and a distribution D, then developing sublinear-time algorithms that accept inputs drawn from D and reject potentially adversarial inputs whose ρ-value is far from 1. For random graphs G_{n,p} with the k-clique count quality function, the authors prove that quality control can be solved with p^{-O(k)} queries and time, whereas testing ρ_k(G) ≈ 1 requires p^{-Ω(k²)} samples, a superpolynomial gap.
本研究引入质量控制问题,其动机在于当算法仅在平均情况下表现良好时,需要确保其在任意输入上的可信度。方法涉及定义质量函数ρ和分布D,并设计亚线性时间算法来接受来自D的输入,同时拒绝ρ值远离1的潜在对抗性输入。针对随机图和团计数质量函数的案例研究从理论上证明,质量控制可通过p^{-O(k)}次查询和时间实现,相比测试ρ_k(G)≈1所需的p^{-Ω(k²)}个样本,在效率上实现了超多项式提升。
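To make the quality function concrete, the sketch below evaluates $\rho_k$ directly from its definition, using the standard fact that the expected number of $k$-cliques in $G_{n,p}$ is $\binom{n}{k} p^{\binom{k}{2}}$. It enumerates every $k$-subset of vertices, so it is a brute-force reference and explicitly not the sublinear-query algorithm from the paper; the function names are illustrative.

```python
from itertools import combinations
from math import comb


def count_k_cliques(n, edges, k):
    """Count k-cliques in a graph on vertices 0..n-1 by exhaustive search."""
    adjacent = {frozenset(e) for e in edges}
    return sum(
        all(frozenset(pair) in adjacent for pair in combinations(nodes, 2))
        for nodes in combinations(range(n), k)
    )


def expected_k_cliques(n, p, k):
    """Expected k-clique count in G(n, p): C(n, k) * p^C(k, 2)."""
    return comb(n, k) * p ** comb(k, 2)


def rho_k(n, edges, p, k):
    """Quality function: observed k-clique count over its expectation under G(n, p)."""
    return count_k_cliques(n, edges, k) / expected_k_cliques(n, p, k)
```

A graph is "high quality" in the abstract's sense when `rho_k` is near 1; the point of the paper is that accepting typical samples and rejecting inputs with `rho_k` far from 1 can be done without reading the whole graph, which this reference implementation does not attempt.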
Towards Open World Detection: A Survey
Authors: Andrei-Stefan Bulzan, Cosmin Cernazanu-Glavan
First: 2025-08-22T16:49:52+00:00 · Latest: 2025-08-22T16:49:52+00:00
Comments: 30 pages
Abstract
For decades, Computer Vision has aimed at enabling machines to perceive the
external world. Initial limitations led to the development of highly
specialized niches. As success in each task accrued and research progressed,
increasingly complex perception tasks emerged. This survey charts the
convergence of these tasks and, in doing so, introduces Open World Detection
(OWD), an umbrella term we propose to unify class-agnostic and generally
applicable detection models in the vision domain. We start from the history of
foundational vision subdomains and cover key concepts, methodologies and
datasets making up today's state-of-the-art landscape. This traverses topics
starting from early saliency detection, foreground/background separation, and
out-of-distribution detection, and leading up to open world object detection,
zero-shot detection and Vision Large Language Models (VLLMs). We explore the
overlap between these subdomains, their increasing convergence, and their
potential to unify into a single domain in the future: perception.
中文标题/摘要
标题:迈向开放世界检测:一项综述
数十年来,计算机视觉致力于让机器感知外部世界。最初的局限性催生了高度专业化的细分领域。随着各任务成果的积累与研究进展,日益复杂的感知任务应运而生。本综述描绘了这些任务的融合趋势,并由此提出“开放世界检测”(OWD)这一统称,用以整合视觉领域中类别无关且普遍适用的检测模型。我们从基础视觉子领域的历史出发,涵盖构成当今前沿格局的关键概念、方法论及数据集。内容横跨早期显著性检测、前景/背景分离、分布外检测,直至开放世界目标检测、零样本检测及视觉大语言模型(VLLMs)。我们探讨这些子领域间的重叠性、日益增强的融合趋势,以及它们未来统一为单一感知领域的潜力。
Summary / 总结
This survey is motivated by the historical specialization in computer vision and the emergence of increasingly complex perception tasks, aiming to unify diverse detection approaches under the proposed concept of Open World Detection (OWD). The method involves a comprehensive review of foundational vision subdomains, key concepts, methodologies, and datasets, covering topics from saliency detection and foreground/background separation to out-of-distribution detection, open world object detection, zero-shot detection, and Vision Large Language Models (VLLMs). The main findings highlight the convergence of these subdomains, their overlaps, and their potential to unify into a singular perception domain in the future.
本综述的动机源于计算机视觉领域长期以来的专业化发展导致子领域碎片化。作者提出开放世界检测(OWD)作为类别无关和通用检测模型的统称,综述了关键概念、方法和数据集,涵盖显著性检测、前景/背景分离、分布外检测、开放世界目标检测、零样本检测和视觉大语言模型。综述的主要结论是,这些子领域正日益融合,并有望在未来统一为单一的感知领域。
Guiding Diffusion Models with Reinforcement Learning for Stable Molecule Generation
Authors: Zhijian Zhou, Junyi An, Zongkai Liu, Yunfei Shi, Xuan Zhang, Fenglei Cao, Chao Qu, Yuan Qi
First: 2025-08-22T16:44:55+00:00 · Latest: 2025-08-22T16:44:55+00:00
Abstract
Generating physically realistic 3D molecular structures remains a core
challenge in molecular generative modeling. While diffusion models equipped
with equivariant neural networks have made progress in capturing molecular
geometries, they often struggle to produce equilibrium structures that adhere
to physical principles such as force field consistency. To bridge this gap, we
propose Reinforcement Learning with Physical Feedback (RLPF), a novel framework
that extends Denoising Diffusion Policy Optimization to 3D molecular
generation. RLPF formulates the task as a Markov decision process and applies
proximal policy optimization to fine-tune equivariant diffusion models.
Crucially, RLPF introduces reward functions derived from force-field
evaluations, providing direct physical feedback to guide the generation toward
energetically stable and physically meaningful structures. Experiments on the
QM9 and GEOM-drug datasets demonstrate that RLPF significantly improves
molecular stability compared to existing methods. These results highlight the
value of incorporating physics-based feedback into generative modeling. The
code is available at: https://github.com/ZhijianZhou/RLPF/tree/verl_diffusion.
中文标题/摘要
标题:利用强化学习引导扩散模型实现稳定分子生成
生成物理真实的3D分子结构仍是分子生成建模的核心挑战。尽管配备等变神经网络的扩散模型在捕捉分子几何结构方面取得进展,但它们往往难以产生符合力场一致性等物理原理的平衡结构。为弥合这一差距,我们提出了物理反馈强化学习(RLPF),这是一种将去噪扩散策略优化扩展至3D分子生成的新框架。RLPF将任务构建为马尔可夫决策过程,并应用近端策略优化微调等变扩散模型。关键创新在于引入基于力场评估的奖励函数,通过直接物理反馈引导生成能量稳定且物理意义明确的结构。在QM9和GEOM-drug数据集上的实验表明,RLPF相比现有方法显著提升了分子稳定性。这些结果凸显了将基于物理的反馈融入生成建模的价值。代码发布于:https://github.com/ZhijianZhou/RLPF/tree/verl_diffusion。
Summary / 总结
This research addresses the challenge of generating physically realistic 3D molecular structures, as existing diffusion models often fail to produce equilibrium structures consistent with physical principles like force fields. The authors propose Reinforcement Learning with Physical Feedback (RLPF), a framework that formulates molecular generation as a Markov decision process and uses proximal policy optimization to fine-tune equivariant diffusion models, incorporating force-field evaluations as reward functions to guide generation. Experimental results on QM9 and GEOM-drug datasets show that RLPF significantly enhances molecular stability compared to prior methods, demonstrating the effectiveness of physics-based feedback in generative modeling.
生成物理上真实的三维分子结构是分子生成建模的核心挑战。虽然配备等变神经网络的扩散模型在捕捉分子几何结构方面取得了进展,但它们往往难以产生符合力场一致性等物理原理的平衡结构。为解决这一问题,作者提出了强化学习与物理反馈(RLPF)框架,将去噪扩散策略优化扩展到三维分子生成。RLPF将任务表述为马尔可夫决策过程,并使用近端策略优化来微调等变扩散模型,引入基于力场的奖励函数以引导生成能量稳定且物理意义明确的结构。在QM9和GEOM-drug数据集上的实验表明,RLPF相比现有方法显著提高了分子稳定性,凸显了基于物理的反馈在生成建模中的价值。
FLAMES: Improving LLM Math Reasoning via a Fine-Grained Analysis of the Data Synthesis Pipeline
Authors: Parker Seegmiller, Kartik Mehta, Soumya Saha, Chenyang Tao, Shereen Oraby, Arpit Gupta, Tagyoung Chung, Mohit Bansal, Nanyun Peng
Venue: EMNLP 2025
First: 2025-08-22T16:37:40+00:00 · Latest: 2025-08-22T16:37:40+00:00
Comments: To appear at EMNLP 2025
Abstract
Recent works improving LLM math reasoning with synthetic data have used
unique setups, making comparison of data synthesis strategies impractical. This
leaves many unanswered questions about the roles of different factors in the
synthetic data pipeline, such as the impact of filtering low-quality problems.
To address this gap, we introduce FLAMES, a Framework for LLM Assessment of
Math rEasoning Data Synthesis, and perform a systematic study of 10 existing
data synthesis strategies and multiple other factors impacting the performance
of synthetic math reasoning data. Our FLAMES experiments provide several
valuable insights about the optimal balance of difficulty and diversity of
synthetic data. First, data agents designed to increase problem complexity lead
to the best improvements on most math metrics. Second, with a fixed data generation
budget, keeping higher problem coverage is more important than keeping only
problems with reliable solutions. Third, GSM8K- and MATH-based synthetic data
can lead to improvements on competition-level benchmarks, showcasing
easy-to-hard generalization. Leveraging insights from our FLAMES experiments,
we design two novel data synthesis strategies for improving out-of-domain
generalization and robustness. Further, we develop the FLAMES dataset, an
effective blend of our novel and existing data synthesis strategies,
outperforming public datasets on OlympiadBench (+15.7), CollegeMath (+4.5),
GSMPlus (+6.5), and MATH (+3.1). Fine-tuning Qwen2.5-Math-7B on the FLAMES
dataset achieves 81.4% on MATH, surpassing larger Llama3 405B, GPT-4o and
Claude 3.5 Sonnet.
中文标题/摘要
标题:FLAMES:通过数据合成流程的细粒度分析提升大语言模型数学推理能力
近期利用合成数据改进大语言模型数学推理的研究采用各异实验设置,导致数据合成策略难以直接比较。这使合成数据流程中不同因素的作用存在诸多未解之谜,例如过滤低质量问题的影响。为此,我们推出FLAMES框架(大语言模型数学推理数据合成评估框架),系统研究10种现有数据合成策略及影响数学推理合成数据性能的多个因素。FLAMES实验揭示了合成数据难度与多样性的最佳平衡:首先,旨在提升问题复杂度的数据代理能最大程度改善多数数学指标;其次,在固定数据生成预算下,保持较高问题覆盖率比仅保留可靠解的问题更重要;第三,基于GSM8K和MATH的合成数据可提升竞赛级基准表现,展现由易到难的泛化能力。基于这些发现,我们设计两种新颖数据合成策略以提升域外泛化与鲁棒性,并开发FLAMES数据集——融合新策略与现有策略的有效组合,在OlympiadBench(+15.7)、CollegeMath(+4.5)、GSMPlus(+6.5)和MATH(+3.1)上超越公开数据集。使用FLAMES数据集微调Qwen2.5-Math-7B后,MATH准确率达81.4%,超越更大规模的Llama3 405B、GPT-4o和Claude 3.5 Sonnet。
Summary / 总结
Recent work on synthetic data for LLM math reasoning has used inconsistent experimental setups, which makes data synthesis strategies hard to compare. This work introduces FLAMES, a framework for systematically assessing math reasoning data synthesis, and uses it to study 10 existing strategies along with other factors such as filtering low-quality problems. Key findings show that data agents that increase problem complexity yield the best gains on most math metrics, that under a fixed generation budget higher problem coverage matters more than keeping only problems with reliable solutions, and that GSM8K- and MATH-based synthetic data enables easy-to-hard generalization to competition-level benchmarks. The resulting FLAMES dataset, blending two novel strategies with existing ones, outperforms public datasets on OlympiadBench (+15.7), CollegeMath (+4.5), GSMPlus (+6.5), and MATH (+3.1), and fine-tuning Qwen2.5-Math-7B on it achieves 81.4% on MATH, surpassing the larger Llama3 405B, GPT-4o, and Claude 3.5 Sonnet.
本研究针对现有数学推理数据合成策略因实验设置各异而缺乏系统比较、进而阻碍理解数据过滤等关键因素作用的问题。作者提出FLAMES框架,系统分析10种现有策略及多个流程因素,实验揭示了难度与多样性的最优平衡:增加复杂性的数据代理能带来最佳改进、固定预算下高问题覆盖率比解决方案可靠性更重要、合成数据可实现从易到难的泛化。基于这些发现,他们设计了提升泛化性和鲁棒性的新策略,并融合成FLAMES数据集,在多个数学基准上超越公开数据集,微调后的Qwen2.5-Math-7B在MATH上达到81.4%,性能超过更大模型。
Seeing Clearly, Forgetting Deeply: Revisiting Fine-Tuned Video Generators for Driving Simulation
Authors: Chun-Peng Chang, Chen-Yu Wang, Julian Schmidt, Holger Caesar, Alain Pagani
First: 2025-08-22T16:35:19+00:00 · Latest: 2025-08-22T16:35:19+00:00
Abstract
Recent advancements in video generation have substantially improved visual
quality and temporal coherence, making these models increasingly appealing for
applications such as autonomous driving, particularly in the context of driving
simulation and so-called "world models". In this work, we investigate the
effects of existing approaches to fine-tuning video generators on structured
driving datasets and uncover a potential trade-off: although visual fidelity
improves, spatial accuracy in modeling dynamic elements may degrade. We
attribute this degradation to a shift in the alignment between visual quality
and dynamic understanding objectives. In datasets with diverse scene structures
within temporal space, where objects or perspective shift in varied ways, these
objectives tend to be highly correlated. However, the very regular and repetitive
nature of driving scenes allows visual quality to improve by modeling dominant
scene motion patterns, without necessarily preserving fine-grained dynamic
behavior. As a result, fine-tuning encourages the model to prioritize
surface-level realism over dynamic accuracy. To further examine this
phenomenon, we show that simple continual learning strategies, such as replay
from diverse domains, can offer a balanced alternative by preserving spatial
accuracy while maintaining strong visual quality.
中文标题/摘要
标题:清晰视界,深度遗忘:重访用于驾驶仿真的微调视频生成器
视频生成技术的最新进展显著提升了视觉质量与时间连贯性,使这类模型在自动驾驶等应用中愈发受到青睐,尤其在驾驶仿真与所谓“世界模型”的语境下。本研究探讨了现有微调视频生成方法在结构化驾驶数据集上的影响,揭示了一个潜在权衡:尽管视觉保真度提升,但对动态元素建模的空间准确性可能下降。我们将此归因于视觉质量与动态理解目标之间对齐关系的偏移。在时间维度内具有多样化场景结构的数据集中,这些目标通常高度相关;然而驾驶场景高度规律且重复的特性,使得模型可通过主导场景运动模式提升视觉质量,却未必保留细粒度动态行为。因此微调促使模型优先考虑表层真实感而非动态准确性。为深入探究该现象,我们证明简单的持续学习策略(如跨域回放)能通过保持空间准确性同时维持强劲视觉质量,提供一种平衡的替代方案。
Summary / 总结
This study investigates the impact of fine-tuning video generation models for driving simulation, motivated by the need to balance visual fidelity with spatial accuracy in dynamic environments. The authors identify a trade-off where fine-tuning improves visual quality but degrades spatial accuracy in modeling dynamic elements, attributing this to a misalignment between visual realism and dynamic understanding objectives in repetitive driving scenes. They propose using simple continual learning strategies, such as replay from diverse domains, to mitigate this issue. Experimental results demonstrate that this approach preserves spatial accuracy while maintaining strong visual quality, offering a more balanced solution for driving simulation applications.
本研究探讨了在驾驶仿真中微调视频生成模型时的权衡问题,动机在于平衡视觉保真度与动态元素的空间准确性。作者分析了现有微调方法,并将动态理解能力的下降归因于驾驶场景的重复性,这使得模型优先学习主导运动模式而非细粒度行为。实验结果表明,采用持续学习策略(如从多样域重放数据)可以在保持高视觉质量的同时保留空间准确性,提供了一种更均衡的解决方案。
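The abstract names replay from diverse domains as a simple continual learning strategy but gives no recipe. As a rough sketch, one common form of replay mixes a fixed fraction of clips from other domains into every fine-tuning batch. The function name, the `replay_fraction` parameter, and the clip representation below are hypothetical and not taken from the paper.

```python
import random


def mixed_batch(driving_clips, replay_clips, batch_size, replay_fraction, rng):
    """Draw one fine-tuning batch that mixes in clips from other domains.

    Hypothetical illustration of replay: `replay_fraction` of the batch comes
    from the diverse-domain pool, the rest from the driving dataset.
    """
    n_replay = round(batch_size * replay_fraction)
    batch = (rng.sample(replay_clips, n_replay)
             + rng.sample(driving_clips, batch_size - n_replay))
    rng.shuffle(batch)  # avoid a fixed position for replayed clips in the batch
    return batch
```

With `batch_size=8` and `replay_fraction=0.25`, each batch would contain two clips from the diverse pool and six driving clips; how large that fraction should be in practice is an empirical question the abstract does not settle.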
ML-PWS: Estimating the Mutual Information Between Experimental Time Series Using Neural Networks
Authors: Manuel Reinhardt, Gašper Tkačik, Pieter Rein ten Wolde
First: 2025-08-22T16:33:34+00:00 · Latest: 2025-08-22T16:33:34+00:00
Comments: 9 pages, 2 figures
Abstract
The ability to quantify information transmission is crucial for the analysis
and design of natural and engineered systems. The information transmission rate
is the fundamental measure for systems with time-varying signals, yet computing
it is extremely challenging. In particular, the rate cannot be obtained
directly from experimental time-series data without approximations, because of
the high dimensionality of the signal trajectory space. Path Weight Sampling
(PWS) is a computational technique that makes it possible to obtain the
information rate exactly for any stochastic system. However, it requires a
mathematical model of the system of interest, be it described by a master
equation or a set of differential equations. Here, we present a technique that
employs Machine Learning (ML) to develop a generative model from experimental
time-series data, which is then combined with PWS to obtain the information
rate. We demonstrate the accuracy of this technique, called ML-PWS, by
comparing its results on synthetic time-series data generated from a non-linear
model against ground-truth results obtained by applying PWS directly to the
same model. We illustrate the utility of ML-PWS by applying it to neuronal
time-series data.
中文标题/摘要
标题:ML-PWS:利用神经网络估计实验时间序列间的互信息
量化信息传输能力对于分析和设计自然与工程系统至关重要。信息传输速率是时变信号系统的基本度量指标,但其计算极具挑战性。由于信号轨迹空间的高维特性,若不采用近似方法,无法直接从实验时间序列数据中获取该速率。路径权重采样(PWS)是一种计算技术,可精确获取任何随机系统的信息速率,但需要基于主方程或微分方程组构建目标系统的数学模型。本文提出一种结合机器学习(ML)的技术:首先通过实验时间序列数据构建生成模型,再与PWS结合以获取信息速率。通过对比非线性模型生成的合成时间序列数据上ML-PWS的结果与直接应用PWS于同一模型获得的基准结果,我们验证了ML-PWS技术的准确性,并通过应用于神经元时间序列数据展示了其实用价值。
Summary / 总结
The research is motivated by the challenge of quantifying information transmission rates from experimental time series, which is computationally difficult due to the high dimensionality of signal trajectories. The method, termed ML-PWS, combines machine learning to build a generative model from time-series data with Path Weight Sampling (PWS) to estimate the mutual information rate without requiring a pre-existing mathematical model. Experimental results on synthetic data from a non-linear model show that ML-PWS achieves accuracy comparable to ground-truth PWS applied directly to the model, and its utility is further demonstrated on neuronal time-series data.
本研究旨在解决从实验时间序列数据中量化信息传输速率的挑战,这对分析和设计自然与工程系统至关重要,但由于高维轨迹空间而计算困难。该方法称为ML-PWS,结合机器学习从实验数据开发生成模型,并利用路径权重采样(PWS)来估计互信息率,无需预先的数学模型。实验结果在合成非线性数据上验证了其准确性,并展示了在神经元时间序列数据上的实用性。
MuST2-Learn: Multi-view Spatial-Temporal-Type Learning for Heterogeneous Municipal Service Time Estimation
Authors: Nadia Asif, Zhiqing Hong, Shaogang Ren, Xiaonan Zhang, Xiaojun Shang, Yukun Yuan
First: 2025-08-22T16:28:57+00:00 · Latest: 2025-08-22T16:28:57+00:00
Comments: Accepted to SIGSPATIAL 2025
Abstract
Non-emergency municipal services such as city 311 systems have been widely
implemented across cities in Canada and the United States to enhance residents'
quality of life. These systems enable residents to report issues, e.g., noise
complaints, missed garbage collection, and potholes, via phone calls, mobile
applications, or webpages. However, residents are often given limited
information about when their service requests will be addressed, which can
reduce transparency, lower resident satisfaction, and increase the number of
follow-up inquiries. Predicting the service time for municipal service requests
is challenging due to several complex factors: dynamic spatial-temporal
correlations, underlying interactions among heterogeneous service request
types, and high variation in service duration even within the same request
category. In this work, we propose MuST2-Learn: a Multi-view
Spatial-Temporal-Type Learning framework designed to address the aforementioned
challenges by jointly modeling spatial, temporal, and service type dimensions.
In detail, it incorporates an inter-type encoder to capture relationships among
heterogeneous service request types and an intra-type variation encoder to
model service time variation within homogeneous types. In addition, a
spatiotemporal encoder is integrated to capture spatial and temporal
correlations in each request type. The proposed framework is evaluated with
extensive experiments using two real-world datasets. The results show that
MuST2-Learn outperforms state-of-the-art methods, reducing mean absolute error
by at least 32.5%.
中文标题/摘要
标题:MuST2-Learn:面向异构市政服务时间预估的多视角时空类型学习框架
非紧急市政服务(如城市311系统)已在加拿大和美国多个城市广泛实施,以提升居民生活质量。居民可通过电话、移动应用或网页上报问题,例如噪音投诉、垃圾未收和道路坑洼。然而,居民往往对服务请求的处理时间知之甚少,这降低了透明度,影响居民满意度,并增加后续查询次数。市政服务请求的时间预测面临多重复杂因素:动态时空相关性、异构服务请求类型间的潜在交互,以及同类请求中服务时长的高变异性。本研究提出MuST2-Learn:一种多视角时空类型学习框架,通过联合建模空间、时间和服务类型维度应对上述挑战。具体包含类型间编码器捕捉异构服务类型关系,类型内变异编码器建模同质类型服务时间变异,并集成时空编码器捕获各请求类型的时空相关性。通过两个真实数据集的广泛实验证明,该框架将平均绝对误差降低至少32.5%,优于现有先进方法。
Summary / 总结
This research addresses the challenge of predicting service completion times for non-emergency municipal requests (e.g., noise complaints, potholes) to improve transparency and resident satisfaction. The authors propose MuST2-Learn, a multi-view framework that jointly models spatial, temporal, and service-type dimensions using specialized encoders for inter-type relationships, intra-type variation, and spatiotemporal correlations. Experimental results on two real-world datasets demonstrate that the method reduces mean absolute error by at least 32.5% compared to state-of-the-art baselines.
本研究旨在提升市政311系统的透明度和居民满意度,通过准确预测服务请求的完成时间,该任务因动态时空相关性、异构请求类型间交互及同类请求内时长差异大而极具挑战。作者提出了MuST2-Learn框架,采用多视图学习方法联合建模时空和类型维度,包含类型间编码器捕获跨类型关系、类型内变异编码器处理同类时长差异,以及时空编码器学习各类型的时空相关性。在两个真实数据集上的实验表明,该方法比现有最优方法的平均绝对误差降低了至少32.5%。
On Zero-Shot Reinforcement Learning
Authors: Scott Jeen
First: 2025-08-22T16:20:49+00:00 · Latest: 2025-08-22T16:20:49+00:00
Comments: PhD thesis
Abstract
Modern reinforcement learning (RL) systems capture deep truths about general,
human problem-solving. In domains where new data can be simulated cheaply,
these systems uncover sequential decision-making policies that far exceed the
ability of any human. Society faces many problems whose solutions require this
skill, but they are often in domains where new data cannot be cheaply
simulated. In such scenarios, we can learn simulators from existing data, but
these will only ever be approximately correct, and can be pathologically
incorrect when queried outside of their training distribution. As a result, a
misalignment between the environments in which we train our agents and the
real world in which we wish to deploy our agents is inevitable. Dealing with
this misalignment is the primary concern of zero-shot reinforcement learning, a
problem setting where the agent must generalise to a new task or domain with
zero practice shots. Whilst impressive progress has been made on methods that
perform zero-shot RL in idealised settings, new work is needed if these results
are to be replicated in real-world settings. In this thesis, we argue that
doing so requires us to navigate (at least) three constraints. First, the data
quality constraint: real-world datasets are small and homogeneous. Second, the
observability constraint: states, dynamics and rewards in the real world are
often only partially observed. And third, the data availability constraint: a
priori access to data cannot always be assumed. This work proposes a suite of
methods that perform zero-shot RL subject to these constraints. In a series of
empirical studies we expose the failings of existing methods, and justify our
techniques for remedying them. We believe these designs take us a step closer
to RL methods that can be deployed to solve real-world problems.
中文标题/摘要
标题:论零样本强化学习
现代强化学习(RL)系统揭示了关于通用人类问题解决的深层原理。在能够低成本模拟新数据的领域,这些系统发现的序列决策策略远超人类能力。社会面临许多需要此类技能解决的问题,但这些领域往往无法低成本模拟新数据。在此类场景中,我们可以从现有数据学习模拟器,但这些模拟器仅能近似正确,且在训练分布之外查询时可能出现病态错误。因此,智能体训练环境与实际部署环境之间的失配不可避免。处理这种失配是零样本强化学习的核心议题——该问题设定要求智能体在零实践样本的情况下泛化至新任务或领域。尽管在理想化设置中实现零样本RL的方法已取得显著进展,但若要在现实场景中复现这些成果仍需新的研究。本论文主张需应对(至少)三重约束:其一,数据质量约束——现实数据集小而同质;其二,可观测性约束——现实中的状态、动态和奖励往往只能被部分观测;其三,数据可用性约束——不能总是假定可先验获取数据。本研究提出一套在此类约束下实现零样本RL的方法,通过系列实证研究揭示现有方法的缺陷,并论证改进技术的合理性。我们相信这些设计使可部署解决实际问题的RL方法更近一步。
Summary / 总结
This thesis addresses the challenge of zero-shot reinforcement learning, where agents must generalize to new tasks without additional practice, motivated by the misalignment between simulated training environments and real-world deployment scenarios. The proposed method involves developing a suite of techniques that account for three key constraints: limited and homogeneous data quality, partial observability of states and rewards, and uncertain data availability. Experimental results demonstrate that existing methods fail under these constraints, while the new techniques effectively mitigate these issues, advancing the feasibility of deploying RL in real-world problems.
本研究针对强化学习在现实场景中部署的挑战,即模拟数据成本高昂或不可用,导致训练环境与部署环境存在不可避免的错位。论文提出了一套零样本强化学习方法,专门解决三个约束:数据质量有限且同质、状态和奖励的部分可观测性,以及先验数据访问的不确定性。实验研究表明,现有方法在这些约束下失效,而所提出的技术有效弥补了这些不足,推动了强化学习在现实问题解决中的实用性。
Post Hoc Regression Refinement via Pairwise Rankings
Authors: Kevin Tirta Wijaya, Michael Sun, Minghao Guo, Hans-Peter Seidel, Wojciech Matusik, Vahid Babaei
First: 2025-08-22T16:17:31+00:00 · Latest: 2025-08-22T16:17:31+00:00
Abstract
Accurate prediction of continuous properties is essential to many scientific
and engineering tasks. Although deep-learning regressors excel with abundant
labels, their accuracy deteriorates in data-scarce regimes. We introduce
RankRefine, a model-agnostic, plug-and-play post hoc method that refines
regression with expert knowledge coming from pairwise rankings. Given a query
item and a small reference set with known properties, RankRefine combines the
base regressor's output with a rank-based estimate via inverse variance
weighting, requiring no retraining. In a molecular property prediction task,
RankRefine achieves up to 10% relative reduction in mean absolute error using
only 20 pairwise comparisons obtained through a general-purpose large language
model (LLM) with no finetuning. As rankings provided by human experts or
general-purpose LLMs are sufficient for improving regression across diverse
domains, RankRefine offers practicality and broad applicability, especially in
low-data settings.
中文标题/摘要
标题:基于成对排序的事后回归优化
连续属性的精确预测对众多科学与工程任务至关重要。尽管深度学习回归器在标签充足时表现卓越,但在数据稀缺场景下其准确性会下降。我们提出RankRefine——一种与模型无关、即插即用的事后优化方法,通过来自成对排序的专家知识来改进回归。给定查询项和已知属性的小型参考集,RankRefine通过逆方差加权将基础回归器输出与基于排序的估计相结合,无需重新训练。在分子属性预测任务中,仅使用通用大语言模型(LLM)未经微调生成的20组成对比较,RankRefine即可实现平均绝对误差相对降低高达10%。由于人类专家或通用LLM提供的排序足以改进跨领域回归,RankRefine具有实用性和广泛适用性,尤其在低数据环境中。
Summary / 总结
This research addresses the challenge of maintaining regression accuracy in data-scarce environments where deep learning models typically underperform. The authors propose RankRefine, a model-agnostic post hoc method that refines regression outputs by incorporating pairwise ranking knowledge from experts or general-purpose large language models without requiring retraining. Through inverse variance weighting that combines base regressor predictions with rank-based estimates, the method achieved up to 10% relative reduction in mean absolute error in molecular property prediction tasks using only 20 pairwise comparisons.
本研究针对数据稀缺环境下深度学习回归模型性能下降的问题,提出了一种模型无关的后处理方法RankRefine。该方法通过整合专家知识提供的成对排序信息,采用逆方差加权策略将基础回归器输出与基于排序的估计相结合,且无需重新训练模型。在分子属性预测任务中,仅使用通用大语言模型提供的20组成对比较,该方法就能实现平均绝对误差相对降低高达10%的改进效果。
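The inverse variance weighting step mentioned in the abstract can be illustrated with a minimal sketch. The function `inverse_variance_combine` is the standard fusion rule for two independent estimates. The helper `bracket_estimate` is a hypothetical stand-in for the rank-based estimate (it is not taken from the paper): it assumes the pairwise comparisons bracket the query between two reference values and treats the query as uniformly distributed on that interval.

```python
def inverse_variance_combine(mean_a, var_a, mean_b, var_b):
    """Fuse two independent estimates of the same quantity.

    Each estimate is weighted by the inverse of its variance, so the
    more certain estimate contributes more to the fused value.
    """
    if var_a <= 0 or var_b <= 0:
        raise ValueError("variances must be positive")
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    fused_mean = (w_a * mean_a + w_b * mean_b) / (w_a + w_b)
    fused_var = 1.0 / (w_a + w_b)
    return fused_mean, fused_var


def bracket_estimate(ref_values, query_is_greater):
    """Hypothetical rank-based estimate (an assumption, not the paper's).

    ref_values[i] is a known reference property; query_is_greater[i] is
    the pairwise ranking answer "query > reference i".
    """
    below = [v for v, g in zip(ref_values, query_is_greater) if g]
    above = [v for v, g in zip(ref_values, query_is_greater) if not g]
    if not below or not above:
        raise ValueError("query must be bracketed by the reference set")
    lower, upper = max(below), min(above)
    if lower >= upper:
        raise ValueError("comparisons are inconsistent")
    mean = 0.5 * (lower + upper)
    var = (upper - lower) ** 2 / 12.0  # variance of a uniform on [lower, upper]
    return mean, var


# Usage: refine a base prediction (made-up numbers) with four comparisons.
rank_mean, rank_var = bracket_estimate([1.0, 2.0, 3.0, 5.0],
                                       [True, True, False, False])
refined_mean, refined_var = inverse_variance_combine(3.4, 0.5,
                                                     rank_mean, rank_var)
```

The fused variance is never larger than either input variance, which is why the combination needs no retraining of the base regressor to help.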
Ensembles of Neural Surrogates for Parametric Sensitivity in Ocean Modeling
Authors: Yixuan Sun, Romain Egele, Sri Hari Krishna Narayana, Luke Van Roekel, Carmelo Gonzales, Steven Brus, Balu Nadiga, Sandeep Madireddy, Prasanna Balaprakash
First: 2025-08-22T16:12:04+00:00 · Latest: 2025-08-22T16:12:04+00:00
Comments: 12 pages, 7 figures
Abstract
Accurate simulations of the oceans are crucial in understanding the Earth
system. Despite their efficiency, simulations at lower resolutions must rely on
various uncertain parameterizations to account for unresolved processes.
However, model sensitivity to parameterizations is difficult to quantify,
making it challenging to tune these parameterizations to reproduce
observations. Deep learning surrogates have shown promise for efficient
computation of the parametric sensitivities in the form of partial derivatives,
but their reliability is difficult to evaluate without ground truth
derivatives. In this work, we leverage large-scale hyperparameter search and
ensemble learning to improve forward predictions, autoregressive rollout,
and backward adjoint sensitivity estimation. Particularly, the ensemble method
provides epistemic uncertainty of function value predictions and their
derivatives, providing improved reliability of the neural surrogates in
decision making.
中文标题/摘要
标题:海洋建模中参数敏感性的神经网络代理集成方法
精确的海洋模拟对理解地球系统至关重要。尽管低分辨率模拟效率高,但必须依赖各种不确定的参数化来处理未解析过程。然而,模型对参数化的敏感性难以量化,这使得调整参数化以复现观测数据具有挑战性。深度学习代理在通过偏导数形式高效计算参数敏感性方面展现出潜力,但缺乏真实导数时其可靠性难以评估。本研究利用大规模超参数搜索和集成学习改进前向预测、自回归推演及后向伴随敏感性估计。特别地,集成方法提供了函数值预测及其导数的认知不确定性,从而增强了神经代理在决策中的可靠性。
Summary / 总结
Accurate ocean modeling is essential for Earth system science, but low-resolution simulations rely on uncertain parameterizations that are hard to tune because model sensitivity to them is difficult to quantify. This work uses large-scale hyperparameter search and ensemble learning to build neural surrogates that improve forward predictions, autoregressive rollouts, and backward adjoint sensitivity estimation. The ensemble also provides epistemic uncertainty for both function values and their derivatives, which makes the surrogates more reliable for decision-making.
精确的海洋建模对理解地球系统至关重要,但低分辨率模拟依赖于难以量化敏感性的不确定参数化方案,阻碍了模型调优。本研究采用大规模超参数搜索和集成学习方法,开发了改进前向预测、自回归推演和伴随敏感性估计的神经代理模型。实验结果表明,集成方法为函数值及其导数提供了认知不确定性,从而增强了神经代理在决策中的可靠性。
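As a rough illustration of ensemble-based epistemic uncertainty for both predictions and parametric sensitivities, the sketch below summarizes the spread across ensemble members. It rests on assumptions: the members are toy closed-form functions standing in for trained neural surrogates, and the derivatives come from central finite differences rather than the adjoint (automatic differentiation) route the abstract describes. All names are hypothetical.

```python
import statistics


def central_difference(f, theta, h=1e-5):
    """Approximate df/dtheta with a central finite difference."""
    return (f(theta + h) - f(theta - h)) / (2.0 * h)


def ensemble_summary(members, theta):
    """Mean and spread of member predictions and of their derivatives.

    The standard deviation across members is used here as a simple
    proxy for epistemic uncertainty.
    """
    values = [m(theta) for m in members]
    grads = [central_difference(m, theta) for m in members]
    return {
        "value_mean": statistics.mean(values),
        "value_std": statistics.stdev(values),
        "grad_mean": statistics.mean(grads),
        "grad_std": statistics.stdev(grads),
    }


# Toy stand-ins for trained surrogates: each member is a * theta^2.
members = [lambda t, a=a: a * t * t for a in (1.0, 2.0, 3.0)]
summary = ensemble_summary(members, theta=1.0)
```

A large `grad_std` relative to `grad_mean` would flag a sensitivity estimate that the ensemble members disagree on, which is the kind of signal that is unavailable from a single surrogate.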
FraPPE: Fast and Efficient Preference-based Pure Exploration
Authors: Udvas Das, Apurv Shukla, Debabrota Basu
First: 2025-08-22T16:02:06+00:00 · Latest: 2025-08-22T16:02:06+00:00
Abstract
Preference-based Pure Exploration (PrePEx) aims to identify with a given
confidence level the set of Pareto optimal arms in a vector-valued (aka
multi-objective) bandit, where the reward vectors are ordered via a (given)
preference cone $\mathcal{C}$. Though PrePEx and its variants are well-studied,
there does not exist a computationally efficient algorithm that can optimally
track the existing lower bound for arbitrary preference cones. We successfully
fill this gap by efficiently solving the minimisation and maximisation problems
in the lower bound. First, we derive three structural properties of the lower
bound that yield a computationally tractable reduction of the minimisation
problem. Then, we deploy a Frank-Wolfe optimiser to accelerate the maximisation
problem in the lower bound. Together, these techniques solve the maxmin
optimisation problem in $\mathcal{O}(KL^{2})$ time for a bandit instance with
$K$ arms and $L$ dimensional reward, which is a significant acceleration over
the literature. We further prove that our proposed PrePEx algorithm, FraPPE,
asymptotically achieves the optimal sample complexity. Finally, we perform
numerical experiments across synthetic and real datasets demonstrating that
FraPPE achieves the lowest sample complexities to identify the exact Pareto set
among the existing algorithms.
中文标题/摘要
标题:FraPPE:基于偏好的快速高效纯探索
基于偏好的纯探索(PrePEx)旨在以给定置信水平识别向量值(即多目标)赌博机中的帕累托最优臂集,其中奖励向量通过(给定的)偏好锥$\mathcal{C}$排序。尽管PrePEx及其变体已得到充分研究,但尚无计算高效算法能最优追踪任意偏好锥的现有下界。我们通过高效解决下界中的最小化和最大化问题成功填补了这一空白。首先,推导出下界的三个结构特性,使最小化问题可计算地简化;随后采用Frank-Wolfe优化器加速下界中的最大化问题。这些技术共同以$\mathcal{O}(KL^{2})$时间复杂度解决了包含$K$个臂和$L$维奖励的赌博机实例的maxmin优化问题,较现有研究实现显著加速。进一步证明所提算法FraPPE渐近达到最优样本复杂度。最后通过在合成和真实数据集上的数值实验表明,FraPPE在现有算法中实现了识别精确帕累托集的最低样本复杂度。
Summary / 总结
The research addresses the computational inefficiency in Preference-based Pure Exploration (PrePEx) for multi-objective bandits, where existing methods fail to optimally track the theoretical lower bound for arbitrary preference cones. The proposed method, FraPPE, first derives three structural properties to simplify the minimisation problem and then employs a Frank-Wolfe optimiser to accelerate the maximisation problem, reducing the time complexity to O(KL²) for K arms and L-dimensional rewards. Experimental results on synthetic and real datasets show that FraPPE achieves the lowest sample complexity among existing algorithms to exactly identify the Pareto set while asymptotically matching the optimal sample complexity.
该研究针对多目标赌博机中的偏好纯探索问题,旨在解决现有方法无法高效追踪任意偏好锥下理论下界的计算瓶颈。提出的FraPPE方法首先推导了三个结构特性以简化最小化问题,随后采用Frank-Wolfe优化器加速最大化问题,将计算复杂度降至O(KL²)(K为臂数,L为奖励维度)。在合成与真实数据集上的实验表明,FraPPE实现了现有算法中最低的样本复杂度,并渐近达到精确帕累托集识别的最优样本效率。
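To make the notion of a Pareto set under a preference cone concrete, here is a minimal sketch that computes it from known mean vectors. It does not implement FraPPE or any bandit sampling. It assumes the cone is polyhedral and given in half-space form, C = {z : w·z ≥ 0 for every row w}, and that arm j dominates arm i when the difference of their means is a nonzero element of C. The function names and the half-space representation are illustrative assumptions.

```python
def in_cone(z, cone_rows):
    """True if z satisfies w . z >= 0 for every row w of the cone."""
    return all(sum(w_k * z_k for w_k, z_k in zip(w, z)) >= 0
               for w in cone_rows)


def dominates(mu_j, mu_i, cone_rows):
    """Arm j dominates arm i if mu_j - mu_i is a nonzero element of C."""
    diff = [a - b for a, b in zip(mu_j, mu_i)]
    return any(d != 0 for d in diff) and in_cone(diff, cone_rows)


def pareto_set(means, cone_rows):
    """Indices of arms that no other arm dominates."""
    return [
        i for i, mu_i in enumerate(means)
        if not any(dominates(mu_j, mu_i, cone_rows)
                   for j, mu_j in enumerate(means) if j != i)
    ]


means = [(1.0, 2.0), (2.0, 1.0), (0.0, 0.0)]

# The positive orthant recovers the usual component-wise Pareto order.
orthant = [(1.0, 0.0), (0.0, 1.0)]
standard_front = pareto_set(means, orthant)

# A wider cone makes more pairs comparable, so the Pareto set shrinks.
wider_cone = [(1.0, 0.0), (1.0, 1.0)]
smaller_front = pareto_set(means, wider_cone)
```

With the orthant, arms 0 and 1 are incomparable and both survive. With the wider cone, arm 1 dominates arm 0, so only arm 1 remains.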
Underdamped Langevin MCMC with third order convergence
Authors: Maximilian Scott, Dáire O'Kane, Andraž Jelinčič, James Foster
First: 2025-08-22T16:00:01+00:00 · Latest: 2025-08-22T16:00:01+00:00
Comments: 62 pages, 7 figures
Abstract
In this paper, we propose a new numerical method for the underdamped Langevin
diffusion (ULD) and present a non-asymptotic analysis of its sampling error in
the 2-Wasserstein distance when the $d$-dimensional target distribution
$p(x)\propto e^{-f(x)}$ is strongly log-concave and has varying degrees of
smoothness. Precisely, under the assumptions that the gradient and Hessian of
$f$ are Lipschitz continuous, our algorithm achieves a 2-Wasserstein error of
$\varepsilon$ in $\mathcal{O}(\sqrt{d}/\varepsilon)$ and
$\mathcal{O}(\sqrt{d}/\sqrt{\varepsilon})$ steps respectively. Therefore, our
algorithm has a similar complexity as other popular Langevin MCMC algorithms
under matching assumptions. However, if we additionally assume that the third
derivative of $f$ is Lipschitz continuous, then our algorithm achieves a
2-Wasserstein error of $\varepsilon$ in
$\mathcal{O}(\sqrt{d}/\varepsilon^{\frac{1}{3}})$ steps. To the best of our
knowledge, this is the first gradient-only method for ULD with third order
convergence. To support our theory, we perform Bayesian logistic regression
across a range of real-world datasets, where our algorithm achieves competitive
performance compared to an existing underdamped Langevin MCMC algorithm and the
popular No U-Turn Sampler (NUTS).
中文标题/摘要
标题:具有三阶收敛性的欠阻尼朗之万MCMC方法
本文提出了一种针对欠阻尼朗之万扩散(ULD)的新数值方法,并在目标分布$p(x)\propto e^{-f(x)}$为强对数凹且具有不同光滑度程度的$d$维情况下,对其在2-Wasserstein距离上的采样误差进行了非渐近分析。具体而言,在$f$的梯度和Hessian矩阵满足Lipschitz连续的假设下,我们的算法分别以$\mathcal{O}(\sqrt{d}/\varepsilon)$和$\mathcal{O}(\sqrt{d}/\sqrt{\varepsilon})$步数达到$\varepsilon$的2-Wasserstein误差。因此,在匹配假设下,我们的算法与其他流行的朗之万MCMC算法具有相似的复杂度。然而,若进一步假设$f$的三阶导数也满足Lipschitz连续,则算法仅需$\mathcal{O}(\sqrt{d}/\varepsilon^{\frac{1}{3}})$步即可达到相同精度。据我们所知,这是首个实现三阶收敛的纯梯度ULD方法。为验证理论,我们在多个真实数据集上进行贝叶斯逻辑回归实验,结果显示该算法与现有欠阻尼朗之万MCMC算法及主流无转弯采样器(NUTS)相比具有竞争优势。
Summary / 总结
This work aims to improve the computational efficiency of sampling from strongly log-concave distributions using underdamped Langevin MCMC. The authors propose a new numerical method for the underdamped Langevin diffusion (ULD) and give a non-asymptotic analysis of its sampling error in the 2-Wasserstein distance. The algorithm reaches an error of ε in O(√d/ε) steps when the gradient of f is Lipschitz continuous, and in O(√d/√ε) steps when the Hessian is Lipschitz continuous as well, matching the complexity of other popular Langevin MCMC algorithms under the same assumptions. If the third derivative of f is also Lipschitz continuous, it needs only O(√d/ε^(1/3)) steps, which the authors believe makes it the first gradient-only method for ULD with third-order convergence. Bayesian logistic regression experiments on real-world datasets show competitive performance against an existing underdamped Langevin MCMC algorithm and NUTS.
本研究旨在提高欠阻尼 Langevin MCMC 在从强对数凹分布中采样时的收敛速率。该方法基于梯度对欠阻尼 Langevin 扩散进行数值离散,并在梯度与 Hessian 矩阵 Lipschitz 连续的假设下,以 2-Wasserstein 距离非渐近地分析采样误差。算法在 O(√d/ε) 或 O(√d/√ε) 步数内达到 ε 误差,与现有方法相当;但若进一步假设三阶导数 Lipschitz 连续,则仅需 O(√d/ε^(1/3)) 步即可实现三阶收敛——这是首个仅使用梯度的 ULD 方法达成该结果。在贝叶斯逻辑回归实验中,该算法在真实数据集上相比现有 ULD 和 NUTS 采样器表现出有竞争力的性能。
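The three complexity bounds quoted in the abstract differ only in the exponent on ε. The short sketch below evaluates the leading-order scaling √d / ε^(1/k) for k = 1, 2, 3. It deliberately ignores the constants and logarithmic factors hidden by the O-notation, so the numbers indicate relative scaling only, not actual step counts. The function name `step_scaling` is an illustrative assumption.

```python
import math


def step_scaling(d, eps, smoothness_order):
    """Leading-order scaling sqrt(d) / eps**(1/k), constants omitted.

    smoothness_order k:
      1 -> gradient of f Lipschitz
      2 -> Hessian of f also Lipschitz
      3 -> third derivative of f also Lipschitz
    """
    if smoothness_order not in (1, 2, 3):
        raise ValueError("smoothness_order must be 1, 2 or 3")
    if d <= 0 or eps <= 0:
        raise ValueError("d and eps must be positive")
    return math.sqrt(d) / eps ** (1.0 / smoothness_order)


# Example: d = 100 dimensions, target 2-Wasserstein error eps = 1e-3.
for k in (1, 2, 3):
    print(k, round(step_scaling(100, 1e-3, k), 1))
```

For d = 100 and ε = 10⁻³ the three scalings are roughly 10⁴, 316 and 100, which shows how much each extra smoothness assumption buys as the target error shrinks.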
HAMSA: Hijacking Aligned Compact Models via Stealthy Automation
Authors: Alexey Krylov, Iskander Vagizov, Dmitrii Korzh, Maryam Douiba, Azidine Guezzaz, Vladimir Kokh, Sergey D. Erokhin, Elena V. Tutubalina, Oleg Y. Rogov
First: 2025-08-22T15:57:57+00:00 · Latest: 2025-08-22T15:57:57+00:00
Comments: 9 pages, 1 figure; article under review
Abstract
Large Language Models (LLMs), especially their compact efficiency-oriented
variants, remain susceptible to jailbreak attacks that can elicit harmful
outputs despite extensive alignment efforts. Existing adversarial prompt
generation techniques often rely on manual engineering or rudimentary
obfuscation, producing low-quality or incoherent text that is easily flagged by
perplexity-based filters. We present an automated red-teaming framework that
evolves semantically meaningful and stealthy jailbreak prompts for aligned
compact LLMs. The approach employs a multi-stage evolutionary search, where
candidate prompts are iteratively refined using a population-based strategy
augmented with temperature-controlled variability to balance exploration and
coherence preservation. This enables the systematic discovery of prompts
capable of bypassing alignment safeguards while maintaining natural language
fluency. We evaluate our method on benchmarks in English (In-The-Wild Jailbreak
Prompts on LLMs), and a newly curated Arabic one derived from In-The-Wild
Jailbreak Prompts on LLMs and annotated by native Arabic linguists, enabling
multilingual assessment.
中文标题/摘要
标题:HAMSA:通过隐蔽自动化劫持对齐的紧凑模型
大型语言模型(LLMs),尤其是其追求效率的紧凑变体,尽管经过广泛的对齐努力,仍易受越狱攻击影响,导致有害输出。现有的对抗性提示生成技术常依赖人工工程或初级混淆,产生低质量或不连贯的文本,易被基于困惑度的过滤器标记。我们提出了一种自动化红队框架,为对齐的紧凑LLMs生成语义明确且隐蔽的越狱提示。该方法采用多阶段进化搜索,通过基于种群的策略迭代优化候选提示,并辅以温度控制的变异性来平衡探索与连贯性保持。这能够系统性地发现既能绕过对齐安全措施又保持自然语言流畅性的提示。我们在英语基准(LLMs野外越狱提示)及新构建的阿拉伯语基准(源自LLMs野外越狱提示并由阿拉伯语母语语言学家标注)上评估了该方法,实现了多语言评估。
Summary / 总结
This research addresses the vulnerability of aligned compact language models to jailbreak attacks that elicit harmful outputs, noting that existing adversarial prompt generation methods often produce low-quality or easily detectable text. The authors introduce HAMSA, an automated red-teaming framework that employs a multi-stage evolutionary search with temperature-controlled variability to generate semantically meaningful and stealthy jailbreak prompts. The method is evaluated on an English benchmark (In-The-Wild Jailbreak Prompts on LLMs) and on a newly curated Arabic benchmark derived from it and annotated by native Arabic linguists, enabling multilingual assessment.
本研究针对对齐后紧凑型大语言模型仍易受越狱攻击生成有害输出的问题,指出现有对抗性提示生成方法常产生低质量或易被检测的文本。作者提出了HAMSA框架,采用多阶段进化搜索与温度控制变异相结合的方法,迭代生成语义明确且隐蔽的越狱提示。该方法在英语基准(In-The-Wild Jailbreak Prompts on LLMs)及由其衍生、经阿拉伯语母语语言学家标注的新建阿拉伯语基准上进行评估,从而支持多语言评估。
Benchmarking the Robustness of Agentic Systems to Adversarially-Induced Harms
Authors: Jonathan Nöther, Adish Singla, Goran Radanovic
First: 2025-08-22T15:53:22+00:00 · Latest: 2025-08-22T15:53:22+00:00
Comments: 52 Pages
Abstract
Ensuring the safe use of agentic systems requires a thorough understanding of
the range of malicious behaviors these systems may exhibit when under attack.
In this paper, we evaluate the robustness of LLM-based agentic systems against
attacks that aim to elicit harmful actions from agents. To this end, we propose
a novel taxonomy of harms for agentic systems and a novel benchmark, BAD-ACTS,
for studying the security of agentic systems with respect to a wide range of
harmful actions. BAD-ACTS consists of 4 implementations of agentic systems in
distinct application environments, as well as a dataset of 188 high-quality
examples of harmful actions. This enables a comprehensive study of the
robustness of agentic systems across a wide range of categories of harmful
behaviors, available tools, and inter-agent communication structures. Using
this benchmark, we analyze the robustness of agentic systems against an
attacker that controls one of the agents in the system and aims to manipulate
other agents to execute a harmful target action. Our results show that the
attack has a high success rate, demonstrating that even a single adversarial
agent within the system can have a significant impact on the security. This
attack remains effective even when agents use a simple prompting-based defense
strategy. However, we additionally propose a more effective defense based on
message monitoring. We believe that this benchmark provides a diverse testbed
for the security research of agentic systems. The benchmark can be found at
github.com/JNoether/BAD-ACTS
中文标题/摘要
标题:评估代理系统对对抗性诱导危害的鲁棒性基准研究
确保代理系统的安全使用需全面理解其在受攻击时可能表现出的恶意行为范围。本文评估了基于大语言模型的代理系统抵御旨在诱发有害行为的攻击的鲁棒性。为此,我们提出了代理系统危害的新分类法和新型基准BAD-ACTS,用于研究代理系统在广泛有害行为方面的安全性。BAD-ACTS包含4种不同应用环境中的代理系统实现及188个高质量有害行为实例数据集,支持跨多类有害行为、可用工具和代理间通信结构的全面鲁棒性研究。通过该基准,我们分析了当攻击者控制系统内单个代理并试图操纵其他代理执行有害目标行为时的系统鲁棒性。结果表明该攻击成功率较高,证明即使单个敌对代理也能对安全产生重大影响。即使代理采用基于提示的简单防御策略,该攻击依然有效。但我们进一步提出了基于消息监控的更有效防御方案。相信该基准能为代理系统安全研究提供多样化测试平台,详见github.com/JNoether/BAD-ACTS
Summary / 总结
This research addresses the need to understand and mitigate malicious behaviors in LLM-based agentic systems when subjected to adversarial attacks. The authors introduce a novel taxonomy of harms and a benchmark called BAD-ACTS, which includes four distinct agentic system implementations and a dataset of 188 harmful action examples. Experimental results demonstrate that an attacker controlling a single agent can achieve a high success rate in manipulating other agents to perform harmful actions, even when basic prompting defenses are used; however, a proposed message monitoring defense proves more effective.
本研究旨在确保智能体系统的安全部署,通过理解其在对抗攻击下诱发有害行为的脆弱性。作者提出了一种新的有害行为分类法和名为BAD-ACTS的基准测试,包含四个不同的智能体系统实现和188个有害行为示例数据集,能够全面测试系统在各种有害行为类别、工具和通信结构下的鲁棒性。实验结果表明,攻击者控制单个智能体即可高成功率地操纵其他智能体执行有害动作,即使系统采用简单的提示式防御策略仍易受攻击;但作者提出的基于消息监控的防御方法更为有效。
Disentangled Multi-modal Learning of Histology and Transcriptomics for Cancer Characterization
Authors: Yupei Zhang, Xiaofei Wang, Anran Liu, Lequan Yu, Chao Li
First: 2025-08-22T15:51:33+00:00 · Latest: 2025-08-22T15:51:33+00:00
Abstract
Histopathology remains the gold standard for cancer diagnosis and prognosis.
With the advent of transcriptome profiling, multi-modal learning combining
transcriptomics with histology offers more comprehensive information. However,
existing multi-modal approaches are challenged by intrinsic multi-modal
heterogeneity, insufficient multi-scale integration, and reliance on paired
data, restricting clinical applicability. To address these challenges, we
propose a disentangled multi-modal framework with four contributions: 1) To
mitigate multi-modal heterogeneity, we decompose WSIs and transcriptomes into
tumor and microenvironment subspaces using a disentangled multi-modal fusion
module, and introduce a confidence-guided gradient coordination strategy to
balance subspace optimization. 2) To enhance multi-scale integration, we
propose an inter-magnification gene-expression consistency strategy that aligns
transcriptomic signals across WSI magnifications. 3) To reduce dependency on
paired data, we propose a subspace knowledge distillation strategy enabling
transcriptome-agnostic inference through a WSI-only student model. 4) To
improve inference efficiency, we propose an informative token aggregation
module that suppresses WSI redundancy while preserving subspace semantics.
Extensive experiments on cancer diagnosis, prognosis, and survival prediction
demonstrate our superiority over state-of-the-art methods across multiple
settings. Code is available at
https://github.com/helenypzhang/Disentangled-Multimodal-Learning.
中文标题/摘要
标题:解耦多模态学习在组织学与转录组学中的癌症特征研究
组织病理学仍是癌症诊断与预后的金标准。随着转录组分析技术的发展,结合转录组学与组织学的多模态学习提供了更全面的信息。然而,现有多模态方法面临内在模态异质性、多尺度整合不足及对配对数据的依赖等挑战,限制了临床适用性。为此,我们提出解耦多模态框架,包含四项创新:1)通过解耦多模态融合模块将全切片图像和转录组分解为肿瘤与微环境子空间,并采用置信度引导的梯度协调策略平衡子空间优化;2)提出跨放大倍数基因表达一致性策略,实现不同分辨率下的转录组信号对齐;3)设计子空间知识蒸馏策略,使仅需全切片图像的学生模型实现不依赖转录组的推理;4)开发信息令牌聚合模块,在抑制冗余的同时保留子空间语义。在癌症诊断、预后和生存预测的大规模实验中,本方法在多种设定下均优于现有最优方法。代码详见:https://github.com/helenypzhang/Disentangled-Multimodal-Learning。
Summary / 总结
This research addresses key challenges in multi-modal cancer characterization by integrating histology and transcriptomics, motivated by the limitations of existing methods in handling multi-modal heterogeneity, insufficient multi-scale integration, and reliance on paired data. The proposed method introduces a disentangled multi-modal framework featuring tumor and microenvironment subspace decomposition, confidence-guided gradient coordination, inter-magnification gene-expression consistency, subspace knowledge distillation for transcriptome-agnostic inference, and an informative token aggregation module. Experimental results demonstrate superior performance in cancer diagnosis, prognosis, and survival prediction across multiple settings compared to state-of-the-art methods.
针对现有多模态方法在癌症表征中存在的模态异质性、多尺度整合不足以及对配对数据依赖性强的问题,本研究提出了一种解耦多模态学习框架。该方法通过解耦融合模块将全切片图像和转录组分解为肿瘤和微环境子空间,采用置信度引导的梯度协调策略平衡子空间优化,通过跨放大倍数基因表达一致性对齐转录组信号,利用子空间知识蒸馏减少对配对数据的依赖,并通过信息令牌聚合提升推理效率。在癌症诊断、预后和生存预测任务上的广泛实验表明,该方法在多种设置下均优于现有最先进方法。
LLM-as-classifier: Semi-Supervised, Iterative Framework for Hierarchical Text Classification using Large Language Models
Authors: Doohee You, Andy Parisi, Zach Vander Velden, Lara Dantas Inojosa
First: 2025-08-22T15:47:17+00:00 · Latest: 2025-08-22T15:47:17+00:00
Comments: 20 pages excluding reference list, 2 figures
Abstract
The advent of Large Language Models (LLMs) has provided unprecedented
capabilities for analyzing unstructured text data. However, deploying these
models as reliable, robust, and scalable classifiers in production environments
presents significant methodological challenges. Standard fine-tuning approaches
can be resource-intensive and often struggle with the dynamic nature of
real-world data distributions, which is common in the industry. In this paper,
we propose a comprehensive, semi-supervised framework that leverages the zero-
and few-shot capabilities of LLMs for building hierarchical text classifiers as
a solution to these industry-wide challenges. Our methodology
emphasizes an iterative, human-in-the-loop process that begins with domain
knowledge elicitation and progresses through prompt refinement, hierarchical
expansion, and multi-faceted validation. We introduce techniques for assessing
and mitigating sequence-based biases and outline a protocol for continuous
monitoring and adaptation. This framework is designed to bridge the gap between
the raw power of LLMs and the practical need for accurate, interpretable, and
maintainable classification systems in industry applications.
中文标题/摘要
标题:LLM作为分类器:利用大语言模型进行层次化文本分类的半监督迭代框架
大语言模型(LLMs)的出现为分析非结构化文本数据提供了前所未有的能力。然而,在生产环境中将这些模型部署为可靠、鲁棒且可扩展的分类器仍面临重大方法学挑战。标准的微调方法资源消耗大,且难以应对现实世界数据分布的动态特性——这在实际工业场景中十分常见。本文提出一个全面的半监督框架,利用LLMs的零样本和少样本能力构建层次化文本分类器,以应对这些行业共性挑战。我们的方法强调迭代式的人机协同流程:从领域知识抽取开始,逐步推进提示词优化、层次结构扩展和多维度验证。我们提出了评估和缓解序列偏差的技术,并制定了持续监控与自适应调整的协议。该框架旨在弥合LLMs原始能力与工业应用中对精准、可解释、可维护分类系统的实际需求之间的差距。
Summary / 总结
This research addresses the challenge of deploying large language models (LLMs) as reliable and scalable classifiers in dynamic industrial settings, where standard fine-tuning methods are often resource-intensive and struggle with shifting data distributions. The authors propose a semi-supervised framework that leverages zero- and few-shot capabilities of LLMs, incorporating an iterative, human-in-the-loop process involving domain knowledge elicitation, prompt refinement, hierarchical expansion, and multi-faceted validation. Key contributions include techniques to assess and mitigate sequence-based biases and a protocol for continuous monitoring and adaptation, aiming to bridge the gap between LLMs' raw power and the need for accurate, interpretable, and maintainable classification systems.
本研究旨在解决大型语言模型(LLM)在动态工业环境中作为可靠、可扩展分类器部署的挑战,其中标准微调方法通常资源密集且难以应对数据分布的变化。作者提出了一种半监督的迭代框架,利用LLM的零样本和少样本能力,结合人在回路的流程进行领域知识提取、提示优化、层次扩展和多方面验证。该框架还给出了评估和缓解序列偏差的技术以及持续监控与自适应调整的协议,旨在弥合LLM原始能力与实际工业应用中对准确、可解释且可维护的层次文本分类系统需求之间的差距。
NOSTRA: A noise-resilient and sparse data framework for trust region based multi objective Bayesian optimization
Authors: Maryam Ghasemzadeh, Anton van Beek
First: 2025-08-22T15:43:01+00:00 · Latest: 2025-08-22T15:43:01+00:00
Abstract
Multi-objective Bayesian optimization (MOBO) struggles with sparse
(non-space-filling), scarce (limited observations) datasets affected by
experimental uncertainty, where identical inputs can yield varying outputs.
These challenges are common in physical and simulation experiments (e.g.,
randomized medical trials and molecular dynamics simulations) and are
therefore incompatible with conventional MOBO methods. As a result,
experimental resources are inefficiently allocated, leading to suboptimal
designs. To address this challenge, we introduce NOSTRA (Noisy and Sparse Data
Trust Region-based Optimization Algorithm), a novel sampling framework that
integrates prior knowledge of experimental uncertainty to construct more
accurate surrogate models while employing trust regions to focus sampling on
promising areas of the design space. By strategically leveraging prior
information and refining search regions, NOSTRA accelerates convergence to the
Pareto frontier, enhances data efficiency, and improves solution quality.
Through two test functions with varying levels of experimental uncertainty, we
demonstrate that NOSTRA outperforms existing methods in handling noisy, sparse,
and scarce data. Specifically, we illustrate that NOSTRA effectively
prioritizes regions where samples enhance the accuracy of the identified Pareto
frontier, offering a resource-efficient algorithm that is practical in
scenarios with limited experimental budgets while ensuring efficient
performance.
中文标题/摘要
标题:NOSTRA:一种基于信任域的多目标贝叶斯优化的抗噪声稀疏数据框架
多目标贝叶斯优化(MOBO)在处理受实验不确定性影响的稀疏(非空间填充)、稀缺(有限观测)数据集时面临挑战,其中相同输入可能产生不同输出。这些挑战常见于物理和模拟实验(如随机化医学试验和分子动力学模拟),因此与传统MOBO方法不兼容,导致实验资源分配低效和设计次优。为解决此问题,我们提出NOSTRA(基于噪声稀疏数据信任域的优化算法),该新颖采样框架整合实验不确定性的先验知识以构建更精确的代理模型,同时采用信任域将采样聚焦于设计空间的有前景区域。通过策略性利用先验信息和精细化搜索区域,NOSTRA加速向帕累托前沿的收敛,提升数据效率并改善解质量。通过两个具有不同实验不确定性水平的测试函数,我们证明NOSTRA在处理噪声、稀疏和稀缺数据方面优于现有方法。具体而言,我们展示NOSTRA能有效优先处理那些能提升已识别帕累托前沿精度的样本区域,提供一种在实验预算有限场景下实用且资源高效的算法,同时确保性能高效。
Summary / 总结
Multi-objective Bayesian optimization (MOBO) often fails when data is sparse, scarce, and affected by experimental noise, which is common in physical and simulation experiments. To address this, the authors propose NOSTRA, a framework that integrates prior knowledge of uncertainty into surrogate models and uses trust regions to focus sampling on promising areas. Experimental results on test functions with varying noise levels show that NOSTRA outperforms existing methods by accelerating convergence to the Pareto frontier, improving data efficiency, and enhancing solution quality in resource-limited scenarios.
多目标贝叶斯优化(MOBO)在处理稀疏、稀缺且受实验噪声影响的数据集时面临挑战,相同输入可能产生不同输出,导致物理和仿真实验中资源分配低效和设计次优。为此,作者提出了NOSTRA框架,该框架整合实验不确定性的先验知识以构建更精确的代理模型,并利用信赖域将采样集中在设计空间的有希望区域。在具有不同噪声水平的测试函数上的实验结果表明,NOSTRA优于现有方法,能加速收敛到帕累托前沿,提高数据效率,并在有限实验预算下优先优化帕累托前沿的准确性。
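The idea of using a trust region to focus sampling can be illustrated with a generic sketch. It is not NOSTRA's actual sampling rule, which the abstract does not specify. It assumes an axis-aligned box trust region of half-width `radius` around a current `center` (for instance, a point on the current Pareto estimate), clipped to the design-space bounds, with candidates drawn uniformly inside. All names and the uniform sampling choice are assumptions.

```python
import random


def sample_in_trust_region(center, radius, lower, upper, n, seed=0):
    """Draw n candidate designs uniformly from a box trust region.

    The box is center +/- radius in every dimension, clipped to the
    design-space bounds [lower, upper]. The center is assumed to lie
    inside those bounds.
    """
    rng = random.Random(seed)
    box = [
        (max(lo, c - radius), min(hi, c + radius))
        for c, lo, hi in zip(center, lower, upper)
    ]
    return [[rng.uniform(a, b) for a, b in box] for _ in range(n)]


# Usage: 50 candidates around (0.9, 0.5) in the unit square.
candidates = sample_in_trust_region(
    center=[0.9, 0.5], radius=0.2,
    lower=[0.0, 0.0], upper=[1.0, 1.0], n=50,
)
```

In a full optimizer the radius would typically shrink or grow depending on whether recent samples improved the Pareto estimate; that update rule is left out here because the abstract gives no details.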