MV-RAG: Retrieval Augmented Multiview Diffusion
Authors: Yosef Dayani, Omer Benishu, Sagie Benaim
First: 2025-08-22T17:59:40+00:00 · Latest: 2025-08-22T17:59:40+00:00
Comments: Project page: https://yosefdayani.github.io/MV-RAG
Abstract
Text-to-3D generation approaches have advanced significantly by leveraging
pretrained 2D diffusion priors, producing high-quality and 3D-consistent
outputs. However, they often fail to produce out-of-domain (OOD) or rare
concepts, yielding inconsistent or inaccurate results. To this end, we propose
MV-RAG, a novel text-to-3D pipeline that first retrieves relevant 2D images
from a large in-the-wild 2D database and then conditions a multiview diffusion
model on these images to synthesize consistent and accurate multiview outputs.
Training such a retrieval-conditioned model is achieved via a novel hybrid
strategy bridging structured multiview data and diverse 2D image collections.
This involves training on multiview data using augmented conditioning views
that simulate retrieval variance for view-specific reconstruction, alongside
training on sets of retrieved real-world 2D images using a distinctive held-out
view prediction objective: the model predicts the held-out view from the other
views to infer 3D consistency from 2D data. To facilitate a rigorous OOD
evaluation, we introduce a new collection of challenging OOD prompts.
Experiments against state-of-the-art text-to-3D, image-to-3D, and
personalization baselines show that our approach significantly improves 3D
consistency, photorealism, and text adherence for OOD/rare concepts, while
maintaining competitive performance on standard benchmarks.
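The held-out view prediction objective described above can be illustrated with a minimal sketch. It assumes a retrieved set is simply a list of image identifiers; the function name `held_out_splits` and the file names are hypothetical and do not come from the paper, and the actual conditioning mechanism and diffusion loss are omitted.

```python
def held_out_splits(retrieved_views):
    """Return (conditioning_views, target_view) pairs, one per held-out view."""
    splits = []
    for i, target in enumerate(retrieved_views):
        # Every view except the i-th one is used as conditioning input.
        conditioning = retrieved_views[:i] + retrieved_views[i + 1:]
        splits.append((conditioning, target))
    return splits


# Hypothetical retrieved images for a single prompt.
views = ["img_a.jpg", "img_b.jpg", "img_c.jpg"]
splits = held_out_splits(views)
```

Each pair would supply one training example in which the model is asked to predict the target from the remaining views.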
中文标题/摘要
标题:MV-RAG:检索增强的多视角扩散模型
文本到3D生成方法通过利用预训练的2D扩散先验取得了显著进展,能够产生高质量且3D一致的结果。然而,这些方法在处理域外(OOD)或罕见概念时往往表现不佳,导致生成结果不一致或不准确。为此,我们提出了MV-RAG——一种新颖的文本到3D流程:首先从大规模真实世界2D数据库中检索相关图像,然后基于这些图像条件化多视角扩散模型以合成一致且准确的多视角输出。通过创新性混合策略训练这种检索条件化模型,该策略桥接了结构化多视角数据与多样化2D图像集合:一方面使用增强条件视角在多视角数据上进行训练以模拟检索方差实现视角特异性重建,另一方面通过独特保留视角预测目标在检索到的真实2D图像集上训练——模型根据其他视角预测被保留视角,从而从2D数据推断3D一致性。为促进严格OOD评估,我们引入了具有挑战性的新型OOD提示集合。与最先进的文本到3D、图像到3D及个性化基线方法的对比实验表明,我们的方法显著提升了OOD/罕见概念的3D一致性、照片真实感和文本遵循度,同时在标准基准测试中保持竞争力。
Summary / 总结
Text-to-3D generation approaches have advanced significantly by leveraging pretrained 2D diffusion priors, producing high-quality and 3D-consistent outputs.
Benchmarking Training Paradigms, Dataset Composition, and Model Scaling for Child ASR in ESPnet
Authors: Anyu Ying, Natarajan Balaji Shankar, Chyi-Jiunn Lin, Mohan Shi, Pu Wang, Hye-jin Shim, Siddhant Arora, Hugo Van hamme, Abeer Alwan, Shinji Watanabe
First: 2025-08-22T17:59:35+00:00 · Latest: 2025-08-22T17:59:35+00:00
Comments: 5 pages, 3 figures, presented at WOCCI 2025 (Workshop on Child
Computer Interaction), satellite workshop of Interspeech 2025
Abstract
Despite advancements in ASR, child speech recognition remains challenging due
to acoustic variability and limited annotated data. While fine-tuning adult ASR
models on child speech is common, comparisons with flat-start training remain
underexplored. We compare flat-start training across multiple datasets, SSL
representations (WavLM, XEUS), and decoder architectures. Our results show that
SSL representations are biased toward adult speech, with flat-start training on
child speech mitigating these biases. We also analyze model scaling, finding
consistent improvements up to 1B parameters, beyond which performance plateaus.
Additionally, age-related ASR and speaker verification analysis highlights the
limitations of proprietary models like Whisper, emphasizing the need for
open-data models for reliable child speech research. All investigations are
conducted using ESPnet, and our publicly available benchmark provides insights
into training strategies for robust child speech processing.
中文标题/摘要
标题:ESPnet中儿童语音识别的训练范式、数据集构成与模型规模扩展的基准测试
尽管自动语音识别(ASR)技术有所进步,但由于声学变异性和标注数据有限,儿童语音识别仍具挑战性。虽然通常采用成人ASR模型对儿童语音进行微调,但与从零开始的平面训练方式的对比研究仍不足。我们比较了跨多个数据集的平面训练、自监督学习表示(WavLM, XEUS)及解码器架构。结果显示,SSL表示存在对成人语音的偏好,而采用儿童语音的平面训练可缓解这种偏差。模型规模分析表明,参数增至10亿时性能持续提升,之后趋于平稳。此外,基于年龄的ASR和说话人验证分析揭示了如Whisper等专有模型的局限性,强调了开放数据模型对可靠儿童语音研究的重要性。所有研究均基于ESPnet框架,公开的基准测试为鲁棒的儿童语音处理训练策略提供了见解。
Summary / 总结
Despite advancements in ASR, child speech recognition remains challenging due to acoustic variability and limited annotated data.
A Disease-Centric Vision-Language Foundation Model for Precision Oncology in Kidney Cancer
Authors: Yuhui Tao, Zhongwei Zhao, Zilong Wang, Xufang Luo, Feng Chen, Kang Wang, Chuanfu Wu, Xue Zhang, Shaoting Zhang, Jiaxi Yao, Xingwei Jin, Xinyang Jiang, Yifan Yang, Dongsheng Li, Lili Qiu, Zhiqiang Shao, Jianming Guo, Nengwang Yu, Shuo Wang, Ying Xiong
First: 2025-08-22T17:48:19+00:00 · Latest: 2025-08-22T17:48:19+00:00
Abstract
The non-invasive assessment of increasingly incidentally discovered renal
masses is a critical challenge in urologic oncology, where diagnostic
uncertainty frequently leads to the overtreatment of benign or indolent tumors.
In this study, we developed and validated RenalCLIP, a vision-language
foundation model for the characterization, diagnosis, and prognosis of renal
masses, using a dataset of 27,866 CT scans from 8,809 patients across nine
Chinese medical centers and the public TCIA cohort. The model was developed via a two-stage
pre-training strategy that first enhances the image and text encoders with
domain-specific knowledge before aligning them through a contrastive learning
objective, to create robust representations for superior generalization and
diagnostic precision. RenalCLIP achieved better performance and superior
generalizability across 10 core tasks spanning the full clinical workflow of
kidney cancer, including anatomical assessment, diagnostic classification, and
survival prediction, compared with other state-of-the-art general-purpose CT
foundation models. In particular, for complicated tasks such as recurrence-free
survival prediction in the TCIA cohort, RenalCLIP achieved a C-index of 0.726,
representing a substantial improvement of approximately 20% over the leading
baselines. Furthermore, RenalCLIP's pre-training imparted remarkable data
efficiency; in the diagnostic classification task, it needs only 20% of the
training data to match the peak performance of all baseline models, even after
they were fully fine-tuned on 100% of the data. Additionally, it achieved superior
performance in report generation, image-text retrieval and zero-shot diagnosis
tasks. Our findings establish that RenalCLIP provides a robust tool with the
potential to enhance diagnostic accuracy, refine prognostic stratification, and
personalize the management of patients with kidney cancer.
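For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index. The abstract does not say which C-index variant was used, so Harrell's definition is an assumption here, and the toy data are invented for illustration only.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when subject i had the event strictly
    before subject j's recorded time. It is concordant when the model
    assigns subject i the higher risk score. Risk ties count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy data (hypothetical): follow-up time, event indicator, predicted risk.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.3, 0.5, 0.1]
c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported 0.726.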
中文标题/摘要
标题:面向肾癌精准肿瘤学的疾病中心化视觉-语言基础模型
对日益多发的偶发性肾占位进行无创评估是泌尿肿瘤学的关键挑战,诊断不确定性常导致良性或惰性肿瘤的过度治疗。本研究利用来自中国九家医疗中心和公共TCIA队列的8,809名患者的27,866次CT扫描数据集,开发并验证了视觉-语言基础模型RenalCLIP,用于肾占位的表征、诊断和预后预测。该模型通过两阶段预训练策略开发:首先用领域知识增强图像和文本编码器,再通过对比学习目标进行对齐,以创建具有卓越泛化能力和诊断精度的鲁棒表征。与其它最先进的通用CT基础模型相比,RenalCLIP在涵盖肾癌全临床工作流的10项核心任务(包括解剖评估、诊断分类和生存预测)中表现出更优的性能和泛化能力。尤其在TCIA队列无复发生存预测这类复杂任务中,RenalCLIP取得了0.726的C指数,较领先基线提升约20%。此外,该模型的预训练展现出显著数据效率——在诊断分类任务中,仅需20%训练数据即可达到所有基线模型使用100%数据精调后的峰值性能,同时在报告生成、图文检索和零样本诊断任务中实现卓越性能。研究表明RenalCLIP为提升肾癌诊断准确性、优化预后分层和个体化治疗提供了强效工具。
Summary / 总结
The non-invasive assessment of increasingly incidentally discovered renal masses is a critical challenge in urologic oncology, where diagnostic uncertainty frequently leads to the overtreatment of benign or indolent tumors.
Closer to Reality: Practical Semi-Supervised Federated Learning for Foundation Model Adaptation
Authors: Guangyu Sun, Jingtao Li, Weiming Zhuang, Chen Chen, Chen Chen, Lingjuan Lyu
First: 2025-08-22T17:47:02+00:00 · Latest: 2025-08-22T17:47:02+00:00
Abstract
Foundation models (FMs) exhibit remarkable generalization but require
adaptation to downstream tasks, particularly in privacy-sensitive applications.
Due to data privacy regulations, cloud-based FMs cannot directly access private
edge data, limiting their adaptation. Federated learning (FL) provides a
privacy-aware alternative, but existing FL approaches overlook the constraints
imposed by edge devices -- namely, limited computational resources and the
scarcity of labeled data. To address these challenges, we introduce Practical
Semi-Supervised Federated Learning (PSSFL), where edge devices hold only
unlabeled, low-resolution data, while the server has limited labeled,
high-resolution data. In this setting, we propose the Federated Mixture of
Experts (FedMox), a novel framework that enhances FM adaptation in FL. FedMox
tackles computational and resolution mismatch challenges via a sparse
Mixture-of-Experts architecture, employing a spatial router to align features
across resolutions and a Soft-Mixture strategy to stabilize semi-supervised
learning. We take object detection as a case study, and experiments on
real-world autonomous driving datasets demonstrate that FedMox effectively
adapts FMs under PSSFL, significantly improving performance with constrained
memory costs on edge devices. Our work paves the way for scalable and
privacy-preserving FM adaptation in federated scenarios.
中文标题/摘要
标题:更贴近现实:面向基础模型适配的实用半监督联邦学习
基础模型(FMs)展现出卓越的泛化能力,但需针对下游任务进行适配,尤其在隐私敏感应用中。由于数据隐私法规,云端基础模型无法直接访问私有边缘数据,限制了其适配能力。联邦学习(FL)提供了隐私保护的替代方案,但现有方法未充分考虑边缘设备的约束——即有限的计算资源和标注数据稀缺。为解决这些挑战,我们提出实用半监督联邦学习(PSSFL),其中边缘设备仅持有未标注的低分辨率数据,而服务器拥有有限标注的高分辨率数据。在此设置下,我们提出联邦专家混合(FedMox)框架,通过稀疏专家混合架构应对计算与分辨率失配问题:采用空间路由器对齐跨分辨率特征,并通过软混合策略稳定半监督学习。以目标检测为例,在真实自动驾驶数据集上的实验表明,FedMox在PSSFL下有效适配基础模型,在边缘设备有限内存成本下显著提升性能。本研究为联邦场景中可扩展且隐私保护的基础模型适配开辟了新途径。
Summary / 总结
Foundation models (FMs) exhibit remarkable generalization but require adaptation to downstream tasks, particularly in privacy-sensitive applications.
Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders
Authors: David Chanin, Adrià Garriga-Alonso
First: 2025-08-22T17:26:33+00:00 · Latest: 2025-08-22T17:26:33+00:00
Abstract
Sparse Autoencoders (SAEs) extract features from LLM internal activations,
meant to correspond to single concepts. A core SAE training hyperparameter is
L0: how many features should fire per token on average. Existing work compares
SAE algorithms using sparsity--reconstruction tradeoff plots, implying L0 is a
free parameter with no single correct value. In this work we study the effect
of L0 on BatchTopK SAEs, and show that if L0 is not set precisely, the SAE
fails to learn the underlying features of the LLM. If L0 is too low, the SAE
will mix correlated features to improve reconstruction. If L0 is too high, the
SAE finds degenerate solutions that also mix features. Further, we demonstrate
a method to determine the correct L0 value for an SAE on a given training
distribution, which finds the true L0 in toy models and coincides with peak
sparse probing performance in LLMs. We find that most commonly used SAEs have
an L0 that is too low. Our work shows that, to train SAEs with correct
features, practitioners must set L0 correctly.
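To make the role of L0 concrete, here is a minimal pure-Python sketch of a BatchTopK-style activation, under the assumption that BatchTopK keeps the k × batch_size largest positive pre-activations across the whole batch. The function names and the toy values are hypothetical; this is an illustration of the idea, not the authors' implementation.

```python
def batch_topk(pre_acts, k):
    """Keep the k * batch_size largest positive activations across the batch."""
    budget = k * len(pre_acts)
    # Flatten to (value, row, column), discarding non-positive entries.
    flat = [
        (value, i, j)
        for i, row in enumerate(pre_acts)
        for j, value in enumerate(row)
        if value > 0
    ]
    flat.sort(key=lambda item: item[0], reverse=True)
    sparse = [[0.0] * len(row) for row in pre_acts]
    for value, i, j in flat[:budget]:
        sparse[i][j] = value
    return sparse


def mean_l0(acts):
    """Average number of non-zero features per token."""
    return sum(sum(1 for v in row if v != 0) for row in acts) / len(acts)


# Three tokens, four features each (toy values).
pre_acts = [
    [0.9, 0.1, 0.5, 0.0],
    [0.2, 0.8, 0.7, 0.6],
    [0.05, 0.0, 0.3, 0.0],
]
sparse = batch_topk(pre_acts, k=2)
per_token_l0 = [sum(1 for v in row if v != 0) for row in sparse]
```

In this toy batch the per-token counts differ (2, 3, and 1 active features), while the average L0 is pinned to k = 2, which is the quantity the abstract argues must be chosen correctly.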
中文标题/摘要
标题:稀疏但错误:错误的L0导致稀疏自编码器中的特征提取错误
稀疏自编码器(SAEs)从大型语言模型(LLM)内部激活中提取特征,这些特征本应对应单一概念。SAE训练的核心超参数L0表示每个令牌平均应激活的特征数量。现有研究通过稀疏度-重构权衡图比较SAE算法,暗示L0是可自由调节的参数。本研究探讨了L0对BatchTopK SAEs的影响,发现若L0设置不精确,SAE将无法学习LLM的底层特征:L0过低会使SAE混合相关特征以改善重构;L0过高则会导致特征混合的退化解。我们进一步提出一种确定训练分布下正确L0值的方法,该方法在玩具模型中能找到真实L0值,并与LLM峰值稀疏探测性能吻合。研究表明,常用SAEs的L0值普遍偏低,说明要获得正确特征,必须精确设置L0。
Summary / 总结
Sparse Autoencoders (SAEs) extract features from LLM internal activations, meant to correspond to single concepts.
Time-Aware One Step Diffusion Network for Real-World Image Super-Resolution
Authors: Tainyi Zhang, Zheng-Peng Duan, Peng-Tao Jiang, Bo Li, Ming-Ming Cheng, Chun-Le Guo, Chongyi Li
First: 2025-08-22T17:23:49+00:00 · Latest: 2025-08-22T17:23:49+00:00
Abstract
Diffusion-based real-world image super-resolution (Real-ISR) methods have
demonstrated impressive performance. To achieve efficient Real-ISR, many works
employ Variational Score Distillation (VSD) to distill a pre-trained
Stable Diffusion (SD) model for one-step SR with a fixed timestep. However, SD
exhibits different generative priors at different noise injection timesteps.
Therefore, a fixed timestep makes it difficult for these methods
to fully leverage the generative priors in SD, leading to suboptimal
performance. To address this, we propose a Time-Aware one-step Diffusion
Network for Real-ISR (TADSR). We first introduce a Time-Aware VAE Encoder,
which projects the same image into different latent features based on
timesteps. Through joint dynamic variation of timesteps and latent features,
the student model can better align with the input pattern distribution of the
pre-trained SD, thereby enabling more effective utilization of SD's generative
capabilities. To better activate the generative prior of SD at different
timesteps, we propose a Time-Aware VSD loss that bridges the timesteps of the
student model and those of the teacher model, thereby producing more consistent
generative prior guidance conditioned on timesteps. Additionally, by
utilizing the generative prior in SD at different timesteps, our method can
naturally achieve controllable trade-offs between fidelity and realism by
changing the timestep condition. Experimental results demonstrate that our
method achieves both state-of-the-art performance and controllable SR results
with only a single step.
中文标题/摘要
标题:面向真实世界图像超分辨率的时序感知单步扩散网络
基于扩散的真实世界图像超分辨率(Real-ISR)方法已展现出卓越性能。为实现高效Real-ISR,许多研究采用变分分数蒸馏(VSD)技术,以固定时间步长蒸馏预训练稳定扩散(SD)模型实现单步超分辨率。然而,由于不同噪声注入时间步会导致SD呈现不同的生成先验,固定时间步长难以充分利用SD的生成先验,导致性能次优。为此,我们提出时序感知单步扩散网络(TADSR)。首先引入时序感知VAE编码器,根据时间步将同一图像映射为不同潜在特征。通过时间步与潜在特征的联合动态变化,学生模型能更好对齐预训练SD的输入模式分布,从而更有效利用其生成能力。为进一步激活SD在不同时间步的生成先验,提出时序感知VSD损失函数,桥接学生模型与教师模型的时间步,产生更符合时间步条件的生成先验指导。此外,通过利用SD在不同时间步的生成先验,本方法可通过改变时间步条件自然实现保真度与真实感的可控权衡。实验结果表明,我们的方法仅需单步即可同时实现最先进性能和可控的超分辨率结果。
Summary / 总结
Diffusion-based real-world image super-resolution (Real-ISR) methods have demonstrated impressive performance.
Transfer Learning via Lexical Relatedness: A Sarcasm and Hate Speech Case Study
Authors: Angelly Cabrera, Linus Lei, Antonio Ortega
First: 2025-08-22T17:23:08+00:00 · Latest: 2025-08-22T17:23:08+00:00
Abstract
Detecting hate speech in non-direct forms, such as irony, sarcasm, and
innuendos, remains a persistent challenge for social networks. Although sarcasm
and hate speech are regarded as distinct expressions, our work explores whether
integrating sarcasm as a pre-training step improves implicit hate speech
detection and, by extension, explicit hate speech detection. Incorporating
samples from ETHOS, Sarcasm on Reddit, and Implicit Hate Corpus, we devised two
training strategies to compare the effectiveness of sarcasm pre-training on a
CNN+LSTM and BERT+BiLSTM model. The first strategy is a single-step training
approach, where a model trained only on sarcasm is then tested on hate speech.
The second strategy uses sequential transfer learning to fine-tune models for
sarcasm, implicit hate, and explicit hate. Our results show that sarcasm
pre-training improved the BERT+BiLSTM's recall by 9.7%, AUC by 7.8%, and
F1-score by 6% on ETHOS. On the Implicit Hate Corpus, precision increased by
7.8% when tested only on implicit samples. By incorporating sarcasm into the
training process, we show that models can more effectively detect both implicit
and explicit hate.
中文标题/摘要
标题:基于词汇相关性的迁移学习:讽刺与仇恨言论案例研究
检测非直接形式的仇恨言论,如反讽、讽刺和影射,仍是社交媒体面临的持续挑战。尽管讽刺与仇恨言论被视为不同的表达方式,本研究探讨将讽刺检测作为预训练步骤是否能提升隐式仇恨言论检测效果,并进而改善显式仇恨言论检测。通过整合ETHOS、Reddit讽刺语料和隐式仇恨语料库的样本,我们设计了两种训练策略来比较CNN+LSTM与BERT+BiLSTM模型中讽刺预训练的效果。第一种是单步训练策略,即仅在讽刺数据上训练的模型直接测试仇恨言论;第二种采用序列迁移学习,依次对讽刺、隐式仇恨和显式仇恨进行模型微调。实验结果表明:在ETHOS数据集上,讽刺预训练使BERT+BiLSTM模型的召回率提升9.7%,AUC提高7.8%,F1分数增长6%;在隐式仇恨语料库中,仅测试隐式样本时精确度上升7.8%。研究表明,将讽刺纳入训练过程能有效提升模型对隐性与显性仇恨言论的检测能力。
Summary / 总结
Detecting hate speech in non-direct forms, such as irony, sarcasm, and innuendos, remains a persistent challenge for social networks.
Machine Learning Time Propagators for Time-Dependent Density Functional Theory Simulations
Authors: Karan Shah, Attila Cangi
First: 2025-08-22T17:22:24+00:00 · Latest: 2025-08-22T17:22:24+00:00
Comments: 20 pages, 5 figures
Abstract
Time-dependent density functional theory (TDDFT) is a widely used method to
investigate electron dynamics under external time-dependent perturbations such
as laser fields. In this work, we present a novel approach to accelerate
electron dynamics simulations based on real-time TDDFT using autoregressive
neural operators as time-propagators for the electron density. By leveraging
physics-informed constraints and featurization, together with high-resolution training
data, our model achieves superior accuracy and computational speed compared to
traditional numerical solvers. We demonstrate the effectiveness of our model on
a class of one-dimensional diatomic molecules under the influence of a range of
laser parameters. This method has potential in enabling real-time, on-the-fly
modeling of laser-irradiated molecules and materials with varying experimental
parameters.
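The autoregressive use of a time propagator can be sketched as a simple rollout loop. Everything below is an assumption made for illustration: `toy_propagator` is a hand-written stand-in for a trained neural operator, the grid and laser field values are invented, and the only physics-style constraint shown is conservation of the total density.

```python
def rollout(propagator, density0, field, n_steps):
    """Feed each predicted density back in as the next input."""
    trajectory = [density0]
    for t in range(n_steps):
        trajectory.append(propagator(trajectory[-1], field[t]))
    return trajectory


def toy_propagator(density, field_value, mix=0.1):
    """Stand-in for a learned propagator on a periodic 1D grid."""
    total = sum(density)
    weight = mix * field_value
    # Mix each grid point with its left neighbour (index -1 wraps around).
    shifted = [
        (1.0 - weight) * density[i] + weight * density[i - 1]
        for i in range(len(density))
    ]
    # Renormalise so the total density (electron number) is conserved.
    norm = sum(shifted)
    return [value * total / norm for value in shifted]


density0 = [0.0, 0.5, 1.0, 0.5, 0.0]
field = [0.0, 0.5, 1.0, 0.5]
trajectory = rollout(toy_propagator, density0, field, n_steps=4)
```

Because each step consumes the previous prediction, errors can accumulate over long rollouts, which is one reason constraints such as density conservation are attractive in this setting.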
中文标题/摘要
标题:机器学习时间传播器在含时密度泛函理论模拟中的应用
含时密度泛函理论(TDDFT)是研究外场时变扰动(如激光场)下电子动力学的常用方法。本研究提出一种创新方法,通过自回归神经算子作为电子密度的时间传播器,加速基于实时TDDFT的电子动力学模拟。结合物理约束、特征化处理及高分辨率训练数据,该模型在精度和计算速度上均优于传统数值求解器。我们在一维双原子分子体系上验证了该方法在不同激光参数下的有效性,此技术有望实现对实验参数变化的激光辐照分子与材料进行实时动态建模。
Summary / 总结
Time-dependent density functional theory (TDDFT) is a widely used method to investigate electron dynamics under external time-dependent perturbations such as laser fields.
TinyML Towards Industry 4.0: Resource-Efficient Process Monitoring of a Milling Machine
Authors: Tim Langer, Matthias Widra, Volkhard Beyer
First: 2025-08-22T17:21:56+00:00 · Latest: 2025-08-22T17:21:56+00:00
Comments: 10 pages, 5 figures, 1 table
Abstract
In the context of Industry 4.0, long-serving industrial machines can be
retrofitted with process monitoring capabilities for future use in a smart
factory. One possible approach is the deployment of wireless monitoring
systems, which can benefit substantially from the TinyML paradigm. This work
presents a complete TinyML flow from dataset generation, to machine learning
model development, up to implementation and evaluation of a full preprocessing
and classification pipeline on a microcontroller. After a short review on
TinyML in industrial process monitoring, the creation of the novel MillingVibes
dataset is described. The feasibility of a TinyML system for
structure-integrated process quality monitoring is demonstrated by the
development of an 8-bit-quantized convolutional neural network (CNN) model with
12.59 KiB of parameter storage. A test accuracy of 100.0% was reached at
15.4 ms inference time and 1.462 mJ per quantized CNN inference on an ARM Cortex
M4F microcontroller, serving as a reference for future TinyML process
monitoring solutions.
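The reported latency and energy figures imply an average power draw and an upper bound on throughput. The short calculation below works these out, assuming the energy figure covers the same interval as the inference time and that inferences run back to back; both are assumptions, not statements from the abstract.

```python
def average_power_w(energy_j, latency_s):
    """Average power during one inference, in watts."""
    return energy_j / latency_s


def max_inference_rate_hz(latency_s):
    """Upper bound on inferences per second when run back to back."""
    return 1.0 / latency_s


ENERGY_J = 1.462e-3   # 1.462 mJ per quantized CNN inference (from the abstract)
LATENCY_S = 15.4e-3   # 15.4 ms inference time (from the abstract)

power = average_power_w(ENERGY_J, LATENCY_S)
rate = max_inference_rate_hz(LATENCY_S)
print(f"{power * 1e3:.1f} mW, {rate:.1f} inferences/s")
```

Under these assumptions the result is roughly 94.9 mW during inference and at most about 64.9 inferences per second.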
中文标题/摘要
标题:TinyML迈向工业4.0:铣床资源高效型过程监控
在工业4.0背景下,可为长期服役的工业机械加装过程监控功能,以适应未来智能工厂的应用。部署无线监控系统是一种可行方案,其能显著受益于TinyML范式。本研究展示了完整的TinyML流程,从数据集生成、机器学习模型开发,到在微控制器上实现并评估完整的预处理与分类流水线。在简要回顾工业过程监控中的TinyML应用后,详细介绍了新型MillingVibes数据集的创建过程。通过开发参数量存储仅12.59kiB的8位量化卷积神经网络(CNN)模型,验证了结构集成式过程质量监控的TinyML系统可行性。在ARM Cortex M4F微控制器上实现了100.0%的测试准确率,单次量化CNN推理耗时15.4毫秒、能耗1.462毫焦,为未来TinyML过程监控方案提供了参考基准。
Summary / 总结
In the context of Industry 4.0, long-serving industrial machines can be retrofitted with process monitoring capabilities for future use in a smart factory.
RL Is Neither a Panacea Nor a Mirage: Understanding Supervised vs. Reinforcement Learning Fine-Tuning for LLMs
Authors: Hangzhan Jin, Sicheng Lv, Sifan Wu, Mohammad Hamdaqa
First: 2025-08-22T17:10:37+00:00 · Latest: 2025-08-22T17:10:37+00:00
Abstract
Training large language models (LLMs) from scratch is increasingly
impractical, making post-training methods such as supervised fine-tuning (SFT)
and reinforcement-learning fine-tuning (RL-FT, e.g., PPO) central to modern
practice. Using an out-of-distribution (OOD) variant of the 24-point card game
and new spectrum-based diagnostics, we revisit how these two stages reshape
model representation and OOD performance. Our key findings are: (1) RL-FT can
restore much of the OOD performance loss from SFT (e.g., Llama-11B 8.97% to
15.38%, Qwen-7B 17.09% to 19.66%). But when SFT induces severe overfitting and
a clear distribution shift, RL-FT cannot fully recover OOD performance. (2)
Direction shifts of singular vectors matter more than singular value
magnitudes. These shifts concentrate on directions linked to the largest and
smallest singular values, leaving the bulk spectrum intact. (3) Low-rank and
shallow recovery is effective: restoring singular vector directions for the top
20% of values or first 25% of layers recovers 70-80% of OOD performance. (4)
Stronger SFT checkpoints enable better recovery by RL, while overfitted ones
resist restoration. These results reconcile prior reports of RL's superior OOD
performance: RL primarily counteracts SFT-induced directional drift rather than
finding new solutions. Our spectrum-aware analysis highlights inexpensive
recovery knobs, namely low-rank UV merging and shallow-layer resets, that
practitioners can use before costly RL fine-tuning.
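One way to read "restoring singular vector directions for the top 20% of values" is sketched below. This is an interpretation rather than the paper's stated procedure: the choice of reference checkpoint, the symbols, and the exact merging rule are all assumptions. Let a reference weight matrix and the drifted weight matrix have singular value decompositions with singular values sorted in decreasing order.

```latex
% Assumed notation: W^{(0)} is a reference checkpoint, W the drifted one.
W^{(0)} = \sum_{i=1}^{n} \sigma^{(0)}_i \, u^{(0)}_i \, {v^{(0)}_i}^{\top},
\qquad
W = \sum_{i=1}^{n} \sigma_i \, u_i \, v_i^{\top}.

% Hypothetical low-rank restoration: keep the drifted singular values,
% but reuse the reference directions for the top r components.
\widetilde{W} = \sum_{i=1}^{r} \sigma_i \, u^{(0)}_i \, {v^{(0)}_i}^{\top}
              + \sum_{i=r+1}^{n} \sigma_i \, u_i \, v_i^{\top},
\qquad
r = \lceil 0.2\, n \rceil .
```

Under this reading only the directions change, which matches the finding that direction shifts matter more than singular value magnitudes.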
中文标题/摘要
标题:强化学习既非万能亦非幻影:理解大型语言模型中的监督与强化学习微调
从头训练大型语言模型(LLMs)日益不切实际,使得监督微调(SFT)和强化学习微调(RL-FT,如PPO)成为现代实践的核心。通过采用24点卡牌的分布外(OOD)变体及新型频谱诊断方法,我们重新审视这两个阶段如何重塑模型表征与OOD性能。主要发现包括:(1)RL-FT可大幅恢复SFT造成的OOD性能损失(如Llama-11B从8.97%升至15.38%,Qwen-7B从17.09%升至19.66%),但当SFT导致严重过拟合和明显分布偏移时,RL-FT无法完全恢复;(2)奇异向量方向偏移比奇异值幅度更重要,这种偏移集中于最大和最小奇异值方向,而主体频谱保持稳定;(3)低秩浅层恢复有效:恢复前20%奇异值或前25%层的向量方向可挽回70-80%的OOD性能;(4)强SFT检查点更利于RL恢复,而过拟合检查点难以修复。这些发现调和了先前关于RL优越OOD性能的报告:RL主要抵消SFT引发的方向漂移而非寻找新解决方案。我们的频谱感知分析揭示了低成本恢复手段——低秩UV合并和浅层重置,可供实践者在昂贵RL微调前采用。
Summary / 总结
Training large language models (LLMs) from scratch is increasingly impractical, making post-training methods such as supervised fine-tuning (SFT) and reinforcement-learning fine-tuning (RL-FT, e.g., PPO) central to modern practice.
Parameter-Free Logit Distillation via Sorting Mechanism
Authors: Stephen Ekaputra Limantoro
First: 2025-08-22T17:09:38+00:00 · Latest: 2025-08-22T17:09:38+00:00
Comments: Accepted in IEEE Signal Processing Letters 2025
Abstract
Knowledge distillation (KD) aims to distill knowledge from a (larger) teacher
model to a (smaller) student model via soft labels, yielding an efficient neural
network. In general, the performance of a model is determined by accuracy,
which is measured against labels. However, existing KD approaches usually use the
teacher's original distribution, neglecting the possibility of incorrect
predictions. This may contradict the motivation of hard-label learning through
cross-entropy loss and may lead to sub-optimal knowledge distillation on
certain samples. To address this issue, we propose a novel logit processing
scheme via a sorting mechanism. Specifically, our method pursues two goals at once:
(1) fixing the teacher's incorrect predictions based on the labels and (2)
reordering the distribution in a natural way according to priority rank.
As an easy-to-use, plug-and-play pre-processing step, our sort method can be
effectively applied to existing logit-based KD methods. Extensive experiments
on the CIFAR-100 and ImageNet datasets demonstrate the effectiveness of our
method.
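The two goals above can be illustrated with a small sketch. It is one plausible reading of "sorting" and not the paper's confirmed algorithm: the true class is given the largest teacher logit, and the remaining classes keep their original relative ranking while receiving the remaining values in descending order. The function name `sort_correct_logits` is hypothetical.

```python
def sort_correct_logits(logits, label):
    """Reassign teacher logit values so the labelled class ranks first."""
    # Classes ordered by teacher confidence, highest first (ties by index).
    ranked = sorted(range(len(logits)), key=lambda c: -logits[c])
    # Promote the true class; other classes keep their relative order.
    priority = [label] + [c for c in ranked if c != label]
    # The multiset of logit values is unchanged, only their assignment.
    values = sorted(logits, reverse=True)
    corrected = [0.0] * len(logits)
    for cls, value in zip(priority, values):
        corrected[cls] = value
    return corrected


teacher_logits = [2.0, 5.0, 1.0, 3.0]
fixed = sort_correct_logits(teacher_logits, label=3)      # teacher was wrong
unchanged = sort_correct_logits(teacher_logits, label=1)  # teacher was right
```

In this sketch no parameters are learned, and a teacher prediction that is already correct passes through unchanged, so the step could sit in front of an existing logit-based distillation loss.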
中文标题/摘要
标题:基于排序机制的无参数Logit蒸馏
知识蒸馏(KD)旨在通过软标签将教师(较大)模型的知识传递给学生(较小)模型,以实现高效的神经网络。通常,模型性能由基于标签的准确率决定。然而,现有KD方法多直接采用教师模型的原始分布,忽略了错误预测的潜在影响,这可能与通过交叉熵损失进行硬标签学习的动机相悖,导致在某些样本上产生次优蒸馏效果。为此,我们提出一种通过排序机制的新型logit处理方案:一方面基于标签修正教师的错误预测,另一方面根据优先级自然重排分布。这种即插即用的预处理排序方法可有效应用于现有基于logit的KD方法。在CIFAR-100和ImageNet数据集上的大量实验验证了本方法的有效性。
Summary / 总结
Knowledge distillation (KD) aims to distill the knowledge from the teacher (larger) to the student (smaller) model via soft-label for the efficient neural network.
Explainable AI in Deep Learning-Based Prediction of Solar Storms
Authors: Adam O. Rawashdeh, Jason T. L. Wang, Katherine G. Herbert
First: 2025-08-22T17:09:00+00:00 · Latest: 2025-08-22T17:09:00+00:00
Comments: 6 pages, 8 figures
Abstract
A deep learning model is often considered a black-box model, as its internal
workings tend to be opaque to the user. Because of the lack of transparency, it
is challenging to understand the reasoning behind the model's predictions.
Here, we present an approach to making a deep learning-based solar storm
prediction model interpretable, where solar storms include solar flares and
coronal mass ejections (CMEs). This deep learning model, built based on a long
short-term memory (LSTM) network with an attention mechanism, aims to predict
whether an active region (AR) on the Sun's surface that produces a flare within
24 hours will also produce a CME associated with the flare. The crux of our
approach is to model data samples in an AR as time series and use the LSTM
network to capture the temporal dynamics of the data samples. To make the
model's predictions accountable and reliable, we leverage post hoc
model-agnostic techniques, which help elucidate the factors contributing to the
predicted output for an input sequence and provide insights into the model's
behavior across multiple sequences within an AR. To our knowledge, this is the
first time that interpretability has been added to an LSTM-based solar storm
prediction model.
中文标题/摘要
标题:基于深度学习的太阳风暴预测中的可解释性人工智能
深度学习模型常被视为黑箱模型,因其内部机制对用户而言往往不透明。这种透明度的缺失使得理解模型预测背后的逻辑具有挑战性。本文提出一种方法,使基于深度学习的太阳风暴预测模型具备可解释性,其中太阳风暴包括太阳耀斑和日冕物质抛射(CMEs)。该深度学习模型基于带有注意力机制的长短期记忆(LSTM)网络构建,旨在预测太阳表面活动区(AR)在24小时内产生耀斑的同时是否会引发与之关联的CME。我们的方法核心是将AR中的数据样本建模为时间序列,并利用LSTM网络捕捉其时序动态特征。为确保预测结果的可追溯性与可靠性,我们采用事后模型无关技术,这些技术既能阐明输入序列中影响预测结果的关键因素,又能揭示模型在AR内多个序列中的行为规律。据我们所知,这是首次在基于LSTM的太阳风暴预测模型中引入可解释性机制。
Summary / 总结
A deep learning model is often considered a black-box model, as its internal workings tend to be opaque to the user.
Escaping Saddle Points via Curvature-Calibrated Perturbations: A Complete Analysis with Explicit Constants and Empirical Validation
Authors: Faruk Alpay, Hamdi Alakkad
First: 2025-08-22T17:06:28+00:00 · Latest: 2025-08-22T17:06:28+00:00
Comments: 16 pages. Perturbed gradient descent with fully explicit constants
for escaping saddle points, validated empirically
Abstract
We present a comprehensive theoretical analysis of first-order methods for
escaping strict saddle points in smooth non-convex optimization. Our main
contribution is a Perturbed Saddle-escape Descent (PSD) algorithm with fully
explicit constants and a rigorous separation between gradient-descent and
saddle-escape phases. For a function $f:\mathbb{R}^d\to\mathbb{R}$ with
$\ell$-Lipschitz gradient and $\rho$-Lipschitz Hessian, we prove that PSD finds
an $(\epsilon,\sqrt{\rho\epsilon})$-approximate second-order stationary point
with high probability using at most $O(\ell\Delta_f/\epsilon^2)$ gradient
evaluations for the descent phase plus
$O((\ell/\sqrt{\rho\epsilon})\log(d/\delta))$ evaluations per escape episode,
with at most $O(\ell\Delta_f/\epsilon^2)$ episodes needed. We validate our
theoretical predictions through extensive experiments across both synthetic
functions and practical machine learning tasks, confirming the logarithmic
dimension dependence and the predicted per-episode function decrease. We also
provide complete algorithmic specifications including a finite-difference
variant (PSD-Probe) and a stochastic extension (PSGD) with robust mini-batch
sizing.
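The separation between descent and escape phases can be illustrated with a generic perturbed gradient descent loop. This is not the PSD algorithm with the paper's explicit constants: the toy function, step size, perturbation radius, episode length, and decrease threshold are all arbitrary assumptions chosen so the example runs quickly.

```python
import math
import random


def f(p):
    """Toy function with a strict saddle at the origin and minima at y = ±sqrt(2)."""
    x, y = p
    return x * x - y * y + y ** 4 / 4.0


def grad(p):
    x, y = p
    return (2.0 * x, -2.0 * y + y ** 3)


def gd_step(p, step):
    g = grad(p)
    return (p[0] - step * g[0], p[1] - step * g[1])


def perturbed_descent(p, step=0.05, eps=1e-6, radius=1e-2,
                      escape_steps=200, min_decrease=1e-4,
                      max_episodes=20, seed=0):
    rng = random.Random(seed)
    for _ in range(max_episodes):
        # Descent phase: plain gradient descent until the gradient is small.
        for _ in range(10_000):
            if math.hypot(*grad(p)) <= eps:
                break
            p = gd_step(p, step)
        # Escape episode: perturb uniformly within a small disc, then descend.
        angle = rng.uniform(0.0, 2.0 * math.pi)
        r = radius * math.sqrt(rng.random())
        q = (p[0] + r * math.cos(angle), p[1] + r * math.sin(angle))
        for _ in range(escape_steps):
            q = gd_step(q, step)
        if f(q) < f(p) - min_decrease:
            p = q          # sufficient decrease: p was a saddle, keep going
        else:
            return p       # no decrease: treat p as second-order stationary
    return p


result = perturbed_descent((0.0, 0.0))
```

Started exactly at the saddle, plain gradient descent would never move; the perturbation lets the iterate pick up the negative-curvature direction and reach one of the two minima, where the next escape episode yields no further decrease and the loop stops.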
中文标题/摘要
标题:通过曲率校准扰动逃离鞍点:含显式常数与实证验证的完整分析
本文对光滑非凸优化中一阶方法逃离严格鞍点的问题进行了全面理论分析。核心贡献是提出了具有完全显式常数的扰动鞍点逃离下降算法(PSD),并严格区分梯度下降与鞍点逃离阶段。对于梯度Lipschitz常数为ℓ、Hessian矩阵Lipschitz常数为ρ的函数f:ℝᵈ→ℝ,我们证明PSD能以高概率找到(ε,√(ρε))-近似二阶稳定点:下降阶段至多使用O(ℓΔ_f/ε²)次梯度计算,每次逃离阶段需O((ℓ/√(ρε))log(d/δ))次计算,且最多需要O(ℓΔ_f/ε²)次逃离阶段。通过合成函数和实际机器学习任务的广泛实验,我们验证了理论预测,确认了对数维度依赖性和每阶段函数值下降预测。同时提供了完整算法规范,包括有限差分变体(PSD-Probe)和具有鲁棒小批量大小的随机扩展(PSGD)。
Summary / 总结
We present a comprehensive theoretical analysis of first-order methods for escaping strict saddle points in smooth non-convex optimization.
Quality control in sublinear time: a case study via random graphs
Authors: Cassandra Marcussen, Ronitt Rubinfeld, Madhu Sudan
First: 2025-08-22T16:54:18+00:00 · Latest: 2025-08-22T16:54:18+00:00
Comments: 70 pages
Abstract
Many algorithms are designed to work well on average over inputs. When
running such an algorithm on an arbitrary input, we must ask: Can we trust the
algorithm on this input? We identify a new class of algorithmic problems
addressing this, which we call "Quality Control Problems." These problems are
specified by a (positive, real-valued) "quality function" $\rho$ and a
distribution $D$ such that, with high probability, a sample drawn from $D$ is
"high quality," meaning its $\rho$-value is near $1$. The goal is to accept
inputs $x \sim D$ and reject potentially adversarially generated inputs $x$
with $\rho(x)$ far from $1$. The objective of quality control is thus weaker
than either component problem: testing for "$\rho(x) \approx 1$" or testing if
$x \sim D$, and offers the possibility of more efficient algorithms.
In this work, we consider the sublinear version of the quality control
problem, where $D \in \Delta(\{0,1\}^N)$ and the goal is to solve the
$(D,\rho)$-quality problem with $o(N)$ queries and time. As a case study, we
consider random graphs, i.e., $D = G_{n,p}$ (and $N = \binom{n}2$), and the
$k$-clique count function $\rho_k := C_k(G)/\mathbb{E}_{G' \sim
G_{n,p}}[C_k(G')]$, where $C_k(G)$ is the number of $k$-cliques in $G$. Testing
if $G \sim G_{n,p}$ with one sample, let alone with sublinear query access to
the sample, is of course impossible. Testing if $\rho_k(G)\approx 1$ requires
$p^{-\Omega(k^2)}$ samples. In contrast, we show that the quality control
problem for $G_{n,p}$ (with $n \geq p^{-ck}$ for some constant $c$) with
respect to $\rho_k$ can be tested with $p^{-O(k)}$ queries and time, showing
quality control is provably superpolynomially more efficient in this setting.
More generally, for a motif $H$ of maximum degree $\Delta(H)$, the respective
quality control problem can be solved with $p^{-O(\Delta(H))}$ queries and
running time.
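The quality function $\rho_k$ can be made concrete with a brute-force computation. The sketch below only illustrates the definition, using the standard fact that the expected number of $k$-cliques in $G_{n,p}$ is $\binom{n}{k} p^{\binom{k}{2}}$; it reads the whole graph and is therefore not the sublinear algorithm from the paper. The function names are hypothetical.

```python
from itertools import combinations
from math import comb


def clique_count(n, edges, k):
    """Count k-cliques by checking every k-subset of vertices 0..n-1."""
    edge_set = {frozenset(e) for e in edges}
    return sum(
        1
        for vertices in combinations(range(n), k)
        if all(frozenset(pair) in edge_set for pair in combinations(vertices, 2))
    )


def expected_clique_count(n, p, k):
    """E[C_k(G')] for G' ~ G(n, p): choose k vertices, all C(k, 2) edges present."""
    return comb(n, k) * p ** comb(k, 2)


def rho_k(n, edges, p, k):
    """Ratio of the observed k-clique count to its expectation under G(n, p)."""
    return clique_count(n, edges, k) / expected_clique_count(n, p, k)


# The complete graph on 4 vertices has 4 triangles.
k4_edges = list(combinations(range(4), 2))
ratio_vs_half = rho_k(4, k4_edges, p=0.5, k=3)
```

Against $p = 0.5$ the complete graph has eight times the expected triangle count, so its $\rho_3$ value is far from 1 and it should be rejected.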
中文标题/摘要
标题:次线性时间中的质量控制:以随机图为例的研究
许多算法被设计为在输入上平均表现良好。当在任意输入上运行此类算法时,我们必须问:我们能信任该算法在此输入上的表现吗?我们识别出一类新的算法问题来解决这一点,称之为“质量控制问题”。这些问题由一个(正的、实值的)“质量函数”$\rho$和一个分布$D$指定,使得从$D$中抽取的样本以高概率是“高质量的”,即其$\rho$值接近1。目标是接受输入$x \sim D$并拒绝可能由对抗生成的、$\rho(x)$远离1的输入$x$。因此,质量控制的目标弱于任一组件问题:测试“$\rho(x) \approx 1$”或测试$x \sim D$,并提供了更高效算法的可能性。
在这项工作中,我们考虑质量控制问题的次线性版本,其中$D \in \Delta(\{0,1\}^N)$,目标是以$o(N)$查询和时间解决$(D,\rho)$-质量问题。作为案例研究,我们考虑随机图,即$D = G_{n,p}$(且$N = \binom{n}2$)和$k$-团计数函数$\rho_k := C_k(G)/\mathbb{E}_{G' \sim G_{n,p}}[C_k(G')]$,其中$C_k(G)$是$G$中$k$-团的数量。用一个样本测试$G \sim G_{n,p}$,更不用说对样本进行次线性查询访问,当然是不可能的。测试$\rho_k(G)\approx 1$需要$p^{-\Omega(k^2)}$样本。相比之下,我们展示了对于$G_{n,p}$(其中$n \geq p^{-ck}$,$c$为某常数)关于$\rho_k$的质量控制问题可以用$p^{-O(k)}$查询和时间进行测试,表明在此设置中质量控制被证明是超多项式更高效的。更一般地,对于最大度为$\Delta(H)$的模体$H$,相应的质量控制问题可以用$p^{-O(\Delta(H))}$查询和运行时间解决。
Summary / 总结
Many algorithms are designed to work well on average over inputs.
Towards Open World Detection: A Survey
Authors: Andrei-Stefan Bulzan, Cosmin Cernazanu-Glavan
First: 2025-08-22T16:49:52+00:00 · Latest: 2025-08-22T16:49:52+00:00
Comments: 30 pages
Abstract
For decades, Computer Vision has aimed at enabling machines to perceive the
external world. Initial limitations led to the development of highly
specialized niches. As success in each task accrued and research progressed,
increasingly complex perception tasks emerged. This survey charts the
convergence of these tasks and, in doing so, introduces Open World Detection
(OWD), an umbrella term we propose to unify class-agnostic and generally
applicable detection models in the vision domain. We start from the history of
foundational vision subdomains and cover key concepts, methodologies and
datasets making up today's state-of-the-art landscape. This traverses topics
starting from early saliency detection, foreground/background separation, and
out-of-distribution detection, and leading up to open world object detection,
zero-shot detection, and Vision Large Language Models (VLLMs). We explore the
overlap between these subdomains, their increasing convergence, and their
potential to unify into a single domain in the future: perception.
中文标题/摘要
标题:迈向开放世界检测:一项综述
数十年来,计算机视觉致力于让机器感知外部世界。最初的局限性催生了高度专业化的细分领域。随着各任务成果的积累与研究进展,日益复杂的感知任务应运而生。本综述描绘了这些任务的融合趋势,并由此提出“开放世界检测(OWD)”这一统称,用以整合视觉领域中类别无关且普遍适用的检测模型。我们从基础视觉子领域的历史出发,涵盖构成当今前沿格局的关键概念、方法论及数据集,内容贯穿从早期显著性检测、前景/背景分离、分布外检测,到开放世界目标检测、零样本检测及视觉大语言模型(VLLMs)等主题。我们探讨这些子领域间的重叠性、日益增强的融合趋势,以及它们未来统一为单一感知领域的潜力。
Summary / 总结
For decades, Computer Vision has aimed at enabling machines to perceive the external world.
Guiding Diffusion Models with Reinforcement Learning for Stable Molecule Generation
Authors: Zhijian Zhou, Junyi An, Zongkai Liu, Yunfei Shi, Xuan Zhang, Fenglei Cao, Chao Qu, Yuan Qi
First: 2025-08-22T16:44:55+00:00 · Latest: 2025-08-22T16:44:55+00:00
Abstract
Generating physically realistic 3D molecular structures remains a core
challenge in molecular generative modeling. While diffusion models equipped
with equivariant neural networks have made progress in capturing molecular
geometries, they often struggle to produce equilibrium structures that adhere
to physical principles such as force field consistency. To bridge this gap, we
propose Reinforcement Learning with Physical Feedback (RLPF), a novel framework
that extends Denoising Diffusion Policy Optimization to 3D molecular
generation. RLPF formulates the task as a Markov decision process and applies
proximal policy optimization to fine-tune equivariant diffusion models.
Crucially, RLPF introduces reward functions derived from force-field
evaluations, providing direct physical feedback to guide the generation toward
energetically stable and physically meaningful structures. Experiments on the
QM9 and GEOM-drug datasets demonstrate that RLPF significantly improves
molecular stability compared to existing methods. These results highlight the
value of incorporating physics-based feedback into generative modeling. The
code is available at: https://github.com/ZhijianZhou/RLPF/tree/verl_diffusion.
中文标题/摘要
标题:利用强化学习引导扩散模型实现稳定分子生成
生成物理真实的3D分子结构仍是分子生成建模的核心挑战。尽管配备等变神经网络的扩散模型在捕捉分子几何结构方面取得进展,但它们往往难以产生符合力场一致性等物理原理的平衡结构。为弥合这一差距,我们提出物理反馈强化学习(RLPF),这是一种将去噪扩散策略优化扩展至3D分子生成的新框架。RLPF将该任务构建为马尔可夫决策过程,并应用近端策略优化微调等变扩散模型。关键的是,RLPF引入了基于力场评估的奖励函数,提供直接物理反馈以引导生成能量稳定且物理意义明确的结构。在QM9和GEOM-drug数据集上的实验表明,相较于现有方法,RLPF显著提升了分子稳定性。这些结果凸显了将基于物理的反馈融入生成建模的价值。代码发布于:https://github.com/ZhijianZhou/RLPF/tree/verl_diffusion。
Summary / 总结
Generating physically realistic 3D molecular structures remains a core challenge in molecular generative modeling.
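The abstract above says RLPF applies proximal policy optimization with rewards derived from force-field evaluations. As a rough sketch of the generic PPO clipped surrogate only, and not of the authors' implementation, the function below takes log-probabilities and advantages as plain lists. How RLPF computes advantages from force-field rewards is not stated in the abstract, so the inputs here are assumed to be given, and `clip_eps=0.2` is an illustrative default.

```python
import math


def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    logp_new / logp_old: log-probabilities of the sampled steps under the
    current policy and under the policy that generated the samples.
    advantages: per-sample advantages (assumed here to come from some
    force-field-based reward; that mapping is hypothetical).
    """
    terms = []
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # importance ratio pi_new / pi_old
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic choice between the unclipped and clipped terms.
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return sum(terms) / len(terms)
```

When the two policies agree, every ratio is 1 and the objective reduces to the mean advantage; clipping only matters once the updated policy drifts away from the sampling policy.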
FLAMES: Improving LLM Math Reasoning via a Fine-Grained Analysis of the Data Synthesis Pipeline
Authors: Parker Seegmiller, Kartik Mehta, Soumya Saha, Chenyang Tao, Shereen Oraby, Arpit Gupta, Tagyoung Chung, Mohit Bansal, Nanyun Peng
Venue: EMNLP 2025
First: 2025-08-22T16:37:40+00:00 · Latest: 2025-08-22T16:37:40+00:00
Comments: To appear at EMNLP 2025
Abstract
Recent works improving LLM math reasoning with synthetic data have used
unique setups, making comparison of data synthesis strategies impractical. This
leaves many unanswered questions about the roles of different factors in the
synthetic data pipeline, such as the impact of filtering low-quality problems.
To address this gap, we introduce FLAMES, a Framework for LLM Assessment of
Math rEasoning Data Synthesis, and perform a systematic study of 10 existing
data synthesis strategies and multiple other factors impacting the performance
of synthetic math reasoning data. Our FLAMES experiments provide several
valuable insights about the optimal balance of difficulty and diversity of
synthetic data. First, data agents designed to increase problem complexity lead
to the best improvements on most math metrics. Second, with a fixed data generation
budget, keeping higher problem coverage is more important than keeping only
problems with reliable solutions. Third, GSM8K- and MATH-based synthetic data
can lead to improvements on competition-level benchmarks, showcasing
easy-to-hard generalization. Leveraging insights from our FLAMES experiments,
we design two novel data synthesis strategies for improving out-of-domain
generalization and robustness. Further, we develop the FLAMES dataset, an
effective blend of our novel and existing data synthesis strategies,
outperforming public datasets on OlympiadBench (+15.7), CollegeMath (+4.5),
GSMPlus (+6.5), and MATH (+3.1). Fine-tuning Qwen2.5-Math-7B on the FLAMES
dataset achieves 81.4% on MATH, surpassing larger Llama3 405B, GPT-4o and
Claude 3.5 Sonnet.
中文标题/摘要
标题:FLAMES:通过数据合成流程的细粒度分析提升大语言模型数学推理能力
近期利用合成数据改进大语言模型数学推理的研究采用各异实验设置,导致数据合成策略难以直接比较。这使合成数据流程中不同因素的作用存在诸多未解之谜,例如过滤低质量问题的影响。为此,我们推出FLAMES框架(大语言模型数学推理数据合成评估框架),系统研究10种现有数据合成策略及影响合成数学推理数据性能的多个因素。FLAMES实验揭示了合成数据难度与多样性的最佳平衡:首先,旨在提升问题复杂度的数据代理能最大程度改善多数数学指标;其次,在固定数据生成预算下,保持较高问题覆盖率比仅保留可靠解的问题更重要;第三,基于GSM8K和MATH的合成数据可提升竞赛级基准表现,展现由易到难的泛化能力。基于这些发现,我们设计两种新颖数据合成策略以提升域外泛化与鲁棒性,并开发FLAMES数据集——融合新策略与现有策略的有效组合,在OlympiadBench(+15.7)、CollegeMath(+4.5)、GSMPlus(+6.5)和MATH(+3.1)上超越公开数据集。使用FLAMES数据集微调Qwen2.5-Math-7B后,MATH准确率达81.4%,超越更大规模的Llama3 405B、GPT-4o和Claude 3.5 Sonnet。
Summary / 总结
Recent works improving LLM math reasoning with synthetic data have used unique setups, making comparison of data synthesis strategies impractical.
Seeing Clearly, Forgetting Deeply: Revisiting Fine-Tuned Video Generators for Driving Simulation
Authors: Chun-Peng Chang, Chen-Yu Wang, Julian Schmidt, Holger Caesar, Alain Pagani
First: 2025-08-22T16:35:19+00:00 · Latest: 2025-08-22T16:35:19+00:00
Abstract
Recent advancements in video generation have substantially improved visual
quality and temporal coherence, making these models increasingly appealing for
applications such as autonomous driving, particularly in the context of driving
simulation and so-called "world models". In this work, we investigate the
effects of existing fine-tuning video generation approaches on structured
driving datasets and uncover a potential trade-off: although visual fidelity
improves, spatial accuracy in modeling dynamic elements may degrade. We
attribute this degradation to a shift in the alignment between visual quality
and dynamic understanding objectives. In datasets with diverse scene structures
within temporal space, where objects or perspective shift in varied ways, these
objectives tend to be highly correlated. However, the very regular and repetitive
nature of driving scenes allows visual quality to improve by modeling dominant
scene motion patterns, without necessarily preserving fine-grained dynamic
behavior. As a result, fine-tuning encourages the model to prioritize
surface-level realism over dynamic accuracy. To further examine this
phenomenon, we show that simple continual learning strategies, such as replay
from diverse domains, can offer a balanced alternative by preserving spatial
accuracy while maintaining strong visual quality.
中文标题/摘要
标题:清晰视界,深度遗忘:重访用于驾驶模拟的微调视频生成器
视频生成技术的最新进展显著提升了视觉质量与时间连贯性,使这类模型在自动驾驶等应用中愈发受到青睐,尤其是在驾驶模拟和所谓“世界模型”的背景下。本研究探讨了现有微调视频生成方法在结构化驾驶数据集上的影响,揭示了一个潜在权衡:尽管视觉保真度提高,但对动态元素建模的空间准确性可能下降。我们将此归因于视觉质量目标与动态理解目标之间对齐关系的偏移。在时间域内具有多样化场景结构的数据集中,这些目标往往高度相关;然而驾驶场景高度规律和重复的特性,使得模型可通过主导场景运动模式提升视觉质量,却未必保留细粒度动态行为。因此微调促使模型优先考虑表层真实感而非动态准确性。为深入探究该现象,我们证明简单的持续学习策略(如跨域回放)能通过保持空间准确性同时维持强劲视觉质量,提供一种平衡的替代方案。
Summary / 总结
Recent advancements in video generation have substantially improved visual quality and temporal coherence, making these models increasingly appealing for applications such as autonomous driving, particularly in the context of driving simulation and so-called "world models".
ML-PWS: Estimating the Mutual Information Between Experimental Time Series Using Neural Networks
Authors: Manuel Reinhardt, Gašper Tkačik, Pieter Rein ten Wolde
First: 2025-08-22T16:33:34+00:00 · Latest: 2025-08-22T16:33:34+00:00
Comments: 9 pages, 2 figures
Abstract
The ability to quantify information transmission is crucial for the analysis
and design of natural and engineered systems. The information transmission rate
is the fundamental measure for systems with time-varying signals, yet computing
it is extremely challenging. In particular, the rate cannot be obtained
directly from experimental time-series data without approximations, because of
the high dimensionality of the signal trajectory space. Path Weight Sampling
(PWS) is a computational technique that makes it possible to obtain the
information rate exactly for any stochastic system. However, it requires a
mathematical model of the system of interest, be it described by a master
equation or a set of differential equations. Here, we present a technique that
employs Machine Learning (ML) to develop a generative model from experimental
time-series data, which is then combined with PWS to obtain the information
rate. We demonstrate the accuracy of this technique, called ML-PWS, by
comparing its results on synthetic time-series data generated from a non-linear
model against ground-truth results obtained by applying PWS directly to the
same model. We illustrate the utility of ML-PWS by applying it to neuronal
time-series data.
中文标题/摘要
标题:ML-PWS:利用神经网络估计实验时间序列间的互信息
量化信息传输能力对于分析和设计自然与工程系统至关重要。信息传输速率是时变信号系统的基本度量指标,但其计算极具挑战性。由于信号轨迹空间的高维特性,若不采用近似方法,无法直接从实验时间序列数据中获取该速率。路径权重采样(PWS)是一种计算技术,可精确获取任意随机系统的信息速率,但需要基于主方程或微分方程组构建目标系统的数学模型。本文提出一种结合机器学习(ML)的技术:首先通过实验时间序列数据构建生成模型,再与PWS结合计算信息速率。通过对比非线性模型生成的合成时间序列数据上ML-PWS的结果与直接应用PWS于同一模型获得的基准结果,我们验证了该技术的准确性,并通过神经元时间序列数据的应用展示了其效用。
Summary / 总结
The ability to quantify information transmission is crucial for the analysis and design of natural and engineered systems.
MuST2-Learn: Multi-view Spatial-Temporal-Type Learning for Heterogeneous Municipal Service Time Estimation
Authors: Nadia Asif, Zhiqing Hong, Shaogang Ren, Xiaonan Zhang, Xiaojun Shang, Yukun Yuan
First: 2025-08-22T16:28:57+00:00 · Latest: 2025-08-22T16:28:57+00:00
Comments: Accepted to SIGSPATIAL 2025
Abstract
Non-emergency municipal services such as city 311 systems have been widely
implemented across cities in Canada and the United States to enhance residents'
quality of life. These systems enable residents to report issues, e.g., noise
complaints, missed garbage collection, and potholes, via phone calls, mobile
applications, or webpages. However, residents are often given limited
information about when their service requests will be addressed, which can
reduce transparency, lower resident satisfaction, and increase the number of
follow-up inquiries. Predicting the service time for municipal service requests
is challenging due to several complex factors: dynamic spatial-temporal
correlations, underlying interactions among heterogeneous service request
types, and high variation in service duration even within the same request
category. In this work, we propose MuST2-Learn: a Multi-view
Spatial-Temporal-Type Learning framework designed to address the aforementioned
challenges by jointly modeling spatial, temporal, and service type dimensions.
In detail, it incorporates an inter-type encoder to capture relationships among
heterogeneous service request types and an intra-type variation encoder to
model service time variation within homogeneous types. In addition, a
spatiotemporal encoder is integrated to capture spatial and temporal
correlations in each request type. The proposed framework is evaluated with
extensive experiments using two real-world datasets. The results show that
MuST2-Learn reduces mean absolute error by at least 32.5%, which outperforms
state-of-the-art methods.
中文标题/摘要
标题:MuST2-Learn:面向异构市政服务时间估计的多视角时空类型学习框架
非紧急市政服务(如城市311系统)已在加拿大和美国多个城市广泛实施,以提升居民生活质量。居民可通过电话、移动应用或网页报告问题,例如噪音投诉、垃圾未收和道路坑洼。然而,居民通常对服务请求的处理时间知之甚少,这降低了透明度,影响居民满意度,并增加了后续查询次数。预测市政服务请求时间面临多重挑战:动态时空相关性、异构服务请求类型间的潜在交互,以及同类请求中服务时长的高变异性。本研究提出MuST2-Learn:一种多视角时空类型学习框架,通过联合建模空间、时间和服务类型维度来解决上述问题。具体包括:类型间编码器捕捉异构服务请求类型的关系,类型内变分编码器建模同质类型内的服务时间变异,同时集成时空编码器捕捉各请求类型的时空相关性。通过两个真实数据集的广泛实验证明,该框架将平均绝对误差降低至少32.5%,性能优于现有先进方法。
Summary / 总结
Non-emergency municipal services such as city 311 systems have been widely implemented across cities in Canada and the United States to enhance residents' quality of life.
On Zero-Shot Reinforcement Learning
Authors: Scott Jeen
First: 2025-08-22T16:20:49+00:00 · Latest: 2025-08-22T16:20:49+00:00
Comments: PhD thesis
Abstract
Modern reinforcement learning (RL) systems capture deep truths about general,
human problem-solving. In domains where new data can be simulated cheaply,
these systems uncover sequential decision-making policies that far exceed the
ability of any human. Society faces many problems whose solutions require this
skill, but they are often in domains where new data cannot be cheaply
simulated. In such scenarios, we can learn simulators from existing data, but
these will only ever be approximately correct, and can be pathologically
incorrect when queried outside of their training distribution. As a result, a
misalignment between the environments in which we train our agents and the
real world in which we wish to deploy our agents is inevitable. Dealing with
this misalignment is the primary concern of zero-shot reinforcement learning, a
problem setting where the agent must generalise to a new task or domain with
zero practice shots. Whilst impressive progress has been made on methods that
perform zero-shot RL in idealised settings, new work is needed if these results
are to be replicated in real-world settings. In this thesis, we argue that
doing so requires us to navigate (at least) three constraints. First, the data
quality constraint: real-world datasets are small and homogeneous. Second, the
observability constraint: states, dynamics and rewards in the real world are
often only partially observed. And third, the data availability constraint: a
priori access to data cannot always be assumed. This work proposes a suite of
methods that perform zero-shot RL subject to these constraints. In a series of
empirical studies we expose the failings of existing methods, and justify our
techniques for remedying them. We believe these designs take us a step closer
to RL methods that can be deployed to solve real-world problems.
中文标题/摘要
标题:论零样本强化学习
现代强化学习(RL)系统揭示了关于通用人类问题解决的深层原理。在能够低成本模拟新数据的领域,这些系统发现的序列决策策略远超人类能力。社会面临许多需要此类技能解决的问题,但这些领域往往无法低成本模拟新数据。在此类场景中,我们可以从现有数据学习模拟器,但这些模拟器仅能近似正确,且在训练分布之外查询时可能出现病态错误。因此,智能体训练环境与实际部署环境之间的失配不可避免。处理这种失配是零样本强化学习的核心议题——该问题设定要求智能体在零实践样本的情况下泛化至新任务或领域。尽管在理想化设置中实现零样本RL的方法已取得显著进展,但若要在现实世界中复现这些成果,仍需新的研究工作。本论文主张需应对(至少)三重约束:其一,数据质量约束——现实数据集规模小且同质化;其二,可观测性约束——现实世界中的状态、动态和奖励往往只能被部分观测;其三,数据可用性约束——不能总是假定可先验获取数据。本研究提出一套在此类约束下实现零样本RL的方法,通过系列实证研究揭示现有方法的缺陷,并论证我们所提的改进技术。我们相信这些设计使可部署解决实际问题的RL方法更近一步。
Summary / 总结
Modern reinforcement learning (RL) systems capture deep truths about general, human problem-solving.
Post Hoc Regression Refinement via Pairwise Rankings
Authors: Kevin Tirta Wijaya, Michael Sun, Minghao Guo, Hans-Peter Seidel, Wojciech Matusik, Vahid Babaei
First: 2025-08-22T16:17:31+00:00 · Latest: 2025-08-22T16:17:31+00:00
Abstract
Accurate prediction of continuous properties is essential to many scientific
and engineering tasks. Although deep-learning regressors excel with abundant
labels, their accuracy deteriorates in data-scarce regimes. We introduce
RankRefine, a model-agnostic, plug-and-play post hoc method that refines
regression with expert knowledge coming from pairwise rankings. Given a query
item and a small reference set with known properties, RankRefine combines the
base regressor's output with a rank-based estimate via inverse variance
weighting, requiring no retraining. In a molecular property prediction task,
RankRefine achieves up to 10% relative reduction in mean absolute error using
only 20 pairwise comparisons obtained through a general-purpose large language
model (LLM) with no finetuning. As rankings provided by human experts or
general-purpose LLMs are sufficient for improving regression across diverse
domains, RankRefine offers practicality and broad applicability, especially in
low-data settings.
中文标题/摘要
标题:基于成对排序的事后回归优化
连续属性的精确预测对众多科学与工程任务至关重要。尽管深度学习回归器在标签充足时表现卓越,但其在数据稀缺场景下的准确性会下降。我们提出RankRefine——一种与模型无关、即插即用的事后优化方法,通过引入来自成对排序的专家知识来改进回归效果。给定查询项和已知属性的小型参考集,RankRefine通过逆方差加权将基础回归器的输出与基于排序的估计值相结合,无需重新训练。在分子属性预测任务中,仅使用通用大语言模型(LLM)未经微调生成的20组成对比较,RankRefine即可实现平均绝对误差相对降低高达10%。由于人类专家或通用LLM提供的排序足以改进跨领域回归,RankRefine具有实用性和广泛适用性,尤其在低数据环境中。
Summary / 总结
Accurate prediction of continuous properties is essential to many scientific and engineering tasks.
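The abstract above says RankRefine fuses the base regressor's output with a rank-based estimate via inverse variance weighting. The sketch below shows only that standard fusion rule for two estimates. How RankRefine obtains the rank-based estimate and the two variances is not described in the abstract, so all four inputs are assumed to be given, and the function name is hypothetical.

```python
def inverse_variance_combine(y_reg, var_reg, y_rank, var_rank):
    """Fuse two estimates of the same quantity by inverse variance weighting.

    y_reg, var_reg: base regressor prediction and its (assumed known) variance.
    y_rank, var_rank: rank-based estimate and its (assumed known) variance.
    Returns the fused estimate and the variance of the fused estimate.
    """
    w_reg = 1.0 / var_reg
    w_rank = 1.0 / var_rank
    fused = (w_reg * y_reg + w_rank * y_rank) / (w_reg + w_rank)
    fused_var = 1.0 / (w_reg + w_rank)
    return fused, fused_var
```

With equal variances the rule reduces to a plain average; otherwise the lower-variance estimate dominates, and the fused variance is never larger than either input variance.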
Ensembles of Neural Surrogates for Parametric Sensitivity in Ocean Modeling
Authors: Yixuan Sun, Romain Egele, Sri Hari Krishna Narayana, Luke Van Roekel, Carmelo Gonzales, Steven Brus, Balu Nadiga, Sandeep Madireddy, Prasanna Balaprakash
First: 2025-08-22T16:12:04+00:00 · Latest: 2025-08-22T16:12:04+00:00
Comments: 12 pages, 7 figures
Abstract
Accurate simulations of the oceans are crucial in understanding the Earth
system. Despite their efficiency, simulations at lower resolutions must rely on
various uncertain parameterizations to account for unresolved processes.
However, model sensitivity to parameterizations is difficult to quantify,
making it challenging to tune these parameterizations to reproduce
observations. Deep learning surrogates have shown promise for efficient
computation of the parametric sensitivities in the form of partial derivatives,
but their reliability is difficult to evaluate without ground truth
derivatives. In this work, we leverage large-scale hyperparameter search and
ensemble learning to improve forward predictions, autoregressive rollout,
and backward adjoint sensitivity estimation. Particularly, the ensemble method
provides epistemic uncertainty of function value predictions and their
derivatives, providing improved reliability of the neural surrogates in
decision making.
中文标题/摘要
标题:海洋建模中参数敏感性的神经网络代理集成方法
精确的海洋模拟对理解地球系统至关重要。尽管低分辨率模拟效率较高,但必须依赖各种不确定的参数化来处理未解析过程。然而,模型对参数化的敏感性难以量化,这使得调整这些参数化以复现观测数据具有挑战性。深度学习代理在通过偏导数形式高效计算参数敏感性方面展现出潜力,但缺乏真实导数的情况下难以评估其可靠性。本研究利用大规模超参数搜索和集成学习改进前向预测、自回归推演及后向伴随敏感性估计。特别地,集成方法提供了函数值预测及其导数的认知不确定性,从而增强了神经代理在决策中的可靠性。
Summary / 总结
Accurate simulations of the oceans are crucial in understanding the Earth system.
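As a toy illustration of the ensemble idea in this entry, the sketch below treats the spread across ensemble members as an estimate of epistemic uncertainty, both for a prediction and for its derivative with respect to a scalar parameter. The members are plain Python callables standing in for trained surrogates, and central finite differences stand in for the adjoint (automatic differentiation) sensitivities mentioned in the abstract. Both substitutions are assumptions made to keep the example self-contained.

```python
import statistics


def ensemble_stats(members, x, h=1e-4):
    """Mean and spread of predictions and derivatives across ensemble members.

    members: list of callables f(x) -> float (stand-ins for neural surrogates).
    x: scalar parameter value at which to evaluate.
    h: step for the central finite difference (a stand-in for adjoint gradients).
    """
    preds = [f(x) for f in members]
    grads = [(f(x + h) - f(x - h)) / (2.0 * h) for f in members]
    return {
        "pred_mean": statistics.fmean(preds),
        "pred_std": statistics.pstdev(preds),  # spread = disagreement between members
        "grad_mean": statistics.fmean(grads),
        "grad_std": statistics.pstdev(grads),
    }
```

A large `grad_std` relative to `grad_mean` would flag a sensitivity estimate on which the members disagree, which is the kind of reliability signal the abstract attributes to the ensemble.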
FraPPE: Fast and Efficient Preference-based Pure Exploration
Authors: Udvas Das, Apurv Shukla, Debabrota Basu
First: 2025-08-22T16:02:06+00:00 · Latest: 2025-08-22T16:02:06+00:00
Abstract
Preference-based Pure Exploration (PrePEx) aims to identify with a given
confidence level the set of Pareto optimal arms in a vector-valued (aka
multi-objective) bandit, where the reward vectors are ordered via a (given)
preference cone $\mathcal{C}$. Though PrePEx and its variants are well-studied,
there does not exist a computationally efficient algorithm that can optimally
track the existing lower bound for arbitrary preference cones. We successfully
fill this gap by efficiently solving the minimisation and maximisation problems
in the lower bound. First, we derive three structural properties of the lower
bound that yield a computationally tractable reduction of the minimisation
problem. Then, we deploy a Frank-Wolfe optimiser to accelerate the maximisation
problem in the lower bound. Together, these techniques solve the maxmin
optimisation problem in $\mathcal{O}(KL^{2})$ time for a bandit instance with
$K$ arms and $L$ dimensional reward, which is a significant acceleration over
the literature. We further prove that our proposed PrePEx algorithm, FraPPE,
asymptotically achieves the optimal sample complexity. Finally, we perform
numerical experiments across synthetic and real datasets demonstrating that
FraPPE achieves the lowest sample complexities to identify the exact Pareto set
among the existing algorithms.
中文标题/摘要
标题:FraPPE:基于偏好的快速高效纯探索
基于偏好的纯探索(PrePEx)旨在以给定置信水平识别向量值(即多目标)赌博机中的帕累托最优臂集,其中奖励向量通过(给定的)偏好锥$\mathcal{C}$排序。尽管PrePEx及其变体已得到充分研究,但尚无计算高效算法能最优跟踪任意偏好锥的现有下界。我们通过高效解决下界中的最小化和最大化问题成功填补了这一空白。首先,推导出下界的三个结构特性,使最小化问题可计算地简化;随后采用Frank-Wolfe优化器加速下界中的最大化问题。这些技术共同以$\mathcal{O}(KL^{2})$时间复杂度解决最大最小优化问题(适用于包含$K$个臂和$L$维奖励的赌博机实例),较现有研究实现显著加速。进一步证明所提算法FraPPE渐近达到最优样本复杂度。最后通过合成与真实数据集的数值实验表明,FraPPE在现有算法中实现了识别精确帕累托集的最低样本复杂度。
Summary / 总结
Preference-based Pure Exploration (PrePEx) aims to identify with a given confidence level the set of Pareto optimal arms in a vector-valued (aka multi-objective) bandit, where the reward vectors are ordered via a (given) preference cone $\mathcal{C}$.
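To illustrate the target object in this entry, the following sketch computes the Pareto-optimal arms when the mean reward vectors are known exactly and the preference cone $\mathcal{C}$ is the positive orthant (componentwise ordering). Both are simplifying assumptions: the paper treats arbitrary cones and must identify the set from noisy samples. This is not FraPPE itself, and the function names are hypothetical.

```python
def dominates(u, v):
    """True if u dominates v componentwise (positive-orthant cone assumed)."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_set(means):
    """Indices of arms whose mean vector is not dominated by any other arm."""
    return [
        i
        for i, u in enumerate(means)
        if not any(dominates(v, u) for j, v in enumerate(means) if j != i)
    ]
```

For the four arms `[(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (0.2, 0.2)]`, the last arm is dominated by the third, so `pareto_set` returns `[0, 1, 2]`.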
Underdamped Langevin MCMC with third order convergence
Authors: Maximilian Scott, Dáire O'Kane, Andraž Jelinčič, James Foster
First: 2025-08-22T16:00:01+00:00 · Latest: 2025-08-22T16:00:01+00:00
Comments: 62 pages, 7 figures
Abstract
In this paper, we propose a new numerical method for the underdamped Langevin
diffusion (ULD) and present a non-asymptotic analysis of its sampling error in
the 2-Wasserstein distance when the $d$-dimensional target distribution
$p(x)\propto e^{-f(x)}$ is strongly log-concave and has varying degrees of
smoothness. Precisely, under the assumptions that the gradient and Hessian of
$f$ are Lipschitz continuous, our algorithm achieves a 2-Wasserstein error of
$\varepsilon$ in $\mathcal{O}(\sqrt{d}/\varepsilon)$ and
$\mathcal{O}(\sqrt{d}/\sqrt{\varepsilon})$ steps respectively. Therefore, our
algorithm has a complexity similar to that of other popular Langevin MCMC algorithms
under matching assumptions. However, if we additionally assume that the third
derivative of $f$ is Lipschitz continuous, then our algorithm achieves a
2-Wasserstein error of $\varepsilon$ in
$\mathcal{O}(\sqrt{d}/\varepsilon^{\frac{1}{3}})$ steps. To the best of our
knowledge, this is the first gradient-only method for ULD with third order
convergence. To support our theory, we perform Bayesian logistic regression
across a range of real-world datasets, where our algorithm achieves competitive
performance compared to an existing underdamped Langevin MCMC algorithm and the
popular No U-Turn Sampler (NUTS).
中文标题/摘要
标题:具有三阶收敛性的欠阻尼朗之万MCMC方法
本文针对欠阻尼朗之万扩散(ULD)提出了一种新的数值方法,并在目标分布$p(x)\propto e^{-f(x)}$为强对数凹且具有不同光滑度程度的$d$维情况下,对其在2-Wasserstein距离上的采样误差进行了非渐近分析。具体而言,在$f$的梯度和Hessian矩阵满足Lipschitz连续的假设下,我们的算法分别以$\mathcal{O}(\sqrt{d}/\varepsilon)$和$\mathcal{O}(\sqrt{d}/\sqrt{\varepsilon})$步数达到$\varepsilon$的2-Wasserstein误差。因此,在匹配假设下,我们的算法与其他流行的朗之万MCMC算法具有相似的复杂度。然而,若进一步假设$f$的三阶导数也满足Lipschitz连续,则我们的算法仅需$\mathcal{O}(\sqrt{d}/\varepsilon^{\frac{1}{3}})$步即可达到$\varepsilon$的2-Wasserstein误差。据我们所知,这是首个实现三阶收敛的纯梯度ULD方法。为验证理论,我们在多个真实数据集上进行贝叶斯逻辑回归实验,结果显示该算法相较于现有欠阻尼朗之万MCMC算法和流行的无转弯采样器(NUTS)具有竞争优势。
Summary / 总结
In this paper, we propose a new numerical method for the underdamped Langevin diffusion (ULD) and present a non-asymptotic analysis of its sampling error in the 2-Wasserstein distance when the $d$-dimensional target distribution $p(x)\propto e^{-f(x)}$ is strongly log-concave and has varying degrees of smoothness.
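To get a feel for the three step-count bounds quoted in this entry, the sketch below evaluates the leading-order expressions $\sqrt{d}/\varepsilon$, $\sqrt{d}/\sqrt{\varepsilon}$, and $\sqrt{d}/\varepsilon^{1/3}$ with every hidden constant set to 1. That is an assumption made purely for comparison, so the outputs are scalings rather than actual iteration counts. The function and key names are hypothetical.

```python
import math


def uld_step_scaling(d, eps):
    """Leading-order step counts from the abstract, with all constants dropped.

    d: dimension of the target distribution.
    eps: target 2-Wasserstein error.
    """
    return {
        "lipschitz_gradient": math.sqrt(d) / eps,
        "lipschitz_hessian": math.sqrt(d) / math.sqrt(eps),
        "lipschitz_third_derivative": math.sqrt(d) / eps ** (1.0 / 3.0),
    }
```

For $d = 100$ and $\varepsilon = 10^{-3}$, the three expressions come to 10000, about 316, and about 100, which shows how each additional smoothness assumption weakens the dependence on $\varepsilon$.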
Benchmarking the Robustness of Agentic Systems to Adversarially-Induced Harms
Authors: Jonathan Nöther, Adish Singla, Goran Radanovic
First: 2025-08-22T15:53:22+00:00 · Latest: 2025-08-22T15:53:22+00:00
Comments: 52 Pages
Abstract
Ensuring the safe use of agentic systems requires a thorough understanding of
the range of malicious behaviors these systems may exhibit when under attack.
In this paper, we evaluate the robustness of LLM-based agentic systems against
attacks that aim to elicit harmful actions from agents. To this end, we propose
a novel taxonomy of harms for agentic systems and a novel benchmark, BAD-ACTS,
for studying the security of agentic systems with respect to a wide range of
harmful actions. BAD-ACTS consists of 4 implementations of agentic systems in
distinct application environments, as well as a dataset of 188 high-quality
examples of harmful actions. This enables a comprehensive study of the
robustness of agentic systems across a wide range of categories of harmful
behaviors, available tools, and inter-agent communication structures. Using
this benchmark, we analyze the robustness of agentic systems against an
attacker that controls one of the agents in the system and aims to manipulate
other agents to execute a harmful target action. Our results show that the
attack has a high success rate, demonstrating that even a single adversarial
agent within the system can have a significant impact on the security. This
attack remains effective even when agents use a simple prompting-based defense
strategy. However, we additionally propose a more effective defense based on
message monitoring. We believe that this benchmark provides a diverse testbed
for the security research of agentic systems. The benchmark can be found at
github.com/JNoether/BAD-ACTS
中文标题/摘要
标题:评估代理系统对对抗性诱导危害的鲁棒性基准研究
确保代理系统的安全使用需全面理解其在受攻击时可能表现出的恶意行为范围。本文评估了基于LLM的代理系统抵御旨在诱发代理执行有害行为的攻击的鲁棒性。为此,我们提出了代理系统危害的新分类法和新型基准BAD-ACTS,用于研究代理系统在广泛有害行为方面的安全性。BAD-ACTS包含4种不同应用环境中的代理系统实现及188个高质量有害行为实例数据集,支持跨多类有害行为、可用工具和代理间通信结构的全面鲁棒性研究。通过该基准,我们分析了当攻击者控制系统内某个代理并试图操纵其他代理执行有害目标行为时的系统鲁棒性。结果表明该攻击成功率较高,证明即使单个敌对代理也能对安全产生重大影响。即使代理采用基于提示的简单防御策略,该攻击仍有效。但我们进一步提出了基于消息监控的更有效防御方案。相信该基准能为代理系统安全研究提供多样化测试平台,详见github.com/JNoether/BAD-ACTS
Summary / 总结
Ensuring the safe use of agentic systems requires a thorough understanding of the range of malicious behaviors these systems may exhibit when under attack.
Disentangled Multi-modal Learning of Histology and Transcriptomics for Cancer Characterization
Authors: Yupei Zhang, Xiaofei Wang, Anran Liu, Lequan Yu, Chao Li
First: 2025-08-22T15:51:33+00:00 · Latest: 2025-08-22T15:51:33+00:00
Abstract
Histopathology remains the gold standard for cancer diagnosis and prognosis.
With the advent of transcriptome profiling, multi-modal learning combining
transcriptomics with histology offers more comprehensive information. However,
existing multi-modal approaches are challenged by intrinsic multi-modal
heterogeneity, insufficient multi-scale integration, and reliance on paired
data, restricting clinical applicability. To address these challenges, we
propose a disentangled multi-modal framework with four contributions: 1) To
mitigate multi-modal heterogeneity, we decompose WSIs and transcriptomes into
tumor and microenvironment subspaces using a disentangled multi-modal fusion
module, and introduce a confidence-guided gradient coordination strategy to
balance subspace optimization. 2) To enhance multi-scale integration, we
propose an inter-magnification gene-expression consistency strategy that aligns
transcriptomic signals across WSI magnifications. 3) To reduce dependency on
paired data, we propose a subspace knowledge distillation strategy enabling
transcriptome-agnostic inference through a WSI-only student model. 4) To
improve inference efficiency, we propose an informative token aggregation
module that suppresses WSI redundancy while preserving subspace semantics.
Extensive experiments on cancer diagnosis, prognosis, and survival prediction
demonstrate our superiority over state-of-the-art methods across multiple
settings. Code is available at
https://github.com/helenypzhang/Disentangled-Multimodal-Learning.
中文标题/摘要
标题:解耦多模态组织学与转录组学学习在癌症表征中的应用
组织病理学仍是癌症诊断与预后的金标准。随着转录组分析技术的发展,结合转录组学与组织学的多模态学习提供了更全面的信息。然而,现有多模态方法面临内在模态异质性、多尺度整合不足及对配对数据的依赖等挑战,限制了临床适用性。为此,我们提出解耦多模态框架,包含四项创新:1)通过解耦多模态融合模块将全切片图像和转录组分解为肿瘤与微环境子空间,并采用置信度引导的梯度协调策略平衡子空间优化;2)提出跨放大倍数基因表达一致性策略,实现不同分辨率下的转录组信号对齐;3)设计子空间知识蒸馏策略,使仅需全切片图像的学生模型实现不依赖转录组的推理;4)开发信息令牌聚合模块,在抑制冗余的同时保留子空间语义。在癌症诊断、预后和生存预测的大规模实验中,本方法在多种设定下均优于现有先进技术。代码详见https://github.com/helenypzhang/Disentangled-Multimodal-Learning。
Summary / 总结
Histopathology remains the gold standard for cancer diagnosis and prognosis.
NOSTRA: A noise-resilient and sparse data framework for trust region based multi objective Bayesian optimization
Authors: Maryam Ghasemzadeh, Anton van Beek
First: 2025-08-22T15:43:01+00:00 · Latest: 2025-08-22T15:43:01+00:00
Abstract
Multi-objective Bayesian optimization (MOBO) struggles with sparse
(non-space-filling), scarce (limited observations) datasets affected by
experimental uncertainty, where identical inputs can yield varying outputs.
These challenges are common in physical and simulation experiments (e.g.,
randomized medical trials and molecular dynamics simulations), which are
therefore incompatible with conventional MOBO methods. As a result,
experimental resources are inefficiently allocated, leading to suboptimal
designs. To address this challenge, we introduce NOSTRA (Noisy and Sparse Data
Trust Region-based Optimization Algorithm), a novel sampling framework that
integrates prior knowledge of experimental uncertainty to construct more
accurate surrogate models while employing trust regions to focus sampling on
promising areas of the design space. By strategically leveraging prior
information and refining search regions, NOSTRA accelerates convergence to the
Pareto frontier, enhances data efficiency, and improves solution quality.
Through two test functions with varying levels of experimental uncertainty, we
demonstrate that NOSTRA outperforms existing methods in handling noisy, sparse,
and scarce data. Specifically, we illustrate that NOSTRA effectively
prioritizes regions where samples enhance the accuracy of the identified Pareto
frontier, offering a resource-efficient algorithm that is practical in
scenarios with limited experimental budgets while ensuring efficient
performance.
中文标题/摘要
标题:NOSTRA:一种基于信任域的多目标贝叶斯优化的抗噪声稀疏数据框架
多目标贝叶斯优化(MOBO)在处理受实验不确定性影响、具有稀疏性(非空间填充)和稀缺性(有限观测)的数据集时面临挑战,其中相同输入可能产生不同输出。这些挑战常见于物理与仿真实验(如随机化医学试验和分子动力学模拟),与传统MOBO方法不兼容,导致实验资源分配低效和设计次优化。为此,我们提出NOSTRA(基于噪声稀疏数据信任域的优化算法),该新型采样框架整合实验不确定性的先验知识以构建更精确的代理模型,同时采用信任域将采样聚焦于设计空间的有前景区域。通过策略性利用先验信息并精化搜索区域,NOSTRA加速向帕累托前沿收敛,提升数据效率并改进解质量。通过两个具有不同实验不确定性水平的测试函数,我们证明NOSTRA在处理噪声、稀疏和稀缺数据方面优于现有方法。具体而言,NOSTRA能有效优先选择提升已识别帕累托前沿精度的采样区域,提供一种在实验预算有限场景下实用且性能高效的资源优化算法。
Summary / 总结
Multi-objective Bayesian optimization (MOBO) struggles with sparse (non-space-filling), scarce (limited observations) datasets affected by experimental uncertainty, where identical inputs can yield varying outputs.
Reinforcement Learning-based Control via Y-wise Affine Neural Networks (YANNs)
Authors: Austin Braniff, Yuhe Tian
First: 2025-08-22T15:42:03+00:00 · Latest: 2025-08-22T15:42:03+00:00
Abstract
This work presents a novel reinforcement learning (RL) algorithm based on
Y-wise Affine Neural Networks (YANNs). YANNs provide an interpretable neural
network which can exactly represent known piecewise affine functions of
arbitrary input and output dimensions defined on any number of polytopic
subdomains. One representative application of YANNs is to reformulate explicit
solutions of multi-parametric linear model predictive control. Built on this,
we propose the use of YANNs to initialize RL actor and critic networks, which
enables the resulting YANN-RL control algorithm to start with the confidence of
linear optimal control. The YANN-actor is initialized by representing the
multi-parametric control solutions obtained via offline computation using an
approximated linear system model. The YANN-critic represents the explicit form
of the state-action value function for the linear system and the reward
function as the objective in an optimal control problem (OCP). Additional
network layers are injected to extend YANNs for nonlinear expressions, which
can be trained online by directly interacting with the true complex nonlinear
system. In this way, both the policy and state-value functions exactly
represent a linear OCP initially and are able to eventually learn the solution
of a general nonlinear OCP. Continuous policy improvement is also implemented
to provide heuristic confidence that the linear OCP solution serves as an
effective lower bound to the performance of RL policy. The YANN-RL algorithm is
demonstrated on a clipped pendulum and a safety-critical chemical-reactive
system. Our results show that YANN-RL significantly outperforms the modern RL
algorithm using deep deterministic policy gradient, especially when considering
safety constraints.
中文标题/摘要
标题:基于强化学习的Y-wise仿射神经网络控制(YANNs)
本研究提出了一种基于Y-wise仿射神经网络(YANNs)的新型强化学习(RL)算法。YANNs提供了一种可解释的神经网络,能精确表示定义于任意多面体子域上的多输入输出维度分段仿射函数。其典型应用在于重构多参数线性模型预测控制的显式解。基于此,我们提出使用YANNs初始化RL执行器与评判器网络,使YANN-RL控制算法具备线性最优控制的初始置信度。YANN执行器通过近似线性系统模型的离线计算获得多参数控制解进行初始化,而YANN评判器则表征线性系统状态-动作值函数及最优控制问题(OCP)中目标奖励函数的显式形式。通过注入额外网络层扩展YANNs的非线性表达能力,这些层可通过与真实复杂非线性系统的直接交互进行在线训练。如此,策略函数和状态值函数既能初始精确表征线性OCP,又能最终学习通用非线性OCP的解。持续策略改进机制确保了线性OCP解作为RL策略性能有效下界的启发式置信度。该算法在剪切摆系统和安全关键型化学反应系统上的实验表明,YANN-RL显著优于采用深度确定性策略梯度的现代RL算法,尤其在考虑安全约束时。
Summary / 总结
This work presents a novel reinforcement learning (RL) algorithm based on Y-wise Affine Neural Networks (YANNs).
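The object that YANNs are said to represent exactly is a piecewise affine function on polytopic subdomains, the same form taken by explicit multi-parametric MPC solutions. The sketch below evaluates such a function by direct region lookup. It is not a YANN and says nothing about the network construction; the `(A, b, F, g)` region encoding and the function name are assumptions made for illustration.

```python
def evaluate_pwa(x, regions, tol=1e-9):
    """Evaluate a piecewise affine map at the point x.

    regions: list of (A, b, F, g) tuples. Each region is the polytope
    {x : A x <= b}, and the map on that region is x -> F x + g.
    Returns the output vector, or None if x lies in no region.
    """
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]

    for A, b, F, g in regions:
        # x belongs to the region if every inequality holds (up to tol).
        if all(lhs <= rhs + tol for lhs, rhs in zip(matvec(A, x), b)):
            return [fx + gi for fx, gi in zip(matvec(F, x), g)]
    return None


# Toy 1-D control law u = clip(-x, -1, 1), written as three affine pieces.
saturated_feedback = [
    ([[1.0]], [-1.0], [[0.0]], [1.0]),               # x <= -1  ->  u = 1
    ([[1.0], [-1.0]], [1.0, 1.0], [[-1.0]], [0.0]),  # -1 <= x <= 1  ->  u = -x
    ([[-1.0]], [-1.0], [[0.0]], [-1.0]),             # x >= 1  ->  u = -1
]
```

Calling `evaluate_pwa([0.5], saturated_feedback)` selects the middle region and returns `[-0.5]`. The linear scan over regions is the simplest lookup strategy and is used here only for clarity.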
Arbitrary-Scale 3D Gaussian Super-Resolution
Authors: Huimin Zeng, Yue Bai, Yun Fu
First: 2025-08-22T15:33:48+00:00 · Latest: 2025-08-22T15:33:48+00:00
Abstract
Existing 3D Gaussian Splatting (3DGS) super-resolution methods typically
perform high-resolution (HR) rendering at fixed scale factors, making them
impractical for resource-limited scenarios. Directly rendering arbitrary-scale
HR views with vanilla 3DGS introduces aliasing artifacts due to the lack of
scale-aware rendering ability, while adding a post-processing upsampler for
3DGS complicates the framework and reduces rendering efficiency. To tackle
these issues, we build an integrated framework that incorporates scale-aware
rendering, generative prior-guided optimization, and progressive
super-resolving to enable 3D Gaussian super-resolution of arbitrary scale
factors with a single 3D model. Notably, our approach supports both integer and
non-integer scale rendering to provide more flexibility. Extensive experiments
demonstrate the effectiveness of our model in rendering high-quality
arbitrary-scale HR views (6.59 dB PSNR gain over 3DGS) with a single model. It
preserves structural consistency with LR views and across different scales,
while maintaining real-time rendering speed (85 FPS at 1080p).
中文标题/摘要
标题:任意尺度三维高斯超分辨率
现有三维高斯泼溅(3DGS)超分辨率方法通常仅支持固定尺度因子的高分辨率(HR)渲染,在资源受限场景中实用性不足。直接使用原始3DGS进行任意尺度HR渲染会因缺乏尺度感知能力而产生混叠伪影,而添加后处理上采样器则会增加框架复杂度并降低渲染效率。为此,我们构建了一个集成框架,融合尺度感知渲染、生成先验引导优化和渐进式超分辨率技术,实现单一三维模型支持任意尺度因子的高斯超分辨率。该方法同时支持整数和非整数尺度渲染,具有更高灵活性。大量实验证明,我们的模型能以单一模型实现高质量任意尺度HR视图渲染(较3DGS提升6.59 dB PSNR),在保持与低分辨率视图结构一致性和跨尺度一致性的同时,维持实时渲染速度(1080p分辨率下85 FPS)。
Summary / 总结
Existing 3D Gaussian Splatting (3DGS) super-resolution methods typically perform high-resolution (HR) rendering of fixed scale factors, making them impractical for resource-limited scenarios.