A Disease-Centric Vision-Language Foundation Model for Precision Oncology in Kidney Cancer
Authors: Yuhui Tao, Zhongwei Zhao, Zilong Wang, Xufang Luo, Feng Chen, Kang Wang, Chuanfu Wu, Xue Zhang, Shaoting Zhang, Jiaxi Yao, Xingwei Jin, Xinyang Jiang, Yifan Yang, Dongsheng Li, Lili Qiu, Zhiqiang Shao, Jianming Guo, Nengwang Yu, Shuo Wang, Ying Xiong
First: 2025-08-22T17:48:19+00:00 · Latest: 2025-08-22T17:48:19+00:00
Abstract
The non-invasive assessment of increasingly incidentally discovered renal
masses is a critical challenge in urologic oncology, where diagnostic
uncertainty frequently leads to the overtreatment of benign or indolent tumors.
In this study, we developed and validated RenalCLIP, a vision-language
foundation model for the characterization, diagnosis, and prognosis of renal
masses, using a dataset of 27,866 CT scans from 8,809 patients across nine
Chinese medical centers and the public TCIA cohort. The model was developed via a two-stage
pre-training strategy that first enhances the image and text encoders with
domain-specific knowledge before aligning them through a contrastive learning
objective, to create robust representations for superior generalization and
diagnostic precision. RenalCLIP achieved better performance and superior
generalizability across 10 core tasks spanning the full clinical workflow of
kidney cancer, including anatomical assessment, diagnostic classification, and
survival prediction, compared with other state-of-the-art general-purpose CT
foundation models. Notably, for complex tasks such as recurrence-free
survival prediction in the TCIA cohort, RenalCLIP achieved a C-index of 0.726,
representing a substantial improvement of approximately 20% over the leading
baselines. Furthermore, RenalCLIP's pre-training imparted remarkable data
efficiency; in the diagnostic classification task, it needs only 20% of the
training data to match the peak performance of all baseline models, even after they
were fully fine-tuned on 100% of the data. Additionally, it achieved superior
performance in report generation, image-text retrieval and zero-shot diagnosis
tasks. Our findings establish that RenalCLIP provides a robust tool with the
potential to enhance diagnostic accuracy, refine prognostic stratification, and
personalize the management of patients with kidney cancer.
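The alignment stage mentioned in the abstract can be pictured with a CLIP-style symmetric contrastive loss. The block below is a minimal NumPy sketch under stated assumptions: the abstract only says "contrastive learning objective", so the InfoNCE form, the function name, the temperature value, and the random stand-in embeddings are all illustrative and not taken from the paper.

```python
import numpy as np

def symmetric_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """CLIP-style loss for a batch of paired (image, text) embeddings.

    Row i of image_emb is assumed to be paired with row i of text_emb.
    The temperature is a hypothetical default, not a value from the paper.
    """
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (batch, batch) scaled cosine similarities

    def diagonal_cross_entropy(scores):
        # Softmax cross-entropy where the correct class of row i is column i.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))

# Random stand-ins for encoder outputs (illustrative only).
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
loss_aligned = symmetric_contrastive_loss(emb, emb)           # perfectly paired
loss_mismatched = symmetric_contrastive_loss(emb, emb[::-1])  # pairs scrambled
```

Correctly paired embeddings should give a much lower loss than scrambled pairs, which is the signal such an objective would use to pull matching CT and report representations together.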
中文标题/摘要
标题:面向肾癌精准肿瘤学的疾病中心化视觉-语言基础模型
对日益多发的偶发性肾占位进行无创评估是泌尿肿瘤学的关键挑战,诊断不确定性常导致良性或惰性肿瘤的过度治疗。本研究利用来自中国九家医疗中心和公共TCIA队列的8,809名患者的27,866次CT扫描数据集,开发并验证了RenalCLIP——一个用于肾占位表征、诊断和预后的视觉-语言基础模型。该模型通过两阶段预训练策略开发:首先用领域特定知识增强图像和文本编码器,再通过对比学习目标对齐它们,以创建具有卓越泛化能力和诊断精度的鲁棒表征。与其它最先进的通用CT基础模型相比,RenalCLIP在涵盖肾癌全临床工作流的10项核心任务(包括解剖评估、诊断分类和生存预测)中表现出更优的性能和泛化能力。尤其在TCIA队列中无复发生存预测这类复杂任务上,RenalCLIP取得了0.726的C指数,较领先基线提升约20%。此外,RenalCLIP的预训练赋予其显著的数据效率:在诊断分类任务中,仅需20%训练数据即可达到所有基线模型使用100%数据充分微调后的峰值性能。该模型在报告生成、图文检索和零样本诊断任务中也实现了卓越性能。我们的研究证明,RenalCLIP为提升肾癌诊断准确性、优化预后分层和实现个体化诊疗提供了强有力的工具。
Summary / 总结
The non-invasive assessment of increasingly incidentally discovered renal masses is a critical challenge in urologic oncology, where diagnostic uncertainty frequently leads to the overtreatment of benign or indolent tumors.
Closer to Reality: Practical Semi-Supervised Federated Learning for Foundation Model Adaptation
Authors: Guangyu Sun, Jingtao Li, Weiming Zhuang, Chen Chen, Chen Chen, Lingjuan Lyu
First: 2025-08-22T17:47:02+00:00 · Latest: 2025-08-22T17:47:02+00:00
Abstract
Foundation models (FMs) exhibit remarkable generalization but require
adaptation to downstream tasks, particularly in privacy-sensitive applications.
Due to data privacy regulations, cloud-based FMs cannot directly access private
edge data, limiting their adaptation. Federated learning (FL) provides a
privacy-aware alternative, but existing FL approaches overlook the constraints
imposed by edge devices -- namely, limited computational resources and the
scarcity of labeled data. To address these challenges, we introduce Practical
Semi-Supervised Federated Learning (PSSFL), where edge devices hold only
unlabeled, low-resolution data, while the server has limited labeled,
high-resolution data. In this setting, we propose the Federated Mixture of
Experts (FedMox), a novel framework that enhances FM adaptation in FL. FedMox
tackles computational and resolution mismatch challenges via a sparse
Mixture-of-Experts architecture, employing a spatial router to align features
across resolutions and a Soft-Mixture strategy to stabilize semi-supervised
learning. We take object detection as a case study, and experiments on
real-world autonomous driving datasets demonstrate that FedMox effectively
adapts FMs under PSSFL, significantly improving performance with constrained
memory costs on edge devices. Our work paves the way for scalable and
privacy-preserving FM adaptation in federated scenarios.
中文标题/摘要
标题:更贴近现实:面向基础模型适配的实用半监督联邦学习
基础模型(FMs)展现出卓越的泛化能力,但需适配下游任务,尤其在隐私敏感应用中。受数据隐私法规限制,云端FMs无法直接访问私有边缘数据,制约了其适配能力。联邦学习(FL)提供了隐私保护的替代方案,但现有FL方法忽视了边缘设备的两大约束——有限的计算资源和标注数据稀缺。为此,我们提出实用半监督联邦学习(PSSFL),其中边缘设备仅持有未标注的低分辨率数据,而服务器拥有有限标注的高分辨率数据。在此设定下,我们创新性地提出联邦专家混合框架(FedMox),通过稀疏专家混合架构应对计算与分辨率失配挑战:采用空间路由器实现跨分辨率特征对齐,运用软混合策略稳定半监督学习。以目标检测为案例研究,在真实自动驾驶数据集上的实验表明,FedMox能在PSSFL下有效适配FMs,在边缘设备有限内存成本约束下显著提升性能。本研究为联邦场景中可扩展且隐私保护的FM适配开辟了新途径。
Summary / 总结
Foundation models (FMs) exhibit remarkable generalization but require adaptation to downstream tasks, particularly in privacy-sensitive applications.
Explicit Correspondence Matching for Generalizable Neural Radiance Fields
Authors: Yuedong Chen, Haofei Xu, Qianyi Wu, Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai
Venue: IEEE Transactions on Pattern Analysis and Machine Intelligence,
2025
First: 2023-04-24T17:46:01+00:00 · Latest: 2025-08-22T17:46:35+00:00
Comments: TPAMI 2025, Project page: https://donydchen.github.io/matchnerf,
Code: https://github.com/donydchen/matchnerf
Abstract
We present a new generalizable NeRF method that is able to directly
generalize to new unseen scenarios and perform novel view synthesis with as few
as two source views. The key to our approach lies in the explicitly modeled
correspondence matching information, so as to provide the geometry prior to the
prediction of NeRF color and density for volume rendering. The explicit
correspondence matching is quantified with the cosine similarity between image
features sampled at the 2D projections of a 3D point on different views, which
is able to provide reliable cues about the surface geometry. Unlike previous
methods where image features are extracted independently for each view, we
consider modeling the cross-view interactions via Transformer cross-attention,
which greatly improves the feature matching quality. Our method achieves
state-of-the-art results on different evaluation settings, with the experiments
showing a strong correlation between our learned cosine feature similarity and
volume density, demonstrating the effectiveness and superiority of our proposed
method. The code and model are on our project page:
https://donydchen.github.io/matchnerf
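The matching cue described above (cosine similarity between features sampled at the 2D projections of a 3D point) can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes a pinhole camera, uses nearest-neighbour sampling where a real system would likely interpolate, and all names (`project`, `sample_feature`, `matching_score`) and the toy camera are hypothetical.

```python
import numpy as np

def project(point_3d, K, R, t):
    """Project a world-space 3D point to pixel coordinates (x, y).

    Assumes the point lies in front of the camera (positive depth).
    """
    cam = R @ point_3d + t
    uvw = K @ cam
    return uvw[:2] / uvw[2]

def sample_feature(feature_map, xy):
    """Nearest-neighbour lookup in an (H, W, C) feature map."""
    h, w, _ = feature_map.shape
    x = int(np.clip(np.rint(xy[0]), 0, w - 1))
    y = int(np.clip(np.rint(xy[1]), 0, h - 1))
    return feature_map[y, x]

def matching_score(point_3d, view_a, view_b):
    """Cosine similarity of the features seen at the point's two projections."""
    fa = sample_feature(view_a["features"],
                        project(point_3d, view_a["K"], view_a["R"], view_a["t"]))
    fb = sample_feature(view_b["features"],
                        project(point_3d, view_b["K"], view_b["R"], view_b["t"]))
    return float(fa @ fb / (np.linalg.norm(fa) * np.linalg.norm(fb) + 1e-8))

# Toy setup: one camera and two random feature maps (illustrative only).
rng = np.random.default_rng(0)
K = np.array([[10.0, 0.0, 4.0], [0.0, 10.0, 4.0], [0.0, 0.0, 1.0]])
camera = {"K": K, "R": np.eye(3), "t": np.zeros(3)}
view_a = dict(camera, features=rng.normal(size=(8, 8, 16)))
view_b_same = dict(camera, features=view_a["features"])
view_b_other = dict(camera, features=rng.normal(size=(8, 8, 16)))

point = np.array([0.0, 0.0, 2.0])
score_same = matching_score(point, view_a, view_b_same)
score_other = matching_score(point, view_a, view_b_other)
```

A score near 1 suggests the two views agree at that point, which is the kind of evidence for a surface that the abstract says correlates with volume density.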
中文标题/摘要
标题:显式对应匹配实现可泛化神经辐射场
我们提出了一种新颖的可泛化NeRF方法,能够直接适应新的未见场景,并仅需两个源视图即可进行新颖视角合成。该方法的核心在于显式建模的对应匹配信息,为体积渲染中NeRF颜色和密度预测提供几何先验。通过计算三维点在不同视图二维投影上采样的图像特征间余弦相似度来量化显式对应匹配,这能为表面几何提供可靠线索。与以往各视图独立提取特征的方法不同,我们通过Transformer交叉注意力机制建模跨视图交互,显著提升了特征匹配质量。该方法在不同评估设置下均取得最先进成果,实验显示学习的余弦特征相似度与体积密度存在强相关性,证明了所提方法的有效性和优越性。代码与模型详见项目页面:https://donydchen.github.io/matchnerf
Summary / 总结
We present a new generalizable NeRF method that is able to directly generalize to new unseen scenarios and perform novel view synthesis with as few as two source views.
Establishing Task Scaling Laws via Compute-Efficient Model Ladders
Authors: Akshita Bhagia, Jiacheng Liu, Alexander Wettig, David Heineman, Oyvind Tafjord, Ananya Harsh Jha, Luca Soldaini, Noah A. Smith, Dirk Groeneveld, Pang Wei Koh, Jesse Dodge, Hannaneh Hajishirzi
First: 2024-12-05T18:21:49+00:00 · Latest: 2025-08-22T17:27:09+00:00
Comments: COLM 2025
Abstract
We develop task scaling laws and model ladders to predict the individual task
performance of pretrained language models (LMs) in the overtrained setting.
Standard power laws for language modeling loss cannot accurately model task
performance. Therefore, we leverage a two-step prediction approach: (1) use
model and data size to predict an intermediate loss, then (2) use it to predict
task performance. We train a set of small-scale "ladder" models, collect data
points to fit the parameterized functions of the two prediction steps, and make
predictions for two target models: a 7B model trained to 4T tokens and a 13B
model trained to 5T tokens. Training the ladder models only costs 1% of the
compute used for the target models. On four multiple-choice tasks formatted as
ranked classification, we can predict the accuracy of both target models within
2 points of absolute error. We find that tasks with higher prediction error
also have higher variance in the metrics over model checkpoints. We also
contrast multiple design choices for predicting accuracy, and present
recommendations for extending our method to new models and tasks.
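The two-step prediction can be sketched as a chain of two parameterized functions. The functional forms below (a power law in model and data size for the intermediate loss, then a sigmoid from loss to accuracy) are assumptions chosen for illustration, and every coefficient is a made-up placeholder rather than a fitted value from the paper.

```python
import math

def predict_task_loss(n_params, n_tokens, A, alpha, B, beta, E):
    """Step 1: map model size N and data size D to an intermediate loss."""
    return A / n_params**alpha + B / n_tokens**beta + E

def predict_accuracy(task_loss, floor, ceiling, steepness, midpoint):
    """Step 2: map loss to accuracy with a sigmoid that falls as loss rises."""
    return floor + (ceiling - floor) / (1.0 + math.exp(steepness * (task_loss - midpoint)))

# Hypothetical coefficients; in practice both sets would be fitted on
# measurements from the small "ladder" models.
coeffs = dict(A=400.0, alpha=0.3, B=400.0, beta=0.3, E=0.5)
link = dict(floor=0.25, ceiling=0.9, steepness=5.0, midpoint=1.2)

# The two target configurations named in the abstract.
loss_7b = predict_task_loss(7e9, 4e12, **coeffs)
loss_13b = predict_task_loss(13e9, 5e12, **coeffs)
acc_7b = predict_accuracy(loss_7b, **link)
acc_13b = predict_accuracy(loss_13b, **link)
```

Chaining through an intermediate loss keeps each step simple to fit; the numbers produced here only demonstrate the mechanics and say nothing about the real models.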
中文标题/摘要
标题:通过计算高效模型阶梯建立任务缩放定律
我们开发了任务缩放定律和模型阶梯,用于预测预训练语言模型(LMs)在过度训练设置下的个体任务表现。标准的语言建模损失幂律无法准确建模任务性能,因此采用两步预测方法:(1)利用模型和数据规模预测中间损失,(2)用其预测任务性能。通过训练一组小规模“阶梯”模型收集数据点,拟合两个预测步骤的参数化函数,并对两个目标模型(训练至4T词元的7B模型和训练至5T词元的13B模型)进行预测。阶梯模型的训练成本仅占目标模型计算量的1%。在四个格式化为排序分类的多选题任务上,我们能将两个目标模型的准确率预测误差控制在2个百分点内。发现预测误差较高的任务在模型检查点指标上也具有更高方差。同时对比了多种准确率预测的设计方案,并提出将方法扩展到新模型和任务的建议。
Summary / 总结
We develop task scaling laws and model ladders to predict the individual task performance of pretrained language models (LMs) in the overtrained setting.
A Curious Case of Remarkable Resilience to Gradient Attacks via Fully Convolutional and Differentiable Front End with a Skip Connection
Authors: Leonid Boytsov, Ameya Joshi, Filipe Condessa
First: 2024-02-26T20:55:47+00:00 · Latest: 2025-08-22T17:26:54+00:00
Comments: Accepted at TMLR (2025/08)
Abstract
We experimented with front-end enhanced neural models where a differentiable
and fully convolutional model with a skip connection is added before a frozen
backbone classifier. By training such composite models using a small learning
rate for about one epoch, we obtained models that retained the accuracy of the
backbone classifier while being unusually resistant to gradient attacks
(including APGD and FAB-T attacks from the AutoAttack package), which we
attribute to gradient masking. Although gradient masking is not new, the degree
we observe is striking for fully differentiable models without obvious
gradient-shattering (e.g., JPEG compression) or gradient-diminishing components.
The training recipe to produce such models is also remarkably stable and
reproducible: We applied it to three datasets (CIFAR10, CIFAR100, and ImageNet)
and several modern architectures (including vision Transformers) without a
single failure case. While black-box attacks such as the SQUARE attack and
zero-order PGD can partially overcome gradient masking, these attacks are
easily defeated by simple randomized ensembles. We estimate that these
ensembles achieve near-SOTA AutoAttack accuracy on CIFAR10, CIFAR100, and
ImageNet (while retaining almost all clean accuracy of the original
classifiers) despite having near-zero accuracy under adaptive attacks.
Adversarially training the backbone further amplifies this front-end
"robustness". On CIFAR10, the respective randomized ensemble achieved
$90.8 \pm 2.5\%$ (99\% CI) accuracy under the full AutoAttack while having only
$18.2 \pm 3.6\%$ accuracy under the adaptive attack ($\varepsilon=8/255$, $L^\infty$
norm). We conclude the paper with a discussion of whether randomized ensembling
can serve as a practical defense.
Code and instructions to reproduce key results are available at
https://github.com/searchivarius/curious_case_of_gradient_masking
中文标题/摘要
标题:一种通过全卷积可微分前端与跳跃连接展现对梯度攻击显著鲁棒性的奇特案例
我们实验了前端增强的神经模型,在冻结的主干分类器前添加了带有跳跃连接的可微分全卷积模型。通过以较小学习率训练约一个周期,获得的模型在保持主干分类器精度的同时,对梯度攻击(包括AutoAttack包中的APGD和FAB-T攻击)表现出异常抵抗力,这归因于梯度掩蔽。虽然梯度掩蔽并非新现象,但在完全可微分且无梯度破碎(如JPEG压缩)或梯度衰减组件的模型中观察到的程度令人惊讶。该训练方法稳定且可复现:应用于三个数据集(CIFAR10、CIFAR100和ImageNet)及多种现代架构(包括视觉Transformer)均未出现失败案例。虽然黑盒攻击(如SQUARE攻击和零阶PGD)可部分克服梯度掩蔽,但简单随机集成能轻松抵御这些攻击。我们估计这些集成在CIFAR10、CIFAR100和ImageNet上实现了接近SOTA的AutoAttack精度(同时保持原分类器几乎所有清洁精度),尽管在自适应攻击下精度近乎为零。对抗训练主干网络进一步增强了前端“鲁棒性”。在CIFAR10上,相应随机集成在完整AutoAttack下达到90.8±2.5%(99%置信区间)精度,而在自适应攻击下仅18.2±3.6%精度(ε=8/255,L∞范数)。最后我们讨论了随机集成能否作为实用防御策略。代码与复现指南详见:https://github.com/searchivarius/curious_case_of_gradient_masking
Summary / 总结
We experimented with front-end enhanced neural models where a differentiable and fully convolutional model with a skip connection is added before a frozen backbone classifier.
Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders
Authors: David Chanin, Adrià Garriga-Alonso
First: 2025-08-22T17:26:33+00:00 · Latest: 2025-08-22T17:26:33+00:00
Abstract
Sparse Autoencoders (SAEs) extract features from LLM internal activations,
meant to correspond to single concepts. A core SAE training hyperparameter is
L0: how many features should fire per token on average. Existing work compares
SAE algorithms using sparsity--reconstruction tradeoff plots, implying L0 is a
free parameter with no single correct value. In this work we study the effect
of L0 on BatchTopK SAEs, and show that if L0 is not set precisely, the SAE
fails to learn the underlying features of the LLM. If L0 is too low, the SAE
will mix correlated features to improve reconstruction. If L0 is too high, the
SAE finds degenerate solutions that also mix features. Further, we demonstrate
a method to determine the correct L0 value for an SAE on a given training
distribution, which finds the true L0 in toy models and coincides with peak
sparse probing performance in LLMs. We find that most commonly used SAEs have
an L0 that is too low. Our work shows that, to train SAEs with correct
features, practitioners must set L0 correctly.
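To make the role of L0 concrete, the sketch below shows a BatchTopK-style activation, assuming the common formulation in which the top k x batch_size activations are kept across the whole batch, so that the average number of active features per token equals k. The function names and random inputs are illustrative and not taken from the paper.

```python
import numpy as np

def batch_topk(pre_acts, k):
    """Keep the k * batch_size largest activations across the whole batch.

    pre_acts has shape (batch, n_features). Individual tokens may keep more
    or fewer than k features; only the batch average is pinned to k.
    """
    acts = np.maximum(pre_acts, 0.0)  # ReLU first, so kept values are non-negative
    n_keep = k * acts.shape[0]
    flat = acts.ravel()
    if n_keep >= flat.size:
        return acts
    keep_idx = np.argpartition(flat, -n_keep)[-n_keep:]
    sparse = np.zeros_like(flat)
    sparse[keep_idx] = flat[keep_idx]
    return sparse.reshape(acts.shape)

def mean_l0(acts):
    """Average number of active (non-zero) features per token."""
    return float((acts > 0).sum(axis=1).mean())

# Random stand-in for encoder pre-activations (illustrative only).
rng = np.random.default_rng(0)
pre_acts = rng.normal(size=(32, 64))
sparse_acts = batch_topk(pre_acts, k=4)
per_token_l0 = (sparse_acts > 0).sum(axis=1)
```

Here `k` directly sets the L0 that the abstract argues must be chosen correctly; changing it changes how many features are allowed to fire on average.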
中文标题/摘要
标题:稀疏但错误:错误的L0导致稀疏自编码器中的特征提取错误
稀疏自编码器(SAE)从大语言模型的内部激活中提取特征,这些特征本应对应单一概念。SAE训练的核心超参数L0表示每个令牌平均应激活的特征数量。现有研究通过稀疏度-重构权衡图比较SAE算法,暗示L0是可自由调节的参数。本研究探讨了L0对BatchTopK SAE的影响,表明若L0设置不精确,SAE将无法学习大语言模型的基础特征:L0过低会使SAE混合相关特征以改善重构;L0过高则会导致退化解并混合特征。我们进一步提出一种确定特定训练分布下SAE正确L0值的方法,该方法在玩具模型中能找到真实L0值,且与大语言模型中稀疏探测性能峰值吻合。研究发现常用SAE的L0普遍偏低。这项工作表明,要训练出具有正确特征的SAE,必须准确设置L0参数。
Summary / 总结
Sparse Autoencoders (SAEs) extract features from LLM internal activations, meant to correspond to single concepts.
Time-Aware One Step Diffusion Network for Real-World Image Super-Resolution
Authors: Tainyi Zhang, Zheng-Peng Duan, Peng-Tao Jiang, Bo Li, Ming-Ming Cheng, Chun-Le Guo, Chongyi Li
First: 2025-08-22T17:23:49+00:00 · Latest: 2025-08-22T17:23:49+00:00
Abstract
Diffusion-based real-world image super-resolution (Real-ISR) methods have
demonstrated impressive performance. To achieve efficient Real-ISR, many works
employ Variational Score Distillation (VSD) to distill a pre-trained
Stable Diffusion (SD) model for one-step SR with a fixed timestep. However, SD
exhibits different generative priors at different noise-injection timesteps.
A fixed timestep therefore makes it difficult for these methods
to fully leverage the generative priors in SD, leading to suboptimal
performance. To address this, we propose a Time-Aware one-step Diffusion
Network for Real-ISR (TADSR). We first introduce a Time-Aware VAE Encoder,
which projects the same image into different latent features based on
timesteps. Through joint dynamic variation of timesteps and latent features,
the student model can better align with the input pattern distribution of the
pre-trained SD, thereby enabling more effective utilization of SD's generative
capabilities. To better activate the generative prior of SD at different
timesteps, we propose a Time-Aware VSD loss that bridges the timesteps of the
student model and those of the teacher model, thereby producing more consistent
generative prior guidance conditioned on timesteps. Additionally, by
utilizing the generative prior in SD at different timesteps, our method can
naturally achieve controllable trade-offs between fidelity and realism by
changing the timestep condition. Experimental results demonstrate that our
method achieves both state-of-the-art performance and controllable SR results
with only a single step.
中文标题/摘要
标题:面向真实世界图像超分辨率的时序感知单步扩散网络
基于扩散模型的真实图像超分辨率(Real-ISR)方法已展现出卓越性能。为实现高效Real-ISR,许多研究采用变分分数蒸馏(VSD)技术,以固定时间步长蒸馏预训练稳定扩散(SD)模型实现单步超分。但由于不同噪声注入时间步会导致SD生成先验的差异,固定时间步长难以充分利用SD的生成先验,导致性能次优。为此,我们提出时序感知单步扩散网络(TADSR)。首先设计时序感知VAE编码器,根据时间步将同一图像映射为不同潜在特征。通过时间步与潜在特征的联合动态变化,学生模型能更好对齐预训练SD的输入模式分布,从而更有效利用其生成能力。为进一步激活SD在不同时间步的生成先验,提出时序感知VSD损失函数,桥接学生模型与教师模型的时间步,产生更符合时间步条件的生成先验指导。此外,通过利用SD在不同时间步的生成先验,本方法可通过改变时间步条件自然实现保真度与真实感的可控权衡。实验结果表明,我们的方法仅需单步即可同时实现最先进性能和可控超分结果。
Summary / 总结
Diffusion-based real-world image super-resolution (Real-ISR) methods have demonstrated impressive performance.
Transfer Learning via Lexical Relatedness: A Sarcasm and Hate Speech Case Study
Authors: Angelly Cabrera, Linus Lei, Antonio Ortega
First: 2025-08-22T17:23:08+00:00 · Latest: 2025-08-22T17:23:08+00:00
Abstract
Detecting hate speech in non-direct forms, such as irony, sarcasm, and
innuendos, remains a persistent challenge for social networks. Although sarcasm
and hate speech are regarded as distinct expressions, our work explores whether
integrating sarcasm as a pre-training step improves implicit hate speech
detection and, by extension, explicit hate speech detection. Incorporating
samples from ETHOS, Sarcasm on Reddit, and Implicit Hate Corpus, we devised two
training strategies to compare the effectiveness of sarcasm pre-training on a
CNN+LSTM and BERT+BiLSTM model. The first strategy is a single-step training
approach, where a model trained only on sarcasm is then tested on hate speech.
The second strategy uses sequential transfer learning to fine-tune models for
sarcasm, implicit hate, and explicit hate. Our results show that sarcasm
pre-training improved the BERT+BiLSTM's recall by 9.7%, AUC by 7.8%, and
F1-score by 6% on ETHOS. On the Implicit Hate Corpus, precision increased by
7.8% when tested only on implicit samples. By incorporating sarcasm into the
training process, we show that models can more effectively detect both implicit
and explicit hate.
中文标题/摘要
标题:基于词汇相关性的迁移学习:讽刺与仇恨言论案例研究
检测非直接形式的仇恨言论,如反讽、讽刺和影射,仍是社交媒体面临的持续挑战。尽管讽刺与仇恨言论被视为不同的表达方式,本研究探讨将讽刺检测作为预训练步骤是否能提升隐式仇恨言论检测效果,并进而改善显式仇恨言论检测。通过整合ETHOS、Reddit讽刺语料和隐式仇恨语料库的样本,我们设计了两种训练策略来比较CNN+LSTM与BERT+BiLSTM模型中讽刺预训练的效果。第一种是单步训练策略,即仅在讽刺数据上训练的模型直接测试仇恨言论检测;第二种采用序列迁移学习,依次对讽刺、隐式仇恨和显式仇恨进行模型微调。实验结果表明:在ETHOS数据集上,讽刺预训练使BERT+BiLSTM模型的召回率提升9.7%,AUC提高7.8%,F1分数增长6%;在隐式仇恨语料库中,仅测试隐式样本时精确度上升7.8%。研究表明,将讽刺纳入训练过程能有效提升模型对隐性与显性仇恨言论的检测能力。
Summary / 总结
Detecting hate speech in non-direct forms, such as irony, sarcasm, and innuendos, remains a persistent challenge for social networks.
Machine Learning Time Propagators for Time-Dependent Density Functional Theory Simulations
Authors: Karan Shah, Attila Cangi
First: 2025-08-22T17:22:24+00:00 · Latest: 2025-08-22T17:22:24+00:00
Comments: 20 pages, 5 figures
Abstract
Time-dependent density functional theory (TDDFT) is a widely used method to
investigate electron dynamics under external time-dependent perturbations such
as laser fields. In this work, we present a novel approach to accelerate
electron dynamics simulations based on real time TDDFT using autoregressive
neural operators as time-propagators for the electron density. By leveraging
physics-informed constraints and featurization, and high-resolution training
data, our model achieves superior accuracy and computational speed compared to
traditional numerical solvers. We demonstrate the effectiveness of our model on
a class of one-dimensional diatomic molecules under the influence of a range of
laser parameters. This method has potential in enabling real-time, on-the-fly
modeling of laser-irradiated molecules and materials with varying experimental
parameters.
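An autoregressive time-propagator feeds each predicted density back in as the input for the next step. The sketch below illustrates that loop with a toy stand-in for the trained neural operator. The specific constraints applied after each step (non-negativity and conservation of electron number) are assumptions about what "physics-informed constraints" might involve, and all names are hypothetical.

```python
import numpy as np

def rollout(propagator, rho0, laser_field, dx, n_electrons):
    """Autoregressively propagate a 1D electron density over time.

    After each step the density is clipped to be non-negative and rescaled
    so that it integrates to n_electrons (assumed constraints).
    """
    rho = np.asarray(rho0, dtype=float)
    trajectory = [rho]
    for field_t in laser_field:
        rho = propagator(rho, field_t)
        rho = np.clip(rho, 0.0, None)
        rho = rho * (n_electrons / (rho.sum() * dx))
        trajectory.append(rho)
    return np.stack(trajectory)

def toy_propagator(rho, field_t):
    """Stand-in for a trained neural operator: shift and slightly smooth."""
    shifted = np.roll(rho, 1 if field_t > 0 else -1)
    smoothed = 0.9 * shifted + 0.05 * (np.roll(shifted, 1) + np.roll(shifted, -1))
    return smoothed + 0.01 * abs(field_t)

# Toy grid, initial density, and laser pulse (illustrative only).
x = np.linspace(-10.0, 10.0, 201)
dx = x[1] - x[0]
rho0 = np.exp(-x**2)
rho0 *= 2.0 / (rho0.sum() * dx)  # normalize to two electrons
laser = 0.1 * np.sin(np.linspace(0.0, 2.0 * np.pi, 50))
trajectory = rollout(toy_propagator, rho0, laser, dx, n_electrons=2.0)
```

Re-imposing the constraints at every step is one way to keep errors from compounding over a long rollout, which is a general concern for autoregressive surrogates.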
中文标题/摘要
标题:机器学习时间传播子用于含时密度泛函理论模拟
含时密度泛函理论(TDDFT)是研究外场时变扰动(如激光场)下电子动力学的常用方法。本研究提出一种创新方法,通过使用自回归神经算子作为电子密度的时间传播子,加速基于实时TDDFT的电子动力学模拟。通过结合物理约束、特征化处理及高分辨率训练数据,我们的模型在精度和计算速度上均优于传统数值求解器。我们在一维双原子分子体系上验证了该方法在不同激光参数下的有效性,此技术有望实现对实验参数变化的激光辐照分子与材料进行实时动态建模。
Summary / 总结
Time-dependent density functional theory (TDDFT) is a widely used method to investigate electron dynamics under external time-dependent perturbations such as laser fields.
TinyML Towards Industry 4.0: Resource-Efficient Process Monitoring of a Milling Machine
Authors: Tim Langer, Matthias Widra, Volkhard Beyer
First: 2025-08-22T17:21:56+00:00 · Latest: 2025-08-22T17:21:56+00:00
Comments: 10 pages, 5 figures, 1 table
Abstract
In the context of industry 4.0, long-serving industrial machines can be
retrofitted with process monitoring capabilities for future use in a smart
factory. One possible approach is the deployment of wireless monitoring
systems, which can benefit substantially from the TinyML paradigm. This work
presents a complete TinyML flow from dataset generation, to machine learning
model development, up to implementation and evaluation of a full preprocessing
and classification pipeline on a microcontroller. After a short review on
TinyML in industrial process monitoring, the creation of the novel MillingVibes
dataset is described. The feasibility of a TinyML system for
structure-integrated process quality monitoring is shown by the
development of an 8-bit-quantized convolutional neural network (CNN) model with
12.59 KiB of parameter storage. A test accuracy of 100.0% is reached at
15.4 ms inference time and 1.462 mJ per quantized CNN inference on an ARM Cortex
M4F microcontroller, serving as a reference for future TinyML process
monitoring solutions.
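As a rough illustration of what 8-bit quantization does to parameter storage, the sketch below applies symmetric per-tensor int8 quantization to a random weight tensor. The quantization scheme, the tensor shape, and the function names are assumptions; the abstract does not state which scheme or toolchain was used.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization to int8 (an assumed scheme)."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.rint(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate floating-point weights."""
    return q.astype(np.float64) * scale

# Hypothetical convolution kernel: 3 x 3 x 8 x 16 = 1152 weights.
rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 3, 8, 16))
q_weights, scale = quantize_int8(weights)
max_error = float(np.max(np.abs(weights - dequantize(q_weights, scale))))
storage_kib = q_weights.nbytes / 1024  # one byte per parameter
```

Each parameter takes one byte, so the rounding error is bounded by half the scale. By the same arithmetic, 12.59 KiB corresponds to roughly 12,892 one-byte values, assuming the reported figure counts only 8-bit parameters.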
中文标题/摘要
标题:TinyML迈向工业4.0:铣床资源高效型过程监控
在工业4.0背景下,可为长期服役的工业机械加装过程监控功能,以适应未来智能工厂的应用。部署无线监控系统是一种可行方案,其能显著受益于TinyML范式。本研究展示了完整的TinyML流程,涵盖从数据集生成、机器学习模型开发,到在微控制器上实现并评估完整的预处理与分类流水线。在简要回顾工业过程监控中的TinyML应用后,详细介绍了新型MillingVibes数据集的创建过程。通过开发参数量存储仅12.59kiB的8位量化卷积神经网络(CNN)模型,验证了结构集成式过程质量监控的TinyML系统可行性。在ARM Cortex M4F微控制器上实现了100.0%的测试准确率,单次量化CNN推理耗时15.4毫秒、能耗1.462毫焦,为未来TinyML过程监控方案提供了参考基准。
Summary / 总结
In the context of industry 4.0, long-serving industrial machines can be retrofitted with process monitoring capabilities for future use in a smart factory.
Enhanced NIRMAL Optimizer With Damped Nesterov Acceleration: A Comparative Analysis
Authors: Nirmal Gaud, Prasad Krishna Murthy, Mostaque Md. Morshedur Hassan, Abhijit Ganguly, Vinay Mali, Ms Lalita Bhagwat Randive, Abhaypratap Singh
First: 2025-08-22T17:16:06+00:00 · Latest: 2025-08-22T17:16:06+00:00
Comments: 7 pages, 1 figure, 1 table. arXiv admin note: substantial text
overlap with arXiv:2508.04293
Abstract
This study introduces the Enhanced NIRMAL (Novel Integrated Robust
Multi-Adaptation Learning with Damped Nesterov Acceleration) optimizer, an
improved version of the original NIRMAL optimizer. By incorporating an
$(\alpha, r)$-damped Nesterov acceleration mechanism, Enhanced NIRMAL improves
convergence stability while retaining chess-inspired strategies of gradient
descent, momentum, stochastic perturbations, adaptive learning rates, and
non-linear transformations.
We evaluate Enhanced NIRMAL against Adam, SGD with Momentum, Nesterov, and
the original NIRMAL on four benchmark image classification datasets: MNIST,
FashionMNIST, CIFAR-10, and CIFAR-100, using tailored convolutional neural
network (CNN) architectures.
Enhanced NIRMAL achieves a test accuracy of 46.06\% and the lowest test loss
(1.960435) on CIFAR-100, surpassing the original NIRMAL (44.34\% accuracy) and
closely rivaling SGD with Momentum (46.43\% accuracy). These results underscore
Enhanced NIRMAL's superior generalization and stability, particularly on
complex datasets.
中文标题/摘要
标题:增强型NIRMAL优化器与阻尼Nesterov加速的对比分析
本研究介绍了增强型NIRMAL(新型集成鲁棒多适应学习与阻尼Nesterov加速)优化器,作为原始NIRMAL优化器的改进版本。通过引入(α, r)-阻尼Nesterov加速机制,增强型NIRMAL在保留梯度下降、动量、随机扰动、自适应学习率和非线性变换等棋类启发的策略的同时,提升了收敛稳定性。我们在MNIST、FashionMNIST、CIFAR-10和CIFAR-100四个基准图像分类数据集上,采用定制卷积神经网络(CNN)架构,将增强型NIRMAL与Adam、带动量的SGD、Nesterov及原始NIRMAL进行对比评估。增强型NIRMAL在CIFAR-100上取得了46.06%的测试准确率和最低测试损失(1.960435),超越了原始NIRMAL(44.34%准确率),并与带动量的SGD(46.43%准确率)表现接近。这些结果凸显了增强型NIRMAL在复杂数据集上卓越的泛化能力和稳定性。
Summary / 总结
This study introduces the Enhanced NIRMAL (Novel Integrated Robust Multi-Adaptation Learning with Damped Nesterov Acceleration) optimizer, an improved version of the original NIRMAL optimizer.
RL Is Neither a Panacea Nor a Mirage: Understanding Supervised vs. Reinforcement Learning Fine-Tuning for LLMs
Authors: Hangzhan Jin, Sicheng Lv, Sifan Wu, Mohammad Hamdaqa
First: 2025-08-22T17:10:37+00:00 · Latest: 2025-08-22T17:10:37+00:00
Abstract
Training large language models (LLMs) from scratch is increasingly
impractical, making post-training methods such as supervised fine-tuning (SFT)
and reinforcement-learning fine-tuning (RL-FT, e.g., PPO) central to modern
practice. Using an out-of-distribution (OOD) variant of the 24-point card game
and new spectrum-based diagnostics, we revisit how these two stages reshape
model representation and OOD performance. Our key findings are: (1) RL-FT can
restore much of the OOD performance loss from SFT (e.g., Llama-11B 8.97% to
15.38%, Qwen-7B 17.09% to 19.66%). But when SFT induces severe overfitting and
a clear distribution shift, RL-FT cannot fully recover OOD performance. (2)
Direction shifts of singular vectors matter more than singular value
magnitudes. These shifts concentrate on directions linked to the largest and
smallest singular values, leaving the bulk spectrum intact. (3) Low-rank and
shallow recovery is effective: restoring singular vector directions for the top
20% of values or first 25% of layers recovers 70-80% of OOD performance. (4)
Stronger SFT checkpoints enable better recovery by RL, while overfitted ones
resist restoration. These results reconcile prior reports of RL's superior OOD
performance: RL primarily counteracts SFT-induced directional drift rather than
finding new solutions. Our spectrum-aware analysis highlights inexpensive
recovery knobs (low-rank UV merging and shallow-layer resets) that practitioners
can use before costly RL fine-tuning.
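One possible reading of "restoring singular vector directions for the top 20% of values" is sketched below: take the SVD of a base and a fine-tuned weight matrix, swap the leading singular vectors of the fine-tuned matrix for those of the base, and keep the fine-tuned singular values. This is an interpretation for illustration only; the function name, the random matrices, and the exact merging rule are assumptions, and the paper's procedure may differ.

```python
import numpy as np

def restore_top_directions(w_base, w_tuned, frac=0.2):
    """Replace the top-`frac` singular directions of w_tuned with w_base's.

    Singular values of the fine-tuned matrix are kept; only the leading left
    and right singular vectors are swapped. The mixed factors are in general
    no longer exactly orthonormal, which this sketch does not correct.
    """
    u0, _, vt0 = np.linalg.svd(w_base, full_matrices=False)
    u1, s1, vt1 = np.linalg.svd(w_tuned, full_matrices=False)
    r = max(1, int(round(frac * len(s1))))
    u, vt = u1.copy(), vt1.copy()
    u[:, :r] = u0[:, :r]
    vt[:r, :] = vt0[:r, :]
    return (u * s1) @ vt  # equivalent to u @ diag(s1) @ vt

# Random stand-ins for a layer before and after fine-tuning (illustrative only).
rng = np.random.default_rng(0)
w_base = rng.normal(size=(12, 8))
w_tuned = w_base + 0.5 * rng.normal(size=(12, 8))

restored = restore_top_directions(w_base, w_tuned, frac=0.25)
unchanged = restore_top_directions(w_base, w_base, frac=0.25)
full_swap = restore_top_directions(w_base, w_tuned, frac=1.0)
```

With `frac=1.0` the result has the base model's directions and the fine-tuned model's singular values, which isolates the direction-versus-magnitude distinction the abstract draws.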
中文标题/摘要
标题:强化学习既非万能亦非幻影:理解监督与强化学习在大型语言模型微调中的作用
从头训练大型语言模型(LLMs)日益不切实际,使得监督微调(SFT)和强化学习微调(RL-FT,如PPO)成为现代实践的核心。通过采用24点卡牌的分布外(OOD)变体及新型频谱诊断方法,我们重新审视这两个阶段如何重塑模型表示与OOD性能。主要发现包括:(1)RL-FT可大幅恢复SFT导致的OOD性能损失(如Llama-11B从8.97%升至15.38%,Qwen-7B从17.09%升至19.66%),但当SFT引发严重过拟合和明显分布偏移时,RL-FT无法完全恢复;(2)奇异向量方向偏移比奇异值幅度更重要,这种偏移集中在最大和最小奇异值方向,而主体频谱保持稳定;(3)低秩浅层恢复有效:恢复前20%奇异值或前25%层的向量方向可重建70-80%的OOD性能;(4)强SFT检查点更易被RL修复,而过拟合检查点难以恢复。这些发现调和了先前关于RL优越OOD性能的报告:RL主要抵消SFT引发的方向漂移而非寻找新解决方案。我们的频谱感知分析揭示了低成本恢复手段——低秩UV合并和浅层重置,可供实践者在昂贵RL微调前采用。
Summary / 总结
Training large language models (LLMs) from scratch is increasingly impractical, making post-training methods such as supervised fine-tuning (SFT) and reinforcement-learning fine-tuning (RL-FT, e.g., PPO) central to modern practice.
Parameter-Free Logit Distillation via Sorting Mechanism
Authors: Stephen Ekaputra Limantoro
First: 2025-08-22T17:09:38+00:00 · Latest: 2025-08-22T17:09:38+00:00
Comments: Accepted in IEEE Signal Processing Letters 2025
Abstract
Knowledge distillation (KD) aims to transfer knowledge from a (larger) teacher
model to a (smaller) student model via soft labels, yielding an efficient neural
network. In general, the performance of a model is determined by accuracy,
which is measured with labels. However, existing KD approaches usually use the
teacher's original output distribution, neglecting the possibility that the
teacher's prediction is incorrect. This may contradict the motivation of
hard-label learning through cross-entropy loss and may lead to sub-optimal knowledge distillation on
certain samples. To address this issue, we propose a novel logit processing
scheme via a sorting mechanism. Specifically, our method has a two-fold goal:
(1) fixing the incorrect prediction of the teacher based on the labels and (2)
reordering the distribution in a natural way according to priority rank at
once. As an easy-to-use, plug-and-play pre-processing, our sort method can be
effectively applied to existing logit-based KD methods. Extensive experiments
on the CIFAR-100 and ImageNet datasets demonstrate the effectiveness of our
method.
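The two goals above can be illustrated with a simple sorting rule: give the ground-truth class the largest teacher logit and hand the remaining sorted values to the other classes in their original rank order. The abstract does not spell out the exact rule, so this is a hypothetical sketch of a sorting-based correction and not the author's algorithm; the function name is made up.

```python
import numpy as np

def sort_correct_logits(logits, label):
    """Reassign sorted teacher logits so the labeled class ranks first.

    The multiset of logit values is unchanged; the relative order of the
    non-target classes is preserved.
    """
    order = np.argsort(-logits, kind="stable")  # class indices, best first
    values = logits[order]                      # logit values, descending
    new_order = np.concatenate(([label], order[order != label]))
    corrected = np.empty_like(logits)
    corrected[new_order] = values
    return corrected

# Teacher wrongly prefers class 1; the label is class 3.
teacher_logits = np.array([2.0, 5.0, 1.0, 3.0])
corrected = sort_correct_logits(teacher_logits, label=3)
# Teacher already correct: the logits should come back unchanged.
already_right = sort_correct_logits(teacher_logits, label=1)
```

Because the rule has no tunable parameters and only touches the teacher's logits, it could be applied as a pre-processing step before any logit-based distillation loss.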
中文标题/摘要
标题:基于排序机制的无参数Logit蒸馏
知识蒸馏(KD)旨在通过软标签将教师(较大)模型的知识提炼给学生(较小)模型,以实现高效的神经网络。通常,模型性能由基于标签的准确率决定。然而,现有KD方法多直接采用教师模型的原始分布,忽略了错误预测的潜在影响。这可能与通过交叉熵损失进行硬标签学习的动机相矛盾,导致在某些样本上产生次优的知识蒸馏效果。为解决此问题,我们提出了一种通过排序机制的新型logit处理方案。具体而言,我们的方法具有双重目标:(1)基于标签修正教师模型的错误预测;(2)按优先级排序自然重排分布。作为一种即插即用的预处理方法,我们的排序技术可有效应用于现有基于logit的KD方法。在CIFAR-100和ImageNet数据集上的大量实验证明了该方法的有效性。
Summary / 总结
Knowledge distillation (KD) aims to distill the knowledge from the teacher (larger) to the student (smaller) model via soft-label for the efficient neural network.
Explainable AI in Deep Learning-Based Prediction of Solar Storms
Authors: Adam O. Rawashdeh, Jason T. L. Wang, Katherine G. Herbert
First: 2025-08-22T17:09:00+00:00 · Latest: 2025-08-22T17:09:00+00:00
Comments: 6 pages, 8 figures
Abstract
A deep learning model is often considered a black-box model, as its internal
workings tend to be opaque to the user. Because of the lack of transparency, it
is challenging to understand the reasoning behind the model's predictions.
Here, we present an approach to making a deep learning-based solar storm
prediction model interpretable, where solar storms include solar flares and
coronal mass ejections (CMEs). This deep learning model, built based on a long
short-term memory (LSTM) network with an attention mechanism, aims to predict
whether an active region (AR) on the Sun's surface that produces a flare within
24 hours will also produce a CME associated with the flare. The crux of our
approach is to model data samples in an AR as time series and use the LSTM
network to capture the temporal dynamics of the data samples. To make the
model's predictions accountable and reliable, we leverage post hoc
model-agnostic techniques, which help elucidate the factors contributing to the
predicted output for an input sequence and provide insights into the model's
behavior across multiple sequences within an AR. To our knowledge, this is the
first time that interpretability has been added to an LSTM-based solar storm
prediction model.
中文标题/摘要
标题:基于深度学习的太阳风暴预测中的可解释性人工智能
深度学习模型常被视为黑箱模型,因其内部机制对用户而言往往不透明。这种透明度的缺失使得理解模型预测背后的逻辑具有挑战性。本文提出一种方法,使基于深度学习的太阳风暴预测模型具备可解释性,其中太阳风暴包括太阳耀斑和日冕物质抛射(CMEs)。该深度学习模型基于带有注意力机制的长短期记忆(LSTM)网络构建,旨在预测太阳表面活动区(AR)在24小时内产生耀斑的同时是否会引发与之关联的CME。我们的方法核心是将AR中的数据样本建模为时间序列,并利用LSTM网络捕捉其时序动态特性。为确保模型预测的可追溯性和可靠性,我们采用事后模型无关技术,这些技术能阐明输入序列中对预测结果产生影响的因素,并揭示模型在AR内多个序列中的行为模式。据我们所知,这是首次在基于LSTM的太阳风暴预测模型中引入可解释性。
Summary / 总结
A deep learning model is often considered a black-box model, as its internal workings tend to be opaque to the user.
Escaping Saddle Points via Curvature-Calibrated Perturbations: A Complete Analysis with Explicit Constants and Empirical Validation
Authors: Faruk Alpay, Hamdi Alakkad
First: 2025-08-22T17:06:28+00:00 · Latest: 2025-08-22T17:06:28+00:00
Comments: 16 pages. Perturbed gradient descent with fully explicit constants
for escaping saddle points, validated empirically
Abstract
We present a comprehensive theoretical analysis of first-order methods for
escaping strict saddle points in smooth non-convex optimization. Our main
contribution is a Perturbed Saddle-escape Descent (PSD) algorithm with fully
explicit constants and a rigorous separation between gradient-descent and
saddle-escape phases. For a function $f:\mathbb{R}^d\to\mathbb{R}$ with
$\ell$-Lipschitz gradient and $\rho$-Lipschitz Hessian, we prove that PSD finds
an $(\epsilon,\sqrt{\rho\epsilon})$-approximate second-order stationary point
with high probability using at most $O(\ell\Delta_f/\epsilon^2)$ gradient
evaluations for the descent phase plus
$O((\ell/\sqrt{\rho\epsilon})\log(d/\delta))$ evaluations per escape episode,
with at most $O(\ell\Delta_f/\epsilon^2)$ episodes needed. We validate our
theoretical predictions through extensive experiments across both synthetic
functions and practical machine learning tasks, confirming the logarithmic
dimension dependence and the predicted per-episode function decrease. We also
provide complete algorithmic specifications including a finite-difference
variant (PSD-Probe) and a stochastic extension (PSGD) with robust mini-batch
sizing.
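The separation between descent and escape phases can be illustrated with a generic perturbed gradient descent loop. The sketch below is not the paper's PSD algorithm and does not use its explicit constants: the step size, perturbation radius, episode length, decrease threshold, and the toy test function are all hypothetical choices made for illustration.

```python
import numpy as np

def f(p):
    """Toy function with a strict saddle at (0, 0) and minima at (0, +/-1)."""
    x, y = p
    return 0.5 * x**2 + 0.25 * y**4 - 0.5 * y**2

def grad_f(p):
    x, y = p
    return np.array([x, y**3 - y])

def perturbed_descent(f, grad, p0, step=0.1, grad_tol=1e-3, radius=0.1,
                      escape_steps=300, min_decrease=1e-3,
                      max_descent_steps=10_000, max_episodes=10, seed=0):
    rng = np.random.default_rng(seed)
    p = np.asarray(p0, dtype=float)
    for _ in range(max_episodes):
        # Descent phase: plain gradient descent until the gradient is small.
        for _ in range(max_descent_steps):
            if np.linalg.norm(grad(p)) <= grad_tol:
                break
            p = p - step * grad(p)
        # Escape phase: perturb uniformly within a ball, then descend briefly.
        direction = rng.normal(size=p.shape)
        direction /= np.linalg.norm(direction)
        q = p + radius * rng.uniform() ** (1.0 / p.size) * direction
        for _ in range(escape_steps):
            q = q - step * grad(q)
        if f(q) < f(p) - min_decrease:
            p = q      # the episode escaped a saddle; keep going
        else:
            return p   # no sufficient decrease: treat p as second-order stationary
    return p

# Start exactly at the saddle, where plain gradient descent would never move.
result = perturbed_descent(f, grad_f, p0=[0.0, 0.0])
```

Starting at the saddle, the first escape episode should move the iterate toward one of the two minima, and the second episode should find no further decrease and stop.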
中文标题/摘要
标题:通过曲率校准扰动逃离鞍点:含显式常数与实证验证的完整分析
本文对光滑非凸优化中逃离严格鞍点的一阶方法进行了全面理论分析。核心贡献是提出了具有完全显式常数的扰动鞍点逃离下降算法(PSD),并严格区分梯度下降阶段与鞍点逃离阶段。对于梯度Lipschitz常数为ℓ、Hessian矩阵Lipschitz常数为ρ的函数f:ℝᵈ→ℝ,我们证明PSD能以高概率找到(ε,√(ρε))-近似二阶稳定点,其中下降阶段最多使用O(ℓΔ_f/ε²)次梯度计算,每次逃离事件需O((ℓ/√(ρε))log(d/δ))次计算,且最多需要O(ℓΔ_f/ε²)次逃离事件。通过合成函数和实际机器学习任务的广泛实验,我们验证了理论预测,确认了对数维度依赖性和预测的每次事件函数下降量。同时提供了完整算法规范,包括有限差分变体(PSD-Probe)和具有鲁棒小批量大小的随机扩展(PSGD)。
Summary / 总结
We present a comprehensive theoretical analysis of first-order methods for escaping strict saddle points in smooth non-convex optimization.