
Machine Learning Academic Digest [1.6]

arXiv Daily Academic Digest



cs.LG: 247 papers today


Large Models (23 papers)

【1】Heterogeneous Low-Bandwidth Pre-Training of LLMs
Link: https://arxiv.org/abs/2601.02360

Authors: Yazan Obeidi, Amir Sarfi, Joel Lidin, Paul Janson, Eugene Belilovsky
Abstract: Pre-training large language models (LLMs) increasingly requires distributed compute, yet bandwidth constraints make it difficult to scale beyond well-provisioned datacenters, especially when model parallelism forces frequent, large inter-device communications. We study whether SparseLoCo, a low-communication data parallel method based on infrequent synchronization and sparse pseudo-gradient exchange, can be combined with low-bandwidth pipeline model parallelism via activation and activation-gradient compression. We introduce a heterogeneous distributed training framework where some participants host full replicas on high-bandwidth interconnects, while resource-limited participants are grouped to jointly instantiate a replica using pipeline parallelism with subspace-projected inter-stage communication. To make the recently introduced subspace pipeline compression compatible with SparseLoCo, we study a number of adaptations. Across large-scale language modeling experiments (178M-1B parameters) on standard pretraining corpora, we find that activation compression composes with SparseLoCo at modest cost, while selective (heterogeneous) compression consistently improves the loss-communication tradeoff relative to compressing all replicas, especially at aggressive compression ratios. These results suggest a practical path to incorporating low-bandwidth model parallelism and heterogeneous participants into LLM pre-training.
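
As a concrete illustration of the subspace-projected inter-stage communication described above, the sketch below projects activations onto a shared low-rank basis before they cross a pipeline boundary and lifts the coefficients back on the receiving stage. The orthonormal basis `P`, the shapes, and the rank are illustrative assumptions, not the paper's implementation.

```python
import torch

def compress_activations(h, P):
    """Sender side: project (batch, hidden) activations onto an assumed
    shared basis P of shape (hidden, r), transmitting only r coefficients."""
    return h @ P

def decompress_activations(coeffs, P):
    """Receiver side: lift (batch, r) coefficients back to approximate
    (batch, hidden) activations via the transpose of the same basis."""
    return coeffs @ P.T

# Illustrative usage with a random orthonormal basis from a QR decomposition.
hidden, r = 1024, 64
P, _ = torch.linalg.qr(torch.randn(hidden, r))
h = torch.randn(8, hidden)
h_hat = decompress_activations(compress_activations(h, P), P)  # 16x fewer values sent
```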


【2】ELLA: Efficient Lifelong Learning for Adapters in Large Language Models
Link: https://arxiv.org/abs/2601.02232

Authors: Shristi Das Biswas, Yue Zhang, Anwesan Pal, Radhika Bhargava, Kaushik Roy
Abstract: Large Language Models (LLMs) suffer severe catastrophic forgetting when adapted sequentially to new tasks in a continual learning (CL) setting. Existing approaches are fundamentally limited: replay-based methods are impractical and privacy-violating, while strict orthogonality-based methods collapse under scale: each new task is projected onto an orthogonal complement, progressively reducing the residual degrees of freedom and eliminating forward transfer by forbidding overlap in shared representations. In this work, we introduce ELLA, a training framework built on the principle of selective subspace de-correlation. Rather than forbidding all overlap, ELLA explicitly characterizes the structure of past updates and penalizes alignments along their high-energy, task-specific directions, while preserving freedom in the low-energy residual subspaces to enable transfer. Formally, this is realized via a lightweight regularizer on a single aggregated update matrix. We prove this mechanism corresponds to an anisotropic shrinkage operator that bounds interference, yielding a penalty that is both memory- and compute-constant regardless of task sequence length. ELLA requires no data replay, no architectural expansion, and negligible storage. Empirically, it achieves state-of-the-art CL performance on three popular benchmarks, with relative accuracy gains of up to $9.6\%$ and a $35\times$ smaller memory footprint. Further, ELLA scales robustly across architectures and actively enhances the model's zero-shot generalization performance on unseen tasks, establishing a principled and scalable solution for constructive lifelong LLM adaptation.
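
The selective subspace de-correlation idea can be sketched as follows: penalize only the component of a candidate update that lies in the top-k (high-energy) singular directions of the aggregated past updates, leaving the residual subspace free for transfer. The rank cutoff, coefficient, and shapes below are illustrative assumptions, not ELLA's actual regularizer.

```python
import torch

def selective_subspace_penalty(delta_w, past_updates, k=8, coeff=1e-2):
    """Penalize alignment of a new update with the high-energy directions
    of past updates; the low-energy residual subspace is unconstrained."""
    U, _, _ = torch.linalg.svd(past_updates, full_matrices=False)
    U_k = U[:, :k]                      # top-k left singular directions
    aligned = U_k @ (U_k.T @ delta_w)   # component inside that subspace
    return coeff * aligned.pow(2).sum()
```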


【3】BiPrompt: Bilateral Prompt Optimization for Visual and Textual Debiasing in Vision-Language Models
Link: https://arxiv.org/abs/2601.02147

Authors: Sunny Gupta, Shounak Das, Amit Sethi
Note: Accepted at the AAAI 2026 Workshop AIR-FM, Assessing and Improving Reliability of Foundation Models in the Real World
Abstract: Vision-language foundation models such as CLIP exhibit impressive zero-shot generalization yet remain vulnerable to spurious correlations across visual and textual modalities. Existing debiasing approaches often address a single modality, either visual or textual, leading to partial robustness and unstable adaptation under distribution shifts. We propose a bilateral prompt optimization framework (BiPrompt) that simultaneously mitigates non-causal feature reliance in both modalities during test-time adaptation. On the visual side, it employs structured attention-guided erasure to suppress background activations and enforce orthogonal prediction consistency between causal and spurious regions. On the textual side, it introduces balanced prompt normalization, a learnable re-centering mechanism that aligns class embeddings toward an isotropic semantic space. Together, these modules jointly minimize conditional mutual information between spurious cues and predictions, steering the model toward causal, domain-invariant reasoning without retraining or domain supervision. Extensive evaluations on real-world and synthetic bias benchmarks demonstrate consistent improvements in both average and worst-group accuracies over prior test-time debiasing methods, establishing a lightweight yet effective path toward trustworthy and causally grounded vision-language adaptation.


【4】MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics
Link: https://arxiv.org/abs/2601.02075

Authors: Zhuofan Shi, Hubao A, Yufei Shao, Mengyan Dai, Yadong Yu, Pan Xiang, Dongliang Huang, Hongxu An, Chunxiao Xin, Haiyang Shen, Zhenyu Wang, Yunshan Na, Gang Huang, Xiang Jing
Note: 24 pages, 4 figures
Abstract: Molecular dynamics (MD) simulations are essential for understanding atomic-scale behaviors in materials science, yet writing LAMMPS scripts remains a highly specialized and time-consuming task. Although LLMs show promise in code generation and domain-specific question answering, their performance in MD scenarios is limited by scarce domain data, the high deployment cost of state-of-the-art LLMs, and low code executability. Building upon our prior MDAgent, we present MDAgent2, the first end-to-end framework capable of performing both knowledge Q&A and code generation within the MD domain. We construct a domain-specific data-construction pipeline that yields three high-quality datasets spanning MD knowledge, question answering, and code generation. Based on these datasets, we adopt a three-stage post-training strategy, comprising continued pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL), to train two domain-adapted models, MD-Instruct and MD-Code. Furthermore, we introduce MD-GRPO, a closed-loop RL method that leverages simulation outcomes as reward signals and recycles low-reward trajectories for continual refinement. We further build MDAgent2-RUNTIME, a deployable multi-agent system that integrates code generation, execution, evaluation, and self-correction. Together with MD-EvalBench proposed in this work, the first benchmark for LAMMPS code generation and question answering, our models and system achieve performance surpassing several strong baselines. This work systematically demonstrates the adaptability and generalization capability of large language models in industrial simulation tasks, laying a methodological foundation for automatic code generation in AI for Science and industrial-scale simulations. URL: https://github.com/FredericVAN/PKU_MDAgent2


【5】Output Embedding Centering for Stable LLM Pretraining
Link: https://arxiv.org/abs/2601.02031

Authors: Felix Stollenwerk, Anna Lokrantz, Niclas Hertzberg
Note: 11 pages, 5 figures
Abstract: Pretraining of large language models is not only expensive but also prone to certain training instabilities. A specific instability that often occurs for large learning rates at the end of training is output logit divergence. The most widely used mitigation strategy, z-loss, merely addresses the symptoms rather than the underlying cause of the problem. In this paper, we analyze the instability from the perspective of the output embeddings' geometry and identify its cause. Based on this, we propose output embedding centering (OEC) as a new mitigation strategy, and prove that it suppresses output logit divergence. OEC can be implemented in two different ways, as a deterministic operation called μ-centering, or a regularization method called μ-loss. Our experiments show that both variants outperform z-loss in terms of training stability and learning rate sensitivity. In particular, they ensure that training converges even for large learning rates when z-loss fails. Furthermore, we find that μ-loss is significantly less sensitive to regularization hyperparameter tuning than z-loss.
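
For intuition, both variants can be written in a few lines, assuming the unembedding matrix has shape (vocab, hidden); the μ-loss coefficient below is an illustrative guess, not the paper's setting.

```python
import torch

def mu_center(w_out):
    """Deterministic variant: subtract the mean output embedding so the
    embedding cloud is centered at the origin."""
    return w_out - w_out.mean(dim=0, keepdim=True)

def mu_loss(w_out, coeff=1e-4):
    """Regularization variant: penalize the squared norm of the mean
    output embedding instead of centering it explicitly."""
    return coeff * w_out.mean(dim=0).pow(2).sum()
```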


【6】Refinement Provenance Inference: Detecting LLM-Refined Training Prompts from Model Behavior
Link: https://arxiv.org/abs/2601.01966

Authors: Bo Yin, Qi Li, Runpeng Yu, Xinchao Wang
Abstract: Instruction tuning increasingly relies on LLM-based prompt refinement, where prompts in the training corpus are selectively rewritten by an external refiner to improve clarity and instruction alignment. This motivates an instance-level audit problem: for a fine-tuned model and a training prompt-response pair, can we infer whether the model was trained on the original prompt or its LLM-refined version within a mixed corpus? This matters for dataset governance and dispute resolution when training data are contested. However, it is non-trivial in practice: refined and raw instances are interleaved in the training corpus with unknown, source-dependent mixture ratios, making it harder to develop provenance methods that generalize across models and training setups. In this paper, we formalize this audit task as Refinement Provenance Inference (RPI) and show that prompt refinement yields stable, detectable shifts in teacher-forced token distributions, even when semantic differences are not obvious. Building on this phenomenon, we propose RePro, a logit-based provenance framework that fuses teacher-forced likelihood features with logit-ranking signals. During training, RePro learns a transferable representation via shadow fine-tuning, and uses a lightweight linear head to infer provenance on unseen victims without training-data access. Empirically, RePro consistently attains strong performance and transfers well across refiners, suggesting that it exploits refiner-agnostic distribution shifts rather than rewrite-style artifacts.


【7】Safety at One Shot: Patching Fine-Tuned LLMs with A Single Instance
Link: https://arxiv.org/abs/2601.01887

Authors: Jiawen Zhang, Lipeng He, Kejia Chen, Jian Lou, Jian Liu, Xiaohu Yang, Ruoxi Jia
Abstract: Fine-tuning safety-aligned large language models (LLMs) can substantially compromise their safety. Previous approaches require many safety samples or calibration sets, which not only incur significant computational overhead during realignment but also lead to noticeable degradation in model utility. Contrary to this assumption, we show that safety alignment can be fully recovered with only a single safety example, without sacrificing utility and at minimal cost. Remarkably, this recovery is effective regardless of the number of harmful examples used in fine-tuning or the size of the underlying model, and convergence is achieved within just a few epochs. Furthermore, we uncover the low-rank structure of the safety gradient, which explains why such efficient correction is possible. We validate our findings across five safety-aligned LLMs and multiple datasets, demonstrating the generality of our approach.
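
The low-rank claim can be probed with a simple spectral-energy diagnostic such as the generic sketch below (not the authors' analysis code): count how many singular values are needed to capture most of a gradient matrix's energy.

```python
import torch

def effective_rank(grad, energy=0.99):
    """Number of singular values needed to capture the given fraction of
    the squared spectral energy of a gradient matrix."""
    s = torch.linalg.svdvals(grad)
    cum = torch.cumsum(s ** 2, dim=0) / (s ** 2).sum()
    return int((cum < energy).sum().item()) + 1
```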


【8】Crafting Adversarial Inputs for Large Vision-Language Models Using Black-Box Optimization
Link: https://arxiv.org/abs/2601.01747

Authors: Jiwei Guan, Haibo Jin, Haohan Wang
Note: EACL
Abstract: Recent advancements in Large Vision-Language Models (LVLMs) have shown groundbreaking capabilities across diverse multimodal tasks. However, these models remain vulnerable to adversarial jailbreak attacks, where adversaries craft subtle perturbations to bypass safety mechanisms and trigger harmful outputs. Existing white-box attack methods require full model access, suffer from high computational costs, and exhibit insufficient adversarial transferability, making them impractical for real-world, black-box settings. To address these limitations, we propose a black-box jailbreak attack on LVLMs via zeroth-order optimization using Simultaneous Perturbation Stochastic Approximation (ZO-SPSA). ZO-SPSA provides three key advantages: (i) gradient-free approximation through input-output interactions without requiring model knowledge, (ii) model-agnostic optimization without a surrogate model, and (iii) lower resource requirements with reduced GPU memory consumption. We evaluate ZO-SPSA on three LVLMs, including InstructBLIP, LLaVA and MiniGPT-4, achieving the highest jailbreak success rate of 83.0% on InstructBLIP, while maintaining imperceptible perturbations comparable to white-box methods. Moreover, adversarial examples generated from MiniGPT-4 exhibit strong transferability to other LVLMs, with ASR reaching 64.18%. These findings underscore the real-world feasibility of black-box jailbreaks and expose critical weaknesses in the safety mechanisms of current LVLMs.
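
For reference, the two-point SPSA estimator at the core of such zeroth-order attacks looks roughly like the sketch below; the loss callable and step size are placeholders, and this omits the attack-specific projection and constraint handling.

```python
import numpy as np

def spsa_gradient(loss, x, c=1e-3, rng=None):
    """Estimate grad loss(x) from two evaluations: perturb every coordinate
    at once with a random +/-1 vector delta. For Bernoulli perturbations,
    1/delta_i equals delta_i, giving the simple closed form below."""
    rng = rng or np.random.default_rng()
    delta = rng.choice([-1.0, 1.0], size=x.shape)
    return (loss(x + c * delta) - loss(x - c * delta)) / (2 * c) * delta
```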


【9】The Two-Stage Decision-Sampling Hypothesis: Understanding the Emergence of Self-Reflection in RL-Trained LLMs
Link: https://arxiv.org/abs/2601.01580

Authors: Zibo Zhao, Yuanting Zha, Haipeng Zhang, Xingcheng Xu
Abstract: Self-reflection capabilities emerge in Large Language Models after RL post-training, with multi-turn RL achieving substantial gains over SFT counterparts. Yet the mechanism by which a unified optimization objective gives rise to the functionally distinct capabilities of generating solutions and evaluating when to revise them remains opaque. To address this question, we introduce the Gradient Attribution Property to characterize how reward gradients distribute across policy components, formalized through the Two-Stage Decision-Sampling (DS) Hypothesis, which decomposes the policy into sampling ($π_{sample}$) for generation and decision ($π_{d}$) for verification. We prove that surrogate rewards exhibit Balanced Gradient Attribution, while SFT and KL penalties exhibit Unbalanced Gradient Attribution, with length-weighting creating asymmetric regularization that constrains $π_{sample}$ while leaving $π_{d}$ under-optimized, providing a theoretical explanation of why RL succeeds where SFT fails. We also empirically validate our theoretical predictions on arithmetic reasoning, demonstrating that RL's superior generalization stems primarily from improved decision-making ($π_{d}$) rather than sampling capabilities, providing a first-principles mechanistic explanation for self-correction in thinking models.


【10】Bayesian Subspace Gradient Estimation for Zeroth-Order Optimization of Large Language Models
Link: https://arxiv.org/abs/2601.01452

Authors: Jian Feng, Zhihong Huang
Note: 19 pages, 1 figure, 4 tables
Abstract: Fine-tuning large language models (LLMs) with zeroth-order (ZO) optimization reduces memory by approximating gradients through function evaluations, but existing methods rely on one-step gradient estimates from random perturbations. We introduce Bayesian Subspace Zeroth-Order optimization (BSZO), a ZO optimizer that applies Kalman filtering to combine finite-difference information across multiple perturbation directions. By treating each finite-difference measurement as a noisy observation, BSZO builds a posterior distribution over the projected gradient and updates it through Bayesian inference, with a residual-based adaptive mechanism to adjust perturbation scales. Theoretical analysis shows that BSZO improves the convergence rate by a factor of $k/γ$ compared to standard ZO methods. Experiments on RoBERTa, Mistral, and OPT models show that BSZO outperforms MeZO, MeZO-Adam, and HiZOO across various tasks, achieving up to 6.67\% absolute average improvement on OPT-13B while keeping memory usage close to inference-only baselines (1.00$\times$--1.08$\times$ of MeZO).
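
The Bayesian fusion step can be illustrated in scalar form: each finite-difference measurement of a directional derivative is treated as a noisy observation and folded into a Gaussian posterior with a Kalman update. The noise variance and prior below are illustrative, not BSZO's actual settings.

```python
def kalman_fuse(measurements, meas_var=1.0, prior_mean=0.0, prior_var=1e3):
    """Fuse noisy scalar measurements of the same quantity into a posterior
    (mean, variance) via repeated conjugate-Gaussian (Kalman) updates."""
    mean, var = prior_mean, prior_var
    for z in measurements:
        gain = var / (var + meas_var)    # Kalman gain
        mean = mean + gain * (z - mean)  # posterior mean moves toward z
        var = (1.0 - gain) * var         # posterior variance shrinks
    return mean, var
```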


【11】LANCET: Neural Intervention via Structural Entropy for Mitigating Faithfulness Hallucinations in LLMs
Link: https://arxiv.org/abs/2601.01401

Authors: Chenxu Wang, Chaozhuo Li, Pengbo Wang, Litian Zhang, Songyang Liu, Ji Qi, Jiahui Hu, Yushan Cai, Hao Zhao, Rui Pu
Abstract: Large Language Models have revolutionized information processing, yet their reliability is severely compromised by faithfulness hallucinations. While current approaches attempt to mitigate this issue through node-level adjustments or coarse suppression, they often overlook the distributed nature of neural information, leading to imprecise interventions. Recognizing that hallucinations propagate through specific forward transmission pathways like an infection, we aim to surgically block this flow using precise structural analysis. To leverage this, we propose Lancet, a novel framework that achieves precise neural intervention by leveraging structural entropy and hallucination difference ratios. Lancet first locates hallucination-prone neurons via gradient-driven contrastive analysis, then maps their propagation pathways by minimizing structural entropy, and finally implements a hierarchical intervention strategy that preserves general model capabilities. Comprehensive evaluations across hallucination benchmark datasets demonstrate that Lancet significantly outperforms state-of-the-art methods, validating the effectiveness of our surgical approach to neural intervention.


【12】Investigating the Multilingual Calibration Effects of Language Model Instruction-Tuning
Link: https://arxiv.org/abs/2601.01362

Authors: Jerry Huang, Peng Lu, Qiuhao Zeng, Yusuke Iwasawa, Yutaka Matsuo, Sarath Chandar, Edison Marrese-Taylor, Irene Li
Note: Accepted to The 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL)
Abstract: Ensuring that deep learning models are well-calibrated in terms of their predictive uncertainty is essential to maintaining their trustworthiness and reliability, yet despite increasing advances in foundation model research, the relationship between such large language models (LLMs) and their calibration remains an open area of research. In this work, we look at a critical gap in the calibration of LLMs within multilingual settings, in an attempt to better understand how data scarcity can lead to different calibration effects and how commonly used techniques apply in these settings. Our analysis on two multilingual benchmarks, covering 29 and 42 languages respectively, reveals that even in low-resource languages, model confidence can increase significantly after instruction-tuning on high-resource-language SFT datasets. However, improvements in accuracy are marginal or non-existent, resulting in mis-calibration and highlighting a critical shortcoming of standard SFT in multilingual settings. Furthermore, we observe that label smoothing is a reasonable method to alleviate this concern, again without any need for low-resource SFT data, maintaining better calibration across all languages. Overall, this highlights the importance of multilingual considerations when training and tuning LLMs in order to improve their reliability and fairness in downstream use.
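
The label-smoothing remedy is standard; a minimal PyTorch version is shown below for concreteness (PyTorch's built-in `F.cross_entropy(..., label_smoothing=eps)` implements the same loss).

```python
import torch.nn.functional as F

def smoothed_cross_entropy(logits, targets, eps=0.1):
    """Cross-entropy against targets mixed with a uniform distribution:
    (1 - eps) mass on the true class and eps spread over all classes,
    which discourages over-confident, mis-calibrated predictions."""
    log_probs = F.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    uniform = -log_probs.mean(dim=-1)
    return ((1.0 - eps) * nll + eps * uniform).mean()
```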


【13】Towards LLM-enabled autonomous combustion research: A literature-aware agent for self-corrective modeling workflows
Link: https://arxiv.org/abs/2601.01357

Authors: Ke Xiao, Haoze Zhang, Runze Mao, Han Li, Zhi X. Chen
Abstract: The rapid evolution of large language models (LLMs) is transforming artificial intelligence into autonomous research partners, yet a critical gap persists in complex scientific domains such as combustion modeling. Here, practical AI assistance requires the seamless integration of domain literature knowledge with robust execution capabilities for expertise-intensive tools such as computational fluid dynamics (CFD) codes. To bridge this gap, we introduce FlamePilot, an LLM agent designed to empower combustion modeling research through automated and self-corrective CFD workflows. FlamePilot differentiates itself through an architecture that leverages atomic tools to ensure the robust setup and execution of complex simulations in both OpenFOAM and extended frameworks such as DeepFlame. The system is also capable of learning from scientific articles, extracting key information to guide the simulation from initial setup to optimized results. Validation on a public benchmark shows FlamePilot achieved a perfect 1.0 executability score and a 0.438 success rate, surpassing the prior best reported agent scores of 0.625 and 0.250, respectively. Furthermore, a detailed case study on Moderate or Intense Low-oxygen Dilution (MILD) combustion simulation demonstrates its efficacy as a collaborative research copilot, where FlamePilot autonomously translated a research paper into a configured simulation, conducted the simulation, post-processed the results, proposed evidence-based refinements, and managed a multi-step parameter study to convergence under minimal human intervention. By adopting a transparent and interpretable paradigm, FlamePilot establishes a foundational framework for AI-empowered combustion modeling, fostering a collaborative partnership where the agent manages workflow orchestration, freeing the researcher for high-level analysis.


【14】Making MoE based LLM inference resilient with Tarragon
Link: https://arxiv.org/abs/2601.01310

Authors: Songyu Zhang, Aaron Tam, Myungjin Lee, Shixiong Qi, K. K. Ramakrishnan
Abstract: Mixture-of-Experts (MoE) models are increasingly used to serve LLMs at scale, but failures become common as deployment scale grows. Existing systems exhibit poor failure resilience: even a single worker failure triggers a coarse-grained, service-wide restart, discarding accumulated progress and halting the entire inference pipeline during recovery, an approach clearly ill-suited for latency-sensitive LLM services. We present Tarragon, a resilient MoE inference framework that confines the impact of failures to individual workers while allowing the rest of the pipeline to continue making forward progress. Tarragon exploits the natural separation between the attention and expert computation in MoE-based transformers, treating attention workers (AWs) and expert workers (EWs) as distinct failure domains. Tarragon introduces a reconfigurable datapath to mask failures by rerouting requests to healthy workers. On top of this datapath, Tarragon implements a self-healing mechanism that relaxes the tightly synchronized execution of existing MoE frameworks. For stateful AWs, Tarragon performs asynchronous, incremental KV cache checkpointing with per-request restoration, and for stateless EWs, it leverages residual GPU memory to deploy shadow experts. These together keep recovery cost and recomputation overhead extremely low. Our evaluation shows that, compared to state-of-the-art MegaScale-Infer, Tarragon reduces failure-induced stalls by 160-213x (from ~64 s down to 0.3-0.4 s) while preserving performance when no failures occur.


【15】Aggressive Compression Enables LLM Weight Theft
Link: https://arxiv.org/abs/2601.01296

Authors: Davis Brown, Juan-Pablo Rivera, Dan Hendrycks, Mantas Mazeika
Note: An early version of this work was presented at the SoLAR Workshop at NeurIPS 2024
Abstract: As frontier AIs become more powerful and costly to develop, adversaries have increasing incentives to steal model weights by mounting exfiltration attacks. In this work, we consider exfiltration attacks where an adversary attempts to sneak model weights out of a datacenter over a network. While exfiltration attacks are multi-step cyber attacks, we demonstrate that a single factor, the compressibility of model weights, significantly heightens exfiltration risk for large language models (LLMs). We tailor compression specifically for exfiltration by relaxing decompression constraints and demonstrate that attackers could achieve 16x to 100x compression with minimal trade-offs, reducing the time it would take for an attacker to illicitly transmit model weights from the defender's server from months to days. Finally, we study defenses designed to reduce exfiltration risk in three distinct ways: making models harder to compress, making them harder to 'find,' and tracking provenance for post-attack analysis using forensic watermarks. While all defenses are promising, the forensic watermark defense is both effective and cheap, and therefore is a particularly attractive lever for mitigating weight-exfiltration risk.


【16】Bridging the Semantic Gap for Categorical Data Clustering via Large Language Models
Link: https://arxiv.org/abs/2601.01162

Authors: Zihua Yang, Xin Liao, Yiqun Zhang, Yiu-ming Cheung
Note: Submitted to ICPR 2026
Abstract: Categorical data are prevalent in domains such as healthcare, marketing, and bioinformatics, where clustering serves as a fundamental tool for pattern discovery. A core challenge in categorical data clustering lies in measuring similarity among attribute values that lack inherent ordering or distance. Without appropriate similarity measures, values are often treated as equidistant, creating a semantic gap that obscures latent structures and degrades clustering quality. Although existing methods infer value relationships from within-dataset co-occurrence patterns, such inference becomes unreliable when samples are limited, leaving the semantic context of the data underexplored. To bridge this gap, we present ARISE (Attention-weighted Representation with Integrated Semantic Embeddings), which draws on external semantic knowledge from Large Language Models (LLMs) to construct semantic-aware representations that complement the metric space of categorical data for accurate clustering. That is, an LLM is adopted to describe attribute values for representation enhancement, and the LLM-enhanced embeddings are combined with the original data to explore semantically prominent clusters. Experiments on eight benchmark datasets demonstrate consistent improvements over seven representative counterparts, with gains of 19-27%. Code is available at https://github.com/develop-yang/ARISE
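
At a high level, the pipeline can be sketched as: obtain a semantic embedding for every attribute value (assumed here to be precomputed from LLM-generated value descriptions), represent each record by concatenating its values' embeddings, and cluster. The embedding lookup and the k-means choice are illustrative assumptions, not ARISE itself.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_with_semantic_embeddings(records, value_emb, n_clusters=3):
    """records: list of tuples of categorical values, one per attribute.
    value_emb: dict mapping (attribute_index, value) -> embedding vector,
    assumed precomputed from LLM-generated descriptions of each value."""
    X = np.stack([
        np.concatenate([value_emb[(i, v)] for i, v in enumerate(row)])
        for row in records
    ])
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
```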


【17】RovoDev Code Reviewer: A Large-Scale Online Evaluation of LLM-based Code Review Automation at Atlassian
Link: https://arxiv.org/abs/2601.01129

Authors: Kla Tantithamthavorn, Yaotian Zou, Andy Wong, Michael Gupta, Zhe Wang, Mike Buller, Ryan Jiang, Matthew Watson, Minwoo Jeong, Kun Chen, Ming Wu
Note: Accepted at the 48th International Conference on Software Engineering (ICSE'26), SEIP Track. 12 pages
Abstract: Large Language Model (LLM)-powered code review automation has the potential to transform code review workflows. Despite the advances of LLM-powered code review comment generation approaches, several practical challenges remain for designing enterprise-grade code review automation tools. In particular, this paper aims at answering the practical question: how can we design review-guided, context-aware, quality-checked code review comment generation without fine-tuning? In this paper, we present RovoDev Code Reviewer, an enterprise-grade LLM-based code review automation tool designed and deployed at scale within Atlassian's development ecosystem, with seamless integration into Atlassian's Bitbucket. Through offline, online, and user-feedback evaluations over a one-year period, we conclude that RovoDev Code Reviewer (1) is effective in generating code review comments, achieving a 38.70% code-resolution rate (i.e., comments that triggered code changes in subsequent commits); and (2) offers the promise of accelerating feedback cycles (i.e., decreasing the PR cycle time by 30.8%), alleviating reviewer workload (i.e., reducing the number of human-written comments by 35.6%), and improving overall software quality (i.e., finding errors with actionable suggestions).


【18】NarrativeTrack: Evaluating Video Language Models Beyond the Frame
Link: https://arxiv.org/abs/2601.01095

Authors: Hyeonjeong Ha, Jinjin Ge, Bo Feng, Kaixin Ma, Gargi Chakraborty
Note: VideoLLM Fine-Grained Evaluation
Abstract: Multimodal large language models (MLLMs) have achieved impressive progress in vision-language reasoning, yet their ability to understand temporally unfolding narratives in videos remains underexplored. True narrative understanding requires grounding who is doing what, when, and where, maintaining coherent entity representations across dynamic visual and temporal contexts. We introduce NarrativeTrack, the first benchmark to evaluate narrative understanding in MLLMs through fine-grained entity-centric reasoning. Unlike existing benchmarks limited to short clips or coarse scene-level semantics, we decompose videos into constituent entities and examine their continuity via a Compositional Reasoning Progression (CRP), a structured evaluation framework that progressively increases narrative complexity across three dimensions: entity existence, entity changes, and entity ambiguity. CRP challenges models to advance from temporal persistence to contextual evolution and fine-grained perceptual reasoning. A fully automated entity-centric pipeline enables scalable extraction of temporally grounded entity representations, providing the foundation for CRP. Evaluations of state-of-the-art MLLMs reveal that models fail to robustly track entities across visual transitions and temporal dynamics, often hallucinating identity under context shifts. Open-source general-purpose MLLMs exhibit strong perceptual grounding but weak temporal coherence, while video-specific MLLMs capture temporal context yet hallucinate entities' contexts. These findings uncover a fundamental trade-off between perceptual grounding and temporal reasoning, indicating that narrative understanding emerges only from their integration. NarrativeTrack provides the first systematic framework to diagnose and advance temporally grounded narrative comprehension in MLLMs.


【19】SPoRC-VIST: A Benchmark for Evaluating Generative Natural Narrative in Vision-Language Models
Link: https://arxiv.org/abs/2601.01062

Authors: Yunlin Zeng
Note: 14 pages, 3 figures. Accepted to WVAQ 2026, WACV 2026
Abstract: Vision-Language Models (VLMs) have achieved remarkable success in descriptive tasks such as image captioning and visual question answering (VQA). However, their ability to generate engaging, long-form narratives, specifically multi-speaker podcast dialogues, remains under-explored and difficult to evaluate. Standard metrics like BLEU and ROUGE fail to capture the nuances of conversational naturalness, personality, and narrative flow, often rewarding safe, repetitive outputs over engaging storytelling. In this work, we present a novel pipeline for end-to-end visual podcast generation, and fine-tune a Qwen3-VL-32B model on a curated dataset of 4,000 image-dialogue pairs. Crucially, we use a synthetic-to-real training strategy: we train on high-quality podcast dialogues from the Structured Podcast Research Corpus (SPoRC) paired with synthetically generated imagery, and evaluate on real-world photo sequences from the Visual Storytelling Dataset (VIST). This rigorous setup tests the model's ability to generalize from synthetic training data to real-world visual domains. We propose a comprehensive evaluation framework that moves beyond textual overlap, and use AI-as-a-judge (Gemini 3 Pro, Claude Opus 4.5, GPT 5.2) and novel style metrics (average turn length, speaker switch rate) to assess quality. Our experiments demonstrate that our fine-tuned 32B model significantly outperforms a 235B base model in conversational naturalness ($>$80\% win rate) and narrative depth (+50\% turn length), while maintaining identical visual grounding capabilities (CLIPScore: 20.39).


【20】Reliability Under Randomness: An Empirical Analysis of Sparse and Dense Language Models Across Decoding Temperatures
Link: https://arxiv.org/abs/2601.00942

Authors: Kabir Grover
Abstract: The increasing prevalence of sparse Mixture-of-Experts (MoE) architectures in large language models raises important questions regarding their reliability under stochastic decoding. While conditional computation enables substantial gains in computational efficiency, it remains unclear whether the interaction between sparse routing and temperature-based sampling compromises output stability relative to dense architectures. This work investigates whether conditional computation in MoE models amplifies decoding-induced randomness, leading to reduced reliability as temperature increases. We evaluate three representative models: OLMoE-7B (sparse base), Mixtral-8x7B (sparse instruction-tuned), and Qwen2.5-3B (dense instruction-tuned) on deterministic arithmetic reasoning tasks with objectively verifiable answers. Experiments span four decoding configurations, ranging from greedy decoding to T=1.0. Our evaluation encompasses accuracy, format compliance, output consistency across repeated generations, and confidence metrics, totaling 9,360 model generations. Results demonstrate that the sparse instruction-tuned model exhibits stability comparable to the dense instruction-tuned model across all decoding temperatures, while the sparse base model shows systematic degradation as temperature increases. These findings indicate that instruction tuning, rather than architectural sparsity, is the primary determinant of robustness to decoding randomness on deterministic tasks. We discuss the implications of these results for deploying sparse language models in reliability-critical applications, highlighting scenarios in which sparse architectures can be safely adopted without sacrificing output stability.
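
For concreteness, the decoding temperatures studied here enter through the usual temperature-scaled softmax: T close to 0 approaches greedy decoding, while T = 1.0 samples from the unscaled distribution.

```python
import numpy as np

def sample_token(logits, temperature=1.0, rng=None):
    """Temperature-scaled sampling over a logit vector. Low temperatures
    sharpen the distribution toward the argmax; high ones flatten it."""
    rng = rng or np.random.default_rng()
    if temperature <= 0.0:              # treat T = 0 as greedy decoding
        return int(np.argmax(logits))
    z = np.asarray(logits, dtype=np.float64) / temperature
    z -= z.max()                        # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(probs), p=probs))
```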


【21】LLMize: A Framework for Large Language Model-Based Numerical Optimization
Link: https://arxiv.org/abs/2601.00874

Authors: M. Rizki Oktavian
Abstract: Large language models (LLMs) have recently shown strong reasoning capabilities beyond traditional language tasks, motivating their use for numerical optimization. This paper presents LLMize, an open-source Python framework that enables LLM-driven optimization through iterative prompting and in-context learning. LLMize formulates optimization as a black-box process in which candidate solutions are generated in natural language, evaluated by an external objective function, and refined over successive iterations using solution-score feedback. The framework supports multiple optimization strategies, including Optimization by Prompting (OPRO) and hybrid LLM-based methods inspired by evolutionary algorithms and simulated annealing. A key advantage of LLMize is the ability to inject constraints, rules, and domain knowledge directly through natural language descriptions, allowing practitioners to define complex optimization problems without requiring expertise in mathematical programming or metaheuristic design. LLMize is evaluated on convex optimization, linear programming, the Traveling Salesman Problem, neural network hyperparameter tuning, and nuclear fuel lattice optimization. Results show that while LLM-based optimization is not competitive with classical solvers for simple problems, it provides a practical and accessible approach for complex, domain-specific tasks where constraints and heuristics are difficult to formalize.
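
The solution-score feedback loop can be sketched as below. Both callables are assumptions rather than LLMize's API: `propose` stands in for formatting the best scored candidates into a prompt and asking an LLM for a new candidate, and `objective` is the external black-box evaluator.

```python
def opro_style_loop(propose, objective, n_iters=20, top_k=8):
    """Skeleton of an OPRO-style black-box loop: keep (candidate, score)
    history, show the top-scoring pairs to the proposer, return the best.
    `propose` must tolerate an empty history on the first iteration."""
    history = []
    for _ in range(n_iters):
        best_so_far = sorted(history, key=lambda t: t[1])[-top_k:]
        candidate = propose(best_so_far)          # e.g. an LLM call
        history.append((candidate, objective(candidate)))
    return max(history, key=lambda t: t[1])
```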


【22】Towards Long-window Anchoring in Vision-Language Model Distillation
Link: https://arxiv.org/abs/2512.21576

Authors: Haoyi Zhou, Shuo Li, Tianyu Chen, Qi Song, Chonghan Gao, Jianxin Li
Note: Accepted by AAAI 2026
Abstract: While large vision-language models (VLMs) demonstrate strong long-context understanding, their prevalent small branches fail on linguistics-photography alignment for a limited window size. We discover that knowledge distillation improves students' capability as a complement to Rotary Position Embeddings (RoPE) on window sizes (anchored from large models). Building on this insight, we propose LAid, which directly aims at the transfer of long-range attention mechanisms through two complementary components: (1) a progressive distance-weighted attention matching that dynamically emphasizes longer position differences during training, and (2) a learnable RoPE response gain modulation that selectively amplifies position sensitivity where needed. Extensive experiments across multiple model families demonstrate that LAid-distilled models achieve up to 3.2 times longer effective context windows compared to baseline small models, while maintaining or improving performance on standard VL benchmarks. Spectral analysis also suggests that LAid successfully preserves crucial low-frequency attention components that conventional methods fail to transfer. Our work not only provides practical techniques for building more efficient long-context VLMs but also offers theoretical insights into how positional understanding emerges and transfers during distillation.


【23】Can Large Language Models Improve Venture Capital Exit Timing After IPO?
Link: https://arxiv.org/abs/2601.00810

Authors: Mohammadhossien Rashidi
Abstract: Exit timing after an IPO is one of the most consequential decisions for venture capital (VC) investors, yet existing research focuses mainly on describing when VCs exit rather than evaluating whether those choices are economically optimal. Meanwhile, large language models (LLMs) have shown promise in synthesizing complex financial data and textual information but have not been applied to post-IPO exit decisions. This study introduces a framework that uses LLMs to estimate the optimal time for VC exit by analyzing monthly post-IPO information (financial performance, filings, news, and market signals) and recommending whether to sell or continue holding. We compare these LLM-generated recommendations with the actual exit dates observed for VCs and compute the return differences between the two strategies. By quantifying gains or losses associated with following the LLM, this study provides evidence on whether AI-driven guidance can improve exit timing and complements traditional hazard and real-options models in venture capital research.


Graph-Related (graph learning | graph neural networks | graph optimization, etc.) (14 papers)

【1】Quantized SO(3)-Equivariant Graph Neural Networks for Efficient Molecular Property Prediction
Link: https://arxiv.org/abs/2601.02213

Authors: Haoyu Zhou, Ping Xue, Tianfan Fu, Hao Zhang
Abstract: Deploying 3D graph neural networks (GNNs) that are equivariant to 3D rotations (the group SO(3)) on edge devices is challenging due to their high computational cost. This paper addresses the problem by compressing and accelerating an SO(3)-equivariant GNN using low-bit quantization techniques. Specifically, we introduce three innovations for quantized equivariant transformers: (1) a magnitude-direction decoupled quantization scheme that separately quantizes the norm and orientation of equivariant (vector) features, (2) a branch-separated quantization-aware training strategy that treats invariant and equivariant feature channels differently in an attention-based $SO(3)$-GNN, and (3) a robustness-enhancing attention normalization mechanism that stabilizes low-precision attention computations. Experiments on the QM9 and rMD17 molecular benchmarks demonstrate that our 8-bit models achieve accuracy on energy and force predictions comparable to full-precision baselines with markedly improved efficiency. We also conduct ablation studies to quantify the contribution of each component to maintaining accuracy and equivariance under quantization, using the local error of equivariance (LEE) metric. The proposed techniques enable the deployment of symmetry-aware GNNs in practical chemistry applications with 2.37--2.73x faster inference and 4x smaller model size, without sacrificing accuracy or physical symmetry.
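
A minimal sketch of the magnitude-direction decoupling, with symmetric uniform fake-quantization and per-tensor scales as illustrative assumptions:

```python
import torch

def uniform_quantize(x, bits=8):
    """Symmetric uniform fake-quantization with a per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().amax().clamp_min(1e-12) / qmax
    return torch.round(x / scale) * scale

def quantize_equivariant(v, bits=8):
    """Quantize the norm and the unit direction of (..., 3) vector features
    separately, so the rotation-sensitive part lives on the unit sphere."""
    norm = v.norm(dim=-1, keepdim=True)
    direction = v / norm.clamp_min(1e-12)
    return uniform_quantize(norm, bits) * uniform_quantize(direction, bits)
```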


【2】CORE: Code-based Inverse Self-Training Framework with Graph Expansion for Virtual Agents
Link: https://arxiv.org/abs/2601.02201

Authors: Keyu Wang, Bingchen Miao, Wendong Bu, Yu Wu, Juncheng Li, Shengyu Zhang, Wenqiao Zhang, Siliang Tang, Jun Xiao, Yueting Zhuang
Note: 19 pages, 12 figures
Abstract: The development of Multimodal Virtual Agents has made significant progress through the integration of Multimodal Large Language Models. However, mainstream training paradigms face key challenges: Behavior Cloning is simple and effective through imitation but suffers from low behavioral diversity, while Reinforcement Learning is capable of discovering novel strategies through exploration but heavily relies on manually designed reward functions. To address the conflict between these two methods, we present CORE, a Code-based Inverse Self-Training Framework with Graph Expansion that bridges imitation and exploration, offering a novel training framework that promotes behavioral diversity while eliminating the reliance on manual reward design. Specifically, we introduce Semantic Code Abstraction to automatically infer reward functions from expert demonstrations without manual design. The inferred reward function, referred to as the Label Function, is executable code that verifies one key step within a task. Building on this, we propose Strategy Graph Expansion to enhance in-domain behavioral diversity, which constructs a multi-path graph called the Strategy Graph that captures diverse valid solutions beyond expert demonstrations. Furthermore, we introduce Trajectory-Guided Extrapolation, which enriches out-of-domain behavioral diversity by utilizing both successful and failed trajectories to expand the task space. Experiments on Web and Android platforms demonstrate that CORE significantly improves both overall performance and generalization, highlighting its potential as a robust and generalizable training paradigm for building powerful virtual agents.


【3】ACDZero: Graph-Embedding-Based Tree Search for Mastering Automated Cyber Defense
Link: https://arxiv.org/abs/2601.02196

Authors: Yu Li, Sizhe Tang, Rongqian Chen, Fei Xu Yu, Guangyu Jiang, Mahdi Imani, Nathaniel D. Bastian, Tian Lan
Abstract: Automated cyber defense (ACD) seeks to protect computer networks with minimal or no human intervention, reacting to intrusions by taking corrective actions such as isolating hosts, resetting services, deploying decoys, or updating access controls. However, existing approaches for ACD, such as deep reinforcement learning (RL), often face difficult exploration in complex networks with large decision/state spaces and thus require an expensive amount of samples. Inspired by the need to learn sample-efficient defense policies, we frame ACD in CAGE Challenge 4 (CAGE-4 / CC4) as a context-based partially observable Markov decision problem and propose a planning-centric defense policy based on Monte Carlo Tree Search (MCTS). It explicitly models the exploration-exploitation tradeoff in ACD and uses statistical sampling to guide exploration and decision making. We make novel use of graph neural networks (GNNs) to embed observations from the network as attributed graphs, to enable permutation-invariant reasoning over hosts and their relationships. To make our solution practical in complex search spaces, we guide MCTS with learned graph embeddings and priors over graph-edit actions, combining model-free generalization and policy distillation with look-ahead planning. We evaluate the resulting agent on CC4 scenarios involving diverse network structures and adversary behaviors, and show that our search-guided, graph-embedding-based planning improves defense reward and robustness relative to state-of-the-art RL baselines.
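
The exploration-exploitation tradeoff is made explicit in the tree policy. A PUCT-style selection score with a learned prior, as used in AlphaZero-like search and resembling the graph-embedding priors described here, is sketched below; the constant and bookkeeping are illustrative.

```python
import math

def puct_score(total_value, visits, parent_visits, prior, c_puct=1.5):
    """Mean action value plus a prior-weighted exploration bonus; children
    with high priors but few visits are explored first."""
    q = total_value / visits if visits > 0 else 0.0
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + visits)
    return q + u
```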


【4】Advanced Global Wildfire Activity Modeling with Hierarchical Graph ODE
Link: https://arxiv.org/abs/2601.01501

Authors: Fan Xu, Wei Gong, Hao Wu, Lilan Peng, Nan Wang, Qingsong Wen, Xian Wu, Kun Wang, Xibin Zhao
Abstract: Wildfires, as an integral component of the Earth system, are governed by a complex interplay of atmospheric, oceanic, and terrestrial processes spanning a vast range of spatiotemporal scales. Modeling their global activity on large timescales is therefore a critical yet challenging task. While deep learning has recently achieved significant breakthroughs in global weather forecasting, its potential for global wildfire behavior prediction remains underexplored. In this work, we reframe this problem and introduce the Hierarchical Graph ODE (HiGO), a novel framework designed to learn the multi-scale, continuous-time dynamics of wildfires. Specifically, we represent the Earth system as a multi-level graph hierarchy and propose an adaptive filtering message passing mechanism for both intra- and inter-level information flow, enabling more effective feature extraction and fusion. Furthermore, we incorporate GNN-parameterized Neural ODE modules at multiple levels to explicitly learn the continuous dynamics inherent to each scale. Through extensive experiments on the SeasFire Cube dataset, we demonstrate that HiGO significantly outperforms state-of-the-art baselines on long-range wildfire forecasting. Moreover, its continuous-time predictions exhibit strong observational consistency, highlighting its potential for real-world applications.
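
In the simplest fixed-step case, the GNN-parameterized continuous dynamics amount to integrating dh/dt = f(h, t) with a learned f. The Euler integrator below is a generic sketch; Graph-ODE models typically use adaptive solvers, and `dynamics` is a placeholder for the GNN.

```python
import torch

def euler_rollout(h0, dynamics, t0=0.0, t1=1.0, steps=16):
    """Fixed-step Euler integration of latent states; dynamics(h, t) would
    be a GNN over the multi-level Earth-system graph in a Graph-ODE model."""
    h, t = h0, t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        h = h + dt * dynamics(h, torch.tensor(t))
        t += dt
    return h
```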


【5】Accelerating Storage-Based Training for Graph Neural Networks
标题:加速图神经网络的基于存储的训练
链接:https://arxiv.org/abs/2601.01473

作者:Myung-Hwan Jang,Jeong-Min Park,Yunyong Ko,Sang-Wook Kim
备注:10 pages, 12 figures, 2 tables, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) 2026
摘要:图神经网络(GNN)由于其强大的表达能力,在各种现实世界的下游任务中取得了突破。随着真实世界图规模的不断增长,\textit{基于存储的GNN训练方法}已经得到研究,该方法利用外部存储(例如NVMe SSD)在单台机器上处理这种网络规模的图。虽然这类基于存储的GNN训练方法在大规模GNN训练中表现出了很好的潜力,但我们观察到它们在数据准备方面遇到了严重的瓶颈,因为它们忽略了一个关键挑战:\textit{如何处理大量的小存储I/O}。为了应对这一挑战,本文提出了一种新的基于存储的GNN训练框架,名为\textsf{AGNES},它采用\textit{块级存储I/O处理}方法,以充分利用高性能存储设备的I/O带宽。此外,为了进一步提高每次存储I/O的效率,\textsf{AGNES}基于真实世界图的特征,采用了一种简单而有效的策略:\textit{基于超批(hyperbatch)的处理}。在五个真实世界图上进行的综合实验表明,\textsf{AGNES}始终优于四种最先进的方法,比最好的竞争对手快达4.1倍。我们的代码可在https://github.com/Bigdasgit/agnes-kdd26上获得。
摘要:Graph neural networks (GNNs) have achieved breakthroughs in various real-world downstream tasks due to their powerful expressiveness. As the scale of real-world graphs has been continuously growing, \textit{a storage-based approach to GNN training} has been studied, which leverages external storage (e.g., NVMe SSDs) to handle such web-scale graphs on a single machine. Although such storage-based GNN training methods have shown promising potential in large-scale GNN training, we observed that they suffer from a severe bottleneck in data preparation since they overlook a critical challenge: \textit{how to handle a large number of small storage I/Os}. To address the challenge, in this paper, we propose a novel storage-based GNN training framework, named \textsf{AGNES}, that employs a method of \textit{block-wise storage I/O processing} to fully utilize the I/O bandwidth of high-performance storage devices. Moreover, to further enhance the efficiency of each storage I/O, \textsf{AGNES} employs a simple yet effective strategy, \textit{hyperbatch-based processing} based on the characteristics of real-world graphs. Comprehensive experiments on five real-world graphs reveal that \textsf{AGNES} consistently outperforms four state-of-the-art methods, by up to 4.1$\times$ faster than the best competitor. Our code is available at https://github.com/Bigdasgit/agnes-kdd26.


【6】A Depth Hierarchy for Computing the Maximum in ReLU Networks via Extremal Graph Theory
标题:通过极图理论计算ReLU网络中最大值的深度层次结构
链接:https://arxiv.org/abs/2601.01417

作者:Itay Safran
摘要:我们考虑使用ReLU神经网络精确计算$d$个实输入上的最大值函数的问题。我们证明了一个深度层次结构:对于任何深度$3\le k\le \log_2(\log_2(d))$,表示最大值所需的宽度为$\Omega\big(d^{1+\frac{1}{2^{k-2}-1}}\big)$。这是该基本算子在深度$k\ge3$下的首个无条件超线性下界,并且即使深度随$d$增长也依然成立。我们的证明技术基于组合论证,将最大值函数的不可微脊与由计算网络第一隐藏层诱导的图中的团相关联,并利用极值图论中的图兰(Turán)定理表明足够窄的网络无法捕获最大值的非线性。这表明,尽管最大值函数性质简单,但它具有源于其不可微超平面几何结构的内在复杂性,并为证明深度神经网络的下界提供了一种新方法。
摘要:We consider the problem of exact computation of the maximum function over $d$ real inputs using ReLU neural networks. We prove a depth hierarchy, wherein width $\Omega\big(d^{1+\frac{1}{2^{k-2}-1}}\big)$ is necessary to represent the maximum for any depth $3\le k\le \log_2(\log_2(d))$. This is the first unconditional super-linear lower bound for this fundamental operator at depths $k\ge3$, and it holds even if the depth scales with $d$. Our proof technique is based on a combinatorial argument and associates the non-differentiable ridges of the maximum with cliques in a graph induced by the first hidden layer of the computing network, utilizing Turán's theorem from extremal graph theory to show that a sufficiently narrow network cannot capture the non-linearities of the maximum. This suggests that despite its simple nature, the maximum function possesses an inherent complexity that stems from the geometric structure of its non-differentiable hyperplanes, and provides a novel approach for proving lower bounds for deep neural networks.
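作为补充,下面的小例子(示意性构造,并非论文下界证明的对象)说明如何仅用ReLU精确表示两输入最大值,并通过两两配对以约$\log_2 d$的深度计算$d$个输入的最大值;论文的结果正是刻画了在更浅的深度下所需的宽度。

```python
def relu(x):
    return max(x, 0.0)

def max2(a, b):
    # max(a, b) = b + relu(a - b):一个ReLU单元即可精确实现两输入最大值
    return b + relu(a - b)

def max_d(xs):
    # 两两配对的"锦标赛"结构,深度约为 log2(d)
    while len(xs) > 1:
        paired = [max2(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]
        if len(xs) % 2:           # 奇数个输入时,最后一个直接晋级
            paired.append(xs[-1])
        xs = paired
    return xs[0]

assert max_d([3.0, -1.0, 7.5, 2.0]) == 7.5
```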


【7】A Graph-based Framework for Online Time Series Anomaly Detection Using Model Ensemble
标题:使用模型集合进行在线时间序列异常检测的基于图的框架
链接:https://arxiv.org/abs/2601.01403

作者:Zewei Yu,Jianqiu Xu,Caimin Li
备注:8 pages
摘要:随着工业系统中流数据量的增加,在线异常检测已成为一项关键任务。多样化且快速演变的数据模式对在线异常检测提出了重大挑战。现有的许多异常检测方法都是针对离线环境设计的,或者难以有效处理异构流数据。本文提出了GDME,一个使用模型集成进行在线时间序列异常检测的无监督图框架。GDME维护一个动态模型池,通过修剪表现不佳的模型和引入新模型来持续更新。它利用动态图结构来表示模型之间的关系,并在图上采用社区检测来选择合适的子集进行集成。图结构还用于通过监测结构变化来检测概念漂移,从而使框架能够适应不断演变的流数据。在七个异构时间序列上的实验表明,GDME优于现有的在线异常检测方法,改进幅度高达24%。此外,与单个模型和平均集成相比,其集成策略提供了更优的检测性能,并具有有竞争力的计算效率。
摘要:With the increasing volume of streaming data in industrial systems, online anomaly detection has become a critical task. The diverse and rapidly evolving data patterns pose significant challenges for online anomaly detection. Many existing anomaly detection methods are designed for offline settings or have difficulty in handling heterogeneous streaming data effectively. This paper proposes GDME, an unsupervised graph-based framework for online time series anomaly detection using model ensemble. GDME maintains a dynamic model pool that is continuously updated by pruning underperforming models and introducing new ones. It utilizes a dynamic graph structure to represent relationships among models and employs community detection on the graph to select an appropriate subset for ensemble. The graph structure is also used to detect concept drift by monitoring structural changes, allowing the framework to adapt to evolving streaming data. Experiments on seven heterogeneous time series demonstrate that GDME outperforms existing online anomaly detection methods, achieving improvements of up to 24%. In addition, its ensemble strategy provides superior detection performance compared with both individual models and average ensembles, with competitive computational efficiency.


【8】Scale-Adaptive Power Flow Analysis with Local Topology Slicing and Multi-Task Graph Learning
标题:采用局部拓扑切片和多任务图学习的规模自适应潮流分析
链接:https://arxiv.org/abs/2601.01387

作者:Yongzhe Li,Lin Guan,Zihan Cai,Zuxian Lin,Jiyu Huang,Liukai Chen
摘要:开发对拓扑变化具有较强适应性的深度学习模型对于潮流分析具有重要的现实意义。为了提高模型在变系统规模下的性能,提高支路功率预测的鲁棒性,提出了一种规模自适应多任务潮流分析(SaMPFA)框架。SaMPFA引入了局部拓扑切片(LTS)采样技术,从完整的电力网络中提取不同尺度的子图,以加强模型的跨尺度学习能力。在此基础上,设计了一种无参考多任务图学习(RMGL)模型用于鲁棒潮流预测。与现有方法不同,RMGL预测母线电压和支路功率,而不是相位角。这种设计不仅避免了支路功率计算中误差放大的风险,而且引导模型学习相角差的物理关系。此外,损失函数还包含了额外的项,这些项鼓励模型捕捉角度差和功率传输的物理模式,进一步提高了预测与物理定律之间的一致性。IEEE 39节点系统和中国某省级电网的仿真结果表明,该模型在变系统规模下具有较好的适应性和泛化能力,精度分别提高了4.47%和36.82%。
摘要:Developing deep learning models with strong adaptability to topological variations is of great practical significance for power flow analysis. To enhance model performance under variable system scales and improve robustness in branch power prediction, this paper proposes a Scale-adaptive Multi-task Power Flow Analysis (SaMPFA) framework. SaMPFA introduces a Local Topology Slicing (LTS) sampling technique that extracts subgraphs of different scales from the complete power network to strengthen the model's cross-scale learning capability. Furthermore, a Reference-free Multi-task Graph Learning (RMGL) model is designed for robust power flow prediction. Unlike existing approaches, RMGL predicts bus voltages and branch powers instead of phase angles. This design not only avoids the risk of error amplification in branch power calculation but also guides the model to learn the physical relationships of phase angle differences. In addition, the loss function incorporates extra terms that encourage the model to capture the physical patterns of angle differences and power transmission, further improving consistency between predictions and physical laws. Simulations on the IEEE 39-bus system and a real provincial grid in China demonstrate that the proposed model achieves superior adaptability and generalization under variable system scales, with accuracy improvements of 4.47% and 36.82%, respectively.


【9】From Classification to Generation: An Open-Ended Paradigm for Adverse Drug Reaction Prediction Based on Graph-Motif Feature Fusion
标题:从分类到生成:基于图-模体特征融合的开放式药物不良反应预测范式
链接:https://arxiv.org/abs/2601.01347

作者:Yuyan Pi,Min Jin,Wentao Xie,Xinhua Liu
备注:34 pages,5 figures
摘要:计算生物学为通过药物不良反应(ADR)预测降低新药开发的高成本和漫长周期提供了巨大潜力。然而,目前的方法仍然受到药物数据稀缺引起的冷启动挑战、封闭标签集以及标签依赖性建模不足的阻碍。在此,我们提出了一种基于图-模体特征融合和多标签生成(GM-MLG)的开放式ADR预测范式。GM-MLG利用分子结构作为内在固有特征,构建了跨越原子级、局部分子级(利用BRICS算法结合额外碎片化规则动态提取的细粒度模体)和全局分子级的双图表示架构。独特的是,GM-MLG率先将ADR预测从多标签分类转变为基于Transformer Decoder的多标签生成。通过将ADR标签视为离散令牌序列,它采用位置嵌入来显式捕获大规模标签空间内的依赖关系和共现关系,并通过自回归解码生成预测以动态扩展预测空间。实验表明,GM-MLG实现了高达38%的改进和20%的平均增益,并将预测空间从200种扩展到超过10,000种。此外,它通过逆合成模体分析阐明了ADR与模体之间的非线性构效关系,为药物安全性的系统性风险降低提供了可解释的创新支持。
摘要:Computational biology offers immense potential for reducing the high costs and protracted cycles of new drug development through adverse drug reaction (ADR) prediction. However, current methods remain impeded by drug data scarcity-induced cold-start challenge, closed label sets, and inadequate modeling of label dependencies. Here we propose an open-ended ADR prediction paradigm based on Graph-Motif feature fusion and Multi-Label Generation (GM-MLG). Leveraging molecular structure as an intrinsic and inherent feature, GM-MLG constructs a dual-graph representation architecture spanning the atomic level, the local molecular level (utilizing fine-grained motifs dynamically extracted via the BRICS algorithm combined with additional fragmentation rules), and the global molecular level. Uniquely, GM-MLG pioneers transforming ADR prediction from multi-label classification into Transformer Decoder-based multi-label generation. By treating ADR labels as discrete token sequences, it employs positional embeddings to explicitly capture dependencies and co-occurrence relationships within large-scale label spaces, generating predictions via autoregressive decoding to dynamically expand the prediction space. Experiments demonstrate GM-MLG achieves up to 38% improvement and an average gain of 20%, expanding the prediction space from 200 to over 10,000 types. Furthermore, it elucidates non-linear structure-activity relationships between ADRs and motifs via retrosynthetic motif analysis, providing interpretable and innovative support for systematic risk reduction in drug safety.
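摘要中的BRICS模体提取可用RDKit直接演示;下面是一个极简草图(假设环境中已安装RDKit,示例分子为阿司匹林,仅作演示,不代表论文完整的碎片化规则)。

```python
# 假设已安装 RDKit(pip install rdkit)
from rdkit import Chem
from rdkit.Chem import BRICS

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # 阿司匹林,仅作演示
fragments = sorted(BRICS.BRICSDecompose(mol))      # 按BRICS规则切断化学键,得到模体片段的SMILES
for smi in fragments:
    print(smi)
```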


【10】Generating Diverse TSP Tours via a Combination of Graph Pointer Network and Dispersion
标题:通过图指针网络与分散度相结合生成多样化TSP路径
链接:https://arxiv.org/abs/2601.01132

作者:Hao-Hsung Yang,Ssu-Yuan Lo,Kuan-Lun Chen,Ching-Kai Wang
摘要:我们研究多样性旅行商问题(D-TSP),这是一个双准则优化挑战,旨在寻找一组$k$条互不相同的TSP路径。目标要求每条选定路径的长度至多为$c|T^*|$(其中$|T^*|$为最优路径长度),同时最小化所有路径对之间的平均Jaccard相似度。这一形式化对于同时需要高解质量和容错能力的应用至关重要,例如物流规划、机器人寻路或战略巡逻。现有方法存在局限:小生境模因算法(NMA)或双准则优化等传统启发式方法的计算复杂度高达$O(n^3)$,而现代神经方法(如RF-MA3S)的多样性质量有限,且依赖复杂的外部机制。为了克服这些限制,我们提出了一种新的混合框架,将D-TSP分解为两个高效步骤。首先,我们利用一个简单的图指针网络(GPN),辅以近似序列熵损失,高效地采样大量多样化的高质量路径池。这一简单修改无需复杂的外部机制即可有效控制质量-多样性权衡。其次,我们应用一个对分散问题具有2-近似保证的贪婪算法,从生成的路径池中选出最终$k$条最大程度互异的路径。我们的结果展示了最先进的性能:在Berlin实例上,我们的模型取得了$0.015$的平均Jaccard指数,显著优于NMA($0.081$)和RF-MA3S。借助GPU加速,我们的GPN结构实现了近线性$O(n)$的经验运行时间增长。在保持与复杂双准则算法相当的解多样性的同时,我们的方法在大规模实例(783个城市)上的速度快了360倍以上,以前所未有的效率和简洁性提供高质量的TSP解。
摘要:We address the Diverse Traveling Salesman Problem (D-TSP), a bi-criteria optimization challenge that seeks a set of $k$ distinct TSP tours. The objective requires every selected tour to have a length at most $c|T^*|$ (where $|T^*|$ is the optimal tour length) while minimizing the average Jaccard similarity across all tour pairs. This formulation is crucial for applications requiring both high solution quality and fault tolerance, such as logistics planning, robotics pathfinding or strategic patrolling. Current methods are limited: traditional heuristics, such as the Niching Memetic Algorithm (NMA) or bi-criteria optimization, incur high computational complexity $O(n^3)$, while modern neural approaches (e.g., RF-MA3S) achieve limited diversity quality and rely on complex, external mechanisms.   To overcome these limitations, we propose a novel hybrid framework that decomposes D-TSP into two efficient steps. First, we utilize a simple Graph Pointer Network (GPN), augmented with an approximated sequence entropy loss, to efficiently sample a large, diverse pool of high-quality tours. This simple modification effectively controls the quality-diversity trade-off without complex external mechanisms. Second, we apply a greedy algorithm that yields a 2-approximation for the dispersion problem to select the final $k$ maximally diverse tours from the generated pool. Our results demonstrate state-of-the-art performance. On the Berlin instance, our model achieves an average Jaccard index of $0.015$, significantly outperforming NMA ($0.081$) and RF-MA3S. By leveraging GPU acceleration, our GPN structure achieves a near-linear empirical runtime growth of $O(n)$. While maintaining solution diversity comparable to complex bi-criteria algorithms, our approach is over 360 times faster on large-scale instances (783 cities), delivering high-quality TSP solutions with unprecedented efficiency and simplicity.
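摘要所述第二步的贪婪分散选择可示意如下(极简草图;路径以边的集合表示,初始路径的选取方式为常见写法之一,并非论文的精确实现)。

```python
def jaccard_distance(tour_a, tour_b):
    # 路径以边的frozenset表示;距离 = 1 - Jaccard相似度
    return 1.0 - len(tour_a & tour_b) / len(tour_a | tour_b)

def greedy_dispersion(pool, k):
    """贪婪地选出k条彼此间最小距离尽量大的路径(max-min分散问题的经典2-近似)。"""
    # 以池中与其他候选最远的一条路径作为起点(起点选法为常见写法之一)
    first = max(pool, key=lambda t: max(jaccard_distance(t, u) for u in pool if u is not t))
    chosen = [first]
    while len(chosen) < k:
        # 每次加入"与已选集合的最小距离"最大的候选
        nxt = max((t for t in pool if t not in chosen),
                  key=lambda t: min(jaccard_distance(t, s) for s in chosen))
        chosen.append(nxt)
    return chosen
```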


【11】Learning from Historical Activations in Graph Neural Networks
标题:从图神经网络中的历史激活中学习
链接:https://arxiv.org/abs/2601.01123

作者:Yaniv Galron,Hadar Sinai,Haggai Maron,Moshe Eliasof
摘要:图神经网络(GNN)在社交网络、分子化学等各个领域都取得了显著的成功。GNN的一个关键组成部分是池化过程,其中模型计算的节点特征被组合起来,形成一个用于下游任务的信息性最终描述符。然而,以前的图池化方案依赖于最后一个GNN层特征作为池化或分类器层的输入,可能没有充分利用在模型的前向传递期间产生的先前层的重要激活,我们将其视为历史图激活。这种差距在节点的表示可能在许多图神经层的过程中发生显著变化的情况下尤其明显,并且由于特定于图的挑战(例如深度架构中的过度平滑)而恶化。为了弥合这一差距,我们引入了HISTOGRAPH,这是一种新的基于注意力的两阶段最终聚合层,它首先在中间激活上应用统一的逐层注意力,然后是逐节点注意力。通过对跨层节点表示的演变进行建模,我们的HISTOGRAPH利用节点的激活历史和图形结构来细化用于最终预测的特征。多个图分类基准的实证结果表明,HISTOGRAPH提供了强大的性能,不断改进传统技术,在深度GNN中具有特别强的鲁棒性。
摘要:Graph Neural Networks (GNNs) have demonstrated remarkable success in various domains such as social networks, molecular chemistry, and more. A crucial component of GNNs is the pooling procedure, in which the node features calculated by the model are combined to form an informative final descriptor to be used for the downstream task. However, previous graph pooling schemes rely on the last GNN layer features as an input to the pooling or classifier layers, potentially under-utilizing important activations of previous layers produced during the forward pass of the model, which we regard as historical graph activations. This gap is particularly pronounced in cases where a node's representation can shift significantly over the course of many graph neural layers, and worsened by graph-specific challenges such as over-smoothing in deep architectures. To bridge this gap, we introduce HISTOGRAPH, a novel two-stage attention-based final aggregation layer that first applies a unified layer-wise attention over intermediate activations, followed by node-wise attention. By modeling the evolution of node representations across layers, our HISTOGRAPH leverages both the activation history of nodes and the graph structure to refine features used for final prediction. Empirical results on multiple graph classification benchmarks demonstrate that HISTOGRAPH offers strong performance that consistently improves traditional techniques, with particularly strong robustness in deep GNNs.
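HISTOGRAPH的两阶段聚合思路可用如下PyTorch草图示意(非官方实现;打分函数与维度均为示意性假设):先对各层历史激活做层级注意力,再做节点级注意力得到图级描述符。

```python
import torch
import torch.nn as nn

class TwoStageHistoryPool(nn.Module):
    """示意:层级注意力 -> 节点级注意力,聚合跨层历史激活。"""
    def __init__(self, dim):
        super().__init__()
        self.layer_score = nn.Linear(dim, 1)  # 为每层激活打分
        self.node_score = nn.Linear(dim, 1)   # 为每个节点打分

    def forward(self, hist):                   # hist: [L, N, D],L层、N个节点
        layer_w = torch.softmax(self.layer_score(hist).mean(dim=1), dim=0)  # [L, 1]
        fused = (layer_w.unsqueeze(1) * hist).sum(dim=0)                     # [N, D]
        node_w = torch.softmax(self.node_score(fused), dim=0)                # [N, 1]
        return (node_w * fused).sum(dim=0)                                   # [D] 图级描述符

pool = TwoStageHistoryPool(dim=16)
out = pool(torch.randn(4, 10, 16))  # 4层历史激活、10个节点、16维特征
assert out.shape == (16,)
```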


【12】Distribution Matching for Graph Quantification Under Structural Covariate Shift
标题:结构协变量位移下图量化的分布匹配
链接:https://arxiv.org/abs/2601.00864

作者:Clemens Damke,Eyke Hüllermeier
备注:17 pages, presented at ECML-PKDD 2025
摘要:图通常用于机器学习中,以建模实例之间的关系。考虑预测社交网络中用户的政治偏好的任务;为了解决这个任务,我们应该同时考虑每个用户的特征以及他们之间的关系。然而,人们通常对单个实例的标签不感兴趣,而是对标签在一组实例上的分布感兴趣;例如,当预测用户的政治偏好时,给定意见的总体流行度可能比特定人的意见更令人感兴趣。这种标签流行率估计任务通常被称为量化学习(QL)。当前表格数据的QL方法通常基于所谓的先验概率偏移(PPS)假设,该假设指出标签条件实例分布在训练和测试数据中应保持相等。在图设置中,如果训练数据与测试数据之间的偏移是结构性的,即训练数据来自与测试数据不同的图区域,则PPS通常不成立。为了应对这种结构性偏移,此前已有工作提出了流行的调整计数量化方法的重要性采样变体。在这项工作中,我们将结构重要性采样的思想扩展到最先进的KDEy量化方法。我们表明,我们提出的方法能够适应结构性偏移,并优于标准的量化方法。
摘要:Graphs are commonly used in machine learning to model relationships between instances. Consider the task of predicting the political preferences of users in a social network; to solve this task one should consider, both, the features of each individual user and the relationships between them. However, oftentimes one is not interested in the label of a single instance but rather in the distribution of labels over a set of instances; e.g., when predicting the political preferences of users, the overall prevalence of a given opinion might be of higher interest than the opinion of a specific person. This label prevalence estimation task is commonly referred to as quantification learning (QL). Current QL methods for tabular data are typically based on the so-called prior probability shift (PPS) assumption which states that the label-conditional instance distributions should remain equal across the training and test data. In the graph setting, PPS generally does not hold if the shift between training and test data is structural, i.e., if the training data comes from a different region of the graph than the test data. To address such structural shifts, an importance sampling variant of the popular adjusted count quantification approach has previously been proposed. In this work, we extend the idea of structural importance sampling to the state-of-the-art KDEy quantification approach. We show that our proposed method adapts to structural shifts and outperforms standard quantification approaches.
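作为背景,摘要中提到的"调整计数"(adjusted count)量化可归结为一个简单的校正公式;下面的极简草图(tpr/fpr需在验证数据上估计)展示其基本形式,重要性采样与KDEy变体在此基础上扩展。

```python
def adjusted_count(prevalence_raw, tpr, fpr):
    """经典ACC校正:p = (p_obs - fpr) / (tpr - fpr),并截断到[0, 1]。
    prevalence_raw 为分类器判为正类的比例;tpr/fpr 在验证数据上估计。"""
    p = (prevalence_raw - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p))

# 示例:观测到40%的实例被判为正,分类器 tpr=0.8、fpr=0.1
print(adjusted_count(0.40, tpr=0.8, fpr=0.1))  # ≈ 0.4286
```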


【13】A Knowledge Graph and Deep Learning-Based Semantic Recommendation Database System for Advertisement Retrieval and Personalization
标题:基于知识图谱和深度学习的广告检索和个性化语义推荐数据库系统
链接:https://arxiv.org/abs/2601.00833

作者:Tangtang Wang,Kaijie Zhang,Kuangcong Liu
摘要:在现代数字营销中,广告数据日益复杂,需要能够理解产品、受众和广告内容之间语义关系的智能系统。为了应对这一挑战,本文提出了一个基于知识图和深度学习的语义推荐数据库系统(KGSR-ADS),用于广告检索和个性化。所提出的框架集成了一个异构的广告知识图(Ad-KG),捕捉多关系语义,语义嵌入层,利用大型语言模型(LLM),如GPT和LLaMA,以生成上下文感知的矢量表示,GNN +注意力模型,推断跨实体的依赖关系,和数据库优化和检索层基于矢量索引(FAISS/Milvus)的高效语义搜索。这种分层架构实现了精确的语义匹配和可扩展的检索,从而在大规模异构工作负载下实现个性化广告推荐。
摘要:In modern digital marketing, the growing complexity of advertisement data demands intelligent systems capable of understanding semantic relationships among products, audiences, and advertising content. To address this challenge, this paper proposes a Knowledge Graph and Deep Learning-Based Semantic Recommendation Database System (KGSR-ADS) for advertisement retrieval and personalization. The proposed framework integrates a heterogeneous Ad-Knowledge Graph (Ad-KG) that captures multi-relational semantics, a Semantic Embedding Layer that leverages large language models (LLMs) such as GPT and LLaMA to generate context-aware vector representations, a GNN + Attention Model that infers cross-entity dependencies, and a Database Optimization & Retrieval Layer based on vector indexing (FAISS/Milvus) for efficient semantic search. This layered architecture enables both accurate semantic matching and scalable retrieval, allowing personalized ad recommendations under large-scale heterogeneous workloads.
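摘要中基于FAISS的向量索引检索可用如下极简草图演示(假设嵌入维度为768、采用内积度量;索引类型与数据规模均为示意性假设,实际部署可换用IVF/HNSW等索引或Milvus服务)。

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 768                                    # 嵌入维度(假设值)
index = faiss.IndexFlatIP(d)               # 精确内积检索
ad_embeddings = np.random.rand(10000, d).astype("float32")
faiss.normalize_L2(ad_embeddings)          # L2归一化后,内积等价于余弦相似度
index.add(ad_embeddings)

query = np.random.rand(1, d).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)       # 返回Top-5候选的相似度与下标
print(ids[0], scores[0])
```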


【14】From Mice to Trains: Amortized Bayesian Inference on Graph Data
标题:从老鼠到火车:图数据的摊销贝叶斯推理
链接:https://arxiv.org/abs/2601.02241

作者:Svenja Jedhoff,Elizaveta Semenova,Aura Raulo,Anne Meyer,Paul-Christian Bürkner
摘要:图出现在不同的领域,从生物学和化学到社会和信息网络,以及运输和物流。对图结构数据的推断要求方法具有置换不变性、可扩展到不同大小和稀疏度,并且能够捕获复杂的长程依赖关系,这使得对图参数的后验估计特别具有挑战性。摊销贝叶斯推理(ABI)是一个基于模拟的框架,它采用生成神经网络来实现快速、无似然的后验推理。我们将ABI适配到图数据以应对这些挑战,对节点、边和图级参数进行推理。我们的方法在一个两模块管道中将置换不变图编码器与灵活的神经后验估计器耦合:摘要网络将属性图映射到固定长度的表示,推理网络近似参数的后验。在这种设置下,多种神经架构都可以充当摘要网络。在这项工作中,我们评估了多种架构,并在受控的合成设置和两个现实世界领域(生物学和物流)中,就参数恢复和校准评估其性能。
摘要:Graphs arise across diverse domains, from biology and chemistry to social and information networks, as well as in transportation and logistics. Inference on graph-structured data requires methods that are permutation-invariant, scalable across varying sizes and sparsities, and capable of capturing complex long-range dependencies, making posterior estimation on graph parameters particularly challenging. Amortized Bayesian Inference (ABI) is a simulation-based framework that employs generative neural networks to enable fast, likelihood-free posterior inference. We adapt ABI to graph data to address these challenges to perform inference on node-, edge-, and graph-level parameters. Our approach couples permutation-invariant graph encoders with flexible neural posterior estimators in a two-module pipeline: a summary network maps attributed graphs to fixed-length representations, and an inference network approximates the posterior over parameters. In this setting, several neural architectures can serve as the summary network. In this work we evaluate multiple architectures and assess their performance on controlled synthetic settings and two real-world domains - biology and logistics - in terms of recovery and calibration.


Transformer(7篇)

【1】Differential Privacy for Transformer Embeddings of Text with Nonparametric Variational Information Bottleneck
标题:基于非参数变分信息瓶颈的文本Transformer嵌入的差分隐私
链接:https://arxiv.org/abs/2601.02307

作者:Dina El Zein,James Henderson
备注:11 pages, 2 figures
摘要:我们提出了一种隐私保护方法:通过共享文本Transformer嵌入的含噪版本来共享文本数据。研究表明,深度模型学习到的隐藏表示可能编码输入中的敏感信息,使对手能够以相当高的精度恢复输入数据。这个问题在Transformer嵌入中更加严重,因为它们由多个向量组成,每个令牌对应一个。为了减轻这种风险,我们提出了非参数变分差分隐私(NVDP),它既确保有用的数据共享,又提供强大的隐私保护。我们采用差分隐私的思路,将非参数变分信息瓶颈(NVIB)层集成到Transformer架构中,向其多向量嵌入注入噪声以隐藏信息,并用Rényi散度及其对应的贝叶斯差分隐私(BDP)保证来度量隐私保护程度。训练NVIB层会根据效用校准噪声水平。我们在GLUE基准上测试了NVDP,结果表明,改变噪声水平可以在隐私和准确性之间取得有用的权衡。在较低噪声水平下,我们的模型保持了高准确性,同时提供了强有力的隐私保证,有效地平衡了隐私和效用。
摘要:We propose a privacy-preserving method for sharing text data by sharing noisy versions of their transformer embeddings. It has been shown that hidden representations learned by deep models can encode sensitive information from the input, making it possible for adversaries to recover the input data with considerable accuracy. This problem is exacerbated in transformer embeddings because they consist of multiple vectors, one per token. To mitigate this risk, we propose Nonparametric Variational Differential Privacy (NVDP), which ensures both useful data sharing and strong privacy protection. We take a differential privacy approach, integrating a Nonparametric Variational Information Bottleneck (NVIB) layer into the transformer architecture to inject noise into its multi-vector embeddings and thereby hide information, and measuring privacy protection with Rényi divergence and its corresponding Bayesian Differential Privacy (BDP) guarantee. Training the NVIB layer calibrates the noise level according to utility. We test NVDP on the GLUE benchmark and show that varying the noise level gives us a useful tradeoff between privacy and accuracy. With lower noise levels, our model maintains high accuracy while offering strong privacy guarantees, effectively balancing privacy and utility.


【2】Context-Free Recognition with Transformers
标题:Transformer的无上下文识别
链接:https://arxiv.org/abs/2601.01754

作者:Selim Jerad,Anej Svete,Sophie Hao,Ryan Cotterell,William Merrill
摘要:Transformer擅长处理符合某种语法的格式良好输入的任务,如自然语言和代码。然而,它们如何处理语法句法仍不清楚。事实上,在标准复杂度猜想下,标准Transformer无法识别上下文无关语言(CFL)这一描述句法的规范形式化体系,甚至无法识别CFL的子类正则语言(Merrill et al., 2022)。Merrill & Sabharwal(2024)表明,$\mathcal{O}(\log n)$个循环层(相对于输入长度$n$)可以让Transformer识别正则语言,但上下文无关识别的问题仍然悬而未决。在这项工作中,我们证明了具有$\mathcal{O}(\log n)$个循环层和$\mathcal{O}(n^6)$个填充令牌的循环Transformer可以识别所有CFL。但是,使用$\mathcal{O}(n^6)$个填充令牌进行训练和推理可能不切实际。幸运的是,我们表明,对于无歧义CFL等自然子类,Transformer上的识别问题变得更易处理,只需$\mathcal{O}(n^3)$填充。我们通过实验验证了这些结果,并表明循环对一种可证明需要对数深度的语言有帮助。总的来说,我们的研究结果揭示了Transformer进行CFL识别的复杂性:虽然一般识别可能需要难以承受的填充量,但无歧义等自然约束会带来高效的识别算法。
摘要:Transformers excel on tasks that process well-formed inputs according to some grammar, such as natural language and code. However, it remains unclear how they can process grammatical syntax. In fact, under standard complexity conjectures, standard transformers cannot recognize context-free languages (CFLs), a canonical formalism to describe syntax, or even regular languages, a subclass of CFLs (Merrill et al., 2022). Merrill & Sabharwal (2024) show that $\mathcal{O}(\log n)$ looping layers (w.r.t. input length $n$) allows transformers to recognize regular languages, but the question of context-free recognition remained open. In this work, we show that looped transformers with $\mathcal{O}(\log n)$ looping layers and $\mathcal{O}(n^6)$ padding tokens can recognize all CFLs. However, training and inference with $\mathcal{O}(n^6)$ padding tokens is potentially impractical. Fortunately, we show that, for natural subclasses such as unambiguous CFLs, the recognition problem on transformers becomes more tractable, requiring $\mathcal{O}(n^3)$ padding. We empirically validate our results and show that looping helps on a language that provably requires logarithmic depth. Overall, our results shed light on the intricacy of CFL recognition by transformers: While general recognition may require an intractable amount of padding, natural constraints such as unambiguity yield efficient recognition algorithms.
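作为背景,CFL成员判定的经典算法是$O(n^3)$时间的CYK算法(下为极简草图,假设文法已化为乔姆斯基范式,玩具文法仅作演示);这一规模与摘要中无歧义CFL仅需$\mathcal{O}(n^3)$填充相呼应。

```python
def cyk_recognize(tokens, unary, binary, start="S"):
    """CYK成员判定(假设文法已化为乔姆斯基范式)。
    unary[t]: 能产生终结符t的非终结符集合;binary[(B, C)]: 满足 A -> B C 的A的集合。"""
    n = len(tokens)
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, t in enumerate(tokens):
        table[i][1] = set(unary.get(t, ()))
    for span in range(2, n + 1):                 # 三重循环,时间复杂度 O(n^3)
        for i in range(n - span + 1):
            for split in range(1, span):
                for B in table[i][split]:
                    for C in table[i + split][span - split]:
                        table[i][span] |= binary.get((B, C), set())
    return start in table[0][n]

# 玩具文法:S -> A B, A -> 'a', B -> 'b'
print(cyk_recognize(["a", "b"], {"a": {"A"}, "b": {"B"}}, {("A", "B"): {"S"}}))  # True
```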


【3】AI-Powered Deepfake Detection Using CNN and Vision Transformer Architectures
标题 :使用CNN和Vision Transformer架构的人工智能驱动Deepfake检测
链接:https://arxiv.org/abs/2601.01281

作者:Sifatullah Sheikh Urmi,Kirtonia Nuzath Tabassum Arthi,Md Al-Imran
备注:6 pages, 6 figures, 3 tables. Conference paper
摘要:越来越多地使用人工智能生成的deepfake在维护数字真实性方面带来了重大挑战。使用大型人脸图像数据集评估了四个基于AI的模型,包括三个CNN和一个Vision Transformer。数据预处理和增强技术提高了不同场景下的模型性能。VFDNET在MobileNetV3上表现出了卓越的准确性,显示出高效的性能,从而展示了AI可靠的deepfake检测能力。
摘要:The increasing use of artificial intelligence generated deepfakes creates major challenges in maintaining digital authenticity. Four AI-based models, consisting of three CNNs and one Vision Transformer, were evaluated using large face image datasets. Data preprocessing and augmentation techniques improved model performance across different scenarios. VFDNET demonstrated superior accuracy with MobileNetV3, showing efficient performance, thereby demonstrating AI's capabilities for dependable deepfake detection.


【4】Benchmarking the Computational and Representational Efficiency of State Space Models against Transformers on Long-Context Dyadic Sessions
标题:在长上下文二元会话上对状态空间模型与Transformer的计算和表示效率进行基准测试
链接:https://arxiv.org/abs/2601.01237

作者:Abidemi Koledoye,Chinemerem Unachukwu,Gold Nwobu,Hasin Rana
备注:14 pages
摘要:状态空间模型(SSM)已成为长上下文序列建模中Transformer的一个有前途的替代方案:相比Transformer二次的$O(N^2)$缩放,它提供线性$O(N)$的计算复杂度。本文以二元治疗会话作为代表性测试用例,对Mamba SSM与LLaMA Transformer在长上下文序列上进行了全面的基准比较。我们从两个维度评估这两种架构:(1)计算效率,即在512到8,192个令牌范围内测量内存使用和推理速度;(2)表示效率,即分析隐藏状态动态和注意力模式。我们的研究结果为从事长上下文应用的从业者提供了可操作的见解,明确了SSM优于Transformer的精确条件。
摘要:State Space Models (SSMs) have emerged as a promising alternative to Transformers for long-context sequence modeling, offering linear $O(N)$ computational complexity compared to the Transformer's quadratic $O(N^2)$ scaling. This paper presents a comprehensive benchmarking study comparing the Mamba SSM against the LLaMA Transformer on long-context sequences, using dyadic therapy sessions as a representative test case. We evaluate both architectures across two dimensions: (1) computational efficiency, where we measure memory usage and inference speed from 512 to 8,192 tokens, and (2) representational efficiency, where we analyze hidden state dynamics and attention patterns. Our findings provide actionable insights for practitioners working with long-context applications, establishing precise conditions under which SSMs offer advantages over Transformers.


【5】Central Dogma Transformer: Towards Mechanism-Oriented AI for Cellular Understanding
标题:中心教条Transformer:迈向以机制为导向的人工智能以实现细胞理解
链接:https://arxiv.org/abs/2601.01089

作者:Nobuyuki Ota
摘要:理解细胞机制需要整合DNA、RNA和蛋白质的信息,这三个分子系统由分子生物学的中心法则联系在一起。虽然特定领域的基础模型已经分别为每种模态取得了成功,但它们仍然是孤立的,限制了我们对整合的细胞过程建模的能力。在这里,我们提出了中心法则Transformer(CDT),这是一种遵循中心法则方向逻辑、集成DNA、RNA和蛋白质预训练语言模型的架构。CDT采用定向交叉注意力机制(DNA到RNA的注意力建模转录调控,RNA到蛋白质的注意力建模翻译关系),产生一个整合全部三种模态的统一虚拟细胞嵌入。我们在来自K562细胞的CRISPRi增强子扰动数据上验证了CDT v1(一种使用固定、非细胞特异性RNA和蛋白质嵌入的概念验证实现),取得了0.503的Pearson相关性,达到由跨实验变异性设定的理论上限(r = 0.797)的63%。注意力和梯度分析提供了互补的解释窗口:在详细的案例研究中,这两种方法突出了很大程度上不同的基因组区域,其中梯度分析识别出一个CTCF结合位点,Hi-C数据显示该位点与增强子和靶基因均存在物理接触。这些结果表明,与生物信息流一致的AI架构可以兼得预测准确性和机制可解释性。
摘要:Understanding cellular mechanisms requires integrating information across DNA, RNA, and protein - the three molecular systems linked by the Central Dogma of molecular biology. While domain-specific foundation models have achieved success for each modality individually, they remain isolated, limiting our ability to model integrated cellular processes. Here we present the Central Dogma Transformer (CDT), an architecture that integrates pre-trained language models for DNA, RNA, and protein following the directional logic of the Central Dogma. CDT employs directional cross-attention mechanisms - DNA-to-RNA attention models transcriptional regulation, while RNA-to-Protein attention models translational relationships - producing a unified Virtual Cell Embedding that integrates all three modalities. We validate CDT v1 - a proof-of-concept implementation using fixed (non-cell-specific) RNA and protein embeddings - on CRISPRi enhancer perturbation data from K562 cells, achieving a Pearson correlation of 0.503, representing 63% of the theoretical ceiling set by cross-experiment variability (r = 0.797). Attention and gradient analyses provide complementary interpretive windows: in detailed case studies, these approaches highlight largely distinct genomic regions, with gradient analysis identifying a CTCF binding site that Hi-C data showed as physically contacting both enhancer and target gene. These results suggest that AI architectures aligned with biological information flow can achieve both predictive accuracy and mechanistic interpretability.


【6】Geometric and Dynamic Scaling in Deep Transformers
标题:深度Transformer中的几何和动态缩放
链接:https://arxiv.org/abs/2601.01014

作者:Haoran Su,Chenyu You
备注:Research Proposal Only
摘要:尽管取得了经验上的成功,但将Transformer架构推向极端深度往往会导致自相矛盾的失败:表示变得越来越冗余,失去秩,并最终崩溃。现有的解释在很大程度上将这种现象归因于优化不稳定性或梯度消失,但这些解释无法说明为什么即使在现代归一化和初始化方案下,崩溃仍然存在。在本文中,我们认为深度Transformer的崩溃从根本上说是一个几何问题。标准残差更新隐含地假设特征积累总是有益的,但没有提供任何机制来约束更新方向或删除过时的信息。随着深度的增加,这会导致表示系统性地漂移出语义流形并单调地积累特征,造成表示退化。我们提出了一个统一的几何框架,通过两个正交的原则解决这些失败。首先,流形约束的超连接将残差更新限制在有效的局部切向方向上,防止不受控制的流形漂移。其次,深度增量(delta)学习引入依赖于数据的非单调更新,能够反射和删除冗余特征,而不是无条件积累。总之,这些机制解耦了特征更新的方向和符号,产生了跨深度的稳定几何演化。我们将所得架构称为流形几何Transformer(MGT)。我们的分析预测,在允许动态擦除的同时强制几何有效性,对于避免超深网络中的秩崩溃至关重要。我们概述了一个针对超过100层的Transformer的评估协议,以检验以下假设:几何结构而非深度本身才是深度表示学习的关键限制因素。
摘要:Despite their empirical success, pushing Transformer architectures to extreme depth often leads to a paradoxical failure: representations become increasingly redundant, lose rank, and ultimately collapse. Existing explanations largely attribute this phenomenon to optimization instability or vanishing gradients, yet such accounts fail to explain why collapse persists even under modern normalization and initialization schemes. In this paper, we argue that the collapse of deep Transformers is fundamentally a geometric problem. Standard residual updates implicitly assume that feature accumulation is always beneficial, but offer no mechanism to constrain update directions or to erase outdated information. As depth increases, this leads to systematic drift off the semantic manifold and monotonic feature accumulation, causing representational degeneracy. We propose a unified geometric framework that addresses these failures through two orthogonal principles. First, manifold-constrained hyper-connections restrict residual updates to valid local tangent directions, preventing uncontrolled manifold drift. Second, deep delta learning introduces data-dependent, non-monotonic updates that enable reflection and erasure of redundant features rather than their unconditional accumulation. Together, these mechanisms decouple the direction and sign of feature updates, yielding a stable geometric evolution across depth. We term the resulting architecture the Manifold-Geometric Transformer (MGT). Our analysis predicts that enforcing geometric validity while allowing dynamic erasure is essential for avoiding rank collapse in ultra-deep networks. We outline an evaluation protocol for Transformers exceeding 100 layers to test the hypothesis that geometry, rather than depth itself, is the key limiting factor in deep representation learning.


【7】You Only Need Your Transformer 25% of the Time: Meaning-First Execution for Eliminating Unnecessary Inference
标题:25%的时间你只需要你的Transformer:用于消除不必要推理的意义优先执行
链接:https://arxiv.org/abs/2601.00847

作者:Ryan Shamim
备注:24 pages, 5 figures. Deterministic evaluation protocol. Includes theoretical analysis and empirical validation on GPT-2 and Gemma 2 9B
摘要:现代AI推理系统将Transformer执行视为强制性的,将模型能力与执行必要性混为一谈。我们将推理重新定义为一个控制平面决策问题:确定何时必须执行,何时可以通过替代途径保留正确性。我们引入意义优先执行(MFEE),一种实现该框架的控制平面架构,仅在需要时有选择地调用Transformer推理。MFEE作为现有技术栈之上的门控层运行,无需修改模型、权重或参数。在确定性解码下的1,000个不同提示中,MFEE实现了78.1%的执行削减,同时对被调用的执行保持100%的精确匹配等效性。对比评估显示,基于模式的路由器至多达到53.3%的规避率且存在正确性失败,而MFEE借助语义分析达到100%的规避率且零失败。我们通过定理1证明了这一局限:任何仅在有限特征映射上运行的路由器,都无法在特征冲突对上同时保证零误跳过和正的规避率。这些结果将执行治理确立为ML系统基础设施中的一个基础层,与模型级优化技术正交。
摘要:Modern AI inference systems treat transformer execution as mandatory, conflating model capability with execution necessity. We reframe inference as a control-plane decision problem: determining when execution is necessary versus when correctness can be preserved through alternative pathways. We introduce Meaning-First Execution (MFEE), a control-plane architecture implementing this framework, selectively invoking transformer inference only when required. MFEE operates as a gating layer above existing stacks without modifying models, weights, or parameters. Across 1,000 diverse prompts under deterministic decoding, MFEE achieves 78.1% execution reduction while maintaining 100% exact-match equivalence for invoked executions. Comparative evaluation reveals pattern-based routers achieve at most 53.3% avoidance with correctness failures, while MFEE reaches 100% avoidance with zero failures through semantic analysis. We prove this limitation via Theorem 1: any router operating solely on finite feature maps cannot simultaneously guarantee zero false skips and positive avoidance on feature-collision pairs. These results establish execution governance as a foundational layer in ML systems infrastructure, orthogonal to model-level optimization techniques.


GAN|对抗|攻击|生成相关(15篇)

【1】VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation
标题:VAR RL做得正确:解决视觉自回归生成中的异步策略冲突
链接:https://arxiv.org/abs/2601.02256

作者:Shikun Sun,Liao Qu,Huichao Zhang,Yiheng Liu,Yangyang Song,Xian Li,Xu Wang,Yi Jiang,Daniel K. Du,Xinglong Wu,Jia Jia
备注:Project page: https://github.com/ByteVisionLab/NextFlow
摘要:视觉生成由三种范式主导:自回归(AR)模型、扩散模型和视觉自回归(VAR)模型。与AR和扩散不同,VAR在其生成步骤中对异构输入结构进行操作,这会产生严重的异步策略冲突。这个问题在强化学习(RL)场景中变得特别严重,导致不稳定的训练和次优对齐。为了解决这个问题,我们提出了一个新的框架,以提高组相对策略优化(GRPO)明确管理这些冲突。我们的方法集成了三个协同组件:1)稳定的中间奖励,以指导早期阶段的生成; 2)用于精确信用分配的动态时间步重新加权方案;以及3)一种新的掩码传播算法,来自奖励反馈学习(ReFL)的原理,旨在隔离空间和时间上的优化效果。我们的方法在样本质量和目标对齐方面比普通GRPO基线有了显着改进,从而实现了VAR模型的稳健和有效优化。
摘要:Visual generation is dominated by three paradigms: AutoRegressive (AR), diffusion, and Visual AutoRegressive (VAR) models. Unlike AR and diffusion, VARs operate on heterogeneous input structures across their generation steps, which creates severe asynchronous policy conflicts. This issue becomes particularly acute in reinforcement learning (RL) scenarios, leading to unstable training and suboptimal alignment. To resolve this, we propose a novel framework to enhance Group Relative Policy Optimization (GRPO) by explicitly managing these conflicts. Our method integrates three synergistic components: 1) a stabilizing intermediate reward to guide early-stage generation; 2) a dynamic time-step reweighting scheme for precise credit assignment; and 3) a novel mask propagation algorithm, derived from principles of Reward Feedback Learning (ReFL), designed to isolate optimization effects both spatially and temporally. Our approach demonstrates significant improvements in sample quality and objective alignment over the vanilla GRPO baseline, enabling robust and effective optimization for VAR models.
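作为背景,GRPO的核心"组相对"优势计算可示意如下(极简草图;论文的贡献在此基础上加入中间奖励、时间步重加权与掩码传播)。

```python
import torch

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO式优势:对同一提示下一组生成样本的奖励做组内标准化。
    rewards: 形状[G]的张量,G为组内样本数。"""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

adv = group_relative_advantages(torch.tensor([0.2, 0.9, 0.5, 0.4]))
print(adv)  # 均值为0:组内相对好坏决定策略梯度中各样本的权重符号
```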


【2】Learning with Monotone Adversarial Corruptions
标题:单调对抗损坏下的学习
链接:https://arxiv.org/abs/2601.02193

作者:Kasper Green Larsen,Chirag Pabbaraju,Abhishek Shetty
摘要:我们通过引入单调对抗损坏模型,研究标准机器学习算法在多大程度上依赖数据的可交换性和独立性。在这个模型中,对手在看到一个"干净"的i.i.d.数据集后,将他们选择的额外"损坏"点插入数据集。这些添加的点被约束为单调损坏,即它们按照真实目标函数进行标注。也许令人惊讶的是,我们证明了在这种设置下,所有已知的二元分类最优学习算法都可能在一个来自与干净数据集相同分布的新独立测试点上产生次优的期望误差。另一方面,我们表明,基于一致收敛的算法的保证不会退化。我们的结果展示了最优学习算法如何在看似有益的单调损坏面前失效,暴露了它们对可交换性的过度依赖。
摘要:We study the extent to which standard machine learning algorithms rely on exchangeability and independence of data by introducing a monotone adversarial corruption model. In this model, an adversary, upon looking at a "clean" i.i.d. dataset, inserts additional "corrupted" points of their choice into the dataset. These added points are constrained to be monotone corruptions, in that they get labeled according to the ground-truth target function. Perhaps surprisingly, we demonstrate that in this setting, all known optimal learning algorithms for binary classification can be made to achieve suboptimal expected error on a new independent test point drawn from the same distribution as the clean dataset. On the other hand, we show that uniform convergence-based algorithms do not degrade in their guarantees. Our results showcase how optimal learning algorithms break down in the face of seemingly helpful monotone corruptions, exposing their overreliance on exchangeability.


【3】A Differentiable Adversarial Framework for Task-Aware Data Subsampling
标题:任务感知数据子采样的可微分对抗框架
链接:https://arxiv.org/abs/2601.02081

作者:Jiacheng Lyu,Bihua Bao
备注:14 pages
摘要:大规模数据集的激增对模型训练提出了重大的计算挑战。传统的数据子采样方法是一个静态的、与任务无关的预处理步骤,通常会丢弃对下游预测至关重要的信息。在本文中,我们介绍了对抗性软选择子采样(ASSS)框架,这是一种将数据约简重构为可微分的端到端学习问题的新范式。ASSS使用选择器网络和任务网络之间的对抗博弈,由选择器网络学习为样本分配连续的重要性权重。这种由Gumbel-Softmax松弛实现的直接优化,允许选择器在平衡预测保真度和稀疏性的损失函数的指导下,识别并保留对特定任务目标信息量最大的样本。理论分析将这一框架与信息瓶颈原理联系起来。在四个大规模真实世界数据集上的综合实验表明,ASSS在保持模型性能方面始终优于聚类和最近邻抽稀等启发式子采样基线。值得注意的是,ASSS不仅可以匹配,有时甚至超过在整个数据集上训练的性能,展示了智能去噪的效果。这项工作将任务感知数据子采样确立为一个可学习的组件,为有效的大规模数据学习提供了一个原则性的解决方案。
摘要:The proliferation of large-scale datasets poses a major computational challenge to model training. Traditional data subsampling works as a static, task-independent preprocessing step that usually discards information critical to downstream prediction. In this paper, we introduce the antagonistic soft selection subsampling (ASSS) framework, a novel paradigm that recasts data reduction as a differentiable end-to-end learning problem. ASSS uses an adversarial game between a selector network and a task network, in which the selector learns to assign continuous importance weights to samples. This direct optimization, implemented via Gumbel-Softmax relaxation, allows the selector to identify and retain the most informative samples for a specific task target, guided by a loss function that balances prediction fidelity and sparsity. Theoretical analysis links this framework with the information bottleneck principle. Comprehensive experiments on four large-scale real-world datasets show that ASSS consistently outperforms heuristic subsampling baselines such as clustering and nearest-neighbor thinning in maintaining model performance. Notably, ASSS can not only match but sometimes exceed the training performance of the entire dataset, showcasing the effect of intelligent denoising. This work establishes task-aware data subsampling as a learnable component, providing a principled solution for effective large-scale data learning.
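摘要中的Gumbel-Softmax松弛可用PyTorch内置函数示意如下(草图;选择器结构与稀疏正则形式均为示意性假设,并非论文官方实现)。

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftSelector(nn.Module):
    """为每个样本输出[丢弃, 保留]两个logit,经Gumbel-Softmax得到可微的选择权重。"""
    def __init__(self, feat_dim):
        super().__init__()
        self.scorer = nn.Linear(feat_dim, 2)

    def forward(self, x, tau=0.5):
        logits = self.scorer(x)                           # [B, 2]
        y = F.gumbel_softmax(logits, tau=tau, hard=True)  # 前向近似离散、反向保持可微
        return y[:, 1]                                    # [B] 每个样本的保留权重

selector = SoftSelector(feat_dim=8)
keep = selector(torch.randn(16, 8))
sparsity_loss = keep.mean()  # 示意:以平均保留率作为稀疏正则项
```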


【4】FAROS: Robust Federated Learning with Adaptive Scaling against Backdoor Attacks
标题:FAROS:利用自适应缩放抵御后门攻击的鲁棒联邦学习
链接:https://arxiv.org/abs/2601.01833

作者:Chenyu Hu,Qiming Hu,Sinan Chen,Nianyu Li,Mingyue Zhang,Jialong Li
摘要:联合学习(FL)使多个客户端能够协作训练共享模型,而无需公开本地数据。然而,后门攻击对FL构成了重大威胁。这些攻击旨在将隐形触发器植入全局模型,导致其在具有特定触发器的输入上产生误导,而在良性数据上正常运行。尽管预聚合检测是主要的防御方向,但现有的最先进的防御通常依赖于固定的防御参数。这种依赖性使它们容易受到单点故障风险的影响,从而降低了它们对复杂攻击者的有效性。为了解决这些限制,我们提出了FAROS,一个增强的FL框架,它结合了自适应差分缩放(ADS)和鲁棒核心集计算(RCC)。ADS机制根据客户端在每轮中上传的梯度的分散度动态调整防御的灵敏度。这使得它能够对抗那些在隐身和有效性之间进行战略转换的攻击者。此外,RCC通过计算包括具有最高置信度的客户端的核心集合的质心来有效地减轻单点故障的风险。我们在各种数据集、模型和攻击场景中进行了广泛的实验。结果表明,我们的方法优于目前的防御在攻击成功率和主要任务的准确性。
摘要:Federated Learning (FL) enables multiple clients to collaboratively train a shared model without exposing local data. However, backdoor attacks pose a significant threat to FL. These attacks aim to implant a stealthy trigger into the global model, causing it to mislead on inputs that possess a specific trigger while functioning normally on benign data. Although pre-aggregation detection is a main defense direction, existing state-of-the-art defenses often rely on fixed defense parameters. This reliance makes them vulnerable to single-point-of-failure risks, rendering them less effective against sophisticated attackers. To address these limitations, we propose FAROS, an enhanced FL framework that incorporates Adaptive Differential Scaling (ADS) and Robust Core-set Computing (RCC). The ADS mechanism adjusts the defense's sensitivity dynamically, based on the dispersion of uploaded gradients by clients in each round. This allows it to counter attackers who strategically shift between stealthiness and effectiveness. Furthermore, the RCC effectively mitigates the risk of single-point failure by computing the centroid of a core set comprising clients with the highest confidence. We conducted extensive experiments across various datasets, models, and attack scenarios. The results demonstrate that our method outperforms current defenses in both attack success rate and main task accuracy.


【5】Adversarial Instance Generation and Robust Training for Neural Combinatorial Optimization with Multiple Objectives
标题:多目标神经组合优化的对抗性实例生成和鲁棒训练
链接:https://arxiv.org/abs/2601.01665

作者:Wei Liu,Yaoxin Wu,Yingqian Zhang,Thomas Bäck,Yingjie Fan
摘要:深度强化学习(DRL)在解决多目标组合优化问题(MOCOPs)方面表现出了巨大的潜力。然而,这些基于学习的求解器的鲁棒性仍然没有得到充分的探索,特别是在不同的和复杂的问题分布。在本文中,我们提出了一个统一的鲁棒性为导向的框架偏好条件DRL求解MOCOP。在这个框架内,我们开发了一种基于偏好的对抗性攻击,以生成暴露求解器弱点的硬实例,并通过对Pareto前沿质量的退化来量化攻击的影响。我们还引入了一种防御策略,将硬度感知偏好选择集成到对抗训练中,以减少对受限偏好区域的过度拟合,并提高分布外性能。在多目标旅行商问题(MOTSP)、多目标容量限制车辆路径问题(MOCVRP)和多目标背包问题(MOKP)上的实验结果表明,该攻击方法能够成功地学习不同求解器的硬实例。此外,我们的防御方法显着增强了神经求解器的鲁棒性和通用性,在硬或非分布实例上提供卓越的性能。
摘要:Deep reinforcement learning (DRL) has shown great promise in addressing multi-objective combinatorial optimization problems (MOCOPs). Nevertheless, the robustness of these learning-based solvers has remained insufficiently explored, especially across diverse and complex problem distributions. In this paper, we propose a unified robustness-oriented framework for preference-conditioned DRL solvers for MOCOPs. Within this framework, we develop a preference-based adversarial attack to generate hard instances that expose solver weaknesses, and quantify the attack impact by the resulting degradation on Pareto-front quality. We further introduce a defense strategy that integrates hardness-aware preference selection into adversarial training to reduce overfitting to restricted preference regions and improve out-of-distribution performance. The experimental results on multi-objective traveling salesman problem (MOTSP), multi-objective capacitated vehicle routing problem (MOCVRP), and multi-objective knapsack problem (MOKP) verify that our attack method successfully learns hard instances for different solvers. Furthermore, our defense method significantly strengthens the robustness and generalizability of neural solvers, delivering superior performance on hard or out-of-distribution instances.


【6】Length-Aware Adversarial Training for Variable-Length Trajectories: Digital Twins for Mall Shopper Paths
标题:可变长度轨迹的长度感知对抗训练:购物中心购物者路径的数字孪生
链接:https://arxiv.org/abs/2601.01663

作者:He Sun,Jiwoong Shin,Ravi Dhar
摘要:我们研究\emph{可变长度轨迹}(带有相应时间戳的访问位置/项目序列)的生成建模,用于下游模拟和反事实分析。一个反复出现的实际问题是,当轨迹长度高度异质时,标准的小批量训练可能不稳定,这反过来又会降低轨迹衍生统计量的\emph{分布匹配}。我们提出了\textbf{长度感知采样(LAS)},这是一种简单的分批策略:按长度对轨迹进行分组,并仅从单个长度桶中采样批次,从而在不更改模型类的情况下减少批次内长度异质性(并使更新更加一致)。我们将LAS与辅助时间对齐损失一起集成到条件轨迹GAN中,并提供(i)在温和的有界性假设下关于衍生变量的分布级保证,以及(ii)一种IPM/Wasserstein机制,解释为什么LAS通过移除仅依赖长度的捷径判别器并针对桶内差异来改善分布匹配。从经验上看,在一个购物者轨迹的多商场数据集以及各种公共序列数据集(GPS、教育、电子商务和电影)上,LAS始终改善了衍生变量分布的匹配,在各数据集特定指标上优于随机抽样。
摘要:We study generative modeling of \emph{variable-length trajectories} -- sequences of visited locations/items with associated timestamps -- for downstream simulation and counterfactual analysis. A recurring practical issue is that standard mini-batch training can be unstable when trajectory lengths are highly heterogeneous, which in turn degrades \emph{distribution matching} for trajectory-derived statistics. We propose \textbf{length-aware sampling (LAS)}, a simple batching strategy that groups trajectories by length and samples batches from a single length bucket, reducing within-batch length heterogeneity (and making updates more consistent) without changing the model class. We integrate LAS into a conditional trajectory GAN with auxiliary time-alignment losses and provide (i) a distribution-level guarantee for derived variables under mild boundedness assumptions, and (ii) an IPM/Wasserstein mechanism explaining why LAS improves distribution matching by removing length-only shortcut critics and targeting within-bucket discrepancies. Empirically, LAS consistently improves matching of derived-variable distributions on a multi-mall dataset of shopper trajectories and on diverse public sequence datasets (GPS, education, e-commerce, and movies), outperforming random sampling across dataset-specific metrics.
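LAS本身是一个简单的分批规则,可用如下草图示意(按精确长度分桶仅为示意,实际可按长度区间分桶)。

```python
import random
from collections import defaultdict

def length_aware_batches(trajectories, batch_size, seed=0):
    """按长度分桶,再仅从单个桶内取批次,消除批内长度异质性。"""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for traj in trajectories:
        buckets[len(traj)].append(traj)   # 此处按精确长度分桶,实际可按长度区间
    batches = []
    for bucket in buckets.values():
        rng.shuffle(bucket)
        for i in range(0, len(bucket), batch_size):
            batches.append(bucket[i:i + batch_size])
    rng.shuffle(batches)                  # 训练时以随机顺序遍历各桶产生的批次
    return batches
```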


【7】Learning Resilient Elections with Adversarial GNNs
标题:通过对抗性GNN学习弹性选举
链接:https://arxiv.org/abs/2601.01653

作者:Hao Xiang Li,Yash Shah,Lorenzo Giusti
摘要:面对敌对动机,达成共识是必不可少的。自17世纪以来,选举一直是现代民主运作的规范方式。如今,它们规范市场,为现代推荐系统或点对点网络提供引擎,并仍然是代表民主的主要方法。然而,一个在所有假设情形下都令人满意的理想通用投票规则仍然是一个具有挑战性的课题,这类系统的设计处于机制设计研究的前沿。自动机制设计是一种很有前途的方法,最近的工作表明,集合不变架构特别适合于建模选举系统。然而,各种问题(例如对策略性投票的鲁棒性)阻碍了其在现实世界场景中的直接应用。在本文中,我们推广了所学投票规则的表达能力,并将神经网络架构的改进与对抗训练相结合,在最大化社会福利的同时提高投票规则的弹性。我们在合成和真实世界数据集上评估了我们方法的有效性。我们的方法通过使用二分图表示选举并用图神经网络学习投票规则,解决了先前工作在学习投票规则方面的关键局限。我们相信这为将机器学习应用于现实世界的选举开辟了新的领域。
摘要:In the face of adverse motives, it is indispensable to achieve a consensus. Elections have been the canonical way by which modern democracy has operated since the 17th century. Nowadays, they regulate markets, provide an engine for modern recommender systems or peer-to-peer networks, and remain the main approach to represent democracy. However, a desirable universal voting rule that satisfies all hypothetical scenarios is still a challenging topic, and the design of these systems is at the forefront of mechanism design research. Automated mechanism design is a promising approach, and recent works have demonstrated that set-invariant architectures are uniquely suited to modelling electoral systems. However, various concerns prevent the direct application to real-world settings, such as robustness to strategic voting. In this paper, we generalise the expressive capability of learned voting rules, and combine improvements in neural network architecture with adversarial training to improve the resilience of voting rules while maximizing social welfare. We evaluate the effectiveness of our methods on both synthetic and real-world datasets. Our method resolves critical limitations of prior work regarding learning voting rules by representing elections using bipartite graphs, and learning such voting rules using graph neural networks. We believe this opens new frontiers for applying machine learning to real-world elections.


【8】Causal discovery for linear causal model with correlated noise: an Adversarial Learning Approach
标题:具有相关噪声的线性因果模型的因果发现:一种对抗学习方法
链接:https://arxiv.org/abs/2601.01368

作者:Mujin Zhou,Junzhe Zhang
摘要:从具有不可测量混杂因素的数据中发现因果关系是一个具有挑战性的问题。本文提出了一种基于f-GAN框架的方法,学习独立于特定权重值的二元因果结构。我们将结构学习问题重新表述为最小化贝叶斯自由能,并证明了这个问题等价于最小化真实数据分布和模型生成分布之间的f-散度。使用f-GAN框架,我们将此目标转换为最小-最大对抗优化问题。我们使用Gumbel-Softmax松弛在离散图空间中实现梯度搜索。
摘要:Causal discovery from data with unmeasured confounding factors is a challenging problem. This paper proposes an approach based on the f-GAN framework, learning the binary causal structure independent of specific weight values. We reformulate the structure learning problem as minimizing Bayesian free energy and prove that this problem is equivalent to minimizing the f-divergence between the true data distribution and the model-generated distribution. Using the f-GAN framework, we transform this objective into a min-max adversarial optimization problem. We implement the gradient search in the discrete graph space using Gumbel-Softmax relaxation.


【9】AppellateGen: A Benchmark for Appellate Legal Judgment Generation
标题:AppellateGen:上诉法律判决生成的基准
链接:https://arxiv.org/abs/2601.01331

作者:Hongkun Yang,Lionel Z. Wang,Wei Fan,Yiran Hu,Lixu Wang,Chenyu Liu,Shenghong Fu,Haoyang Li,Xin Xu,Jiexin Zheng,Wei Dong
备注:15 pages, 4 figures, 3 tables
摘要:法律判决生成是法律智能的一项重要任务。然而,现有的法律判决生成研究主要集中在一审审判,依赖于静态的事实到判决的映射,而忽视了上诉(二审)审查的辩证性质。为了解决这个问题,我们引入AppellateGen,这是一个用于生成二审法律判决的基准,包括7,351个案例对。该任务要求模型通过对初始判决和证据更新进行推理来起草具有法律约束力的判决,从而对审判阶段之间的因果关系进行建模。我们进一步提出了一个基于司法标准操作程序(SOP)的法律多智能体系统(SLMAS)来模拟司法工作流程,它将生成过程分解为问题识别、检索和起草等离散阶段。实验结果表明,虽然SLMAS提高了逻辑一致性,但上诉推理的复杂性仍然是当前LLM的一个重大挑战。数据集和代码可在https://anonymous.4open.science/r/AppellateGen-5763上公开获取。
摘要:Legal judgment generation is a critical task in legal intelligence. However, existing research in legal judgment generation has predominantly focused on first-instance trials, relying on static fact-to-verdict mappings while neglecting the dialectical nature of appellate (second-instance) review. To address this, we introduce AppellateGen, a benchmark for second-instance legal judgment generation comprising 7,351 case pairs. The task requires models to draft legally binding judgments by reasoning over the initial verdict and evidentiary updates, thereby modeling the causal dependency between trial stages. We further propose a judicial Standard Operating Procedure (SOP)-based Legal Multi-Agent System (SLMAS) to simulate judicial workflows, which decomposes the generation process into discrete stages of issue identification, retrieval, and drafting. Experimental results indicate that while SLMAS improves logical consistency, the complexity of appellate reasoning remains a substantial challenge for current LLMs. The dataset and code are publicly available at: https://anonymous.4open.science/r/AppellateGen-5763.


【10】Explainability-Guided Defense: Attribution-Aware Model Refinement Against Adversarial Data Attacks
标题:解释性引导防御:针对对抗性数据攻击的归因感知模型细化
链接:https://arxiv.org/abs/2601.00968

作者:Longwei Wang,Mohammad Navid Nayyem,Abdullah Al Rakin,KC Santosh,Chaowei Zhang,Yang Zhou
备注:8pages,4 figures
摘要:在医疗保健和自主导航等安全关键领域,对深度学习模型的依赖日益增加,这凸显了对既能抵御对抗性扰动、又能保持决策透明的防御手段的需求。在本文中,我们发现了可解释性与鲁棒性之间一种可在训练过程中直接利用的联系。具体来说,我们观察到,通过局部可解释模型无关解释(LIME)识别出的虚假、不稳定或语义无关的特征对对抗脆弱性的贡献不成比例。在此基础上,我们引入了一个归因引导的细化框架,将LIME从被动诊断转变为主动训练信号。我们的方法在闭环细化管道中使用特征掩蔽、敏感性感知正则化和对抗性增强来系统地抑制虚假特征。这种方法不需要额外的数据集或模型架构,并且可以无缝集成到标准的对抗训练中。从理论上讲,我们推导出一个归因感知的对抗失真下界,形式化地刻画了解释对齐与鲁棒性之间的联系。在CIFAR-10、CIFAR-10-C和CIFAR-100上的实证评估表明,对抗鲁棒性和分布外泛化能力均有显著改善。
摘要:The growing reliance on deep learning models in safety-critical domains such as healthcare and autonomous navigation underscores the need for defenses that are both robust to adversarial perturbations and transparent in their decision-making. In this paper, we identify a connection between interpretability and robustness that can be directly leveraged during training. Specifically, we observe that spurious, unstable, or semantically irrelevant features identified through Local Interpretable Model-Agnostic Explanations (LIME) contribute disproportionately to adversarial vulnerability. Building on this insight, we introduce an attribution-guided refinement framework that transforms LIME from a passive diagnostic into an active training signal. Our method systematically suppresses spurious features using feature masking, sensitivity-aware regularization, and adversarial augmentation in a closed-loop refinement pipeline. This approach does not require additional datasets or model architectures and integrates seamlessly into standard adversarial training. Theoretically, we derive an attribution-aware lower bound on adversarial distortion that formalizes the link between explanation alignment and robustness. Empirical evaluations on CIFAR-10, CIFAR-10-C, and CIFAR-100 demonstrate substantial improvements in adversarial robustness and out-of-distribution generalization.


【11】When to Ponder: Adaptive Compute Allocation for Code Generation via Test-Time Training
标题:何时思考:通过测试时训练实现代码生成的自适应计算分配
链接:https://arxiv.org/abs/2601.00894

作者:Gihyeon Sim
备注:14 pages, 1 figure, 14 tables, code available at https://github.com/deveworld/ponderTTT
摘要:大型语言模型对所有输入应用统一的计算量,而不管其难度如何。我们提出了PonderTTT,一种使用TTT层的自监督重建损失来选择性触发测试时训练(TTT)更新的门控策略。门控决策本身无需训练,不需要学习到的分类器或辅助网络;只有一个标量阈值,最初在未标注数据上校准,并通过EMA持续调整以维持目标更新率。我们使用GPT-2模型(124M到1.5B)在代码语言建模(The Stack v2,教师强制困惑度)上的实验表明,该信号与推理兼容,不需要真实标签。我们的重建门控在完全无需训练的情况下实现了82-89%的Oracle恢复率,显著优于随机跳过基线(在OOD语言上的损失最多降低16%)。
摘要:Large language models apply uniform computation to all inputs, regardless of difficulty. We propose PonderTTT, a gating strategy using the TTT layer's self-supervised reconstruction loss to selectively trigger Test-Time Training (TTT) updates. The gating decision itself is training-free--requiring no learned classifier or auxiliary networks; only a single scalar threshold is initially calibrated on unlabeled data and continuously adapted via EMA to maintain target update rates. Our experiments with GPT-2 models (124M to 1.5B) on code language modeling (The Stack v2, teacher-forced perplexity) demonstrate that this signal is inference-compatible, requiring no ground-truth labels. Our Reconstruction Gating achieves 82-89% Oracle Recovery while being fully training-free, significantly outperforming Random Skip baselines (up to 16% lower loss on OOD languages).
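门控机制可示意如下(极简草图;阈值的EMA调整细节为示意性假设,并非论文官方实现):仅当TTT层的自监督重建损失超过阈值时才执行测试时更新,并用EMA把实际更新率拉向目标值。

```python
class ReconstructionGate:
    """当TTT重建损失高于阈值时触发更新;用更新率的EMA维持目标更新率。"""
    def __init__(self, init_threshold, target_rate=0.25, momentum=0.99, step=1e-3):
        self.threshold = init_threshold   # 初始阈值:在未标注数据上校准(示意)
        self.target_rate = target_rate
        self.momentum = momentum
        self.step = step
        self.rate_ema = target_rate       # 实际更新率的EMA估计

    def should_update(self, recon_loss):
        fire = recon_loss > self.threshold
        self.rate_ema = self.momentum * self.rate_ema + (1 - self.momentum) * float(fire)
        # 实际更新率高于目标则调高阈值,反之调低
        self.threshold += self.step * (self.rate_ema - self.target_rate)
        return fire
```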


【12】Quantum Machine Learning Approaches for Coordinated Stealth Attack Detection in Distributed Generation Systems
标题:分布式发电系统中协调隐形攻击检测的量子机器学习方法
链接:https://arxiv.org/abs/2601.00873

作者:Osasumwen Cedric Ogiesoba-Eguakun,Suman Rath
备注:10 pages
摘要:协同隐形攻击对分布式发电系统来说是一个严重的网络安全威胁,因为它们修改控制和测量信号,同时保持接近正常行为,使得它们难以使用标准入侵检测方法检测到。本研究探讨了量子机器学习方法,用于检测微电网中分布式发电单元的协同隐形攻击。高质量的模拟测量被用来创建一个平衡的二进制分类数据集,使用三个功能:无功功率在DG 1,频率偏差相对于标称值,和终端电压幅值。评估了经典机器学习基线、全量子变分分类器和混合量子经典模型。结果表明,将量子特征嵌入与经典RBF支持向量机相结合的混合量子经典模型在这个低维数据集上实现了最佳的整体性能,在强经典SVM基线上的准确性和F1得分略有提高。由于训练不稳定和当前NISQ硬件的限制,完全量子模型的性能更差。相比之下,混合模型的训练更可靠,并证明量子特征映射可以增强入侵检测,即使完全量子学习还不实用。
摘要:Coordinated stealth attacks are a serious cybersecurity threat to distributed generation systems because they modify control and measurement signals while remaining close to normal behavior, making them difficult to detect using standard intrusion detection methods. This study investigates quantum machine learning approaches for detecting coordinated stealth attacks on a distributed generation unit in a microgrid. High-quality simulated measurements were used to create a balanced binary classification dataset using three features: reactive power at DG1, frequency deviation relative to the nominal value, and terminal voltage magnitude. Classical machine learning baselines, fully quantum variational classifiers, and hybrid quantum classical models were evaluated. The results show that a hybrid quantum classical model combining quantum feature embeddings with a classical RBF support vector machine achieves the best overall performance on this low dimensional dataset, with a modest improvement in accuracy and F1 score over a strong classical SVM baseline. Fully quantum models perform worse due to training instability and limitations of current NISQ hardware. In contrast, hybrid models train more reliably and demonstrate that quantum feature mapping can enhance intrusion detection even when fully quantum learning is not yet practical.


【13】SLO-Conditioned Action Routing for Retrieval-Augmented Generation: Objective Ablation and Failure Modes
标题:检索增强生成的SLO条件动作路由:目标消融与故障模式
链接:https://arxiv.org/abs/2601.00841

作者:Bharath Nunepalli
摘要:检索增强生成(RAG)引入了一个实际的控制问题:必须针对每个查询选择检索深度和生成行为,以满足成本、拒绝率和幻觉风险等服务级目标(SLO)。这项工作将每个查询的控制建模为一个小的离散动作:选择检索深度和生成模式(受保护与自动),或者拒绝回答。通过执行每个动作并记录准确性、令牌成本、幻觉/拒绝指标和SLO加权奖励,从SQuAD 2.0构建离线日志数据集。评估了两个简单的策略学习目标:对每个状态最佳动作的监督分类(Argmax-CE)及其奖励加权变体(Argmax-CE-WT)。在所有评估设置中,一个强大的固定基线(低k、受保护提示)表现出竞争力;学习到的策略主要在以质量为中心的SLO下提供额外的成本节约,而在低成本SLO下,当拒绝被高度奖励时,可能表现出拒绝崩溃。本文的贡献是对RAG管道的SLO感知控制进行可复现的案例研究,强调故障模式和报告惯例,而不是提出新的检索器或语言模型。
摘要:Retrieval-augmented generation (RAG) introduces a practical control problem: retrieval depth and generation behavior must be chosen per query to satisfy service-level objectives (SLOs) such as cost, refusal rate, and hallucination risk. This work models per-query control as a small discrete action: choose a retrieval depth and a generation mode (guarded vs. auto), or refuse. An offline logged dataset is constructed from SQuAD 2.0 by executing each action and recording accuracy, token cost, hallucination/refusal indicators, and an SLO-weighted reward. Two simple policy-learning objectives are evaluated: supervised classification of the per-state best action (Argmax-CE) and a reward-weighted variant (Argmax-CE-WT). Across the evaluated settings, a strong fixed baseline (low k, guarded prompting) performs competitively; learned policies mainly provide additional cost savings under a quality-focused SLO and can exhibit refusal collapse under a cheap SLO when refusal is heavily rewarded. The contribution is a reproducible case study of SLO-aware control for RAG pipelines, emphasizing failure modes and reporting conventions rather than proposing a new retriever or language model.


【14】ShrimpXNet: A Transfer Learning Framework for Shrimp Disease Classification with Augmented Regularization, Adversarial Training, and Explainable AI
标题:ShrimpXNet:结合增强正则化、对抗训练和可解释人工智能的虾病分类迁移学习框架
链接:https://arxiv.org/abs/2601.00832

作者:Israk Hasan Jone,D. M. Rafiun Bin Masud,Promit Sarker,Sayed Fuad Al Labib,Nazmul Islam,Farhad Billah
备注:8 pages, 11 figures
摘要:虾是全球消费最广泛的水生物种之一,因其营养成分和经济重要性而受到重视。养虾是许多地区的重要收入来源;然而,与其他形式的水产养殖一样,它受到疾病爆发的严重影响。这些疾病对虾的可持续生产构成重大挑战。为了解决这个问题,自动疾病分类方法可以提供及时和准确的检测。这项研究提出了一种基于深度学习的方法,用于虾病的自动分类。使用了包括四种疾病类别的1,149张图像的数据集。部署了六个预训练的深度学习模型(ResNet50、EfficientNet、DenseNet201、MobileNet、ConvNeXt-Tiny和Xception),并对其性能进行了评估。图像背景被移除,然后通过Keras图像管道进行标准化预处理。采用快速梯度符号法(FGSM)通过对抗训练增强模型的鲁棒性。同时,包括CutMix和MixUp在内的先进增强策略被用于减轻过拟合和提高泛化能力。为了支持可解释性,并可视化模型关注区域,应用了Grad-CAM、Grad-CAM++和XGrad-CAM等事后解释方法。探索性结果表明,ConvNeXt-Tiny实现了最高的性能,在测试数据集上达到了96.88%的准确率。经过1000次迭代后,模型的99%置信区间为[0.953,0.971]。
摘要:Shrimp is one of the most widely consumed aquatic species globally, valued for both its nutritional content and economic importance. Shrimp farming represents a significant source of income in many regions; however, like other forms of aquaculture, it is severely impacted by disease outbreaks. These diseases pose a major challenge to sustainable shrimp production. To address this issue, automated disease classification methods can offer timely and accurate detection. This research proposes a deep learning-based approach for the automated classification of shrimp diseases. A dataset comprising 1,149 images across four disease classes was utilized. Six pretrained deep learning models, ResNet50, EfficientNet, DenseNet201, MobileNet, ConvNeXt-Tiny, and Xception were deployed and evaluated for performance. The images background was removed, followed by standardized preprocessing through the Keras image pipeline. Fast Gradient Sign Method (FGSM) was used for enhancing the model robustness through adversarial training. While advanced augmentation strategies, including CutMix and MixUp, were implemented to mitigate overfitting and improve generalization. To support interpretability, and to visualize regions of model attention, post-hoc explanation methods such as Grad-CAM, Grad-CAM++, and XGrad-CAM were applied. Exploratory results demonstrated that ConvNeXt-Tiny achieved the highest performance, attaining a 96.88% accuracy on the test dataset. After 1000 iterations, the 99% confidence interval for the model is [0.953,0.971].
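The FGSM adversarial-training step referenced above follows the standard recipe: perturb each input along the sign of its loss gradient, then train on a mix of clean and perturbed batches. A minimal PyTorch sketch, with a toy classifier standing in for the paper's pretrained backbones; the epsilon and the 50/50 loss mix are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_example(model, images, labels, epsilon=0.01):
    """One FGSM step: perturb inputs along the sign of the input gradient."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    return (images + epsilon * images.grad.sign()).clamp(0, 1).detach()

# Toy classifier standing in for a pretrained backbone; four disease classes as in the paper.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 4))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
images, labels = torch.rand(8, 3, 64, 64), torch.randint(0, 4, (8,))

adv = fgsm_example(model, images, labels)
opt.zero_grad()
# Adversarial training: average clean and adversarial losses (the 50/50 split is an arbitrary choice).
loss = 0.5 * F.cross_entropy(model(images), labels) + 0.5 * F.cross_entropy(model(adv), labels)
loss.backward()
opt.step()
print(loss.item())
```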


【15】MORE: Multi-Objective Adversarial Attacks on Speech Recognition
标题:MORE:语音识别的多目标对抗攻击
链接:https://arxiv.org/abs/2601.01852

作者:Xiaoxue Gao,Zexin Li,Yiming Chen,Nancy F. Chen
备注:19 pages
摘要:大规模自动语音识别(ASR)模型(如Whisper)的出现大大扩展了它们在各种现实应用中的应用。因此,确保对即使是微小的输入扰动的鲁棒性对于在实时环境中保持可靠的性能至关重要。虽然之前的工作主要研究了对抗性攻击下的准确性下降,但在效率方面的鲁棒性在很大程度上仍未得到探索。这种狭隘的关注只提供了对ASR模型漏洞的部分理解。为了解决这一差距,我们对多种攻击场景下的ASR鲁棒性进行了全面研究。我们引入了MORE,一种多目标重复加倍鼓励攻击,通过分层的阶段性排斥锚定机制,共同降低识别精度和推理效率。具体来说,我们将多目标对抗优化重新表述为一个分层框架,依次实现双重目标。为了进一步提高有效性,我们提出了一种新的重复鼓励加倍目标(REDO),通过保持准确性下降和定期加倍预测序列长度来诱导重复文本生成。总体而言,MORE迫使ASR模型在单个对抗性输入的触发下,以显著更高的计算成本产生不正确的转录文本。实验表明,与现有基线相比,MORE始终产生显著更长的转录,同时保持较高的词错误率,凸显了其在多目标对抗攻击中的有效性。
摘要:The emergence of large-scale automatic speech recognition (ASR) models such as Whisper has greatly expanded their adoption across diverse real-world applications. Ensuring robustness against even minor input perturbations is therefore critical for maintaining reliable performance in real-time environments. While prior work has mainly examined accuracy degradation under adversarial attacks, robustness with respect to efficiency remains largely unexplored. This narrow focus provides only a partial understanding of ASR model vulnerabilities. To address this gap, we conduct a comprehensive study of ASR robustness under multiple attack scenarios. We introduce MORE, a multi-objective repetitive doubling encouragement attack, which jointly degrades recognition accuracy and inference efficiency through a hierarchical staged repulsion-anchoring mechanism. Specifically, we reformulate multi-objective adversarial optimization into a hierarchical framework that sequentially achieves the dual objectives. To further amplify effectiveness, we propose a novel repetitive encouragement doubling objective (REDO) that induces duplicative text generation by maintaining accuracy degradation and periodically doubling the predicted sequence length. Overall, MORE compels ASR models to produce incorrect transcriptions at a substantially higher computational cost, triggered by a single adversarial input. Experiments show that MORE consistently yields significantly longer transcriptions while maintaining high word error rates compared to existing baselines, underscoring its effectiveness in multi-objective adversarial attack.


半/弱/无/有监督|不确定性|主动学习(8篇)

【1】Subimage Overlap Prediction: Task-Aligned Self-Supervised Pretraining For Semantic Segmentation In Remote Sensing Imagery
标题:子图像重叠预测:任务一致的自监督预训练用于遥感图像中的语义分割
链接:https://arxiv.org/abs/2601.01781

作者:Lakshay Sharma,Alex Marin
备注:Accepted at CV4EO Workshop at WACV 2026
摘要:自监督学习(SSL)方法已经成为创建通用模型的主要范例,其能力可以转移到下游监督学习任务。然而,大多数这样的方法依赖于大量的预训练数据。这项工作介绍了子图像重叠预测,这是一种新的自监督预训练任务,旨在使用显著更少的预训练图像来辅助遥感图像的语义分割。给定图像,提取子图像,并训练模型生成所提取子图像在原始图像中位置的语义掩码。我们证明,使用此任务进行预训练可以显著加快收敛速度,并且在下游分割上具有相同或更好的性能(通过mIoU测量)。当标记的训练数据减少时,收敛和性能的差距会扩大。我们在多个架构类型和多个下游数据集上展示了这一点。我们还表明,相对于其他SSL方法,我们的方法在需要显著更少预训练数据的同时,性能持平或更优。代码和模型权重在\href{https://github.com/sharmalakshay93/subimage-overlap-prediction}{github.com/sharmalakshay93/subimage-overlap-prediction}中提供。
摘要:Self-supervised learning (SSL) methods have become a dominant paradigm for creating general purpose models whose capabilities can be transferred to downstream supervised learning tasks. However, most such methods rely on vast amounts of pretraining data. This work introduces Subimage Overlap Prediction, a novel self-supervised pretraining task to aid semantic segmentation in remote sensing imagery that uses significantly lesser pretraining imagery. Given an image, a sub-image is extracted and the model is trained to produce a semantic mask of the location of the extracted sub-image within the original image. We demonstrate that pretraining with this task results in significantly faster convergence, and equal or better performance (measured via mIoU) on downstream segmentation. This gap in convergence and performance widens when labeled training data is reduced. We show this across multiple architecture types, and with multiple downstream datasets. We also show that our method matches or exceeds performance while requiring significantly lesser pretraining data relative to other SSL methods. Code and model weights are provided at \href{https://github.com/sharmalakshay93/subimage-overlap-prediction}{github.com/sharmalakshay93/subimage-overlap-prediction}.
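The pretext task itself is easy to reproduce: crop a random subimage and supervise a binary mask marking where it came from. A minimal sketch of the sample-generation step; the crop fraction is an assumed free parameter.

```python
import numpy as np

def make_overlap_sample(image, crop_frac=0.3, rng=None):
    """Build one pretraining pair: (image, subimage) -> binary mask marking
    the subimage's location inside the original image."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w = image.shape[:2]
    ch, cw = int(h * crop_frac), int(w * crop_frac)
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    subimage = image[top:top + ch, left:left + cw]
    mask = np.zeros((h, w), dtype=np.float32)
    mask[top:top + ch, left:left + cw] = 1.0   # segmentation target
    return subimage, mask

image = np.random.rand(256, 256, 3)
sub, mask = make_overlap_sample(image)
print(sub.shape, mask.sum() / mask.size)  # roughly crop_frac**2 of pixels are positive
```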


【2】ARGUS: Adaptive Rotation-Invariant Geometric Unsupervised System
标题:ARGUS:自适应旋转不变几何无监督系统
链接:https://arxiv.org/abs/2601.01297

作者:Anantha Sharma
备注:26 pages
摘要:检测高维数据流中的分布漂移面临根本性的挑战:全局比较方法扩展性差,基于投影的方法丢失几何结构,重新聚类方法则存在身份不稳定问题。本文介绍了Argus,一个将漂移检测重新概念化为在数据流形的固定空间分区上跟踪局部统计量的框架。主要贡献有四个方面。首先,证明了基于规范正交框架的Voronoi镶嵌所产生的漂移度量对正交变换(即保持欧几里得几何的旋转和反射)不变。其次,证明了该框架在每个快照上实现O(N)复杂度,同时提供单元级的分布变化空间定位。第三,开发了漂移传播的图论表征,以区分相干的分布变化与孤立的扰动。第四,通过将空间分解为独立子空间并跨子空间聚合漂移信号,引入乘积量化镶嵌以扩展到非常高的维度(d>500)。本文形式化了理论基础,证明了不变性,并给出实验验证,表明该框架能在坐标旋转下正确识别漂移,而现有方法会产生误报。这种基于镶嵌的方法为分布监测提供了一个有原则的几何基础,在保留高维结构的同时避免了成对比较的计算负担。
摘要:Detecting distributional drift in high-dimensional data streams presents fundamental challenges: global comparison methods scale poorly, projection-based approaches lose geometric structure, and re-clustering methods suffer from identity instability. This paper introduces Argus, A framework that reconceptualizes drift detection as tracking local statistics over a fixed spatial partition of the data manifold.   The key contributions are fourfold. First, it is proved that Voronoi tessellations over canonical orthonormal frames yield drift metrics that are invariant to orthogonal transformations. The rotations and reflections that preserve Euclidean geometry. Second, it is established that this framework achieves O(N) complexity per snapshot while providing cell-level spatial localization of distributional change. Third, a graph-theoretic characterization of drift propagation is developed that distinguishes coherent distributional shifts from isolated perturbations. Fourth, product quantization tessellation is introduced for scaling to very high dimensions (d>500) by decomposing the space into independent subspaces and aggregating drift signals across subspaces.   This paper formalizes the theoretical foundations, proves invariance properties, and presents experimental validation demonstrating that the framework correctly identifies drift under coordinate rotation while existing methods produce false positives. The tessellated approach offers a principled geometric foundation for distribution monitoring that preserves high-dimensional structure without the computational burden of pairwise comparisons.
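The core mechanism reduces to nearest-anchor (Voronoi cell) assignment plus per-cell histogram comparison between snapshots. A minimal sketch; random Gaussian anchors are a simplification of the paper's canonical orthonormal frames, and total variation stands in for its drift metrics.

```python
import numpy as np

def assign_cells(X, anchors):
    """Nearest-anchor assignment = Voronoi cell membership."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

def cell_histogram(X, anchors):
    counts = np.bincount(assign_cells(X, anchors), minlength=len(anchors))
    return counts / counts.sum()

rng = np.random.default_rng(0)
d, n_cells = 16, 64
anchors = rng.normal(size=(n_cells, d))           # fixed spatial partition (illustrative)
ref = cell_histogram(rng.normal(size=(5000, d)), anchors)

# New snapshot with a mean shift; per-cell deviations localize where the drift happens.
snap = cell_histogram(rng.normal(0.5, 1.0, size=(5000, d)), anchors)
per_cell_drift = np.abs(snap - ref)
print("total variation:", 0.5 * per_cell_drift.sum())
print("most drifted cells:", per_cell_drift.argsort()[-3:])
```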


【3】The Alchemy of Thought: Understanding In-Context Learning Through Supervised Classification
标题:思想的炼金术:通过监督分类理解上下文学习
链接:https://arxiv.org/abs/2601.01290

作者:Harshita Narnoli,Mihai Surdeanu
备注:International Joint Conference on Natural Language Processing & Asia-Pacific Chapter of the Association for Computational Linguistics, 2025
摘要:情境学习(ICL)已经成为一种重要的范式,可以快速定制LLM以适应新任务,而无需进行微调。然而,尽管有经验证据证明其有用性,我们仍然没有真正了解ICL是如何工作的。在本文中,我们将上下文学习的行为与在ICL演示上训练的监督分类器进行比较,以调查三个研究问题:(1)使用ICL的LLM是否与在相同示例上训练的分类器行为相似?(2)如果是这样,哪类分类器更接近:基于梯度下降(GD)的还是基于k-最近邻(kNN)的?(3)当它们的行为不相似时,哪些条件与行为差异有关?使用文本分类作为用例,使用六个数据集和三个LLM,我们观察到当演示的相关性很高时,LLM的行为与这些分类器相似。平均而言,ICL比逻辑回归更接近kNN,这提供了经验证据,表明注意力机制的行为比GD更类似于kNN。然而,当演示相关性较低时,LLM比这些分类器表现得更好,可能是因为LLM可以退回到它们的参数记忆,这是这些分类器所没有的奢侈品。
摘要:In-context learning (ICL) has become a prominent paradigm to rapidly customize LLMs to new tasks without fine-tuning. However, despite the empirical evidence of its usefulness, we still do not truly understand how ICL works. In this paper, we compare the behavior of in-context learning with supervised classifiers trained on ICL demonstrations to investigate three research questions: (1) Do LLMs with ICL behave similarly to classifiers trained on the same examples? (2) If so, which classifiers are closer, those based on gradient descent (GD) or those based on k-nearest neighbors (kNN)? (3) When they do not behave similarly, what conditions are associated with differences in behavior? Using text classification as a use case, with six datasets and three LLMs, we observe that LLMs behave similarly to these classifiers when the relevance of demonstrations is high. On average, ICL is closer to kNN than logistic regression, giving empirical evidence that the attention mechanism behaves more similarly to kNN than GD. However, when demonstration relevance is low, LLMs perform better than these classifiers, likely because LLMs can back off to their parametric memory, a luxury these classifiers do not have.


【4】Adaptive Conformal Prediction via Bayesian Uncertainty Weighting for Hierarchical Healthcare Data
标题:通过Bayesian不确定性加权实现分层医疗保健数据的自适应保形预测
链接:https://arxiv.org/abs/2601.01223

作者:Marzieh Amiri Shahbazi,Ali Baheri,Nasibeh Azadeh-Fard
摘要:临床决策需要提供无分布覆盖保证和风险自适应精度的不确定性量化,现有方法无法共同满足这些要求。我们提出了一个混合贝叶斯共形框架,解决了医疗保健预测的这一基本限制。我们的方法将贝叶斯分层随机森林与组感知的共形校准相结合,使用后验不确定性来加权一致性分数,同时保持严格的覆盖有效性。通过对美国3,793家医院和4个地区的61,538名住院患者进行评估,我们的方法实现了目标覆盖率(94.3% vs 95%目标),具有自适应精度:低不确定性病例的区间窄21%,而高风险预测的区间适当扩大。重要的是,我们证明了仅靠校准良好的贝叶斯不确定性会出现严重的覆盖不足(14.1%),凸显了我们混合方法的必要性。该框架实现了风险分层的临床协议,高置信度预测的有效资源规划,以及对不确定病例加强监督的保守分配,在不同的医疗保健环境中提供不确定性感知决策支持。
摘要:Clinical decision-making demands uncertainty quantification that provides both distribution-free coverage guarantees and risk-adaptive precision, requirements that existing methods fail to jointly satisfy. We present a hybrid Bayesian-conformal framework that addresses this fundamental limitation in healthcare predictions. Our approach integrates Bayesian hierarchical random forests with group-aware conformal calibration, using posterior uncertainties to weight conformity scores while maintaining rigorous coverage validity. Evaluated on 61,538 admissions across 3,793 U.S. hospitals and 4 regions, our method achieves target coverage (94.3% vs 95% target) with adaptive precision: 21% narrower intervals for low-uncertainty cases while appropriately widening for high-risk predictions. Critically, we demonstrate that well-calibrated Bayesian uncertainties alone severely under-cover (14.1%), highlighting the necessity of our hybrid approach. This framework enables risk-stratified clinical protocols, efficient resource planning for high-confidence predictions, and conservative allocation with enhanced oversight for uncertain cases, providing uncertainty-aware decision support across diverse healthcare settings.
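The weighting idea can be sketched as split-conformal calibration of posterior-normalized residuals, so the conformal quantile scales each interval by the model's own uncertainty. A minimal sketch on synthetic heteroscedastic data; the Bayesian hierarchical forest is abstracted into given posterior means and standard deviations.

```python
import numpy as np

def weighted_conformal_intervals(y_cal, mu_cal, sigma_cal, mu_test, sigma_test, alpha=0.05):
    """Split-conformal calibration of posterior-normalized residuals:
    score s_i = |y_i - mu_i| / sigma_i, interval = mu +/- q * sigma."""
    scores = np.abs(y_cal - mu_cal) / sigma_cal
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)   # finite-sample correction
    q = np.quantile(scores, level, method="higher")
    return mu_test - q * sigma_test, mu_test + q * sigma_test

rng = np.random.default_rng(0)
mu_cal = rng.normal(size=1000)
sigma_cal = rng.uniform(0.5, 2.0, 1000)                    # posterior uncertainties (stand-in)
y_cal = mu_cal + sigma_cal * rng.normal(size=1000)         # heteroscedastic outcomes

lo, hi = weighted_conformal_intervals(y_cal, mu_cal, sigma_cal,
                                      mu_test=np.zeros(5),
                                      sigma_test=np.linspace(0.5, 2.0, 5))
print(np.c_[lo, hi])  # narrower intervals for low-uncertainty cases, wider for high-risk ones
```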


【5】Sparse Bayesian Message Passing under Structural Uncertainty
标题:结构不确定性下的稀疏Bayesian消息传递
链接:https://arxiv.org/abs/2601.01207

作者:Yoonhyuk Choi,Jiho Choi,Chanran Kim,Yumin Lee,Hawon Shin,Yeowon Jeon,Minjeong Kim,Jiwoo Kang
摘要:现实世界图上的半监督学习经常受到异质性的挑战,其中观察到的图是不可靠的或标签不一致的。许多现有的图神经网络要么依赖于固定的邻接结构,要么试图通过正则化来处理结构噪声。在这项工作中,我们明确地捕捉结构的不确定性,通过建模后验分布在有符号的邻接矩阵,允许每个边缘是积极的,消极的,或缺席。我们提出了一个稀疏的签名消息传递网络,自然是强大的边缘噪声和异质性,这可以从贝叶斯的角度来解释。通过将(i)符号图结构上的后边缘化与(ii)稀疏符号消息聚合相结合,我们的方法提供了一种处理边缘噪声和异质性的原则性方法。实验结果表明,我们的方法优于强基线模型的heterophilic基准下的合成和真实世界的结构噪声。
摘要 :Semi-supervised learning on real-world graphs is frequently challenged by heterophily, where the observed graph is unreliable or label-disassortative. Many existing graph neural networks either rely on a fixed adjacency structure or attempt to handle structural noise through regularization. In this work, we explicitly capture structural uncertainty by modeling a posterior distribution over signed adjacency matrices, allowing each edge to be positive, negative, or absent. We propose a sparse signed message passing network that is naturally robust to edge noise and heterophily, which can be interpreted from a Bayesian perspective. By combining (i) posterior marginalization over signed graph structures with (ii) sparse signed message aggregation, our approach offers a principled way to handle both edge noise and heterophily. Experimental results demonstrate that our method outperforms strong baseline models on heterophilic benchmarks under both synthetic and real-world structural noise.


【6】Self-Training the Neurochaos Learning Algorithm
标题:自训练神经混乱学习算法
链接:https://arxiv.org/abs/2601.01146

作者:Anusree M,Akhila Henry,Pramod P Nair
摘要:在许多实际应用中,获取大量的标记数据是具有挑战性和昂贵的,但未标记的数据是很容易获得的。传统的监督学习方法在具有少量标记数据或不平衡数据集的场景中经常表现不佳。本研究介绍了一种混合半监督学习(SSL)架构,该架构将神经混沌学习(NL)与基于阈值的自训练(ST)方法相结合,以克服这一限制。NL架构将输入特征转换为基于混沌的放电率(firing-rate)表示,该表示将非线性关系封装在数据中,而ST利用高置信度伪标记样本逐步扩大标记集。该模型的性能使用10个基准数据集和5个机器学习分类器进行评估,其中85%的训练数据被认为是未标记的,只有15%被用作标记数据。相对于独立的ST模型,所提出的自训练神经混沌学习(NL+ST)架构始终获得卓越的性能增益,特别是在有限的、非线性和不平衡的数据集上,如Iris(188.66%)、Wine(158.58%)和Glass Identification(110.48%)。结果表明,将基于混沌的特征提取与SSL结合使用,提高了低数据环境中的泛化能力、弹性和分类精度。
摘要:In numerous practical applications, acquiring substantial quantities of labelled data is challenging and expensive, but unlabelled data is readily accessible. Conventional supervised learning methods frequently underperform in scenarios characterised by little labelled data or imbalanced datasets. This study introduces a hybrid semi-supervised learning (SSL) architecture that integrates Neurochaos Learning (NL) with a threshold-based Self-Training (ST) method to overcome this constraint. The NL architecture converts input characteristics into chaos-based firing-rate representations that encapsulate nonlinear relationships within the data, whereas ST progressively enlarges the labelled set utilising high-confidence pseudo-labelled samples. The model's performance is assessed using ten benchmark datasets and five machine learning classifiers, with 85% of the training data considered unlabelled and just 15% utilised as labelled data. The proposed Self-Training Neurochaos Learning (NL+ST) architecture consistently attains superior performance gain relative to standalone ST models, especially on limited, nonlinear and imbalanced datasets like Iris (188.66%), Wine (158.58%) and Glass Identification (110.48%). The results indicate that using chaos-based feature extraction with SSL improves generalisation, resilience, and classification accuracy in low-data contexts.
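Threshold-based self-training itself is straightforward to express. The sketch below uses logistic regression on raw features as a stand-in for the Neurochaos feature extraction, with the paper's 15%-labelled split; the confidence threshold is an assumed hyperparameter.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(clf, X_lab, y_lab, X_unlab, threshold=0.95, max_rounds=10):
    """Threshold-based self-training: repeatedly adopt high-confidence pseudo-labels."""
    X_lab, y_lab, X_unlab = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(max_rounds):
        clf.fit(X_lab, y_lab)
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        keep = proba.max(axis=1) >= threshold        # only confident pseudo-labels
        if not keep.any():
            break
        X_lab = np.vstack([X_lab, X_unlab[keep]])
        y_lab = np.concatenate([y_lab, proba[keep].argmax(axis=1)])
        X_unlab = X_unlab[~keep]
    return clf

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = (X[:, 0] + 0.3 * rng.normal(size=400) > 0).astype(int)
# 15% labelled / 85% unlabelled split, mirroring the paper's protocol.
clf = self_train(LogisticRegression(), X[:60], y[:60], X[60:])
print(clf.score(X, y))
```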


【7】Wireless Dataset Similarity: Measuring Distances in Supervised and Unsupervised Machine Learning
标题:无线数据集相似性:在监督和无监督机器学习中测量距离
链接:https://arxiv.org/abs/2601.01023

作者:João Morais,Sadjad Alikhani,Akshay Malhotra,Shahab Hamidi-Rad,Ahmed Alkhateeb
备注:resources available in: https://www.wi-lab.net/research/dataset-similarity
摘要:本文介绍了一种任务和模型感知框架,用于测量无线数据集之间的相似性,支持数据集选择/增强、模拟到真实(sim 2 real)比较、特定于任务的合成数据生成等应用,并为模型训练/适应新部署提供决策信息。我们通过预测跨数据集可转移性的程度来评估候选数据集距离度量:如果两个数据集的距离很小,则在一个数据集上训练的模型应该在另一个数据集上表现良好。我们将该框架应用于无监督任务,信道状态信息(CSI)压缩,使用自动编码器。使用基于UMAP嵌入的指标,结合Wasserstein和Euclidean距离,我们实现了数据集距离与训练一个/测试另一个任务性能之间的Pearson相关性超过0.85。我们还将该框架应用于使用卷积神经网络的下行链路中的监督波束预测。对于这项任务,我们通过整合监督UMAP和数据集不平衡的惩罚来获得标签感知距离。在这两项任务中,产生的距离优于传统基线,并始终表现出与模型可转移性的更强相关性,支持无线数据集之间的任务相关比较。
摘要:This paper introduces a task- and model-aware framework for measuring similarity between wireless datasets, enabling applications such as dataset selection/augmentation, simulation-to-real (sim2real) comparison, task-specific synthetic data generation, and informing decisions on model training/adaptation to new deployments. We evaluate candidate dataset distance metrics by how well they predict cross-dataset transferability: if two datasets have a small distance, a model trained on one should perform well on the other. We apply the framework on an unsupervised task, channel state information (CSI) compression, using autoencoders. Using metrics based on UMAP embeddings, combined with Wasserstein and Euclidean distances, we achieve Pearson correlations exceeding 0.85 between dataset distances and train-on-one/test-on-another task performance. We also apply the framework to a supervised beam prediction in the downlink using convolutional neural networks. For this task, we derive a label-aware distance by integrating supervised UMAP and penalties for dataset imbalance. Across both tasks, the resulting distances outperform traditional baselines and consistently exhibit stronger correlations with model transferability, supporting task-relevant comparisons between wireless datasets.
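A minimal sketch of the unsupervised variant of this distance: embed both datasets with a shared UMAP model (the umap-learn package is assumed) and aggregate coordinate-wise 1-D Wasserstein distances. The coordinate-wise aggregation and the stand-in CSI features are simplifications of the paper's metric.

```python
import numpy as np
from scipy.stats import wasserstein_distance
import umap  # pip install umap-learn

def dataset_distance(A, B, n_components=2, seed=0):
    """Embed both datasets with one shared UMAP model, then compare the
    embedding distributions coordinate-wise with 1-D Wasserstein distances."""
    reducer = umap.UMAP(n_components=n_components, random_state=seed)
    emb = reducer.fit_transform(np.vstack([A, B]))
    emb_a, emb_b = emb[: len(A)], emb[len(A):]
    return sum(wasserstein_distance(emb_a[:, k], emb_b[:, k])
               for k in range(n_components))

rng = np.random.default_rng(0)
site_a = rng.normal(0.0, 1.0, size=(500, 32))   # stand-in for flattened CSI samples
site_b = rng.normal(0.3, 1.2, size=(500, 32))   # a "shifted" deployment
print(dataset_distance(site_a, site_b))
```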


【8】Enhanced Data-Driven Product Development via Gradient Based Optimization and Conformalized Monte Carlo Dropout Uncertainty Estimation
标题:通过基于梯度的优化和共形蒙特卡罗Dropout不确定性估计增强数据驱动的产品开发
链接:https://arxiv.org/abs/2601.00932

作者:Andrea Thomas Nava,Lijo Johny,Fabio Azzalini,Johannes Schneider,Arianna Casanova
备注:Accepted at the 18th International Conference on Agents and Artificial Intelligence (ICAART 2026)
摘要:数据驱动的产品开发(DDPD)利用数据来学习产品设计规范和结果属性之间的关系。为了发现改进的设计,我们在过去实验的数据上训练神经网络,并应用投影梯度下降来识别最佳输入特征,从而最大限度地提高性能。由于许多产品需要同时优化多个相关属性,我们的框架采用联合神经网络来捕获目标之间的相互依赖性。此外,我们通过\emph{Conformalised Monte Carlo Dropout}(ConfMC)集成不确定性估计,这是一种将嵌套共形预测与Monte Carlo dropout相结合的新方法,在数据可交换性假设下提供与模型无关的有限样本覆盖保证。对五个真实世界数据集的广泛实验表明,我们的方法达到了与最先进方法相当的性能,同时提供自适应的、非均匀的预测区间,并消除了调整覆盖水平时重新训练的需要。
摘要:Data-Driven Product Development (DDPD) leverages data to learn the relationship between product design specifications and resulting properties. To discover improved designs, we train a neural network on past experiments and apply Projected Gradient Descent to identify optimal input features that maximize performance. Since many products require simultaneous optimization of multiple correlated properties, our framework employs joint neural networks to capture interdependencies among targets. Furthermore, we integrate uncertainty estimation via \emph{Conformalised Monte Carlo Dropout} (ConfMC), a novel method combining Nested Conformal Prediction with Monte Carlo dropout to provide model-agnostic, finite-sample coverage guarantees under data exchangeability. Extensive experiments on five real-world datasets show that our method matches state-of-the-art performance while offering adaptive, non-uniform prediction intervals and eliminating the need for retraining when adjusting coverage levels.


迁移|Zero/Few/One-Shot|自适应(12篇)

【1】Meta-Learning Guided Pruning for Few-Shot Plant Pathology on Edge Devices
标题:边缘设备上Few-Shot植物病理学的元学习引导修剪
链接:https://arxiv.org/abs/2601.02353

作者:Shahnawaz Alam,Mohammed Mudassir Uddin,Mohammed Kaif Pasha
摘要:偏远地区的农民需要快速可靠的方法来识别植物病害,但他们往往无法获得实验室或高性能计算资源。深度学习模型可以高精度地从叶片图像中检测疾病,但这些模型通常太大,计算成本太高,无法在Raspberry Pi等低成本边缘设备上运行。此外,收集数千个标记的疾病图像进行训练既昂贵又耗时。本文通过将神经网络修剪(删除模型中不必要的部分)与Few-Shot学习相结合来解决这两个挑战,这使得模型能够从有限的示例中学习。本文提出了疾病感知通道重要性评分(DACIS),这是一种识别神经网络的哪些部分对于区分不同植物疾病最重要的方法,集成到三阶段修剪-然后-元学习-然后-修剪(PMP)管道中。在PlantVillage和PlantDoc数据集上的实验表明,所提出的方法将模型大小减少了78%,同时保持了92.3%的原始准确率,压缩模型在Raspberry Pi 4上以每秒7帧的速度运行,使实时田间诊断对小农实用。
摘要:Farmers in remote areas need quick and reliable methods for identifying plant diseases, yet they often lack access to laboratories or high-performance computing resources. Deep learning models can detect diseases from leaf images with high accuracy, but these models are typically too large and computationally expensive to run on low-cost edge devices such as Raspberry Pi. Furthermore, collecting thousands of labeled disease images for training is both expensive and time-consuming. This paper addresses both challenges by combining neural network pruning -- removing unnecessary parts of the model -- with few-shot learning, which enables the model to learn from limited examples. This paper proposes Disease-Aware Channel Importance Scoring (DACIS), a method that identifies which parts of the neural network are most important for distinguishing between different plant diseases, integrated into a three-stage Prune-then-Meta-Learn-then-Prune (PMP) pipeline. Experiments on PlantVillage and PlantDoc datasets demonstrate that the proposed approach reduces model size by 78\% while maintaining 92.3\% of the original accuracy, with the compressed model running at 7 frames per second on a Raspberry Pi 4, making real-time field diagnosis practical for smallholder farmers.


【2】TopoLoRA-SAM: Topology-Aware Parameter-Efficient Adaptation of Foundation Segmenters for Thin-Structure and Cross-Domain Binary Semantic Segmentation
标题:TopoLoRA-SAM:面向薄结构和跨域二值语义分割的基础分割器的拓扑感知参数高效适配
链接:https://arxiv.org/abs/2601.02273

作者:Salim Khazem
摘要:诸如Segment Anything Model(SAM)之类的基础分割模型通过大规模预训练表现出很强的zero-shot泛化,但是使它们适应特定于领域的语义分割仍然具有挑战性,特别是对于薄结构(例如,视网膜血管)和噪声模态(例如,SAR图像)。完全的微调在计算上是昂贵的,并且有灾难性遗忘的风险。我们提出了\textbf{TopoLoRA-SAM},一个用于二值语义分割的拓扑感知且参数高效的适配框架。TopoLoRA-SAM将低秩自适应(LoRA)注入冻结的ViT编码器,并辅以轻量级空间卷积适配器和通过可微clDice实现的可选拓扑感知监督。我们在五个基准上评估了我们的方法,包括视网膜血管分割(DRIVE,STARE,CHASE\_DB1),息肉分割(Kvasir-SEG)和SAR海/陆分割(SL-SSDD),与U-Net,DeepLabV3+,SegFormer和Mask2Former进行比较。TopoLoRA-SAM在各数据集上实现了最佳的视网膜平均Dice和最佳的总体平均Dice,同时只训练了\textbf{5.2\%}的模型参数(约490万)。在具有挑战性的CHASE\_DB1数据集上,我们的方法大大提高了分割的准确性和鲁棒性,证明了拓扑感知的参数高效适配可以匹配或超过完全微调的专家模型。代码可从以下网址获得:https://github.com/salimkhazem/Seglab.git
摘要:Foundation segmentation models such as the Segment Anything Model (SAM) exhibit strong zero-shot generalization through large-scale pretraining, but adapting them to domain-specific semantic segmentation remains challenging, particularly for thin structures (e.g., retinal vessels) and noisy modalities (e.g., SAR imagery). Full fine-tuning is computationally expensive and risks catastrophic forgetting. We propose \textbf{TopoLoRA-SAM}, a topology-aware and parameter-efficient adaptation framework for binary semantic segmentation. TopoLoRA-SAM injects Low-Rank Adaptation (LoRA) into the frozen ViT encoder, augmented with a lightweight spatial convolutional adapter and optional topology-aware supervision via differentiable clDice. We evaluate our approach on five benchmarks spanning retinal vessel segmentation (DRIVE, STARE, CHASE\_DB1), polyp segmentation (Kvasir-SEG), and SAR sea/land segmentation (SL-SSDD), comparing against U-Net, DeepLabV3+, SegFormer, and Mask2Former. TopoLoRA-SAM achieves the best retina-average Dice and the best overall average Dice across datasets, while training only \textbf{5.2\%} of model parameters ($\sim$4.9M). On the challenging CHASE\_DB1 dataset, our method substantially improves segmentation accuracy and robustness, demonstrating that topology-aware parameter-efficient adaptation can match or exceed fully fine-tuned specialist models. Code is available at : https://github.com/salimkhazem/Seglab.git
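The LoRA injection at the heart of this recipe can be sketched as a frozen linear layer plus a trainable low-rank update; the rank, scaling, and initialization below follow common LoRA practice rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update scale * B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                   # keep pretrained weights frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(nn.Linear(768, 768), rank=8)
out = layer(torch.randn(2, 16, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(out.shape, trainable)  # only the rank-8 factors are trainable
```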


【3】A Comparative Study of Custom CNNs, Pre-trained Models, and Transfer Learning Across Multiple Visual Datasets
标题:多个视觉数据集上自定义CNN、预训练模型和迁移学习的比较研究
链接:https://arxiv.org/abs/2601.02246

作者:Annoor Sharara Akhand
摘要:卷积神经网络(CNN)是视觉识别的标准方法,因为它们能够从原始像素中学习分层表示。在实践中,从业者通常在以下各项中进行选择:(i)从头开始训练紧凑的自定义CNN,(ii)使用大型预训练的CNN作为固定特征提取器,以及(iii)通过对预训练的骨干进行部分或全部微调来执行迁移学习。本报告对这三种模式在五个真实世界的图像分类数据集进行了对照比较,这些数据集包括路面缺陷识别、农业品种识别、水果/叶子疾病识别、行人通道侵入识别和未经授权的车辆识别。模型使用准确性和宏观F1分数进行评估,并辅以效率指标,包括每个时期的训练时间和参数计数。结果表明,迁移学习始终产生最强的预测性能,而自定义CNN提供了一个有吸引力的效率-准确性权衡,特别是当计算和内存预算受到限制时。
摘要:Convolutional Neural Networks (CNNs) are a standard approach for visual recognition due to their capacity to learn hierarchical representations from raw pixels. In practice, practitioners often choose among (i) training a compact custom CNN from scratch, (ii) using a large pre-trained CNN as a fixed feature extractor, and (iii) performing transfer learning via partial or full fine-tuning of a pre-trained backbone. This report presents a controlled comparison of these three paradigms across five real-world image classification datasets spanning road-surface defect recognition, agricultural variety identification, fruit/leaf disease recognition, pedestrian walkway encroachment recognition, and unauthorized vehicle recognition. Models are evaluated using accuracy and macro F1-score, complemented by efficiency metrics including training time per epoch and parameter counts. The results show that transfer learning consistently yields the strongest predictive performance, while the custom CNN provides an attractive efficiency--accuracy trade-off, especially when compute and memory budgets are constrained.


【4】Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting
标题:熵自适应微调:解决自信冲突以减轻遗忘
链接:https://arxiv.org/abs/2601.02151

作者:Muxi Diao,Lele Yang,Wuxuan Gong,Yutong Zhang,Zhonghao Yan,Yufei Han,Kongming Liang,Weiran Xu,Zhanyu Ma
摘要:监督微调(SFT)是领域自适应的标准范例,但它经常会导致灾难性遗忘的代价。与此形成鲜明对比的是,基于策略的强化学习(RL)有效地保留了一般能力。我们调查了这种差异,并确定了一个根本的分配差距:虽然RL与模型的内部信念一致,SFT迫使模型适应外部监督。这种不匹配通常表现为“置信冲突”令牌,其特征在于低概率但低熵。在这些情况下,模型对自己的预测非常有信心,但被迫学习不同的基础事实,从而触发破坏性的梯度更新。为了解决这个问题,我们提出了熵自适应微调(EAFT)。与仅依赖于预测概率的方法不同,EAFT利用令牌级熵作为门控机制来区分认知不确定性和知识冲突。这允许模型从不确定的样本中学习,同时抑制冲突数据的梯度。Qwen和GLM系列(范围从4B到32B参数)跨越数学,医学和agentic域的广泛实验证实了我们的假设。EAFT始终与标准SFT的下游性能相匹配,同时显著减轻了一般能力的退化。
摘要 :Supervised Fine-Tuning (SFT) is the standard paradigm for domain adaptation, yet it frequently incurs the cost of catastrophic forgetting. In sharp contrast, on-policy Reinforcement Learning (RL) effectively preserves general capabilities. We investigate this discrepancy and identify a fundamental distributional gap: while RL aligns with the model's internal belief, SFT forces the model to fit external supervision. This mismatch often manifests as "Confident Conflicts" tokens characterized by low probability but low entropy. In these instances, the model is highly confident in its own prediction but is forced to learn a divergent ground truth, triggering destructive gradient updates. To address this, we propose Entropy-Adaptive Fine-Tuning (EAFT). Unlike methods relying solely on prediction probability, EAFT utilizes token-level entropy as a gating mechanism to distinguish between epistemic uncertainty and knowledge conflict. This allows the model to learn from uncertain samples while suppressing gradients on conflicting data. Extensive experiments on Qwen and GLM series (ranging from 4B to 32B parameters) across mathematical, medical, and agentic domains confirm our hypothesis. EAFT consistently matches the downstream performance of standard SFT while significantly mitigating the degradation of general capabilities.
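A minimal PyTorch sketch of an entropy-gated token loss in the spirit described above; the specific gate (normalized entropy raised to a power) is an illustrative choice, since the abstract does not spell out the exact gating function.

```python
import torch
import torch.nn.functional as F

def entropy_gated_loss(logits, targets, tau=1.0):
    """Token-level cross-entropy downweighted where the model is confident
    (low entropy): a sketch of entropy-adaptive fine-tuning. The gate here is
    the entropy normalized by log(vocab size), raised to the power tau."""
    logp = F.log_softmax(logits, dim=-1)               # (B, T, V)
    p = logp.exp()
    entropy = -(p * logp).sum(-1)                      # (B, T), per-token entropy
    max_entropy = torch.log(torch.tensor(float(logits.size(-1))))
    gate = (entropy / max_entropy).clamp(0, 1) ** tau  # ~0 on confident tokens
    ce = F.nll_loss(logp.transpose(1, 2), targets, reduction="none")  # (B, T)
    return (gate * ce).mean()

logits = torch.randn(2, 10, 32000, requires_grad=True)
targets = torch.randint(0, 32000, (2, 10))
loss = entropy_gated_loss(logits, targets)
loss.backward()  # gradients on confident-but-conflicting tokens are suppressed
print(loss.item())
```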


【5】FedBiCross: A Bi-Level Optimization Framework to Tackle Non-IID Challenges in Data-Free One-Shot Federated Learning on Medical Data
标题:FedBiCross:一个应对医疗数据上无数据单次联邦学习中非IID挑战的双层优化框架
链接:https://arxiv.org/abs/2601.01901

作者:Yuexuan Xia,Yinghao Zhang,Yalin Liu,Hong-Ning Dai,Yong Xia
摘要:基于无数据知识蒸馏的一次性联邦学习(OSFL)在单次通信中训练模型,而无需共享原始数据,使OSFL对隐私敏感的医疗应用具有吸引力。然而,现有的方法汇总了所有客户端的预测,以形成一个全球教师。在非IID数据下,相互冲突的预测在平均过程中被抵消,产生近乎统一的软标签,为蒸馏提供弱监督。我们提出了FedBiCross,一个个性化的OSFL框架,分为三个阶段:(1)通过模型输出相似性对客户端进行聚类,以形成连贯的子集合,(2)双层跨集群优化,学习自适应权重,选择性地利用有益的跨集群知识,同时抑制负迁移,以及(3)针对客户端特定适应的个性化蒸馏。在四个医学图像数据集上的实验表明,FedBiCross在不同的非IID程度上始终优于最先进的基线。
摘要:Data-free knowledge distillation-based one-shot federated learning (OSFL) trains a model in a single communication round without sharing raw data, making OSFL attractive for privacy-sensitive medical applications. However, existing methods aggregate predictions from all clients to form a global teacher. Under non-IID data, conflicting predictions cancel out during averaging, yielding near-uniform soft labels that provide weak supervision for distillation. We propose FedBiCross, a personalized OSFL framework with three stages: (1) clustering clients by model output similarity to form coherent sub-ensembles, (2) bi-level cross-cluster optimization that learns adaptive weights to selectively leverage beneficial cross-cluster knowledge while suppressing negative transfer, and (3) personalized distillation for client-specific adaptation. Experiments on four medical image datasets demonstrate that FedBiCross consistently outperforms state-of-the-art baselines across different non-IID degrees.


【6】REE-TTT: Highly Adaptive Radar Echo Extrapolation Based on Test-Time Training
标题:REE-TTT:基于测试时训练的高适应性雷达回波外推
链接:https://arxiv.org/abs/2601.01605

作者:Xin Di,Xinglin Piao,Fei Wang,Guodong Jing,Yong Zhang
摘要:降水临近预报是气象预报的重要组成部分。基于深度学习的雷达回波外推(REE)已成为一种主要的临近预报方法,但由于其依赖于高质量的本地训练数据和静态模型参数,因此泛化能力较差,限制了其在不同地区和极端事件中的适用性。为了克服这一点,我们提出了REE-TTT,一种新的模型,采用了自适应测试时训练(TTT)机制。我们模型的核心在于新设计的时空测试时间训练(ST-TTT)块,它用特定任务的注意力机制取代了TTT层中的标准线性投影,使其能够鲁棒地适应非平稳的气象分布,从而显着增强降水的特征表示。跨区域极端降水情景下的试验表明,REE-TTT在预报精度和泛化能力上明显优于现有的基线模型,对数据分布的变化表现出显著的适应性。
摘要:Precipitation nowcasting is critically important for meteorological forecasting. Deep learning-based Radar Echo Extrapolation (REE) has become a predominant nowcasting approach, yet it suffers from poor generalization due to its reliance on high-quality local training data and static model parameters, limiting its applicability across diverse regions and extreme events. To overcome this, we propose REE-TTT, a novel model that incorporates an adaptive Test-Time Training (TTT) mechanism. The core of our model lies in the newly designed Spatio-temporal Test-Time Training (ST-TTT) block, which replaces the standard linear projections in TTT layers with task-specific attention mechanisms, enabling robust adaptation to non-stationary meteorological distributions and thereby significantly enhancing the feature representation of precipitation. Experiments under cross-regional extreme precipitation scenarios demonstrate that REE-TTT substantially outperforms state-of-the-art baseline models in prediction accuracy and generalization, exhibiting remarkable adaptability to data distribution shifts.


【7】Rethinking Multimodal Few-Shot 3D Point Cloud Segmentation: From Fused Refinement to Decoupled Arbitration
标题:重新思考多模态Few-Shot 3D点云分割:从融合细化到解耦仲裁
链接:https://arxiv.org/abs/2601.01456

作者:Wentao Bian,Fenglei Xu
备注:10 pages, 4 figures, 3 tables
摘要:在本文中,我们重新审视多模态Few-Shot三维点云语义分割(FS-PCS),确定"融合-然后-细化"范式中的冲突:"可塑性-稳定性困境"。此外,CLIP的类间混淆可能导致语义盲视。为了解决这些问题,我们提出了解耦专家仲裁Few-Shot SegNet(DA-FSS),这是一种有效区分语义和几何路径并相互正则化其梯度以实现更好泛化的模型。DA-FSS使用与MM-FSS相同的主干和预训练的文本编码器来生成文本嵌入,这可以提高自由模态的利用率,并更好地利用每个模态的信息空间。为了实现这一点,我们提出了一个并行专家细化模块来生成每个模态相关性。我们还提出了一个堆叠仲裁模块(SAM)来执行卷积融合和仲裁每个模态路径的相关性。并行专家解耦两条路径:几何专家保持可塑性,语义专家确保稳定性。它们通过解耦对齐模块(DAM)进行协调,该模块在不传播混淆的情况下传输知识。在S3DIS、ScanNet等数据集上的实验表明,DA-FSS算法优于MM-FSS算法。同时,几何边界,完整性和纹理分化都优于基线。该代码可在https://github.com/MoWenQAQ/DA-FSS上获得。
摘要:In this paper, we revisit multimodal few-shot 3D point cloud semantic segmentation (FS-PCS), identifying a conflict in "Fuse-then-Refine" paradigms: the "Plasticity-Stability Dilemma." In addition, CLIP's inter-class confusion can result in semantic blindness. To address these issues, we present the Decoupled-experts Arbitration Few-Shot SegNet (DA-FSS), a model that effectively distinguishes between semantic and geometric paths and mutually regularizes their gradients to achieve better generalization. DA-FSS employs the same backbone and pre-trained text encoder as MM-FSS to generate text embeddings, which can increase free modalities' utilization rate and better leverage each modality's information space. To achieve this, we propose a Parallel Expert Refinement module to generate each modal correlation. We also propose a Stacked Arbitration Module (SAM) to perform convolutional fusion and arbitrate correlations for each modality pathway. The Parallel Experts decouple two paths: a Geometric Expert maintains plasticity, and a Semantic Expert ensures stability. They are coordinated via a Decoupled Alignment Module (DAM) that transfers knowledge without propagating confusion. Experiments on popular datasets (S3DIS, ScanNet) demonstrate the superiority of DA-FSS over MM-FSS. Meanwhile, geometric boundaries, completeness, and texture differentiation are all superior to the baseline. The code is available at: https://github.com/MoWenQAQ/DA-FSS.


【8】Evaluating transfer learning strategies for improving dairy cattle body weight prediction in small farms using depth-image and point-cloud data
标题:使用深度图像和点云数据评估用于改善小型农场奶牛体重预测的迁移学习策略
链接:https://arxiv.org/abs/2601.01044

作者:Jin Wang,Angelo De Castro,Yuxi Zhang,Lucas Basolli Borsatto,Yuechen Guo,Victoria Bastos Primo,Ana Beatriz Montevecchio Bernardino,Gota Morota,Ricardo C Chebel,Haipeng Yu
摘要:计算机视觉为监控奶牛提供了自动化、非侵入性和可扩展的工具,从而支持管理、健康评估和表型数据收集。虽然迁移学习通常用于从图像中预测体重,但其有效性和最佳微调策略在畜牧业应用中仍然知之甚少,特别是除了使用预训练的ImageNet或COCO权重之外。此外,虽然深度图像和三维点云数据已被探索用于体重预测,但这两种模式在奶牛中的直接比较是有限的。因此,本研究的目的是:1)评估来自大型农场的迁移学习是否可以增强数据有限的小型农场的体重预测,2)比较三种实验设计下基于深度图像和点云的方法的预测性能。分别从大型、中型和小型奶牛场的1,201,215和58头奶牛中收集俯视深度图像和点云数据。评估了四种深度学习模型:用于深度图像的ConvNeXt和MobileViT,以及用于点云的PointNet和DGCNN。在所有四种模型中,迁移学习显著改善了小农场的体重预测,优于单源学习,并实现了与联合学习相当或更高的收益。这些结果表明,预训练的表示在具有不同成像条件和奶牛种群的农场中具有良好的泛化能力。在基于深度图像和基于点云的模型之间没有观察到一致的性能差异。总体而言,这些研究结果表明,迁移学习非常适合小型农场预测场景,其中跨农场数据共享受到隐私,物流或政策限制的限制,因为它只需要访问预训练的模型权重而不是原始数据。
摘要:Computer vision provides automated, non-invasive, and scalable tools for monitoring dairy cattle, thereby supporting management, health assessment, and phenotypic data collection. Although transfer learning is commonly used for predicting body weight from images, its effectiveness and optimal fine-tuning strategies remain poorly understood in livestock applications, particularly beyond the use of pretrained ImageNet or COCO weights. In addition, while both depth images and three-dimensional point-cloud data have been explored for body weight prediction, direct comparisons of these two modalities in dairy cattle are limited. Therefore, the objectives of this study were to 1) evaluate whether transfer learning from a large farm enhances body weight prediction on a small farm with limited data, and 2) compare the predictive performance of depth-image- and point-cloud-based approaches under three experimental designs. Top-view depth images and point-cloud data were collected from 1,201, 215, and 58 cows at large, medium, and small dairy farms, respectively. Four deep learning models were evaluated: ConvNeXt and MobileViT for depth images, and PointNet and DGCNN for point clouds. Transfer learning markedly improved body weight prediction on the small farm across all four models, outperforming single-source learning and achieving gains comparable to or greater than joint learning. These results indicate that pretrained representations generalize well across farms with differing imaging conditions and dairy cattle populations. No consistent performance difference was observed between depth-image- and point-cloud-based models. Overall, these findings suggest that transfer learning is well suited for small farm prediction scenarios where cross-farm data sharing is limited by privacy, logistical, or policy constraints, as it requires access only to pretrained model weights rather than raw data.


【9】Zero-shot Forecasting by Simulation Alone
标题:仅通过模拟进行Zero-Shot预测
链接:https://arxiv.org/abs/2601.00970

作者:Boris N. Oreshkin,Mayank Jauhari,Ravi Kiran Selvam,Malcolm Wolff,Wenhao Pan,Shankar Ramasubramanian,Kin G. Olivares,Tatiana Konstantinova,Andres Potapczynski,Mengfei Cao,Dmitry Efimov,Michael W. Mahoney,Andrew G. Wilson
摘要:Zero-shot时间序列预测前景广阔,但仍处于起步阶段,受到有限且有偏的数据语料、容易泄漏的评估以及隐私和许可限制的阻碍。出于这些挑战,我们提出了第一个实用的单变量时间序列模拟管道,它既足够快以支持实时数据生成,又能在捕捉趋势/季节性/间歇性模式(各领域工业预测应用的典型特征)的M系列和GiftEval基准上实现显著的zero-shot预测性能。我们的模拟器称为SarSim0(用于Zero-Shot预测的SARIMA模拟器),以季节性自回归积分滑动平均(SARIMA)模型作为其核心数据源。由于自回归分量的不稳定性,朴素的SARIMA模拟经常产生不可用的路径。为此,我们遵循一个三步程序:(1)从其特征多项式的稳定区域中采样表现良好的轨迹;(2)引入一个叠加方案,将多条路径组合成丰富的多季节性轨迹;(3)在季节性和趋势之外,添加基于速率的重尾噪声模型以捕获突发性和间歇性。SarSim0比基于核的生成器快几个数量级,并支持在约1B条动态生成的独特纯模拟序列上进行训练;此后,成熟的神经网络主干表现出强大的zero-shot泛化能力,在严格的zero-shot协议下超过了强大的统计预测器和最近的基础模型基线。值得注意的是,在GiftEval上,我们观察到一种"学生击败教师"的效应:在我们的模拟数据上训练的模型超过了AutoARIMA生成过程的预测准确性。
摘要:Zero-shot time-series forecasting holds great promise, but is still in its infancy, hindered by limited and biased data corpora, leakage-prone evaluation, and privacy and licensing constraints. Motivated by these challenges, we propose the first practical univariate time series simulation pipeline which is simultaneously fast enough for on-the-fly data generation and enables notable zero-shot forecasting performance on M-Series and GiftEval benchmarks that capture trend/seasonality/intermittency patterns, typical of industrial forecasting applications across a variety of domains. Our simulator, which we call SarSim0 (SARIMA Simulator for Zero-Shot Forecasting), is based off of a seasonal autoregressive integrated moving average (SARIMA) model as its core data source. Due to instability in the autoregressive component, naive SARIMA simulation often leads to unusable paths. Instead, we follow a three-step procedure: (1) we sample well-behaved trajectories from its characteristic polynomial stability region; (2) we introduce a superposition scheme that combines multiple paths into rich multi-seasonality traces; and (3) we add rate-based heavy-tailed noise models to capture burstiness and intermittency alongside seasonalities and trends. SarSim0 is orders of magnitude faster than kernel-based generators, and it enables training on circa 1B unique purely simulated series, generated on the fly; after which well-established neural network backbones exhibit strong zero-shot generalization, surpassing strong statistical forecasters and recent foundation baselines, while operating under strict zero-shot protocol. Notably, on GiftEval we observe a "student-beats-teacher" effect: models trained on our simulations exceed the forecasting accuracy of the AutoARIMA generating processes.
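Step (1) can be sketched by sampling characteristic-polynomial roots inside the unit disk and reading off stable AR coefficients. The sketch below adds a deterministic sinusoidal seasonal component and Gaussian noise as simplifications of the paper's superposition scheme and heavy-tailed rate noise.

```python
import numpy as np

def sample_stable_ar(p, rng, max_mod=0.98):
    """Sample AR(p) coefficients whose characteristic roots lie inside the unit
    disk, guaranteeing non-explosive simulated paths."""
    roots = []
    while len(roots) < p:
        if p - len(roots) >= 2 and rng.random() < 0.5:
            r, theta = rng.uniform(0.1, max_mod), rng.uniform(0, np.pi)
            roots += [r * np.exp(1j * theta), r * np.exp(-1j * theta)]  # conjugate pair
        else:
            roots.append(rng.uniform(-max_mod, max_mod))                # real root
    coeffs = np.poly(roots).real   # monic polynomial z^p + c1 z^(p-1) + ... + cp
    return -coeffs[1:]             # AR coefficients phi_1..phi_p

def simulate_seasonal_ar(T, phi, season=12, noise_scale=1.0, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    p = len(phi)
    x = np.zeros(T + p)
    for t in range(p, T + p):
        x[t] = phi @ x[t - p:t][::-1] + noise_scale * rng.standard_normal()
    seasonal = 0.5 * np.sin(2 * np.pi * np.arange(T) / season)
    return x[p:] + seasonal        # superpose a seasonal component

rng = np.random.default_rng(0)
series = simulate_seasonal_ar(200, sample_stable_ar(3, rng), rng=rng)
print(series[:5])
```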


【10】Noise-Aware and Dynamically Adaptive Federated Defense Framework for SAR Image Target Recognition
标题:SAR图像目标识别的噪声感知与动态自适应联邦防御框架
链接:https://arxiv.org/abs/2601.00900

作者:Yuchao Hou,Zixuan Zhang,Jie Wang,Wenke Huang,Lianhui Liang,Di Wu,Zhiquan Liu,Youliang Tian,Jianming Zhu,Jisheng Dang,Junhao Dong,Zhongliang Guo
备注:This work was supported in part by the National Key Research and Development Program of China under Grant 2021YFB3101100, in part by the National Natural Science Foundation of China under Grant 62272123, 42371470, and 42461057, in part by the Fundamental Research Program of Shanxi Province under Grant 202303021212164. Corresponding authors: Zhongliang Guo and Junhao Dong
摘要:作为计算智能在遥感领域的重要应用,基于深度学习的合成孔径雷达(SAR)图像目标识别有助于智能感知,但通常依赖于集中式训练,其中多源SAR数据被上传到单个服务器,引发了隐私和安全问题。联邦学习(FL)为SAR图像目标识别提供了一种新兴的计算智能范例,在保护本地数据隐私的同时实现跨站点协作。然而,FL面临着严重的安全风险,恶意客户端可以利用SAR的乘性斑点噪声来隐藏后门触发器,严重挑战计算智能模型的鲁棒性。为了应对这一挑战,我们提出了NADAFD,一个噪声感知和动态自适应的联邦防御框架,集成了频域,空间域和客户端行为分析,以应对SAR特定的后门威胁。具体来说,我们引入了频域协作反演机制,以暴露跨客户端的频谱不一致性,指示隐藏的后门触发器。我们进一步设计了一种噪声感知的对抗训练策略,将$Γ$分布的斑点特征嵌入到掩模引导的对抗样本生成中,以增强对后门攻击和SAR斑点噪声的鲁棒性。此外,我们提出了一个动态的健康评估模块,跟踪客户端更新行为的训练轮,并自适应地调整聚合权重,以减轻不断演变的恶意贡献。MSTAR和OpenSARShip数据集上的实验表明,NADAFD在干净的测试样本上实现了更高的准确率,并且在触发输入上的后门攻击成功率低于现有的联合后门防御SAR目标识别。
摘要:As a critical application of computational intelligence in remote sensing, deep learning-based synthetic aperture radar (SAR) image target recognition facilitates intelligent perception but typically relies on centralized training, where multi-source SAR data are uploaded to a single server, raising privacy and security concerns. Federated learning (FL) provides an emerging computational intelligence paradigm for SAR image target recognition, enabling cross-site collaboration while preserving local data privacy. However, FL confronts critical security risks, where malicious clients can exploit SAR's multiplicative speckle noise to conceal backdoor triggers, severely challenging the robustness of the computational intelligence model. To address this challenge, we propose NADAFD, a noise-aware and dynamically adaptive federated defense framework that integrates frequency-domain, spatial-domain, and client-behavior analyses to counter SAR-specific backdoor threats. Specifically, we introduce a frequency-domain collaborative inversion mechanism to expose cross-client spectral inconsistencies indicative of hidden backdoor triggers. We further design a noise-aware adversarial training strategy that embeds $\Gamma$-distributed speckle characteristics into mask-guided adversarial sample generation to enhance robustness against both backdoor attacks and SAR speckle noise. In addition, we present a dynamic health assessment module that tracks client update behaviors across training rounds and adaptively adjusts aggregation weights to mitigate evolving malicious contributions. Experiments on MSTAR and OpenSARShip datasets demonstrate that NADAFD achieves higher accuracy on clean test samples and a lower backdoor attack success rate on triggered inputs than existing federated backdoor defenses for SAR target recognition.


【11】FANoS: Friction-Adaptive Nosé--Hoover Symplectic Momentum for Stiff Objectives
标题:FANoS:面向刚性目标的摩擦自适应Nosé-Hoover辛动量
链接:https://arxiv.org/abs/2601.00889

作者:Nalin Dhiman
备注:13 pages, 5 figures, 4 tables
摘要:我们研究了一个受物理启发的优化器\emph{FANoS}(摩擦自适应Nosé-Hoover辛动量),它结合了:(i)写成离散化二阶动力系统的动量更新;(ii)一个类Nosé-Hoover恒温器变量,利用动能反馈自适应调节标量摩擦系数;(iii)半隐式(辛欧拉)积分器,并可选配对角RMS预处理器。该方法的动机来自分子动力学中保结构积分和恒温器的思想,但在这里纯粹用作优化启发式。我们给出了算法以及在理想化设置下的有限理论观察。在具有3000次梯度评估的确定性Rosenbrock-100D基准上,FANoS-RMS达到了$1.74\times 10^{-2}$的平均最终目标值,大大优于该协议中未剪裁的AdamW($48.50$)和SGD+动量($90.76$)。但是,带梯度剪裁的AdamW更强,达到$1.87\times 10^{-3}$,而L-BFGS达到约$4.4\times 10^{-10}$。在病态凸二次问题和一个小型PINN热启动套件(Burgers和Allen-Cahn)上,默认的FANoS配置逊于AdamW,并且可能不稳定或呈现高方差。总的来说,证据支持一个保守的结论:FANoS是对现有思想的可解释综合,在某些刚性非凸谷上可能有帮助,但它并不是现代基线的普遍更优替代品,其行为对温度调度和超参数选择敏感。
摘要:We study a physics-inspired optimizer, \emph{FANoS} (Friction-Adaptive Nosé--Hoover Symplectic momentum), which combines (i) a momentum update written as a discretized second-order dynamical system, (ii) a Nosé--Hoover-like thermostat variable that adapts a scalar friction coefficient using kinetic-energy feedback, and (iii) a semi-implicit (symplectic-Euler) integrator, optionally with a diagonal RMS preconditioner. The method is motivated by structure-preserving integration and thermostat ideas from molecular dynamics, but is used here purely as an optimization heuristic.   We provide the algorithm and limited theoretical observations in idealized settings. On the deterministic Rosenbrock-100D benchmark with 3000 gradient evaluations, FANoS-RMS attains a mean final objective value of $1.74\times 10^{-2}$, improving substantially over unclipped AdamW ($48.50$) and SGD+momentum ($90.76$) in this protocol. However, AdamW with gradient clipping is stronger, reaching $1.87\times 10^{-3}$, and L-BFGS reaches $\approx 4.4\times 10^{-10}$. On ill-conditioned convex quadratics and in a small PINN warm-start suite (Burgers and Allen--Cahn), the default FANoS configuration underperforms AdamW and can be unstable or high-variance.   Overall, the evidence supports a conservative conclusion: FANoS is an interpretable synthesis of existing ideas that can help on some stiff nonconvex valleys, but it is not a generally superior replacement for modern baselines, and its behavior is sensitive to temperature-schedule and hyperparameter choices.
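A minimal sketch of one FANoS-style step as described: semi-implicit momentum/position updates plus a kinetic-energy-driven scalar friction variable. The hyperparameter values and the 2-D Rosenbrock test are illustrative, not the paper's configuration.

```python
import numpy as np

def thermostat_step(x, v, zeta, grad_fn, h=0.01, temperature=1e-3, Q=1.0):
    """One semi-implicit (symplectic-Euler) step of momentum dynamics with a
    Nose-Hoover-like thermostat adapting the scalar friction zeta."""
    g = grad_fn(x)
    v = v + h * (-g - zeta * v)                     # update momentum first (semi-implicit)
    x = x + h * v                                   # then position, using the new momentum
    kinetic = (v @ v) / len(v)
    zeta = zeta + h * (kinetic - temperature) / Q   # friction grows when dynamics run "too hot"
    return x, v, zeta

def rosen_grad(x):
    """Gradient of the 2-D Rosenbrock function (1-x)^2 + 100(y-x^2)^2."""
    return np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0] ** 2),
                     200 * (x[1] - x[0] ** 2)])

x, v, zeta = np.array([-1.5, 2.0]), np.zeros(2), 0.0
for _ in range(20000):
    x, v, zeta = thermostat_step(x, v, zeta, rosen_grad, h=0.002)
print(x)  # ideally approaches the minimum at (1, 1) for these settings
```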


【12】Environment-Adaptive Covariate Selection: Learning When to Use Spurious Correlations for Out-of-Distribution Prediction
标题:环境自适应协变量选择:学习何时使用虚假相关性进行分布外预测
链接:https://arxiv.org/abs/2601.02322

作者:Shuozhi Zuo,Yixin Wang
摘要:分布外(OOD)预测通常通过将模型限制为因果或不变协变量来实现,以避免可能在环境间不稳定的非因果虚假关联。尽管理论上很有吸引力,但这种策略在实践中经常不如经验风险最小化(ERM)。我们调查了这种差距的来源,并表明,当只有结果的一部分真正原因被观察到时,这种失败会自然出现。在这些情况下,非因果性的虚假协变量可以作为未观察到的原因的信息代理,并大大提高预测,除非分布变化打破这些代理关系。因此,预测协变量的最佳集合既不是通用的,也不一定表现出与所有环境中的结果的不变关系,而是取决于所遇到的特定类型的转变。至关重要的是,我们观察到不同的协变量变化会在协变量分布本身中诱导出不同的、可观察到的特征。此外,这些签名可以从目标OOD环境中的未标记数据中提取,并用于评估代理协变量何时保持可靠以及何时失败。在此基础上,我们提出了一种环境自适应协变量选择(EACS)算法,该算法将环境级别的协变量摘要映射到环境特定的协变量集,同时允许将先验因果知识作为约束条件。在模拟和应用数据集上,EACS在不同的分布变化下始终优于静态因果,不变和基于ERM的预测因子。
摘要:Out-of-distribution (OOD) prediction is often approached by restricting models to causal or invariant covariates, avoiding non-causal spurious associations that may be unstable across environments. Despite its theoretical appeal, this strategy frequently underperforms empirical risk minimization (ERM) in practice. We investigate the source of this gap and show that such failures naturally arise when only a subset of the true causes of the outcome is observed. In these settings, non-causal spurious covariates can serve as informative proxies for unobserved causes and substantially improve prediction, except under distribution shifts that break these proxy relationships. Consequently, the optimal set of predictive covariates is neither universal nor necessarily exhibits invariant relationships with the outcome across all environments, but instead depends on the specific type of shift encountered. Crucially, we observe that different covariate shifts induce distinct, observable signatures in the covariate distribution itself. Moreover, these signatures can be extracted from unlabeled data in the target OOD environment and used to assess when proxy covariates remain reliable and when they fail. Building on this observation, we propose an environment-adaptive covariate selection (EACS) algorithm that maps environment-level covariate summaries to environment-specific covariate sets, while allowing the incorporation of prior causal knowledge as constraints. Across simulations and applied datasets, EACS consistently outperforms static causal, invariant, and ERM-based predictors under diverse distribution shifts.


强化学习(8篇)

【1】Higher-Order Action Regularization in Deep Reinforcement Learning: From Continuous Control to Building Energy Management
标题:深度强化学习中的高阶动作正则化:从连续控制到建筑能源管理
链接:https://arxiv.org/abs/2601.02061

作者:Faizan Ahmed,Aniket Dixit,James Brusey
备注:6 pages, accepted at NeurIPS workshop 2025
摘要:深度强化学习代理通常表现出不稳定的高频控制行为,由于过度的能量消耗和机械磨损,这些行为阻碍了现实世界的部署。我们通过高阶导数惩罚系统地研究了动作平滑正则化,从连续控制基准的理论理解到建筑能源管理的实际验证。我们在四个连续控制环境的综合评估表明,三阶导数惩罚(加加速度最小化)始终实现卓越的平滑性,同时保持竞争力的性能。我们将这些发现扩展到HVAC控制系统,其中平滑的策略将设备切换减少了60%,转化为显着的运营效益。我们的工作建立了高阶动作正则化作为RL优化和操作约束之间的有效桥梁,在能源关键型应用程序。
摘要:Deep reinforcement learning agents often exhibit erratic, high-frequency control behaviors that hinder real-world deployment due to excessive energy consumption and mechanical wear. We systematically investigate action smoothness regularization through higher-order derivative penalties, progressing from theoretical understanding in continuous control benchmarks to practical validation in building energy management. Our comprehensive evaluation across four continuous control environments demonstrates that third-order derivative penalties (jerk minimization) consistently achieve superior smoothness while maintaining competitive performance. We extend these findings to HVAC control systems where smooth policies reduce equipment switching by 60%, translating to significant operational benefits. Our work establishes higher-order action regularization as an effective bridge between RL optimization and operational constraints in energy-critical applications.
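The third-order penalty itself is a one-liner over finite differences of the action sequence. A minimal PyTorch sketch; the weight and where the term enters the RL objective are illustrative choices.

```python
import torch

def jerk_penalty(actions, weight=1e-2):
    """Third-order finite-difference penalty over an action sequence (B, T, A):
    penalizes rapid changes in acceleration, i.e. minimizes jerk."""
    jerk = (actions[:, 3:] - 3 * actions[:, 2:-1]
            + 3 * actions[:, 1:-2] - actions[:, :-3])
    return weight * jerk.pow(2).mean()

actions = torch.randn(4, 50, 6, requires_grad=True)  # e.g. HVAC setpoint trajectories
loss = jerk_penalty(actions)
loss.backward()  # added to the usual policy-optimization loss during training
print(loss.item())
```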


【2】Distorted Distributional Policy Evaluation for Offline Reinforcement Learning
标题:离线强化学习的扭曲分布策略评估
链接:https://arxiv.org/abs/2601.01917

作者:Ryo Iwaki,Takayuki Osogami
备注:The preprint version of the paper accepted to ICONIP2025. The Version of Record is available online at https://link.springer.com/chapter/10.1007/978-981-95-4091-4_35
摘要:虽然分布强化学习(Distributional RL,DRL)方法在在线环境中表现出强大的性能,但其在离线场景中的成功仍然有限。我们假设,现有离线DRL方法的一个关键限制在于其对回报分位数的一致低估。这种一致的悲观可能导致过于保守的价值估计,最终阻碍泛化和性能。为了解决这个问题,我们引入了一个称为分位数失真的新概念,它根据支持数据的可用性调整保守程度,从而实现非均匀的悲观。我们的方法有理论分析支撑并经过实证验证,表现出优于均匀悲观的性能。
摘要:While Distributional Reinforcement Learning (DRL) methods have demonstrated strong performance in online settings, its success in offline scenarios remains limited. We hypothesize that a key limitation of existing offline DRL methods lies in their approach to uniformly underestimate return quantiles. This uniform pessimism can lead to overly conservative value estimates, ultimately hindering generalization and performance. To address this, we introduce a novel concept called quantile distortion, which enables non-uniform pessimism by adjusting the degree of conservatism based on the availability of supporting data. Our approach is grounded in theoretical analysis and empirically validated, demonstrating improved performance over uniform pessimism.


【3】Evaluating Feature Dependent Noise in Preference-based Reinforcement Learning
标题:在基于偏好的强化学习中评估特征相关噪声
链接:https://arxiv.org/abs/2601.01904

作者:Yuxuan Li,Harshith Reddy Kethireddy,Srijita Das
摘要:强化学习中的偏好学习(PbRL)最近引起了人们的关注,因为它非常适合奖励函数不易获得的复杂任务。然而,如果偏好并非来自完美的教师,往往会伴随不确定性和噪声。许多先前的文献旨在检测噪声,但所考虑的噪声类型有限,且大多服从均匀分布、与观测无关。在这项工作中,我们形式化定义了有针对性的特征相关噪声的概念,并提出了几种变体,如轨迹特征噪声、轨迹相似性噪声、不确定性感知噪声和语言模型噪声。我们在DMControl和Meta-world的复杂连续控制任务中评估了与某些特征相关的噪声。我们的实验表明,在一些特征相关噪声设置中,最先进的噪声鲁棒PbRL方法的学习性能显著恶化,而没有显式去噪的PbRL方法在大多数设置中可以出人意料地优于噪声鲁棒PbRL。我们还发现,语言模型产生的噪声表现出与特征相关噪声类似的特性,从而可以模拟真实的人类标注者;我们呼吁进一步研究特征相关噪声下的鲁棒学习。
摘要:Learning from Preferences in Reinforcement Learning (PbRL) has gained attention recently, as it serves as a natural fit for complicated tasks where the reward function is not easily available. However, preferences often come with uncertainty and noise if they are not from perfect teachers. Much prior literature aimed to detect noise, but with limited types of noise and most being uniformly distributed with no connection to observations. In this work, we formalize the notion of targeted feature-dependent noise and propose several variants like trajectory feature noise, trajectory similarity noise, uncertainty-aware noise, and Language Model noise.   We evaluate feature-dependent noise, where noise is correlated with certain features in complex continuous control tasks from DMControl and Meta-world. Our experiments show that in some feature-dependent noise settings, the state-of-the-art noise-robust PbRL method's learning performance is significantly deteriorated, while PbRL method with no explicit denoising can surprisingly outperform noise-robust PbRL in majority settings.   We also find language model's noise exhibits similar characteristics to feature-dependent noise, thereby simulating realistic humans and call for further study in learning with feature-dependent noise robustly.


【4】Sparse Threats, Focused Defense: Criticality-Aware Robust Reinforcement Learning for Safe Autonomous Driving
标题:稀疏威胁,聚焦防御:面向安全自动驾驶的临界感知鲁棒强化学习
链接:https://arxiv.org/abs/2601.01800

作者:Qi Wei,Junchao Fan,Zhao Yang,Jianhua Wang,Jingkai Mao,Xiaolin Chang
摘要:强化学习(RL)在自动驾驶(AD)中显示出相当大的潜力,但其对扰动的脆弱性仍然是现实世界部署的关键障碍。作为主要对策,对抗性训练通过在存在故意引入扰动的对手的情况下训练AD代理来提高策略鲁棒性。现有的方法通常将交互建模为具有连续攻击的零和博弈。然而,这样的设计忽略了代理和对手之间固有的不对称性,因而无法反映安全关键风险的稀疏性,从而使所实现的鲁棒性不足以用于实际的AD场景。为了解决这些局限性,我们引入了临界感知鲁棒强化学习(CARRL),这是一种新型的对抗性训练方法,用于处理自动驾驶中稀疏的安全关键风险。CARRL由两个相互作用的组件组成:风险暴露对手(REA)和风险目标鲁棒代理(RTRA)。我们将REA和RTRA之间的相互作用建模为一般和博弈,允许REA专注于暴露安全关键故障(例如,碰撞),而RTRA学会平衡安全性与驾驶效率。REA采用解耦优化机制,以更好地识别和利用有限预算下的稀疏安全关键时刻。然而,这种集中攻击不可避免地导致对抗性数据的稀缺。RTRA通过双重重放缓冲区联合利用良性和对抗性经验来应对这种稀缺性,并在扰动下强制执行策略一致性以稳定行为。实验结果表明,与最先进的基线方法相比,我们的方法在所有情况下都将碰撞率降低了至少22.66%。
摘要:Reinforcement learning (RL) has shown considerable potential in autonomous driving (AD), yet its vulnerability to perturbations remains a critical barrier to real-world deployment. As a primary countermeasure, adversarial training improves policy robustness by training the AD agent in the presence of an adversary that deliberately introduces perturbations. Existing approaches typically model the interaction as a zero-sum game with continuous attacks. However, such designs overlook the inherent asymmetry between the agent and the adversary and then fail to reflect the sparsity of safety-critical risks, rendering the achieved robustness inadequate for practical AD scenarios. To address these limitations, we introduce criticality-aware robust RL (CARRL), a novel adversarial training approach for handling sparse, safety-critical risks in autonomous driving. CARRL consists of two interacting components: a risk exposure adversary (REA) and a risk-targeted robust agent (RTRA). We model the interaction between the REA and RTRA as a general-sum game, allowing the REA to focus on exposing safety-critical failures (e.g., collisions) while the RTRA learns to balance safety with driving efficiency. The REA employs a decoupled optimization mechanism to better identify and exploit sparse safety-critical moments under a constrained budget. However, such focused attacks inevitably result in a scarcity of adversarial data. The RTRA copes with this scarcity by jointly leveraging benign and adversarial experiences via a dual replay buffer and enforces policy consistency under perturbations to stabilize behavior. Experimental results demonstrate that our approach reduces the collision rate by at least 22.66\% across all cases compared to state-of-the-art baseline methods.


【5】SRAS: A Lightweight Reinforcement Learning-based Document Selector for Edge-Native RAG Pipelines
标题:SRAS:面向边缘原生RAG管道的轻量级基于强化学习的文档选择器
链接:https://arxiv.org/abs/2601.01785

作者:Rajiv Chaitanya Muttur
备注:Presented at ICEdge 2025; nominated for Best Paper Award
摘要:检索增强生成(RAG)系统通常依赖于固定的top-k文档选择机制,该机制忽略下游生成质量并带来计算开销。我们提出了SRAS(稀疏奖励感知选择器),这是一种通过强化学习(RL)训练的轻量级文档选择器,用于边缘原生RAG部署。与之前假设大内存和延迟预算的基于RL的检索器不同,SRAS使用近端策略优化(PPO)学习一个紧凑(约0.76MB)的策略,并由结合Relaxed F1和BERTScore的混合奖励信号指导。我们的方法在严格的令牌和计算约束下运行,在CPU上保持<1s的延迟。SRAS在合成QA基准测试中优于监督和随机选择器,并能推广到真实世界数据,在SQuAD v2上实现0.8546的BERTScore F1,无需领域特定调优。这项工作首次证明了基于RL的文档选择可以做到超轻量、延迟感知,并对设备端RAG管道有效。
摘要:Retrieval-Augmented Generation (RAG) systems often rely on fixed top-k document selection mechanisms that ignore downstream generation quality and impose computational overheads. We propose SRAS (Sparse Reward-Aware Selector), a lightweight document selector trained via reinforcement learning (RL) for edge-native RAG deployment. Unlike prior RL-based retrievers that assume large memory and latency budgets, SRAS learns a compact (~0.76MB) policy using Proximal Policy Optimization (PPO), guided by a hybrid reward signal combining Relaxed F1 and BERTScore. Our method operates under tight token and compute constraints, maintaining <1s latency on CPU. SRAS outperforms supervised and random selectors on a synthetic QA benchmark, and generalizes to real-world data, achieving BERTScore F1 of 0.8546 on SQuAD v2 without domain-specific tuning. This work is the first to demonstrate that RL-based document selection can be made ultra-lightweight, latency-aware, and effective for on-device RAG pipelines.


【6】SmartFlow Reinforcement Learning and Agentic AI for Bike-Sharing Optimisation
标题:SmartFlow:面向共享单车优化的强化学习与智能体AI
链接:https://arxiv.org/abs/2601.00868

作者:Aditya Sreevatsa K,Arun Kumar Raveendran,Jesrael K Mani,Prakash G Shigli,Rajkumar Rangadore,Narayana Darapaneni,Anwesh Reddy Paduri
摘要:SmartFlow是一个多层框架,集成了强化学习和智能体AI(Agentic AI),以解决城市共享单车服务中的动态再平衡问题。其架构将战略、战术和通信功能分离,以实现清晰性和可扩展性。在战略层面,一个在纽约Citi Bike网络高保真模拟中训练的深度Q网络(DQN)代理,通过将该挑战建模为马尔可夫决策过程来学习稳健的再平衡策略。这些高层策略输入到一个确定性的战术模块中,该模块优化多段行程并安排准时调度,以最大限度地减少车队行驶。多个随机种子运行的评估证明了SmartFlow的高效性:将网络不平衡减少95%以上,同时只需极小的行驶距离并实现较高的卡车利用率。通信层由基于大型语言模型(LLM)的接地智能体AI提供支持,将后勤计划转化为面向运营人员的清晰、可执行的指令,确保可解释性和执行就绪。这种集成将机器智能与人工运营连接起来,提供了一种可扩展的解决方案,可减少空闲时间、提高单车可用性并降低运营成本。SmartFlow为复杂城市交通网络中可解释的AI驱动物流提供了蓝图。
摘要:SmartFlow is a multi-layered framework that integrates Reinforcement Learning and Agentic AI to address the dynamic rebalancing problem in urban bike-sharing services. Its architecture separates strategic, tactical, and communication functions for clarity and scalability. At the strategic level, a Deep Q-Network (DQN) agent, trained in a high-fidelity simulation of New York's Citi Bike network, learns robust rebalancing policies by modelling the challenge as a Markov Decision Process. These high-level strategies feed into a deterministic tactical module that optimises multi-leg journeys and schedules just-in-time dispatches to minimise fleet travel. Evaluation across multiple seeded runs demonstrates SmartFlow's high efficacy, reducing network imbalance by over 95% while requiring minimal travel distance and achieving strong truck utilisation. A communication layer, powered by a grounded Agentic AI with a Large Language Model (LLM), translates logistical plans into clear, actionable instructions for operational staff, ensuring interpretability and execution readiness. This integration bridges machine intelligence with human operations, offering a scalable solution that reduces idle time, improves bike availability, and lowers operational costs. SmartFlow provides a blueprint for interpretable, AI-driven logistics in complex urban mobility networks.


【7】Horizon Reduction as Information Loss in Offline Reinforcement Learning
标题:作为离线强化学习中信息损失的视界缩减
链接:https://arxiv.org/abs/2601.00831

作者:Uday Kumar Nidadala,Venkata Bhumika Guthi
备注:13 pages, 3 figures
摘要:视界缩减是离线强化学习(RL)中常见的设计策略,用于缓解长视界信用分配、提高稳定性,并通过截断展开、加窗训练或分层分解实现可扩展学习(Levine et al., 2020; Prudencio等人, 2023; Park等人, 2025)。尽管最近的经验证据表明视界缩减可以在具有挑战性的离线RL基准上改善扩展性,但其理论含义仍不完善(Park等人, 2025)。在本文中,我们表明视界缩减会在离线RL中导致根本性且不可恢复的信息损失。我们将视界缩减形式化为从固定长度的轨迹片段学习,并证明:在这一范式以及任何限于固定长度轨迹片段的学习接口下,即使有无限的数据和完美的函数逼近,最优策略在统计上也可能与次优策略无法区分。通过一组最小反例马尔可夫决策过程(MDP),我们确定了三种不同的结构性失效模式:(i)前缀不可区分导致的可识别性失效,(ii)截断回报引起的目标错误设定,(iii)离线数据集支撑与表征混叠。我们的结果建立了视界缩减可以安全使用的必要条件,并突出了无法仅靠算法改进克服的内在限制,与处理离线RL困难另一维度(保守目标和分布偏移)的算法工作形成互补(Fujimoto et al., 2019; Kumar等人, 2020; Gulcehre等人, 2020)。
摘要:Horizon reduction is a common design strategy in offline reinforcement learning (RL), used to mitigate long-horizon credit assignment, improve stability, and enable scalable learning through truncated rollouts, windowed training, or hierarchical decomposition (Levine et al., 2020; Prudencio et al., 2023; Park et al., 2025). Despite recent empirical evidence that horizon reduction can improve scaling on challenging offline RL benchmarks, its theoretical implications remain underdeveloped (Park et al., 2025). In this paper, we show that horizon reduction can induce fundamental and irrecoverable information loss in offline RL. We formalize horizon reduction as learning from fixed-length trajectory segments and prove that, under this paradigm and any learning interface restricted to fixed-length trajectory segments, optimal policies may be statistically indistinguishable from suboptimal ones even with infinite data and perfect function approximation. Through a set of minimal counterexample Markov decision processes (MDPs), we identify three distinct structural failure modes: (i) prefix indistinguishability leading to identifiability failure, (ii) objective misspecification induced by truncated returns, and (iii) offline dataset support and representation aliasing. Our results establish necessary conditions under which horizon reduction can be safe and highlight intrinsic limitations that cannot be overcome by algorithmic improvements alone, complementing algorithmic work on conservative objectives and distribution shift that addresses a different axis of offline RL difficulty (Fujimoto et al., 2019; Kumar et al., 2020; Gulcehre et al., 2020).
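摘要将视界缩减形式化为只能从固定长度轨迹片段学习。下面的玩具示意(非论文代码)演示其中“前缀不可识别性”的直观:决定性动作与延迟奖励永远不会同时落入同一个长度为 H 的片段,片段级学习器因此无法完成归因。

```python
# 极简玩具示意(非论文代码):把轨迹切成固定长度 H 的滑动片段。
# 下例中决定性动作在 t=0,区分好坏的奖励在 t=3;H=2 的片段无法把两者连起来。

def to_segments(traj, H):
    """traj: [(state, action, reward), ...];返回全部长度为 H 的滑动片段。"""
    return [tuple(traj[i:i + H]) for i in range(len(traj) - H + 1)]

mid = [("s1", "a", 0.0), ("s2", "a", 0.0)]                     # 与动作无关的中间状态
traj_opt = [("s0", "a_up", 0.0)] + mid + [("s3", "a", +1.0)]   # 最优:延迟奖励 +1
traj_sub = [("s0", "a_dn", 0.0)] + mid + [("s3", "a", -1.0)]   # 次优:延迟奖励 -1

H = 2
pool = set(to_segments(traj_opt, H)) | set(to_segments(traj_sub, H))
# 含决定性动作(a_up/a_dn)的片段奖励全为 0;含 ±1 奖励的片段里状态与动作完全相同。
# 只看片段的学习器无法把延迟奖励归因到早期动作,对应摘要中的可识别性失效。
for seg in sorted(pool):
    print(seg)
```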


【8】Reinforcement Learning for Option Hedging: Static Implied-Volatility Fit versus Shortfall-Aware Performance
标题:期权对冲的强化学习:静态隐含波动率拟合与短缺感知表现
链接:https://arxiv.org/abs/2601.01709

作者:Ziheng Chen,Minxuan Hu,Jiayu Yi,Wenxi Sun
摘要:我们通过引入风险厌恶和交易成本,扩展了Black-Scholes中的Q学习者(QLBS)框架,并提出了一种新的期权定价复制学习(RLOP)方法。这两种方法都与标准强化学习算法完全兼容,并可在市场摩擦下运行。使用SPY和XOP期权数据,我们沿静态和动态两个维度评估性能。自适应QLBS在隐含波动率空间实现了更高的静态定价精度,而RLOP通过降低短缺概率提供了更优的动态对冲表现。这些结果突出了超越静态拟合来评估期权定价模型的重要性,强调已实现的对冲结果。
摘要:We extend the Q-learner in Black-Scholes (QLBS) framework by incorporating risk aversion and trading costs, and propose a novel Replication Learning of Option Pricing (RLOP) approach. Both methods are fully compatible with standard reinforcement learning algorithms and operate under market frictions. Using SPY and XOP option data, we evaluate performance along static and dynamic dimensions. Adaptive-QLBS achieves higher static pricing accuracy in implied volatility space, while RLOP delivers superior dynamic hedging performance by reducing shortfall probability. These results highlight the importance of evaluating option pricing models beyond static fit, emphasizing realized hedging outcomes.


分层学习(1篇)

【1】Forget Less by Learning from Parents Through Hierarchical Relationships
标题:通过层次关系向父概念学习以减少遗忘
链接:https://arxiv.org/abs/2601.01892

作者:Arjun Ramesh Kaushik,Naresh Kumar Devulapally,Vishnu Suresh Lokhande,Nalini K. Ratha,Venu Govindaraju
备注:Accepted at AAAI-26
摘要:自定义扩散模型(CDM)在生成建模中提供了令人印象深刻的个性化能力,但在顺序学习新概念时仍容易发生灾难性遗忘。现有方法主要集中于最小化概念之间的干扰,往往忽视了概念间积极交互的潜力。在这项工作中,我们提出“向父概念学习以减少遗忘”(FLLP),一个在双曲空间中引入父子概念间学习机制以缓解遗忘的新框架。通过将概念表示嵌入天然适合建模树状层次结构的洛伦兹流形,我们定义了父子关系,使先前学到的概念为适应新概念提供指导。我们的方法不仅保留先验知识,还支持新概念的持续集成。我们在三个公共数据集和一个合成基准上验证了FLLP,在鲁棒性和泛化方面均取得一致改进。
摘要:Custom Diffusion Models (CDMs) offer impressive capabilities for personalization in generative modeling, yet they remain vulnerable to catastrophic forgetting when learning new concepts sequentially. Existing approaches primarily focus on minimizing interference between concepts, often neglecting the potential for positive inter-concept interactions. In this work, we present Forget Less by Learning from Parents (FLLP), a novel framework that introduces a parent-child inter-concept learning mechanism in hyperbolic space to mitigate forgetting. By embedding concept representations within a Lorentzian manifold, naturally suited to modeling tree-like hierarchies, we define parent-child relationships in which previously learned concepts serve as guidance for adapting to new ones. Our method not only preserves prior knowledge but also supports continual integration of new concepts. We validate FLLP on three public datasets and one synthetic benchmark, showing consistent improvements in both robustness and generalization.
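基于摘要对洛伦兹流形嵌入的描述,下面给出双曲面模型中内积与测地距离的极简Python示意;嵌入维度与具体数值均为假设,仅用于说明父子概念的层次几何。

```python
# 极简示意:洛伦兹(双曲面)模型上的内积、提升与测地距离,
# 即摘要中用于嵌入父子概念层次的几何;维度与点的取值均为假设。
import numpy as np

def lorentz_inner(u, v):
    """洛伦兹内积 <u,v>_L = -u0*v0 + sum_i ui*vi。"""
    return -u[0] * v[0] + np.dot(u[1:], v[1:])

def lorentz_dist(u, v):
    """双曲面模型上的测地距离 d(u,v) = arccosh(-<u,v>_L)。"""
    return np.arccosh(np.clip(-lorentz_inner(u, v), 1.0, None))

def lift(x):
    """把欧氏向量 x 提升到双曲面 {z : <z,z>_L = -1, z0 > 0}。"""
    return np.concatenate([[np.sqrt(1.0 + np.dot(x, x))], x])

parent = lift(np.array([0.1, 0.0]))   # 靠近原点:层次中的“父”概念
child  = lift(np.array([1.5, 0.8]))   # 远离原点:更具体的“子”概念
print(f"d(parent, child) = {lorentz_dist(parent, child):.3f}")
```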


医学相关(7篇)

【1】A Novel Deep Learning Method for Segmenting the Left Ventricle in Cardiac Cine MRI
标题:一种用于在心脏电影MRI中分割左心室的新型深度学习方法
链接:https://arxiv.org/abs/2601.01512

作者:Wenhui Chu,Aobo Jin,Hardik A. Gohel
备注:9 pages, 5 figures
摘要:这项研究旨在开发一种新型深度学习网络GBU-Net,它采用组-批归一化的U-Net框架,专门用于短轴电影MRI扫描中左心室的精确语义分割。该方法包括用于特征提取的下采样路径和用于细节恢复的上采样路径,并针对医学成像进行了增强。关键修改包括有助于更好理解上下文的技术,而上下文在心脏MRI分割中至关重要。数据集由45名患者的805次左心室MRI扫描组成,并使用Dice系数和平均垂直距离等既定指标进行对比分析。GBU-Net显著提高了电影MRI扫描中左心室分割的准确性。其创新设计在测试中优于现有方法,在Dice系数和平均垂直距离等标准指标上均有超越。该方法的独特之处在于能够捕获传统基于CNN的分割中常被遗漏的上下文信息。GBU-Net的集成模型在SunnyBrook测试数据集上取得了97%的Dice得分。GBU-Net为外科机器人和医疗分析提供了更高精度和上下文理解能力的左心室分割。
摘要:This research aims to develop a novel deep learning network, GBU-Net, utilizing a group-batch-normalized U-Net framework, specifically designed for the precise semantic segmentation of the left ventricle in short-axis cine MRI scans. The methodology includes a down-sampling pathway for feature extraction and an up-sampling pathway for detail restoration, enhanced for medical imaging. Key modifications include techniques for better contextual understanding crucial in cardiac MRI segmentation. The dataset consists of 805 left ventricular MRI scans from 45 patients, with comparative analysis using established metrics such as the dice coefficient and mean perpendicular distance. GBU-Net significantly improves the accuracy of left ventricle segmentation in cine MRI scans. Its innovative design outperforms existing methods in tests, surpassing standard metrics like the dice coefficient and mean perpendicular distance. The approach is unique in its ability to capture contextual information, often missed in traditional CNN-based segmentation. An ensemble of the GBU-Net attains a 97% dice score on the SunnyBrook testing dataset. GBU-Net offers enhanced precision and contextual understanding in left ventricle segmentation for surgical robotics and medical analysis.
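作为摘要中主要评估指标的一个可运行示例,下面给出 Dice 系数在二值分割掩码上的计算(示意代码,掩码为人为构造)。

```python
# 极简示意:评估左心室分割所用的 Dice 系数。
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """pred/target: 同形状的二值掩码;Dice = 2|A∩B| / (|A| + |B|)。"""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

pred   = np.zeros((8, 8), dtype=np.uint8); pred[2:6, 2:6] = 1
target = np.zeros((8, 8), dtype=np.uint8); target[3:7, 3:7] = 1
print(f"Dice = {dice_coefficient(pred, target):.3f}")
```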


【2】Unveiling the Heart-Brain Connection: An Analysis of ECG in Cognitive Performance
标题:揭示心脑联系:认知表现中的心电图分析
链接:https://arxiv.org/abs/2601.01424

作者:Akshay Sasi,Malavika Pradeep,Nusaibah Farrukh,Rahul Venugopal,Elizabeth Sherly
备注:6 pages, 6 figures. Code available at https://github.com/AkshaySasi/Unveiling-the-Heart-Brain-Connection-An-Analysis-of-ECG-in-Cognitive-Performance. Presented at AIHC (not published)
摘要:了解认知活动期间神经系统与心脏系统的相互作用对于推进生理计算至关重要。虽然EEG一直是评估脑力负荷的金标准,但其有限的便携性限制了其在现实世界中的使用。可通过可穿戴设备广泛获取的ECG提供了一种实用的替代方案。本研究探讨ECG信号是否可以可靠地反映认知负荷,并作为基于EEG的指标的代理。在这项工作中,我们展示了来自工作记忆和被动聆听两种不同任务范式的多模态数据。对于每种模态,我们分别提取了ECG时域HRV指标与Catch22描述符,以及对应的EEG频谱特征与Catch22特征。我们提出了一个跨模态XGBoost框架,将ECG特征投影到EEG所代表的认知空间上,从而允许仅使用ECG进行负荷推断。结果表明,基于ECG的投影能够有效捕捉认知状态的变化,并为准确分类提供了良好支持。我们的发现支持将ECG作为日常认知监测的可解释、实时、可穿戴的解决方案。
摘要:Understanding the interaction of neural and cardiac systems during cognitive activity is critical to advancing physiological computing. Although EEG has been the gold standard for assessing mental workload, its limited portability restricts its real-world use. Widely available ECG through wearable devices proposes a pragmatic alternative. This research investigates whether ECG signals can reliably reflect cognitive load and serve as proxies for EEG-based indicators. In this work, we present multimodal data acquired from two different paradigms involving working-memory and passive-listening tasks. For each modality, we extracted ECG time-domain HRV metrics and Catch22 descriptors against EEG spectral and Catch22 features, respectively. We propose a cross-modal XGBoost framework to project the ECG features onto EEG-representative cognitive spaces, thereby allowing workload inferences using only ECG. Our results show that ECG-derived projections expressively capture variation in cognitive states and provide good support for accurate classification. Our findings underpin ECG as an interpretable, real-time, wearable solution for everyday cognitive monitoring.
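下面给出跨模态回归思路的极简示意:以 ECG 特征为输入、以 EEG 表征的负荷指标为目标训练梯度提升模型。为减少依赖,这里用 scikit-learn 的 GradientBoostingRegressor 代替论文中的 XGBoost;特征维度与所有数据均为随机合成的假设。

```python
# 极简示意:跨模态回归——用 ECG 特征预测 EEG 表征的认知负荷指标。
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_ecg = rng.normal(size=(200, 22))   # HRV 时域指标 + Catch22 描述符(假设共 22 维)
# 假想的 EEG 频谱负荷指标:与前几维 ECG 特征相关,外加噪声
y_eeg = X_ecg[:, :3].sum(axis=1) + rng.normal(scale=0.1, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X_ecg, y_eeg, random_state=0)
model = GradientBoostingRegressor().fit(X_tr, y_tr)
print(f"R^2 on held-out ECG-only data: {model.score(X_te, y_te):.3f}")
```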


【3】Community-Based Early-Stage Chronic Kidney Disease Screening using Explainable Machine Learning for Low-Resource Settings
标题:使用可解释机器学习在低资源环境中进行基于社区的早期慢性肾病筛查
链接:https://arxiv.org/abs/2601.01119

作者:Muhammad Ashad Kabir,Sirajam Munira,Dewan Tasnia Azad,Saleh Mohammed Ikram,Mohammad Habibur Rahman Sarker,Syed Manzoor Ahmed Hanifi
备注:27 pages
摘要:早期发现慢性肾脏病(CKD)对于防止进展为终末期肾病至关重要。然而,现有的筛查工具(主要基于高收入国家人群开发)在风险特征不同的孟加拉国和南亚往往表现不佳。这些工具大多依赖简单的加性评分函数,且基于晚期CKD患者的数据。因此,它们无法捕捉风险因素之间的复杂交互,在预测早期CKD方面能力有限。我们的目标是开发并评估一个可解释的机器学习(ML)框架,用于低资源环境下基于社区的早期CKD筛查,并针对孟加拉国和南亚人群背景进行定制。我们使用了来自孟加拉国的社区数据集,这是南亚地区首个此类CKD数据集,并在多个特征域上评估了12个ML分类器。我们应用了十种互补的特征选择技术,以确定稳健且可推广的预测因子。最终模型使用10折交叉验证进行评估,并在来自印度、阿联酋和孟加拉国的三个独立数据集上进行了外部验证。SHAP(SHapley加法解释)用于提供模型可解释性。在RFECV所选特征子集上训练的ML模型实现了90.40%的平衡准确率,而仅用极少的非病理检验特征也表现出出色的预测能力(平衡准确率89.23%),往往优于更大或完整的特征集。与现有筛查工具相比,所提模型在需要更少、更易获得的输入的同时,实现了显著更高的准确率和灵敏度。外部验证确认了78%至98%的灵敏度,泛化能力强。SHAP解释识别出与已确立的CKD风险因素一致且具有临床意义的预测因子。
摘要:Early detection of chronic kidney disease (CKD) is essential for preventing progression to end-stage renal disease. However, existing screening tools - primarily developed using populations from high-income countries - often underperform in Bangladesh and South Asia, where risk profiles differ. Most of these tools rely on simple additive scoring functions and are based on data from patients with advanced-stage CKD. Consequently, they fail to capture complex interactions among risk factors and are limited in predicting early-stage CKD. Our objective was to develop and evaluate an explainable machine learning (ML) framework for community-based early-stage CKD screening for low-resource settings, tailored to the Bangladeshi and South Asian population context. We used a community-based dataset from Bangladesh, the first such CKD dataset in South and South Asia, and evaluated twelve ML classifiers across multiple feature domains. Ten complementary feature selection techniques were applied to identify robust, generalizable predictors. The final models were assessed using 10-fold cross-validation. External validation was conducted on three independent datasets from India, the UAE, and Bangladesh. SHAP (SHapley Additive exPlanations) was used to provide model explainability. An ML model trained on an RFECV-selected feature subset achieved a balanced accuracy of 90.40%, whereas minimal non-pathology-test features demonstrated excellent predictive capability with a balanced accuracy of 89.23%, often outperforming larger or full feature sets. Compared with existing screening tools, the proposed models achieved substantially higher accuracy and sensitivity while requiring fewer and more accessible inputs. External validation confirmed strong generalizability with 78% to 98% sensitivity. SHAP interpretation identified clinically meaningful predictors consistent with established CKD risk factors.
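摘要提到在 RFECV 所选特征子集上训练并以 10 折交叉验证评估平衡准确率。下面是该流程的一个极简 scikit-learn 示意;数据为合成,分类器选用逻辑回归,仅作占位(论文评估了 12 种分类器)。

```python
# 极简示意:RFECV(带交叉验证的递归特征消除)+ 10 折平衡准确率评估。
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=300, n_features=30, n_informative=6,
                           random_state=0)
selector = RFECV(LogisticRegression(max_iter=2000),
                 cv=StratifiedKFold(5), scoring="balanced_accuracy")
selector.fit(X, y)

scores = cross_val_score(LogisticRegression(max_iter=2000),
                         selector.transform(X), y,
                         cv=StratifiedKFold(10), scoring="balanced_accuracy")
print(f"selected {selector.n_features_} features, "
      f"10-fold balanced accuracy = {scores.mean():.3f}")
```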


【4】Practical Geometric and Quantum Kernel Methods for Predicting Skeletal Muscle Outcomes in chronic obstructive pulmonary disease
标题:预测慢性阻塞性肺病骨骼肌结局的实用几何和量子核方法
链接:https://arxiv.org/abs/2601.00921

作者:Azadeh Alavi,Hamidreza Khalili,Stanley H. Chan,Fatemeh Kouchmeshki,Ross Vlahos
备注:24 pages, 4 figures
摘要:骨骼肌功能障碍是慢性阻塞性肺疾病(COPD)临床相关的肺外表现,与全身和气道炎症密切相关。这促使人们利用可纵向获取的微创生物标志物对肌肉结局进行预测建模。我们研究了一个小样本临床前数据集,包括两种条件下(假手术与香烟烟雾暴露)的213只动物,含血液和支气管肺泡灌洗液测量以及三个连续目标:胫骨前肌重量(毫克:mg)、比力(毫牛:mN)和衍生的肌肉质量指数(mN每mg)。我们对调参后的经典基线、带Stein散度的几何感知对称正定(SPD)描述符,以及为低维表格数据设计的量子核模型进行了基准测试。在肌肉重量设定中,使用四个可解释输入(血液C反应蛋白、中性粒细胞计数、支气管肺泡灌洗液细胞数和条件)的量子核岭回归获得4.41 mg的测试均方根误差和0.605的决定系数,优于同一特征集上匹配的岭回归基线(4.70 mg和0.553)。几何信息的Stein散度原型距离在仅用生物标志物的设定中产生较小但一致的增益(4.55 mg对4.79 mg)。通过在0.8倍训练集假手术组均值处对连续结果进行阈值化得到的筛查式评估,在检测低肌肉重量方面实现了高达0.90的受试者工作特征曲线下面积(ROC-AUC)。这些结果表明,几何与量子核提升可以在低数据、低特征的生物医学预测问题中带来可衡量的收益,同时保持可解释性和透明的模型选择。
摘要:Skeletal muscle dysfunction is a clinically relevant extra-pulmonary manifestation of chronic obstructive pulmonary disease (COPD) and is closely linked to systemic and airway inflammation. This motivates predictive modelling of muscle outcomes from minimally invasive biomarkers that can be acquired longitudinally. We study a small-sample preclinical dataset comprising 213 animals across two conditions (Sham versus cigarette-smoke exposure), with blood and bronchoalveolar lavage fluid measurements and three continuous targets: tibialis anterior muscle weight (milligram: mg), specific force (millinewton: mN), and a derived muscle quality index (mN per mg). We benchmark tuned classical baselines, geometry-aware symmetric positive definite (SPD) descriptors with Stein divergence, and quantum kernel models designed for low-dimensional tabular data. In the muscle-weight setting, quantum kernel ridge regression using four interpretable inputs (blood C-reactive protein, neutrophil count, bronchoalveolar lavage cellularity, and condition) attains a test root mean squared error of 4.41 mg and coefficient of determination of 0.605, improving over a matched ridge baseline on the same feature set (4.70 mg and 0.553). Geometry-informed Stein-divergence prototype distances yield a smaller but consistent gain in the biomarker-only setting (4.55 mg versus 4.79 mg). Screening-style evaluation, obtained by thresholding the continuous outcome at 0.8 times the training Sham mean, achieves an area under the receiver operating characteristic curve (ROC-AUC) of up to 0.90 for detecting low muscle weight. These results indicate that geometric and quantum kernel lifts can provide measurable benefits in low-data, low-feature biomedical prediction problems, while preserving interpretability and transparent model selection.
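量子核与 Stein 散度核最终都以 Gram 矩阵的形式进入核岭回归。下面的示意用 RBF 核充当“预计算核”的占位来展示这一通用接口;数据、核函数与维度均为假设。

```python
# 极简示意:对预先计算的核矩阵做核岭回归,量子核 / Stein 散度核均可走此接口。
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))     # 4 个可解释输入(如 CRP、嗜中性粒细胞计数等)
y = X @ np.array([2.0, -1.0, 0.5, 1.0]) + rng.normal(scale=0.3, size=120)

X_tr, X_te, y_tr, y_te = X[:90], X[90:], y[:90], y[90:]
K_tr = rbf_kernel(X_tr, X_tr)     # 实际中替换为量子核 / Stein 散度核的 Gram 矩阵
K_te = rbf_kernel(X_te, X_tr)     # 测试点与训练点之间的核

model = KernelRidge(alpha=1.0, kernel="precomputed").fit(K_tr, y_tr)
print(f"R^2 = {model.score(K_te, y_te):.3f}")
```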


【5】Conformal Prediction Under Distribution Shift: A COVID-19 Natural Experiment
标题:分布转移下的保形预测:COVID-19自然实验
链接:https://arxiv.org/abs/2601.00908

作者:Chorok Lee
摘要:保形预测的覆盖保证在分布偏移下会退化。我们以COVID-19作为自然实验,在8个供应链任务上研究这一问题。尽管各任务的特征更替同样严重(Jaccard约为0),覆盖率下降却从0%到86.7%不等,跨越两个数量级。使用SHapley加法解释(SHAP)分析,我们发现灾难性失效与单特征依赖相关(rho = 0.714,p = 0.047)。灾难性任务将重要性集中于单个特征(增加4.5倍),而稳健任务将重要性重新分布到许多特征上(10-20倍)。每季度重新训练可将灾难性任务的覆盖率从22%恢复到41%(+19 pp,p = 0.04),但对稳健任务没有收益(覆盖率已达99.8%)。对另外4个特征稳定性中等(Jaccard 0.13-0.86)任务的探索性分析显示,决定稳健性的是特征稳定性而非集中度,表明集中度效应特别适用于严重偏移。我们给出一个决策框架:在部署前监测SHAP集中度;若脆弱(集中度>40%)则每季度重新训练;若稳健则跳过重新训练。
摘要:Conformal prediction guarantees degrade under distribution shift. We study this using COVID-19 as a natural experiment across 8 supply chain tasks. Despite identical severe feature turnover (Jaccard approximately 0), coverage drops vary from 0% to 86.7%, spanning two orders of magnitude. Using SHapley Additive exPlanations (SHAP) analysis, we find catastrophic failures correlate with single-feature dependence (rho = 0.714, p = 0.047). Catastrophic tasks concentrate importance in one feature (4.5x increase), while robust tasks redistribute across many (10-20x). Quarterly retraining restores catastrophic task coverage from 22% to 41% (+19 pp, p = 0.04), but provides no benefit for robust tasks (99.8% coverage). Exploratory analysis of 4 additional tasks with moderate feature stability (Jaccard 0.13-0.86) reveals feature stability, not concentration, determines robustness, suggesting concentration effects apply specifically to severe shifts. We provide a decision framework: monitor SHAP concentration before deployment; retrain quarterly if vulnerable (>40% concentration); skip retraining if robust.
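依据摘要给出的决策框架,下面给出“单特征 SHAP 集中度”监测指标的一个极简实现;40% 阈值来自摘要,SHAP 值矩阵此处用随机数代替,仅作示意。

```python
# 极简示意:部署前监测指标——最大单特征占总平均 |SHAP| 的比例。
import numpy as np

def shap_concentration(shap_values):
    """shap_values: (n_samples, n_features) 的 SHAP 值矩阵;返回最大单特征占比。"""
    importance = np.abs(shap_values).mean(axis=0)
    return importance.max() / importance.sum()

rng = np.random.default_rng(0)
# 构造重要性集中在特征 0 的假想 SHAP 矩阵
shap_vals = rng.normal(size=(500, 12)) * np.array([5.0] + [0.5] * 11)

c = shap_concentration(shap_vals)
print(f"concentration = {c:.1%} -> "
      f"{'每季度重新训练' if c > 0.40 else '可跳过重新训练'}")
```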


【6】LearnAD: Learning Interpretable Rules for Brain Networks in Alzheimer's Disease Classification
标题:LearnAD:学习阿尔茨海默病分类中大脑网络的可解释规则
链接:https://arxiv.org/abs/2601.00877

作者:Thomas Andrews,Mark Law,Sara Ahmadi-Abhari,Alessandra Russo
备注:NeurIPS 2025, Data on the Brain & Mind Workshop
摘要:我们介绍LearnAD,一种从脑磁共振成像数据预测阿尔茨海默病的神经符号方法,可学习完全可解释的规则。LearnAD应用统计模型、决策树、随机森林或GNN来识别相关的大脑连接,然后使用FastLAS学习全局规则。我们的最佳实例优于决策树,与支持向量机的准确率持平,且仅略低于在全部特征上训练的随机森林和GNN,同时保持完全可解释。消融研究表明,我们的神经符号方法在性能与纯统计模型相当的同时提高了可解释性。LearnAD展示了符号学习如何加深我们对临床神经科学中GNN行为的理解。
摘要 :We introduce LearnAD, a neuro-symbolic method for predicting Alzheimer's disease from brain magnetic resonance imaging data, learning fully interpretable rules. LearnAD applies statistical models, Decision Trees, Random Forests, or GNNs to identify relevant brain connections, and then employs FastLAS to learn global rules. Our best instance outperforms Decision Trees, matches Support Vector Machine accuracy, and performs only slightly below Random Forests and GNNs trained on all features, all while remaining fully interpretable. Ablation studies show that our neuro-symbolic approach improves interpretability with comparable performance to pure statistical models. LearnAD demonstrates how symbolic learning can deepen our understanding of GNN behaviour in clinical neuroscience.


【7】Predicting Early and Complete Drug Release from Long-Acting Injectables Using Explainable Machine Learning
标题:使用可解释机器学习预测长效注射剂的早期和完全药物释放
链接:https://arxiv.org/abs/2601.02265

作者:Karla N. Robles,Manar D. Samad
摘要:基于聚合物的长效注射剂(LAI)通过实现受控给药改变了慢性疾病的治疗,从而降低给药频率并延长治疗持续时间。实现药物从LAI中的受控释放需要对复杂的底层物理化学性质进行大量优化。机器学习(ML)可以通过建模LAI性质与药物释放之间的复杂关系来加速LAI研发。然而,由于缺乏针对LAI数据定制的建模与分析,最近的ML研究对调控药物释放的关键性质提供的信息有限。本文提出了一种新颖的数据变换和可解释ML方法,通过预测24、48和72小时的早期药物释放、释放曲线类型的分类以及完整释放曲线的预测,从321个LAI制剂中提炼可操作的信息。这三个实验研究了LAI材料特性在早期和完整药物释放曲线中的贡献与调控作用。真实与预测的72小时药物释放之间观察到强相关(>0.65),而在释放曲线类型分类上取得0.87的F1分数。一个与时间无关的ML框架在预测延迟双相和三相曲线上优于当前依赖时间的方法。Shapley加法解释揭示了材料特性在早期释放和完整释放过程中的相对影响,填补了以往体外和基于ML研究中的若干空白。这一新方法和发现可为科学家优化LAI的药物释放动力学提供定量策略和建议。模型实现的源代码已公开。
摘要:Polymer-based long-acting injectables (LAIs) have transformed the treatment of chronic diseases by enabling controlled drug delivery, thus reducing dosing frequency and extending therapeutic duration. Achieving controlled drug release from LAIs requires extensive optimization of the complex underlying physicochemical properties. Machine learning (ML) can accelerate LAI development by modeling the complex relationships between LAI properties and drug release. However, recent ML studies have provided limited information on key properties that modulate drug release, due to the lack of custom modeling and analysis tailored to LAI data. This paper presents a novel data transformation and explainable ML approach to synthesize actionable information from 321 LAI formulations by predicting early drug release at 24, 48, and 72 hours, classification of release profile types, and prediction of complete release profiles. These three experiments investigate the contribution and control of LAI material characteristics in early and complete drug release profiles. A strong correlation (>0.65) is observed between the true and predicted drug release in 72 hours, while a 0.87 F1-score is obtained in classifying release profile types. A time-independent ML framework predicts delayed biphasic and triphasic curves with better performance than current time-dependent approaches. Shapley additive explanations reveal the relative influence of material characteristics during early and for complete release which fill several gaps in previous in-vitro and ML-based studies. The novel approach and findings can provide a quantitative strategy and recommendations for scientists to optimize the drug-release dynamics of LAI. The source code for the model implementation is publicly available.


蒸馏|知识提取(3篇)

【1】Aspect Extraction from E-Commerce Product and Service Reviews
标题:电子商务产品和服务评论的方面提取
链接:https://arxiv.org/abs/2601.01827

作者:Valiant Lance D. Dionela,Fatima Kriselle S. Dy,Robin James M. Hombrebueno,Aaron Rae M. Nicolas,Charibeth K. Cheng,Raphael W. Gonda
摘要:方面抽取(Aspect Extraction, AE)是基于方面的情感分析(ABSA)中的一项关键任务,但在低资源和语码转换的语境中仍难以应用,例如Taglish,即菲律宾电商评论中常用的他加禄语与英语的混合。本文介绍了一个为Taglish设计的完整AE管道,结合基于规则、基于大语言模型(LLM)和微调的技术,同时解决方面识别与抽取。我们通过多方法主题建模构建了层次方面框架(HAF),并为显式和隐式方面设计了双模式标注方案。在方面识别上评估了四个不同的模型:基于规则的系统、生成式LLM(Gemini 2.0 Flash),以及两个在不同数据集(规则标注与LLM标注)上训练的微调Gemma-3 1B模型。结果表明,生成式LLM在所有任务中取得最高性能(宏F1 0.91),在处理隐式方面上能力突出。相比之下,微调模型由于数据集不平衡和架构容量限制,表现有限。这项工作为在多样化的语码转换环境中增强ABSA贡献了一个可扩展且语言自适应的框架。
摘要:Aspect Extraction (AE) is a key task in Aspect-Based Sentiment Analysis (ABSA), yet it remains difficult to apply in low-resource and code-switched contexts like Taglish, a mix of Tagalog and English commonly used in Filipino e-commerce reviews. This paper introduces a comprehensive AE pipeline designed for Taglish, combining rule-based, large language model (LLM)-based, and fine-tuning techniques to address both aspect identification and extraction. A Hierarchical Aspect Framework (HAF) is developed through multi-method topic modeling, along with a dual-mode tagging scheme for explicit and implicit aspects. For aspect identification, four distinct models are evaluated: a Rule-Based system, a Generative LLM (Gemini 2.0 Flash), and two Fine-Tuned Gemma-3 1B models trained on different datasets (Rule-Based vs. LLM-Annotated). Results indicate that the Generative LLM achieved the highest performance across all tasks (Macro F1 0.91), demonstrating superior capability in handling implicit aspects. In contrast, the fine-tuned models exhibited limited performance due to dataset imbalance and architectural capacity constraints. This work contributes a scalable and linguistically adaptive framework for enhancing ABSA in diverse, code-switched environments.


【2】DiMEx: Breaking the Cold Start Barrier in Data-Free Model Extraction via Latent Diffusion Priors
标题:DiMEx:通过潜在扩散先验打破无数据模型提取中的冷启动障碍
链接:https://arxiv.org/abs/2601.01688

作者:Yash Thesia,Meera Suthar
备注:8 pages, 3 figures, 4 tables
摘要:模型窃取攻击对机器学习即服务(MLaaS)构成了生存性威胁,使对手能够以训练成本的一小部分复制专有模型。虽然无数据模型提取(DFME)已成为一种隐蔽的攻击途径,但它仍然受到“冷启动”问题的根本限制:基于GAN的对手要浪费数千次查询才能从随机噪声收敛到有意义的数据。我们提出了DiMEx,一个将预训练潜在扩散模型丰富的语义先验武器化、以完全绕过这一初始化障碍的框架。通过在生成器的潜在空间内采用随机嵌入贝叶斯优化(REMBO),DiMEx可以立即合成高保真查询,仅用2,000次查询就在SVHN上达到52.1%的一致率,比最先进的GAN基线高出16%以上。为了应对这种高度语义化的威胁,我们引入了混合有状态集成(HSE)防御,它可以识别潜空间攻击独特的“优化轨迹”。我们的结果表明,虽然DiMEx能躲过静态分布检测器,但HSE利用这种时间特征将攻击成功率抑制到21.6%,且延迟可忽略不计。
摘要:Model stealing attacks pose an existential threat to Machine Learning as a Service (MLaaS), allowing adversaries to replicate proprietary models for a fraction of their training cost. While Data-Free Model Extraction (DFME) has emerged as a stealthy vector, it remains fundamentally constrained by the "Cold Start" problem: GAN-based adversaries waste thousands of queries converging from random noise to meaningful data. We propose DiMEx, a framework that weaponizes the rich semantic priors of pre-trained Latent Diffusion Models to bypass this initialization barrier entirely. By employing Random Embedding Bayesian Optimization (REMBO) within the generator's latent space, DiMEx synthesizes high-fidelity queries immediately, achieving 52.1 percent agreement on SVHN with just 2,000 queries - outperforming state-of-the-art GAN baselines by over 16 percent. To counter this highly semantic threat, we introduce the Hybrid Stateful Ensemble (HSE) defense, which identifies the unique "optimization trajectory" of latent-space attacks. Our results demonstrate that while DiMEx evades static distribution detectors, HSE exploits this temporal signature to suppress attack success rates to 21.6 percent with negligible latency.


【3】SGD-Based Knowledge Distillation with Bayesian Teachers: Theory and Guidelines
标题:基于SGD、带贝叶斯教师的知识蒸馏:理论与指南
链接:https://arxiv.org/abs/2601.01484

作者:Itai Morad,Nir Shlezinger,Yonina C. Eldar
摘要:知识蒸馏(KD)是将知识从大型教师网络转移到通常较小的学生模型的核心范式,通常通过利用软概率输出实现。虽然KD在许多应用中表现出强大的经验成功,但其理论基础仍只得到部分理解。在这项工作中,我们采用贝叶斯视角对KD进行严格分析,研究以随机梯度下降(SGD)训练的学生的收敛行为。我们研究两种情形:$(i)$教师提供精确的贝叶斯类概率(BCP);$(ii)$用BCP的带噪近似进行监督。我们的分析表明,与one-hot监督相比,从BCP学习可降低方差,并消除收敛界中的邻域项。我们进一步刻画了噪声水平如何影响泛化和准确率。受这些见解启发,我们提倡在KD中使用通常能提供更好BCP估计的贝叶斯深度学习模型作为教师。与我们的分析一致,实验表明,与从确定性教师蒸馏的学生相比,从贝叶斯教师蒸馏的学生不仅取得了更高的准确率(最高+4.27%),而且收敛更稳定(噪声最多减少30%)。
摘要:Knowledge Distillation (KD) is a central paradigm for transferring knowledge from a large teacher network to a typically smaller student model, often by leveraging soft probabilistic outputs. While KD has shown strong empirical success in numerous applications, its theoretical underpinnings remain only partially understood. In this work, we adopt a Bayesian perspective on KD to rigorously analyze the convergence behavior of students trained with Stochastic Gradient Descent (SGD). We study two regimes: $(i)$ when the teacher provides the exact Bayes Class Probabilities (BCPs); and $(ii)$ supervision with noisy approximations of the BCPs. Our analysis shows that learning from BCPs yields variance reduction and removes neighborhood terms in the convergence bounds compared to one-hot supervision. We further characterize how the level of noise affects generalization and accuracy. Motivated by these insights, we advocate the use of Bayesian deep learning models, which typically provide improved estimates of the BCPs, as teachers in KD. Consistent with our analysis, we experimentally demonstrate that students distilled from Bayesian teachers not only achieve higher accuracies (up to +4.27%), but also exhibit more stable convergence (up to 30% less noise), compared to students distilled from deterministic teachers.
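下面是以(贝叶斯)教师软概率监督学生的 KD 损失的极简 PyTorch 示意;其中用多次随机前向取平均来近似教师对 BCP 的估计,温度 T、采样次数与张量形状均为假设。

```python
# 极简示意:软目标 KD 损失 KL(teacher || student);教师给出对 BCP 的(带噪)估计。
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_probs, T=2.0):
    """温度 T 下的蒸馏损失;乘 T^2 以保持梯度尺度(常见做法)。"""
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, teacher_probs, reduction="batchmean") * T * T

# 贝叶斯教师:对多次随机前向(如 MC dropout)的 softmax 取平均,得到更好的 BCP 估计
student_logits = torch.randn(8, 10)
mc_samples = torch.randn(16, 8, 10)   # 假想的 16 次教师随机前向的 logits
teacher_probs = F.softmax(mc_samples / 2.0, dim=-1).mean(dim=0)
print(kd_loss(student_logits, teacher_probs))
```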


推荐(1篇)

【1】RelayGR: Scaling Long-Sequence Generative Recommendation via Cross-Stage Relay-Race Inference
标题:RelayGR:通过跨阶段接力式推理扩展长序列生成式推荐
链接:https://arxiv.org/abs/2601.01712

作者:Jiarui Wang,Huichao Chai,Yuanhang Zhang,Zongjin Zhou,Wei Guo,Xingkun Yang,Qiang Tang,Bo Pan,Jiawei Zhu,Ke Cheng,Yuting Yan,Shulan Wang,Yingjie Zhu,Zhengfan Yuan,Jiaqi Huang,Yuhan Zhang,Xiaosong Sun,Zhinan Zhang,Hong Zhu,Yongsheng Zhang,Tiantian Dong,Zhong Xiao,Deliang Liu,Chengzhou Lu,Yuan Sun,Zhiyuan Chen,Xinming Han,Zaizhu Liu,Yaoyuan Wang,Ziyang Zhang,Yong Liu,Jinxin Xu,Yajing Sun,Zhoujun Yu,Wenting Zhou,Qidong Zhang,Zhengyong Zhang,Zhonghai Gu,Yibo Jin,Yongxiang Feng,Pengfei Zuo
摘要:实时推荐系统在严格的尾延迟SLO下执行多级级联(召回、预处理、精排),留给排序的时间只有几十毫秒。生成式推荐(GR)模型可以通过消费长用户行为序列来提升质量,但在生产环境中,其在线序列长度被排序阶段的P99预算严格限制。我们观察到,大多数GR令牌编码的用户行为与候选物品无关,这意味着可以一次性预推理用户行为前缀并在排序时复用,而不是在关键路径上重新计算。在工业规模上实现这一想法并不容易:前缀缓存必须在确定最终排序实例之前跨多个流水线阶段存活,用户规模意味着缓存占用远超单个设备,而不加选择的预推理会在高QPS下压垮共享资源。我们提出RelayGR,一个为GR实现HBM内接力式推理的生产系统。RelayGR有选择地预推理长期用户前缀,使其KV缓存在请求生命周期内常驻HBM,并确保后续排序无需远程读取即可消费它们。RelayGR结合了三种技术:1)序列感知触发器,在有界的缓存占用和预推理负载下仅放行有风险的请求;2)亲和感知路由器,通过将辅助预推理信号和排序请求路由到同一实例,使缓存的生产与消费同地;3)内存感知扩展器,利用服务器本地DRAM捕获短期跨请求复用,同时避免冗余重载。我们在华为昇腾NPU上实现了RelayGR并用真实查询进行评估。在固定的P99 SLO下,RelayGR支持最长$1.5\times$的序列,并将符合SLO的吞吐量提升最多$3.6\times$。
摘要:Real-time recommender systems execute multi-stage cascades (retrieval, pre-processing, fine-grained ranking) under strict tail-latency SLOs, leaving only tens of milliseconds for ranking. Generative recommendation (GR) models can improve quality by consuming long user-behavior sequences, but in production their online sequence length is tightly capped by the ranking-stage P99 budget. We observe that the majority of GR tokens encode user behaviors that are independent of the item candidates, suggesting an opportunity to pre-infer a user-behavior prefix once and reuse it during ranking rather than recomputing it on the critical path. Realizing this idea at industrial scale is non-trivial: the prefix cache must survive across multiple pipeline stages before the final ranking instance is determined, the user population implies cache footprints far beyond a single device, and indiscriminate pre-inference would overload shared resources under high QPS. We present RelayGR, a production system that enables in-HBM relay-race inference for GR. RelayGR selectively pre-infers long-term user prefixes, keeps their KV caches resident in HBM over the request lifecycle, and ensures the subsequent ranking can consume them without remote fetches. RelayGR combines three techniques: 1) a sequence-aware trigger that admits only at-risk requests under a bounded cache footprint and pre-inference load, 2) an affinity-aware router that co-locates cache production and consumption by routing both the auxiliary pre-infer signal and the ranking request to the same instance, and 3) a memory-aware expander that uses server-local DRAM to capture short-term cross-request reuse while avoiding redundant reloads. We implement RelayGR on Huawei Ascend NPUs and evaluate it with real queries. Under a fixed P99 SLO, RelayGR supports up to 1.5$\times$ longer sequences and improves SLO-compliant throughput by up to 3.6$\times$.


聚类(3篇)

【1】Wittgenstein's Family Resemblance Clustering Algorithm
标题:维特根斯坦家族相似性聚类算法
链接:https://arxiv.org/abs/2601.01127

作者:Golbahar Amanpour,Benyamin Ghojogh
摘要:This paper, introducing a novel method in philomatics, draws on Wittgenstein's concept of family resemblance from analytic philosophy to develop a clustering algorithm for machine learning. According to Wittgenstein's Philosophical Investigations (1953), family resemblance holds that members of a concept or category are connected by overlapping similarities rather than a single defining property. Consequently, a family of entities forms a chain of items sharing overlapping traits. This philosophical idea naturally lends itself to a graph-based approach in machine learning. Accordingly, we propose the Wittgenstein's Family Resemblance (WFR) clustering algorithm and its kernel variant, kernel WFR. This algorithm computes resemblance scores between neighboring data instances, and after thresholding these scores, a resemblance graph is constructed. The connected components of this graph define the resulting clusters. Simulations on benchmark datasets demonstrate that WFR is an effective nonlinear clustering algorithm that does not require prior knowledge of the number of clusters or assumptions about their shapes.
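依据摘要的描述(近邻相似度打分、阈值化建图、连通分量即聚类),下面给出 WFR 主干思想的一个极简实现;相似度函数与阈值均为示意性假设,并非论文的确切定义。

```python
# 极简示意:相似度打分 -> 阈值化建“相似图” -> 连通分量作为聚类(无需预设簇数)。
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from sklearn.metrics.pairwise import euclidean_distances

def wfr_like_clustering(X, threshold):
    d = euclidean_distances(X)
    resemblance = np.exp(-d)                  # “家族相似性”得分(假设形式)
    np.fill_diagonal(resemblance, 0.0)
    adjacency = csr_matrix(resemblance > threshold)
    n_clusters, labels = connected_components(adjacency, directed=False)
    return n_clusters, labels

X = np.vstack([np.random.default_rng(0).normal(0, 0.2, (20, 2)),
               np.random.default_rng(1).normal(3, 0.2, (20, 2))])
k, labels = wfr_like_clustering(X, threshold=np.exp(-1.0))
print(f"found {k} clusters")   # 两团点自然连成两条“相似链”
```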


【2】Deep Clustering with Associative Memories
标题:基于联想记忆的深度聚类
链接:https://arxiv.org/abs/2601.00963

作者:Bishwajit Saha,Dmitry Krotov,Mohammed J. Zaki,Parikshit Ram
摘要:Deep clustering - joint representation learning and latent space clustering - is a well studied problem especially in computer vision and text processing under the deep learning framework. While the representation learning is generally differentiable, clustering is an inherently discrete optimization task, requiring various approximations and regularizations to fit in a standard differentiable pipeline. This leads to a somewhat disjointed representation learning and clustering. In this work, we propose a novel loss function utilizing energy-based dynamics via Associative Memories to formulate a new deep clustering method, DCAM, which ties together the representation learning and clustering aspects more intricately in a single objective. Our experiments showcase the advantage of DCAM, producing improved clustering quality for various architecture choices (convolutional, residual or fully-connected) and data modalities (images or text).


【3】Hierarchical topological clustering
标题:层次拓扑聚类
链接:https://arxiv.org/abs/2601.00892

作者:Ana Carpio,Gema Duro
备注:not peer reviewed, reviewed version to appear in Soft Computing
摘要:Topological methods have the potential of exploring data clouds without making assumptions on their the structure. Here we propose a hierarchical topological clustering algorithm that can be implemented with any distance choice. The persistence of outliers and clusters of arbitrary shape is inferred from the resulting hierarchy. We demonstrate the potential of the algorithm on selected datasets in which outliers play relevant roles, consisting of images, medical and economic data. These methods can provide meaningful clusters in situations in which other techniques fail to do so.


点云|SLAM|雷达|激光|深度RGBD相关(1篇)

【1】Car Drag Coefficient Prediction from 3D Point Clouds Using a Slice-Based Surrogate Model
标题:使用基于切片的代理模型从3D点云预测汽车阻力系数
链接:https://arxiv.org/abs/2601.02112

作者:Utkarsh Singh,Absaar Ali,Adarsh Roy
备注:14 pages, 5 figures. Published in: Bramer M., Stahl F. (eds) Artificial Intelligence XLII. SGAI 2025. Lecture Notes in Computer Science, vol 16302. Springer, Cham
摘要:The automotive industry's pursuit of enhanced fuel economy and performance necessitates efficient aerodynamic design. However, traditional evaluation methods such as computational fluid dynamics (CFD) and wind tunnel testing are resource intensive, hindering rapid iteration in the early design stages. Machine learning-based surrogate models offer a promising alternative, yet many existing approaches suffer from high computational complexity, limited interpretability, or insufficient accuracy for detailed geometric inputs. This paper introduces a novel lightweight surrogate model for the prediction of the aerodynamic drag coefficient (Cd) based on a sequential slice-wise processing of the geometry of the 3D vehicle. Inspired by medical imaging, 3D point clouds of vehicles are decomposed into an ordered sequence of 2D cross-sectional slices along the stream-wise axis. Each slice is encoded by a lightweight PointNet2D module, and the sequence of slice embeddings is processed by a bidirectional LSTM to capture longitudinal geometric evolution. The model, trained and evaluated on the DrivAerNet++ dataset, achieves a high coefficient of determination (R^2 > 0.9528) and a low mean absolute error (MAE approx 6.046 x 10^{-3}) in Cd prediction. With an inference time of approximately 0.025 seconds per sample on a consumer-grade GPU, our approach provides fast, accurate, and interpretable aerodynamic feedback, facilitating more agile and informed automotive design exploration.
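下面用 PyTorch 勾勒摘要所述“逐切片 PointNet2D 编码 + 双向 LSTM”代理模型的结构骨架;各层宽度、切片数与每片点数均为假设,仅示意数据流。

```python
# 极简示意:沿流向把 3D 点云切成 2D 截面序列,逐片编码后用双向 LSTM 回归 Cd。
import torch
import torch.nn as nn

class SliceEncoder(nn.Module):
    """对单个 2D 截面点云做逐点 MLP + 最大池化(PointNet 风格)。"""
    def __init__(self, d=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, d))

    def forward(self, pts):                      # pts: (B, S, N, 2)
        return self.mlp(pts).max(dim=2).values   # -> (B, S, d)

class SliceLSTMSurrogate(nn.Module):
    def __init__(self, d=64, h=64):
        super().__init__()
        self.encoder = SliceEncoder(d)
        self.lstm = nn.LSTM(d, h, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * h, 1)          # 回归阻力系数 Cd

    def forward(self, pts):
        z, _ = self.lstm(self.encoder(pts))      # 捕获沿流向的几何演化
        return self.head(z.mean(dim=1)).squeeze(-1)

model = SliceLSTMSurrogate()
x = torch.randn(4, 40, 128, 2)   # 4 辆车、每辆 40 个切片、每片 128 个点(假设)
print(model(x).shape)            # torch.Size([4])
```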


联邦学习|隐私保护|加密(3篇)

【1】Tackling Resource-Constrained and Data-Heterogeneity in Federated Learning with Double-Weight Sparse Pack
标题:利用双权重稀疏打包解决联邦学习中的资源受限与数据异构性
链接:https://arxiv.org/abs/2601.01840

作者:Qiantao Yang,Liquan Chen,Mingfu Xue,Songze Li
备注:Accepted in AAAI 2026
摘要:Federated learning has drawn widespread interest from researchers, yet the data heterogeneity across edge clients remains a key challenge, often degrading model performance. Existing methods enhance model compatibility with data heterogeneity by splitting models and knowledge distillation. However, they neglect the insufficient communication bandwidth and computing power on the client, failing to strike an effective balance between addressing data heterogeneity and accommodating limited client resources. To tackle this limitation, we propose a personalized federated learning method based on cosine sparsification parameter packing and dual-weighted aggregation (FedCSPACK), which effectively leverages the limited client resources and reduces the impact of data heterogeneity on model performance. In FedCSPACK, the client packages model parameters and selects the most contributing parameter packages for sharing based on cosine similarity, effectively reducing bandwidth requirements. The client then generates a mask matrix anchored to the shared parameter package to improve the alignment and aggregation efficiency of sparse updates on the server. Furthermore, directional and distribution distance weights are embedded in the mask to implement a weighted-guided aggregation mechanism, enhancing the robustness and generalization performance of the global model. Extensive experiments across four datasets using ten state-of-the-art methods demonstrate that FedCSPACK effectively improves communication and computational efficiency while maintaining high model accuracy.
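下面给出摘要中“按余弦相似度挑选贡献最大的参数包”这一稀疏共享步骤的极简示意;参数打包粒度、参考方向与保留比例均为假设,并非论文的确切算法。

```python
# 极简示意:把展平的本地更新切成参数包,按余弦相似度只保留“贡献最大”的包。
import numpy as np

def select_packages(local_update, global_direction, pack_size=4, keep_ratio=0.25):
    packs = local_update.reshape(-1, pack_size)
    refs = global_direction.reshape(-1, pack_size)
    cos = (packs * refs).sum(1) / (np.linalg.norm(packs, axis=1)
                                   * np.linalg.norm(refs, axis=1) + 1e-12)
    k = max(1, int(len(packs) * keep_ratio))
    top = np.argsort(-cos)[:k]
    mask = np.zeros(len(packs), dtype=bool); mask[top] = True
    sparse = np.where(mask[:, None], packs, 0.0).reshape(local_update.shape)
    return sparse, mask   # mask 即服务器端对齐稀疏更新所用掩码的雏形

rng = np.random.default_rng(0)
update, direction = rng.normal(size=64), rng.normal(size=64)
sparse, mask = select_packages(update, direction)
print(f"kept {mask.sum()} / {mask.size} packages")   # 只上传被保留的包,降低带宽
```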


【2】Distributed Federated Learning by Alternating Periods of Training
标题:通过交替训练周期进行分布式联邦学习
链接:https://arxiv.org/abs/2601.01793

作者:Shamik Bhattacharyya,Rachel Kalpana Kalaimani
摘要 :Federated learning is a privacy-focused approach towards machine learning where models are trained on client devices with locally available data and aggregated at a central server. However, the dependence on a single central server is challenging in the case of a large number of clients and even poses the risk of a single point of failure. To address these critical limitations of scalability and fault-tolerance, we present a distributed approach to federated learning comprising multiple servers with inter-server communication capabilities. While providing a fully decentralized approach, the designed framework retains the core federated learning structure where each server is associated with a disjoint set of clients with server-client communication capabilities. We propose a novel DFL (Distributed Federated Learning) algorithm which uses alternating periods of local training on the client data followed by global training among servers. We show that the DFL algorithm, under a suitable choice of parameters, ensures that all the servers converge to a common model value within a small tolerance of the ideal model, thus exhibiting effective integration of local and global training models. Finally, we illustrate our theoretical claims through numerical simulations.


【3】Byzantine-Robust Federated Learning Framework with Post-Quantum Secure Aggregation for Real-Time Threat Intelligence Sharing in Critical IoT Infrastructure
标题:具有后量子安全聚合的拜占庭稳健联邦学习框架,用于关键物联网基础设施中的实时威胁情报共享
链接:https://arxiv.org/abs/2601.01053

作者:Milad Rahmati,Nima Rahmati
摘要:The proliferation of Internet of Things devices in critical infrastructure has created unprecedented cybersecurity challenges, necessitating collaborative threat detection mechanisms that preserve data privacy while maintaining robustness against sophisticated attacks. Traditional federated learning approaches for IoT security suffer from two critical vulnerabilities: susceptibility to Byzantine attacks where malicious participants poison model updates, and inadequacy against future quantum computing threats that can compromise cryptographic aggregation protocols. This paper presents a novel Byzantine-robust federated learning framework integrated with post-quantum secure aggregation specifically designed for real-time threat intelligence sharing across critical IoT infrastructure. The proposed framework combines a adaptive weighted aggregation mechanism with lattice-based cryptographic protocols to simultaneously defend against model poisoning attacks and quantum adversaries. We introduce a reputation-based client selection algorithm that dynamically identifies and excludes Byzantine participants while maintaining differential privacy guarantees. The secure aggregation protocol employs CRYSTALS-Kyber for key encapsulation and homomorphic encryption to ensure confidentiality during parameter updates. Experimental evaluation on industrial IoT intrusion detection datasets demonstrates that our framework achieves 96.8% threat detection accuracy while successfully mitigating up to 40% Byzantine attackers, with only 18% computational overhead compared to non-secure federated approaches. The framework maintains sub-second aggregation latency suitable for real-time applications and provides 256-bit post-quantum security level.


推理|分析|理解|解释(16篇)

【1】POSEIDON: Physics-Optimized Seismic Energy Inference and Detection Operating Network
标题:POSEIDON:物理优化的地震能量推断和检测操作网络
链接:https://arxiv.org/abs/2601.02264

作者:Boris Kriuk,Fedor Kriuk
备注:8 pages, 14 figures
摘要:Earthquake prediction and seismic hazard assessment remain fundamental challenges in geophysics, with existing machine learning approaches often operating as black boxes that ignore established physical laws. We introduce POSEIDON (Physics-Optimized Seismic Energy Inference and Detection Operating Network), a physics-informed energy-based model for unified multi-task seismic event prediction, alongside the Poseidon dataset -- the largest open-source global earthquake catalog comprising 2.8 million events spanning 30 years. POSEIDON embeds fundamental seismological principles, including the Gutenberg-Richter magnitude-frequency relationship and Omori-Utsu aftershock decay law, as learnable constraints within an energy-based modeling framework. The architecture simultaneously addresses three interconnected prediction tasks: aftershock sequence identification, tsunami generation potential, and foreshock detection. Extensive experiments demonstrate that POSEIDON achieves state-of-the-art performance across all tasks, outperforming gradient boosting, random forest, and CNN baselines with the highest average F1 score among all compared methods. Crucially, the learned physics parameters converge to scientifically interpretable values -- Gutenberg-Richter b-value of 0.752 and Omori-Utsu parameters p=0.835, c=0.1948 days -- falling within established seismological ranges while enhancing rather than compromising predictive accuracy. The Poseidon dataset is publicly available at https://huggingface.co/datasets/BorisKriuk/Poseidon, providing pre-computed energy features, spatial grid indices, and standardized quality metrics to advance physics-informed seismic research.
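下面示意如何把 Gutenberg-Richter 震级-频度关系 log10 N(M) = a - b·M 作为可学习的物理约束项放进损失;目录数据为合成(其真实 b 值约为 1.0),与任务损失的加权组合从略,均非论文原始代码。

```python
# 极简示意:把 GR 定律残差作为可学习约束,b 值在训练中收敛到可解释范围。
import torch

mags = torch.tensor([4.0, 4.5, 5.0, 5.5, 6.0])
counts = torch.tensor([1000.0, 320.0, 100.0, 32.0, 10.0])   # 合成目录
log_counts = torch.log10(counts)

a = torch.tensor(8.0, requires_grad=True)   # GR 截距(可学习)
b = torch.tensor(1.5, requires_grad=True)   # GR b 值(可学习)
opt = torch.optim.Adam([a, b], lr=0.05)

for _ in range(500):
    opt.zero_grad()
    physics_residual = (a - b * mags) - log_counts   # GR 定律残差
    loss = (physics_residual ** 2).mean()            # 实际模型中与任务损失加权求和
    loss.backward()
    opt.step()

print(f"learned b-value = {b.item():.3f}")   # 接近该合成目录的真值 b ≈ 1.0
```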


【2】Entropy-Aligned Decoding of LMs for Better Writing and Reasoning
标题:面向更好写作与推理的语言模型熵对齐解码
链接:https://arxiv.org/abs/2601.01714

作者:Kareem Ahmed,Sameer Singh
摘要:Language models (LMs) are trained on billions of tokens in an attempt to recover the true language distribution. Still, vanilla random sampling from LMs yields low quality generations. Decoding algorithms attempt to restrict the LM distribution to a set of high-probability continuations, but rely on greedy heuristics that introduce myopic distortions, yielding sentences that are homogeneous, repetitive and incoherent. In this paper, we introduce EPIC, a hyperparameter-free decoding approach that incorporates the entropy of future trajectories into LM decoding. EPIC explicitly regulates the amount of uncertainty expressed at every step of generation, aligning the sampling distribution's entropy to the aleatoric (data) uncertainty. Through Entropy-Aware Lazy Gumbel-Max sampling, EPIC manages to be exact, while also being efficient, requiring only a sublinear number of entropy evaluations per step. Unlike current baselines, EPIC yields sampling distributions that are empirically well-aligned with the entropy of the underlying data distribution. Across creative writing and summarization tasks, EPIC consistently improves LM-as-judge preference win-rates over widely used decoding strategies. These preference gains are complemented by automatic metrics, showing that EPIC produces more diverse generations and more faithful summaries. We also evaluate EPIC on mathematical reasoning, where it outperforms all baselines.
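作为熵对齐思想的一个粗糙近似(并非论文的 Entropy-Aware Lazy Gumbel-Max 算法),下面用二分搜索温度使 softmax 采样分布的熵对齐到给定目标熵;目标熵 target_H 在此为假设输入,论文中它对应数据的偶然(aleatoric)不确定性。

```python
# 极简示意:温度标定版“熵对齐”——softmax 熵随温度单调递增,可二分求解。
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def align_entropy(logits, target_H, lo=0.05, hi=20.0, iters=50):
    """搜索温度 T,使 softmax(logits / T) 的熵接近 target_H。"""
    for _ in range(iters):
        T = 0.5 * (lo + hi)
        z = logits / T
        p = np.exp(z - z.max()); p /= p.sum()
        if entropy(p) < target_H:
            lo = T     # 熵偏低 -> 升温
        else:
            hi = T     # 熵偏高 -> 降温
    return p, T

logits = np.array([3.0, 1.5, 0.5, -1.0, -2.0])
p, T = align_entropy(logits, target_H=1.0)
print(f"T = {T:.2f}, H = {entropy(p):.3f}")
```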


【3】Hidden costs for inference with deep network on embedded system devices
标题:嵌入式系统设备上深度网络推理的隐藏成本
链接:https://arxiv.org/abs/2601.01698

作者:Chankyu Lee,Woohyun Choi,Sangwook Park
备注:published in Proc. of IEEE ICCE 2025
摘要:This study evaluates the inference performance of various deep learning models under an embedded system environment. In previous works, Multiply-Accumulate operation is typically used to measure computational load of a deep model. According to this study, however, this metric has a limitation to estimate inference time on embedded devices. This paper poses the question of what aspects are overlooked when expressed in terms of Multiply-Accumulate operations. In experiments, an image classification task is performed on an embedded system device using the CIFAR-100 dataset to compare and analyze the inference times of ten deep models with the theoretically calculated Multiply-Accumulate operations for each model. The results highlight the importance of considering additional computations between tensors when optimizing deep learning models for real-time performing in embedded systems.


【4】EscherVerse: An Open World Benchmark and Dataset for Teleo-Spatial Intelligence with Physical-Dynamic and Intent-Driven Understanding
标题:EscherVerse:一个面向目的-空间智能的开放世界基准与数据集,具备物理动态与意图驱动理解
链接:https://arxiv.org/abs/2601.01547

作者:Tianjun Gu,Chenghua Gong,Jingyu Gong,Zhizhong Zhang,Yuan Xie,Lizhuang Ma,Xin Tan
摘要:The ability to reason about spatial dynamics is a cornerstone of intelligence, yet current research overlooks the human intent behind spatial changes. To address these limitations, we introduce Teleo-Spatial Intelligence (TSI), a new paradigm that unifies two critical pillars: Physical-Dynamic Reasoning--understanding the physical principles of object interactions--and Intent-Driven Reasoning--inferring the human goals behind these actions. To catalyze research in TSI, we present EscherVerse, consisting of a large-scale, open-world benchmark (Escher-Bench), a dataset (Escher-35k), and models (Escher series). Derived from real-world videos, EscherVerse moves beyond constrained settings to explicitly evaluate an agent's ability to reason about object permanence, state transitions, and trajectory prediction in dynamic, human-centric scenarios. Crucially, it is the first benchmark to systematically assess Intent-Driven Reasoning, challenging models to connect physical events to their underlying human purposes. Our work, including a novel data curation pipeline, provides a foundational resource to advance spatial intelligence from passive scene description toward a holistic, purpose-driven understanding of the world.


【5】Aletheia: Quantifying Cognitive Conviction in Reasoning Models via Regularized Inverse Confusion Matrix
标题:Aletheia:通过正则化逆混淆矩阵量化推理模型中的认知信念
链接:https://arxiv.org/abs/2601.01532

作者:Fanzhe Fu
备注:6 pages, 2 figures
摘要:In the progressive journey toward Artificial General Intelligence (AGI), current evaluation paradigms face an epistemological crisis. Static benchmarks measure knowledge breadth but fail to quantify the depth of belief. While Simhi et al. (2025) defined the CHOKE phenomenon in standard QA, we extend this framework to quantify "Cognitive Conviction" in System 2 reasoning models. We propose Project Aletheia, a cognitive physics framework that employs Tikhonov Regularization to invert the judge's confusion matrix. To validate this methodology without relying on opaque private data, we implement a Synthetic Proxy Protocol. Our preliminary pilot study on 2025 baselines (e.g., DeepSeek-R1, OpenAI o1) suggests that while reasoning models act as a "cognitive buffer," they may exhibit "Defensive OverThinking" under adversarial pressure. Furthermore, we introduce the Aligned Conviction Score (S_aligned) to verify that conviction does not compromise safety. This work serves as a blueprint for measuring AI scientific integrity.
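下面给出“Tikhonov 正则化求评审者混淆矩阵稳定逆”的极简示意:从观测到的判决分布恢复真实判决分布的估计;混淆矩阵数值与正则化系数 lam 均为假设。

```python
# 极简示意:解 (C^T C + lam*I) p = C^T q,用正则化逆校正评审者的系统性误判。
import numpy as np

def debias_judgments(observed, confusion, lam=1e-2):
    """observed: 观测判决分布 q;confusion: 行随机矩阵(真实类别 -> 观测判决)。"""
    C = confusion.T                        # q = C @ p_true(判决信道)
    A = C.T @ C + lam * np.eye(C.shape[1]) # Tikhonov 正则化,避免病态求逆
    p = np.linalg.solve(A, C.T @ observed)
    p = np.clip(p, 0, None)
    return p / p.sum()

confusion = np.array([[0.9, 0.1],   # 真实“正确”被判为“正确”的概率 0.9
                      [0.2, 0.8]])  # 真实“错误”被误判为“正确”的概率 0.2
true_dist = np.array([0.6, 0.4])
observed = confusion.T @ true_dist  # 经判决信道后的观测分布(约 [0.62, 0.38])
print(debias_judgments(observed, confusion))  # 近似恢复 [0.6, 0.4]
```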


【6】LinMU: Multimodal Understanding Made Linear
标题:LinMU:让多模态理解具有线性复杂度
链接:https://arxiv.org/abs/2601.01322

作者:Hongjie Wang,Niraj K. Jha
备注:23 pages, 7 figures
摘要:Modern Vision-Language Models (VLMs) achieve impressive performance but are limited by the quadratic complexity of self-attention, which prevents their deployment on edge devices and makes their understanding of high-resolution images and long-context videos prohibitively expensive. To address this challenge, we introduce LinMU (Linear-complexity Multimodal Understanding), a VLM design that achieves linear complexity without using any quadratic-complexity modules while maintaining the performance of global-attention-based VLMs. LinMU replaces every self-attention layer in the VLM with the M-MATE block: a dual-branch module that combines a bidirectional state-space model for global context (Flex-MA branch) with localized Swin-style window attention (Local-Swin branch) for adjacent correlations. To transform a pre-trained VLM into the LinMU architecture, we propose a three-stage distillation framework that (i) initializes both branches with self-attention weights and trains the Flex-MA branch alone, (ii) unfreezes the Local-Swin branch and fine-tunes it jointly with the Flex-MA branch, and (iii) unfreezes the remaining blocks and fine-tunes them using LoRA adapters, while regressing on hidden states and token-level logits of the frozen VLM teacher. On MMMU, TextVQA, LongVideoBench, Video-MME, and other benchmarks, LinMU matches the performance of teacher models, yet reduces Time-To-First-Token (TTFT) by up to 2.7$\times$ and improves token throughput by up to 9.0$\times$ on minute-length videos. Ablations confirm the importance of each distillation stage and the necessity of the two branches of the M-MATE block. The proposed framework demonstrates that state-of-the-art multimodal reasoning can be achieved without quadratic attention, thus opening up avenues for long-context VLMs that can deal with high-resolution images and long videos.


【7】Scalable Data-Driven Reachability Analysis and Control via Koopman Operators with Conformal Coverage Guarantees
标题:通过具有保形覆盖保证的Koopman算子进行可扩展的数据驱动可达性分析与控制
链接:https://arxiv.org/abs/2601.01076

作者:Devesh Nath,Haoran Yin,Glen Chou
备注:Under review, 28 pages, 12 figures
摘要 :We propose a scalable reachability-based framework for probabilistic, data-driven safety verification of unknown nonlinear dynamics. We use Koopman theory with a neural network (NN) lifting function to learn an approximate linear representation of the dynamics and design linear controllers in this space to enable closed-loop tracking of a reference trajectory distribution. Closed-loop reachable sets are efficiently computed in the lifted space and mapped back to the original state space via NN verification tools. To capture model mismatch between the Koopman dynamics and the true system, we apply conformal prediction to produce statistically-valid error bounds that inflate the reachable sets to ensure the true trajectories are contained with a user-specified probability. These bounds generalize across references, enabling reuse without recomputation. Results on high-dimensional MuJoCo tasks (11D Hopper, 28D Swimmer) and 12D quadcopters show improved reachable set coverage rate, computational efficiency, and conservativeness over existing methods.


【8】A-PINN: Auxiliary Physics-informed Neural Networks for Structural Vibration Analysis in Continuous Euler-Bernoulli Beam
标题:A-PINN:用于连续Euler-Bernoulli梁结构振动分析的辅助物理信息神经网络
链接:https://arxiv.org/abs/2601.00866

作者:Shivani Saini,Ramesh Kumar Vats,Arup Kumar Sahoo
备注:31 pages
摘要:Recent advancements in physics-informed neural networks (PINNs) and their variants have garnered substantial focus from researchers due to their effectiveness in solving both forward and inverse problems governed by differential equations. In this research, a modified Auxiliary physics-informed neural network (A-PINN) framework with balanced adaptive optimizers is proposed for the analysis of structural vibration problems. In order to accurately represent structural systems, it is critical for capturing vibration phenomena and ensuring reliable predictive analysis. So, our investigations are crucial for gaining deeper insight into the robustness of scientific machine learning models for solving vibration problems. Further, to rigorously evaluate the performance of A-PINN, we conducted different numerical simulations to approximate the Euler-Bernoulli beam equations under the various scenarios. The numerical results substantiate the enhanced performance of our model in terms of both numerical stability and predictive accuracy. Our model shows improvement of at least 40% over the baselines.
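下面给出 Euler-Bernoulli 梁方程 EI·w_xxxx + ρA·w_tt = 0 的 PINN 物理残差的朴素 PyTorch 示意(直接用 autograd 求四阶导);A-PINN 实际通过辅助变量降低求导阶数,此处从简,网络结构、物理常数与配点均为假设。

```python
# 极简示意:用 autograd 计算梁方程残差 EI*w_xxxx + rhoA*w_tt,作为 PINN 损失项。
import torch

net = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))

def grad(y, x):
    return torch.autograd.grad(y, x, torch.ones_like(y), create_graph=True)[0]

def beam_residual(x, t, EI=1.0, rhoA=1.0):
    w = net(torch.cat([x, t], dim=1))
    w_x = grad(w, x); w_xx = grad(w_x, x)
    w_xxx = grad(w_xx, x); w_xxxx = grad(w_xxx, x)   # 四阶空间导
    w_t = grad(w, t); w_tt = grad(w_t, t)            # 二阶时间导
    return EI * w_xxxx + rhoA * w_tt

x = torch.rand(64, 1, requires_grad=True)
t = torch.rand(64, 1, requires_grad=True)
loss = (beam_residual(x, t) ** 2).mean()   # 配点物理残差,可与边界/初值损失相加
print(loss.item())
```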


【9】Selective Imperfection as a Generative Framework for Analysis, Creativity and Discovery
标题:选择性不完美作为分析、创造力和发现的生成框架
链接:https://arxiv.org/abs/2601.00863

作者:Markus J. Buehler
摘要:We introduce materiomusic as a generative framework linking the hierarchical structures of matter with the compositional logic of music. Across proteins, spider webs and flame dynamics, vibrational and architectural principles recur as tonal hierarchies, harmonic progressions, and long-range musical form. Using reversible mappings, from molecular spectra to musical tones and from three-dimensional networks to playable instruments, we show how sound functions as a scientific probe, an epistemic inversion where listening becomes a mode of seeing and musical composition becomes a blueprint for matter. These mappings excavate deep time: patterns originating in femtosecond molecular vibrations or billion-year evolutionary histories become audible. We posit that novelty in science and art emerges when constraints cannot be satisfied within existing degrees of freedom, forcing expansion of the space of viable configurations. Selective imperfection provides the mechanism restoring balance between coherence and adaptability. Quantitative support comes from exhaustive enumeration of all 2^12 musical scales, revealing that culturally significant systems cluster in a mid-entropy, mid-defect corridor, directly paralleling the Hall-Petch optimum where intermediate defect densities maximize material strength. Iterating these mappings creates productive collisions between human creativity and physics, generating new information as musical structures encounter evolutionary constraints. We show how swarm-based AI models compose music exhibiting human-like structural signatures such as small-world connectivity, modular integration, long-range coherence, suggesting a route beyond interpolation toward invention. We show that science and art are generative acts of world-building under constraint, with vibration as a shared grammar organizing structure across scales.


【10】EdgeJury: Cross-Reviewed Small-Model Ensembles for Truthful Question Answering on Serverless Edge Inference
标题:EdgeJury:用于无服务器边缘推理上真实问答的交叉评审小模型集成
链接:https://arxiv.org/abs/2601.00850

作者:Aayush Kumar
备注:24 pages,3 Figures, Submitting to IEEE Access
摘要:Hallucinations hinder reliable question answering, especially in resource-constrained deployments where frontier-scale models or retrieval pipelines may be impractical. We present EdgeJury, a lightweight ensemble framework that improves truthfulness and robustness using only small instruction-tuned language models (3B-8B) suitable for serverless edge inference. EdgeJury orchestrates four stages: (1) parallel role-specialized generation, (2) anonymized cross-review with structured critiques and rankings, (3) chairman synthesis that integrates the strongest content while addressing flagged issues, and (4) claim-level consistency labeling based on inter-model agreement. On TruthfulQA (MC1), EdgeJury achieves 76.2% accuracy (95% CI: 72.8-79.6%), a +21.4% relative improvement over a single 8B baseline (62.8%), and outperforms standard baselines including self-consistency and majority voting under transparent compute accounting (total tokens and platform cost reported). On a 200-question adversarial EdgeCases set, EdgeJury yields +48.2% relative gains (95% CI: 44.0-52.4%). Manual analysis on 100 incorrect answers shows an approximately 55% reduction in factual hallucination errors versus the single-model baseline. Deployed on Cloudflare Workers AI, EdgeJury achieves 8.4 s median end-to-end latency, demonstrating that coordinated small-model ensembles can improve truthfulness on misconception-heavy QA benchmarks without external retrieval or proprietary large-model APIs.


【11】Simplex Deep Linear Discriminant Analysis
标题:单纯形深度线性判别分析
链接:https://arxiv.org/abs/2601.01679

作者:Maxat Tezekbayev,Arman Bolatov,Zhenisbek Assylbekov
摘要 :We revisit Deep Linear Discriminant Analysis (Deep LDA) from a likelihood-based perspective. While classical LDA is a simple Gaussian model with linear decision boundaries, attaching an LDA head to a neural encoder raises the question of how to train the resulting deep classifier by maximum likelihood estimation (MLE). We first show that end-to-end MLE training of an unconstrained Deep LDA model ignores discrimination: when both the LDA parameters and the encoder parameters are learned jointly, the likelihood admits a degenerate solution in which some of the class clusters may heavily overlap or even collapse, and classification performance deteriorates. Batchwise moment re-estimation of the LDA parameters does not remove this failure mode. We then propose a constrained Deep LDA formulation that fixes the class means to the vertices of a regular simplex in the latent space and restricts the shared covariance to be spherical, leaving only the priors and a single variance parameter to be learned along with the encoder. Under these geometric constraints, MLE becomes stable and yields well-separated class clusters in the latent space. On images (Fashion-MNIST, CIFAR-10, CIFAR-100), the resulting Deep LDA models achieve accuracy competitive with softmax baselines while offering a simple, interpretable latent geometry that is clearly visible in two-dimensional projections.
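摘要把类均值固定到潜空间中正单纯形的顶点。下面给出生成 K 个两两等距、中心在原点的单纯形顶点的极简实现,可直接用作这种固定类均值;构造方式是常见做法之一,并非论文指定的唯一方案。

```python
# 极简示意:生成 K 个正单纯形顶点(零均值、两两等距),嵌入到 K-1 维。
import numpy as np

def simplex_vertices(K):
    """返回 (K, K-1) 矩阵:第 i 行是第 i 个类均值所固定到的顶点。"""
    V = np.eye(K) - np.full((K, K), 1.0 / K)    # 标准单纯形去中心,秩为 K-1
    U, s, _ = np.linalg.svd(V, full_matrices=False)
    return U[:, : K - 1] * s[: K - 1]           # 保距地嵌入 K-1 维

M = simplex_vertices(4)
d = np.linalg.norm(M[:, None] - M[None, :], axis=-1)
print(np.round(d, 3))   # 非对角元全部相等:顶点两两等距
```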


【12】Deep Linear Discriminant Analysis Revisited
标题:重新审视深度线性判别分析
链接:https://arxiv.org/abs/2601.01619

作者:Maxat Tezekbayev,Rustem Takhanov,Arman Bolatov,Zhenisbek Assylbekov
摘要:We show that for unconstrained Deep Linear Discriminant Analysis (LDA) classifiers, maximum-likelihood training admits pathological solutions in which class means drift together, covariances collapse, and the learned representation becomes almost non-discriminative. Conversely, cross-entropy training yields excellent accuracy but decouples the head from the underlying generative model, leading to highly inconsistent parameter estimates. To reconcile generative structure with discriminative performance, we introduce the \emph{Discriminative Negative Log-Likelihood} (DNLL) loss, which augments the LDA log-likelihood with a simple penalty on the mixture density. DNLL can be interpreted as standard LDA NLL plus a term that explicitly discourages regions where several classes are simultaneously likely. Deep LDA trained with DNLL produces clean, well-separated latent spaces, matches the test accuracy of softmax classifiers on synthetic data and standard image benchmarks, and yields substantially better calibrated predictive probabilities, restoring a coherent probabilistic interpretation to deep discriminant models.
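Schematically, one plausible reading of the DNLL objective on an encoded sample $z = f_\phi(x)$ with label $y$ is the following; the exact penalty form is our assumption based on the abstract, not the paper's equation.

```latex
% One plausible reading (an assumption; the paper's exact penalty may differ):
% standard LDA negative log-likelihood plus a penalty on the mixture density,
% which is largest where several class components are simultaneously likely.
\mathcal{L}_{\mathrm{DNLL}}(\theta,\phi)
  \;=\; -\log p_{\theta}(z, y) \;+\; \lambda\, p_{\theta}(z),
\qquad
p_{\theta}(z) \;=\; \sum_{c=1}^{C} \pi_c\, \mathcal{N}(z;\, \mu_c, \Sigma),
\qquad z = f_{\phi}(x).
```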


【13】SGD with Dependent Data: Optimal Estimation, Regret, and Inference
标题:相依数据下的SGD:最优估计、遗憾与推断
链接:https://arxiv.org/abs/2601.01371

作者:Yinan Shen,Yichen Zhang,Wen-Xin Zhou
摘要:This work investigates the performance of the final iterate produced by stochastic gradient descent (SGD) under temporally dependent data. We consider two complementary sources of dependence: $(i)$ martingale-type dependence in both the covariate and noise processes, which accommodates non-stationary and non-mixing time series data, and $(ii)$ dependence induced by sequential decision making. Our formulation runs in parallel with classical notions of (local) stationarity and strong mixing, while neither framework fully subsumes the other. Remarkably, SGD is shown to automatically accommodate both independent and dependent information under a broad class of stepsize schedules and exploration rate schemes. Non-asymptotically, we show that SGD simultaneously achieves statistically optimal estimation error and regret, extending and improving existing results. In particular, our tail bounds remain sharp even for a potentially infinite horizon $T=+\infty$. Asymptotically, the SGD iterates converge to a Gaussian distribution with only an $O_{\mathbb{P}}(1/\sqrt{t})$ remainder, demonstrating that the supposed estimation-regret trade-off claimed in prior work can in fact be avoided. We further propose a new ``conic'' approximation of the decision region that allows the covariates to have unbounded support. For online sparse regression, we develop a new SGD-based algorithm that uses only $d$ units of storage and requires $O(d)$ flops per iteration, achieving long-term statistical optimality. Intuitively, each incoming observation contributes to estimation accuracy, while aggregated summary statistics guide support recovery.


【14】A New Framework for Explainable Rare Cell Identification in Single-Cell Transcriptomics Data
标题:单细胞转录组学数据中可解释稀有细胞鉴定的新框架
链接:https://arxiv.org/abs/2601.01358

作者:Di Su,Kai Ming Ting,Jie Zhang,Xiaorui Zhang,Xinpeng Li
摘要:The detection of rare cell types in single-cell transcriptomics data is crucial for elucidating disease pathogenesis and tissue development dynamics. However, a critical gap that persists in current methods is their inability to provide an explanation based on genes for each cell they have detected as rare. We identify three primary sources of this deficiency. First, the anomaly detectors often function as "black boxes", designed to detect anomalies but unable to explain why a cell is anomalous. Second, the standard analytical framework hinders interpretability by relying on dimensionality reduction techniques, such as Principal Component Analysis (PCA), which transform meaningful gene expression data into abstract, uninterpretable features. Finally, existing explanation algorithms cannot be readily applied to this domain, as single-cell data is characterized by high dimensionality, noise, and substantial sparsity. To overcome these limitations, we introduce a framework for explainable anomaly detection in single-cell transcriptomics data which not only identifies individual anomalies, but also provides a visual, gene-based explanation of what makes an instance anomalous. This framework has two key ingredients that do not exist in current methods applied in this domain. First, it eliminates the PCA step, which was deemed an essential component in previous studies. Second, it employs a state-of-the-art anomaly detector and explainer as an efficient and effective means to find each rare cell and the relevant gene subspace, providing an explanation for each rare cell as well as for a typical normal cell drawn from the rare cell's closest normal neighbors.


【15】Concave Certificates: Geometric Framework for Distributionally Robust Risk and Complexity Analysis
标题:凹证书:分布鲁棒风险与复杂度分析的几何框架
链接:https://arxiv.org/abs/2601.01311

作者:Hong T. M. Chu
备注:30 pages, 7 figures
摘要:Distributionally Robust (DR) optimization aims to certify worst-case risk within a Wasserstein uncertainty set. Current certifications typically rely either on global Lipschitz bounds, which are often conservative, or on local gradient information, which provides only a first-order approximation. This paper introduces a novel geometric framework based on the least concave majorants of the growth rate function. Our proposed concave certificate establishes a tight bound on DR risk that remains applicable to non-Lipschitz and non-differentiable losses. We extend this framework to complexity analysis, introducing a deterministic bound that complements standard statistical generalization bounds. Furthermore, we utilize this certificate to bound the gap between adversarial and empirical Rademacher complexity, demonstrating that dependencies on input diameter, network width, and depth can be eliminated. For practical application in deep learning, we introduce the adversarial score as a tractable relaxation of the concave certificate that enables efficient and layer-wise analysis of neural networks. We validate our theoretical results in various numerical experiments on classification and regression tasks on real-world data.


【16】NeuroSSM: Multiscale Differential State-Space Modeling for Context-Aware fMRI Analysis
标题:NeuroSSM:用于上下文感知fMRI分析的多尺度差分状态空间建模
链接:https://arxiv.org/abs/2601.01229

作者:Furkan Genç,Boran İsmet Macun,Sait Sarper Özaslan,Emine U. Saritas,Tolga Çukur
摘要:Accurate fMRI analysis requires sensitivity to temporal structure across multiple scales, as BOLD signals encode cognitive processes that emerge from fast transient dynamics to slower, large-scale fluctuations. Existing deep learning (DL) approaches to temporal modeling face challenges in jointly capturing these dynamics over long fMRI time series. Among current DL models, transformers address long-range dependencies by explicitly modeling pairwise interactions through attention, but the associated quadratic computational cost limits effective integration of temporal dependencies across long fMRI sequences. Selective state-space models (SSMs) instead model long-range temporal dependencies implicitly through latent state evolution in a dynamical system, enabling efficient propagation of dependencies over time. However, recent SSM-based approaches for fMRI commonly operate on derived functional connectivity representations and employ single-scale temporal processing. These design choices constrain the ability to jointly represent fast transient dynamics and slower global trends within a single model. We propose NeuroSSM, a selective state-space architecture designed for end-to-end analysis of raw BOLD signals in fMRI time series. NeuroSSM addresses the above limitations through two complementary design components: a multiscale state-space backbone that captures fast and slow dynamics concurrently, and a parallel differencing branch that increases sensitivity to transient state changes. Experiments on clinical and non-clinical datasets demonstrate that NeuroSSM achieves competitive performance and efficiency against state-of-the-art fMRI analysis methods.


检测相关(11篇)

【1】Multivariate Time-series Anomaly Detection via Dynamic Model Pool & Ensembling
标题:通过动态模型池和集成进行多元时间序列异常检测
链接:https://arxiv.org/abs/2601.02037

作者:Wei Hu,Zewei Yu,Jianqiu Xu
摘要:Multivariate time-series (MTS) anomaly detection is critical in domains such as service monitor, IoT, and network security. While multi-model methods based on selection or ensembling outperform single-model ones, they still face limitations: (i) selection methods rely on a single chosen model and are sensitive to the strategy; (ii) ensembling methods often combine all models or are restricted to univariate data; and (iii) most methods depend on fixed data dimensionality, limiting scalability. To address these, we propose DMPEAD, a Dynamic Model Pool and Ensembling framework for MTS Anomaly Detection. The framework first (i) constructs a diverse model pool via parameter transfer and diversity metric, then (ii) updates it with a meta-model and similarity-based strategy for adaptive pool expansion, subset selection, and pool merging, finally (iii) ensembles top-ranked models through proxy metric ranking and top-k aggregation in the selected subset, outputting the final anomaly detection result. Extensive experiments on 8 real-world datasets show that our model outperforms all baselines, demonstrating superior adaptability and scalability.


【2】Enhancing Object Detection with Privileged Information: A Model-Agnostic Teacher-Student Approach
标题:利用特权信息增强对象检测:模型不可知的师生方法
链接:https://arxiv.org/abs/2601.02016

作者:Matthias Bartolo,Dylan Seychell,Gabriel Hili,Matthew Montebello,Carl James Debono,Saviour Formosa,Konstantinos Makantasis
备注:Code available on GitHub: https://github.com/mbar0075/lupi-for-object-detection
摘要:This paper investigates the integration of the Learning Using Privileged Information (LUPI) paradigm in object detection to exploit fine-grained, descriptive information available during training but not at inference. We introduce a general, model-agnostic methodology for injecting privileged information-such as bounding box masks, saliency maps, and depth cues-into deep learning-based object detectors through a teacher-student architecture. Experiments are conducted across five state-of-the-art object detection models and multiple public benchmarks, including UAV-based litter detection datasets and Pascal VOC 2012, to assess the impact on accuracy, generalization, and computational efficiency. Our results demonstrate that LUPI-trained students consistently outperform their baseline counterparts, achieving significant boosts in detection accuracy with no increase in inference complexity or model size. Performance improvements are especially marked for medium and large objects, while ablation studies reveal that intermediate weighting of teacher guidance optimally balances learning from privileged and standard inputs. The findings affirm that the LUPI framework provides an effective and practical strategy for advancing object detection systems in both resource-constrained and real-world settings.


【3】High-Order Epistasis Detection Using Factorization Machine with Quadratic Optimization Annealing and MDR-Based Evaluation
标题:使用带二次优化退火的因子分解机和基于MDR的评估进行高阶上位性检测
链接:https://arxiv.org/abs/2601.01860

作者:Shuta Kikuchi,Shu Tanaka
备注:6 pages, 2 figures
摘要:Detecting high-order epistasis is a fundamental challenge in genetic association studies due to the combinatorial explosion of candidate locus combinations. Although multifactor dimensionality reduction (MDR) is a widely used method for evaluating epistasis, exhaustive MDR-based searches become computationally infeasible as the number of loci or the interaction order increases. In this paper, we define the epistasis detection problem as a black-box optimization problem and solve it with a factorization machine with quadratic optimization annealing (FMQA). We propose an efficient epistasis detection method based on FMQA, in which the classification error rate (CER) computed by MDR is used as a black-box objective function. Experimental evaluations were conducted using simulated case-control datasets with predefined high-order epistasis. The results demonstrate that the proposed method successfully identified ground-truth epistasis across various interaction orders and the numbers of genetic loci within a limited number of iterations. These results indicate that the proposed method is effective and computationally efficient for high-order epistasis detection.


【4】Digital Twin-Driven Communication-Efficient Federated Anomaly Detection for Industrial IoT
标题:面向工业物联网的数字孪生驱动通信高效联邦异常检测
链接:https://arxiv.org/abs/2601.01701

作者:Mohammed Ayalew Belay,Adil Rasheed,Pierluigi Salvo Rossi
摘要:Anomaly detection is increasingly becoming crucial for maintaining the safety, reliability, and efficiency of industrial systems. Recently, with the advent of digital twins and data-driven decision-making, several statistical and machine-learning methods have been proposed. However, these methods face several challenges, such as dependence on only real sensor datasets, limited labeled data, high false alarm rates, and privacy concerns. To address these problems, we propose a suite of digital twin-integrated federated learning (DTFL) methods that enhance global model performance while preserving data privacy and communication efficiency. Specifically, we present five novel approaches: Digital Twin-Based Meta-Learning (DTML), Federated Parameter Fusion (FPF), Layer-wise Parameter Exchange (LPE), Cyclic Weight Adaptation (CWA), and Digital Twin Knowledge Distillation (DTKD). Each method introduces a unique mechanism to combine synthetic and real-world knowledge, balancing generalization with communication overhead. We conduct an extensive experiment using a publicly available cyber-physical anomaly detection dataset. For a target accuracy of 80%, CWA reaches the target in 33 rounds, FPF in 41 rounds, LPE in 48 rounds, and DTML in 87 rounds, whereas the standard FedAvg baseline and DTKD do not reach the target within 100 rounds. These results highlight substantial communication-efficiency gains (up to 62% fewer rounds than DTML and 31% fewer than LPE) and demonstrate that integrating DT knowledge into FL accelerates convergence to operationally meaningful accuracy thresholds for IIoT anomaly detection.


【5】Improving Variational Autoencoder using Random Fourier Transformation: An Aviation Safety Anomaly Detection Case-Study
标题:使用随机傅里叶变换改进变分自动编码器:航空安全异常检测案例研究
链接:https://arxiv.org/abs/2601.01016

作者:Ata Akbari Asanjan,Milad Memarzadeh,Bryan Matthews,Nikunj Oza
摘要:In this study, we focus on the training process and inference improvements of deep neural networks (DNNs), specifically Autoencoders (AEs) and Variational Autoencoders (VAEs), using Random Fourier Transformation (RFT). We further explore the role of RFT in model training behavior using Frequency Principle (F-Principle) analysis and show that models with RFT learn low- and high-frequency components simultaneously, whereas conventional DNNs start from low frequency and gradually learn (if successful) high-frequency features. We focus on reconstruction-based anomaly detection using autoencoders and variational autoencoders and investigate RFT's role. We also introduce a trainable variant of RFT that uses the existing computation graph to learn the RFT expansion instead of leaving it random. We showcase our findings with two low-dimensional synthetic datasets for data representation, and an aviation safety dataset, called Dashlink, for high-dimensional reconstruction-based anomaly detection. The results indicate the superiority of models with Fourier transformation compared to their conventional counterparts, and remain inconclusive regarding the benefits of the trainable Fourier transformation in contrast to the random variant.
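The RFT expansion at the heart of this study builds on the classic random Fourier feature map of Rahimi and Recht; a minimal numpy sketch follows. The trainable variant described above would instead treat the projection `W` and phase `b` as parameters of the computation graph; the kernel bandwidth and feature count here are placeholder choices.

```python
import numpy as np

def random_fourier_features(x: np.ndarray, n_features: int, gamma: float = 1.0,
                            seed: int = 0) -> np.ndarray:
    """Classic random Fourier feature map z(x) = sqrt(2/D) * cos(xW + b),
    approximating the RBF kernel exp(-gamma * ||x - y||^2). The trainable
    variant would make W and b learnable instead of random."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(x @ W + b)

x = np.random.default_rng(1).normal(size=(8, 16))   # e.g. a flattened sensor window
z = random_fourier_features(x, n_features=256)      # expanded input fed to the AE/VAE
print(z.shape)  # (8, 256)
```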


【6】Towards eco friendly cybersecurity: machine learning based anomaly detection with carbon and energy metrics
标题:迈向生态友好型网络安全:基于机器学习的异常检测以及碳和能源指标
链接:https://arxiv.org/abs/2601.00893

作者:KC Aashish,Md Zakir Hossain Zamil,Md Shafiqul Islam Mridul,Lamia Akter,Farmina Sharmin,Eftekhar Hossain Ayon,Md Maruf Bin Reza,Ali Hassan,Abdur Rahim,Sirapa Malla
备注:International Journal of Applied Mathematics 2025
摘要:The rising energy footprint of artificial intelligence has become a measurable component of US data center emissions, yet cybersecurity research seldom considers its environmental cost. This study introduces an eco-aware anomaly detection framework that unifies machine-learning-based network monitoring with real-time carbon and energy tracking. Using the publicly available Carbon Aware Cybersecurity Traffic Dataset comprising 2300 flow-level observations, we benchmark Logistic Regression, Random Forest, Support Vector Machine, Isolation Forest, and XGBoost models across energy, carbon, and performance dimensions. Each experiment is executed in a controlled Colab environment instrumented with the CodeCarbon toolkit to quantify power draw and equivalent CO2 output during both training and inference. We construct an Eco Efficiency Index that expresses F1 score per kilowatt-hour to capture the trade-off between detection quality and environmental impact. Results reveal that optimized Random Forest and lightweight Logistic Regression models achieve the highest eco-efficiency, reducing energy consumption by more than forty percent compared to XGBoost while sustaining competitive detection accuracy. Principal Component Analysis further decreases computational load with negligible loss in recall. Collectively, these findings establish that integrating carbon and energy metrics into cybersecurity workflows enables environmentally responsible machine learning without compromising operational protection. The proposed framework offers a reproducible path toward sustainable, carbon-accountable cybersecurity aligned with emerging US green computing and federal energy efficiency initiatives.


【7】Outlier Detection Using Vector Cosine Similarity by Adding a Dimension
标题:通过添加维度利用向量余弦相似度进行离群点检测
链接:https://arxiv.org/abs/2601.00883

作者:Zhongyang Shen
备注:This is an updated version of the paper originally published in ICAIIC 2024 (DOI: 10.1109/ICAIIC60209.2024.10463442). Changes include minor typographical and grammatical corrections, as well as an added description of an optimized open-source Python implementation (MDOD) available on PyPI at https://pypi.org/project/mdod/
摘要:We propose a new outlier detection method for multi-dimensional data. The method detects outliers based on vector cosine similarity, using a new dataset constructed by adding a dimension with zero values to the original data. When a point in the new dataset is selected as the measured point, an observation point is created to serve as the origin of the vectors; it differs from the measured point only in the new dimension, where it takes a non-zero value. Vectors are then formed from the observation point to the measured point and to other points in the dataset. By comparing the cosine similarities of these vectors, abnormal data can be identified. An optimized implementation (MDOD) is available on PyPI: https://pypi.org/project/mdod/.
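Because the observation point differs from the measured point only along the added axis, the cosine similarity to another point reduces to a simple closed form in the original distances. The sketch below is a simplified reimplementation of that idea, not the optimized `mdod` package.

```python
import numpy as np

def mdod_scores(X: np.ndarray, h: float = 1.0) -> np.ndarray:
    """Outlier scores via the added-dimension cosine-similarity idea
    (simplified sketch). Lower mean similarity to the rest of the data
    means more anomalous, so we return 1 - mean similarity."""
    n = len(X)
    scores = np.empty(n)
    for i in range(n):
        # The vector from the observation point to the measured point is
        # (0, ..., 0, -h), so the cosine similarity to any other point q
        # reduces to h / sqrt(||q - x_i||^2 + h^2).
        d2 = ((X - X[i]) ** 2).sum(axis=1)
        sim = h / np.sqrt(d2 + h ** 2)
        scores[i] = 1.0 - (sim.sum() - 1.0) / (n - 1)   # exclude self (sim = 1)
    return scores

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(100, 3)), [[8.0, 8.0, 8.0]]])  # one planted outlier
print(mdod_scores(X).argmax())  # 100 -> the planted outlier
```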


【8】Energy-Efficient Eimeria Parasite Detection Using a Two-Stage Spiking Neural Network Architecture
标题:使用两阶段脉冲神经网络架构的高能效艾美耳球虫检测
链接:https://arxiv.org/abs/2601.00806

作者:Ángel Miguel García-Vico,Huseyin Seker,Muhammad Afzal
摘要:Coccidiosis, a disease caused by the Eimeria parasite, represents a major threat to the poultry and rabbit industries, demanding rapid and accurate diagnostic tools. While deep learning models offer high precision, their significant energy consumption limits their deployment in resource-constrained environments. This paper introduces a novel two-stage Spiking Neural Network (SNN) architecture, where a pre-trained Convolutional Neural Network is first converted into a spiking feature extractor and then coupled with a lightweight, unsupervised SNN classifier trained with Spike-Timing-Dependent Plasticity (STDP). The proposed model sets a new state-of-the-art, achieving 98.32\% accuracy in Eimeria classification. Remarkably, this performance is accomplished with a significant reduction in energy consumption, showing an improvement of more than 223 times compared to its traditional ANN counterpart. This work demonstrates a powerful synergy between high accuracy and extreme energy efficiency, paving the way for autonomous, low-power diagnostic systems on neuromorphic hardware.


【9】Real-Time Human Detection for Aerial Captured Video Sequences via Deep Models
标题:通过深度模型对空中捕获视频序列进行实时人体检测
链接:https://arxiv.org/abs/2601.00391

作者:Nouar AlDahoul,Aznul Qalid Md Sabri,Ali Mohammed Mansoor
摘要:Human detection in videos plays an important role in various real-life applications. Most traditional approaches depend on utilizing handcrafted features, which are problem-dependent and optimal for specific tasks. Moreover, they are highly susceptible to dynamical events such as illumination changes, camera jitter, and variations in object sizes. On the other hand, feature learning approaches are cheaper and easier because highly abstract and discriminative features can be produced automatically without the need for expert knowledge. In this paper, we utilize automatic feature learning methods, which combine optical flow and three different deep models (i.e., supervised convolutional neural network (S-CNN), pretrained CNN feature extractor, and hierarchical extreme learning machine (H-ELM)) for human detection in videos captured using a nonstatic camera on an aerial platform with varying altitudes. The models are trained and tested on the publicly available and highly challenging UCF-ARG aerial dataset. The comparison between these models in terms of training, testing accuracy, and learning speed is analyzed. The performance evaluation considers five human actions (digging, waving, throwing, walking, and running). Experimental results demonstrated that the proposed methods are successful for the human detection task. The pretrained CNN produces an average accuracy of 98.09%. S-CNN produces an average accuracy of 95.6% with softmax and 91.7% with Support Vector Machines (SVM). H-ELM has an average accuracy of 95.9%. Using a normal Central Processing Unit (CPU), H-ELM's training time takes 445 seconds. Learning in S-CNN takes 770 seconds with a high-performance Graphical Processing Unit (GPU).


【10】Hunting for "Oddballs" with Machine Learning: Detecting Anomalous Exoplanets Using a Deep-Learned Low-Dimensional Representation of Transit Spectra with Autoencoders
标题:用机器学习猎寻"异类":利用自编码器深度学习的凌日光谱低维表示检测异常系外行星
链接:https://arxiv.org/abs/2601.02324

作者:Alexander Roman,Emilie Panek,Roy T. Forestano,Eyup B. Unlu,Katia Matcheva,Konstantin T. Matchev
备注:14 pages, 12 figures
摘要:This study explores the application of autoencoder-based machine learning techniques for anomaly detection to identify exoplanet atmospheres with unconventional chemical signatures using a low-dimensional data representation. We use the Atmospheric Big Challenge (ABC) database, a publicly available dataset with over 100,000 simulated exoplanet spectra, to construct an anomaly detection scenario by defining CO2-rich atmospheres as anomalies and CO2-poor atmospheres as the normal class. We benchmarked four different anomaly detection strategies: Autoencoder Reconstruction Loss, One-Class Support Vector Machine (1 class-SVM), K-means Clustering, and Local Outlier Factor (LOF). Each method was evaluated in both the original spectral space and the autoencoder's latent space using Receiver Operating Characteristic (ROC) curves and Area Under the Curve (AUC) metrics. To test the performance of the different methods under realistic conditions, we introduced Gaussian noise levels ranging from 10 to 50 ppm. Our results indicate that anomaly detection is consistently more effective when performed within the latent space across all noise levels. Specifically, K-means clustering in the latent space emerged as a stable and high-performing method. We demonstrate that this anomaly detection approach is robust to noise levels up to 30 ppm (consistent with realistic space-based observations) and remains viable even at 50 ppm when leveraging latent space representations. On the other hand, the performance of the anomaly detection methods applied directly in the raw spectral space degrades significantly with increasing the level of noise. This suggests that autoencoder-driven dimensionality reduction offers a robust methodology for flagging chemically anomalous targets in large-scale surveys where exhaustive retrievals are computationally prohibitive.
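As a concrete illustration of the best-performing configuration reported above (K-means in the autoencoder's latent space), the sketch below scores test spectra by distance to the nearest centroid fit on normal-class latent codes; the latent dimension and cluster count are placeholder assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_anomaly_scores(z_train_normal: np.ndarray, z_test: np.ndarray,
                          n_clusters: int = 8) -> np.ndarray:
    """Fit K-means on latent codes of the normal class (e.g. CO2-poor spectra)
    and score test codes by distance to the nearest centroid."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(z_train_normal)
    # transform() returns distances to every centroid; the minimum is the score.
    return km.transform(z_test).min(axis=1)

rng = np.random.default_rng(0)
z_normal = rng.normal(size=(500, 16))                    # stand-in autoencoder latents
z_test = np.vstack([rng.normal(size=(50, 16)),           # normal
                    rng.normal(loc=4.0, size=(50, 16))]) # CO2-rich "anomalies"
scores = kmeans_anomaly_scores(z_normal, z_test)
print(scores[:50].mean() < scores[50:].mean())           # True: anomalies score higher
```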


【11】Placenta Accreta Spectrum Detection using Multimodal Deep Learning
标题:使用多模态深度学习的胎盘植入谱系疾病检测
链接:https://arxiv.org/abs/2601.00907

作者:Sumaiya Ali,Areej Alhothali,Sameera Albasri,Ohoud Alzamzami,Ahmed Abduljabbar,Muhammad Alwazzan
摘要:Placenta Accreta Spectrum (PAS) is a life-threatening obstetric complication involving abnormal placental invasion into the uterine wall. Early and accurate prenatal diagnosis is essential to reduce maternal and neonatal risks. This study aimed to develop and validate a deep learning framework that enhances PAS detection by integrating multiple imaging modalities. A multimodal deep learning model was designed using an intermediate feature-level fusion architecture combining 3D Magnetic Resonance Imaging (MRI) and 2D Ultrasound (US) scans. Unimodal feature extractors, a 3D DenseNet121-Vision Transformer for MRI and a 2D ResNet50 for US, were selected after systematic comparative analysis. Curated datasets comprising 1,293 MRI and 1,143 US scans were used to train the unimodal models, and paired samples of patient-matched MRI-US scans were isolated for multimodal model development and evaluation. On an independent test set, the multimodal fusion model achieved superior performance, with an accuracy of 92.5% and an Area Under the Receiver Operating Characteristic Curve (AUC) of 0.927, outperforming the MRI-only (82.5%, AUC 0.825) and US-only (87.5%, AUC 0.879) models. Integrating MRI and US features provides complementary diagnostic information, demonstrating strong potential to enhance prenatal risk assessment and improve patient outcomes.


分类|识别(3篇)

【1】QuIC: A Quantum-Inspired Interaction Classifier for Revitalizing Shallow CNNs in Fine-Grained Recognition
标题:QuIC:一种量子启发的交互分类器,用于在细粒度识别中重振浅层CNN
链接:https://arxiv.org/abs/2601.02189

作者:Cheng Ying Wu,Yen Jui Chang
摘要:Deploying deep learning models for Fine-Grained Visual Classification (FGVC) on resource-constrained edge devices remains a significant challenge. While deep architectures achieve high accuracy on benchmarks like CUB-200-2011, their computational cost is often prohibitive. Conversely, shallow networks (e.g., AlexNet, VGG) offer efficiency but fail to distinguish visually similar sub-categories. This is because standard Global Average Pooling (GAP) heads capture only first-order statistics, missing the subtle high-order feature interactions required for FGVC. While Bilinear CNNs address this, they suffer from high feature dimensionality and instability during training. To bridge this gap, we propose the Quantum-inspired Interaction Classifier (QuIC). Drawing inspiration from quantum mechanics, QuIC models feature channels as interacting quantum states and captures second-order feature covariance via a learnable observable operator. Designed as a lightweight, plug-and-play module, QuIC supports stable, single-stage end-to-end training without exploding feature dimensions. Experimental results demonstrate that QuIC significantly revitalizes shallow backbones: it boosts the Top-1 accuracy of VGG16 by nearly 20% and outperforms state-of-the-art attention mechanisms (SE-Block) on ResNet18. Qualitative analysis, including t-SNE visualization, further confirms that QuIC resolves ambiguous cases by explicitly attending to fine-grained discriminative features and enforcing compact intra-class clustering.
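A minimal PyTorch sketch of the second-order idea follows: channel features are centered, their covariance acts as a density-matrix analogue, and one learnable symmetric "observable" per class produces a logit as a trace inner product. Normalization details and the exact operator form are assumptions, not the paper's equations.

```python
import torch
import torch.nn as nn

class QuICHead(nn.Module):
    """Sketch of a quantum-inspired second-order head: channel features act as
    interacting "states", their covariance as a density-matrix analogue, and a
    learnable symmetric "observable" per class yields the logit as a trace
    inner product."""
    def __init__(self, channels: int, num_classes: int):
        super().__init__()
        self.observables = nn.Parameter(0.01 * torch.randn(num_classes, channels, channels))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feats.shape
        x = feats.reshape(b, c, h * w)
        x = x - x.mean(dim=2, keepdim=True)            # center per channel
        cov = x @ x.transpose(1, 2) / (h * w)          # (b, c, c) channel covariance
        m = 0.5 * (self.observables + self.observables.transpose(1, 2))  # symmetrize
        # logit_k = <M_k, Cov> = tr(M_k Cov): an expectation-value analogue.
        return torch.einsum('bij,kij->bk', cov, m)

head = QuICHead(channels=512, num_classes=200)         # e.g. CUB-200 over VGG16 features
print(head(torch.randn(4, 512, 7, 7)).shape)           # torch.Size([4, 200])
```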


【2】Evo-TFS: Evolutionary Time-Frequency Domain-Based Synthetic Minority Oversampling Approach to Imbalanced Time Series Classification
标题:Evo-TFS:面向不平衡时间序列分类的进化时频域合成少数类过采样方法
链接:https://arxiv.org/abs/2601.01150

作者:Wenbin Pei,Ruohao Dai,Bing Xue,Mengjie Zhang,Qiang Zhang,Yiu-Ming Cheung
摘要:Time series classification is a fundamental machine learning task with broad real-world applications. Although many deep learning methods have proven effective in learning time-series data for classification, they were originally developed under the assumption of balanced data distributions. Once data distribution is uneven, these methods tend to ignore the minority class that is typically of higher practical significance. Oversampling methods have been designed to address this by generating minority-class samples, but their reliance on linear interpolation often hampers the preservation of temporal dynamics and the generation of diverse samples. Therefore, in this paper, we propose Evo-TFS, a novel evolutionary oversampling method that integrates both time- and frequency-domain characteristics. In Evo-TFS, strongly typed genetic programming is employed to evolve diverse, high-quality time series, guided by a fitness function that incorporates both time-domain and frequency-domain characteristics. Experiments conducted on imbalanced time series datasets demonstrate that Evo-TFS outperforms existing oversampling methods, significantly enhancing the performance of time-domain and frequency-domain classifiers.


【3】Enhanced Leukemic Cell Classification Using Attention-Based CNN and Data Augmentation
标题:使用基于注意力的CNN和数据增强增强白血病细胞分类
链接:https://arxiv.org/abs/2601.01026

作者:Douglas Costa Braga,Daniel Oliveira Dantas
备注:9 pages, 5 figures, 4 tables. Submitted to VISAPP 2025
摘要:We present a reproducible deep learning pipeline for leukemic cell classification, focusing on system architecture, experimental robustness, and software design choices for medical image analysis. Acute lymphoblastic leukemia (ALL) is the most common childhood cancer, requiring expert microscopic diagnosis that suffers from inter-observer variability and time constraints. The proposed system integrates an attention-based convolutional neural network combining EfficientNetV2-B3 with Squeeze-and-Excitation mechanisms for automated ALL cell classification. Our approach employs comprehensive data augmentation, focal loss for class imbalance, and patient-wise data splitting to ensure robust and reproducible evaluation. On the C-NMC 2019 dataset (12,528 original images from 62 patients), the system achieves a 97.89% F1-score and 97.89% accuracy on the test set, with statistical validation through 100-iteration Monte Carlo experiments confirming significant improvements (p < 0.001) over baseline methods. The proposed pipeline outperforms existing approaches by up to 4.67% while using 89% fewer parameters than VGG16 (15.2M vs. 138M). The attention mechanism provides interpretable visualizations of diagnostically relevant cellular features, demonstrating that modern attention-based architectures can improve leukemic cell classification while maintaining computational efficiency suitable for clinical deployment.
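The focal loss used for class imbalance is the standard formulation of Lin et al.; for reference, a minimal PyTorch version is below, with the usual gamma default rather than the paper's specific setting.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    """Multi-class focal loss: down-weights easy examples by (1 - p_t)^gamma,
    so training focuses on the hard, often minority-class, samples."""
    ce = F.cross_entropy(logits, targets, reduction='none')  # -log p_t per sample
    p_t = torch.exp(-ce)
    return ((1.0 - p_t) ** gamma * ce).mean()

logits = torch.randn(16, 2)                 # e.g. ALL vs. normal cell logits
targets = torch.randint(0, 2, (16,))
print(focal_loss(logits, targets))
```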


3D|3D重建等相关(1篇)

【1】Clean-GS: Semantic Mask-Guided Pruning for 3D Gaussian Splatting
标题:Clean-GS:语义掩码引导的3D高斯溅射剪枝
链接:https://arxiv.org/abs/2601.00913

作者:Subhankar Mishra
摘要:3D Gaussian Splatting produces high-quality scene reconstructions but generates hundreds of thousands of spurious Gaussians (floaters) scattered throughout the environment. These artifacts obscure objects of interest and inflate model sizes, hindering deployment in bandwidth-constrained applications. We present Clean-GS, a method for removing background clutter and floaters from 3DGS reconstructions using sparse semantic masks. Our approach combines whitelist-based spatial filtering with color-guided validation and outlier removal to achieve 60-80\% model compression while preserving object quality. Unlike existing 3DGS pruning methods that rely on global importance metrics, Clean-GS uses semantic information from as few as 3 segmentation masks (1\% of views) to identify and remove Gaussians not belonging to the target object. Our multi-stage approach consisting of (1) whitelist filtering via projection to masked regions, (2) depth-buffered color validation, and (3) neighbor-based outlier removal isolates monuments and objects from complex outdoor scenes. Experiments on Tanks and Temples show that Clean-GS reduces file sizes from 125MB to 47MB while maintaining rendering quality, making 3DGS models practical for web deployment and AR/VR applications. Our code is available at https://github.com/smlab-niser/clean-gs
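Stage (1), whitelist filtering, is easy to sketch: a Gaussian is kept only if its center projects inside the object mask in every available masked view. The toy camera model below is an illustrative assumption, not the paper's exact pipeline.

```python
import numpy as np

def whitelist_filter(centers: np.ndarray, cameras: list[dict],
                     masks: list[np.ndarray]) -> np.ndarray:
    """Sketch of Clean-GS stage (1): keep a Gaussian only if its center lands
    inside the segmentation mask in every masked view (as few as 3 views).
    `cameras` holds hypothetical 3x4 world-to-image projection matrices P."""
    keep = np.ones(len(centers), dtype=bool)
    pts_h = np.hstack([centers, np.ones((len(centers), 1))])   # homogeneous coords
    for cam, mask in zip(cameras, masks):
        uvw = pts_h @ cam["P"].T                               # (n, 3) image coords
        in_front = uvw[:, 2] > 0
        uv = np.round(uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-8, None)).astype(int)
        h, w = mask.shape
        in_img = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
        hit = np.zeros(len(centers), dtype=bool)
        hit[in_img] = mask[uv[in_img, 1], uv[in_img, 0]] > 0
        keep &= in_front & hit
    return keep  # stages (2)-(3) would further validate color and prune outliers

P = np.hstack([np.eye(3), np.zeros((3, 1))])                   # toy pinhole camera
mask = np.zeros((100, 100), dtype=np.uint8); mask[40:60, 40:60] = 1
centers = np.array([[50.0, 50.0, 1.0], [500.0, 500.0, 1.0]])
print(whitelist_filter(centers, [{"P": P}], [mask]))           # [ True False ]
```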


编码器(1篇)

【1】Physically-Constrained Autoencoder-Assisted Bayesian Optimization for Refinement of High-Dimensional Defect-Sensitive Single Crystalline Structure
标题:用于精修高维缺陷敏感单晶结构的物理约束自编码器辅助贝叶斯优化
链接:https://arxiv.org/abs/2601.00855

作者:Joseph Oche Agada,Andrew McAninch,Haley Day,Yasemin Tanyu,Ewan McCombs,Seyed M. Koohpayeh,Brian H. Toby,Yishu Wang,Arpan Biswas
备注:15 pages, 8 figures
摘要:Physical properties and functionalities of materials are dictated by global crystal structures as well as local defects. To establish a structure-property relationship, not only the crystallographic symmetry but also quantitative knowledge about defects is required. Here we present a hybrid Machine Learning framework that integrates a physically-constrained variational autoencoder (pcVAE) with different Bayesian Optimization (BO) methods to systematically accelerate and improve crystal structure refinement with resolution of defects. We chose the pyrochlore-structured Ho2Ti2O7 as a model system and employed the GSAS2 package for benchmarking crystallographic parameters from Rietveld refinement. However, the function space of these material systems is highly nonlinear, which causes optimizers such as traditional Rietveld refinement to become trapped in local minima. Moreover, these naive methods do not learn much about the overall function space, which is essential for large, time-consuming explorations that identify various potential regions of interest. Thus, we present an approach for exploring the high-dimensional structure parameters of defect-sensitive systems via pretrained pcVAE-assisted BO and sparse axis-aligned BO. The pcVAE projects high-dimensional diffraction data consisting of thousands of independently measured diffraction orders into a low-dimensional latent space while enforcing scaling invariance and physical relevance. Then, via BO methods, we aim to minimize the L2-norm-based chi-squared errors in the real and latent spaces separately between experimental and simulated diffraction patterns, thereby steering the refinement towards potentially optimal crystal structure parameters. We investigate and compare the results among different pcVAE-assisted BO variants, non-pcVAE-assisted BO, and Rietveld refinement.


优化|敛散性(11篇)

【1】Theoretical Convergence of SMOTE-Generated Samples
标题:SMOTE生成样本的理论收敛
链接:https://arxiv.org/abs/2601.01927

作者:Firuz Kamalov,Hana Sulieman,Witold Pedrycz
摘要:Imbalanced data affects a wide range of machine learning applications, from healthcare to network security. As SMOTE is one of the most popular approaches to addressing this issue, it is imperative to validate it not only empirically but also theoretically. In this paper, we provide a rigorous theoretical analysis of SMOTE's convergence properties. Concretely, we prove that the synthetic random variable Z converges in probability to the underlying random variable X. We further prove a stronger convergence in mean when X is compact. Finally, we show that lower values of the nearest neighbor rank lead to faster convergence offering actionable guidance to practitioners. The theoretical results are supported by numerical experiments using both real-life and synthetic data. Our work provides a foundational understanding that enhances data augmentation techniques beyond imbalanced data scenarios.
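For reference, the generative step being analyzed is the textbook SMOTE interpolation $Z = X_i + U\,(X_{nn} - X_i)$ with $U \sim \mathrm{Uniform}(0,1)$; the sketch below makes the nearest-neighbor rank $k$ explicit, since the result above says lower ranks yield faster convergence.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sample(X_min: np.ndarray, k: int = 5, n_new: int = 100,
                 seed: int = 0) -> np.ndarray:
    """Textbook SMOTE: Z = X_i + U * (X_nn - X_i), where X_nn is one of the k
    nearest minority neighbors of X_i and U ~ Uniform(0, 1). Smaller k keeps Z
    closer to the data distribution (the faster-convergence regime)."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)        # +1: self is nearest
    idx = nn.kneighbors(X_min, return_distance=False)[:, 1:]   # drop self column
    base = rng.integers(len(X_min), size=n_new)
    neigh = idx[base, rng.integers(k, size=n_new)]
    u = rng.uniform(size=(n_new, 1))
    return X_min[base] + u * (X_min[neigh] - X_min[base])

X_min = np.random.default_rng(1).normal(size=(50, 2))          # minority class
print(smote_sample(X_min, k=3, n_new=5))
```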


【2】Moments Matter: Stabilizing Policy Optimization using Return Distributions
标题:矩很重要:利用回报分布稳定策略优化
链接:https://arxiv.org/abs/2601.01803

作者:Dennis Jabs,Aditya Mohan,Marius Lindauer
备注:Workshop paper at RLDM'25
摘要:Deep Reinforcement Learning (RL) agents often learn policies that achieve the same episodic return yet behave very differently, due to a combination of environmental (random transitions, initial conditions, reward noise) and algorithmic (minibatch selection, exploration noise) factors. In continuous control tasks, even small parameter shifts can produce unstable gaits, complicating both algorithm comparison and real-world transfer. Previous work has shown that such instability arises when policy updates traverse noisy neighborhoods and that the spread of the post-update return distribution $R(\theta)$, obtained by repeatedly sampling minibatches, updating $\theta$, and measuring final returns, is a useful indicator of this noise. Although explicitly constraining the policy to maintain a narrow $R(\theta)$ can improve stability, directly estimating $R(\theta)$ is computationally expensive in high-dimensional settings. We propose an alternative that takes advantage of environmental stochasticity to mitigate update-induced variability. Specifically, we model the state-action return distribution through a distributional critic and then bias the advantage function of PPO using higher-order moments (skewness and kurtosis) of this distribution. By penalizing extreme tail behaviors, our method discourages policies from entering parameter regimes prone to instability. We hypothesize that in environments where post-update critic values align poorly with post-update returns, standard PPO struggles to produce a narrow $R(\theta)$. In such cases, our moment-based correction narrows $R(\theta)$, improving stability by up to 75% in Walker2D, while preserving comparable evaluation returns.


【3】The Optimal Sample Complexity of Linear Contracts
标题:线性契约的最优样本复杂度
链接:https://arxiv.org/abs/2601.01496

作者:Mikael Møller Høgsgaard
摘要:In this paper, we settle the problem of learning optimal linear contracts from data in the offline setting, where agent types are drawn from an unknown distribution and the principal's goal is to design a contract that maximizes her expected utility. Specifically, our analysis shows that the simple Empirical Utility Maximization (EUM) algorithm yields an $\varepsilon$-approximation of the optimal linear contract with probability at least $1-\delta$, using just $O(\ln(1/\delta) / \varepsilon^2)$ samples. This result improves upon previously known bounds and matches a lower bound from Duetting et al. [2025] up to constant factors, thereby proving its optimality. Our analysis uses a chaining argument, where the key insight is to leverage a simple structural property of linear contracts: their expected reward is non-decreasing. This property, which holds even though the utility function itself is non-monotone and discontinuous, enables the construction of fine-grained nets required for the chaining argument, which in turn yields the optimal sample complexity. Furthermore, our proof establishes the stronger guarantee of uniform convergence: the empirical utility of every linear contract is an $\varepsilon$-approximation of its true expectation with probability at least $1-\delta$, using the same optimal $O(\ln(1/\delta) / \varepsilon^2)$ sample complexity.
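The EUM algorithm itself is simple: maximize the average sampled utility over a grid of linear shares $\alpha \in [0,1]$. The toy binary-effort agent model below is an illustrative assumption used only to produce sample utilities; it also exhibits the non-decreasing expected reward property exploited in the analysis.

```python
import numpy as np

# Empirical Utility Maximization for linear contracts (sketch): the principal
# offers the agent a share alpha of the reward; each sampled agent type works
# iff its effort cost is covered; the principal keeps (1 - alpha) * reward.
rng = np.random.default_rng(0)
n = 2000
costs = rng.uniform(0.05, 0.6, size=n)      # sampled agent types (effort costs)
reward = 1.0                                 # reward when the agent works

def empirical_utility(alpha: float) -> float:
    works = alpha * reward >= costs          # agent best-responds to share alpha
    return ((1.0 - alpha) * reward * works).mean()

grid = np.linspace(0.0, 1.0, 1001)
alpha_hat = grid[np.argmax([empirical_utility(a) for a in grid])]
print(alpha_hat, empirical_utility(alpha_hat))
```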


【4】Accelerating Decentralized Optimization via Overlapping Local Steps
标题:通过重叠局部步骤加速去中心化优化
链接:https://arxiv.org/abs/2601.01493

作者:Yijie Zhou,Shi Pu
摘要:Decentralized optimization has emerged as a critical paradigm for distributed learning, enabling scalable training while preserving data privacy through peer-to-peer collaboration. However, existing methods often suffer from communication bottlenecks due to frequent synchronization between nodes. We present Overlapping Local Decentralized SGD (OLDSGD), a novel approach to accelerate decentralized training by computation-communication overlapping, significantly reducing network idle time. With a deliberately designed update, OLDSGD preserves the same average update as Local SGD while avoiding communication-induced stalls. Theoretically, we establish non-asymptotic convergence rates for smooth non-convex objectives, showing that OLDSGD retains the same iteration complexity as standard Local Decentralized SGD while improving per-iteration runtime. Empirical results demonstrate OLDSGD's consistent improvements in wall-clock time convergence under different levels of communication delays. With minimal modifications to existing frameworks, OLDSGD offers a practical solution for faster decentralized learning without sacrificing theoretical guarantees.


【5】Multi-Subspace Multi-Modal Modeling for Diffusion Models: Estimation, Convergence and Mixture of Experts
标题:扩散模型的多子空间多峰建模:估计、收敛与专家混合
链接:https://arxiv.org/abs/2601.01475

作者:Ruofeng Yang,Yongcan Li,Bo Jiang,Cheng Chen,Shuai Li
摘要:Recently, diffusion models have achieved great performance with a small dataset of size $n$ and a fast optimization process. However, the estimation error of diffusion models suffers from the curse of dimensionality $n^{-1/D}$ with the data dimension $D$. Since images are usually a union of low-dimensional manifolds, current works model the data as a union of linear subspaces with Gaussian latent and achieve a $1/\sqrt{n}$ bound. Though this modeling reflects the multi-manifold property, the Gaussian latent cannot capture the multi-modal property of the latent manifold. To bridge this gap, we propose the mixture subspace of low-rank mixture of Gaussian (MoLR-MoG) modeling, which models the target data as a union of $K$ linear subspaces, where each subspace admits a mixture of Gaussian latent ($n_k$ modes with dimension $d_k$). With this modeling, the corresponding score function naturally has a mixture-of-experts (MoE) structure, captures the multi-modal information, and exhibits nonlinearity. We first conduct real-world experiments to show that the generation results of the MoE-latent MoG NN are much better than those of the MoE-latent Gaussian score. Furthermore, the MoE-latent MoG NN achieves comparable performance with an MoE-latent Unet with $10\times$ the parameters. These results indicate that the MoLR-MoG modeling is reasonable and suitable for real-world data. After that, based on such an MoE-latent MoG score, we provide a $R^4\sqrt{\sum_{k=1}^K n_k}\sqrt{\sum_{k=1}^K n_k d_k}/\sqrt{n}$ estimation error, which escapes the curse of dimensionality by exploiting the data structure. Finally, we study the optimization process and prove the convergence guarantee under the MoLR-MoG modeling. Combined with these results, under a setting close to real-world data, this work explains why diffusion models only require a small training sample and enjoy a fast optimization process to achieve great performance.


【6】Accelerating Monte-Carlo Tree Search with Optimized Posterior Policies
标题:利用优化的后验策略加速蒙特卡洛树搜索
链接:https://arxiv.org/abs/2601.01301

作者:Keith Frankston,Benjamin Howard
备注:11 pages; an efficient implementation is available at https://github.com/bhoward73/rmcts
摘要:We introduce a recursive AlphaZero-style Monte-Carlo tree search algorithm, "RMCTS". The advantage of RMCTS over AlphaZero's MCTS-UCB is speed. In RMCTS, the search tree is explored in a breadth-first manner, so that network inferences naturally occur in large batches. This significantly reduces the GPU latency cost. We find that RMCTS is often more than 40 times faster than MCTS-UCB when searching a single root state, and about 3 times faster when searching a large batch of root states. The recursion in RMCTS is based on computing optimized posterior policies at each game state in the search tree, starting from the leaves and working back up to the root. Here we use the posterior policy explored in "Monte-Carlo tree search as regularized policy optimization" (Grill et al.). Their posterior policy is the unique policy which maximizes the expected reward given estimated action rewards minus a penalty for diverging from the prior policy. The tree explored by RMCTS is not defined in an adaptive manner, as it is in MCTS-UCB. Instead, the RMCTS tree is defined by following prior network policies at each node. This is a disadvantage, but the speedup advantage is more significant, and in practice we find that RMCTS-trained networks match the quality of MCTS-UCB-trained networks in roughly one-third of the training time. We include timing and quality comparisons of RMCTS vs. MCTS-UCB for three games: Connect-4, Dots-and-Boxes, and Othello.
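Schematically, the optimized posterior at a node trades estimated action values $\hat{Q}$ against divergence from the prior network policy. For the KL direction shown below the maximizer has the familiar softmax form; Grill et al.'s exact choice of divergence direction changes the algebra but not the idea.

```latex
% Schematic posterior policy at a node: value of the mixed strategy minus a
% divergence penalty toward the prior (lambda controls greediness).
\bar{\pi}
  \;=\; \operatorname*{arg\,max}_{q \in \Delta(\mathcal{A})}
        \Big[\, q^{\top}\hat{Q} \;-\; \lambda\,\mathrm{KL}\!\big(q \,\Vert\, \pi_{\mathrm{prior}}\big) \Big]
  \quad\Longrightarrow\quad
  \bar{\pi}(a) \;\propto\; \pi_{\mathrm{prior}}(a)\,\exp\!\big(\hat{Q}(a)/\lambda\big).
```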


【7】Discount Model Search for Quality Diversity Optimization in High-Dimensional Measure Spaces
标题:高维度量空间中质量多样性优化的折扣模型搜索
链接:https://arxiv.org/abs/2601.01082

作者:Bryon Tjanaka,Henry Chen,Matthew C. Fontaine,Stefanos Nikolaidis
备注:Source code available at https://github.com/icaros-usc/discount-models
摘要:Quality diversity (QD) optimization searches for a collection of solutions that optimize an objective while attaining diverse outputs of a user-specified, vector-valued measure function. Contemporary QD algorithms focus on low-dimensional measures because high-dimensional measures are prone to distortion, where many solutions found by the QD algorithm map to similar measures. For example, the CMA-MAE algorithm guides measure space exploration with a histogram in measure space that records so-called discount values. However, CMA-MAE stagnates in domains with high-dimensional measure spaces because solutions with similar measures fall into the same histogram cell and thus receive identical discount values. To address these limitations, we propose Discount Model Search (DMS), which guides exploration with a model that provides a smooth, continuous representation of discount values. In high-dimensional measure spaces, this model enables DMS to distinguish between solutions with similar measures and thus continue exploration. We show that DMS facilitates new QD applications by introducing two domains where the measure space is the high-dimensional space of images, which enables users to specify their desired measures by providing a dataset of images rather than hand-designing the measure function. Results in these domains and on high-dimensional benchmarks show that DMS outperforms CMA-MAE and other black-box QD algorithms.


【8】Dichotomous Diffusion Policy Optimization
标题:二分扩散策略优化
链接:https://arxiv.org/abs/2601.00898

作者:Ruiming Liang,Yinan Zheng,Kexin Zheng,Tianyi Tan,Jianxiong Li,Liyuan Mao,Zhihao Wang,Guang Chen,Hangjun Ye,Jingjing Liu,Jinqiao Wang,Xianyuan Zhan
摘要:Diffusion-based policies have gained growing popularity in solving a wide range of decision-making tasks due to their superior expressiveness and controllable generation during inference. However, effectively training large diffusion policies using reinforcement learning (RL) remains challenging. Existing methods either suffer from unstable training due to directly maximizing value objectives, or face computational issues due to relying on crude Gaussian likelihood approximation, which requires a large number of sufficiently small denoising steps. In this work, we propose DIPOLE (Dichotomous diffusion Policy improvement), a novel RL algorithm designed for stable and controllable diffusion policy optimization. We begin by revisiting the KL-regularized objective in RL, which offers a desirable weighted regression objective for diffusion policy extraction, but often struggles to balance greediness and stability. We then formulate a greedified policy regularization scheme, which naturally enables decomposing the optimal policy into a pair of stably learned dichotomous policies: one aims at reward maximization, and the other focuses on reward minimization. Under such a design, optimized actions can be generated by linearly combining the scores of dichotomous policies during inference, thereby enabling flexible control over the level of greediness. Evaluations in offline and offline-to-online RL settings on ExORL and OGBench demonstrate the effectiveness of our approach. We also use DIPOLE to train a large vision-language-action (VLA) model for end-to-end autonomous driving (AD) and evaluate it on the large-scale real-world AD benchmark NAVSIM, highlighting its potential for complex real-world applications.


【9】FedSCAM (Federated Sharpness-Aware Minimization with Clustered Aggregation and Modulation): Scam-resistant SAM for Robust Federated Optimization in Heterogeneous Environments
标题:FedSCAM(具有聚类聚合与调制的联邦锐度感知最小化):用于异构环境中稳健联邦优化的抗欺骗SAM
链接:https://arxiv.org/abs/2601.00853

作者:Sameer Rahil,Zain Abdullah Ahmad,Talha Asif
备注:13 pages, 27 figures
摘要:Federated Learning (FL) enables collaborative model training across decentralized edge devices while preserving data privacy. However, statistical heterogeneity among clients, often manifested as non-IID label distributions, poses significant challenges to convergence and generalization. While Sharpness-Aware Minimization (SAM) has been introduced to FL to seek flatter, more robust minima, existing approaches typically apply a uniform perturbation radius across all clients, ignoring client-specific heterogeneity. In this work, we propose \textbf{FedSCAM} (Federated Sharpness-Aware Minimization with Clustered Aggregation and Modulation), a novel algorithm that dynamically adjusts the SAM perturbation radius and aggregation weights based on client-specific heterogeneity scores. By calculating a heterogeneity metric for each client and modulating the perturbation radius inversely to this score, FedSCAM prevents clients with high variance from destabilizing the global model. Furthermore, we introduce a heterogeneity-aware weighted aggregation mechanism that prioritizes updates from clients that align with the global optimization direction. Extensive experiments on CIFAR-10 and Fashion-MNIST under various degrees of Dirichlet-based label skew demonstrate that FedSCAM achieves competitive performance among state-of-the-art baselines, including FedSAM and FedLESAM, in terms of convergence speed and final test accuracy.
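The two mechanisms are easy to sketch. Below, a client's SAM perturbation radius is modulated inversely by its heterogeneity score, and aggregation weights are a softmax over negative scores; both modulation rules are illustrative assumptions consistent with the abstract, not the paper's exact formulas.

```python
import torch

def sam_perturbation(params, grads, rho_i: float):
    """One client-side SAM ascent step with a heterogeneity-modulated radius."""
    norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
    return [p + rho_i * g / norm for p, g in zip(params, grads)]

def modulated_rho(rho0: float, h_score: float) -> float:
    # Assumed inverse modulation: high-heterogeneity clients get a smaller radius.
    return rho0 / (1.0 + h_score)

def aggregate(updates, h_scores):
    """Heterogeneity-aware weighted aggregation: lower-score (better-aligned)
    clients receive larger weights via a softmax over negative scores."""
    w = torch.softmax(-torch.tensor(h_scores), dim=0)
    return [sum(w[i] * u[j] for i, u in enumerate(updates)) for j in range(len(updates[0]))]

params, grads = [torch.randn(4, 4)], [torch.randn(4, 4)]
perturbed = sam_perturbation(params, grads, modulated_rho(0.05, h_score=1.4))
updates = [[torch.randn(4, 4)] for _ in range(3)]      # per-client update lists
print(perturbed[0].shape, aggregate(updates, [0.2, 1.0, 2.5])[0].shape)
```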


【10】Stochastic Control Methods for Optimization
标题:优化的随机控制方法
链接:https://arxiv.org/abs/2601.01248

作者:Jinniao Qiu
摘要:In this work, we investigate a stochastic control framework for global optimization over both finite-dimensional Euclidean spaces and the Wasserstein space of probability measures. In the Euclidean setting, the original minimization problem is approximated by a family of regularized stochastic control problems; using dynamic programming, we analyze the associated Hamilton--Jacobi--Bellman equations and obtain tractable representations via the Cole--Hopf transform and the Feynman--Kac formula. For optimization over probability measures, we formulate a regularized mean-field control problem characterized by a master equation, and further approximate it by controlled $N$-particle systems. We establish that, as the regularization parameter tends to zero (and as the particle number tends to infinity for the optimization over probability measures), the value of the control problem converges to the global minimum of the original objective. Building on the resulting probabilistic representations, Monte Carlo-based numerical schemes are proposed and numerical experiments are reported to illustrate the practical performance of the methods and to support the theoretical convergence rates.


【11】Fibonacci-Driven Recursive Ensembles: Algorithms, Convergence, and Learning Dynamics
标题:斐波那契驱动的递归集成:算法、收敛与学习动力学
链接:https://arxiv.org/abs/2601.01055

作者:Ernest Fokoué
备注:19 pages
摘要:This paper develops the algorithmic and dynamical foundations of recursive ensemble learning driven by Fibonacci-type update flows. In contrast with classical boosting (Freund and Schapire, 1997; Friedman, 2001), where the ensemble evolves through first-order additive updates, we study second-order recursive architectures in which each predictor depends on its two immediate predecessors. These Fibonacci flows induce a learning dynamic with memory, allowing ensembles to integrate past structure while adapting to new residual information. We introduce a general family of recursive weight-update algorithms encompassing Fibonacci, tribonacci, and higher-order recursions, together with continuous-time limits that yield systems of differential equations governing ensemble evolution. We establish global convergence conditions, spectral stability criteria, and non-asymptotic generalization bounds under Rademacher complexity (Bartlett and Mendelson, 2002) and algorithmic stability analyses. The resulting theory unifies recursive ensembles, structured weighting, and dynamical systems viewpoints in statistical learning. Experiments with kernel ridge regression (Rasmussen and Williams, 2006), spline smoothers (Wahba, 1990), and random Fourier feature models (Rahimi and Recht, 2007) demonstrate that recursive flows consistently improve approximation and generalization beyond static weighting. These results complete the trilogy begun in Papers I and II: from Fibonacci weighting, through geometric weighting theory, to fully dynamical recursive ensemble learning systems.


预测|估计(15篇)

【1】Temporal Kolmogorov-Arnold Networks (T-KAN) for High-Frequency Limit Order Book Forecasting: Efficiency, Interpretability, and Alpha Decay
标题:用于高频限价订单簿预测的时序Kolmogorov-Arnold网络(T-KAN):效率、可解释性与Alpha衰减
链接:https://arxiv.org/abs/2601.02310

作者:Ahmad Makinde
备注:8 pages, 5 figures. Proposes T-KAN architecture for HFT. Achieves 19.1% F1-score improvement on FI-2010 and 132.48% return in cost-adjusted backtests
摘要:High-Frequency trading (HFT) environments are characterised by large volumes of limit order book (LOB) data, which is notoriously noisy and non-linear. Alpha decay represents a significant challenge, with traditional models such as DeepLOB losing predictive power as the time horizon (k) increases. In this paper, using data from the FI-2010 dataset, we introduce Temporal Kolmogorov-Arnold Networks (T-KAN) to replace the fixed, linear weights of standard LSTMs with learnable B-spline activation functions. This allows the model to learn the 'shape' of market signals as opposed to just their magnitude. This resulted in a 19.1% relative improvement in the F1-score at the k = 100 horizon. The efficacy of T-KAN networks cannot be overstated, producing a 132.48% return compared to the -82.76% DeepLOB drawdown under 1.0 bps transaction costs. In addition to this, the T-KAN model proves quite interpretable, with the 'dead-zones' being clearly visible in the splines. The T-KAN architecture is also uniquely optimized for low-latency FPGA implementation via High-Level Synthesis (HLS). The code for the experiments in this project can be found at https://github.com/AhmadMak/Temporal-Kolmogorov-Arnold-Networks-T-KAN-for-High-Frequency-Limit-Order-Book-Forecasting.


【2】Improved Accuracy for Private Continual Cardinality Estimation in Fully Dynamic Streams via Matrix Factorization
标题:通过矩阵分解提高全动态流中隐私连续基数估计的准确性
链接:https://arxiv.org/abs/2601.02257

作者:Joel Daniel Andersson,Palak Jain,Satchit Sivakumar
摘要:We study differentially-private statistics in the fully dynamic continual observation model, where many updates can arrive at each time step and updates to a stream can involve both insertions and deletions of an item. Earlier work (e.g., Jain et al., NeurIPS 2023 for counting distinct elements; Raskhodnikova & Steiner, PODS 2025 for triangle counting with edge updates) reduced the respective cardinality estimation problem to continual counting on the difference stream associated with the true function values on the input stream. In such reductions, a change in the original stream can cause many changes in the difference stream; this poses a challenge for applying private continual counting algorithms to obtain optimal error bounds. We improve the accuracy of several such reductions by studying the associated $\ell_p$-sensitivity vectors of the resulting difference streams and isolating their properties. We demonstrate that our framework gives improved bounds for counting distinct elements, estimating degree histograms, and estimating triangle counts (under a slightly relaxed privacy model), thus offering a general approach to private continual cardinality estimation in streaming settings. Our improved accuracy stems from a tight analysis of known factorization mechanisms for the counting matrix in this setting; the key technical challenge is arguing that one can use state-of-the-art factorizations for sensitivity vector sets with the properties we isolate. Empirically and analytically, we demonstrate that our improved error bounds offer a substantial improvement in accuracy for cardinality estimation problems over a large range of parameters.
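For context, the factorization mechanisms being analyzed answer the running counts $Ax$ of a difference stream $x$, with $A$ the lower-triangular all-ones matrix, by privatizing one factor. Schematically:

```latex
% Matrix-factorization mechanism for private continual counting (schematic):
% factor A = L R, add Gaussian noise calibrated to the l2-sensitivity of R x
% over neighboring streams, and recover the counts with L; the per-time error
% then scales with the row norms of L.
\widehat{A x} \;=\; L\,(R x + z), \qquad z \sim \mathcal{N}(0, \sigma^2 I), \qquad
\sigma \;\propto\; \max_{x \sim x'} \big\lVert R (x - x') \big\rVert_2, \qquad
\mathrm{err}_t \;\propto\; \sigma\, \lVert L_{t,\cdot} \rVert_2 .
```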


【3】Edge-aware GAT-based protein binding site prediction
标题:边缘感知的基于GAT的蛋白质结合位点预测
链接:https://arxiv.org/abs/2601.02138

作者:Weisen Yang,Hanqing Zhang,Wangren Qiu,Xuan Xiao,Weizhong Lin
备注:24 pages, 10 figures, 6 tables
摘要:Accurate identification of protein binding sites is crucial for understanding biomolecular interaction mechanisms and for the rational design of drug targets. Traditional predictive methods often struggle to balance prediction accuracy with computational efficiency when capturing complex spatial conformations. To address this challenge, we propose an Edge-aware Graph Attention Network (Edge-aware GAT) model for the fine-grained prediction of binding sites across various biomolecules, including proteins, DNA/RNA, ions, ligands, and lipids. Our method constructs atom-level graphs and integrates multidimensional structural features, including geometric descriptors, DSSP-derived secondary structure, and relative solvent accessibility (RSA), to generate spatially aware embedding vectors. By incorporating interatomic distances and directional vectors as edge features within the attention mechanism, the model significantly enhances its representation capacity. On benchmark datasets, our model achieves an ROC-AUC of 0.93 for protein-protein binding site prediction, outperforming several state-of-the-art methods. The use of directional tensor propagation and residue-level attention pooling further improves both binding site localization and the capture of local structural details. Visualizations using PyMOL confirm the model's practical utility and interpretability. To facilitate community access and application, we have deployed a publicly accessible web server at http://119.45.201.89:5000/. In summary, our approach offers a novel and efficient solution that balances prediction accuracy, generalization, and interpretability for identifying functional sites in proteins.
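下面是"将原子间距离与方向向量作为边特征注入注意力打分"的一个最小Python示意;单头设置、维度与 EdgeAwareAttention 等命名均为示意性假设,并非论文原始实现。

import torch
import torch.nn as nn

class EdgeAwareAttention(nn.Module):
    # 单头GAT式注意力:打分同时看到两端结点特征与边特征(距离 + 单位方向向量)
    def __init__(self, d_node, d_out):
        super().__init__()
        self.proj = nn.Linear(d_node, d_out, bias=False)
        self.att = nn.Linear(2 * d_out + 4, 1, bias=False)  # 4 = 距离1维 + 方向3维
        self.act = nn.LeakyReLU(0.2)

    def forward(self, h, pos, edges):
        # h: [N, d_node] 结点特征; pos: [N, 3] 原子坐标; edges: [E, 2] 的 (src, dst) 长整型索引
        src, dst = edges[:, 0], edges[:, 1]
        z = self.proj(h)
        diff = pos[dst] - pos[src]
        dist = diff.norm(dim=-1, keepdim=True)
        direction = diff / (dist + 1e-8)
        feats = torch.cat([z[src], z[dst], dist, direction], dim=-1)
        e = self.act(self.att(feats)).squeeze(-1)
        alpha = torch.zeros_like(e)
        for n in dst.unique():            # 对每个目标结点的入边做softmax(示意写法,未向量化)
            m = dst == n
            alpha[m] = torch.softmax(e[m], dim=0)
        return torch.zeros_like(z).index_add_(0, dst, alpha.unsqueeze(-1) * z[src])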


【4】Horizon Activation Mapping for Neural Networks in Time Series Forecasting
标题:时间序列预测中神经网络的预测时域激活映射
链接:https://arxiv.org/abs/2601.02094

作者:Hans Krupakar,V A Kandappan
摘要:Neural networks for time series forecasting have relied on error metrics and architecture-specific interpretability approaches for model selection that don't apply across models of different families. To interpret forecasting models agnostic to the types of layers across state-of-the-art model families, we introduce Horizon Activation Mapping (HAM), a visual interpretability technique inspired by grad-CAM that uses gradient norm averages to study the horizon's subseries where grad-CAM studies attention maps over image data. We introduce causal and anti-causal modes to calculate gradient update norm averages across subseries at every timestep and lines of proportionality signifying uniform distributions of the norm averages. Optimization landscape studies with respect to changes in batch sizes, early stopping, train-val-test splits, univariate forecasting and dropouts are studied with respect to performances and subseries in HAM. Interestingly, batch size based differences in activities seem to indicate potential for existence of an exponential approximation across them per epoch relative to each other. Multivariate forecasting models including MLP-based CycleNet, N-Linear, N-HITS, self attention-based FEDformer, Pyraformer, SSM-based SpaceTime and diffusion-based Multi-Resolution DDPM over different horizon sizes trained over the ETTm2 dataset are used for HAM plots in this study. NHITS' neural approximation theorem and SpaceTime's exponential autoregressive activities have been attributed to trends in HAM plots over their training, validation and test sets. In general, HAM can be used for granular model selection, validation set choices and comparisons across different neural network model families.


【5】A Defect is Being Born: How Close Are We? A Time Sensitive Forecasting Approach
标题:一个缺陷正在诞生:我们有多接近?一种时间敏感的预测方法
链接:https://arxiv.org/abs/2601.01921

作者:Mikel Robredo,Matteo Esposito,Fabio Palomba,Rafael Peñaloza,Valentina Lenarduzzi
备注:ACCEPTED REGISTERED REPORT AT SANER (CORE A*) 2026
摘要:Background. Defect prediction has been a highly active topic among researchers in the Empirical Software Engineering field. Previous literature has successfully achieved the most accurate prediction of an incoming fault and identified the features and anomalies that precede it through just-in-time prediction. As software systems evolve continuously, there is a growing need for time-sensitive methods capable of forecasting defects before they manifest.   Aim. Our study seeks to explore the effectiveness of time-sensitive techniques for defect forecasting. Moreover, we aim to investigate the early indicators that precede the occurrence of a defect.   Method. We will train multiple time-sensitive forecasting techniques to forecast the future bug density of a software project, as well as identify the early symptoms preceding the occurrence of a defect.   Expected results. Our expected results are translated into empirical evidence on the effectiveness of our approach for early estimation of bug proneness.


【6】Enhanced Multi-model Online Conformal Prediction
标题:增强型多模型在线保形预测
链接:https://arxiv.org/abs/2601.01692

作者:Erfan Hajihashemi,Yanning Shen
摘要:Conformal prediction is a framework for uncertainty quantification that constructs prediction sets for previously unseen data, guaranteeing coverage of the true label with a specified probability. However, the efficiency of these prediction sets, measured by their size, depends on the choice of the underlying learning model. Relying on a single fixed model may lead to suboptimal performance in online environments, as a single model may not consistently perform well across all time steps. To mitigate this, prior work has explored selecting a model from a set of candidates. However, this approach becomes computationally expensive as the number of candidate models increases. Moreover, poorly performing models in the set may also hinder the effectiveness. To tackle this challenge, this work develops a novel multi-model online conformal prediction algorithm that reduces computational complexity and improves prediction efficiency. At each time step, a bipartite graph is generated to identify a subset of effective models, from which a model is selected to construct the prediction set. Experiments demonstrate that our method outperforms existing multi-model conformal prediction techniques in terms of both prediction set size and computational efficiency.
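下面给出在线共形预测中"由校准分数分位数构造预测集"这一基本步骤的Python示意,采用标准的 1 − p(真实类别) 非一致性分数;玩具数据流与阈值细节均为示意,不涉及本文的二部图选模机制。

import numpy as np

def conformal_set(probs, calib_scores, alpha=0.1):
    # 由校准分数的 (1-alpha) 阶修正分位数给出阈值,收集所有得分不超过阈值的类别
    n = len(calib_scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(calib_scores, level, method="higher")
    return np.where(1.0 - probs <= q)[0]

rng = np.random.default_rng(0)
stream = [(rng.dirichlet(np.ones(5)), int(rng.integers(5))) for _ in range(200)]
calib_scores = []
for probs_t, y_t in stream:
    if len(calib_scores) > 20:
        pred_set = conformal_set(probs_t, np.asarray(calib_scores))
    calib_scores.append(1.0 - probs_t[y_t])  # 真实标签揭晓后在线更新校准分数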


【7】Reliable Grid Forecasting: State Space Models for Safety-Critical Energy Systems
标题:可靠的电网预测:安全关键能源系统的状态空间模型
链接:https://arxiv.org/abs/2601.01410

作者:Jisoo Lee,Sunki Hong
备注:24 pages, 8 figures, 8 tables
摘要:Accurate grid load forecasting is safety-critical: under-predictions risk supply shortfalls, while symmetric error metrics mask this operational asymmetry. We introduce a grid-specific evaluation framework--Asymmetric MAPE, Under-Prediction Rate, and Reserve Margin--that directly measures operational risk rather than statistical accuracy alone.   Using this framework, we conduct a systematic evaluation of Mamba-based State Space Models for California grid forecasting on a weather-aligned CAISO TAC-area dataset spanning Nov 2023--Nov 2025 (84,498 hourly records across 5 transmission areas). Our analysis reveals that standard accuracy metrics are poor proxies for operational safety: models with identical MAPE can require vastly different reserve margins.   We demonstrate that forecast errors are weakly but significantly associated with temperature (r = 0.16, p < 10^{-16}), motivating weather-aware modeling rather than loss function modification alone. The S-Mamba model achieves the lowest Reserve_{99.5}% margin (14.12%) compared to 16.66% for iTransformer, demonstrating superior forecast reliability under a 99.5th-percentile tail-risk reserve proxy.
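摘要未给出三个面向运行风险指标的公式,下面按其字面含义给出一种合理的Python重构;加权系数与分位数口径均为假设,并非论文原始定义。

import numpy as np

def under_prediction_rate(y, yhat):
    return float(np.mean(yhat < y))          # 低估(供给缺口风险)时刻占比

def asymmetric_mape(y, yhat, w_under=2.0):
    err = (yhat - y) / y
    w = np.where(err < 0, w_under, 1.0)      # 假设:低估一侧的误差被加重惩罚
    return float(np.mean(w * np.abs(err)))

def reserve_margin(y, yhat, q=0.995):
    # 使 yhat*(1+m) 在 q 比例的小时内覆盖真实负荷所需的最小统一裕度 m
    shortfall = (y - yhat) / yhat
    return float(np.quantile(shortfall, q))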


【8】Data Complexity-aware Deep Model Performance Forecasting
标题:数据复杂性感知深度模型性能预测
链接:https://arxiv.org/abs/2601.01383

作者:Yen-Chia Chen,Hsing-Kuo Pao,Hanjuan Huang
备注:12 pages, 12 figures
摘要:Deep learning models are widely used across computer vision and other domains. When working on the model induction, selecting the right architecture for a given dataset often relies on repetitive trial-and-error procedures. This procedure is time-consuming, resource-intensive, and difficult to automate. While previous work has explored performance prediction using partial training or complex simulations, these methods often require significant computational overhead or lack generalizability. In this work, we propose an alternative approach: a lightweight, two-stage framework that can estimate model performance before training given the understanding of the dataset and the focused deep model structures. The first stage predicts a baseline based on the analysis of some measurable properties of the dataset, while the second stage adjusts the estimation with additional information on the model's architectural and hyperparameter details. The setup allows the framework to generalize across datasets and model types. Moreover, we find that some of the underlying features used for prediction - such as dataset variance - can offer practical guidance for model selection, and can serve as early indicators of data quality. As a result, the framework can be used not only to forecast model performance, but also to guide architecture choices, inform necessary preprocessing procedures, and detect potentially problematic datasets before training begins.


【9】MentalGame: Predicting Personality-Job Fitness for Software Developers Using Multi-Genre Games and Machine Learning Approaches
标题:MentalGame:使用多类型游戏与机器学习方法预测软件开发人员的人格-岗位匹配度
链接:https://arxiv.org/abs/2601.01206

作者:Soroush Elyasi,Arya VarastehNezhad,Fattaneh Taghiyareh
摘要:Personality assessment in career guidance and personnel selection traditionally relies on self-report questionnaires, which are susceptible to response bias, fatigue, and intentional distortion. Game-based assessment offers a promising alternative by capturing implicit behavioral signals during gameplay. This study proposes a multi-genre serious-game framework combined with machine-learning techniques to predict suitability for software development roles. Developer-relevant personality and behavioral traits were identified through a systematic literature review and an empirical study of professional software engineers. A custom mobile game was designed to elicit behaviors related to problem solving, planning, adaptability, persistence, time management, and information seeking. Fine-grained gameplay event data were collected and analyzed using a two-phase modeling strategy where suitability was predicted exclusively from gameplay-derived behavioral features. Results show that our model achieved up to 97% precision and 94% accuracy. Behavioral analysis revealed that proper candidates exhibited distinct gameplay patterns, such as more wins in puzzle-based games, more side challenges, navigating menus more frequently, and exhibiting fewer pauses, retries, and surrender actions. These findings demonstrate that implicit behavioral traces captured during gameplay are promising in predicting software-development suitability without explicit personality testing, supporting serious games as a scalable, engaging, and less biased alternative for career assessment.


【10】MODE: Efficient Time Series Prediction with Mamba Enhanced by Low-Rank Neural ODEs
标题:MODE:以低秩神经ODE增强Mamba的高效时间序列预测
链接:https://arxiv.org/abs/2601.00920

作者:Xingsheng Chen,Regina Zhang,Bo Gao,Xingwei He,Xiaofeng Liu,Pietro Lio,Kwok-Yan Lam,Siu-Ming Yiu
备注:12 pages, 6 tables
摘要:Time series prediction plays a pivotal role across diverse domains such as finance, healthcare, energy systems, and environmental modeling. However, existing approaches often struggle to balance efficiency, scalability, and accuracy, particularly when handling long-range dependencies and irregularly sampled data. To address these challenges, we propose MODE, a unified framework that integrates Low-Rank Neural Ordinary Differential Equations (Neural ODEs) with an Enhanced Mamba architecture. As illustrated in our framework, the input sequence is first transformed by a Linear Tokenization Layer and then processed through multiple Mamba Encoder blocks, each equipped with an Enhanced Mamba Layer that employs Causal Convolution, SiLU activation, and a Low-Rank Neural ODE enhancement to efficiently capture temporal dynamics. This low-rank formulation reduces computational overhead while maintaining expressive power. Furthermore, a segmented selective scanning mechanism, inspired by pseudo-ODE dynamics, adaptively focuses on salient subsequences to improve scalability and long-range sequence modeling. Extensive experiments on benchmark datasets demonstrate that MODE surpasses existing baselines in both predictive accuracy and computational efficiency. Overall, our contributions include: (1) a unified and efficient architecture for long-term time series modeling, (2) integration of Mamba's selective scanning with low-rank Neural ODEs for enhanced temporal representation, and (3) substantial improvements in efficiency and scalability enabled by low-rank approximation and dynamic selective scanning.
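摘要中"低秩神经ODE增强"的核心思想可用如下Python示意表达:向量场被约束为秩 r 的因子分解形式,再以若干显式Euler步积分;rank、steps、dt 等均为示意参数,并非论文设置。

import torch
import torch.nn as nn

class LowRankODE(nn.Module):
    # dh/dt = U tanh(V h):秩 r 的向量场,参数量从 O(d^2) 降到 O(d r)
    def __init__(self, dim, rank=8, steps=4, dt=0.25):
        super().__init__()
        self.U = nn.Linear(rank, dim, bias=False)
        self.V = nn.Linear(dim, rank, bias=False)
        self.steps, self.dt = steps, dt

    def forward(self, h):
        for _ in range(self.steps):
            h = h + self.dt * self.U(torch.tanh(self.V(h)))
        return h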


【11】Universal Battery Degradation Forecasting Driven by Foundation Model Across Diverse Chemistries and Conditions
标题:基础模型驱动的跨不同化学体系与工况的通用电池退化预测
链接:https://arxiv.org/abs/2601.00862

作者:Joey Chan,Huan Wang,Haoyu Pan,Wei Wu,Zirong Wang,Zhen Chen,Ershun Pan,Min Xie,Lifeng Xi
备注:Due to space limitations, the open-source method for supporting materials is currently under discussion
摘要:Accurate forecasting of battery capacity fade is essential for the safety, reliability, and long-term efficiency of energy storage systems. However, the strong heterogeneity across cell chemistries, form factors, and operating conditions makes it difficult to build a single model that generalizes beyond its training domain. This work proposes a unified capacity forecasting framework that maintains robust performance across diverse chemistries and usage scenarios. We curate 20 public aging datasets into a large-scale corpus covering 1,704 cells and 3,961,195 charge-discharge cycle segments, spanning temperatures from $-5\,^{\circ}\mathrm{C}$ to $45\,^{\circ}\mathrm{C}$, multiple C-rates, and application-oriented profiles such as fast charging and partial cycling. On this corpus, we adopt a Time-Series Foundation Model (TSFM) backbone and apply parameter-efficient Low-Rank Adaptation (LoRA) together with physics-guided contrastive representation learning to capture shared degradation patterns. Experiments on both seen and deliberately held-out unseen datasets show that a single unified model achieves competitive or superior accuracy compared with strong per-dataset baselines, while retaining stable performance on chemistries, capacity scales, and operating conditions excluded from training. These results demonstrate the potential of TSFM-based architectures as a scalable and transferable solution for capacity degradation forecasting in real battery management systems.
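摘要提到的参数高效LoRA(低秩适配)本身有标准形式,下面给出最小Python示意:冻结预训练权重,只训练低秩增量 BA;r、alpha 为示意超参,与论文设置无关。

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # y = x W^T + (alpha/r) * x A^T B^T,其中 W 冻结,A、B 可训练且 B 初始为零
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

由于 B 初始为零,适配开始时模型行为与预训练骨干完全一致。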


【12】Efficient temporal prediction of compressible flows in irregular domains using Fourier neural operators
标题:使用傅里叶神经算子对不规则区域中可压缩流动进行高效时间预测
链接:https://arxiv.org/abs/2601.01922

作者:Yifan Nie,Qiaoxin Li
备注:18 pages, 15 figures
摘要:This paper investigates the temporal evolution of high-speed compressible fluids in irregular flow fields using the Fourier Neural Operator (FNO). We reconstruct the irregular flow field point set into sequential format compatible with FNO input requirements, and then embed temporal bundling technique within a recurrent neural network (RNN) for multi-step prediction. We further employ a composite loss function to balance errors across different physical quantities. Experiments are conducted on three different types of irregular flow fields, including orthogonal and non-orthogonal grid configurations. Then we comprehensively analyze the physical component loss curves, flow field visualizations, and physical profiles. Results demonstrate that our approach significantly surpasses traditional numerical methods in computational efficiency while achieving high accuracy, with maximum relative $L_2$ errors of (0.78, 0.57, 0.35)% for ($p$, $T$, $\mathbf{u}$) respectively. This verifies that the method can efficiently and accurately simulate the temporal evolution of high-speed compressible flows in irregular domains.


【13】UniCrop: A Universal, Multi-Source Data Engineering Pipeline for Scalable Crop Yield Prediction
标题:UniCrop:用于可扩展作物产量预测的通用、多源数据工程管道
链接:https://arxiv.org/abs/2601.01655

作者:Emiliya Khidirova,Oktay Karakuş
摘要:Accurate crop yield prediction relies on diverse data streams, including satellite, meteorological, soil, and topographic information. However, despite rapid advances in machine learning, existing approaches remain crop- or region-specific and require data engineering efforts. This limits scalability, reproducibility, and operational deployment. This study introduces UniCrop, a universal and reusable data pipeline designed to automate the acquisition, cleaning, harmonisation, and engineering of multi-source environmental data for crop yield prediction. For any given location, crop type, and temporal window, UniCrop automatically retrieves, harmonises, and engineers over 200 environmental variables (Sentinel-1/2, MODIS, ERA5-Land, NASA POWER, SoilGrids, and SRTM), reducing them to a compact, analysis-ready feature set utilising a structured feature reduction workflow with minimum redundancy maximum relevance (mRMR). To validate, UniCrop was applied to a rice yield dataset comprising 557 field observations. Using only the selected 15 features, four baseline machine learning models (LightGBM, Random Forest, Support Vector Regression, and Elastic Net) were trained. LightGBM achieved the best single-model performance (RMSE = 465.1 kg/ha, $R^2 = 0.6576$), while a constrained ensemble of all baselines further improved accuracy (RMSE = 463.2 kg/ha, $R^2 = 0.6604$). UniCrop contributes a scalable and transparent data-engineering framework that addresses the primary bottleneck in operational crop yield modelling: the preparation of consistent and harmonised multi-source data. By decoupling data specification from implementation and supporting any crop, region, and time frame through simple configuration updates, UniCrop provides a practical foundation for scalable agricultural analytics. The code and implementation documentation are shared in https://github.com/CoDIS-Lab/UniCrop.


【14】Beyond Demand Estimation: Consumer Surplus Evaluation via Cumulative Propensity Weights
标题:超越需求估计:通过累积倾向权重进行消费者剩余评估
链接:https://arxiv.org/abs/2601.01029

作者:Zeyu Bian,Max Biggs,Ruijiang Gao,Zhengling Qi
备注:74 pages
摘要:This paper develops a practical framework for using observational data to audit the consumer surplus effects of AI-driven decisions, specifically in targeted pricing and algorithmic lending. Traditional approaches first estimate demand functions and then integrate to compute consumer surplus, but these methods can be challenging to implement in practice due to model misspecification in parametric demand forms and the large data requirements and slow convergence of flexible nonparametric or machine learning approaches. Instead, we exploit the randomness inherent in modern algorithmic pricing, arising from the need to balance exploration and exploitation, and introduce an estimator that avoids explicit estimation and numerical integration of the demand function. Each observed purchase outcome at a randomized price is an unbiased estimate of demand and by carefully reweighting purchase outcomes using novel cumulative propensity weights (CPW), we are able to reconstruct the integral. Building on this idea, we introduce a doubly robust variant named the augmented cumulative propensity weighting (ACPW) estimator that only requires one of either the demand model or the historical pricing policy distribution to be correctly specified. Furthermore, this approach facilitates the use of flexible machine learning methods for estimating consumer surplus, since it achieves fast convergence rates by incorporating an estimate of demand, even when the machine learning estimate has slower convergence rates. Neither of these estimators is a standard application of off-policy evaluation techniques as the target estimand, consumer surplus, is unobserved. To address fairness, we extend this framework to an inequality-aware surplus measure, allowing regulators and firms to quantify the profit-equity trade-off. Finally, we validate our methods through comprehensive numerical studies.
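论文的CPW/ACPW估计量细节见原文;为说明"用随机化价格下的购买结果重加权来重构需求积分"这一思路,下面给出普通逆倾向加权(IPW)版本的Python玩具验证,线性需求 D(p)=1−p 与均匀定价策略均为人工设定。

import numpy as np

def surplus_ipw(prices, purchases, p_star, density):
    # 用 E[ 1{p >= p_star} * y / f(p) ] 重构 ∫_{p*}^{pmax} D(p) dp
    w = (prices >= p_star) / density(prices)
    return float(np.mean(w * purchases))

rng = np.random.default_rng(0)
p = rng.uniform(0, 1, 200_000)                            # 历史随机化定价
y = (rng.uniform(0, 1, p.size) < (1 - p)).astype(float)   # 伯努利购买结果,无偏估计需求
est = surplus_ipw(p, y, 0.5, lambda q: np.ones_like(q))   # 应接近真值 ∫_{0.5}^{1}(1-p)dp = 0.125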


【15】Dynamic Accuracy Estimation in a Wi-Fi-based Positioning System
标题:基于Wi-Fi的定位系统中的动态精度估计
链接:https://arxiv.org/abs/2601.00999

作者:Marcin Kolakowski,Vitomir Djaja-Josko
备注:Originally presented at 2025 33rd Telecommunications Forum (TELFOR), Belgrade, Serbia
摘要:The paper presents a concept of a dynamic accuracy estimation method, in which the localization errors are derived based on the measurement results used by the positioning algorithm. The concept was verified experimentally in a Wi-Fi based indoor positioning system, where several regression methods were tested (linear regression, random forest, k-nearest neighbors, and neural networks). The highest positioning error estimation accuracy was achieved for random forest regression, with a mean absolute error of 0.72 m.


其他神经网络|深度学习|模型|建模(30篇)

【1】Game of Coding: Coding Theory in the Presence of Rational Adversaries, Motivated by Decentralized Machine Learning
标题:编码博弈:存在理性对手时的编码理论,以去中心化机器学习为动机
链接:https://arxiv.org/abs/2601.02313

作者:Hanzaleh Akbari Nodehi,Viveck R. Cadambe,Mohammad Ali Maddah-Ali
摘要:Coding theory plays a crucial role in enabling reliable communication, storage, and computation. Classical approaches assume a worst-case adversarial model and ensure error correction and data recovery only when the number of honest nodes exceeds the number of adversarial ones by some margin. However, in some emerging decentralized applications, particularly in decentralized machine learning (DeML), participating nodes are rewarded for accepted contributions. This incentive structure naturally gives rise to rational adversaries who act strategically rather than behaving in purely malicious ways.   In this paper, we first motivate the need for coding in the presence of rational adversaries, particularly in the context of outsourced computation in decentralized systems. We contrast this need with existing approaches and highlight their limitations. We then introduce the game of coding, a novel game-theoretic framework that extends coding theory to trust-minimized settings where honest nodes are not in the majority. Focusing on repetition coding, we highlight two key features of this framework: (1) the ability to achieve a non-zero probability of data recovery even when adversarial nodes are in the majority, and (2) Sybil resistance, i.e., the equilibrium remains unchanged even as the number of adversarial nodes increases. Finally, we explore scenarios in which the adversary's strategy is unknown and outline several open problems for future research.


【2】Neuro-Channel Networks: A Multiplication-Free Architecture by Biological Signal Transmission
标题:神经通道网络:基于生物信号传输的免乘法架构
链接:https://arxiv.org/abs/2601.02253

作者:Emrah Mete,Emin Erkan Korkmaz
备注:9 pages, 4 figures
摘要:The rapid proliferation of Deep Learning is increasingly constrained by its heavy reliance on high-performance hardware, particularly Graphics Processing Units (GPUs). These specialized accelerators are not only prohibitively expensive and energy-intensive but also suffer from significant supply scarcity, limiting the ubiquity of Artificial Intelligence (AI) deployment on edge devices. The core of this inefficiency stems from the standard artificial perceptron's dependence on intensive matrix multiplications. However, biological nervous systems achieve unparalleled efficiency without such arithmetic intensity; synaptic signal transmission is regulated by physical ion channel limits and chemical neurotransmitter levels rather than a process that can be analogous to arithmetic multiplication. Inspired by this biological mechanism, we propose Neuro-Channel Networks (NCN), a novel multiplication-free architecture designed to decouple AI from expensive hardware dependencies. In our model, weights are replaced with Channel Widths that physically limit the signal magnitude, while a secondary parameter acts as a Neurotransmitter to regulate Signal Transmission based on sign logic. The forward pass relies exclusively on addition, subtraction, and bitwise operations (minimum, sign), eliminating floating-point multiplication entirely. In this proof-of-concept study, we demonstrate that NCNs can solve non-linearly separable problems like XOR and the Majority function with 100% accuracy using standard backpropagation, proving their capability to form complex decision boundaries without multiplicative weights. This architecture offers a highly efficient alternative for next-generation neuromorphic hardware, paving the way for running complex models on commodity CPUs or ultra-low-power chips without relying on costly GPU clusters.
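按摘要的描述,前向传播只依赖加、减与按位的取最小、取符号运算;下面是对单个"神经通道"突触的一种可能解读的Python示意,widths、transmitters 的含义与组合方式均为本文示意所作的假设,并非论文原始模型。

import numpy as np

def ncn_neuron(x, widths, transmitters):
    # 通道宽度物理性地限制信号幅值:|x| 与 width 取最小
    mag = np.minimum(np.abs(x), widths)
    # 神经递质以符号逻辑门控传输方向;±1 的符号翻转在硬件上是位操作而非浮点乘法
    s = np.sign(transmitters) * np.sign(x)
    return float(np.sum(s * mag))

print(ncn_neuron(np.array([0.9, -2.0]), np.array([1.0, 0.5]), np.array([1.0, -1.0])))  # 1.4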


【3】Mind the Gap: Continuous Magnification Sampling for Pathology Foundation Models
标题:注意差距:病理学基础模型的连续放大倍率采样
链接:https://arxiv.org/abs/2601.02198

作者:Alexander Möllers,Julius Hense,Florian Schulz,Timo Milbich,Maximilian Alber,Lukas Ruff
摘要 :In histopathology, pathologists examine both tissue architecture at low magnification and fine-grained morphology at high magnification. Yet, the performance of pathology foundation models across magnifications and the effect of magnification sampling during training remain poorly understood. We model magnification sampling as a multi-source domain adaptation problem and develop a simple theoretical framework that reveals systematic trade-offs between sampling strategies. We show that the widely used discrete uniform sampling of magnifications (0.25, 0.5, 1.0, 2.0 mpp) leads to degradation at intermediate magnifications. We introduce continuous magnification sampling, which removes gaps in magnification coverage while preserving performance at standard scales. Further, we derive sampling distributions that optimize representation quality across magnification scales. To evaluate these strategies, we introduce two new benchmarks (TCGA-MS, BRACS-MS) with appropriate metrics. Our experiments show that continuous sampling substantially improves over discrete sampling at intermediate magnifications, with gains of up to 4 percentage points in balanced classification accuracy, and that optimized distributions can further improve performance. Finally, we evaluate current histopathology foundation models, finding that magnification is a primary driver of performance variation across models. Our work paves the way towards future pathology foundation models that perform reliably across magnifications.


【4】Prototype-Based Learning for Healthcare: A Demonstration of Interpretable AI
标题:基于原型的医疗保健学习:可解释人工智能的演示
链接:https://arxiv.org/abs/2601.02106

作者:Ashish Rana,Ammar Shaker,Sascha Saralajew,Takashi Suzuki,Kosuke Yasuda,Shintaro Kato,Toshikazu Wada,Toshiyuki Fujikawa,Toru Kikutsuji
备注:Accepted to the Demo Track at the IEEE International Conference on Data Mining (ICDM) 2025, where it received the Best Demo Award
摘要:Despite recent advances in machine learning and explainable AI, a gap remains in personalized preventive healthcare: predictions, interventions, and recommendations should be both understandable and verifiable for all stakeholders in the healthcare sector. We present a demonstration of how prototype-based learning can address these needs. Our proposed framework, ProtoPal, features both front- and back-end modes; it achieves superior quantitative performance while also providing an intuitive presentation of interventions and their simulated outcomes.


【5】LION-DG: Layer-Informed Initialization with Deep Gradient Protocols for Accelerated Neural Network Training
标题:LION-DG:用于加速神经网络训练的层级感知初始化与深度梯度协议
链接:https://arxiv.org/abs/2601.02105

作者:Hyunjun Kim
摘要:Weight initialization remains decisive for neural network optimization, yet existing methods are largely layer-agnostic. We study initialization for deeply-supervised architectures with auxiliary classifiers, where untrained auxiliary heads can destabilize early training through gradient interference.   We propose LION-DG, a layer-informed initialization that zero-initializes auxiliary classifier heads while applying standard He-initialization to the backbone. We prove that this implements Gradient Awakening: auxiliary gradients are exactly zero at initialization, then phase in naturally as weights grow -- providing an implicit warmup without hyperparameters.   Experiments on CIFAR-10 and CIFAR-100 with DenseNet-DS and ResNet-DS architectures demonstrate: (1) DenseNet-DS: +8.3% faster convergence on CIFAR-10 with comparable accuracy, (2) Hybrid approach: Combining LSUV with LION-DG achieves best accuracy (81.92% on CIFAR-10), (3) ResNet-DS: Positive speedup on CIFAR-100 (+11.3%) with side-tap auxiliary design.   We identify architecture-specific trade-offs and provide clear guidelines for practitioners. LION-DG is simple, requires zero hyperparameters, and adds no computational overhead.
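摘要的核心操作可直接写成如下Python示意:骨干用He初始化,辅助分类头整体置零,使其回传到骨干的梯度在第0步恰为零、随头部权重增长自然"苏醒";按名称匹配辅助头(aux_head_names)是示意性假设。

import torch.nn as nn

def lion_dg_init(model, aux_head_names=("aux",)):
    for name, m in model.named_modules():
        if not isinstance(m, (nn.Linear, nn.Conv2d)):
            continue
        if any(k in name for k in aux_head_names):
            nn.init.zeros_(m.weight)              # 辅助头置零:初始不干扰骨干梯度
            if m.bias is not None:
                nn.init.zeros_(m.bias)
        else:
            nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
            if m.bias is not None:
                nn.init.zeros_(m.bias)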


【6】The Homogeneity Trap: Spectral Collapse in Doubly-Stochastic Deep Networks
标题:同质性陷阱:双随机深度网络中的谱崩溃
链接:https://arxiv.org/abs/2601.02080

作者:Yizhi Liu
摘要:Doubly-stochastic matrices (DSM) are increasingly utilized in structure-preserving deep architectures -- such as Optimal Transport layers and Sinkhorn-based attention -- to enforce numerical stability and probabilistic interpretability. In this work, we identify a critical spectral degradation phenomenon inherent to these constraints, termed the Homogeneity Trap. We demonstrate that the maximum-entropy bias, typical of Sinkhorn-based projections, drives the mixing operator towards the uniform barycenter, thereby suppressing the subdominant singular value $\sigma_2$ and filtering out high-frequency feature components. We derive a spectral bound linking $\sigma_2$ to the network's effective depth, showing that high-entropy constraints restrict feature transformation to a shallow effective receptive field. Furthermore, we formally demonstrate that Layer Normalization fails to mitigate this collapse in noise-dominated regimes; specifically, when spectral filtering degrades the Signal-to-Noise Ratio (SNR) below a critical threshold, geometric structure is irreversibly lost to noise-induced orthogonal collapse. Our findings highlight a fundamental trade-off between entropic stability and spectral expressivity in DSM-constrained networks.
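"高熵约束压制次主奇异值"可以用几行Python直接观察:温度越高(矩阵越接近均匀),Sinkhorn投影后的双随机矩阵 $\sigma_2$ 越小;64×64 的规模与迭代次数均为示意设置,与论文实验无关。

import numpy as np

def sinkhorn(K, iters=200):
    for _ in range(iters):
        K = K / K.sum(1, keepdims=True)   # 行归一
        K = K / K.sum(0, keepdims=True)   # 列归一
    return K

rng = np.random.default_rng(0)
G = rng.normal(size=(64, 64))
for tau in [2.0, 0.5, 0.1]:               # tau 越大,熵越高
    P = sinkhorn(np.exp(G / tau))
    s = np.linalg.svd(P, compute_uv=False)
    print(f"tau={tau:4.1f}  sigma_1={s[0]:.3f}  sigma_2={s[1]:.3f}")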


【7】Explore the Ideology of Deep Learning in ENSO Forecasts
标题:探索ENSO预测中的深度学习意识形态
链接:https://arxiv.org/abs/2601.02050

作者:Yanhai Gan,Yipeng Chen,Ning Li,Xingguo Liu,Junyu Dong,Xianyao Chen
备注:5 figures. Code available at https://github.com/liuxingguo9349/pptv-enso-env
摘要:The El Niño-Southern Oscillation (ENSO) exerts profound influence on global climate variability, yet its prediction remains a grand challenge. Recent advances in deep learning have significantly improved forecasting skill, but the opacity of these models hampers scientific trust and operational deployment. Here, we introduce a mathematically grounded interpretability framework based on bounded variation functions. By rescuing the "dead" neurons from the saturation zone of the activation function, we enhance the model's expressive capacity. Our analysis reveals that ENSO predictability emerges dominantly from the tropical Pacific, with contributions from the Indian and Atlantic Oceans, consistent with physical understanding. Controlled experiments affirm the robustness of our method and its alignment with established predictors. Notably, we probe the persistent Spring Predictability Barrier (SPB), finding that despite expanded sensitivity during spring, predictive performance declines, likely due to suboptimal variable selection. These results suggest that incorporating additional ocean-atmosphere variables may help transcend SPB limitations and advance long-range ENSO prediction.


【8】GDRO: Group-level Reward Post-training Suitable for Diffusion Models
标题:GDRO:适用于扩散模型的组级奖励后训练
链接:https://arxiv.org/abs/2601.02036

作者:Yiyang Wang,Xi Chen,Xiaogang Xu,Yu Liu,Hengshuang Zhao
摘要 :Recent advancements adopt online reinforcement learning (RL) from LLMs to text-to-image rectified flow diffusion models for reward alignment. The use of group-level rewards successfully aligns the model with the targeted reward. However, it faces challenges including low efficiency, dependency on stochastic samplers, and reward hacking. The problem is that rectified flow models are fundamentally different from LLMs: 1) For efficiency, online image sampling takes much more time and dominates the time of training. 2) For stochasticity, rectified flow is deterministic once the initial noise is fixed. Aiming at these problems and inspired by the effects of group-level rewards from LLMs, we design Group-level Direct Reward Optimization (GDRO). GDRO is a new post-training paradigm for group-level reward alignment that combines the characteristics of rectified flow models. Through rigorous theoretical analysis, we point out that GDRO supports full offline training that saves the large time cost for image rollout sampling. Also, it is diffusion-sampler-independent, which eliminates the need for the ODE-to-SDE approximation to obtain stochasticity. We also empirically study the reward hacking trap that may mislead the evaluation, and involve this factor in the evaluation using a corrected score that not only considers the original evaluation reward but also the trend of reward hacking. Extensive experiments demonstrate that GDRO effectively and efficiently improves the reward score of the diffusion model through group-wise offline optimization across the OCR and GenEval tasks, while demonstrating strong stability and robustness in mitigating reward hacking.


【9】Forget Less by Learning Together through Concept Consolidation
标题:通过概念整合共同学习,减少遗忘
链接:https://arxiv.org/abs/2601.01963

作者:Arjun Ramesh Kaushik,Naresh Kumar Devulapally,Vishnu Suresh Lokhande,Nalini Ratha,Venu Govindaraju
备注:Accepted at WACV-26
摘要:Custom Diffusion Models (CDMs) have gained significant attention due to their remarkable ability to personalize generative processes. However, existing CDMs suffer from catastrophic forgetting when continuously learning new concepts. Most prior works attempt to mitigate this issue under the sequential learning setting with a fixed order of concept inflow and neglect inter-concept interactions. In this paper, we propose a novel framework - Forget Less by Learning Together (FL2T) - that enables concurrent and order-agnostic concept learning while addressing catastrophic forgetting. Specifically, we introduce a set-invariant inter-concept learning module where proxies guide feature selection across concepts, facilitating improved knowledge retention and transfer. By leveraging inter-concept guidance, our approach preserves old concepts while efficiently incorporating new ones. Extensive experiments, across three datasets, demonstrate that our method significantly improves concept retention and mitigates catastrophic forgetting, highlighting the effectiveness of inter-concept catalytic behavior in incremental concept learning of ten tasks with at least 2% gain on average CLIP Image Alignment scores.


【10】SynRXN: An Open Benchmark and Curated Dataset for Computational Reaction Modeling
标题:SynRXN:计算反应建模的开放基准和精选数据集
链接:https://arxiv.org/abs/2601.01943

作者:Tieu-Long Phan,Nhu-Ngoc Nguyen Song,Peter F. Stadler
备注:31 pages (including references), 3 figures, 7 tables
摘要:We present SynRXN, a unified benchmarking framework and open-data resource for computer-aided synthesis planning (CASP). SynRXN decomposes end-to-end synthesis planning into five task families, covering reaction rebalancing, atom-to-atom mapping, reaction classification, reaction property prediction, and synthesis route design. Curated, provenance-tracked reaction corpora are assembled from heterogeneous public sources into a harmonized representation and packaged as versioned datasets for each task family, with explicit source metadata, licence tags, and machine-readable manifests that record checksums, and row counts. For every task, SynRXN provides transparent splitting functions that generate leakage-aware train, validation, and test partitions, together with standardized evaluation workflows and metric suites tailored to classification, regression, and structured prediction settings. For sensitive benchmarking, we combine public training and validation data with held-out gold-standard test sets, and contamination-prone tasks such as reaction rebalancing and atom-to-atom mapping are distributed only as evaluation sets and are explicitly not intended for model training. Scripted build recipes enable bitwise-reproducible regeneration of all corpora across machines and over time, and the entire resource is released under permissive open licences to support reuse and extension. By removing dataset heterogeneity and packaging transparent, reusable evaluation scaffolding, SynRXN enables fair longitudinal comparison of CASP methods, supports rigorous ablations and stress tests along the full reaction-informatics pipeline, and lowers the barrier for practitioners who seek robust and comparable performance estimates for real-world synthesis planning workloads.


【11】Utilizing Earth Foundation Models to Enhance the Simulation Performance of Hydrological Models with AlphaEarth Embeddings
标题:利用地球基础模型与AlphaEarth嵌入提升水文模型的模拟性能
链接:https://arxiv.org/abs/2601.01558

作者:Pengfei Qu,Wenyu Ouyang,Chi Zhang,Yikai Chai,Shuolong Xu,Lei Ye,Yongri Piao,Miao Zhang,Huchuan Lu
备注:12 pages, 11 figures
摘要:Predicting river flow in places without streamflow records is challenging because basins respond differently to climate, terrain, vegetation, and soils. Traditional basin attributes describe some of these differences, but they cannot fully represent the complexity of natural environments. This study examines whether AlphaEarth Foundation embeddings, which are learned from large collections of satellite images rather than designed by experts, offer a more informative way to describe basin characteristics. These embeddings summarize patterns in vegetation, land surface properties, and long-term environmental dynamics. We find that models using them achieve higher accuracy when predicting flows in basins not used for training, suggesting that they capture key physical differences more effectively than traditional attributes. We further investigate how selecting appropriate donor basins influences prediction in ungauged regions. Similarity based on the embeddings helps identify basins with comparable environmental and hydrological behavior, improving performance, whereas adding many dissimilar basins can reduce accuracy. The results show that satellite-informed environmental representations can strengthen hydrological forecasting and support the development of models that adapt more easily to different landscapes.


【12】Sobolev Approximation of Deep ReLU Network in Log-weighted Barron Space
标题:对数加权Barron空间中深度ReLU网络的Sobolev逼近
链接:https://arxiv.org/abs/2601.01295

作者:Changhoon Song,Seungchan Ko,Youngjoon Hong
摘要:Universal approximation theorems show that neural networks can approximate any continuous function; however, the number of parameters may grow exponentially with the ambient dimension, so these results do not fully explain the practical success of deep models on high-dimensional data. Barron space theory addresses this: if a target function belongs to a Barron space, a two-layer network with $n$ parameters achieves an $O(n^{-1/2})$ approximation error in $L^2$. Yet classical Barron spaces $\mathscr{B}^{s+1}$ still require stronger regularity than Sobolev spaces $H^s$, and existing depth-sensitive results often assume constraints such as $sL \le 1/2$. In this paper, we introduce a log-weighted Barron space $\mathscr{B}^{\log}$, which requires a strictly weaker assumption than $\mathscr{B}^s$ for any $s>0$. For this new function space, we first study embedding properties and carry out a statistical analysis via the Rademacher complexity. Then we prove that functions in $\mathscr{B}^{\log}$ can be approximated by deep ReLU networks with explicit depth dependence. We then define a family $\mathscr{B}^{s,\log}$, establish approximation bounds in the $H^1$ norm, and identify maximal depth scales under which these rates are preserved. Our results clarify how depth reduces regularity requirements for efficient representation, offering a more precise explanation for the performance of deep architectures beyond the classical Barron setting, and for their stable use in high-dimensional problems used today.


【13】Accelerated Full Waveform Inversion by Deep Compressed Learning
标题:通过深度压缩学习加速全波形反演
链接:https://arxiv.org/abs/2601.01268

作者:Maayan Gelboim,Amir Adler,Mauricio Araya-Polo
摘要:We propose and test a method to reduce the dimensionality of Full Waveform Inversion (FWI) inputs as a computational cost mitigation approach. Given modern seismic acquisition systems, the data (as input for FWI) required for an industrial-strength case is at the terabyte level of storage; therefore, solving complex subsurface cases or exploring multiple scenarios with FWI becomes prohibitive. The proposed method utilizes a deep neural network with a binarized sensing layer that learns by compressed learning a succinct but consequential seismic acquisition layout from a large corpus of subsurface models. Thus, given a large seismic data set to invert, the trained network selects a smaller subset of the data; then, by using representation learning, an autoencoder computes latent representations of the data, followed by K-means clustering of the latent representations to further select the most relevant data for FWI. Effectively, this approach can be seen as a hierarchical selection. The proposed approach consistently outperforms random data sampling, even when utilizing only 10% of the data for 2D FWI. These results pave the way to accelerating FWI in large scale 3D inversion.
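摘要中"自编码器潜表示 + K-means 聚类再选数据"的最后一步可以示意如下(Python/scikit-learn);自编码器训练从略,select_shots 等命名与"每簇取最近质心一炮"的选取规则均为示意假设。

import numpy as np
from sklearn.cluster import KMeans

def select_shots(latents, k):
    # 每个簇保留离质心最近的一炮数据,作为"最相关"子集送入FWI
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(latents)
    chosen = []
    for c in range(k):
        idx = np.where(km.labels_ == c)[0]
        d = np.linalg.norm(latents[idx] - km.cluster_centers_[c], axis=1)
        chosen.append(idx[np.argmin(d)])
    return np.array(chosen)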


【14】The Dependency Divide: An Interpretable Machine Learning Framework for Profiling Student Digital Satisfaction in the Bangladesh Context
标题:依赖鸿沟:用于在孟加拉国环境中分析学生数字满意度的可解释机器学习框架
链接:https://arxiv.org/abs/2601.01231

作者:Md Muhtasim Munif Fahim,Humyra Ankona,Md Monimul Huq,Md Rezaul Karim
备注:Conference Paper
摘要:Background: While digital access has expanded rapidly in resource-constrained contexts, satisfaction with digital learning platforms varies significantly among students with seemingly equal connectivity. Traditional digital divide frameworks fail to explain these variations.   Purpose: This study introduces the "Dependency Divide", a novel framework proposing that highly engaged students become conditionally vulnerable to infrastructure failures, challenging assumptions that engagement uniformly benefits learners in post-access environments.   Methods: We conducted a cross-sectional study of 396 university students in Bangladesh using a three-stage analytical approach: (1) stability-validated K-prototypes clustering to identify student profiles, (2) profile-specific Random Forest models with SHAP and ALE analysis to determine satisfaction drivers, and (3) formal interaction analysis with propensity score matching to test the Dependency Divide hypothesis.   Results: Three distinct profiles emerged: Casually Engaged (58%), Efficient Learners (35%), and Hyper-Engaged (7%). A significant interaction between educational device time and internet reliability ($\beta$ = 0.033, p = 0.028) confirmed the Dependency Divide: engagement increased satisfaction only when infrastructure remained reliable. Hyper-Engaged students showed greatest vulnerability despite or because of their sophisticated digital workflows. Policy simulations demonstrated that targeted reliability improvements for high-dependency users yielded 2.06 times greater returns than uniform interventions.   Conclusions: In fragile infrastructure contexts, capability can become liability. Digital transformation policies must prioritize reliability for dependency-prone users, establish contingency systems, and educate students about dependency risks rather than uniformly promoting engagement.


【15】Promptable Foundation Models for SAR Remote Sensing: Adapting the Segment Anything Model for Snow Avalanche Segmentation
标题:面向SAR遥感的可提示基础模型:将Segment Anything模型适配于雪崩分割
链接:https://arxiv.org/abs/2601.01213

作者:Riccardo Gelato,Carlo Sgaravatti,Jakob Grahn,Giacomo Boracchi,Filippo Maria Bianchi
摘要:Remote sensing solutions for avalanche segmentation and mapping are key to supporting risk forecasting and mitigation in mountain regions. Synthetic Aperture Radar (SAR) imagery from Sentinel-1 can be effectively used for this task, but training an effective detection model requires gathering a large dataset with high-quality annotations from domain experts, which is prohibitively time-consuming. In this work, we aim to facilitate and accelerate the annotation of SAR images for avalanche mapping. We build on the Segment Anything Model (SAM), a segmentation foundation model trained on natural images, and tailor it to Sentinel-1 SAR data. Adapting SAM to our use-case requires addressing several domain-specific challenges: (i) domain mismatch, since SAM was not trained on satellite/SAR imagery; (ii) input adaptation, because SAR products typically provide more than three channels, while SAM is constrained to RGB images; (iii) robustness to imprecise prompts that can affect target identification and degrade the segmentation quality, an issue exacerbated in small, low-contrast avalanches; and (iv) training efficiency, since standard fine-tuning is computationally demanding for SAM. We tackle these challenges through a combination of adapters to mitigate the domain gap, multiple encoders to handle multi-channel SAR inputs, prompt-engineering strategies to improve avalanche localization accuracy, and a training algorithm that limits the training time of the encoder, which is recognized as the major bottleneck. We integrate the resulting model into an annotation tool and show experimentally that it speeds up the annotation of SAR images.
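摘要提到用适配器(adapter)弥合自然图像到SAR的域差;标准的瓶颈式适配器可示意如下(Python),bottleneck 维度为示意参数,具体插入位置依SAM骨干而定,并非论文的完整方案。

import torch
import torch.nn as nn

class Adapter(nn.Module):
    # 降维 -> 非线性 -> 升维 -> 残差;只训练这少量参数,冻结的骨干保持不动
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)    # 初始等价于恒等映射,训练更稳
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))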


【16】Flow Equivariant World Models: Memory for Partially Observed Dynamic Environments
标题:流等变世界模型:部分可观测动态环境的记忆
链接:https://arxiv.org/abs/2601.01075

作者:Hansen Jin Lillemark,Benhao Huang,Fangneng Zhan,Yilun Du,Thomas Anderson Keller
备注:11 main text pages, 10 figures
摘要:Embodied systems experience the world as 'a symphony of flows': a combination of many continuous streams of sensory input coupled to self-motion, interwoven with the dynamics of external objects. These streams obey smooth, time-parameterized symmetries, which combine through a precisely structured algebra; yet most neural network world models ignore this structure and instead repeatedly re-learn the same transformations from data. In this work, we introduce 'Flow Equivariant World Models', a framework in which both self-motion and external object motion are unified as one-parameter Lie group 'flows'. We leverage this unification to implement group equivariance with respect to these transformations, thereby providing a stable latent world representation over hundreds of timesteps. On both 2D and 3D partially observed video world modeling benchmarks, we demonstrate that Flow Equivariant World Models significantly outperform comparable state-of-the-art diffusion-based and memory-augmented world modeling architectures -- particularly when there are predictable world dynamics outside the agent's current field of view. We show that flow equivariance is particularly beneficial for long rollouts, generalizing far beyond the training horizon. By structuring world model representations with respect to internal and external motion, flow equivariance charts a scalable route to data efficient, symmetry-guided, embodied intelligence. Project link: https://flowequivariantworldmodels.github.io.


【17】Tiny Machine Learning for Real-Time Aquaculture Monitoring: A Case Study in Morocco
标题:用于实时水产养殖监测的微型机器学习:摩洛哥的案例研究
链接:https://arxiv.org/abs/2601.01065

作者:Achraf Hsain,Yahya Zaki,Othman Abaakil,Hibat-allah Bekkar,Yousra Chtouki
备注:Published in IEEE GCAIoT 2024
摘要:Aquaculture, the farming of aquatic organisms, is a rapidly growing industry facing challenges such as water quality fluctuations, disease outbreaks, and inefficient feed management. Traditional monitoring methods often rely on manual labor and are time consuming, leading to potential delays in addressing issues. This paper proposes the integration of low-power edge devices using Tiny Machine Learning (TinyML) into aquaculture systems to enable real-time automated monitoring and control, such as collecting data and triggering alarms, and reducing labor requirements. The system provides real-time data on the required parameters such as pH levels, temperature, dissolved oxygen, and ammonia levels to control water quality, nutrient levels, and environmental conditions enabling better maintenance, efficient resource utilization, and optimal management of the enclosed aquaculture space. The system enables alerts in case of anomaly detection. The data collected by the sensors over time can serve for important decision-making regarding optimizing water treatment processes, feed distribution, feed pattern analysis and improve feed efficiency, reducing operational costs. This research explores the feasibility of developing TinyML-based solutions for aquaculture monitoring, considering factors such as sensor selection, algorithm design, hardware constraints, and ethical considerations. By demonstrating the potential benefits of TinyML in aquaculture, our aim is to contribute to the development of more sustainable and efficient farming practices.


【18】Data-Driven Assessment of Concrete Mixture Compositions on Chloride Transport via Standalone Machine Learning Algorithms
标题:基于独立机器学习算法的混凝土混合物成分对氯离子迁移的数据驱动评估
链接:https://arxiv.org/abs/2601.01009

作者:Mojtaba Aliasghar-Mamaghani,Mohammadreza Khalafi
摘要:This paper employs a data-driven approach to determine the impact of concrete mixture compositions on the temporal evolution of chloride in concrete structures. This is critical for assessing the service life of civil infrastructure subjected to aggressive environments. The adopted methodology relies on several simple and complex standalone machine learning (ML) algorithms, with the primary objective of establishing confidence in the unbiased prediction of the underlying hidden correlations. The simple algorithms include linear regression (LR), k-nearest neighbors (KNN) regression, and kernel ridge regression (KRR). The complex algorithms entail support vector regression (SVR), Gaussian process regression (GPR), and two families of artificial neural networks, including a feedforward network (multilayer perceptron, MLP) and a gated recurrent unit (GRU). The MLP architecture cannot explicitly handle sequential data, a limitation addressed by the GRU. A comprehensive dataset is considered. The performance of ML algorithms is evaluated, with KRR, GPR, and MLP exhibiting high accuracy. Given the diversity of the adopted concrete mixture proportions, the GRU was unable to accurately reproduce the response in the test set. Further analyses elucidate the contributions of mixture compositions to the temporal evolution of chloride. The results obtained from the GPR model unravel latent correlations through clear and explainable trends. The MLP, SVR, and KRR also provide acceptable estimates of the overall trends. The majority of mixture components exhibit an inverse relation with chloride content, while a few components demonstrate a direct correlation. These findings highlight the potential of surrogate approaches for describing the physical processes involved in chloride ingress and the associated correlations, toward the ultimate goal of enhancing the service life of civil infrastructure.


【19】Harvesting AlphaEarth: Benchmarking the Geospatial Foundation Model for Agricultural Downstream Tasks
标题:收获AlphaEarth:农业下游任务的地理空间基础模型基准
链接:https://arxiv.org/abs/2601.00857

作者:Yuchi Ma,Yawen Shen,Anu Swatantran,David B. Lobell
摘要 :Geospatial foundation models (GFMs) have emerged as a promising approach to overcoming the limitations in existing featurization methods. More recently, Google DeepMind has introduced AlphaEarth Foundation (AEF), a GFM pre-trained using multi-source EOs across continuous time. An annual and global embedding dataset is produced using AEF that is ready for analysis and modeling. The internal experiments show that AEF embeddings have outperformed operational models in 15 EO tasks without re-training. However, those experiments are mostly about land cover and land use classification. Applying AEF and other GFMs to agricultural monitoring require an in-depth evaluation in critical agricultural downstream tasks. There is also a lack of comprehensive comparison between the AEF-based models and traditional remote sensing (RS)-based models under different scenarios, which could offer valuable guidance for researchers and practitioners. This study addresses some of these gaps by evaluating AEF embeddings in three agricultural downstream tasks in the U.S., including crop yield prediction, tillage mapping, and cover crop mapping. Datasets are compiled from both public and private sources to comprehensively evaluate AEF embeddings across tasks at different scales and locations, and RS-based models are trained as comparison models. AEF-based models generally exhibit strong performance on all tasks and are competitive with purpose-built RS-based models in yield prediction and county-level tillage mapping when trained on local data. However, we also find several limitations in current AEF embeddings, such as limited spatial transferability compared to RS-based models, low interpretability, and limited time sensitivity. These limitations recommend caution when applying AEF embeddings in agriculture, where time sensitivity, generalizability, and interpretability is important.


【20】Value-guided action planning with JEPA world models
标题:利用JEPA世界模型进行价值引导的动作规划
链接:https://arxiv.org/abs/2601.00844

作者:Matthieu Destrade,Oumayma Bounou,Quentin Le Lidec,Jean Ponce,Yann LeCun
备注:Presented as a poster at the World Modeling Workshop 2026, Mila
摘要:Building deep learning models that can reason about their environment requires capturing its underlying dynamics. Joint-Embedded Predictive Architectures (JEPA) provide a promising framework to model such dynamics by learning representations and predictors through a self-supervised prediction objective. However, their ability to support effective action planning remains limited. We propose an approach to enhance planning with JEPA world models by shaping their representation space so that the negative goal-conditioned value function for a reaching cost in a given environment is approximated by a distance (or quasi-distance) between state embeddings. We introduce a practical method to enforce this constraint during training and show that it leads to significantly improved planning performance compared to standard JEPA models on simple control tasks.


【21】Intrinsic-Metric Physics-Informed Neural Networks (IM-PINN) for Reaction-Diffusion Dynamics on Complex Riemannian Manifolds
标题:复杂黎曼流形上反应扩散动力学的内蕴度量物理信息神经网络(IM-PINN)
链接:https://arxiv.org/abs/2601.00834

作者:Julian Evan Chrisnanto,Salsabila Rahma Alia,Nurfauzi Fadillah,Yulison Herry Chrisnanto
备注:19 pages, 7 figures
摘要:Simulating nonlinear reaction-diffusion dynamics on complex, non-Euclidean manifolds remains a fundamental challenge in computational morphogenesis, constrained by high-fidelity mesh generation costs and symplectic drift in discrete time-stepping schemes. This study introduces the Intrinsic-Metric Physics-Informed Neural Network (IM-PINN), a mesh-free geometric deep learning framework that solves partial differential equations directly in the continuous parametric domain. By embedding the Riemannian metric tensor into the automatic differentiation graph, our architecture analytically reconstructs the Laplace-Beltrami operator, decoupling solution complexity from geometric discretization. We validate the framework on a "Stochastic Cloth" manifold with extreme Gaussian curvature fluctuations ($K \in [-2489, 3580]$), where traditional adaptive refinement fails to resolve anisotropic Turing instabilities. Using a dual-stream architecture with Fourier feature embeddings to mitigate spectral bias, the IM-PINN recovers the "splitting spot" and "labyrinthine" regimes of the Gray-Scott model. Benchmarking against the Surface Finite Element Method (SFEM) reveals superior physical rigor: the IM-PINN achieves global mass conservation error of $\mathcal{E}_{mass} \approx 0.157$ versus SFEM's $0.258$, acting as a thermodynamically consistent global solver that eliminates mass drift inherent in semi-implicit integration. The framework offers a memory-efficient, resolution-independent paradigm for simulating biological pattern formation on evolving surfaces, bridging differential geometry and physics-informed machine learning.
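摘要所述"把黎曼度量张量嵌入自动微分图、解析重构Laplace-Beltrami算子",对应该算子在参数坐标下的标准表达式(教科书公式,非论文新结果):

\[
\Delta_g u \;=\; \frac{1}{\sqrt{|g|}}\,\partial_i\!\left(\sqrt{|g|}\,g^{ij}\,\partial_j u\right),
\qquad g_{ij} = \partial_i X \cdot \partial_j X,
\]

其中 $X(u^1,u^2)$ 为曲面的参数化嵌入,$|g|=\det(g_{ij})$,$g^{ij}$ 为度量的逆;反应扩散方程即 $\partial_t c = D\,\Delta_g c + R(c)$。由于 $g_{ij}$ 可对网络输入的参数坐标解析求导得到,整个算子可以在自动微分框架内无网格地计算。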


【22】MathLedger: A Verifiable Learning Substrate with Ledger-Attested Feedback
标题:MathLedger:具有账本认证反馈的可验证学习基底
链接:https://arxiv.org/abs/2601.00816

作者:Ismail Ahmad Abdullah
备注:14 pages, 1 figure, 2 tables, 2 appendices with full proofs. Documents v0.9.4-pilot-audit-hardened audit surface with fail-closed governance, canonical JSON hashing, and artifact classification. Phase I infrastructure validation; no capability claims
摘要:Contemporary AI systems achieve extraordinary performance yet remain opaque and non-verifiable, creating a crisis of trust for safety-critical deployment. We introduce MathLedger, a substrate for verifiable machine cognition that integrates formal verification, cryptographic attestation, and learning dynamics into a single epistemic loop. The system implements Reflexive Formal Learning (RFL), a symbolic analogue of gradient descent where updates are driven by verifier outcomes rather than statistical loss.   Phase I experiments validate the measurement and governance substrate under controlled conditions. CAL-EXP-3 validates measurement infrastructure (Delta p computation, variance tracking); separate stress tests confirm fail-closed governance triggers correctly under out-of-bounds conditions. No convergence or capability claims are made. The contribution is infrastructural: a working prototype of ledger-attested learning that enables auditability at scale.   Keywords: verifiable learning, formal verification, cryptographic attestation, reflexive feedback, fail-closed governance


【23】ChronoPlastic Spiking Neural Networks
标题:ChronoPlastic尖峰神经网络
链接:https://arxiv.org/abs/2601.00805

作者:Sarim Chaudhry
备注:21 pages, 6 figures
摘要:Spiking neural networks (SNNs) offer a biologically grounded and energy-efficient alternative to conventional neural architectures; however, they struggle with long-range temporal dependencies due to fixed synaptic and membrane time constants. This paper introduces ChronoPlastic Spiking Neural Networks (CPSNNs), a novel architectural principle that enables adaptive temporal credit assignment by dynamically modulating synaptic decay rates conditioned on the state of the network. CPSNNs maintain multiple internal temporal traces and learn a continuous time-warping function that selectively preserves task-relevant information while rapidly forgetting noise. Unlike prior approaches based on adaptive membrane constants, attention mechanisms, or external memory, CPSNNs embed temporal control directly within local synaptic dynamics, preserving linear-time complexity and neuromorphic compatibility. We provide a formal description of the model, analyze its computational properties, and demonstrate empirically that CPSNNs learn long-gap temporal dependencies significantly faster and more reliably than standard SNN baselines. Our results suggest that adaptive temporal modulation is a key missing ingredient for scalable temporal learning in spiking systems.
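
The abstract's key idea, decay rates conditioned on network state, can be sketched as a toy leaky integrate-and-fire layer (all names and sizes below are hypothetical; the actual CPSNN dynamics are richer, maintaining multiple internal temporal traces):

```python
import torch
import torch.nn as nn

class StateModulatedLIF(nn.Module):
    def __init__(self, n_in, n_out):
        super().__init__()
        self.w = nn.Linear(n_in, n_out, bias=False)
        self.gate = nn.Linear(n_out, n_out)   # maps state -> per-synapse decay

    def forward(self, spikes, v):
        decay = torch.sigmoid(self.gate(v))   # decay in (0, 1), state-conditioned
        v = decay * v + self.w(spikes)        # leaky integration of input spikes
        out = (v > 1.0).float()               # threshold crossing emits a spike
        v = v - out                           # soft reset by subtraction
        return out, v

layer = StateModulatedLIF(10, 5)
v = torch.zeros(1, 5)
for t in range(20):
    out, v = layer((torch.rand(1, 10) > 0.8).float(), v)
```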


【24】Machine learning modularity
标题:机器学习模性
链接:https://arxiv.org/abs/2601.01779

作者:Yi Fan,Vishnu Jejjala,Yang Lei
备注:34 pages, 7 figures, 6 tables
摘要:Based on a Transformer-based sequence-to-sequence architecture combined with a dynamic batching algorithm, this work introduces a machine learning framework for automatically simplifying complex expressions involving multiple elliptic Gamma functions, including the $q$-$θ$ function and the elliptic Gamma function. The model learns to apply algebraic identities, particularly the SL$(2,\mathbb{Z})$ and SL$(3,\mathbb{Z})$ modular transformations, to reduce heavily scrambled expressions to their canonical forms. Experimental results show that the model achieves over 99\% accuracy on in-distribution tests and maintains robust performance (exceeding 90\% accuracy) under significant extrapolation, such as with deeper scrambling depths. This demonstrates that the model has internalized the underlying algebraic rules of modular transformations rather than merely memorizing training patterns. Our work presents the first successful application of machine learning to perform symbolic simplification using modular identities, offering a new automated tool for computations with special functions in quantum field theory and string theory.


【25】Learning Relationship between Quantum Walks and Underdamped Langevin Dynamics
标题:量子行走与欠阻尼朗之万动力学之间的学习关系
链接:https://arxiv.org/abs/2601.01589

作者:Yazhen Wang
摘要:Fast computational algorithms are in constant demand, and their development has been driven by advances such as quantum speedup and classical acceleration. This paper studies search algorithms based on quantum walks in quantum computation and sampling algorithms based on Langevin dynamics in classical computation. On the quantum side, quantum walk-based search algorithms can achieve quadratic speedups over their classical counterparts. In classical computation, a substantial body of work has focused on gradient acceleration, with gradient-adjusted algorithms derived from underdamped Langevin dynamics providing quadratic acceleration over conventional Langevin algorithms.   Since both search and sampling algorithms are designed to address learning tasks, we study the learning relationship between coined quantum walks and underdamped Langevin dynamics. Specifically, we show that, in terms of the Le Cam deficiency distance, a quantum walk with randomization is asymptotically equivalent to underdamped Langevin dynamics, whereas the quantum walk without randomization is not asymptotically equivalent due to its high-frequency oscillatory behavior. We further discuss the implications of these equivalence and nonequivalence results for the computational and inferential properties of the associated algorithms in machine learning tasks. Our findings offer new insight into the relationship between quantum walks and underdamped Langevin dynamics, as well as the intrinsic mechanisms underlying quantum speedup and classical gradient acceleration.


【26】Modeling Information Blackouts in Missing Not-At-Random Time Series Data
标题:非随机缺失时间序列数据中的信息中断建模
链接:https://arxiv.org/abs/2601.01480

作者:Aman Sunesh,Allan Ma,Siddarth Nilol
备注:8 pages, 7 figures, 3 tables
摘要:Large-scale traffic forecasting relies on fixed sensor networks that often exhibit blackouts: contiguous intervals of missing measurements caused by detector or communication failures. These outages are typically handled under a Missing At Random (MAR) assumption, even though blackout events may correlate with unobserved traffic conditions (e.g., congestion or anomalous flow), motivating a Missing Not At Random (MNAR) treatment. We propose a latent state-space framework that jointly models (i) traffic dynamics via a linear dynamical system and (ii) sensor dropout via a Bernoulli observation channel whose probability depends on the latent traffic state. Inference uses an Extended Kalman Filter with Rauch-Tung-Striebel smoothing, and parameters are learned via an approximate EM procedure with a dedicated update for detector-specific missingness parameters. On the Seattle inductive loop detector data, introducing latent dynamics yields large gains over naive baselines, reducing blackout imputation RMSE from 7.02 (LOCF) and 5.02 (linear interpolation + seasonal naive) to 4.23 (MAR LDS), corresponding to about a 64% reduction in MSE relative to LOCF. Explicit MNAR modeling provides a consistent but smaller additional improvement on real data (imputation RMSE 4.20; 0.8% RMSE reduction relative to MAR), with similar modest gains for short-horizon post-blackout forecasts (evaluated at 1, 3, and 6 steps). In controlled synthetic experiments, the MNAR advantage increases as the true missingness dependence on latent state strengthens. Overall, temporal dynamics dominate performance, while MNAR modeling offers a principled refinement that becomes most valuable when missingness is genuinely informative.
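
A schematic of the generative model described above, in scalar form with hypothetical parameters: linear-Gaussian latent dynamics plus a Bernoulli missingness channel whose dropout probability depends on the latent state (the MNAR mechanism).

```python
import numpy as np

rng = np.random.default_rng(0)
A, C = 0.95, 1.0             # scalar latent dynamics / observation map
q, r = 0.1, 0.5              # process / observation noise std
alpha, beta = -2.0, 1.5      # dropout logits: p(missing) grows with the state

x, ys = 0.0, []
for t in range(200):
    x = A * x + q * rng.normal()                        # latent traffic state
    y = C * x + r * rng.normal()                        # sensor measurement
    p_miss = 1.0 / (1.0 + np.exp(-(alpha + beta * x)))  # MNAR: depends on x
    ys.append(np.nan if rng.random() < p_miss else y)   # blackout -> NaN
```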


【27】Fast Gibbs Sampling on Bayesian Hidden Markov Model with Missing Observations
标题:具有缺失观测值的贝叶斯隐马尔可夫模型的快速Gibbs抽样
链接:https://arxiv.org/abs/2601.01442

作者:Dongrong Li,Tianwei Yu,Xiaodan Fan
备注:45 pages, 2 figures
摘要:The Hidden Markov Model (HMM) is a widely-used statistical model for handling sequential data. However, the presence of missing observations in real-world datasets often complicates the application of the model. The EM algorithm and Gibbs samplers can be used to estimate the model, yet they suffer from various problems including non-convexity, high computational complexity and slow mixing. In this paper, we propose a collapsed Gibbs sampler that efficiently samples from HMMs' posterior by integrating out both the missing observations and the corresponding latent states. The proposed sampler is fast due to three advantages. First, it achieves an estimation accuracy that is comparable to existing methods. Second, it can produce a larger Effective Sample Size (ESS) per iteration, which can be justified theoretically and numerically. Third, when the number of missing entries is large, the sampler has a significantly smaller computational complexity per iteration compared to other methods, and is thus computationally faster. In summary, the proposed sampling algorithm is fast both computationally and in terms of sampling efficiency, and is particularly advantageous when there are many missing entries. Finally, empirical evaluations based on numerical simulations and real data analysis demonstrate that the proposed algorithm consistently outperforms existing algorithms in terms of time complexity and sampling efficiency (measured in ESS).


【28】Evidence Slopes and Effective Dimension in Singular Linear Models
标题:奇异线性模型中的证据斜率和有效维数
链接:https://arxiv.org/abs/2601.01238

作者:Kalyaan Rao
备注:Preprint. 10 pages, 6 figures. Under review
摘要:Bayesian model selection commonly relies on Laplace approximation or the Bayesian Information Criterion (BIC), which assume that the effective model dimension equals the number of parameters. Singular learning theory replaces this assumption with the real log canonical threshold (RLCT), an effective dimension that can be strictly smaller in overparameterized or rank-deficient models.   We study linear-Gaussian rank models and linear subspace (dictionary) models in which the exact marginal likelihood is available in closed form and the RLCT is analytically tractable. In this setting, we show theoretically and empirically that the error of Laplace/BIC grows as $(d/2 - λ)\log n$, where $d$ is the ambient parameter dimension and $λ$ is the RLCT. An RLCT-aware correction recovers the correct evidence slope and is invariant to overcomplete reparameterizations that represent the same data subspace.   Our results provide a concrete finite-sample characterization of Laplace failure in singular models and demonstrate that evidence slopes can be used as a practical estimator of effective dimension in simple linear settings.
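
A small numeric illustration of the claimed slope gap, with a hypothetical RLCT: the Laplace/BIC penalty uses $d/2$ while an RLCT-aware criterion uses $λ$, so their difference grows exactly as $(d/2 - λ)\log n$.

```python
import numpy as np

def laplace_bic_penalty(d, n):
    return 0.5 * d * np.log(n)       # assumes effective dimension = d

def rlct_penalty(lam, n):
    return lam * np.log(n)           # singular learning theory: use the RLCT

d, lam = 10, 3.5                     # ambient dimension vs. hypothetical RLCT
for n in (100, 10_000, 1_000_000):
    gap = laplace_bic_penalty(d, n) - rlct_penalty(lam, n)
    print(n, gap, (d / 2 - lam) * np.log(n))   # identical: the evidence-slope error
```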


【29】Neural Networks on Symmetric Spaces of Noncompact Type
标题:非紧型对称空间上的神经网络
链接:https://arxiv.org/abs/2601.01097

作者:Xuan Son Nguyen,Shuo Yang,Aymeric Histace
摘要:Recent works have demonstrated promising performances of neural networks on hyperbolic spaces and symmetric positive definite (SPD) manifolds. These spaces belong to a family of Riemannian manifolds referred to as symmetric spaces of noncompact type. In this paper, we propose a novel approach for developing neural networks on such spaces. Our approach relies on a unified formulation of the distance from a point to a hyperplane on the considered spaces. We show that some existing formulations of the point-to-hyperplane distance can be recovered by our approach under specific settings. Furthermore, we derive a closed-form expression for the point-to-hyperplane distance in higher-rank symmetric spaces of noncompact type equipped with G-invariant Riemannian metrics. The derived distance then serves as a tool to design fully-connected (FC) layers and an attention mechanism for neural networks on the considered spaces. Our approach is validated on challenging benchmarks for image classification, electroencephalogram (EEG) signal classification, image generation, and natural language inference.


【30】Deep Learning Framework for RNA Inverse Folding with Geometric Structure Potentials
标题:具有几何结构势的RNA逆折叠深度学习框架
链接:https://arxiv.org/abs/2601.00895

作者:Annabelle Yao
摘要:RNA's diverse biological functions stem from its structural versatility, yet accurately predicting and designing RNA sequences given a 3D conformation (inverse folding) remains a challenge. Here, I introduce a deep learning framework that integrates Geometric Vector Perceptron (GVP) layers with a Transformer architecture to enable end-to-end RNA design. I construct a dataset consisting of experimentally solved RNA 3D structures, filtered and deduplicated from the BGSU RNA list, and evaluate performance using both sequence recovery rate and TM-score to assess sequence and structural fidelity, respectively. On standard benchmarks and RNA-Puzzles, my model achieves state-of-the-art performance, with recovery and TM-scores of 0.481 and 0.332, surpassing existing methods across diverse RNA families and length scales. Masked family-level validation using Rfam annotations confirms strong generalization beyond seen families. Furthermore, inverse-folded sequences, when refolded using AlphaFold3, closely resemble native structures, highlighting the critical role of geometric features captured by GVP layers in enhancing Transformer-based RNA design.


其他(53篇)

【1】DatBench: Discriminative, Faithful, and Efficient VLM Evaluations
标题:DatBench:区分性、忠实且高效的VLM评估
链接:https://arxiv.org/abs/2601.02316

作者:Siddharth Joshi,Haoli Yin,Rishabh Adiga,Ricardo Monti,Aldo Carranza,Alex Fang,Alvin Deng,Amro Abbas,Brett Larsen,Cody Blakeney,Darren Teh,David Schwab,Fan Pan,Haakon Mongstad,Jack Urbanek,Jason Lee,Jason Telanoff,Josh Wills,Kaleigh Mentzer,Luke Merrick,Parth Doshi,Paul Burstein,Pratyush Maini,Scott Loftin,Spandan Das,Tony Jiang,Vineeth Dorna,Zhengping Wang,Bogdan Gaza,Ari Morcos,Matthew Leavitt
摘要:Empirical evaluation serves as the primary compass guiding research progress in foundation models. Despite a large body of work focused on training frontier vision-language models (VLMs), approaches to their evaluation remain nascent. To guide their maturation, we propose three desiderata that evaluations should satisfy: (1) faithfulness to the modality and application, (2) discriminability between models of varying quality, and (3) efficiency in compute. Through this lens, we identify critical failure modes that violate faithfulness and discriminability, misrepresenting model capabilities: (i) multiple-choice formats reward guessing, poorly reflect downstream use cases, and saturate early as models improve; (ii) blindly solvable questions, which can be answered without images, constitute up to 70% of some evaluations; and (iii) mislabeled or ambiguous samples compromise up to 42% of examples in certain datasets. Regarding efficiency, the computational burden of evaluating frontier models has become prohibitive: by some accounts, nearly 20% of development compute is devoted to evaluation alone. Rather than discarding existing benchmarks, we curate them via transformation and filtering to maximize fidelity and discriminability. We find that converting multiple-choice questions to generative tasks reveals sharp capability drops of up to 35%. In addition, filtering blindly solvable and mislabeled samples improves discriminative power while simultaneously reducing computational cost. We release DatBench-Full, a cleaned evaluation suite of 33 datasets spanning nine VLM capabilities, and DatBench, a discriminative subset that achieves 13x average speedup (up to 50x) while closely matching the discriminative power of the original datasets. Our work outlines a path toward evaluation practices that are both rigorous and sustainable as VLMs continue to scale.


【2】VIBE: Visual Instruction Based Editor
标题:VIBE:基于视觉指令的编辑器
链接:https://arxiv.org/abs/2601.02242

作者:Grigorii Alekseenko,Aleksandr Gordeev,Irina Tolstykh,Bulat Suleimanov,Vladimir Dokholyan,Georgii Fedorov,Sergey Yakubson,Aleksandra Tsybina,Mikhail Chernyshov,Maksim Kuprashevich
摘要:Instruction-based image editing is among the fastest developing areas in generative AI. Over the past year, the field has reached a new level, with dozens of open-source models released alongside highly capable commercial systems. However, only a limited number of open-source approaches currently achieve real-world quality. In addition, diffusion backbones, the dominant choice for these pipelines, are often large and computationally expensive for many deployments and research settings, with widely used variants typically containing 6B to 20B parameters. This paper presents a compact, high-throughput instruction-based image editing pipeline that uses a modern 2B-parameter Qwen3-VL model to guide the editing process and the 1.6B-parameter diffusion model Sana1.5 for image generation. Our design decisions across architecture, data processing, training configuration, and evaluation target low-cost inference and strict source consistency while maintaining high quality across the major edit categories feasible at this scale. Evaluated on the ImgEdit and GEdit benchmarks, the proposed method matches or exceeds the performance of substantially heavier baselines, including models with several times as many parameters and higher inference cost, and is particularly strong on edits that require preserving the input image, such as an attribute adjustment, object removal, background edits, and targeted replacement. The model fits within 24 GB of GPU memory and generates edited images at up to 2K resolution in approximately 4 seconds on an NVIDIA H100 in BF16, without additional inference optimizations or distillation.


【3】FormationEval, an open multiple-choice benchmark for petroleum geoscience
标题:FormationEval,石油地球科学开放多项选择基准
链接:https://arxiv.org/abs/2601.02158

作者:Almaz Ermilov
备注:24 pages, 8 figures, 10 tables; benchmark and code at https://github.com/AlmazErmilov/FormationEval-an-Open-Benchmark-for-Oil-Gas-Geoscience-MCQ-Evaluation
摘要:This paper presents FormationEval, an open multiple-choice question benchmark for evaluating language models on petroleum geoscience and subsurface disciplines. The dataset contains 505 questions across seven domains including petrophysics, petroleum geology and reservoir engineering, derived from three authoritative sources using a reasoning model with detailed instructions and a concept-based approach that avoids verbatim copying of copyrighted text. Each question includes source metadata to support traceability and audit. The evaluation covers 72 models from major providers including OpenAI, Anthropic, Google, Meta and open-weight alternatives. The top performers achieve over 97\% accuracy, with Gemini 3 Pro Preview reaching 99.8\%, while tier and domain gaps persist. Among open-weight models, GLM-4.7 leads at 98.6\%, with several DeepSeek, Llama, Qwen and Mistral models also exceeding 93\%. The performance gap between open-weight and closed models is narrower than expected, with several lower-cost open-weight models exceeding 90\% accuracy. Petrophysics emerges as the most challenging domain across all models, while smaller models show wider performance variance. Residual length bias in the dataset (correct answers tend to be longer) is documented along with bias mitigation strategies applied during construction. The benchmark, evaluation code and results are publicly available.


【4】Prior Diffusiveness and Regret in the Linear-Gaussian Bandit
标题:线性高斯强盗中的先验扩散与遗憾
链接:https://arxiv.org/abs/2601.02022

作者:Yifan Zhu,John C. Duchi,Benjamin Van Roy
摘要:We prove that Thompson sampling exhibits $\tilde{O}(σd \sqrt{T} + d r \sqrt{\mathrm{Tr}(Σ_0)})$ Bayesian regret in the linear-Gaussian bandit with a $\mathcal{N}(μ_0, Σ_0)$ prior distribution on the coefficients, where $d$ is the dimension, $T$ is the time horizon, $r$ is the maximum $\ell_2$ norm of the actions, and $σ^2$ is the noise variance. In contrast to existing regret bounds, this shows that to within logarithmic factors, the prior-dependent ``burn-in'' term $d r \sqrt{\mathrm{Tr}(Σ_0)}$ decouples additively from the minimax (long run) regret $σd \sqrt{T}$. Previous regret bounds exhibit a multiplicative dependence on these terms. We establish these results via a new ``elliptical potential'' lemma, and also provide a lower bound indicating that the burn-in term is unavoidable.
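
For reference, a minimal Thompson sampling loop for this linear-Gaussian setup (our sketch; a standard-normal prior and a fixed finite action set stand in for the general $\mathcal{N}(μ_0, Σ_0)$ prior and action space):

```python
import numpy as np

rng = np.random.default_rng(0)
d, sigma = 3, 0.5
theta_true = rng.normal(size=d)
actions = rng.normal(size=(20, d))          # fixed finite action set

mu = np.zeros(d)
Sigma_inv = np.eye(d)                       # prior N(mu0, Sigma0) with Sigma0 = I
b = Sigma_inv @ mu
for t in range(500):
    Sigma = np.linalg.inv(Sigma_inv)
    theta = rng.multivariate_normal(Sigma @ b, Sigma)   # posterior sample
    a = actions[np.argmax(actions @ theta)]             # greedy w.r.t. the sample
    reward = a @ theta_true + sigma * rng.normal()
    Sigma_inv += np.outer(a, a) / sigma**2              # conjugate Gaussian update
    b += a * reward / sigma**2

print(np.linalg.norm(np.linalg.inv(Sigma_inv) @ b - theta_true))  # mean error
```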


【5】SerpentFlow: Generative Unpaired Domain Alignment via Shared-Structure Decomposition
标题:SerpentFlow:基于共享结构分解的生成式非配对域对齐
链接:https://arxiv.org/abs/2601.01979

作者:Julie Keisler,Anastase Alexandre Charantonis,Yannig Goude,Boutheina Oueslati,Claire Monteleoni
摘要:Domain alignment refers broadly to learning correspondences between data distributions from distinct domains. In this work, we focus on a setting where domains share underlying structural patterns despite differences in their specific realizations. The task is particularly challenging in the absence of paired observations, which removes direct supervision across domains. We introduce a generative framework, called SerpentFlow (SharEd-structuRe decomPosition for gEnerative domaiN adapTation), for unpaired domain alignment. SerpentFlow decomposes data within a latent space into a shared component common to both domains and a domain-specific one. By isolating the shared structure and replacing the domain-specific component with stochastic noise, we construct synthetic training pairs between shared representations and target-domain samples, thereby enabling the use of conditional generative models that are traditionally restricted to paired settings. We apply this approach to super-resolution tasks, where the shared component naturally corresponds to low-frequency content while high-frequency details capture domain-specific variability. The cutoff frequency separating low- and high-frequency components is determined automatically using a classifier-based criterion, ensuring a data-driven and domain-adaptive decomposition. By generating pseudo-pairs that preserve low-frequency structures while injecting stochastic high-frequency realizations, we learn the conditional distribution of the target domain given the shared representation. We implement SerpentFlow using Flow Matching as the generative pipeline, although the framework is compatible with other conditional generative approaches. Experiments on synthetic images, physical process simulations, and a climate downscaling task demonstrate that the method effectively reconstructs high-frequency structures consistent with underlying low-frequency patterns, supporting shared-structure decomposition as an effective strategy for unpaired domain alignment.
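
A hedged sketch of the frequency split at the heart of the method (the paper selects the cutoff with a classifier-based criterion; here it is a fixed ideal low-pass mask):

```python
import numpy as np

def split_by_cutoff(img, cutoff):
    """Return (low, high) components of a 2D array via an ideal FFT mask."""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[:h, :w]
    mask = np.hypot(yy - h // 2, xx - w // 2) <= cutoff
    low = np.fft.ifft2(np.fft.ifftshift(F * mask)).real
    return low, img - low

img = np.random.rand(64, 64)
shared, specific = split_by_cutoff(img, cutoff=8)   # shared = low-frequency part
```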


【6】DéjàQ: Open-Ended Evolution of Diverse, Learnable and Verifiable Problems
标题:DéjàQ:多样化、可学习和可验证问题的开放式演变
链接:https://arxiv.org/abs/2601.01931

作者:Willem Röpke,Samuel Coward,Andrei Lupu,Thomas Foster,Tim Rocktäschel,Jakob Foerster
摘要:Recent advances in reasoning models have yielded impressive results in mathematics and coding. However, most approaches rely on static datasets, which have been suggested to encourage memorisation and limit generalisation. We introduce DéjàQ, a framework that departs from this paradigm by jointly evolving a diverse set of synthetic mathematical problems alongside model training. This evolutionary process adapts to the model's ability throughout training, optimising problems for learnability. We propose two LLM-driven mutation strategies in which the model itself mutates the training data, either by altering contextual details or by directly modifying problem structure. We find that the model can generate novel and meaningful problems, and that these LLM-driven mutations improve RL training. We analyse key aspects of DéjàQ, including the validity of generated problems and computational overhead. Our results underscore the potential of dynamically evolving training data to enhance mathematical reasoning and indicate broader applicability, which we will support by open-sourcing our code.


【7】TT-FSI: Scalable Faithful Shapley Interactions via Tensor-Train
标题:TT-FSI:基于张量列的可扩展忠实Shapley交互
链接:https://arxiv.org/abs/2601.01903

作者:Ungsik Kim,Suwon Lee
摘要:The Faithful Shapley Interaction (FSI) index uniquely satisfies the faithfulness axiom among Shapley interaction indices, but computing FSI requires $O(d^\ell \cdot 2^d)$ time and existing implementations use $O(4^d)$ memory. We present TT-FSI, which exploits FSI's algebraic structure via Matrix Product Operators (MPO). Our main theoretical contribution is proving that the linear operator $v \mapsto \text{FSI}(v)$ admits an MPO representation with TT-rank $O(\ell d)$, enabling an efficient sweep algorithm with $O(\ell^2 d^3 \cdot 2^d)$ time and $O(\ell d^2)$ core storage, an exponential improvement over existing methods. Experiments on six datasets ($d=8$ to $d=20$) demonstrate up to 280$\times$ speedup over baseline, 85$\times$ over SHAP-IQ, and 290$\times$ memory reduction. TT-FSI scales to $d=20$ (1M coalitions) where all competing methods fail.


【8】SafeLoad: Efficient Admission Control Framework for Identifying Memory-Overloading Queries in Cloud Data Warehouses
标题:SafeLoad:用于识别云数据仓库中内存超载查询的高效准入控制框架
链接:https://arxiv.org/abs/2601.01888

作者:Yifan Wu,Yuhan Li,Zhenhua Wang,Zhongle Xie,Dingyu Yang,Ke Chen,Lidan Shou,Bo Tang,Liang Lin,Huan Li,Gang Chen
备注:This paper has been accepted for presentation at VLDB 2026
摘要:Memory overload is a common form of resource exhaustion in cloud data warehouses. When database queries fail due to memory overload, it not only wastes critical resources such as CPU time but also disrupts the execution of core business processes, as memory-overloading (MO) queries are typically part of complex workflows. If such queries are identified in advance and scheduled to memory-rich serverless clusters, it can prevent resource wastage and query execution failure. Therefore, cloud data warehouses desire an admission control framework with high prediction precision, interpretability, efficiency, and adaptability to effectively identify MO queries. However, existing admission control frameworks primarily focus on scenarios like SLA satisfaction and resource isolation, with limited precision in identifying MO queries. Moreover, there is a lack of publicly available MO-labeled datasets with workloads for training and benchmarking. To tackle these challenges, we propose SafeLoad, the first query admission control framework specifically designed to identify MO queries. Alongside, we release SafeBench, an open-source, industrial-scale benchmark for this task, which includes 150 million real queries. SafeLoad first filters out memory-safe queries using the interpretable discriminative rule. It then applies a hybrid architecture that integrates both a global model and cluster-level models, supplemented by a misprediction correction module to identify MO queries. Additionally, a self-tuning quota management mechanism dynamically adjusts prediction quotas per cluster to improve precision. Experimental results show that SafeLoad achieves state-of-the-art prediction performance with low online and offline time overhead. Specifically, SafeLoad improves precision by up to 66% over the best baseline and reduces wasted CPU time by up to 8.09x compared to scenarios without SafeLoad.


【9】RealPDEBench: A Benchmark for Complex Physical Systems with Real-World Data
标题:RealPDEBench:具有现实世界数据的复杂物理系统的基准
链接:https://arxiv.org/abs/2601.01829

作者:Peiyan Hu,Haodong Feng,Hongyuan Liu,Tongtong Yan,Wenhao Deng,Tianrun Gao,Rong Zheng,Haoren Zheng,Chenglei Yu,Chuanrui Wang,Kaiwen Li,Zhi-Ming Ma,Dezhi Zhou,Xingcai Lu,Dixia Fan,Tailin Wu
备注:46 pages, 21 figures
摘要:Predicting the evolution of complex physical systems remains a central problem in science and engineering. Despite rapid progress in scientific Machine Learning (ML) models, a critical bottleneck is the lack of expensive real-world data, resulting in most current models being trained and validated on simulated data. Beyond limiting the development and evaluation of scientific ML, this gap also hinders research into essential tasks such as sim-to-real transfer. We introduce RealPDEBench, the first benchmark for scientific ML that integrates real-world measurements with paired numerical simulations. RealPDEBench consists of five datasets, three tasks, eight metrics, and ten baselines. We first present five real-world measured datasets with paired simulated datasets across different complex physical systems. We further define three tasks, which allow comparisons between real-world and simulated data, and facilitate the development of methods to bridge the two. Moreover, we design eight evaluation metrics, spanning data-oriented and physics-oriented metrics, and finally benchmark ten representative baselines, including state-of-the-art models, pretrained PDE foundation models, and a traditional method. Experiments reveal significant discrepancies between simulated and real-world data, while showing that pretraining with simulated data consistently improves both accuracy and convergence. In this work, we hope to provide insights from real-world data, advancing scientific ML toward bridging the sim-to-real gap and real-world deployment. Our benchmark, datasets, and instructions are available at https://realpdebench.github.io/.


【10】HyperCLOVA X 8B Omni
标题:HyperCLOVA X 8B Omni
链接:https://arxiv.org/abs/2601.01792

作者:NAVER Cloud HyperCLOVA X Team
备注:Technical Report
摘要:In this report, we present HyperCLOVA X 8B Omni, the first any-to-any omnimodal model in the HyperCLOVA X family that supports text, audio, and vision as both inputs and outputs. By consolidating multimodal understanding and generation into a single model rather than separate modality-specific pipelines, HyperCLOVA X 8B Omni serves as an 8B-scale omni-pathfinding point toward practical any-to-any omni assistants. At a high level, the model unifies modalities through a shared next-token prediction interface over an interleaved multimodal sequence, while vision and audio encoders inject continuous embeddings for fine-grained understanding and grounding. Empirical evaluations demonstrate competitive performance against comparably sized models across diverse input-output combinations spanning text, audio, and vision, in both Korean and English. We anticipate that the open-weight release of HyperCLOVA X 8B Omni will support a wide range of research and deployment scenarios.


【11】UnPII: Unlearning Personally Identifiable Information with Quantifiable Exposure Risk
标题:UnPII:遗忘具有可量化暴露风险的个人身份信息
链接:https://arxiv.org/abs/2601.01786

作者:Intae Jeon,Yujeong Kwon,Hyungjoon Koo
备注:11 pages, 7 Tables, 6 Figures To appear in the Software Engineering in Practice (SEIP) track of ICSE
摘要:The ever-increasing adoption of Large Language Models in critical sectors like finance, healthcare, and government raises privacy concerns regarding the handling of sensitive Personally Identifiable Information (PII) during training. In response, regulations such as European Union's General Data Protection Regulation (GDPR) mandate the deletion of PII upon requests, underscoring the need for reliable and cost-effective data removal solutions. Machine unlearning has emerged as a promising direction for selectively forgetting data points. However, existing unlearning techniques typically apply a uniform forgetting strategy that neither accounts for the varying privacy risks posed by different PII attributes nor reflects associated business risks. In this work, we propose UnPII, the first PII-centric unlearning approach that prioritizes forgetting based on the risk of individual or combined PII attributes. To this end, we introduce the PII risk index (PRI), a composite metric that incorporates multiple dimensions of risk factors: identifiability, sensitivity, usability, linkability, permanency, exposability, and compliancy. The PRI enables a nuanced evaluation of privacy risks associated with PII exposures and can be tailored to align with organizational privacy policies. To support realistic assessment, we systematically construct a synthetic PII dataset (e.g., 1,700 PII instances) that simulates realistic exposure scenarios. UnPII seamlessly integrates with established unlearning algorithms, such as Gradient Ascent, Negative Preference Optimization, and Direct Preference Optimization, without modifying their underlying principles. Our experimental results demonstrate that UnPII achieves the improvements of accuracy up to 11.8%, utility up to 6.3%, and generalizability up to 12.4%, respectively, while incurring a modest fine-tuning overhead of 27.5% on average during unlearning.
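
The abstract does not publish the PRI's exact functional form, so the following is a purely hypothetical weighted-sum sketch over the seven listed risk dimensions; the weights are placeholders that an organization's privacy policy would set.

```python
# Hypothetical weights (ours), one per risk dimension named in the abstract.
WEIGHTS = {
    "identifiability": 0.25, "sensitivity": 0.20, "usability": 0.10,
    "linkability": 0.15, "permanency": 0.10, "exposability": 0.10,
    "compliancy": 0.10,
}

def pri(scores):
    """Composite PII risk index; scores maps each dimension to a value in [0, 1]."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

print(pri({k: 0.5 for k in WEIGHTS}))   # 0.5 for a uniform mid-risk profile
```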


【12】HeurekaBench: A Benchmarking Framework for AI Co-scientist
标题:HeurekaBench:人工智能联合科学家的基准框架
链接:https://arxiv.org/abs/2601.01678

作者:Siba Smarak Panigrahi,Jovana Videnović,Maria Brbić
备注:33 pages, 5 figures, 7 tables. Code available at https://github.com/mlbio-epfl/HeurekaBench
摘要:LLM-based reasoning models have enabled the development of agentic systems that act as co-scientists, assisting in multi-step scientific analysis. However, evaluating these systems is challenging, as it requires realistic, end-to-end research scenarios that integrate data analysis, interpretation, and the generation of new insights from the experimental data. To address this limitation, we introduce HeurekaBench, a framework to create benchmarks with exploratory, open-ended research questions for experimental datasets. Each such question is grounded in a scientific study and its corresponding code repository, and is created using a semi-automated pipeline that leverages multiple LLMs to extract insights and generate candidate workflows, which are then verified against reported findings. We instantiate the framework in single-cell biology to obtain sc-HeurekaBench benchmark and use it to compare state-of-the-art single-cell agents. We further showcase the benefits of our benchmark for quantitatively analyzing current design choices in agentic systems. We find that the addition of a critic module can improve ill-formed responses for open-source LLM-based agents by up to 22% and close the gap with their closed-source counterparts. Overall, HeurekaBench sets a path toward rigorous, end-to-end evaluation of scientific agents, grounding benchmark construction in real scientific workflows.


【13】Who is the Winning Algorithm? Rank Aggregation for Comparative Studies
标题:谁是获胜算法?用于比较研究的排名聚合
链接:https://arxiv.org/abs/2601.01664

作者:Amichai Painsky
摘要:Consider a collection of m competing machine learning algorithms. Given their performance on a benchmark of datasets, we would like to identify the best performing algorithm. Specifically, which algorithm is most likely to ``win'' (rank highest) on a future, unseen dataset. The standard maximum likelihood approach suggests counting the number of wins per each algorithm. In this work, we argue that there is much more information in the complete rankings. That is, the number of times that each algorithm finished second, third and so forth. Yet, it is not entirely clear how to effectively utilize this information for our purpose. In this work we introduce a novel conceptual framework for estimating the win probability for each of the m algorithms, given their complete rankings over a benchmark of datasets. Our proposed framework significantly improves upon currently known methods in synthetic and real-world examples.
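
The abstract leaves the estimator unspecified, so as one standard way to use complete rankings rather than win counts alone, here is a hedged Plackett-Luce sketch fitted with MM-style updates; under that model, the normalized strengths double as win probabilities.

```python
import numpy as np

def plackett_luce_win_probs(rankings, iters=500):
    """rankings: (R, m) int array; each row lists algorithm ids best-to-worst."""
    m = rankings.shape[1]
    v = np.full(m, 1.0 / m)
    for _ in range(iters):                  # Hunter-style MM updates
        wins = np.zeros(m)
        denom = np.zeros(m)
        for r in rankings:
            for j in range(m - 1):          # each suffix {r_j, ..., r_m} is a contest
                rest = r[j:]
                wins[r[j]] += 1.0           # r_j "won" this contest
                denom[rest] += 1.0 / v[rest].sum()
        v = wins / np.maximum(denom, 1e-12)
        v /= v.sum()
    return v                                # v[k] estimates P(k ranks first)

rankings = np.array([[0, 1, 2], [0, 2, 1], [1, 0, 2], [0, 1, 2]])
print(plackett_luce_win_probs(rankings))
```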


【14】Communication-Efficient Federated AUC Maximization with Cyclic Client Participation
标题:循环客户端参与下的通信高效联邦AUC最大化
链接:https://arxiv.org/abs/2601.01649

作者:Umesh Vangapally,Wenhan Wu,Chen Chen,Zhishuai Guo
备注:Accepted to Transactions on Machine Learning Research (TMLR)
摘要:Federated AUC maximization is a powerful approach for learning from imbalanced data in federated learning (FL). However, existing methods typically assume full client availability, which is rarely practical. In real-world FL systems, clients often participate in a cyclic manner: joining training according to a fixed, repeating schedule. This setting poses unique optimization challenges for the non-decomposable AUC objective. This paper addresses these challenges by developing and analyzing communication-efficient algorithms for federated AUC maximization under cyclic client participation. We investigate two key settings: First, we study AUC maximization with a squared surrogate loss, which reformulates the problem as a nonconvex-strongly-concave minimax optimization. By leveraging the Polyak-Łojasiewicz (PL) condition, we establish a state-of-the-art communication complexity of $\widetilde{O}(1/ε^{1/2})$ and iteration complexity of $\widetilde{O}(1/ε)$. Second, we consider general pairwise AUC losses. We establish a communication complexity of $O(1/ε^3)$ and an iteration complexity of $O(1/ε^4)$. Further, under the PL condition, these bounds improve to communication complexity of $\widetilde{O}(1/ε^{1/2})$ and iteration complexity of $\widetilde{O}(1/ε)$. Extensive experiments on benchmark tasks in image classification, medical imaging, and fraud detection demonstrate the superior efficiency and effectiveness of our proposed methods.


【15】Real Time NILM Based Power Monitoring of Identical Induction Motors Representing Cutting Machines in Textile Industry
标题:基于实时NILM的纺织行业切割机同型感应电机功率监控
链接:https://arxiv.org/abs/2601.01616

作者:Md Istiauk Hossain Rifat,Moin Khan,Mohammad Zunaed
备注:9 pages, 9 figures
摘要:The textile industry in Bangladesh is one of the most energy-intensive sectors, yet its monitoring practices remain largely outdated, resulting in inefficient power usage and high operational costs. To address this, we propose a real-time Non-Intrusive Load Monitoring (NILM)-based framework tailored for industrial applications, with a focus on identical motor-driven loads representing textile cutting machines. A hardware setup comprising voltage and current sensors, Arduino Mega and ESP8266 was developed to capture aggregate and individual load data, which was stored and processed on cloud platforms. A new dataset was created from three identical induction motors and auxiliary loads, totaling over 180,000 samples, to evaluate the state-of-the-art MATNILM model under challenging industrial conditions. Results indicate that while aggregate energy estimation was reasonably accurate, per-appliance disaggregation faced difficulties, particularly when multiple identical machines operated simultaneously. Despite these challenges, the integrated system demonstrated practical real-time monitoring with remote accessibility through the Blynk application. This work highlights both the potential and limitations of NILM in industrial contexts, offering insights into future improvements such as higher-frequency data collection, larger-scale datasets and advanced deep learning approaches for handling identical loads.


【16】Four Quadrants of Difficulty: A Simple Categorisation and its Limits
标题:难度四象限:简单分类及其局限性
链接:https://arxiv.org/abs/2601.01488

作者:Vanessa Toborek,Sebastian Müller,Christian Bauckhage
备注:prepared for ESANN 2026 submission
摘要:Curriculum Learning (CL) aims to improve the outcome of model training by estimating the difficulty of samples and scheduling them accordingly. In NLP, difficulty is commonly approximated using task-agnostic linguistic heuristics or human intuition, implicitly assuming that these signals correlate with what neural models find difficult to learn. We propose a four-quadrant categorisation of difficulty signals -- human vs. model and task-agnostic vs. task-dependent -- and systematically analyse their interactions on a natural language understanding dataset. We find that task-agnostic features behave largely independently and that only task-dependent features align. These findings challenge common CL intuitions and highlight the need for lightweight, task-dependent difficulty estimators that better reflect model learning behaviour.


【17】Leveraging Flatness to Improve Information-Theoretic Generalization Bounds for SGD
标题:利用平坦性改进SGD的信息论泛化界
链接:https://arxiv.org/abs/2601.01465

作者:Ze Peng,Jian Zhang,Yisen Wang,Lei Qi,Yinghuan Shi,Yang Gao
备注:Published as a conference paper at ICLR 2025
摘要:Information-theoretic (IT) generalization bounds have been used to study the generalization of learning algorithms. These bounds are intrinsically data- and algorithm-dependent so that one can exploit the properties of data and algorithm to derive tighter bounds. However, we observe that although the flatness bias is crucial for SGD's generalization, these bounds fail to capture the improved generalization under better flatness and are also numerically loose. This is caused by the inadequate leverage of SGD's flatness bias in existing IT bounds. This paper derives a more flatness-leveraging IT bound for the flatness-favoring SGD. The bound indicates the learned models generalize better if the large-variance directions of the final weight covariance have small local curvatures in the loss landscape. Experiments on deep neural networks show our bound not only correctly reflects the better generalization when flatness is improved, but is also numerically much tighter. This is achieved by a flexible technique called "omniscient trajectory". When applied to Gradient Descent's minimax excess risk on convex-Lipschitz-Bounded problems, it improves representative IT bounds' $Ω(1)$ rates to $O(1/\sqrt{n})$. It also implies a bypass of memorization-generalization trade-offs.


【18】Segmentation and Processing of German Court Decisions from Open Legal Data
标题:从开放法律数据中分割和处理德国法院判决
链接:https://arxiv.org/abs/2601.01449

作者:Harshil Darji,Martin Heckelmann,Christina Kratsch,Gerard de Melo
备注:Accepted and published as a research article in Legal Knowledge and Information Systems (JURIX 2025 proceedings, IOS Press). Pages 276--281
摘要:The availability of structured legal data is important for advancing Natural Language Processing (NLP) techniques for the German legal system. One of the most widely used datasets, Open Legal Data, provides a large-scale collection of German court decisions. While the metadata in this raw dataset is consistently structured, the decision texts themselves are inconsistently formatted and often lack clearly marked sections. Reliable separation of these sections is important not only for rhetorical role classification but also for downstream tasks such as retrieval and citation analysis. In this work, we introduce a cleaned and sectioned dataset of 251,038 German court decisions derived from the official Open Legal Data dataset. We systematically separated three important sections in German court decisions, namely Tenor (operative part of the decision), Tatbestand (facts of the case), and Entscheidungsgründe (judicial reasoning), which are often inconsistently represented in the original dataset. To ensure the reliability of our extraction process, we used Cochran's formula with a 95% confidence level and a 5% margin of error to draw a statistically representative random sample of 384 cases, and manually verified that all three sections were correctly identified. We also extracted the Rechtsmittelbelehrung (appeal notice) as a separate field, since it is a procedural instruction and not part of the decision itself. The resulting corpus is publicly available in the JSONL format, making it an accessible resource for further research on the German legal system.
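
For concreteness, the sample size follows from Cochran's formula $n = z^2 p(1-p)/e^2$ with the conservative $p = 0.5$:

```python
z, p, e = 1.96, 0.5, 0.05        # 95% confidence, worst-case variance, 5% margin
n = z**2 * p * (1 - p) / e**2
print(n)                         # 384.16, matching the 384 cases sampled above
```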


【19】iFlip: Iterative Feedback-driven Counterfactual Example Refinement
标题:iFlip:迭代反馈驱动的反事实示例细化
链接:https://arxiv.org/abs/2601.01446

作者:Yilong Wang,Qianli Wang,Nils Feldhus
备注:In submission
摘要:Counterfactual examples are minimal edits to an input that alter a model's prediction. They are widely employed in explainable AI to probe model behavior and in natural language processing (NLP) to augment training data. However, generating valid counterfactuals with large language models (LLMs) remains challenging, as existing single-pass methods often fail to induce reliable label changes, neglecting LLMs' self-correction capabilities. To explore this untapped potential, we propose iFlip, an iterative refinement approach that leverages three types of feedback, including model confidence, feature attribution, and natural language. Our results show that iFlip achieves an average 57.8% higher validity than the five state-of-the-art baselines, as measured by the label flipping rate. The user study further corroborates that iFlip outperforms baselines in completeness, overall satisfaction, and feasibility. In addition, ablation studies demonstrate that three components are paramount for iFlip to generate valid counterfactuals: leveraging an appropriate number of iterations, pointing to highly attributed words, and early stopping. Finally, counterfactuals generated by iFlip enable effective counterfactual data augmentation, substantially improving model performance and robustness.


【20】Efficient Cover Construction for Ball Mapper via Accelerated Range Queries
标题:通过加速范围查询实现Ball Mapper的高效覆盖构建
链接:https://arxiv.org/abs/2601.01405

作者:Jay-Anne Bulauan,John Rick Manzanares
摘要:Ball Mapper is a widely used tool in topological data analysis for summarizing the structure of high-dimensional data through metric-based coverings and graph representations. A central computational bottleneck in Ball Mapper is the construction of the underlying cover, which requires repeated range queries to identify data points within a fixed distance of selected landmarks. As data sets grow in size and dimensionality, naive implementations of this step become increasingly inefficient.   In this work, we study practical strategies for accelerating cover construction in Ball Mapper by improving the efficiency of range queries. We integrate two complementary approaches into the Ball Mapper pipeline: hierarchical geometric pruning using ball tree data structures, and hardware-aware distance computation using Facebook AI Similarity Search. We describe the underlying algorithms, discuss their trade-offs with respect to metric flexibility and dimensionality, and provide implementation details relevant to large-scale data analysis.   Empirical benchmarks demonstrate that both approaches yield substantial speedups over the baseline implementation, with performance gains depending on data set size, dimensionality, and choice of distance function. These results improve the practical scalability of Ball Mapper without modifying its theoretical formulation and provide guidance for the efficient implementation of metric-based exploratory tools in modern data analysis workflows.
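
A minimal sketch of the ball-tree path (the FAISS path is analogous, with an index in place of the tree): the greedy landmark-and-range-query loop at the core of Ball Mapper cover construction, using scikit-learn's BallTree.

```python
import numpy as np
from sklearn.neighbors import BallTree

X = np.random.rand(10_000, 8)              # point cloud
tree = BallTree(X, metric="euclidean")

eps, landmarks = 0.4, []
covered = np.zeros(len(X), dtype=bool)
for i in range(len(X)):                    # greedy landmark selection
    if not covered[i]:
        landmarks.append(i)
        idx = tree.query_radius(X[i:i + 1], r=eps)[0]   # accelerated range query
        covered[idx] = True                              # ball of radius eps
```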


【21】FLOP-Efficient Training: Early Stopping Based on Test-Time Compute Awareness
标题:FLOP高效训练:基于测试时计算感知的早停
链接:https://arxiv.org/abs/2601.01332

作者:Hossam Amer,Maryam Dialameh,Hossein Rajabzadeh,Walid Ahmed,Weiwei Zhang,Yang Liu
摘要:Scaling training compute, measured in FLOPs, has long been shown to improve the accuracy of large language models, yet training remains resource-intensive. Prior work shows that increasing test-time compute (TTC)-for example through iterative sampling-can allow smaller models to rival or surpass much larger ones at lower overall cost. We introduce TTC-aware training, where an intermediate checkpoint and a corresponding TTC configuration can together match or exceed the accuracy of a fully trained model while requiring substantially fewer training FLOPs. Building on this insight, we propose an early stopping algorithm that jointly selects a checkpoint and TTC configuration to minimize training compute without sacrificing accuracy. To make this practical, we develop an efficient TTC evaluation method that avoids exhaustive search, and we formalize a break-even bound that identifies when increased inference compute compensates for reduced training compute. Experiments demonstrate up to 92\% reductions in training FLOPs while maintaining and sometimes remarkably improving accuracy. These results highlight a new perspective for balancing training and inference compute in model development, enabling faster deployment cycles and more frequent model refreshes. Code will be publicly released.
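
The break-even logic can be illustrated with back-of-the-envelope arithmetic (all numbers hypothetical): early stopping saves training FLOPs, while matching accuracy via $k$-sample TTC adds inference FLOPs on every query.

```python
train_full, train_early = 1.0e21, 3.0e20   # training FLOPs: full vs. early stop
infer_per_query = 1.0e12                   # single-pass inference FLOPs
k = 8                                      # TTC samples needed to match accuracy

saved = train_full - train_early
extra_per_query = (k - 1) * infer_per_query
print(saved / extra_per_query)             # queries served before savings vanish
```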


【22】Spectral-Window Hybrid (SWH)
标题:光谱窗混合(SWH)
链接:https://arxiv.org/abs/2601.01313

作者:Vladimer Khasia
摘要:Scaling sequence modeling to extreme contexts requires balancing computational efficiency with representational expressivity. While Transformers provide precise retrieval via the attention mechanism, their quadratic $\mathcal{O}(T^2)$ complexity limits their application to long-horizon tasks. In this work, we propose the \textbf{Spectral-Window Hybrid (SWH)}, an architecture that decouples sequence modeling into two \textit{parallel} streams: a global branch utilizing the Convolution Theorem to model long-range decay dynamics in $\mathcal{O}(T \log T)$ time, and a local branch employing sliding-window attention for token interactions within a bounded context. By aggregating these representations, SWH avoids the computational bottleneck of global attention while retaining local precision. We demonstrate that SWH matches the perplexity of standard Transformers on short contexts while enabling efficient linear scaling to extended sequences. The code is available at https://github.com/VladimerKhasia/SWH
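
A small demonstration (ours) of the global branch's trick: by the Convolution Theorem, a length-$T$ causal convolution with a decay kernel costs $\mathcal{O}(T \log T)$ via FFT yet matches the direct $\mathcal{O}(T^2)$ computation.

```python
import numpy as np

T = 1024
x = np.random.randn(T)
decay = 0.99 ** np.arange(T)               # long-range decay kernel

X = np.fft.rfft(x, n=2 * T)                # zero-pad to avoid circular wrap-around
K = np.fft.rfft(decay, n=2 * T)
y = np.fft.irfft(X * K, n=2 * T)[:T]       # causal linear convolution, O(T log T)

assert np.allclose(y, np.convolve(x, decay)[:T])   # matches the O(T^2) reference
```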


【23】Towards a Principled Muon under $μ\mathsf{P}$: Ensuring Spectral Conditions throughout Training
标题:迈向$μ\mathsf{P}$下有原则的Muon:在整个训练过程中确保谱条件
链接:https://arxiv.org/abs/2601.01306

作者:John Zhao
备注:21 pages, 0 figures
摘要:The $μ$-parameterization ($μ$P) provides a principled foundation for large language model (LLM) training by prescribing width-independent learning dynamics, which in turn enables predictable scaling behavior and robust hyperparameter transfer across model sizes. A central requirement of $μ$P is the satisfaction of certain spectral conditions on weight matrices, which ensure consistent feature learning and optimization behavior as model width grows. While these conditions are well understood in theory, guaranteeing their validity in practical training for matrix-based optimizers such as Muon is still understudied. Existing works that study Muon under $μ$P exhibit important limitations: they either do not ensure that the spectral conditions hold throughout the entire training horizon, or require repeated spectral normalization (or Newton-Schulz iterations) applied to both weights and updates, leading to significant computational overhead and reduced practicality. In this work, we show how to reliably guarantee the spectral conditions required by $μ$P for Muon during the entire training process. Our key insight is that for moderately large models, maintaining spectral control at the level of optimizer updates alone is sufficient to preserve $μ$P-compatible scaling, eliminating the need for explicit spectral normalization of the weights. Based on this principle, we develop a variant of Muon, namely Muon++, that satisfies the spectral conditions throughout the training process. Our results bridge the gap between the theoretical promises of $μ$P and the practical deployment of matrix-based optimizers in long-horizon training. We also take the first step towards an adaptive spectral condition by incorporating data-dependent effects, making it better suited for long-horizon LLM training.


【24】Warp-Cortex: An Asynchronous, Memory-Efficient Architecture for Million-Agent Cognitive Scaling on Consumer Hardware
标题:Warp-Cortex:一种在消费级硬件上实现百万级智能体认知扩展的异步、内存高效架构
链接:https://arxiv.org/abs/2601.01298

作者:Jorge L. Ruiz Williams
摘要:Current multi-agent Large Language Model (LLM) frameworks suffer from linear memory scaling, rendering "System 2" parallel reasoning impractical on consumer hardware. We present Warp Cortex, an asynchronous architecture that theoretically enables million-agent cognitive scaling by decoupling agent logic from physical memory. Through Singleton Weight Sharing and a novel Topological Synapse--inspired by hybrid landmarking techniques from Topological Data Analysis (TDA)--we reduce memory complexity from O(N * L) to O(1) for weights and O(N * k) for context, where k << L. By treating the KV-cache as a point cloud in latent space, we apply witness-complex-inspired sparsification to preserve persistent homological features of the context manifold. On a single NVIDIA RTX 4090, we empirically demonstrate 100 concurrent agents at 2.2 GB total VRAM, with theoretical capacity exceeding 1,000 agents before compute latency becomes the bottleneck. We further introduce Referential Injection, a non-intrusive KV-cache update mechanism that allows asynchronous sub-agents to influence primary generation without stream disruption.


【25】Revisiting Weighted Strategy for Non-stationary Parametric Bandits and MDPs
标题:重新审视非平稳参数强盗和MDP的加权策略
链接:https://arxiv.org/abs/2601.01069

作者:Jing Wang,Peng Zhao,Zhi-Hua Zhou
备注:accepted by IEEE Transactions on Information Theory. arXiv admin note: substantial text overlap with arXiv:2303.02691
摘要:Non-stationary parametric bandits have attracted much attention recently. There are three principled ways to deal with non-stationarity, including sliding-window, weighted, and restart strategies. As many non-stationary environments exhibit gradual drifting patterns, the weighted strategy is commonly adopted in real-world applications. However, previous theoretical studies show that its analysis is more involved and the algorithms are either computationally less efficient or statistically suboptimal. This paper revisits the weighted strategy for non-stationary parametric bandits. In linear bandits (LB), we discover that this undesirable feature is due to an inadequate regret analysis, which results in an overly complex algorithm design. We propose a \emph{refined analysis framework}, which simplifies the derivation and, importantly, produces a simpler weight-based algorithm that is as efficient as window/restart-based algorithms while retaining the same regret as previous studies. Furthermore, our new framework can be used to improve regret bounds of other parametric bandits, including Generalized Linear Bandits (GLB) and Self-Concordant Bandits (SCB). For example, we develop a simple weighted GLB algorithm with an $\tilde{O}(k_μ^{5/4} c_μ^{-3/4} d^{3/4} P_T^{1/4}T^{3/4})$ regret, improving the $\tilde{O}(k_μ^{2} c_μ^{-1}d^{9/10} P_T^{1/5}T^{4/5})$ bound in prior work, where $k_μ$ and $c_μ$ characterize the reward model's nonlinearity, $P_T$ measures the non-stationarity, $d$ and $T$ denote the dimension and time horizon. Moreover, we extend our framework to non-stationary Markov Decision Processes (MDPs) with function approximation, focusing on Linear Mixture MDP and Multinomial Logit (MNL) Mixture MDP. For both classes, we propose algorithms based on the weighted strategy and establish dynamic regret guarantees using our analysis framework.
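
For intuition, the weighted strategy in linear bandits amounts to exponentially discounted ridge regression; here is a hedged batch-form sketch (the online algorithms maintain these statistics incrementally).

```python
import numpy as np

def weighted_ridge(X, y, gamma=0.99, lam=1.0):
    """argmin_theta sum_t gamma^{T-t} (y_t - x_t^T theta)^2 + lam * |theta|^2."""
    T, d = X.shape
    w = gamma ** np.arange(T - 1, -1, -1)          # most recent sample has weight 1
    A = (X * w[:, None]).T @ X + lam * np.eye(d)
    b = (X * w[:, None]).T @ y
    return np.linalg.solve(A, b)

X, y = np.random.randn(500, 4), np.random.randn(500)
theta_hat = weighted_ridge(X, y)                   # tracks drifting parameters
```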


【26】A UCB Bandit Algorithm for General ML-Based Estimators
标题:通用ML估计器的UCB Bandit算法
链接:https://arxiv.org/abs/2601.01061

作者:Yajing Liu,Erkao Bao,Linqi Song
备注:15 pages, 4 figures, 1 table, Multi-Arm bandit, psi-UCB, generalized machine learning models
摘要:We present ML-UCB, a generalized upper confidence bound algorithm that integrates arbitrary machine learning models into multi-armed bandit frameworks. A fundamental challenge in deploying sophisticated ML models for sequential decision-making is the lack of tractable concentration inequalities required for principled exploration. We overcome this limitation by directly modeling the learning curve behavior of the underlying estimator. Specifically, assuming the Mean Squared Error decreases as a power law in the number of training samples, we derive a generalized concentration inequality and prove that ML-UCB achieves sublinear regret. This framework enables the principled integration of any ML model whose learning curve can be empirically characterized, eliminating the need for model-specific theoretical analysis. We validate our approach through experiments on a collaborative filtering recommendation system using online matrix factorization with synthetic data designed to simulate a simplified two-tower model, demonstrating substantial improvements over LinUCB.
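
A hedged sketch of the resulting index: if the fitted learning curve says the estimator's MSE after $n$ samples is $c\,n^{-α}$, its square root (times an exploration factor we choose as a placeholder) plays the role of the confidence width.

```python
import numpy as np

def ml_ucb_scores(means, counts, t, c=1.0, alpha=0.8):
    """UCB index per arm: estimate + sqrt(power-law MSE) exploration width.
    The log(t) factor is our own placeholder, not the paper's exact form."""
    width = np.sqrt(c * np.maximum(counts, 1) ** (-alpha) * np.log(t + 1.0))
    return means + width

means = np.array([0.2, 0.5, 0.4])     # current reward estimates
counts = np.array([10, 3, 7])         # samples seen per arm
print(np.argmax(ml_ucb_scores(means, counts, t=20)))  # arm to pull next
```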


【27】Coarse-Grained Kullback--Leibler Control of Diffusion-Based Generative AI
标题:基于扩散的生成式人工智能的粗粒度Kullback-Leibler控制
链接:https://arxiv.org/abs/2601.01045

作者:Tatsuaki Tsuruyama
摘要:Diffusion models and score-based generative models provide a powerful framework for synthesizing high-quality images from noise. However, there is still no satisfactory theory that describes how coarse-grained quantities, such as blockwise intensity or class proportions after partitioning an image into spatial blocks, are preserved and evolve along the reverse diffusion dynamics. In previous work, the author introduced an information-theoretic Lyapunov function V for non-ergodic Markov processes on a state space partitioned into blocks, defined as the minimal Kullback-Leibler divergence to the set of stationary distributions reachable from a given initial condition, and showed that a leak-tolerant potential V-delta with a prescribed tolerance for block masses admits a closed-form expression as a scaling-and-clipping operation on block masses.   In this paper, I transplant this framework to the reverse diffusion process in generative models and propose a reverse diffusion scheme that is projected by the potential V-delta (referred to as the V-delta projected reverse diffusion). I extend the monotonicity of V to time-inhomogeneous block-preserving Markov kernels and show that, under small leakage and the V-delta projection, V-delta acts as an approximate Lyapunov function. Furthermore, using a toy model consisting of block-constant images and a simplified reverse kernel, I numerically demonstrate that the proposed method keeps the block-mass error and the leak-tolerant potential within the prescribed tolerance, while achieving pixel-wise accuracy and visual quality comparable to the non-projected dynamics. This study reinterprets generative sampling as a decrease of an information potential from noise to data, and provides a design principle for reverse diffusion processes with explicit control of coarse-grained quantities.


【28】Decoupling Amplitude and Phase Attention in Frequency Domain for RGB-Event based Visual Object Tracking
标题:面向RGB-Event视觉目标跟踪的频域幅度与相位注意力解耦
链接:https://arxiv.org/abs/2601.01022

作者:Shiao Wang,Xiao Wang,Haonan Zhao,Jiarui Xu,Bo Jiang,Lin Zhu,Xin Zhao,Yonghong Tian,Jin Tang
摘要:Existing RGB-Event visual object tracking approaches primarily rely on conventional feature-level fusion, failing to fully exploit the unique advantages of event cameras. In particular, the high dynamic range and motion-sensitive nature of event cameras are often overlooked, while low-information regions are processed uniformly, leading to unnecessary computational overhead for the backbone network. To address these issues, we propose a novel tracking framework that performs early fusion in the frequency domain, enabling effective aggregation of high-frequency information from the event modality. Specifically, RGB and event modalities are transformed from the spatial domain to the frequency domain via the Fast Fourier Transform, with their amplitude and phase components decoupled. High-frequency event information is selectively fused into RGB modality through amplitude and phase attention, enhancing feature representation while substantially reducing backbone computation. In addition, a motion-guided spatial sparsification module leverages the motion-sensitive nature of event cameras to capture the relationship between target motion cues and spatial probability distribution, filtering out low-information regions and enhancing target-relevant features. Finally, a sparse set of target-relevant features is fed into the backbone network for learning, and the tracking head predicts the final target position. Extensive experiments on three widely used RGB-Event tracking benchmark datasets, including FE108, FELT, and COESOT, demonstrate the high performance and efficiency of our method. The source code of this paper will be released on https://github.com/Event-AHU/OpenEvTracking
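A rough sketch of the frequency-domain early-fusion step, with fixed scalar gates standing in for the learned amplitude and phase attention; the tensor shapes and weights are illustrative assumptions.

import torch

def freq_fuse(rgb, evt, w_amp=0.3, w_pha=0.1):
    # Transform both modalities with a 2D FFT and decouple amplitude/phase.
    R = torch.fft.fft2(rgb)
    E = torch.fft.fft2(evt)
    amp = (1 - w_amp) * R.abs() + w_amp * E.abs()       # amplitude mixing
    pha = R.angle() + w_pha * (E.angle() - R.angle())   # phase mixing
    # Recombine and return to the spatial domain.
    return torch.fft.ifft2(torch.polar(amp, pha)).real

rgb = torch.randn(1, 3, 32, 32)
evt = torch.randn(1, 3, 32, 32)
print(freq_fuse(rgb, evt).shape)  # torch.Size([1, 3, 32, 32])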


【29】Expanding the Chaos: Neural Operator for Stochastic (Partial) Differential Equations
标题:扩展混沌:随机(偏)微分方程的神经算子
链接:https://arxiv.org/abs/2601.01021

作者:Dai Shi,Lequan Lin,Andi Han,Luke Thompson,José Miguel Hernández-Lobato,Zhiyong Wang,Junbin Gao
摘要:Stochastic differential equations (SDEs) and stochastic partial differential equations (SPDEs) are fundamental tools for modeling stochastic dynamics across the natural sciences and modern machine learning. Developing deep learning models for approximating their solution operators promises not only fast, practical solvers, but may also inspire models that resolve classical learning tasks from a new perspective. In this work, we build on classical Wiener chaos expansions (WCE) to design neural operator (NO) architectures for SPDEs and SDEs: we project the driving noise paths onto orthonormal Wick Hermite features and parameterize the resulting deterministic chaos coefficients with neural operators, so that full solution trajectories can be reconstructed from noise in a single forward pass. On the theoretical side, we investigate the classical WCE results for the class of multi-dimensional SDEs and semilinear SPDEs considered here by explicitly writing down the associated coupled ODE/PDE systems for their chaos coefficients, which makes the separation between stochastic forcing and deterministic dynamics fully explicit and directly motivates our model designs. On the empirical side, we validate our models on a diverse suite of problems: classical SPDE benchmarks, diffusion one-step sampling on images, topological interpolation on graphs, financial extrapolation, parameter estimation, and manifold SDEs for flood prediction, demonstrating competitive accuracy and broad applicability. Overall, our results indicate that WCE-based neural operators provide a practical and scalable way to learn SDE/SPDE solution operators across diverse domains.
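A sketch of the Wick Hermite feature construction for a discretized noise path, using a simple block-indicator orthonormal basis; the basis choice, truncation orders, and normalization are assumptions for illustration.

import math
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def wick_hermite_features(dW, n_basis=4, max_order=3):
    T = len(dW)
    L = T // n_basis
    basis = np.zeros((n_basis, T))
    for i in range(n_basis):
        basis[i, i * L:(i + 1) * L] = 1.0 / np.sqrt(L)  # orthonormal blocks
    xi = basis @ dW                  # approximately i.i.d. standard normals
    feats = []
    for order in range(1, max_order + 1):
        coeffs = np.zeros(order + 1)
        coeffs[order] = 1.0
        # normalized probabilists' Hermite polynomial He_order(xi)
        feats.append(hermeval(xi, coeffs) / np.sqrt(math.factorial(order)))
    return np.concatenate(feats)

dW = np.random.default_rng(0).normal(size=256)  # unit-variance increments
print(wick_hermite_features(dW).shape)          # (n_basis * max_order,)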


【30】Contractive Diffusion Policies: Robust Action Diffusion via Contractive Score-Based Sampling with Differential Equations
标题:收缩扩散策略:通过基于收缩分数的微分方程采样实现鲁棒动作扩散
链接:https://arxiv.org/abs/2601.01003

作者:Amin Abyaneh,Charlotte Morissette,Mohamad H. Danesh,Anas El Houssaini,David Meger,Gregory Dudek,Hsiu-Chin Lin
备注:Under review at ICLR 2026
摘要:Diffusion policies have emerged as powerful generative models for offline policy learning, whose sampling process can be rigorously characterized by a score function guiding a Stochastic Differential Equation (SDE). However, the same score-based SDE modeling that grants diffusion policies the flexibility to learn diverse behavior also incurs solver and score-matching errors, large data requirements, and inconsistencies in action generation. While less critical in image generation, these inaccuracies compound and lead to failure in continuous control settings. We introduce Contractive Diffusion Policies (CDPs) to induce contractive behavior in the diffusion sampling dynamics. Contraction pulls nearby flows closer to enhance robustness against solver and score-matching errors while reducing unwanted action variance. We develop an in-depth theoretical analysis along with a practical implementation recipe to incorporate CDPs into existing diffusion policy architectures with minimal modification and computational cost. We evaluate CDPs for offline learning by conducting extensive experiments in simulation and real-world settings. Across benchmarks, CDPs often outperform baseline policies, with pronounced benefits under data scarcity.


【31】Adapting Feature Attenuation to NLP
标题:将特征衰减适应NLP
链接:https://arxiv.org/abs/2601.00965

作者:Tianshuo Yang,Ryan Rabinowitz,Terrance E. Boult,Jugal Kalita
摘要:Transformer classifiers such as BERT deliver impressive closed-set accuracy, yet they remain brittle when confronted with inputs from unseen categories--a common scenario for deployed NLP systems. We investigate Open-Set Recognition (OSR) for text by porting the feature attenuation hypothesis from computer vision to transformers and by benchmarking it against state-of-the-art baselines. Concretely, we adapt the COSTARR framework--originally designed for classification in computer vision--to two modest language models (BERT (base) and GPT-2) trained to label 176 arXiv subject areas. Alongside COSTARR, we evaluate Maximum Softmax Probability (MSP), MaxLogit, and the temperature-scaled free-energy score under the OOSA and AUOSCR metrics. Our results show (i) COSTARR extends to NLP without retraining but yields no statistically significant gain over MaxLogit or MSP, and (ii) free-energy lags behind all other scores in this high-class-count setting. The study highlights both the promise and the current limitations of transplanting vision-centric OSR ideas to language models, and points toward the need for larger backbones and task-tailored attenuation strategies.
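The three baseline scores named above have standard closed forms, sketched here; the temperature T and the 176-class logit shape are illustrative.

import numpy as np
from scipy.special import logsumexp, softmax

def osr_scores(logits, T=1.0):
    msp = softmax(logits, axis=-1).max(axis=-1)    # max softmax probability
    maxlogit = logits.max(axis=-1)                 # raw max logit
    # Free energy is E(x) = -T * logsumexp(logits / T); we return -E so that
    # higher always means "more likely a known class" for all three scores.
    energy = T * logsumexp(logits / T, axis=-1)
    return msp, maxlogit, energy

logits = np.random.randn(4, 176)   # 176 arXiv subject areas, as in the paper
print(osr_scores(logits))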


【32】LOFA: Online Influence Maximization under Full-Bandit Feedback using Lazy Forward Selection
标题:LOFA:使用懒惰向前选择在全强盗反馈下实现在线影响力最大化
链接:https://arxiv.org/abs/2601.00933

作者:Jinyu Xu,Abhishek K. Umrawal
备注:14 pages and 6 figures
摘要:We study the problem of influence maximization (IM) in an online setting, where the goal is to select a subset of nodes, called the seed set, at each time step over a fixed time horizon, subject to a cardinality budget constraint, to maximize the expected cumulative influence. We operate under a full-bandit feedback model, where only the influence of the chosen seed set at each time step is observed, with no additional structural information about the network or diffusion process. It is well-established that the influence function is submodular, and existing algorithms exploit this property to achieve low regret. In this work, we leverage this property further and propose the Lazy Online Forward Algorithm (LOFA), which achieves a lower empirical regret. We conduct experiments on a real-world social network to demonstrate that LOFA achieves superior performance compared to existing bandit algorithms in terms of cumulative regret and instantaneous reward.
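A compact sketch of lazy forward (lazy greedy) selection, the submodular subroutine LOFA's name refers to, with a toy coverage function standing in for the estimated influence spread; names and data are illustrative.

import heapq

def lazy_greedy(f, ground, k):
    S, f_S = set(), 0.0
    heap = [(-f({v}), v) for v in ground]     # initial marginal gains
    heapq.heapify(heap)
    while len(S) < k and heap:
        _, v = heapq.heappop(heap)
        gain = f(S | {v}) - f_S               # refresh the stale gain
        if not heap or -gain <= heap[0][0]:   # still beats the best stale gain?
            S.add(v)
            f_S += gain
        else:
            heapq.heappush(heap, (-gain, v))  # re-insert with updated gain
    return S

# Toy coverage objective; submodularity is what makes laziness valid.
cover = {1: {1, 2, 3}, 2: {3, 4}, 3: {5}, 4: {1, 5, 6}}
f = lambda S: float(len(set().union(*(cover[v] for v in S)))) if S else 0.0
print(lazy_greedy(f, list(cover), k=2))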


【33】Complexity-based code embeddings
标题:基于复杂性的代码嵌入
链接:https://arxiv.org/abs/2601.00924

作者:Rares Folea,Radu Iacob,Emil Slusanschi,Traian Rebedea
摘要:This paper presents a generic method for transforming the source code of various algorithms into numerical embeddings, by dynamically analysing the behaviour of computer programs against different inputs and by tailoring multiple generic complexity functions for the analysed metrics. The algorithm embeddings are based on r-Complexity. Using the proposed code embeddings, we present an implementation of the XGBoost algorithm that achieves an average F1-score on a multi-label dataset with 11 classes, built using real-world code snippets submitted for programming competitions on the Codeforces platform.
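A sketch of the dynamic-analysis ingredient: fit generic complexity shapes to measured runtimes over growing inputs, and use the fitted coefficients and residuals as embedding features. The candidate shapes and synthetic timings below are assumptions.

import numpy as np

def complexity_features(ns, ts):
    shapes = {
        "n": ns,
        "n log n": ns * np.log(ns),
        "n^2": ns ** 2,
    }
    feats = {}
    for name, g in shapes.items():
        c, *_ = np.linalg.lstsq(g[:, None], ts, rcond=None)  # fit t ~ c*g(n)
        resid = float(np.mean((ts - g * c[0]) ** 2))
        feats[name] = (float(c[0]), resid)
    return feats

ns = np.array([1e3, 2e3, 4e3, 8e3])
ts = 2e-7 * ns * np.log(ns)          # synthetic "measured" runtimes
# The best-fitting shape has the smallest residual (here: n log n).
print(min(complexity_features(ns, ts).items(), key=lambda kv: kv[1][1]))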


【34】Attention Needs to Focus: A Unified Perspective on Attention Allocation
标题:注意力需要集中:注意力分配的统一视角
链接:https://arxiv.org/abs/2601.00919

作者:Zichuan Fu,Wentao Song,Guojing Li,Yejing Wang,Xian Wu,Yimin Deng,Hanyu Yan,Yefeng Zheng,Xiangyu Zhao
备注:ICLR 2026 conference
摘要:The Transformer architecture, a cornerstone of modern Large Language Models (LLMs), has achieved extraordinary success in sequence modeling, primarily due to its attention mechanism. However, despite its power, the standard attention mechanism is plagued by well-documented issues: representational collapse and attention sink. Although prior work has proposed approaches for these issues, they are often studied in isolation, obscuring their deeper connection. In this paper, we present a unified perspective, arguing that both can be traced to a common root -- improper attention allocation. We identify two failure modes: 1) Attention Overload, where tokens receive comparable high weights, blurring semantic features that lead to representational collapse; 2) Attention Underload, where no token is semantically relevant, yet attention is still forced to distribute, resulting in spurious focus such as attention sink. Building on this insight, we introduce Lazy Attention, a novel mechanism designed for a more focused attention distribution. To mitigate overload, it employs positional discrimination across both heads and dimensions to sharpen token distinctions. To counteract underload, it incorporates Elastic-Softmax, a modified normalization function that relaxes the standard softmax constraint to suppress attention on irrelevant tokens. Experiments on the FineWeb-Edu corpus, evaluated across nine diverse benchmarks, demonstrate that Lazy Attention successfully mitigates attention sink and achieves competitive performance compared to both standard attention and modern architectures, while reaching up to 59.58% attention sparsity.
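The abstract does not give Elastic-Softmax's exact form; the sketch below shows the generic "relaxed softmax" mechanism it describes, where a constant in the denominator lets total attention mass fall below one when no key is relevant. The constant eps is an assumption.

import numpy as np

def relaxed_softmax(scores, eps=1.0):
    # No max-shift here, so eps keeps an absolute scale; production code would
    # realize the same effect stably, e.g. via an extra learned sink logit.
    e = np.exp(scores)
    return e / (e.sum(axis=-1, keepdims=True) + eps)

relevant = np.array([4.0, 1.0, 0.0])      # one clearly matching key
irrelevant = np.array([0.1, 0.0, -0.1])   # no semantically relevant key
print(relaxed_softmax(relevant).sum())    # ~0.98: mass confidently allocated
print(relaxed_softmax(irrelevant).sum())  # ~0.75: attention partly withheld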


【35】Latent-Constrained Conditional VAEs for Augmenting Large-Scale Climate Ensembles
标题:用于增广大规模气候集合的潜在约束条件VAE
链接:https://arxiv.org/abs/2601.00915

作者:Jacquelyn Shelton,Przemyslaw Polewski,Alexander Robel,Matthew Hoffman,Stephen Price
备注:draft / preliminary
摘要:Large climate-model ensembles are computationally expensive; yet many downstream analyses would benefit from additional, statistically consistent realizations of spatiotemporal climate variables. We study a generative modeling approach for producing new realizations from a limited set of available runs by transferring structure learned across an ensemble. Using monthly near-surface temperature time series from ten independent reanalysis realizations (ERA5), we find that a vanilla conditional variational autoencoder (CVAE) trained jointly across realizations yields a fragmented latent space that fails to generalize to unseen ensemble members. To address this, we introduce a latent-constrained CVAE (LC-CVAE) that enforces cross-realization homogeneity of latent embeddings at a small set of shared geographic 'anchor' locations. We then use multi-output Gaussian process regression in the latent space to predict latent coordinates at unsampled locations in a new realization, followed by decoding to generate full time series fields. Experiments and ablations demonstrate (i) instability when training on a single realization, (ii) diminishing returns after incorporating roughly five realizations, and (iii) a trade-off between spatial coverage and reconstruction quality that is closely linked to the average neighbor distance in latent space.


【36】Device-Native Autonomous Agents for Privacy-Preserving Negotiations
标题:用于隐私保护谈判的设备原生自治代理
链接:https://arxiv.org/abs/2601.00911

作者:Joyjit Roy
备注:9 pages, 6 figures, 9 tables, submitted to the 2nd International Conference on Artificial Intelligence Systems (AIS 2026)
摘要:Automated negotiations in insurance and business-to-business (B2B) commerce encounter substantial challenges. Current systems force a trade-off between convenience and privacy by routing sensitive financial data through centralized servers, increasing security risks, and diminishing user trust. This study introduces a device-native autonomous Artificial Intelligence (AI) agent system for privacy-preserving negotiations. The proposed system operates exclusively on user hardware, enabling real-time bargaining while maintaining sensitive constraints locally. It integrates zero-knowledge proofs to ensure privacy and employs distilled world models to support advanced on-device reasoning. The architecture incorporates six technical components within an agentic AI workflow. Agents autonomously plan negotiation strategies, conduct secure multi-party bargaining, and generate cryptographic audit trails without exposing user data to external servers. The system is evaluated in insurance and B2B procurement scenarios across diverse device configurations. Results show an average success rate of 87%, a 2.4x latency improvement over cloud baselines, and strong privacy preservation through zero-knowledge proofs. User studies show 27% higher trust scores when decision trails are available. These findings establish a foundation for trustworthy autonomous agents in privacy-sensitive financial domains.


【37】Security Hardening Using FABRIC: Implementing a Unified Compliance Aggregator for Linux Servers
标题:使用FABRIC加强安全性:为Linux服务器实现统一的合规性聚合器
链接:https://arxiv.org/abs/2601.00909

作者:Sheldon Paul,Izzat Alsmadi
摘要:This paper presents a unified framework for evaluating Linux security hardening on the FABRIC testbed through aggregation of heterogeneous security auditing tools. We deploy three Ubuntu 22.04 nodes configured at baseline, partial, and full hardening levels, and evaluate them using Lynis, OpenSCAP, and AIDE across 108 audit runs. To address the lack of a consistent interpretation across tools, we implement a Unified Compliance Aggregator (UCA) that parses tool outputs, normalizes scores to a common 0--100 scale, and combines them into a weighted metric augmented by a customizable rule engine for organization-specific security policies. Experimental results show that full hardening increases OpenSCAP compliance from 39.7 to 71.8, while custom rule compliance improves from 39.3% to 83.6%. The results demonstrate that UCA provides a clearer and more reproducible assessment of security posture than individual tools alone, enabling systematic evaluation of hardening effectiveness in programmable testbed environments.
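A minimal sketch of the aggregation step: per-tool scores normalized to 0-100 and combined with weights. The tool names match the paper, but the weights and the AIDE score mapping are assumptions, and the custom rule engine is omitted.

def normalize(score, lo, hi):
    # Map a tool's native score range onto a common 0-100 scale.
    return 100.0 * (score - lo) / (hi - lo)

def uca_score(tool_scores, weights):
    total = sum(weights.values())
    return sum(weights[t] * s for t, s in tool_scores.items()) / total

scores = {
    "lynis":    normalize(72.0, 0, 100),  # Lynis hardening index is 0-100
    "openscap": normalize(71.8, 0, 100),  # OpenSCAP pass ratio as a percentage
    "aide":     normalize(1.0, 0, 1),     # AIDE: 1 if the integrity check passes
}
print(uca_score(scores, {"lynis": 0.4, "openscap": 0.4, "aide": 0.2}))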


【38】Universal Conditional Logic: A Formal Language for Prompt Engineering
标题:通用条件逻辑:提示工程的形式语言
链接:https://arxiv.org/abs/2601.00880

作者:Anthony Mikinka
备注:25 pages, 15 figures, 5 tables. Includes appendices with variable reference, pattern library, and O_s calculation examples. Supplementary materials: V1-V4.1 prompt source code and 305 model responses available at GitHub repositories
摘要:We present Universal Conditional Logic (UCL), a mathematical framework for prompt optimization that transforms prompt engineering from heuristic practice into systematic optimization. Through systematic evaluation (N=305, 11 models, 4 iterations), we demonstrate significant token reduction (29.8%, t(10)=6.36, p < 0.001, Cohen's d = 2.01) with corresponding cost savings. UCL's structural overhead function $O_s(A)$ explains version-specific performance differences through the Over-Specification Paradox: beyond the threshold $S^* = 0.509$, additional specification degrades performance quadratically. Core mechanisms -- indicator functions ($I_i \in \{0,1\}$), structural overhead ($O_s = \gamma \sum_k \ln C_k$), early binding -- are validated. Notably, optimal UCL configuration varies by model architecture -- certain models (e.g., Llama 4 Scout) require version-specific adaptations (V4.1). This work establishes UCL as a calibratable framework for efficient LLM interaction, with model-family-specific optimization as a key research direction.
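A worked toy example of the quoted overhead function and threshold; gamma, the clause complexities C_k, and the quadratic penalty constant are illustrative stand-ins, with only the functional forms taken from the abstract.

import math

def structural_overhead(C, gamma=0.1):
    # O_s = gamma * sum_k ln(C_k), as quoted in the abstract
    return gamma * sum(math.log(c) for c in C)

def effective_quality(base, S, S_star=0.509, kappa=1.0):
    # Beyond S*, additional specification degrades performance quadratically.
    return base - kappa * max(0.0, S - S_star) ** 2

C = [3, 5, 8, 6]                      # per-clause complexities (assumed)
S = structural_overhead(C)
print(S, effective_quality(1.0, S))   # S ~ 0.66 > S*, so a small penalty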


【39】Path Integral Solution for Dissipative Generative Dynamics
标题:耗散生成动力学的路径积分解
链接:https://arxiv.org/abs/2601.00860

作者:Xidi Wang
备注:6 pages, 2 figures, 2 tables, along with 2 supplementary materials
摘要:Can purely mechanical systems generate intelligent language? We prove that dissipative quantum dynamics with analytically tractable non-local context aggregation produce coherent text generation, while conservation laws cause fundamental failure. Employing Koopman operators with closed-form path integral propagators, we show irreversible computation fundamentally requires both controlled information dissipation and causal context aggregation. Spectral analysis reveals emergent eigenvalue structure, separating into decay modes (forgetting), growth modes (amplification), and neutral modes (preservation) -- the essential ingredients for directed information flow. Hamiltonian constraints force the elimination of these dissipative modes and degrading performance despite unchanged model capacity. This establishes language generation as dissipative quantum field theory, proving mechanical systems acquire intelligence through the combination of dissipation and non-locality, not through conservation.


【40】Feature-based Inversion of 2.5D Controlled Source Electromagnetic Data using Generative Priors
标题:使用生成先验的基于特征的2.5D可控源电磁数据反演
链接:https://arxiv.org/abs/2601.02145

作者:Hongyu Zhou,Haoran Sun,Rui Guo,Maokun Li,Fan Yang,Shenheng Xu
摘要:In this study, we investigate feature-based 2.5D controlled source marine electromagnetic (mCSEM) data inversion using generative priors. Two-and-a-half-dimensional modeling using the finite difference method (FDM) is adopted to compute the response of horizontal electric dipole (HED) excitation. Rather than using a neural network to approximate the entire inverse mapping in a black-box manner, we adopt a plug-and-play strategy in which a variational autoencoder (VAE) is used solely to learn prior information on conductivity distributions. During the inversion process, the conductivity model is iteratively updated using the Gauss-Newton method, while the model space is constrained by projections onto the learned VAE decoder. This framework preserves explicit control over data misfit and enables flexible adaptation to different survey configurations. Numerical and field experiments demonstrate that the proposed approach effectively incorporates prior information, improves reconstruction accuracy, and exhibits good generalization performance.
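A conceptual sketch of the plug-and-play loop: Gauss-Newton steps on the data misfit followed by projection onto the range of the learned decoder. A linear forward operator and a fixed linear subspace stand in for the FDM solver and the trained VAE.

import numpy as np

def gauss_newton_pnp(m, forward, jac, data, decode, encode, iters=10, lam=1e-2):
    for _ in range(iters):
        J = jac(m)                        # sensitivity of the forward model
        r = forward(m) - data             # data misfit
        dm = np.linalg.solve(J.T @ J + lam * np.eye(len(m)), -J.T @ r)
        m = decode(encode(m + dm))        # project onto the learned prior set
    return m

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 10))             # stand-in for the FDM forward map
m_true = rng.normal(size=10)
data = A @ m_true
B = np.linalg.qr(rng.normal(size=(10, 4)))[0]   # 4-dim "decoder range"
m_hat = gauss_newton_pnp(np.zeros(10), lambda m: A @ m, lambda m: A, data,
                         decode=lambda z: B @ z, encode=lambda m: B.T @ m)
print(np.linalg.norm(A @ m_hat - data))   # best misfit achievable in the range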


【41】A Multilayered Approach to Classifying Customer Responsiveness and Credit Risk
标题:客户响应性和信用风险分类的多层方法
链接:https://arxiv.org/abs/2601.01970

作者:Ayomide Afolabi,Ebere Ogburu,Symon Kimitei
摘要:This study evaluates the performance of various classifiers in three distinct models: response, risk, and response-risk, concerning credit card mail campaigns and default prediction. In the response model, the Extra Trees classifier demonstrates the highest recall level (79.1%), emphasizing its effectiveness in identifying potential responders to targeted credit card offers. Conversely, in the risk model, the Random Forest classifier exhibits remarkable specificity of 84.1%, crucial for identifying customers least likely to default. Furthermore, in the multi-class response-risk model, the Random Forest classifier achieves the highest accuracy (83.2%), indicating its efficacy in discerning both potential responders to credit card mail campaign and low-risk credit card users. In this study, we optimized various performance metrics to solve a specific credit risk and mail responsiveness business problem.


【42】Random-Matrix-Induced Simplicity Bias in Over-parameterized Variational Quantum Circuits
标题:过度参数化变分量子电路中随机矩阵引起的简单性偏差
链接:https://arxiv.org/abs/2601.01877

作者:Jun Qi,Chao-Han Huck Yang,Pin-Yu Chen,Min-Hsiu Hsieh
备注:20 pages, 4 figures
摘要:Over-parameterization is commonly used to increase the expressivity of variational quantum circuits (VQCs), yet deeper and more highly parameterized circuits often exhibit poor trainability and limited generalization. In this work, we provide a theoretical explanation for this phenomenon from a function-class perspective. We show that sufficiently expressive, unstructured variational ansatze enter a Haar-like universality class in which both observable expectation values and parameter gradients concentrate exponentially with system size. As a consequence, the hypothesis class induced by such circuits collapses with high probability to a narrow family of near-constant functions, a phenomenon we term simplicity bias, with barren plateaus arising as a consequence rather than the root cause. Using tools from random matrix theory and concentration of measure, we rigorously characterize this universality class and establish uniform hypothesis-class collapse over finite datasets. We further show that this collapse is not unavoidable: tensor-structured VQCs, including tensor-network-based and tensor-hypernetwork parameterizations, lie outside the Haar-like universality class. By restricting the accessible unitary ensemble through bounded tensor rank or bond dimension, these architectures prevent concentration of measure, preserve output variability for local observables, and retain non-degenerate gradient signals even in over-parameterized regimes. Together, our results unify barren plateaus, expressivity limits, and generalization collapse under a single structural mechanism rooted in random-matrix universality, highlighting the central role of architectural inductive bias in variational quantum algorithms.
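A quick numerical illustration of the Haar-like concentration at the heart of the argument: sampling random states as normalized complex Gaussians (which reproduces Haar measure on pure states) shows the variance of a local observable's expectation shrinking like 1/(dim + 1).

import numpy as np

rng = np.random.default_rng(0)
for n_qubits in [2, 4, 6, 8]:
    dim = 2 ** n_qubits
    vals = []
    for _ in range(500):
        psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
        psi /= np.linalg.norm(psi)                 # Haar-random pure state
        z = np.where(np.arange(dim) & (dim >> 1), -1.0, 1.0)  # Z on one qubit
        vals.append(np.real(np.vdot(psi, z * psi)))
    print(n_qubits, np.var(vals))  # variance decays roughly as 1/(dim + 1)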


【43】Sparse Convex Biclustering
标题:稀疏凸双聚类
链接:https://arxiv.org/abs/2601.01757

作者:Jiakun Jiang,Dewei Xiang,Chenliang Gu,Wei Liu,Binhuan Wang
摘要:Biclustering is an essential unsupervised machine learning technique for simultaneously clustering rows and columns of a data matrix, with widespread applications in genomics, transcriptomics, and other high-dimensional omics data. Despite its importance, existing biclustering methods struggle to meet the demands of modern large-scale datasets. The challenges stem from the accumulation of noise in high-dimensional features, the limitations of non-convex optimization formulations, and the computational complexity of identifying meaningful biclusters. These issues often result in reduced accuracy and stability as the size of the dataset increases. To overcome these challenges, we propose Sparse Convex Biclustering (SpaCoBi), a novel method that penalizes noise during the biclustering process to improve both accuracy and robustness. By adopting a convex optimization framework and introducing a stability-based tuning criterion, SpaCoBi achieves an optimal balance between cluster fidelity and sparsity. Comprehensive numerical studies, including simulations and an application to mouse olfactory bulb data, demonstrate that SpaCoBi significantly outperforms state-of-the-art methods in accuracy. These results highlight SpaCoBi as a robust and efficient solution for biclustering in high-dimensional and large-scale datasets.


【44】Latent Space Element Method
标题:隐空间元法
链接:https://arxiv.org/abs/2601.01741

作者:Seung Whan Chung,Youngsoo Choi,Christopher Miller,H. Keo Springer,Kyle T. Sullivan
备注:17 pages, 10 figures
摘要:How can we build surrogate solvers that train on small domains but scale to larger ones without intrusive access to PDE operators? Inspired by the Data-Driven Finite Element Method (DD-FEM) framework for modular data-driven solvers, we propose the Latent Space Element Method (LSEM), an element-based latent surrogate assembly approach in which a learned subdomain ("element") model can be tiled and coupled to form a larger computational domain. Each element is a LaSDI latent ODE surrogate trained from snapshots on a local patch, and neighboring elements are coupled through learned directional interaction terms in latent space, avoiding Schwarz iterations and interface residual evaluations. A smooth window-based blending reconstructs a global field from overlapping element predictions, yielding a scalable assembled latent dynamical system. Experiments on the 1D Burgers and Korteweg-de Vries equations show that LSEM maintains predictive accuracy while scaling to spatial domains larger than those seen in training. LSEM offers an interpretable and extensible route toward foundation-model surrogate solvers built from reusable local models.


【45】Variance-Reduced Diffusion Sampling via Conditional Score Expectation Identity
标题:通过条件分数期望恒等式实现方差缩减的扩散采样
链接:https://arxiv.org/abs/2601.01594

作者:Alois Duston,Tan Bui-Thanh
摘要:We introduce and prove a \textbf{Conditional Score Expectation (CSE)} identity: an exact relation for the marginal score of affine diffusion processes that links scores across time via a conditional expectation under the forward dynamics. Motivated by this identity, we propose a CSE-based statistical estimator for the score using a Self-Normalized Importance Sampling (SNIS) procedure with prior samples and forward noise. We analyze its relationship to the standard Tweedie estimator, proving anti-correlation for Gaussian targets and establishing the same behavior for general targets in the small time-step regime. Exploiting this structure, we derive a variance-minimizing blended score estimator given by a state--time dependent convex combination of the CSE and Tweedie estimators. Numerical experiments show that this optimal-blending estimator reduces variance and improves sample quality for a fixed computational budget compared to either baseline. We further extend the framework to Bayesian inverse problems via likelihood-informed SNIS weights, and demonstrate improved reconstruction quality and sample diversity on high-dimensional image reconstruction tasks and PDE-governed inverse problems.
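The variance-minimizing convex combination of two unbiased estimators has a standard closed form, sketched below with synthetic anti-correlated estimators; the noise model is invented for illustration.

import numpy as np

def optimal_weight(var_a, var_b, cov_ab):
    # w* = (Var[B] - Cov[A,B]) / (Var[A] + Var[B] - 2 Cov[A,B]); negative
    # covariance (anti-correlation) makes the blend beat both components.
    return (var_b - cov_ab) / (var_a + var_b - 2.0 * cov_ab)

rng = np.random.default_rng(0)
truth = 1.0
noise = rng.normal(size=10000)
A = truth + 0.3 * noise                           # estimator A
B = truth - 0.2 * noise + 0.1 * rng.normal(size=10000)  # anti-correlated B
w = optimal_weight(A.var(), B.var(), np.cov(A, B)[0, 1])
blend = w * A + (1 - w) * B
print(w, A.var(), B.var(), blend.var())   # blended variance is the smallest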


【46】Identifying recurrent flows in high-dimensional dissipative chaos from low-dimensional embeddings
标题:从低维嵌入中识别高维耗散混沌中的循环流
链接:https://arxiv.org/abs/2601.01590

作者:Pierre Beck,Tobias M. Schneider
摘要:Unstable periodic orbits (UPOs) are the non-chaotic, dynamical building blocks of spatio-temporal chaos, motivating a first-principles based theory for turbulence ever since the discovery of deterministic chaos. Despite their key role in the ergodic theory approach to fluid turbulence, identifying UPOs is challenging for two reasons: chaotic dynamics and the high-dimensionality of the spatial discretization. We address both issues at once by proposing a loop convergence algorithm for UPOs directly within a low-dimensional embedding of the chaotic attractor. The convergence algorithm circumvents time-integration, hence avoiding instabilities from exponential error amplification, and operates on a latent dynamics obtained by pulling back the physical equations using automatic differentiation through the learned embedding function. The interpretable latent dynamics is accurate in a statistical sense, and, crucially, the embedding preserves the internal structure of the attractor, which we demonstrate through an equivalence between the latent and physical UPOs of both a model PDE and the 2D Navier-Stokes equations. This allows us to exploit the collapse of high-dimensional dissipative systems onto a lower dimensional manifold, and identify UPOs in the low-dimensional embedding.


【47】Bayesian Negative Binomial Regression of Afrobeats Chart Persistence
标题:Afrobeats榜单持续性的贝叶斯负二项回归
链接:https://arxiv.org/abs/2601.01391

作者:Ian Jacob Cabansag,Paul Ntegeka
摘要:Afrobeats songs compete for attention on streaming platforms, where chart visibility can influence both revenue and cultural impact. This paper examines whether collaborations help songs remain on the charts longer, using daily Nigeria Spotify Top 200 data from 2024. Each track is summarized by the number of days it appears in the Top 200 during the year and its total annual streams in Nigeria. A Bayesian negative binomial regression is applied, with days on chart as the outcome and collaboration status (solo versus multi-artist) and log total streams as predictors. This approach is well suited for overdispersed count data and allows the effect of collaboration to be interpreted while controlling for overall popularity. Posterior inference is conducted using Markov chain Monte Carlo, and results are assessed using rate ratios, posterior probabilities, and predictive checks. The findings indicate that, after accounting for total streams, collaboration tracks tend to spend slightly fewer days on the chart than comparable solo tracks.
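A sketch of the stated model in PyMC, with illustrative priors and synthetic data standing in for the Spotify chart counts; the paper specifies the model family and predictors, while the tooling choice and priors here are assumptions. exp(b_collab) plays the role of the rate ratio reported in the paper.

import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
n = 200
collab = rng.integers(0, 2, n)                  # 1 = multi-artist track
log_streams = rng.normal(15, 1, n)
days = rng.poisson(np.exp(0.5 + 0.2 * (log_streams - 15) - 0.1 * collab))

with pm.Model():
    b0 = pm.Normal("b0", 0, 2)
    b_collab = pm.Normal("b_collab", 0, 1)
    b_streams = pm.Normal("b_streams", 0, 1)
    alpha = pm.Exponential("alpha", 1.0)        # overdispersion parameter
    mu = pm.math.exp(b0 + b_collab * collab + b_streams * (log_streams - 15))
    pm.NegativeBinomial("days", mu=mu, alpha=alpha, observed=days)
    idata = pm.sample(1000, tune=1000, chains=2, progressbar=False)

print(idata.posterior["b_collab"].mean().item())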


【48】Gradient-Free Approaches is a Key to an Efficient Interaction with Markovian Stochasticity
标题:无梯度方法是与马尔可夫随机性高效交互的关键
链接:https://arxiv.org/abs/2601.01160

作者:Boris Prokhorov,Semyon Chebykin,Alexander Gasnikov,Aleksandr Beznosikov
摘要:This paper deals with stochastic optimization problems involving Markovian noise with a zero-order oracle. We present and analyze a novel derivative-free method for solving such problems in strongly convex smooth and non-smooth settings with both one-point and two-point feedback oracles. Using a randomized batching scheme, we show that when the mixing time $\tau$ of the underlying noise sequence is less than the dimension of the problem $d$, the convergence estimates of our method do not depend on $\tau$. This observation provides an efficient way to interact with Markovian stochasticity: instead of invoking the expensive first-order oracle, one should use the zero-order oracle. Finally, we complement our upper bounds with the corresponding lower bounds. This confirms the optimality of our results.
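The two-point zero-order estimator that methods of this kind rely on, in its standard form; the batch size plays the role of the randomized batching scheme, and the noiseless test objective is a stand-in.

import numpy as np

def two_point_grad(f, x, tau=1e-3, batch=8, rng=None):
    # g = d * (f(x + tau*e) - f(x - tau*e)) / (2*tau) * e, averaged over a
    # batch of random unit directions e; batching averages out the noise.
    rng = rng or np.random.default_rng()
    d = len(x)
    g = np.zeros(d)
    for _ in range(batch):
        e = rng.normal(size=d)
        e /= np.linalg.norm(e)
        g += d * (f(x + tau * e) - f(x - tau * e)) / (2 * tau) * e
    return g / batch

f = lambda x: 0.5 * np.sum(x ** 2)   # noiseless test objective
x = np.ones(5)
print(two_point_grad(f, x))          # approximates the true gradient, x itself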


【49】Conformal Blindness: A Note on $A$-Cryptic change-points
标题:共形盲视:关于$A$-隐秘变点的注记
链接:https://arxiv.org/abs/2601.01147

作者:Johan Hallberg Szabadváry
备注:6 pages, 3 figures
摘要:Conformal Test Martingales (CTMs) are a standard method within the Conformal Prediction framework for testing the crucial assumption of data exchangeability by monitoring deviations from uniformity in the p-value sequence. Although exchangeability implies uniform p-values, the converse does not hold. This raises the question of whether a significant break in exchangeability can occur, such that the p-values remain uniform, rendering CTMs blind. We answer this affirmatively, demonstrating the phenomenon of \emph{conformal blindness}. Through explicit construction, for the theoretically ideal ``oracle'' conformity measure (given by the true conditional density), we demonstrate the possibility of an \emph{$A$-cryptic change-point} (where $A$ refers to the conformity measure). Using bivariate Gaussian distributions, we identify a line along which a change in the marginal means does not alter the distribution of the conformity scores, thereby producing perfectly uniform p-values. Simulations confirm that even a massive distribution shift can be perfectly cryptic to the CTM, highlighting a fundamental limitation and emphasising the critical role of the alignment of the conformity measure with potential shifts.
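A minimal CTM sketch using the standard one-parameter betting function f_eps(p) = eps * p^(eps-1): under exchangeability the p-values are uniform and the wealth is a nonnegative martingale, and an A-cryptic change-point keeps them uniform, so any such martingale stays blind by construction. The beta alternative below is an invented, detectable contrast.

import numpy as np

def ctm_log10_wealth(p_values, eps=0.5):
    # Multiply wealth by eps * p^(eps - 1) at each step (in log10 domain);
    # this betting function integrates to 1 over [0, 1], so uniform p-values
    # give a nonnegative martingale with no systematic growth.
    p = np.asarray(p_values)
    return np.cumsum(np.log10(eps) + (eps - 1.0) * np.log10(p))

rng = np.random.default_rng(0)
p_cryptic = rng.uniform(size=1000)          # what a cryptic change-point yields
p_visible = rng.beta(0.5, 1.0, size=1000)   # p-values skewed toward zero
print(ctm_log10_wealth(p_cryptic)[-1])      # stays low: no evidence gathered
print(ctm_log10_wealth(p_visible)[-1])      # grows: the change is detected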


【50】Deep Deterministic Nonlinear ICA via Total Correlation Minimization with Matrix-Based Entropy Functional
标题:通过基于矩阵的熵泛函的全相关最小化实现深度确定性非线性ICA
链接:https://arxiv.org/abs/2601.00904

作者:Qiang Li,Shujian Yu,Liang Ma,Chen Ma,Jingyu Liu,Tulay Adali,Vince D. Calhoun
备注:16 pages, 9 figures
摘要:Blind source separation, particularly through independent component analysis (ICA), is widely utilized across various signal processing domains for disentangling underlying components from observed mixed signals, owing to its fully data-driven nature that minimizes reliance on prior assumptions. However, conventional ICA methods rely on an assumption of linear mixing, limiting their ability to capture complex nonlinear relationships and to maintain robustness in noisy environments. In this work, we present deep deterministic nonlinear independent component analysis (DDICA), a novel deep neural network-based framework designed to address these limitations. DDICA leverages a matrix-based entropy function to directly optimize the independence criterion via stochastic gradient descent, bypassing the need for variational approximations or adversarial schemes. This results in a streamlined training process and improved resilience to noise. We validated the effectiveness and generalizability of DDICA across a range of applications, including simulated signal mixtures, hyperspectral image unmixing, modeling of primary visual receptive fields, and resting-state functional magnetic resonance imaging (fMRI) data analysis. Experimental results demonstrate that DDICA effectively separates independent components with high accuracy across a range of applications. These findings suggest that DDICA offers a robust and versatile solution for blind source separation in diverse signal processing tasks.
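A sketch of the matrix-based entropy ingredient in its alpha -> 1 (von Neumann) form: the eigenvalue entropy of a trace-normalized Gaussian Gram matrix, with total correlation as the sum of marginal entropies minus the joint entropy. The kernel width and sample sizes are assumptions.

import numpy as np

def matrix_entropy(X, sigma=1.0):
    # Gaussian Gram matrix, trace-normalized so eigenvalues sum to 1.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq / (2 * sigma ** 2))
    A = A / np.trace(A)
    lam = np.linalg.eigvalsh(A)
    lam = lam[lam > 1e-12]
    return -np.sum(lam * np.log(lam))

rng = np.random.default_rng(0)
s1 = rng.uniform(size=(100, 1))
s2 = rng.uniform(size=(100, 1))
joint = np.hstack([s1, s2])
tc = matrix_entropy(s1) + matrix_entropy(s2) - matrix_entropy(joint)
print(tc)  # small for independent sources; larger when s2 depends on s1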


【51】Investigation into U.S. Citizen and Non-Citizen Worker Health Insurance and Employment
标题:美国公民和非公民工人健康保险和就业调查
链接:https://arxiv.org/abs/2601.00896

作者:Annabelle Yao
摘要:Socioeconomic integration is a critical dimension of social equity, yet persistent disparities remain in access to health insurance, education, and employment across different demographic groups. While previous studies have examined isolated aspects of inequality, there is limited research that integrates both statistical analysis and advanced machine learning to uncover hidden structures within population data. This study leverages statistical analysis ($\chi^2$ test of independence and Two Proportion Z-Test) and machine learning clustering techniques -- K-Modes and K-Prototypes -- along with t-SNE visualization and CatBoost classification to analyze socioeconomic integration and inequality. Using statistical tests, we identified the proportion of the population with healthcare insurance, quality education, and employment. With this data, we concluded that there was an association between employment and citizenship status. Moreover, we were able to determine 5 distinct population groups using Machine Learning classification. The five clusters our analysis identifies reveal that while citizenship status shows no association with workforce participation, significant disparities exist in access to employer-sponsored health insurance. Each cluster represents a distinct demographic of the population, showing that there is a primary split along the lines of educational attainment which separates Clusters 0 and 4 from Clusters 1, 2, and 3. Furthermore, labor force status and nativity serve as secondary differentiators. Non-citizens are also disproportionately concentrated in precarious employment without benefits, highlighting systemic inequalities in healthcare access. By uncovering demographic clusters that face compounded disadvantages, this research contributes to a more nuanced understanding of socioeconomic stratification.


【52】Deep versus Broad Technology Search and the Timing of Innovation Impact
标题:深度技术搜索与广泛技术搜索以及创新影响的时机
链接:https://arxiv.org/abs/2601.00871

作者:Likun Cao,James Evans
备注:47 pages, 8 figures, 3 tables
摘要:This study offers a new perspective on the depth-versus-breadth debate in innovation strategy, by modeling inventive search within dynamic collective knowledge systems, and underscoring the importance of timing for technological impact. Using frontier machine learning to project patent citation networks in hyperbolic space, we analyze 4.9 million U.S. patents to examine how search strategies give rise to distinct temporal patterns in impact accumulation. We find that inventions based on deep search, which relies on a specialized understanding of complex recombination structures, drive higher short-term impact through early adoption within specialized communities, but face diminishing returns as innovations become "locked-in" with limited diffusion potential. Conversely, when inventions are grounded in broad search that spans disparate domains, they encounter initial resistance but achieve wider diffusion and greater long-term impact by reaching cognitively diverse audiences. Individual inventions require both depth and breadth for stable impact. Organizations can strategically balance approaches across multiple inventions: using depth to build reliable technological infrastructure while pursuing breadth to expand applications. We advance innovation theory by demonstrating how deep and broad search strategies distinctly shape the timing and trajectory of technological impact, and how individual inventors and organizations can leverage these mechanisms to balance exploitation and exploration.


【53】Autonomous battery research: Principles of heuristic operando experimentation
标题:自主电池研究:启发式operando实验的原则
链接:https://arxiv.org/abs/2601.00851

作者:Emily Lu,Gabriel Perez,Peter Baker,Daniel Irving,Santosh Kumar,Veronica Celorrio,Sylvia Britto,Thomas F. Headen,Miguel Gomez-Gonzalez,Connor Wright,Calum Green,Robert Scott Young,Oleg Kirichek,Ali Mortazavi,Sarah Day,Isabel Antony,Zoe Wright,Thomas Wood,Tim Snow,Jeyan Thiyagalingam,Paul Quinn,Martin Owen Jones,William David,James Le Houx
备注:38 pages, 14 figures. Includes a detailed technical review of the POLARIS, BAM, DRIX, M-Series, and B18 electrochemical cells in the Supplementary Information
摘要:Unravelling the complex processes governing battery degradation is critical to the energy transition, yet the efficacy of operando characterisation is severely constrained by a lack of Reliability, Representativeness, and Reproducibility (the 3Rs). Current methods rely on bespoke hardware and passive, pre-programmed methodologies that are ill-equipped to capture stochastic failure events. Here, using the Rutherford Appleton Laboratory's multi-modal toolkit as a case study, we expose the systemic inability of conventional experiments to capture transient phenomena like dendrite initiation. To address this, we propose Heuristic Operando experiments: a framework where an AI pilot leverages physics-based digital twins to actively steer the beamline to predict and deterministically capture these rare events. Distinct from uncertainty-driven active learning, this proactive search anticipates failure precursors, redefining experimental efficiency via an entropy-based metric that prioritises scientific insight per photon, neutron, or muon. By focusing measurements only on mechanistically decisive moments, this framework simultaneously mitigates beam damage and drastically reduces data redundancy. When integrated with FAIR data principles, this approach serves as a blueprint for the trusted autonomous battery laboratories of the future.


机器翻译由腾讯交互翻译提供,仅供参考
