
Machine Learning Academic Digest [9.26]

arXiv Daily Academic Digest



cs.LG: 185 papers today


Large Language Models (24 papers)

【1】It's Not You, It's Clipping: A Soft Trust-Region via Probability Smoothing for LLM RL
Link: https://arxiv.org/abs/2509.21282

Authors: Dwyer, Adam Sobey, Adriane Chapman
Abstract: Training large language models (LLMs) with reinforcement learning (RL) methods such as PPO and GRPO commonly relies on ratio clipping to stabilise updates. While effective at preventing instability, clipping discards information and introduces gradient discontinuities. We propose Probability Smoothing Policy Optimisation (PSPO), which smooths the current policy's probabilities toward the old (behaviour) policy before computing the importance ratio, analogous to label smoothing. Unlike clipping, PSPO preserves the gradient signal, while interpolation toward the old policy creates a soft trust region, with formal guarantees, that discourages large, destabilising updates. We instantiate PSPO within GRPO (GR-PSPO) and fine-tune Qwen2.5-0.5B and Qwen2.5-1.5B on GSM8K, evaluating on the GSM8K test set and on cross-dataset generalisation to SVAMP, ASDiv, and MATH-500. Relative to unclipped GRPO (single iteration; no data reuse, ratio always = 1), GR-PSPO achieves similar performance but improves reasoning, leading to clearer, more concise, and more logical responses. Compared to clipped GRPO, GR-PSPO substantially improves performance for both the 0.5B and 1.5B models, with a boost of over 20% on GSM8K (39.7% vs. 17.6% for 0.5B, 59.4% vs. 37.8% for 1.5B).
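
To make the mechanism concrete, here is a minimal sketch of a PSPO-style importance ratio as described in the abstract above; the smoothing coefficient `alpha` and the function name are illustrative assumptions, not the paper's notation:

```python
def pspo_ratio(p_new, p_old, alpha=0.1):
    """Smooth the current policy's token probability toward the old
    (behaviour) policy before forming the importance ratio.

    Unlike hard clipping, the ratio stays differentiable everywhere;
    interpolating toward p_old acts as a soft trust region that damps
    large updates. alpha is an assumed smoothing coefficient.
    """
    p_smooth = (1.0 - alpha) * p_new + alpha * p_old
    return p_smooth / p_old

# When the policies agree, the ratio is exactly 1, as in vanilla PPO/GRPO.
r_equal = pspo_ratio(0.5, 0.5)

# A large raw ratio (0.9 / 0.1 = 9.0) is pulled toward 1 smoothly,
# with no gradient discontinuity at a clip boundary.
r_large = pspo_ratio(0.9, 0.1, alpha=0.1)
```

With `alpha = 0`, the ratio reduces to the standard unclipped importance ratio `p_new / p_old`.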


【2】SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips
Link: https://arxiv.org/abs/2509.21271

Authors: n, Masahiro Tanaka, Olatunji Ruwase, Minjia Zhang
Comments: 16 pages, 15 figures
Abstract: The emergence of Superchips represents a significant advancement in next-generation AI hardware. These Superchips employ a tightly coupled heterogeneous architecture that integrates GPU and CPU on the same package, offering unprecedented computational power. However, there has been scant research investigating how LLM training benefits from this new architecture. In this work, for the first time, we study LLM training solutions based on offloading for Superchips. We observe important differences between Superchips and traditional loosely coupled GPU-CPU architectures, which necessitate revisiting prevailing assumptions about offloading. Based on that, we present SuperOffload, a Superchip-centric offloading system that simultaneously uses the Hopper GPU, Grace CPU, and NVLink-C2C interconnect more efficiently. SuperOffload accomplishes this via a combination of techniques, such as adaptive weight offloading, bucketization repartitioning, Superchip-aware casting, speculative execution, and a highly optimized Adam optimizer for Grace CPUs. Our evaluation of SuperOffload on NVIDIA GH200 demonstrates up to 2.5x throughput improvement compared to state-of-the-art offloading-based systems, enabling training of models up to 25B on a single Superchip while achieving high training throughput. We also extend SuperOffload with ZeRO-style data parallelism and DeepSpeed-Ulysses sequence parallelism, enabling training of a 13B model with sequence lengths up to 1 million tokens on 8 GH200s while achieving 55% MFU.


【3】Decipher-MR: A Vision-Language Foundation Model for 3D MRI Representations
Link: https://arxiv.org/abs/2509.21249

Authors: ang, Noel DSouza, Istvan Megyeri, Xiaojian Xu, Amin Honarmandi Shandiz, Farzin Haddadpour, Krisztian Koos, Laszlo Rusko, Emanuele Valeriano, Bharadwaj Swaninathan, Lei Wu, Parminder Bhatia, Taha Kass-Hout, Erhan Bas
Abstract: Magnetic Resonance Imaging (MRI) is a critical medical imaging modality in clinical diagnosis and research, yet its complexity and heterogeneity pose challenges for automated analysis, particularly in scalable and generalizable machine learning applications. While foundation models have revolutionized natural language and vision tasks, their application to MRI remains limited due to data scarcity and narrow anatomical focus. In this work, we present Decipher-MR, a 3D MRI-specific vision-language foundation model trained on a large-scale dataset comprising 200,000 MRI series from over 22,000 studies spanning diverse anatomical regions, sequences, and pathologies. Decipher-MR integrates self-supervised vision learning with report-guided text supervision to build robust, generalizable representations, enabling effective adaptation across broad applications. To support diverse clinical tasks with minimal computational overhead, Decipher-MR adopts a modular design in which lightweight, task-specific decoders are tuned atop a frozen pretrained encoder. Following this setting, we evaluate Decipher-MR across diverse benchmarks including disease classification, demographic prediction, anatomical localization, and cross-modal retrieval, demonstrating consistent performance gains over existing foundation models and task-specific approaches. Our results establish Decipher-MR as a scalable and versatile foundation for MRI-based AI, facilitating efficient development across clinical and research domains.


【4】Explaining Fine-Tuned LLMs via Counterfactuals: A Knowledge Graph Driven Framework
Link: https://arxiv.org/abs/2509.21241

Authors: ang, Ziyang Chen, Md Faisal Kabir
Comments: 16 pages, 9 figures
Abstract: The widespread adoption of Low-Rank Adaptation (LoRA) has enabled large language models (LLMs) to acquire domain-specific knowledge with remarkable efficiency. However, understanding how such a fine-tuning mechanism alters a model's structural reasoning and semantic behavior remains an open challenge. This work introduces a novel framework that explains fine-tuned LLMs via counterfactuals grounded in knowledge graphs. Specifically, we construct BioToolKG, a domain-specific heterogeneous knowledge graph of bioinformatics tools, and design a counterfactual-based explainer for fine-tuned LLMs (CFFTLLMExplainer) that learns soft masks over graph nodes and edges to generate minimal structural perturbations that induce maximum semantic divergence. Our method jointly optimizes structural sparsity and semantic divergence while enforcing interpretability-preserving constraints such as entropy regularization and edge smoothness. We apply this framework to a fine-tuned LLaMA-based LLM and reveal that counterfactual masking exposes the model's structural dependencies and aligns with LoRA-induced parameter shifts. This work provides new insights into the internal mechanisms of fine-tuned LLMs and highlights counterfactual graphs as a potential tool for interpretable AI.
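
As a toy illustration of the joint objective (maximize semantic divergence, minimize structural perturbation), here is a sketch; the scalarized form and the trade-off weight `lam` are our assumptions, not CFFTLLMExplainer's exact loss:

```python
def cf_objective(semantic_divergence, edge_mask, lam=0.5):
    """Score a candidate counterfactual: edge_mask is a soft mask over
    graph edges (1.0 = edge kept, 0.0 = edge removed). We reward the
    divergence of the model's behaviour under the masked graph and
    penalize how much of the graph was perturbed, favouring minimal
    structural edits that change behaviour the most."""
    perturbation = sum(1.0 - m for m in edge_mask) / len(edge_mask)
    return semantic_divergence - lam * perturbation

# Removing one edge out of four scores higher than removing three
# edges, if both cause the same behavioural divergence.
sparse = cf_objective(0.8, [1.0, 1.0, 1.0, 0.0])
dense = cf_objective(0.8, [1.0, 0.0, 0.0, 0.0])
```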


【5】Tree Search for LLM Agent Reinforcement Learning
Link: https://arxiv.org/abs/2509.21240

Authors: i, Ziyu Ma, Yong Wang, Guanhua Chen, Xiangxiang Chu, Liaoni Wu
Abstract: Recent advances in reinforcement learning (RL) have significantly enhanced the agentic capabilities of large language models (LLMs). In long-term and multi-turn agent tasks, existing approaches driven solely by outcome rewards often suffer from sparse supervision. To address this challenge, we propose Tree-based Group Relative Policy Optimization (Tree-GRPO), a grouped agent RL method based on tree search, where each tree node represents a complete agent interaction step. By sharing common prefixes, tree-search sampling increases the number of rollouts achievable within a fixed budget of tokens or tool calls. Moreover, we find that the tree-structured trajectory naturally allows the construction of step-wise process supervision signals even when using only the outcome reward. Based on this, Tree-GRPO estimates grouped relative advantages at both the intra-tree and inter-tree levels. Through theoretical analysis, we demonstrate that the objective of intra-tree group relative policy optimization is equivalent to that of step-level direct preference learning. Experiments across 11 datasets and 3 types of QA tasks demonstrate the superiority of the proposed tree-based RL over chain-based RL methods.
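
The intra-tree estimate can be sketched as follows, assuming GRPO-style standardization of outcome rewards over siblings that branch from the same node (the normalization details here are our assumption, not the paper's exact estimator):

```python
from statistics import mean, pstdev

def intra_tree_advantages(sibling_rewards, eps=1e-8):
    """Rollouts branching from the same tree node share a prefix, so
    their outcome rewards form a natural comparison group: each
    sibling's advantage is its reward standardized against the group,
    giving the shared prefix a step-wise learning signal."""
    mu = mean(sibling_rewards)
    sigma = pstdev(sibling_rewards)
    return [(r - mu) / (sigma + eps) for r in sibling_rewards]

# One correct and one incorrect rollout from the same node: the node's
# children receive opposite-signed advantages from outcome rewards alone.
advs = intra_tree_advantages([1.0, 0.0])
```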


【6】Go With The Flow: Churn-Tolerant Decentralized Training of Large Language Models
Link: https://arxiv.org/abs/2509.21221

Authors: lagoev, Bart Cox, Jérémie Decouchant, Lydia Y. Chen
Abstract: Motivated by the emergence of large language models (LLMs) and the importance of democratizing their training, we propose GWTF, the first crash-tolerant practical decentralized training framework for LLMs. Unlike existing distributed and federated training frameworks, GWTF enables the efficient collaborative training of an LLM on heterogeneous clients that volunteer their resources. In addition, GWTF addresses node churn, i.e., clients joining or leaving the system at any time, and network instabilities, i.e., network links becoming unstable or unreliable. The core of GWTF is a novel decentralized flow algorithm that finds the most effective routing, maximizing the number of microbatches trained with the lowest possible delay. We extensively evaluate GWTF on GPT-like and LLaMa-like models and compare it against the prior art. Our results indicate that GWTF reduces training time by up to 45% in realistic and challenging scenarios involving heterogeneous client nodes distributed over 10 different geographic locations with a high node churn rate.


【7】Communication Bias in Large Language Models: A Regulatory Perspective
Link: https://arxiv.org/abs/2509.21075

Authors: enzler, Stefan Schmid
Abstract: Large language models (LLMs) are increasingly central to many applications, raising concerns about bias, fairness, and regulatory compliance. This paper reviews the risks of biased outputs and their societal impact, focusing on frameworks such as the EU's AI Act and the Digital Services Act. We argue that beyond constant regulation, stronger attention to competition and design governance is needed to ensure fair, trustworthy AI. This is a preprint of the Communications of the ACM article of the same title.


【8】Reinforcement Learning Fine-Tuning Enhances Activation Intensity and Diversity in the Internal Circuitry of LLMs
Link: https://arxiv.org/abs/2509.21044

Authors: hang, Qianyue Hao, Fengli Xu, Yong Li
Abstract: Large language models (LLMs) acquire extensive prior knowledge through large-scale pretraining and can be further enhanced via supervised fine-tuning (SFT) or reinforcement learning (RL)-based post-training. A growing body of evidence has shown that RL fine-tuning improves the capability of LLMs beyond what SFT alone achieves. However, the underlying mechanisms by which RL fine-tuning enhances the capability of various LLMs with distinct intrinsic characteristics remain underexplored. In this study, we draw inspiration from prior work on edge attribution patching (EAP) to investigate the internal differences of LLMs before and after RL fine-tuning. Our analysis across multiple model families shows two robust effects of online RL post-training: (i) an overall increase in activation intensity, indicating that more internal pathways are engaged and their signals become stronger, and (ii) greater diversity in activation patterns, reflected by higher entropy and less concentrated edge distributions. These changes suggest that RL reshapes information flow to be both more redundant and more flexible, which may explain its advantage in generalization. Notably, models fine-tuned with Direct Preference Optimization (DPO) deviate from these trends, exhibiting substantially weaker or inconsistent internal changes compared to PPO- and GRPO-based training. Together, our findings provide a unified view of how RL fine-tuning systematically alters the internal circuitry of LLMs and highlight the methodological distinctions between online RL and preference-based approaches. Our code is open source at https://anonymous.4open.science/r/llm_rl_probing_analysis-F673.


【9】DELTA-Code: How Does RL Unlock and Transfer New Programming Algorithms in LLMs?
Link: https://arxiv.org/abs/2509.21016

Authors: , Yuhan Cao, Pohao Huang, Haoyue Bai, Hannaneh Hajishirzi, Nouha Dziri, Dawn Song
Abstract: It remains an open question whether LLMs can acquire or generalize genuinely new reasoning strategies, beyond the sharpened skills encoded in their parameters during pre-training or post-training. To address this question, we introduce DELTA-Code (Distributional Evaluation of Learnability and Transferrability in Algorithmic Coding), a controlled benchmark of synthetic coding problem families designed to probe two fundamental aspects: learnability (can LLMs, through reinforcement learning (RL), solve problem families where pretrained models fail even given large enough attempts, i.e., pass@K = 0?) and transferrability (if learnability happens, can such skills transfer systematically to out-of-distribution (OOD) test sets?). Unlike prior public coding datasets, DELTA isolates reasoning skills through templated problem generators and introduces fully OOD problem families that demand novel strategies rather than tool invocation or memorized patterns. Our experiments reveal a striking grokking phase transition: after an extended period with near-zero reward, RL-trained models abruptly climb to near-perfect accuracy. To enable learnability on previously unsolvable problem families, we explore key training ingredients such as staged warm-up with dense rewards, experience replay, curriculum training, and verification-in-the-loop. Beyond learnability, we use DELTA to evaluate transferability, or generalization along exploratory, compositional, and transformative axes, as well as cross-family transfer. Results show solid gains within families and for recomposed skills, but persistent weaknesses in transformative cases. DELTA thus offers a clean testbed for probing the limits of RL-driven reasoning and for understanding how models can move beyond existing priors to acquire new algorithmic skills.
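
The pass@K criterion above is conventionally computed with the unbiased pass@k estimator of Chen et al. (2021); a minimal sketch:

```python
from math import comb

def pass_at_k(n, c, k):
    """Probability that at least one of k samples, drawn without
    replacement from n attempts of which c are correct, solves the task.
    DELTA-Code's learnability criterion pass@K = 0 corresponds to c = 0:
    no sampled attempt ever succeeds."""
    if n - c < k:
        return 1.0  # too few failures to fill k samples: success certain
    return 1.0 - comb(n - c, k) / comb(n, k)

# An unsolved family: 100 attempts, none correct, so pass@10 is 0.
p_unsolved = pass_at_k(100, 0, 10)
```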


【10】Predicting LLM Reasoning Performance with Small Proxy Model
Link: https://arxiv.org/abs/2509.21013

Authors: oh, Juyoung Suk, Sungjun Han, Se-Young Yun, Jay Shin
Comments: Pre-print
Abstract: Given the prohibitive cost of pre-training large language models, it is essential to leverage smaller proxy models to optimize datasets before scaling up. However, this approach becomes challenging for reasoning capabilities, which exhibit emergent behavior that appears reliably only at larger model sizes, often exceeding 7B parameters. To address this, we introduce rBridge, showing that small proxies ($\leq$1B) can effectively predict large-model reasoning by aligning more closely with (1) the pre-training objective and (2) the target task. rBridge achieves this by weighting negative log-likelihood with task alignment, using reasoning traces from frontier models as gold labels. In our experiments, rBridge (i) reduces dataset ranking costs by over 100x relative to the best baseline, (ii) achieves the strongest correlation across six reasoning benchmarks at the 1B to 32B scale, and (iii) zero-shot transfers predictive relationships across pre-training datasets at the 1B to 7B scale. These findings indicate that rBridge offers a practical path for exploring reasoning-oriented pre-training at lower cost.
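
A minimal sketch of the scoring idea (task-aligned weighting of per-token negative log-likelihood, with frontier reasoning traces supplying the alignment signal); the names and the normalized weighted average are illustrative assumptions, not the paper's exact formulation:

```python
def rbridge_score(token_nlls, alignment_weights):
    """Aggregate a proxy model's per-token NLLs on a reasoning trace,
    upweighting tokens judged aligned with the frontier model's gold
    trace. Lower scores suggest the candidate pre-training dataset
    better supports the target reasoning task."""
    assert len(token_nlls) == len(alignment_weights)
    total = sum(alignment_weights)
    return sum(w * nll for w, nll in zip(token_nlls, alignment_weights)) / total

# Uniform weights reduce to the plain mean NLL; concentrating weight on
# reasoning-critical tokens changes the ranking signal.
uniform = rbridge_score([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
focused = rbridge_score([1.0, 2.0, 3.0], [0.0, 0.0, 1.0])
```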


【11】Binary Autoencoder for Mechanistic Interpretability of Large Language Models
Link: https://arxiv.org/abs/2509.20997

Authors: o, Haolin Yang, Brian M. Kurkoski, Naoya Inoue
Comments: 36 pages, 41 figures, 3 tables
Abstract: Existing works are dedicated to untangling atomized numerical components (features) from the hidden states of Large Language Models (LLMs) to interpret their mechanisms. However, they typically rely on autoencoders constrained by implicit training-time regularization on single training instances (i.e., $L_1$ normalization, top-k function, etc.), without an explicit guarantee of global sparsity among instances, causing a large number of dense (simultaneously inactive) features, which harms feature sparsity and atomization. In this paper, we propose a novel autoencoder variant that enforces minimal entropy on minibatches of hidden activations, thereby promoting feature independence and sparsity across instances. For efficient entropy calculation, we discretize the hidden activations to 1 bit via a step function and apply gradient estimation to enable backpropagation, so we term it the Binary Autoencoder (BAE), and empirically demonstrate two major applications: (1) Feature set entropy calculation. Entropy can be reliably estimated on binary hidden activations, which we empirically evaluate and leverage to characterize the inference dynamics of LLMs and in-context learning. (2) Feature untangling. Similar to typical methods, BAE can extract atomized features from an LLM's hidden states. To robustly evaluate this feature extraction capability, we refine traditional feature-interpretation methods to avoid unreliable handling of numerical tokens, and show that BAE avoids dense features while producing the largest number of interpretable ones among baselines, confirming the effectiveness of BAE as a feature extractor.
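
The 1-bit discretization and the minibatch-entropy objective can be sketched as follows; the straight-through identity gradient is one standard choice for the "gradient estimation" the abstract mentions, assumed here for illustration:

```python
import math

def binarize(h, threshold=0.0):
    """Forward pass: hard step function mapping an activation to 1 bit."""
    return 1.0 if h > threshold else 0.0

def ste_grad(upstream_grad):
    """Straight-through estimator: the step function's derivative is zero
    almost everywhere, so the backward pass pretends it is the identity
    and passes the upstream gradient through unchanged (an assumption;
    the paper states only that gradient estimation is applied)."""
    return upstream_grad

def batch_entropy(bits):
    """Empirical entropy (in bits) of a minibatch of binary activations
    for one feature: the quantity BAE drives toward its minimum to make
    features sparse and independent across instances."""
    p = sum(bits) / len(bits)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

# A feature active on half the batch has maximal entropy (1 bit); a
# feature that is always off contributes zero entropy.
h_half = batch_entropy([binarize(x) for x in [0.3, -0.2, 0.7, -0.9]])
h_off = batch_entropy([0.0, 0.0, 0.0, 0.0])
```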


【12】CLUE: Conflict-guided Localization for LLM Unlearning Framework
Link: https://arxiv.org/abs/2509.20977

Authors: , Jiaying Zhu, Xinyu Yang, Wenya Wang
Comments: 10 pages
Abstract: LLM unlearning aims to eliminate the influence of undesirable data without affecting causally unrelated information. This process typically involves using a forget set to remove target information, alongside a retain set to maintain non-target capabilities. While recent localization-based methods demonstrate promise in identifying important neurons to be unlearned, they fail to disentangle neurons responsible for forgetting undesirable knowledge from those retaining essential skills, often treating them as a single entangled group. As a result, these methods apply uniform interventions, risking catastrophic over-forgetting or incomplete erasure of the target knowledge. To address this, we turn to circuit discovery, a mechanistic interpretability technique, and propose the Conflict-guided Localization for LLM Unlearning framEwork (CLUE). This framework identifies forget and retain circuits composed of important neurons, and the circuits are then transformed into conjunctive normal form (CNF). The assignment of each neuron in the CNF satisfiability solution reveals whether it should be forgotten or retained. We then provide targeted fine-tuning strategies for different categories of neurons. Extensive experiments demonstrate that, compared to existing localization methods, CLUE achieves superior forget efficacy and retain utility through precise neural localization.
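
To illustrate the CNF step at toy scale, here is a brute-force satisfiability sketch in which each Boolean variable records whether a neuron is retained (True) or forgotten (False); the encoding and clause semantics are our illustrative assumptions, as CLUE's actual construction is richer:

```python
from itertools import product

def solve_cnf(clauses, n_vars):
    """Find a satisfying assignment for a CNF over neuron variables.
    A clause is a list of literals: +i means 'neuron i retained',
    -i means 'neuron i forgotten'. Returns the first satisfying
    assignment as {var: bool}, or None if unsatisfiable."""
    for bits in product([False, True], repeat=n_vars):
        assign = {i + 1: bits[i] for i in range(n_vars)}
        if all(any(assign[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return assign
    return None

# Toy conflict: neuron 1 must be retained (retain circuit), and if
# neuron 1 is retained then neuron 2 must be forgotten (forget circuit).
model = solve_cnf([[1], [-1, -2]], n_vars=2)
```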


【13】Knowledgeable Language Models as Black-Box Optimizers for Personalized Medicine
Link: https://arxiv.org/abs/2509.20975

Authors: . Yao, Osbert Bastani, Alma Andersson, Tommaso Biancalani, Aïcha Bentaieb, Claudia Iriondo
Comments: 56 pages
Abstract: The goal of personalized medicine is to discover a treatment regimen that optimizes a patient's clinical outcome based on their personal genetic and environmental factors. However, candidate treatments cannot be arbitrarily administered to the patient to assess their efficacy; we often instead have access to an in silico surrogate model that approximates the true fitness of a proposed treatment. Unfortunately, such surrogate models have been shown to fail to generalize to previously unseen patient-treatment combinations. We hypothesize that domain-specific prior knowledge, such as medical textbooks and biomedical knowledge graphs, can provide a meaningful alternative signal of the fitness of proposed treatments. To this end, we introduce LLM-based Entropy-guided Optimization with kNowledgeable priors (LEON), a mathematically principled approach to leveraging large language models (LLMs) as black-box optimizers without any task-specific fine-tuning, taking advantage of their ability to contextualize unstructured domain knowledge to propose personalized treatment plans in natural language. In practice, we implement LEON via 'optimization by prompting,' which uses LLMs as stochastic engines for proposing treatment designs. Experiments on real-world optimization tasks show LEON outperforms both traditional and LLM-based methods in proposing individualized treatments for patients.


【14】StyleBench: Evaluating thinking styles in Large Language Models
Link: https://arxiv.org/abs/2509.20868

Authors: , Shangding Gu, Ming Jin, Costas Spanos, Javad Lavaei
Abstract: The effectiveness of Large Language Models (LLMs) is heavily influenced by the reasoning strategies, or styles of thought, employed in their prompts. However, the interplay between these reasoning styles, model architecture, and task type remains poorly understood. To address this, we introduce StyleBench, a comprehensive benchmark for systematically evaluating reasoning styles across diverse tasks and models. We assess five representative reasoning styles, including Chain of Thought (CoT), Tree of Thought (ToT), Algorithm of Thought (AoT), Sketch of Thought (SoT), and Chain-of-Draft (CoD), on five reasoning tasks, using 15 open-source models from major families (LLaMA, Qwen, Mistral, Gemma, GPT-OSS, Phi, and DeepSeek) ranging from 270M to 120B parameters. Our large-scale analysis reveals that no single style is universally optimal. We demonstrate that strategy efficacy is highly contingent on both model scale and task type: search-based methods (AoT, ToT) excel in open-ended problems but require large-scale models, while concise styles (SoT, CoD) achieve radical efficiency gains on well-defined tasks. Furthermore, we identify key behavioral patterns: smaller models frequently fail to follow output instructions and default to guessing, while reasoning robustness emerges as a function of scale. Our findings offer a crucial roadmap for selecting optimal reasoning strategies based on specific constraints, and we open-source the benchmark at https://github.com/JamesJunyuGuo/Style_Bench.


【15】CaTS-Bench: Can Language Models Describe Numeric Time Series?
Link: https://arxiv.org/abs/2509.20823

Authors: , Pratham Yashwante, Marshall Fisher, Alessio Sampieri, Zihao Zhou, Fabio Galasso, Rose Yu
Comments: 9 pages, 4 images, 4 tables in the main paper; many more in the appendix
Abstract: Time series captioning, the task of describing numeric time series in natural language, requires numerical reasoning, trend interpretation, and contextual understanding. Existing benchmarks, however, often rely on synthetic data or overly simplistic captions, and typically neglect metadata and visual representations. To close this gap, we introduce CaTS-Bench, the first large-scale, real-world benchmark for context-aware time series captioning. CaTS-Bench is derived from 11 diverse datasets reframed as captioning and Q&A tasks, comprising roughly 465k training and 105k test timestamps. Each sample includes a numeric series segment, contextual metadata, a line-chart image, and a caption. A key contribution of this work is the scalable pipeline used to generate reference captions: while most references are produced by an oracle LLM and verified through factual checks, human indistinguishability studies, and diversity analyses, we also provide a human-revisited subset of 579 test captions, refined from LLM outputs to ensure accuracy and human-like style. Beyond captioning, CaTS-Bench offers 460 multiple-choice questions targeting deeper aspects of time series reasoning. We further propose new tailored evaluation metrics and benchmark leading VLMs, highlighting both their strengths and persistent limitations. Together, these contributions establish CaTS-Bench and its captioning pipeline as a reliable and extensible foundation for future research at the intersection of time series analysis and foundation models.


【16】Measuring LLM Sensitivity in Transformer-based Tabular Data Synthesis
Link: https://arxiv.org/abs/2509.20768

Authors: Davila R, Azizjon Turaev, Wolfram Wingerath
Comments: 12 pages, 7 figures
Abstract: Synthetic tabular data is used for privacy-preserving data sharing and data-driven model development. Its effectiveness, however, depends heavily on the Tabular Data Synthesis (TDS) tool used. Recent studies have shown that Transformer-based models outperform other state-of-the-art models such as Generative Adversarial Networks (GANs) and Diffusion models in terms of data quality. However, Transformer-based models also come with high computational costs, making them sometimes unfeasible for end users with prosumer hardware. This study presents a sensitivity assessment of how the choice of hyperparameters, such as the number of layers or the hidden dimension, affects the quality of the resulting synthetic data and the computational performance. It is performed across two tools, GReaT and REaLTabFormer, evaluating 10 model setups that vary in architecture type and depth. We assess the sensitivity on three dimensions: runtime, machine learning (ML) utility, and similarity to real data distributions. Experiments were conducted on four real-world datasets. Our findings reveal that runtime is proportional to the number of hyperparameters, with shallower configurations completing faster. GReaT consistently achieves lower runtimes than REaLTabFormer, and only on the largest dataset do they have comparable runtimes. For small datasets, both tools achieve synthetic data with high utility and optimal similarity, but on larger datasets only REaLTabFormer sustains strong utility and similarity. As a result, REaLTabFormer with lightweight LLMs provides the best balance, since it preserves data quality while reducing computational requirements. Nonetheless, its runtime remains higher than that of GReaT and other TDS tools, suggesting that efficiency gains are possible, but only up to a certain level.


【17】Can Federated Learning Safeguard Private Data in LLM Training? Vulnerabilities, Attacks, and Defense Evaluation
标题:联邦学习能否在LLM训练中保护隐私数据?漏洞、攻击与防御评估
链接:https://arxiv.org/abs/2509.20680

作者:o, Xuefeng Liu, Haolin Wang, Jianwei Niu, Shaojie Tang, Jing Yuan
备注:28 pages, 32 figures, accepted to the Findings of EMNLP 2025
摘要:使用本地数据微调大型语言模型(LLM)是寻求使LLM适应其特定领域的组织广泛采用的方法。鉴于不同组织之间数据的共享特征,使用来自多个来源的数据协作微调LLM的想法提供了一个有吸引力的机会。然而,组织通常不愿意共享本地数据,使得集中式微调不切实际。联邦学习(FL)是一种隐私保护框架,使客户端能够保留本地数据,同时仅共享用于协作训练的模型参数,提供了一种潜在的解决方案。虽然在集中式数据集上微调LLM有通过下一个令牌预测泄露数据的风险,但FL中的迭代聚合过程产生的是一个封装了广义知识的全局模型,有些人认为这可以保护客户端隐私。然而,在本文中,我们通过大量实验提出了相反的发现。我们发现,即使使用简单的生成方法,攻击者仍然可以从全局模型中提取训练数据,且泄漏随模型规模增大而加剧。此外,我们还引入了针对FL量身定制的增强攻击策略,该策略在训练期间跟踪全局模型更新,以加剧隐私泄露。为了减轻这些风险,我们评估了FL中的隐私保护技术,包括差分隐私、正则化约束更新,以及采用经过安全对齐的LLM。我们的研究结果为使用FL训练LLM时降低隐私风险提供了宝贵的见解和实用的指导方针。
摘要:Fine-tuning large language models (LLMs) with local data is a widely adopted approach for organizations seeking to adapt LLMs to their specific domains. Given the shared characteristics in data across different organizations, the idea of collaboratively fine-tuning an LLM using data from multiple sources presents an appealing opportunity. However, organizations are often reluctant to share local data, making centralized fine-tuning impractical. Federated learning (FL), a privacy-preserving framework, enables clients to retain local data while sharing only model parameters for collaborative training, offering a potential solution. While fine-tuning LLMs on centralized datasets risks data leakage through next-token prediction, the iterative aggregation process in FL results in a global model that encapsulates generalized knowledge, which some believe protects client privacy. In this paper, however, we present contradictory findings through extensive experiments. We show that attackers can still extract training data from the global model, even using straightforward generation methods, with leakage increasing as the model size grows. Moreover, we introduce an enhanced attack strategy tailored to FL, which tracks global model updates during training to intensify privacy leakage. To mitigate these risks, we evaluate privacy-preserving techniques in FL, including differential privacy, regularization-constrained updates and adopting LLMs with safety alignment. Our results provide valuable insights and practical guidelines for reducing privacy risks when training LLMs with FL.
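摘要中描述的基于生成的提取攻击仅停留在概念层面;下面给出一个极简示意(假设性草图,并非论文的实际攻击实现),展示常用于量化逐字泄漏的做法:统计模型生成文本中有多大比例的 n-gram 在训练语料中逐字出现。

```python
def ngram_leakage(generated, corpus, n=5):
    """计算 generated 中有多大比例的字符 n-gram 逐字出现在训练语料 corpus 中。
    比例越高,说明生成内容与训练数据的逐字重合(泄漏)越严重。"""
    grams = [generated[i:i + n] for i in range(len(generated) - n + 1)]
    corpus_grams = {doc[i:i + n] for doc in corpus
                    for i in range(len(doc) - n + 1)}
    if not grams:
        return 0.0
    return sum(g in corpus_grams for g in grams) / len(grams)
```

实际攻击中,generated 来自对全局模型的采样;这里仅演示泄漏度量本身。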


【18】Look Before you Leap: Estimating LLM Benchmark Scores from Descriptions
标题:三思而后行:根据描述估计LLM基准分数
链接:https://arxiv.org/abs/2509.20645

作者:ark, Ethan Mendes, Gabriel Stanovsky, Alan Ritter
备注:24 pages, 6 figures
摘要:大型语言模型的进展受到评估瓶颈的限制:构建基准,评估模型和设置,然后再迭代。因此,我们提出一个简单的问题:我们能在进行任何实验之前预测结果吗?我们研究纯文本性能预测:从脱敏的任务描述和预期配置中估计模型的分数,而不访问数据集实例。为了支持系统性研究,我们构建了PRECOG,一个跨越不同任务、领域和指标的脱敏描述-性能对语料库。实验表明,这项任务具有挑战性,但可行:配备了排除源论文的检索模块的模型实现了中等的预测性能,且不确定性校准良好,在高置信度阈值下,Accuracy子集上的平均绝对误差低至8.7。我们的分析表明,更强的推理模型会进行多样化的迭代查询,而当前的开源模型则相对落后,经常跳过检索或以有限的多样性收集证据。我们进一步测试了零泄漏设置,即在论文被索引之前对新发布的数据集或实验进行预测,其中具有内置网络搜索的GPT-5仍然达到了非平凡的预测精度。总体而言,我们的语料库和分析为开放式预期评估迈出了第一步,支持难度估计和更智能的实验优先级确定。
摘要 :Progress in large language models is constrained by an evaluation bottleneck: build a benchmark, evaluate models and settings, then iterate. We therefore ask a simple question: can we forecast outcomes before running any experiments? We study text-only performance forecasting: estimating a model's score from a redacted task description and intended configuration, with no access to dataset instances. To support systematic study, we curate PRECOG, a corpus of redacted description-performance pairs spanning diverse tasks, domains, and metrics. Experiments show the task is challenging but feasible: models equipped with a retrieval module that excludes source papers achieve moderate prediction performance with well-calibrated uncertainty, reaching mean absolute error as low as 8.7 on the Accuracy subset at high-confidence thresholds. Our analysis indicates that stronger reasoning models engage in diverse, iterative querying, whereas current open-source models lag and often skip retrieval or gather evidence with limited diversity. We further test a zero-leakage setting, forecasting on newly released datasets or experiments before their papers are indexed, where GPT-5 with built-in web search still attains nontrivial prediction accuracy. Overall, our corpus and analyses offer an initial step toward open-ended anticipatory evaluation, supporting difficulty estimation and smarter experiment prioritization.


【19】Investigating Modality Contribution in Audio LLMs for Music
标题:探究面向音乐的音频LLM中的模态贡献
链接:https://arxiv.org/abs/2509.20641

作者:orais, Magdalena Fuentes
摘要:音频大语言模型(Audio LLM)能够实现关于音乐的类人对话,但正如最近的基准测试所暗示的,目前尚不清楚它们是真的在听音频,还是仅仅在使用文本推理。本文通过量化每种模态对模型输出的贡献来研究这个问题。我们改编了MM-SHAP框架——一种基于Shapley值、与性能无关的分数,用于量化每种模态对模型预测的相对贡献。我们在MuChoMusic基准上评估了两个模型,发现准确率更高的模型更多地依赖文本来回答问题;但进一步检查表明,即使整体音频贡献很低,模型也能成功定位关键声音事件,这表明音频并没有被完全忽略。我们的研究是MM-SHAP在音频LLM上的首次应用,我们希望它能成为未来可解释AI与音频研究的基础一步。
摘要:Audio Large Language Models (Audio LLMs) enable human-like conversation about music, yet it is unclear if they are truly listening to the audio or just using textual reasoning, as recent benchmarks suggest. This paper investigates this issue by quantifying the contribution of each modality to a model's output. We adapt the MM-SHAP framework, a performance-agnostic score based on Shapley values that quantifies the relative contribution of each modality to a model's prediction. We evaluate two models on the MuChoMusic benchmark and find that the model with higher accuracy relies more on text to answer questions, but further inspection shows that even if the overall audio contribution is low, models can successfully localize key sound events, suggesting that audio is not entirely ignored. Our study is the first application of MM-SHAP to Audio LLMs and we hope it will serve as a foundational step for future research in explainable AI and audio.
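MM-SHAP的具体实现见原论文;不过对于音频/文本两种模态,Shapley值可以精确地写成两个边际贡献的平均。以下草图假设我们能够在模态子集上获得模型得分 v(假设性接口,非论文代码):

```python
def shapley_two_players(v, a="audio", t="text"):
    """两方博弈的精确Shapley值。
    v: 将模态子集(frozenset)映射为联盟得分的字典。"""
    e, A, T, AT = frozenset(), frozenset({a}), frozenset({t}), frozenset({a, t})
    # 每个模态的Shapley值 = 两种加入次序下边际贡献的平均
    phi_a = 0.5 * (v[A] - v[e]) + 0.5 * (v[AT] - v[T])
    phi_t = 0.5 * (v[T] - v[e]) + 0.5 * (v[AT] - v[A])
    return {a: phi_a, t: phi_t}
```

两个模态的贡献之和恰好等于完整模型相对空集的增益(效率性),这正是用Shapley值做模态归因的依据。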


【20】FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models
标题:FS-DFM:使用少步扩散语言模型快速准确地生成长文本
链接:https://arxiv.org/abs/2509.20624

作者:mi Monsefi, Nikhil Bhendawade, Manuel Rafael Ciosici, Dominic Culver, Yizhe Zhang, Irina Belousova
摘要:自回归语言模型(ARM)具有很强的似然性能,但本质上是串行的:它们每次前向传递只生成一个令牌,这限制了吞吐量并增加了长序列的延迟。扩散语言模型(DLM)在各个位置之间并行化,因此在语言生成方面似乎很有前景,但标准的离散扩散通常需要数百到数千次模型评估才能达到高质量,以迭代宽度换取串行深度。本文介绍了FS-DFM(少步离散流匹配),一种为提升速度而设计且不牺牲质量的离散流匹配模型。核心思想很简单:将采样步数作为显式参数,并训练模型使其在不同步数预算下保持一致,从而让一步大的移动落在许多步小移动本会到达的位置。我们将其与一个可靠的更新规则相结合,该规则将概率向正确的方向移动而不过冲,并辅以从长程轨迹中蒸馏出的强教师指导。这些选择共同使少步采样稳定、准确且易于控制。在语言建模基准测试中,使用8个采样步骤的FS-DFM以相近规模的模型生成1,024个令牌,达到了与1,024步离散流基线相当的困惑度,同时实现高达128倍的采样加速及相应的延迟/吞吐量增益。
摘要:Autoregressive language models (ARMs) deliver strong likelihoods, but are inherently serial: they generate one token per forward pass, which limits throughput and inflates latency for long sequences. Diffusion Language Models (DLMs) parallelize across positions and thus appear promising for language generation, yet standard discrete diffusion typically needs hundreds to thousands of model evaluations to reach high quality, trading serial depth for iterative breadth. We introduce FS-DFM, Few-Step Discrete Flow-Matching. A discrete flow-matching model designed for speed without sacrificing quality. The core idea is simple: make the number of sampling steps an explicit parameter and train the model to be consistent across step budgets, so one big move lands where many small moves would. We pair this with a reliable update rule that moves probability in the right direction without overshooting, and with strong teacher guidance distilled from long-run trajectories. Together, these choices make few-step sampling stable, accurate, and easy to control. On language modeling benchmarks, FS-DFM with 8 sampling steps achieves perplexity parity with a 1,024-step discrete-flow baseline for generating 1,024 tokens using a similar-size model, delivering up to 128 times faster sampling and corresponding latency/throughput gains.


【21】Training Task Reasoning LLM Agents for Multi-turn Task Planning via Single-turn Reinforcement Learning
标题:通过单回合强化学习训练任务推理LLM代理进行多回合任务规划
链接:https://arxiv.org/abs/2509.20616

作者:Hu, Changliu Liu, Na Li, Yebin Wang
摘要:大型语言模型(LLM)在知识获取、推理和工具使用方面表现出了卓越的能力,使其成为自主智能体应用的有希望的候选者。然而,训练LLM智能体进行复杂的多回合任务规划面临着重大挑战,包括稀疏的回合级奖励、长时间范围内的信用分配,以及多回合交互设置中强化学习的计算开销。为此,本文介绍了一种新方法,将多回合任务规划转化为单回合任务推理问题,通过组相对策略优化(GRPO)与来自专家轨迹的密集且可验证的奖励实现高效的策略优化。我们的理论分析表明,GRPO在单回合任务推理上的改进会带来在最少回合数下更高的多回合成功概率,并能泛化到时间范围更短的子任务。在复杂任务规划基准上的实验评估表明,与参数量高达14B的更大基线模型相比,我们使用单回合GRPO训练的1.5B参数模型取得了更优的性能,在超过30步的长程规划任务中成功率达到70%。我们还从理论和实验上验证了强大的跨任务泛化能力,即在复杂任务上训练的模型能够成功完成所有更简单的子任务。
摘要 :Large Language Models (LLMs) have demonstrated remarkable capabilities in knowledge acquisition, reasoning, and tool use, making them promising candidates for autonomous agent applications. However, training LLM agents for complex multi-turn task planning faces significant challenges, including sparse episode-wise rewards, credit assignment across long horizons, and the computational overhead of reinforcement learning in multi-turn interaction settings. To this end, this paper introduces a novel approach that transforms multi-turn task planning into single-turn task reasoning problems, enabling efficient policy optimization through Group Relative Policy Optimization (GRPO) with dense and verifiable reward from expert trajectories. Our theoretical analysis shows that GRPO improvement on single-turn task reasoning results in higher multi-turn success probability under the minimal turns, as well as the generalization to subtasks with shorter horizons. Experimental evaluation on the complex task planning benchmark demonstrates that our 1.5B parameter model trained with single-turn GRPO achieves superior performance compared to larger baseline models up to 14B parameters, with success rates of 70% for long-horizon planning tasks with over 30 steps. We also theoretically and empirically validate the strong cross-task generalizability that the models trained on complex tasks can lead to the successful completion of all simpler subtasks.
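摘要未展开GRPO的细节;其核心的组相对优势——在同一提示的一组采样回复内对奖励做标准化——通常可写成如下草图(标准GRPO做法,非该论文的具体实现):

```python
def grpo_advantages(rewards, eps=1e-8):
    """组相对优势:在同一提示的一组回复内对奖励做均值-方差标准化,
    使优势信号无需单独的价值网络即可获得。"""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

当奖励来自专家轨迹的密集可验证信号时(如摘要所述),这种组内标准化直接给出每条回复的相对优劣。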


【22】An LLM-based Agentic Framework for Accessible Network Control
标题:面向易用网络控制的基于LLM的代理框架
链接:https://arxiv.org/abs/2509.20600

作者:n, Jiawei Zhou, Minlan Yu
备注:11 pages, 6 figures
摘要:传统的网络管理方法只有少数训练有素、拥有丰富专业知识的网络运营人员才能掌握。这给普通用户造成了障碍,使其难以在不求助于专家的情况下轻松管理自己的网络。借助最近开发的具有强大语言理解能力的大型语言模型(LLM),我们设计了一个系统,允许用户以自然语言与网络对话,使更广泛的非专家受众也能进行网络管理。为了有效利用LLM的进展,我们提出了一个代理框架,该框架使用中间表示来简化不同供应商设备的配置,实时从内存中检索网络状态,并提供外部反馈接口。我们还开展了试点研究,收集用于网络控制的自然语言话语的真实用户数据,并提出了一个可视化界面,以促进对话驱动的用户交互,并为未来发展实现大规模数据收集。初步实验在合成和真实用户话语上验证了我们所提出的系统组件与LLM集成的有效性。通过我们的数据收集和可视化工作,我们为更有效地使用LLM铺平了道路,并使日常用户的网络控制走向普及。
摘要:Traditional approaches to network management have been accessible only to a handful of highly-trained network operators with significant expert knowledge. This creates barriers for lay users to easily manage their networks without resorting to experts. With recent development of powerful large language models (LLMs) for language comprehension, we design a system to make network management accessible to a broader audience of non-experts by allowing users to converse with networks in natural language. To effectively leverage advancements in LLMs, we propose an agentic framework that uses an intermediate representation to streamline configuration across diverse vendor equipment, retrieves the network state from memory in real-time, and provides an interface for external feedback. We also conduct pilot studies to collect real user data of natural language utterances for network control, and present a visualization interface to facilitate dialogue-driven user interaction and enable large-scale data collection for future development. Preliminary experiments validate the effectiveness of our proposed system components with LLM integration on both synthetic and real user utterances. Through our data collection and visualization efforts, we pave the way for more effective use of LLMs and democratize network control for everyday users.


【23】Structuring Collective Action with LLM-Guided Evolution: From Ill-Structured Problems to Executable Heuristics
标题:用LLM指导的进化构建集体行动:从结构不良的问题到可执行的启发法
链接:https://arxiv.org/abs/2509.20412

作者:dley Dsouza, Graham Alexander Watt, Yuri Leonenko, Juan Moreno-Cruz
摘要:集体行动问题需要将个体激励与集体目标对齐,是结构不良问题(ISP)的典型例子。对于单个行动者而言,局部行动与全局结果之间的因果联系并不清晰,利益相关方的目标往往相互冲突,也没有一种单一、明确的算法能将微观层面的选择与宏观层面的福利联系起来。我们提出了ECHO-MIMIC,一个计算框架,通过发现紧凑、可执行的启发式规则和有说服力的理由,将这种全局复杂性转化为每个智能体都易于处理的结构良好问题(WSP)。该框架分为两个阶段:ECHO(Evolutionary Crafting of Heuristics from Outcomes)演化出编码候选行为策略的Python代码片段,而MIMIC(Mechanism Inference & Messaging for Individual-to-Collective Alignment)演化出激励智能体采用这些策略的配套自然语言消息。两个阶段都采用大语言模型驱动的进化搜索:LLM提出多样化且上下文感知的代码或文本变体,而群体级选择则保留那些在模拟环境中集体表现最优的变体。我们在农业景观管理这一典型ISP上演示了该框架,其中局部耕作决策会影响全局生态连通性。结果表明,与基线相比,ECHO-MIMIC发现了高性能的启发式规则,并精心设计了定制化消息,成功地使模拟农民的行为与景观层面的生态目标保持一致。通过将算法化的规则发现与定制化沟通相结合,ECHO-MIMIC将集体行动的认知负担转化为一组简单的智能体级指令,使以前结构不良的问题在实践中得以求解,并为可扩展、自适应的政策设计开辟了一条新途径。
摘要:Collective action problems, which require aligning individual incentives with collective goals, are classic examples of Ill-Structured Problems (ISPs). For an individual agent, the causal links between local actions and global outcomes are unclear, stakeholder objectives often conflict, and no single, clear algorithm can bridge micro-level choices with macro-level welfare. We present ECHO-MIMIC, a computational framework that converts this global complexity into a tractable, Well-Structured Problem (WSP) for each agent by discovering compact, executable heuristics and persuasive rationales. The framework operates in two stages: ECHO (Evolutionary Crafting of Heuristics from Outcomes) evolves snippets of Python code that encode candidate behavioral policies, while MIMIC (Mechanism Inference & Messaging for Individual-to-Collective Alignment) evolves companion natural language messages that motivate agents to adopt those policies. Both phases employ a large-language-model-driven evolutionary search: the LLM proposes diverse and context-aware code or text variants, while population-level selection retains those that maximize collective performance in a simulated environment. We demonstrate this framework on a canonical ISP in agricultural landscape management, where local farming decisions impact global ecological connectivity. Results show that ECHO-MIMIC discovers high-performing heuristics compared to baselines and crafts tailored messages that successfully align simulated farmer behavior with landscape-level ecological goals. By coupling algorithmic rule discovery with tailored communication, ECHO-MIMIC transforms the cognitive burden of collective action into a simple set of agent-level instructions, making previously ill-structured problems solvable in practice and opening a new path toward scalable, adaptive policy design.
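ECHO与MIMIC的两阶段搜索在摘要中仅有概念描述;其"LLM提出变体、按集体表现选择"的骨架可以用如下玩具草图示意(假设性示例:用高斯数值扰动代替LLM提议器,适应度函数也是虚构的):

```python
import random

def evolve(fitness, init_pop, generations=100, keep=4, seed=0):
    """通用进化搜索骨架:提出变体,保留模拟环境中表现最好的个体。
    这里用高斯扰动代替LLM生成的代码/文本变体。"""
    rng = random.Random(seed)
    pop = list(init_pop)
    for _ in range(generations):
        variants = [p + rng.gauss(0, 0.3) for p in pop]  # 代替LLM提议器
        # 现任个体与变体一同排序,保留适应度最高的 keep 个
        pop = sorted(pop + variants, key=fitness, reverse=True)[:keep]
    return pop[0]

# 玩具适应度:当策略参数等于2.0时"集体福利"最大
best = evolve(lambda x: -(x - 2.0) ** 2, [0.0, 1.0, -1.0, 0.5])
```

由于排序时现任最优个体从不被丢弃,种群最优适应度单调不降;这是此类(μ+λ)式进化搜索的常见设计。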


【24】The Secret Agenda: LLMs Strategically Lie and Our Current Safety Tools Are Blind
标题:秘密议程:LLM会策略性地撒谎,而我们当前的安全工具对此视而不见
链接:https://arxiv.org/abs/2509.20393

作者:eeuw, Gaurav Chawla, Aniket Sharma, Vanessa Dietze
备注:9 pages plus citations and appendix, 7 figures
摘要:我们使用两个互补的测试平台研究大型语言模型中的策略性欺骗:Secret Agenda(涵盖38个模型)和内幕交易合规(通过SAE架构)。在所有模型家族中,当欺骗有利于目标达成时,Secret Agenda都能可靠地诱导模型撒谎。分析显示,在策略性不诚实期间,自动标注的SAE"欺骗"特征很少被激活,而且针对100多个欺骗相关特征的特征引导实验未能阻止撒谎。相反,使用未标注SAE激活的内幕交易分析,通过热图和t-SNE可视化中的判别模式,将欺骗性反应与合规性反应区分开来。这些发现表明,由自动标注驱动的可解释性方法无法检测或控制行为层面的欺骗,而聚合的未标注激活则为风险评估提供了群体层面的结构。结果涵盖资源受限条件下的Llama 8B/70B SAE实现和GemmaScope,属于初步发现,旨在推动在现实欺骗情境中对特征发现、标注方法和因果干预开展更大规模的研究。
摘要:We investigate strategic deception in large language models using two complementary testbeds: Secret Agenda (across 38 models) and Insider Trading compliance (via SAE architectures). Secret Agenda reliably induced lying when deception advantaged goal achievement across all model families. Analysis revealed that autolabeled SAE features for "deception" rarely activated during strategic dishonesty, and feature steering experiments across 100+ deception-related features failed to prevent lying. Conversely, insider trading analysis using unlabeled SAE activations separated deceptive versus compliant responses through discriminative patterns in heatmaps and t-SNE visualizations. These findings suggest autolabel-driven interpretability approaches fail to detect or control behavioral deception, while aggregate unlabeled activations provide population-level structure for risk assessment. Results span Llama 8B/70B SAE implementations and GemmaScope under resource constraints, representing preliminary findings that motivate larger studies on feature discovery, labeling methodology, and causal interventions in realistic deception contexts.


Graph相关(图学习|图神经网络|图优化等)(3篇)

【1】Structure-Attribute Transformations with Markov Chain Boost Graph Domain Adaptation
标题:基于马尔可夫链的结构-属性转换增强图域自适应
链接:https://arxiv.org/abs/2509.21059

作者: Yongtao Zhang, Shaobo Ren, Yuxin You
备注:11 pages,6 figures,Accepted by ACM CIKM'25
摘要:在不同图域的标签稀缺场景中,图域自适应受到了极大关注。传统的图域自适应方法主要关注在原始图结构上转换节点属性,并在网络间对齐转换后节点特征的分布。然而,这些方法往往难以应对不同图域之间潜在的结构异质性,从而导致次优的分布对齐。为了解决这一局限,我们提出了基于马尔可夫链的结构-属性转换(SATMC),这是一个通过图结构和属性转换在网络间顺序对齐分布的新框架。为了减轻域私有信息的负面影响并进一步增强模型的泛化能力,SATMC引入了私有域信息消减机制和经验Wasserstein距离。理论证明表明,与现有图域自适应方法相比,SATMC在跨网络节点分类上可以获得更紧的误差界。在九对公开的跨域数据集上的大量实验表明,SATMC在跨网络节点分类任务中优于现有最先进的方法。代码可在 https://github.com/GiantZhangYT/SATMC 获取。
摘要:Graph domain adaptation has gained significant attention in label-scarce scenarios across different graph domains. Traditional approaches to graph domain adaptation primarily focus on transforming node attributes over raw graph structures and aligning the distributions of the transformed node features across networks. However, these methods often struggle with the underlying structural heterogeneity between distinct graph domains, which leads to suboptimal distribution alignment. To address this limitation, we propose Structure-Attribute Transformation with Markov Chain (SATMC), a novel framework that sequentially aligns distributions across networks via both graph structure and attribute transformations. To mitigate the negative influence of domain-private information and further enhance the model's generalization, SATMC introduces a private domain information reduction mechanism and an empirical Wasserstein distance. Theoretical proofs suggest that SATMC can achieve a tighter error bound for cross-network node classification compared to existing graph domain adaptation methods. Extensive experiments on nine pairs of publicly available cross-domain datasets show that SATMC outperforms state-of-the-art methods in the cross-network node classification task. The code is available at https://github.com/GiantZhangYT/SATMC.


【2】FracAug: Fractional Augmentation boost Graph-level Anomaly Detection under Limited Supervision
标题:FracAug:分数增强在有限监督下增强图形级异常检测
链接:https://arxiv.org/abs/2509.20978

作者:ong, Xingyi Zhang, Sibo Wang
摘要:图级异常检测(GAD)在药物发现等多个领域至关重要,但高昂的标注成本和数据集不平衡阻碍了图神经网络(GNN)的性能。为了解决这些问题,我们提出了FracAug,一个创新的即插即用增强框架,通过生成语义一致的图变体并以相互验证的方式进行伪标注来增强GNN。与以往的启发式方法不同,FracAug学习给定图中的语义,并在一种新颖的加权距离感知间隔损失的指导下合成分数式变体。这捕获了多尺度拓扑,从而生成多样化、语义保持且不受数据不平衡影响的图。然后,FracAug利用对原始图和增强图的预测为未标注数据打伪标签,迭代地扩展训练集。作为与各种GNN兼容的模型无关模块,FracAug表现出显著的通用性和有效性:在12个真实世界数据集上对14种GNN的实验显示出一致的增益,将平均AUROC、AUPRC和F1分数分别提升了至多5.72%、7.23%和4.18%。
摘要 :Graph-level anomaly detection (GAD) is critical in diverse domains such as drug discovery, yet high labeling costs and dataset imbalance hamper the performance of Graph Neural Networks (GNNs). To address these issues, we propose FracAug, an innovative plug-in augmentation framework that enhances GNNs by generating semantically consistent graph variants and pseudo-labeling with mutual verification. Unlike previous heuristic methods, FracAug learns semantics within given graphs and synthesizes fractional variants, guided by a novel weighted distance-aware margin loss. This captures multi-scale topology to generate diverse, semantic-preserving graphs unaffected by data imbalance. Then, FracAug utilizes predictions from both original and augmented graphs to pseudo-label unlabeled data, iteratively expanding the training set. As a model-agnostic module compatible with various GNNs, FracAug demonstrates remarkable universality and efficacy: experiments across 14 GNNs on 12 real-world datasets show consistent gains, boosting average AUROC, AUPRC, and F1-score by up to 5.72%, 7.23%, and 4.18%, respectively.
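摘要提到的"相互验证伪标注"可以用一个与模型无关的草图来示意:仅当模型在原始图与增强图上的预测类别一致、且双方置信度都足够高时才接受伪标签(阈值0.9为假设值,并非论文中的设置):

```python
def pseudo_label(probs_orig, probs_aug, threshold=0.9):
    """相互验证伪标注:原始图与增强图的预测类别一致
    且双方置信度均不低于阈值时,才为该样本接受伪标签。
    返回 (样本索引, 伪标签类别) 的列表。"""
    accepted = []
    for i, (p, q) in enumerate(zip(probs_orig, probs_aug)):
        c_p = max(range(len(p)), key=p.__getitem__)  # 原始图预测类别
        c_q = max(range(len(q)), key=q.__getitem__)  # 增强图预测类别
        if c_p == c_q and min(p[c_p], q[c_q]) >= threshold:
            accepted.append((i, c_p))
    return accepted
```

被接受的样本可并入训练集后重复这一过程,即摘要所述的迭代扩展。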


【3】A Hierarchical Variational Graph Fused Lasso for Recovering Relative Rates in Spatial Compositional Data
标题:融合Lasso的分层变分图恢复空间成分数据中的相对速率
链接:https://arxiv.org/abs/2509.20636

作者:alerio Teixeira, Ed Reznik, Sudpito Banerjee, Wesley Tansey
摘要:来自生物成像技术(例如成像质谱(IMS)或成像质谱流式(IMC))的空间数据分析具有挑战性,因为竞争性采样过程会卷积来自单个像素中分子的信号。为了解决这一问题,我们开发了一个可扩展的贝叶斯框架,利用空间信号模式中的天然稀疏性来恢复整幅图像中每个分子的相对速率。我们的方法依赖于图lasso先验的重尾变体和一个新颖的分层变分族,通过自动微分变分推理实现高效推断。仿真结果表明,我们的方法优于IMS中现行实践的点估计方法,并且比平均场变分推理技术具有更好的后验覆盖。在真实IMS数据上的结果表明,我们的方法能更好地恢复已知组织的真实解剖结构、去除伪影,并检测出标准分析方法遗漏的活跃区域。
摘要:The analysis of spatial data from biological imaging technology, such as imaging mass spectrometry (IMS) or imaging mass cytometry (IMC), is challenging because of a competitive sampling process which convolves signals from molecules in a single pixel. To address this, we develop a scalable Bayesian framework that leverages natural sparsity in spatial signal patterns to recover relative rates for each molecule across the entire image. Our method relies on the use of a heavy-tailed variant of the graphical lasso prior and a novel hierarchical variational family, enabling efficient inference via automatic differentiation variational inference. Simulation results show that our approach outperforms state-of-the-practice point estimate methodologies in IMS, and has superior posterior coverage than mean-field variational inference techniques. Results on real IMS data demonstrate that our approach better recovers the true anatomical structure of known tissue, removes artifacts, and detects active regions missed by the standard analysis approach.


Transformer(3篇)

【1】MAIFormer: Multi-Agent Inverted Transformer for Flight Trajectory Prediction
标题:MAIFormer:用于飞行轨迹预测的多智能体倒置Transformer
链接:https://arxiv.org/abs/2509.21004

作者:oon, Keumjin Lee
备注:8 pages, 7 figures, submitted for IEEE Transactions on Intelligent Transportation System
摘要:多架飞机的飞行轨迹预测必不可少,它为了解飞机如何在当前空中交通流中航行提供了关键洞见。然而,预测多智能体飞行轨迹本质上具有挑战性。其中一个主要困难在于同时建模单架飞机随时间的行为以及航班之间的复杂相互作用。生成可解释的预测结果也是一个挑战。因此,我们提出了多智能体倒置Transformer(MAIFormer),作为一种预测多智能体飞行轨迹的新型神经架构。所提出的框架具有两个关键注意力模块:(i)掩码多变量注意力,捕捉单架飞机的时空模式;(ii)智能体注意力,建模复杂空中交通场景中多个智能体之间的社会模式。我们使用来自韩国仁川国际机场终端空域的真实世界广播式自动相关监视(ADS-B)飞行轨迹数据集评估了MAIFormer。实验结果表明,MAIFormer在多个指标上取得了最佳性能,优于其他方法。此外,MAIFormer生成的预测结果可以从人类视角进行解释,这提高了模型的透明度及其在空中交通管制中的实用性。
摘要:Flight trajectory prediction for multiple aircraft is essential and provides critical insights into how aircraft navigate within current air traffic flows. However, predicting multi-agent flight trajectories is inherently challenging. One of the major difficulties is modeling both the individual aircraft behaviors over time and the complex interactions between flights. Generating explainable prediction outcomes is also a challenge. Therefore, we propose a Multi-Agent Inverted Transformer, MAIFormer, as a novel neural architecture that predicts multi-agent flight trajectories. The proposed framework features two key attention modules: (i) masked multivariate attention, which captures spatio-temporal patterns of individual aircraft, and (ii) agent attention, which models the social patterns among multiple agents in complex air traffic scenes. We evaluated MAIFormer using a real-world automatic dependent surveillance-broadcast flight trajectory dataset from the terminal airspace of Incheon International Airport in South Korea. The experimental results show that MAIFormer achieves the best performance across multiple metrics and outperforms other methods. In addition, MAIFormer produces prediction outcomes that are interpretable from a human perspective, which improves both the transparency of the model and its practical utility in air traffic control.


【2】Why Attention Fails: The Degeneration of Transformers into MLPs in Time Series Forecasting
标题:为什么注意力失败:时间序列预测中Transformer退化为MLP
链接:https://arxiv.org/abs/2509.20942

作者:g, Jiayi Zhu, Weiqiang Sun
摘要:基于Transformer的架构在自然语言处理和计算机视觉中取得了高性能,但许多研究表明,它们在时间序列预测中并未展现出明显优势,在某些情况下甚至不如简单的线性基线。然而,这些研究大多没有深入探讨Transformer失效背后的原因。为了更好地理解时间序列Transformer(TST),我们设计了一系列实验,逐步将Transformer修改为MLP,以研究注意力机制的影响。令人惊讶的是,在现有的时间序列Transformer中,Transformer块经常退化为简单的MLP。我们设计了一个可解释的数据集来调查注意力机制失效背后的原因,并发现注意力机制并没有以预期的方式工作。本文从理论上分析了这一现象背后的原因,证明了当前的嵌入方法无法使Transformer在结构良好的潜在空间中发挥作用,并进一步分析了嵌入失效的深层原因。
摘要 :Transformer-based architectures achieved high performance in natural language processing and computer vision, yet many studies have shown that they have not demonstrated a clear advantage in time series forecasting and even underperform simple linear baselines in some cases. However, most of these studies have not thoroughly explored the reasons behind the failure of transformers. To better understand time-series transformers(TST), we designed a series of experiments, progressively modifying transformers into MLPs to investigate the impact of the attention mechanism. Surprisingly, transformer blocks often degenerate into simple MLPs in existing time-series transformers. We designed a interpretable dataset to investigate the reasons behind the failure of the attention mechanism and revealed that the attention mechanism is not working in the expected way. We theoretically analyzed the reasons behind this phenomenon, demonstrating that the current embedding methods fail to allow transformers to function in a well-structured latent space, and further analyzed the deeper underlying causes of the failure of embedding.


【3】FHRFormer: A Self-supervised Transformer Approach for Fetal Heart Rate Inpainting and Forecasting
标题:FHRFormer:一种用于胎儿心率修复和预测的自我监督Transformer方法
链接:https://arxiv.org/abs/2509.20852

作者:ngan, Neel Kanwal, Anita Yeconia, Ladislaus Blacy, Yuda Munyaw, Estomih Mduma, Hege Ersdal
备注:Submitted to IEEE JBHI
摘要:大约10%的新生儿在出生时需要辅助才能开始呼吸,大约5%需要通气支持。胎儿心率(FHR)监测在产前护理期间评估胎儿健康方面发挥着至关重要的作用,能够检测异常模式,并支持及时的产科干预以降低分娩期间的胎儿风险。应用人工智能(AI)方法分析具有不同结局的连续FHR监测记录的大型数据集,可以为预测需要呼吸辅助或干预的风险提供新的见解。可穿戴FHR监测器的最新进展使得在不影响产妇活动的情况下进行连续胎儿监测成为可能。然而,母体运动期间的传感器位移,以及胎儿或母体体位的变化,常常导致信号脱落,使记录的FHR数据中出现缺口。这类缺失数据限制了有意义见解的提取,并使自动化(基于AI的)分析复杂化。处理缺失数据的传统方法,如简单的插值技术,往往无法保持信号的频谱特性。在本文中,我们提出了一种基于掩码Transformer的自动编码器方法,通过捕获数据的空间和频率分量来重建缺失的FHR信号。所提出的方法在不同的缺失数据持续时间上表现出鲁棒性,可用于信号修复和预测。该方法可以回顾性地应用于研究数据集,以支持基于AI的风险算法的开发。未来,该方法可以集成到可穿戴FHR监测设备中,以实现更早、更稳健的风险检测。
摘要:Approximately 10\% of newborns require assistance to initiate breathing at birth, and around 5\% need ventilation support. Fetal heart rate (FHR) monitoring plays a crucial role in assessing fetal well-being during prenatal care, enabling the detection of abnormal patterns and supporting timely obstetric interventions to mitigate fetal risks during labor. Applying artificial intelligence (AI) methods to analyze large datasets of continuous FHR monitoring episodes with diverse outcomes may offer novel insights into predicting the risk of needing breathing assistance or interventions. Recent advances in wearable FHR monitors have enabled continuous fetal monitoring without compromising maternal mobility. However, sensor displacement during maternal movement, as well as changes in fetal or maternal position, often lead to signal dropouts, resulting in gaps in the recorded FHR data. Such missing data limits the extraction of meaningful insights and complicates automated (AI-based) analysis. Traditional approaches to handle missing data, such as simple interpolation techniques, often fail to preserve the spectral characteristics of the signals. In this paper, we propose a masked transformer-based autoencoder approach to reconstruct missing FHR signals by capturing both spatial and frequency components of the data. The proposed method demonstrates robustness across varying durations of missing data and can be used for signal inpainting and forecasting. The proposed approach can be applied retrospectively to research datasets to support the development of AI-based risk algorithms. In the future, the proposed method could be integrated into wearable FHR monitoring devices to achieve earlier and more robust risk detection.
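掩码自编码器训练的核心——只在被掩蔽(缺失/脱落)位置上计算重建损失——可以这样示意(示意性草图,Transformer主干以任意给定的重建结果代替):

```python
def masked_mse(signal, reconstruction, mask):
    """仅在 mask 为 True 的(缺失/脱落)位置上计算均方误差;
    可观测位置不参与损失,模型因此被迫从上下文推断缺口。"""
    errs = [(s - r) ** 2 for s, r, m in zip(signal, reconstruction, mask) if m]
    return sum(errs) / len(errs) if errs else 0.0
```

训练时随机生成 mask 模拟传感器脱落;推断时将真实缺口作为 mask,即得信号修复。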


GAN|对抗|攻击|生成相关(13篇)

【1】No Prior, No Leakage: Revisiting Reconstruction Attacks in Trained Neural Networks
标题:没有先验,没有泄漏:重新审视训练神经网络中的重建攻击
链接:https://arxiv.org/abs/2509.21296

作者: Refael, Guy Smorodinsky, Ofir Lindenbaum, Itay Safran
摘要:神经网络对训练数据的记忆引发了对隐私和安全的紧迫关注。最近的工作表明,在某些条件下,部分训练集可以直接从模型参数重建。其中一些方法利用了对间隔最大化(margin maximization)的隐式偏置,这表明通常被认为有利于泛化的性质实际上可能损害隐私。然而,尽管有惊人的实证演示,这些攻击的可靠性仍然知之甚少,并缺乏坚实的理论基础。在这项工作中,我们采取互补的视角:我们不去设计更强的攻击,而是分析现有重建方法的固有弱点和局限,并确定它们失效的条件。我们严格证明,在不结合关于数据的先验知识的情况下,存在无限多可能与真实训练集相距任意远的替代解,使得重建在根本上不可靠。在实证上,我们进一步证明训练样本的精确复现只是偶然发生的。我们的结果完善了对训练集泄漏何时可能发生的理论理解,并为缓解重建攻击提供了新的见解。值得注意的是,我们证明了训练得更充分、因而更强地满足隐式偏置条件的网络,实际上更不容易受到重建攻击;在这种情况下,隐私与对强泛化能力的需求得以协调。
摘要:The memorization of training data by neural networks raises pressing concerns for privacy and security. Recent work has shown that, under certain conditions, portions of the training set can be reconstructed directly from model parameters. Some of these methods exploit implicit bias toward margin maximization, suggesting that properties often regarded as beneficial for generalization may actually compromise privacy. Yet despite striking empirical demonstrations, the reliability of these attacks remains poorly understood and lacks a solid theoretical foundation. In this work, we take a complementary perspective: rather than designing stronger attacks, we analyze the inherent weaknesses and limitations of existing reconstruction methods and identify conditions under which they fail. We rigorously prove that, without incorporating prior knowledge about the data, there exist infinitely many alternative solutions that may lie arbitrarily far from the true training set, rendering reconstruction fundamentally unreliable. Empirically, we further demonstrate that exact duplication of training examples occurs only by chance. Our results refine the theoretical understanding of when training set leakage is possible and offer new insights into mitigating reconstruction attacks. Remarkably, we demonstrate that networks trained more extensively, and therefore satisfying implicit bias conditions more strongly -- are, in fact, less susceptible to reconstruction attacks, reconciling privacy with the need for strong generalization in this setting.
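文中"存在无限多替代解"的论证可以用一个玩具例子直观化:若模型参数只通过数据的某个聚合量(这里是均值)依赖训练集,则任何具有相同聚合量的数据集在参数上都无法区分(玩具示例,并非论文中的构造):

```python
def fit_centroid(points):
    """玩具模型:参数仅为数据的逐坐标均值。
    参数只依赖这一聚合量,均值相同的数据集彼此不可区分。"""
    n, dim = len(points), len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dim))

set_a = [(0.0, 0.0), (2.0, 2.0)]
set_b = [(1.0, 1.0), (1.0, 1.0)]  # 截然不同的数据,相同的均值
```

仅凭参数 (1.0, 1.0),攻击者无法判断训练集是 set_a、set_b,还是均值相同的其他任何数据集。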


【2】Taxonomy-aware Dynamic Motion Generation on Hyperbolic Manifolds
标题:双曲流形上具有分类感知的动态运动生成
链接:https://arxiv.org/abs/2509.21281

作者:nstein, Noémie Jaquier, Tamim Asfour, Leonel Rozo
备注:8 pages, 6 figures, 1 table
摘要:机器人的类人运动生成通常从生物力学研究中汲取灵感,这类研究通常将复杂的人类运动归类为层次化分类体系。虽然这些分类体系提供了关于运动之间如何相互关联的丰富结构信息,但这些信息经常被运动生成模型忽视,导致生成的运动与其底层层次结构之间脱节。本文介绍了GPHDM,一种新颖的方法,其学习的潜在表示既保留运动的层次结构又保留其时间动态,以确保物理一致性。我们的模型通过将高斯过程动力学模型(GPDM)的动力学先验扩展到双曲流形,并将其与分类感知的归纳偏置相结合来实现这一点。基于这一几何与分类感知的框架,我们提出了三种新机制来生成既符合分类结构又物理一致的运动:两种概率递归方法和一种基于拉回度量测地线的方法。在手部抓取分类体系上生成真实运动序列的实验表明,GPHDM忠实地编码了底层分类体系和时间动态,并能生成新颖的物理一致轨迹。
摘要:Human-like motion generation for robots often draws inspiration from biomechanical studies, which often categorize complex human motions into hierarchical taxonomies. While these taxonomies provide rich structural information about how movements relate to one another, this information is frequently overlooked in motion generation models, leading to a disconnect between the generated motions and their underlying hierarchical structure. This paper introduces the GPHDM, a novel approach that learns latent representations preserving both the hierarchical structure of motions and their temporal dynamics to ensure physical consistency. Our model achieves this by extending the dynamics prior of the Gaussian Process Dynamical Model (GPDM) to the hyperbolic manifold and integrating it with taxonomy-aware inductive biases. Building on this geometry- and taxonomy-aware framework, we propose three novel mechanisms for generating motions that are both taxonomically-structured and physically-consistent: two probabilistic recursive approaches and a method based on pullback-metric geodesics. Experiments on generating realistic motion sequences on the hand grasping taxonomy show that the proposed GPHDM faithfully encodes the underlying taxonomy and temporal dynamics, and generates novel physically-consistent trajectories.
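摘要未说明所采用的双曲模型;作为背景,常用的洛伦兹(双曲面)模型中的测地距离为如下标准公式(通用定义,非论文原文):

```latex
% 洛伦兹模型中两点 x, y 的测地距离(标准定义)
d_{\mathcal{L}}(x, y) = \operatorname{arcosh}\bigl(-\langle x, y\rangle_{\mathcal{L}}\bigr),
\qquad
\langle x, y\rangle_{\mathcal{L}} = -x_0 y_0 + \sum_{i=1}^{n} x_i y_i .
\]
正是这类距离使层次化(树状)结构能以低失真嵌入低维潜在空间。
\[
```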


【3】Sparse Representations Improve Adversarial Robustness of Neural Network Classifiers
标题:稀疏表示提高神经网络分类器的对抗鲁棒性
链接:https://arxiv.org/abs/2509.21130

作者:teunou, Sigurd Saue, Théo Druilhe
备注:Killian Steunou is the main contributor and corresponding author of this work
摘要:深度神经网络在图像分类任务上表现得非常好,但仍然容易受到精心设计的对抗性扰动的影响。这项工作重新审视线性降维作为一个简单的,数据适应的防御。我们经验比较标准的主成分分析(PCA)与其稀疏的变体(SPCA)作为前端特征提取下游分类器,我们补充这些实验与理论分析。在理论方面,我们推导出适用于SPCA特征的线性头部的精确鲁棒性证书:对于$\ell_\infty$和$\ell_2$威胁模型(二进制和多类),认证半径随着$W^\top u$收缩的对偶范数而增长,其中$W$是投影,$u$是头部权重。我们进一步表明,对于一般(非线性)头,稀疏性通过Lipschitz组合参数降低了算子范数的界限,预测较低的输入灵敏度。从经验上讲,在投影后使用一个小的非线性网络,SPCA在强大的白盒和黑盒攻击下始终比PCA更优雅地降级,同时保持有竞争力的干净准确性。总而言之,理论确定了机制(稀疏投影减少了对抗杠杆),实验验证了这种好处在线性设置之外仍然存在。我们的代码可在https://github.com/killian31/SPCARobustness上获得。
摘要:Deep neural networks perform remarkably well on image classification tasks but remain vulnerable to carefully crafted adversarial perturbations. This work revisits linear dimensionality reduction as a simple, data-adapted defense. We empirically compare standard Principal Component Analysis (PCA) with its sparse variant (SPCA) as front-end feature extractors for downstream classifiers, and we complement these experiments with a theoretical analysis. On the theory side, we derive exact robustness certificates for linear heads applied to SPCA features: for both $\ell_\infty$ and $\ell_2$ threat models (binary and multiclass), the certified radius grows as the dual norms of $W^\top u$ shrink, where $W$ is the projection and $u$ the head weights. We further show that for general (non-linear) heads, sparsity reduces operator-norm bounds through a Lipschitz composition argument, predicting lower input sensitivity. Empirically, with a small non-linear network after the projection, SPCA consistently degrades more gracefully than PCA under strong white-box and black-box attacks while maintaining competitive clean accuracy. Taken together, the theory identifies the mechanism (sparser projections reduce adversarial leverage) and the experiments verify that this benefit persists beyond the linear setting. Our code is available at https://github.com/killian31/SPCARobustness.
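摘要中的线性头认证半径可以用几行代码示意:对于 $\ell_\infty$ 威胁模型,二分类线性头 $f(x) = u^\top W^\top x + b$ 的认证半径为 $|f(x)| / \|W u\|_1$($\ell_\infty$ 的对偶范数是 $\ell_1$)。以下是基于该公式的最小示意实现(随机矩阵与稀疏化方式均为演示用假设,并非论文代码):

```python
import numpy as np

def linf_certified_radius(W, u, x, b=0.0):
    """Certified l_inf radius for a binary linear head f(x) = u^T W^T x + b:
    no perturbation with ||delta||_inf < radius can flip sign(f).
    The dual norm of l_inf is l_1, hence ||W u||_1 in the denominator."""
    margin = abs(float(u @ (W.T @ x)) + b)
    dual = float(np.abs(W @ u).sum())
    return margin / dual

rng = np.random.default_rng(0)
d, k = 20, 5
W_dense = rng.normal(size=(d, k)) / np.sqrt(d)     # PCA-like dense projection
W_sparse = W_dense * (rng.random((d, k)) < 0.2)    # SPCA-like: ~80% zeros
u = rng.normal(size=k)                             # head weights
x = rng.normal(size=d)                             # input example

r_dense = linf_certified_radius(W_dense, u, x)
r_sparse = linf_certified_radius(W_sparse, u, x)
```

当 $W$ 更稀疏时,$\|W u\|_1$ 通常更小,认证半径随之增大,这正是摘要所述"稀疏投影减少对抗杠杆"的机制。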


【4】Cross-Modal Instructions for Robot Motion Generation
标题:用于机器人运动生成的跨模态指令
链接:https://arxiv.org/abs/2509.21107

作者:arron, Xiaoxiang Dong, Matthew Johnson-Roberson, Weiming Zhi
摘要:教导机器人新的行为通常需要通过遥操作或动觉教学,即物理指导机器人的运动演示。虽然最近的工作已经探索了使用人类草图来指定所需的行为,但数据收集仍然很麻烦,并且演示数据集难以扩展。在本文中,我们介绍了另一种范式,从跨模态指令,机器人的形状由演示的形式粗糙的注释,它可以包含自由形式的文本标签,并用于代替物理运动的学习。我们介绍了CrossInstruct框架,它将跨模态指令作为示例集成到基础视觉语言模型(VLM)的上下文输入中。然后,VLM迭代地查询更小的微调模型,并在多个2D视图上合成所需的运动。然后,这些随后融合成机器人工作空间中的3D运动轨迹上的相干分布。通过将大型VLM的推理与细粒度的指向模型相结合,CrossInstruct产生可执行的机器人行为,这些行为在有限的指令示例集的环境之外进行概括。然后,我们引入了一个下游强化学习管道,该管道利用CrossInstruct输出来有效地学习策略以完成细粒度任务。我们在基准模拟任务和实际硬件上严格评估了CrossInstruct,证明了其有效性,无需额外的微调,并为随后通过强化学习进行优化的策略提供了强大的初始化。
摘要:Teaching robots novel behaviors typically requires motion demonstrations via teleoperation or kinaesthetic teaching, that is, physically guiding the robot. While recent work has explored using human sketches to specify desired behaviors, data collection remains cumbersome, and demonstration datasets are difficult to scale. In this paper, we introduce an alternative paradigm, Learning from Cross-Modal Instructions, where robots are shaped by demonstrations in the form of rough annotations, which can contain free-form text labels, and are used in lieu of physical motion. We introduce the CrossInstruct framework, which integrates cross-modal instructions as examples into the context input to a foundational vision-language model (VLM). The VLM then iteratively queries a smaller, fine-tuned model, and synthesizes the desired motion over multiple 2D views. These are then subsequently fused into a coherent distribution over 3D motion trajectories in the robot's workspace. By incorporating the reasoning of the large VLM with a fine-grained pointing model, CrossInstruct produces executable robot behaviors that generalize beyond the environments in the limited set of instruction examples. We then introduce a downstream reinforcement learning pipeline that leverages CrossInstruct outputs to efficiently learn policies to complete fine-grained tasks. We rigorously evaluate CrossInstruct on benchmark simulation tasks and real hardware, demonstrating effectiveness without additional fine-tuning and providing a strong initialization for policies subsequently refined via reinforcement learning.


【5】FORCE: Transferable Visual Jailbreaking Attacks via Feature Over-Reliance CorrEction
标题:FORCE:通过功能过度依赖更正的可转移视觉越狱攻击
链接:https://arxiv.org/abs/2509.21029

作者:, Alasdair Paren, Suqin Yuan, Muyang Li, Philip Torr, Adel Bibi, Tongliang Liu
摘要:新模态的集成增强了多模态大型语言模型(MLLM)的能力,但也引入了额外的漏洞。特别是,简单的视觉越狱攻击可以比复杂的文本攻击更容易地操纵开源MLLM。然而,这些不发达的攻击表现出非常有限的跨模型可转移性,无法可靠地识别闭源MLLM中的漏洞。在这项工作中,我们分析了这些越狱攻击的损失情况,发现生成的攻击往往位于高清晰度区域,其有效性对传输过程中即使是微小的参数变化也非常敏感。为了进一步解释高清晰度定位,我们分析了它们在中间层和谱域中的特征表示,揭示了对窄层表示和语义差的频率分量的不适当依赖。在此基础上,我们提出了一种特征过度依赖校正(FORCE)方法,该方法引导攻击探索更广泛的可行区域,并根据其语义内容重新调整频率特征的影响。通过消除对层和光谱特征的不可推广的依赖,我们的方法发现了视觉越狱攻击的平坦可行区域,从而提高了跨模型的可移植性。大量的实验表明,我们的方法有效地促进了对闭源MLLM的视觉红队评估。
摘要:The integration of new modalities enhances the capabilities of multimodal large language models (MLLMs) but also introduces additional vulnerabilities. In particular, simple visual jailbreaking attacks can manipulate open-source MLLMs more readily than sophisticated textual attacks. However, these underdeveloped attacks exhibit extremely limited cross-model transferability, failing to reliably identify vulnerabilities in closed-source MLLMs. In this work, we analyse the loss landscape of these jailbreaking attacks and find that the generated attacks tend to reside in high-sharpness regions, whose effectiveness is highly sensitive to even minor parameter changes during transfer. To further explain the high-sharpness localisations, we analyse their feature representations in both the intermediate layers and the spectral domain, revealing an improper reliance on narrow layer representations and semantically poor frequency components. Building on this, we propose a Feature Over-Reliance CorrEction (FORCE) method, which guides the attack to explore broader feasible regions across layer features and rescales the influence of frequency features according to their semantic content. By eliminating non-generalizable reliance on both layer and spectral features, our method discovers flattened feasible regions for visual jailbreaking attacks, thereby improving cross-model transferability. Extensive experiments demonstrate that our approach effectively facilitates visual red-teaming evaluations against closed-source MLLMs.


【6】ExMolRL: Phenotype-Target Joint Generation of De Novo Molecules via Multi-Objective Reinforcement Learning
标题:ExMolRL:通过多目标强化学习实现De Novo分子的表现型-目标联合生成
链接:https://arxiv.org/abs/2509.21010

作者:uo, Hui Liu
摘要:高质量候选分子的产生仍然是人工智能驱动药物设计的核心挑战。目前基于表型和基于靶点的策略都存在局限性,要么导致高实验成本,要么忽视系统水平的细胞反应。为了弥合这一差距,我们提出了ExMoIRL,这是一种新的生成框架,它协同整合了表型和靶特异性线索,用于从头分子生成。表型引导的生成器首先在广泛的药物诱导的转录谱上进行预训练,随后通过多目标强化学习(RL)进行微调。至关重要的是,奖励函数融合了对接亲和力和药物相似性分数,并增加了排名损失,先验似然正则化和熵最大化。多目标RL将模型转向同时有效、多样并与指定表型效应一致的化学型。广泛的实验证明了ExMoIRL在多个充分表征的靶标上优于最先进的基于表型和基于靶标的模型的性能。我们生成的分子表现出良好的药物样性质,高靶向亲和力和对癌细胞的抑制效力(IC50)。这个统一的框架展示了结合表型指导和靶点感知策略的协同潜力,为从头药物发现提供了更有效的解决方案。
摘要:The generation of high-quality candidate molecules remains a central challenge in AI-driven drug design. Current phenotype-based and target-based strategies each suffer limitations, either incurring high experimental costs or overlook system-level cellular responses. To bridge this gap, we propose ExMoIRL, a novel generative framework that synergistically integrates phenotypic and target-specific cues for de novo molecular generation. The phenotype-guided generator is first pretrained on expansive drug-induced transcriptional profiles and subsequently fine-tuned via multi-objective reinforcement learning (RL). Crucially, the reward function fuses docking affinity and drug-likeness scores, augmented with ranking loss, prior-likelihood regularization, and entropy maximization. The multi-objective RL steers the model toward chemotypes that are simultaneously potent, diverse, and aligned with the specified phenotypic effects. Extensive experiments demonstrate ExMoIRL's superior performance over state-of-the-art phenotype-based and target-based models across multiple well-characterized targets. Our generated molecules exhibit favorable drug-like properties, high target affinity, and inhibitory potency (IC50) against cancer cells. This unified framework showcases the synergistic potential of combining phenotype-guided and target-aware strategies, offering a more effective solution for de novo drug discovery.


【7】Single Answer is Not Enough: On Generating Ranked Lists with Medical Reasoning Models
标题:单一答案是不够的:基于医学推理模型的排序列表生成
链接:https://arxiv.org/abs/2509.20866

作者:Taveekitworachai, Natpatchara Pongjirapat, Krittaphas Chaisutyakorn, Piyalitt Ittichaiwong, Tossaporn Saengja, Kunat Pipatanakul
备注:51 pages, 27 figures
摘要:本文对如何使医学推理模型(MRM)为开放式问题生成答案排序列表进行了系统研究。临床决策很少依赖单一答案,而是会权衡多个选项,以降低视角狭隘的风险。然而,目前的MRM通常被训练成只产生一个答案,即使在开放式场景中也是如此。我们提出了一种替代格式(排序列表),并研究了两种实现方法:提示和微调。虽然提示是引导MRM响应的一种低成本方式,但并非所有MRM都能在不同答案格式(选择题、短文本和列表答案)之间很好地泛化。基于提示实验的发现,我们使用监督微调(SFT)和强化微调(RFT)训练并评估MRM。SFT教模型模仿标注的响应,RFT则通过奖励最大化的响应来激励探索。我们提出了针对排序列表答案格式的新奖励函数,并对RFT进行了消融研究。结果表明,虽然部分SFT模型能够泛化到某些答案格式,但用RFT训练的模型在多种格式下更为稳健。我们还在一个含多个有效答案的修改版MedQA上进行了案例研究,发现MRM即使未能选出基准所偏好的标准答案,也能够识别出有效答案。据我们所知,这是首个关于使MRM以排序列表形式生成答案的方法的系统研究。我们希望这项工作为在医学领域开发超越单一答案的替代答案格式迈出第一步。
摘要:This paper presents a systematic study on enabling medical reasoning models (MRMs) to generate ranked lists of answers for open-ended questions. Clinical decision-making rarely relies on a single answer but instead considers multiple options, reducing the risks of narrow perspectives. Yet current MRMs are typically trained to produce only one answer, even in open-ended settings. We propose an alternative format: ranked lists and investigate two approaches: prompting and fine-tuning. While prompting is a cost-effective way to steer an MRM's response, not all MRMs generalize well across different answer formats: choice, short text, and list answers. Based on our prompting findings, we train and evaluate MRMs using supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT). SFT teaches a model to imitate annotated responses, and RFT incentivizes exploration through the responses that maximize a reward. We propose new reward functions targeted at ranked-list answer formats, and conduct ablation studies for RFT. Our results show that while some SFT models generalize to certain answer formats, models trained with RFT are more robust across multiple formats. We also present a case study on a modified MedQA with multiple valid answers, finding that although MRMs might fail to select the benchmark's preferred ground truth, they can recognize valid answers. To the best of our knowledge, this is the first systematic investigation of approaches for enabling MRMs to generate answers as ranked lists. We hope this work provides a first step toward developing alternative answer formats that are beneficial beyond single answers in medical domains.
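摘要提到针对排序列表答案格式的新奖励函数。下面是一个假设性的玩具示例(倒数排名奖励,并非论文使用的具体函数),用于说明"正确答案排得越靠前,奖励越高"的思路:

```python
def ranked_list_reward(predicted, gold_answers):
    """Toy reward for a ranked list of answers: reciprocal rank of the
    first gold answer found, so correct answers ranked higher earn more.
    Illustrative stand-in only, not the paper's exact reward function."""
    for rank, ans in enumerate(predicted, start=1):
        if ans in gold_answers:
            return 1.0 / rank
    return 0.0  # no valid answer anywhere in the list
```

在RFT中,这类奖励会促使模型把它认为有效的候选答案尽量排在列表前部,而不是只输出单一答案。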


【8】Causal Time Series Generation via Diffusion Models
标题:通过扩散模型生成因果时间序列
链接:https://arxiv.org/abs/2509.20846

作者:a, Chang Xu, Yuxuan Liang, Qingsong Wen, Roger Zimmermann, Jiang Bian
摘要:时间序列生成(TSG)是一种合成真实序列的方法,取得了显著的成功。在TSG中,条件模型在给定观测协变量的情况下生成序列,然而,此类模型在不考虑未观测混杂的情况下学习观测相关性。在这项工作中,我们提出了一个因果的角度来看,有条件的TSG和引入因果时间序列生成作为一个新的TSG任务家庭,正式珍珠的因果阶梯内,超越观察一代,包括干预和反事实设置。为了实例化这些任务,我们开发了CaTSG,这是一个统一的基于扩散的框架,具有后门调整的指导,在保持观察保真度的同时,因果地将采样转向所需的干预措施和个人反事实。具体来说,我们的方法通过后门调整和外展行动预测程序推导出因果评分函数,从而为所有三个级别的TSG提供原则性支持。对合成和真实世界数据集的广泛实验表明,CaTSG实现了卓越的保真度,并且还支持现有基线无法处理的干预和反事实生成。总的来说,我们提出了因果TSG家族,并将其实例化与CaTSG,提供了一个初步的概念验证,并开辟了一个有前途的方向,更可靠的模拟下的干预和反事实生成。
摘要:Time series generation (TSG) synthesizes realistic sequences and has achieved remarkable success. Among TSG, conditional models generate sequences given observed covariates, however, such models learn observational correlations without considering unobserved confounding. In this work, we propose a causal perspective on conditional TSG and introduce causal time series generation as a new TSG task family, formalized within Pearl's causal ladder, extending beyond observational generation to include interventional and counterfactual settings. To instantiate these tasks, we develop CaTSG, a unified diffusion-based framework with backdoor-adjusted guidance that causally steers sampling toward desired interventions and individual counterfactuals while preserving observational fidelity. Specifically, our method derives causal score functions via backdoor adjustment and the abduction-action-prediction procedure, thus enabling principled support for all three levels of TSG. Extensive experiments on both synthetic and real-world datasets show that CaTSG achieves superior fidelity and also supporting interventional and counterfactual generation that existing baselines cannot handle. Overall, we propose the causal TSG family and instantiate it with CaTSG, providing an initial proof-of-concept and opening a promising direction toward more reliable simulation under interventions and counterfactual generation.


【9】T2I-Diff: fMRI Signal Generation via Time-Frequency Image Transform and Classifier-Free Denoising Diffusion Models
标题:T2I-Diff:通过时频图像变换和无分类器去噪扩散模型生成fMRI信号
链接:https://arxiv.org/abs/2509.20822

作者:ew, Junn Yong Loo, Yee-Fan Tan, Xinyu Tang, Hernando Ombao, Fuad Noman, Raphael C.-W. Phan, Chee-Ming Ting
备注:Accepted at the 28th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2025)
摘要:功能性磁共振成像(fMRI)是一种先进的神经成像方法,通过测量血氧水平依赖(BOLD)信号的动态变化,可以深入分析大脑活动。然而,fMRI数据采集的资源密集型性质限制了数据驱动的大脑分析模型所需的高保真样本的可用性。虽然现代生成模型可以合成fMRI数据,但它们往往表现不佳,因为它们忽略了复杂的非平稳性和非线性BOLD动力学。为了解决这些挑战,我们引入了T2I-Diff,一个利用BOLD信号的时频表示和无分类器去噪扩散的fMRI生成框架。具体来说,我们的框架首先通过时间相关的傅里叶变换将BOLD信号转换为加窗谱图,捕获底层的时间动态和谱演化。随后,训练无分类器扩散模型以生成类条件频谱图,然后通过逆傅立叶变换将其恢复为BOLD信号。最后,我们通过在下游基于fMRI的脑网络分类中证明改进的准确性和泛化性来验证我们的方法的有效性。
摘要:Functional Magnetic Resonance Imaging (fMRI) is an advanced neuroimaging method that enables in-depth analysis of brain activity by measuring dynamic changes in the blood oxygenation level-dependent (BOLD) signals. However, the resource-intensive nature of fMRI data acquisition limits the availability of high-fidelity samples required for data-driven brain analysis models. While modern generative models can synthesize fMRI data, they often underperform because they overlook the complex non-stationarity and nonlinear BOLD dynamics. To address these challenges, we introduce T2I-Diff, an fMRI generation framework that leverages time-frequency representation of BOLD signals and classifier-free denoising diffusion. Specifically, our framework first converts BOLD signals into windowed spectrograms via a time-dependent Fourier transform, capturing both the underlying temporal dynamics and spectral evolution. Subsequently, a classifier-free diffusion model is trained to generate class-conditioned frequency spectrograms, which are then reverted to BOLD signals via inverse Fourier transforms. Finally, we validate the efficacy of our approach by demonstrating improved accuracy and generalization in downstream fMRI-based brain network classification.
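摘要描述的"BOLD信号 → 加窗谱图 → 逆变换还原"流程可以用SciPy的STFT/ISTFT做一个最小示意(采样率、窗长和合成信号均为假设值,并非论文设置):

```python
import numpy as np
from scipy.signal import stft, istft

# BOLD-like toy signal: a slow oscillation plus noise
fs = 2.0                                    # assumed sampling rate (Hz)
t = np.arange(0, 100, 1 / fs)
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 0.05 * t) + 0.1 * rng.normal(size=t.size)

# Forward: windowed spectrogram (time-frequency representation)
f, tt, Z = stft(x, fs=fs, nperseg=32)

# Inverse: back to the time domain; with matching parameters the
# hann window satisfies COLA, so reconstruction is near-exact
_, x_rec = istft(Z, fs=fs, nperseg=32)
err = np.max(np.abs(x - x_rec[: x.size]))
```

论文中扩散模型作用于谱图 `Z` 的阶段(生成类条件谱图后再做逆变换)在此省略,这里只演示可逆的时频往返。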


【10】DAC-LoRA: Dynamic Adversarial Curriculum for Efficient and Robust Few-Shot Adaptation
标题:DAC-LoRA:用于高效且稳健的Few-Shot适应的动态对抗课程
链接:https://arxiv.org/abs/2509.20792

作者:kar
备注:Accepted at ICCV2025 Workshop on Safe and Trustworthy Multimodal AI Systems
摘要:视觉语言模型(VLM)是自动驾驶、医疗诊断和内容审核等关键应用的基础。虽然像LoRA这样的参数高效微调(PEFT)方法能够有效地适应专门的任务,但这些模型仍然容易受到对抗性攻击的影响,这些攻击可能会危及安全关键决策。CLIP是许多下游VLM的骨干,是一个高价值的目标,其漏洞可以在多模式AI生态系统中级联。我们提出了动态对抗课程DAC-LoRA,这是一个将对抗训练集成到PEFT中的新框架。我们的方法的核心原则,即逐步挑战性攻击的智能课程,是通用的,可以潜在地应用于任何迭代攻击方法。在一阶平稳条件(FOSC)和TRADES启发的损失的指导下,DAC-LoRA在对抗鲁棒性方面实现了实质性改进,而不会显着影响干净的准确性。我们的工作提出了一种有效的,轻量级的,广泛适用的方法来证明DAC-LoRA框架可以很容易地集成到一个标准的PEFT管道,以显着提高鲁棒性。
摘要:Vision-Language Models (VLMs) are foundational to critical applications like autonomous driving, medical diagnosis, and content moderation. While Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA enable their efficient adaptation to specialized tasks, these models remain vulnerable to adversarial attacks that can compromise safety-critical decisions. CLIP, the backbone for numerous downstream VLMs, is a high-value target whose vulnerabilities can cascade across the multimodal AI ecosystem. We propose Dynamic Adversarial Curriculum DAC-LoRA, a novel framework that integrates adversarial training into PEFT. The core principle of our method i.e. an intelligent curriculum of progressively challenging attack, is general and can potentially be applied to any iterative attack method. Guided by the First-Order Stationary Condition (FOSC) and a TRADES-inspired loss, DAC-LoRA achieves substantial improvements in adversarial robustness without significantly compromising clean accuracy. Our work presents an effective, lightweight, and broadly applicable method to demonstrate that the DAC-LoRA framework can be easily integrated into a standard PEFT pipeline to significantly enhance robustness.
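摘要提到的一阶平稳条件(FOSC)通常写作(据我们对Wang等人定义的理解,此处仅作示意):对 $\ell_\infty$ 半径为 $\varepsilon$ 的球,$\mathrm{FOSC}(x) = \varepsilon\|\nabla L(x)\|_1 - \langle x - x_0, \nabla L(x)\rangle$,取值越接近0表示对抗样本越接近线性化损失的最优点,可据此安排由易到难的攻击课程:

```python
import numpy as np

def fosc(x, x0, grad, eps):
    """First-Order Stationary Condition for an l_inf ball of radius eps
    around x0; grad is the loss gradient at x. Values near 0 mean x is
    close to a maximizer of the linearized loss (a "strong" attack)."""
    return eps * np.abs(grad).sum() - (x - x0) @ grad

x0 = np.zeros(3)
eps = 0.1
grad = np.array([1.0, -2.0, 0.5])
x_opt = x0 + eps * np.sign(grad)   # linearized worst case: FOSC == 0
```

课程学习时可以逐步收紧FOSC阈值,使训练早期接受较弱的攻击、后期只接受接近平稳点的强攻击。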


【11】Understanding and Improving Adversarial Robustness of Neural Probabilistic Circuits
标题:理解和提高神经概率电路的对抗鲁棒性
链接:https://arxiv.org/abs/2509.20549

作者:en, Han Zhao
备注:NeurIPS 2025 Camera Ready
摘要:神经概率回路是一类新的概念瓶颈模型,它由属性识别模型和概率推理回路组成。通过整合这两个模块的输出,NPC产生了组合和可解释的预测。虽然在下游任务上提供了增强的可解释性和高性能,但基于神经网络的属性识别模型仍然是一个黑匣子。该漏洞允许对抗性攻击通过向输入图像引入精心制作的微妙扰动来操纵属性预测,从而可能损害最终预测。本文从理论上分析了NPC的对抗鲁棒性,证明了它只依赖于属性识别模型的鲁棒性,而与概率电路的鲁棒性无关。此外,我们提出了RNPC,第一个强大的神经概率电路对抗对抗攻击的识别模块。RNPC引入了一种新的类集成推理,确保了两个模块输出的鲁棒组合。我们的理论分析表明,RNPC具有可证明的提高对抗鲁棒性相比,NPC。图像分类任务的实验结果表明,RNPC实现了优越的对抗鲁棒性相比,现有的概念瓶颈模型,同时保持高精度的良性输入。
摘要:Neural Probabilistic Circuits (NPCs), a new class of concept bottleneck models, comprise an attribute recognition model and a probabilistic circuit for reasoning. By integrating the outputs from these two modules, NPCs produce compositional and interpretable predictions. While offering enhanced interpretability and high performance on downstream tasks, the neural-network-based attribute recognition model remains a black box. This vulnerability allows adversarial attacks to manipulate attribute predictions by introducing carefully crafted subtle perturbations to input images, potentially compromising the final predictions. In this paper, we theoretically analyze the adversarial robustness of NPC and demonstrate that it only depends on the robustness of the attribute recognition model and is independent of the robustness of the probabilistic circuit. Moreover, we propose RNPC, the first robust neural probabilistic circuit against adversarial attacks on the recognition module. RNPC introduces a novel class-wise integration for inference, ensuring a robust combination of outputs from the two modules. Our theoretical analysis demonstrates that RNPC exhibits provably improved adversarial robustness compared to NPC. Empirical results on image classification tasks show that RNPC achieves superior adversarial robustness compared to existing concept bottleneck models while maintaining high accuracy on benign inputs.
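NPC将属性识别模型与概率电路的输出组合,在离散情形下其最终预测可理解为 $p(y\mid x) = \sum_a p(y\mid a)\,p(a\mid x)$。下面的玩具数值示例演示这种组合方式(数值为虚构,仅说明两模块如何衔接,并非RNPC的类级集成):

```python
import numpy as np

p_a_given_x = np.array([0.7, 0.2, 0.1])   # attribute recognizer output p(a|x)
p_y_given_a = np.array([[0.9, 0.1],       # reasoning module p(y|a), rows: a
                        [0.3, 0.7],
                        [0.5, 0.5]])

# Compose the two modules: p(y|x) = sum_a p(y|a) p(a|x)
p_y = p_a_given_x @ p_y_given_a
```

文中的对抗脆弱性正源于 `p_a_given_x` 这一黑盒环节:对输入的微小扰动改变属性分布,进而传导到最终预测。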


【12】Lightweight MobileNetV1+GRU for ECG Biometric Authentication: Federated and Adversarial Evaluation
标题:用于心电图生物识别认证的轻量级MobileNetV1+GRU:联合和对抗评估
链接:https://arxiv.org/abs/2509.20382

作者:g Rai, Sabin Kafley
备注:5 pages, 7 figures, 5 tables
摘要:ECG生物识别技术提供了一种独特、安全的身份验证方法,但其在可穿戴设备上的部署面临着实时处理、隐私和欺骗漏洞的挑战。本文提出了一种轻量级深度学习模型(MobileNetV1+GRU),用于基于ECG的身份验证,并注入20dB高斯噪声和自定义预处理。我们使用ECGID、MIT-BIH、CYBHi和PTB数据集模拟可穿戴条件和边缘部署,准确率分别为99.34%、99.31%、91.74%和98.49%,F1分数分别为0.9869、0.9923、0.9125和0.9771,精确率分别为0.9866、0.9924、0.9180和0.9845,召回率为0.9878、0.9923、0.9129和0.9756,等误差率(EER)为0.0009、0.00013、0.0091和0.0009,ROC-AUC值为0.9999、0.9999、0.9985和0.9998,而在FGSM对抗攻击下,准确率从96.82%下降到0.80%。本文重点介绍了联邦学习、对抗性测试以及对各种可穿戴生理数据集的需求,以确保安全和可扩展的生物识别技术。
摘要:ECG biometrics offer a unique, secure authentication method, yet their deployment on wearable devices faces real-time processing, privacy, and spoofing vulnerability challenges. This paper proposes a lightweight deep learning model (MobileNetV1+GRU) for ECG-based authentication, injection of 20dB Gaussian noise & custom preprocessing. We simulate wearable conditions and edge deployment using the ECGID, MIT-BIH, CYBHi, and PTB datasets, achieving accuracies of 99.34%, 99.31%, 91.74%, and 98.49%, F1-scores of 0.9869, 0.9923, 0.9125, and 0.9771, Precision of 0.9866, 0.9924, 0.9180 and 0.9845, Recall of 0.9878, 0.9923, 0.9129, and 0.9756, equal error rates (EER) of 0.0009, 0.00013, 0.0091, and 0.0009, and ROC-AUC values of 0.9999, 0.9999, 0.9985, and 0.9998, while under FGSM adversarial attacks, accuracy drops from 96.82% to as low as 0.80%. This paper highlights federated learning, adversarial testing, and the need for diverse wearable physiological datasets to ensure secure and scalable biometrics.
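摘要中报告的等误差率(EER)指误接受率(FAR)与误拒绝率(FRR)相等时的错误率。下面是一个通过阈值扫描近似计算EER的最小示意(并非论文的评测代码,分数为虚构):

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Approximate EER by sweeping thresholds and returning the average of
    FAR and FRR at the threshold where they are closest."""
    best_gap, best_eer = 1.0, 0.0
    for thr in np.unique(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= thr)   # impostors wrongly accepted
        frr = np.mean(genuine < thr)     # genuine users wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer

genuine = np.array([0.91, 0.84, 0.77, 0.66])    # same-person match scores
impostor = np.array([0.30, 0.42, 0.18, 0.25])   # different-person scores
eer = equal_error_rate(genuine, impostor)       # fully separable scores
```

分数可完全分开时EER为0;分布重叠越多,EER越高。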


【13】Are Modern Speech Enhancement Systems Vulnerable to Adversarial Attacks?
标题:现代语音增强系统容易受到对抗攻击吗?
链接:https://arxiv.org/abs/2509.21087

作者: Makarov, Lea Schönherr, Timo Gerkmann
备注:Copyright 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
摘要:用于语音增强的机器学习方法正变得越来越有表现力,能够对输入信号进行更强大的修改。在本文中,我们证明了这种表现力引入了一个漏洞:先进的语音增强模型可能容易受到对抗性攻击。具体来说,我们表明,敌对的噪音,精心制作和心理声学掩盖的原始输入,可以注入这样的增强语音输出传达一个完全不同的语义含义。我们通过实验验证,当代预测语音增强模型确实可以以这种方式进行操作。此外,我们强调,具有随机采样器的扩散模型通过设计表现出对这种对抗性攻击的固有鲁棒性。
摘要:Machine learning approaches for speech enhancement are becoming increasingly expressive, enabling ever more powerful modifications of input signals. In this paper, we demonstrate that this expressiveness introduces a vulnerability: advanced speech enhancement models can be susceptible to adversarial attacks. Specifically, we show that adversarial noise, carefully crafted and psychoacoustically masked by the original input, can be injected such that the enhanced speech output conveys an entirely different semantic meaning. We experimentally verify that contemporary predictive speech enhancement models can indeed be manipulated in this way. Furthermore, we highlight that diffusion models with stochastic samplers exhibit inherent robustness to such adversarial attacks by design.


半/弱/无/有监督|不确定性|主动学习(5篇)

【1】LAVA: Explainability for Unsupervised Latent Embeddings
标题:LAVA:无监督潜在嵌入的解释性
链接:https://arxiv.org/abs/2509.21149

作者:sec, Joana P. Gonçalves
备注:28 pages, including references and appendix
摘要:无监督的黑箱模型可以成为科学发现的驱动力,但仍然难以解释。至关重要的是,发现取决于对模型输出的理解,这通常是一个多维的潜在嵌入,而不是一个定义明确的目标。虽然监督学习的可解释性通常旨在揭示如何使用输入特征来预测目标,但其无监督对应物应将输入特征与学习的潜在空间的结构相关联。对无监督学习的监督模型可解释性的调整提供了单样本或全样本的摘要解释。然而,如果没有自动化的策略,将相似的样本相互关联起来,并以它们潜在的接近度为指导,那么解释要么太细,要么太简化,没有意义。这对于不产生映射函数的流形学习方法尤其重要,只留给我们它们嵌入的相对空间组织。我们介绍了局部感知变量关联(LAVA),一个事后模型无关的方法,旨在解释本地嵌入组织通过其与输入功能的关系。为了实现这一点,LAVA将潜在空间表示为根据原始特征之间的相关性描述的一系列位置(邻域),然后揭示整个潜在空间中重复出现的相关性模式。基于MNIST和单细胞肾脏数据集的UMAP嵌入,我们表明LAVA捕获相关特征关联,在潜在空间的看似遥远的区域之间共享视觉和生物相关的局部模式。
摘要:Unsupervised black-box models can be drivers of scientific discovery, but remain difficult to interpret. Crucially, discovery hinges on understanding the model output, which is often a multi-dimensional latent embedding rather than a well-defined target. While explainability for supervised learning usually seeks to uncover how input features are used to predict a target, its unsupervised counterpart should relate input features to the structure of the learned latent space. Adaptations of supervised model explainability for unsupervised learning provide either single-sample or dataset-wide summary explanations. However, without automated strategies of relating similar samples to one another guided by their latent proximity, explanations remain either too fine-grained or too reductive to be meaningful. This is especially relevant for manifold learning methods that produce no mapping function, leaving us only with the relative spatial organization of their embeddings. We introduce Locality-Aware Variable Associations (LAVA), a post-hoc model-agnostic method designed to explain local embedding organization through its relationship with the input features. To achieve this, LAVA represents the latent space as a series of localities (neighborhoods) described in terms of correlations between the original features, and then reveals reoccurring patterns of correlations across the entire latent space. Based on UMAP embeddings of MNIST and a single-cell kidney dataset, we show that LAVA captures relevant feature associations, with visually and biologically relevant local patterns shared among seemingly distant regions of the latent spaces.
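按我们对摘要的理解,LAVA的"局部性"思想可以这样示意:对嵌入空间中的每个点取其k近邻,在该邻域内计算原始特征之间的相关矩阵(以下数据与二维映射均为虚构,仅作演示):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))                       # original input features
Z = X[:, :2] + 0.05 * rng.normal(size=(200, 2))     # stand-in 2-D embedding

def local_feature_correlations(X, Z, idx, k=20):
    """Correlation matrix of the ORIGINAL features within the latent-space
    neighborhood (k nearest embedding points) of sample `idx`."""
    d = np.linalg.norm(Z - Z[idx], axis=1)
    nbrs = np.argsort(d)[:k]
    return np.corrcoef(X[nbrs], rowvar=False)

C = local_feature_correlations(X, Z, idx=0)         # one locality's pattern
```

LAVA随后在所有邻域的此类相关模式中寻找反复出现的结构;此处仅演示单个邻域的计算。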


【2】GeoRef: Referring Expressions in Geometry via Task Formulation, Synthetic Supervision, and Reinforced MLLM-based Solutions
标题:GeoRef:通过任务制定、合成监督和强化的基于MLLM的解决方案理解几何中的指代表达
链接:https://arxiv.org/abs/2509.21050

作者: Wenqiang Yv, Xuzheng Yang, Shichang Wang, Junzhuo Liu, Peng Wang, Guoqing Wang, Yang Yang, Heng Tao Shen
摘要:人工智能驱动的几何问题解决是一项复杂的视觉语言任务,需要准确的图表解释,数学推理和强大的跨模态基础。这项任务的一个基础但未充分开发的功能是基于自然语言查询识别和解释几何元素的能力。为了解决这个问题,我们引入了几何问题的引用表达式理解(REC)任务,该任务评估模型是否可以响应文本提示在图中定位点,形状和空间关系。我们提出了GeoRef,这是一个根据现有几何问题语料库构建的基准数据集,具有多样化、高质量的注释和查询。由于缺乏用于此任务的注释数据,我们使用结构化几何形式语言生成大规模合成训练数据集,从而实现几何概念的广泛覆盖并促进模型自适应。我们探讨了两种微调方法:监督微调(SFT)和组相对策略优化(GRPO)。我们的研究结果表明,GRPO通过更好地将模型行为与特定任务的奖励相结合,显着优于SFT。此外,我们提出了一种验证和再生机制,可以检测不正确的预测并使用上下文推理历史重新推断答案,进一步提高准确性。值得注意的是,即使是最先进的多模态大型语言模型(MLLM)也难以完成这一任务,这强调了明确评估和加强几何基础作为鲁棒几何问题解决的先决条件的必要性。此外,在GeoRef上训练的模型在下游几何推理任务上表现出可衡量的改进,突出了REC作为多模态数学理解基础的更广泛价值。
摘要:AI-driven geometric problem solving is a complex vision-language task that requires accurate diagram interpretation, mathematical reasoning, and robust cross-modal grounding. A foundational yet underexplored capability for this task is the ability to identify and interpret geometric elements based on natural language queries. To address this, we introduce the task of Referring Expression Comprehension (REC) for geometric problems, which evaluates whether models can localize points, shapes, and spatial relations in diagrams in response to textual prompts. We present GeoRef, a benchmark dataset constructed from existing geometric problem corpora, featuring diverse, high-quality annotations and queries. Due to the lack of annotated data for this task, we generate a large-scale synthetic training dataset using a structured geometric formal language, enabling broad coverage of geometric concepts and facilitating model adaptation. We explore two fine-tuning approaches: Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO). Our results show that GRPO significantly outperforms SFT by better aligning model behavior with task-specific rewards. Furthermore, we propose a verify-and-regenerate mechanism that detects incorrect predictions and re-infers answers using contextual reasoning history, further boosting accuracy. Notably, even state-of-the-art Multimodal Large Language Models (MLLMs) struggle with this task, underscoring the necessity of explicitly evaluating and strengthening geometric grounding as a prerequisite for robust geometric problem solving. Moreover, models trained on GeoRef demonstrate measurable improvements on downstream geometric reasoning tasks, highlighting the broader value of REC as a foundation for multimodal mathematical understanding.


【3】An Adaptor for Triggering Semi-Supervised Learning to Out-of-Box Serve Deep Image Clustering
标题:用于触发半监督学习以开箱即用地服务深度图像聚类的适配器
链接:https://arxiv.org/abs/2509.20976

作者: Lei Qi, Yinghuan Shi, Yang Gao
备注:Accepted by IEEE Transactions on Image Processing (TIP)
摘要:最近,一些工作将SSL技术集成到深度聚类框架中以提升图像聚类性能。然而,它们都需要预训练、聚类学习或已训练好的聚类模型作为前提,限制了SSL学习器在图像聚类任务中灵活、开箱即用的应用。本文介绍了ASD,一种无需任何前提即可冷启动SSL学习器用于深度图像聚类的适配器。具体来说,我们首先从所有未标记数据中随机采样伪标记数据,并设置一个实例级分类器,以语义对齐的实例级标签进行学习。借助实例级分类能力,我们跟踪未标记数据上预测的类别转换,以提取实例级类别的高层相似性,并据此为伪标记数据分配聚类级标签。最后,我们使用带有聚类级标签的伪标记数据来触发在未标记数据上训练的通用SSL学习器进行图像聚类。我们在多个基准上展示了ASD相对于最新深度图像聚类方法的优越性能,并且与使用真实标签的SSL方法相比精度差距非常小,例如在CIFAR-10上仅为1.33%。此外,ASD还可以进一步提升现有嵌入SSL的深度图像聚类方法的性能。
摘要:Recently, some works integrate SSL techniques into deep clustering frameworks to enhance image clustering performance. However, they all need pretraining, clustering learning, or a trained clustering model as prerequisites, limiting the flexible and out-of-box application of SSL learners in the image clustering task. This work introduces ASD, an adaptor that enables the cold-start of SSL learners for deep image clustering without any prerequisites. Specifically, we first randomly sample pseudo-labeled data from all unlabeled data, and set an instance-level classifier to learn them with semantically aligned instance-level labels. With the ability of instance-level classification, we track the class transitions of predictions on unlabeled data to extract high-level similarities of instance-level classes, which can be utilized to assign cluster-level labels to pseudo-labeled data. Finally, we use the pseudo-labeled data with assigned cluster-level labels to trigger a general SSL learner trained on the unlabeled data for image clustering. We show the superior performance of ASD across various benchmarks against the latest deep image clustering approaches and very slight accuracy gaps compared to SSL methods using ground-truth, e.g., only 1.33% on CIFAR-10. Moreover, ASD can also further boost the performance of existing SSL-embedded deep image clustering methods.
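摘要中"跟踪类别转换以提取实例级类别相似性"的思路可以这样示意:统计相邻训练轮次之间预测类别的翻转次数,作为类别之间的相似度(预测序列为虚构的玩具数据):

```python
import numpy as np

# Rows: instance-level class predictions for 5 samples at successive epochs
preds = np.array([[0, 0, 1, 2, 2],
                  [1, 0, 1, 2, 3],
                  [1, 1, 0, 3, 3]])

n_cls = preds.max() + 1
sim = np.zeros((n_cls, n_cls))
for epoch in range(len(preds) - 1):
    for a, b in zip(preds[epoch], preds[epoch + 1]):
        if a != b:                 # a prediction flipped between classes
            sim[a, b] += 1
            sim[b, a] += 1         # keep the similarity matrix symmetric
```

频繁相互转换的实例级类别更可能属于同一簇,聚类级标签即可据此分配;这里只演示相似度统计这一步。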


【4】CoSupFormer: A Contrastive Supervised learning approach for EEG signal Classification
标题:CoSupFormer:一种用于脑电信号分类的对比监督学习方法
链接:https://arxiv.org/abs/2509.20489

作者:oum, C. Habermacher, J. Volle, S. Grudinin
备注:20 pages (14 pages Main text and 6 pages Supplementary Material)
摘要:脑电信号(EEG)包含丰富的多尺度信息,对于了解大脑状态至关重要,在诊断和推进药物开发领域具有潜在应用。然而,从原始EEG信号中提取有意义的特征,同时处理噪声和通道变异性仍然是一个重大挑战。这项工作提出了一个新的端到端深度学习框架,通过几个关键创新来解决这些问题。首先,我们设计了一个编码器,能够明确捕获多尺度频率振荡,覆盖不同EEG相关任务所需的广泛特征。其次,为了对复杂的依赖关系进行建模并处理EEG的高时间分辨率,我们引入了一个基于注意力的编码器,该编码器同时学习EEG通道之间的交互以及单个通道局部补丁(patches)内的交互。我们在注意力编码器上集成了一个专用的门控网络,以动态过滤掉噪声和非信息通道,提高EEG数据的可靠性。整个编码过程由一个新的损失函数指导,该函数利用了监督和对比学习,显著提高了模型的泛化能力。我们在多种应用中验证了我们的方法,从多种中枢神经系统(CNS)疾病治疗的效果分类到帕金森病和阿尔茨海默病的诊断。我们的研究结果表明,所提出的学习范式可以从不同物种的原始EEG信号中提取具有生物学意义的模式,自主选择高质量的通道,并通过创新的架构和损失设计实现鲁棒的泛化。
摘要:Electroencephalography signals (EEGs) contain rich multi-scale information crucial for understanding brain states, with potential applications in diagnosing and advancing the drug development landscape. However, extracting meaningful features from raw EEG signals while handling noise and channel variability remains a major challenge. This work proposes a novel end-to-end deep-learning framework that addresses these issues through several key innovations. First, we designed an encoder capable of explicitly capturing multi-scale frequency oscillations covering a wide range of features for different EEG-related tasks. Secondly, to model complex dependencies and handle the high temporal resolution of EEGs, we introduced an attention-based encoder that simultaneously learns interactions across EEG channels and within localized patches of individual channels. We integrated a dedicated gating network on top of the attention encoder to dynamically filter out noisy and non-informative channels, enhancing the reliability of EEG data. The entire encoding process is guided by a novel loss function, which leverages supervised and contrastive learning, significantly improving model generalization. We validated our approach in multiple applications, ranging from the classification of effects across multiple Central Nervous System (CNS) disorders treatments to the diagnosis of Parkinson's and Alzheimer's disease. Our results demonstrate that the proposed learning paradigm can extract biologically meaningful patterns from raw EEG signals across different species, autonomously select high-quality channels, and achieve robust generalization through innovative architectural and loss design.
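摘要中的通道门控可以用一个极简示意来说明:为每个EEG通道计算一个位于[0,1]的门控值,并据此加权汇聚各通道特征(论文中门控是一个学习得到的网络,此处的数值为随机虚构):

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(8, 16))   # 8 channels x 16-dim channel features
gate_logits = rng.normal(size=8)      # stand-in for a learned gate network

gates = 1.0 / (1.0 + np.exp(-gate_logits))            # sigmoid gate per channel
pooled = (gates[:, None] * features).sum(0) / gates.sum()   # gated pooling
```

门控值接近0的通道对汇聚结果几乎无贡献,从而起到动态过滤噪声通道的作用。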


【5】Unsupervised Domain Adaptation with an Unobservable Source Subpopulation
标题:具有不可观察源亚群的无监督领域适应
链接:https://arxiv.org/abs/2509.20587

作者:, Jun Jin, Haotian Zhang, Qinglong Tian, Yanyuan Ma, Yixuan Li, Jiwei Zhao
摘要:我们研究了一个无监督域自适应问题,其中源域由二进制标签$Y$和二进制背景(或环境)$A$定义的子种群组成。我们专注于一个具有挑战性的设置,其中一个这样的亚群在源域是不可观察的。天真地忽略这个未观察到的组可能会导致有偏差的估计和预测性能下降。尽管这种结构性的缺失,我们表明,在目标域的预测仍然可以恢复。具体来说,我们严格推导出目标域的背景特定和整体预测模型。在实际应用中,我们提出了分布匹配的方法来估计子总体的比例。我们提供理论保证我们的估计的渐近行为,并建立一个上界的预测误差。在合成数据集和真实数据集上的实验表明,我们的方法优于不考虑这种不可观察的源亚群的天真基准。
摘要:We study an unsupervised domain adaptation problem where the source domain consists of subpopulations defined by the binary label $Y$ and a binary background (or environment) $A$. We focus on a challenging setting in which one such subpopulation in the source domain is unobservable. Naively ignoring this unobserved group can result in biased estimates and degraded predictive performance. Despite this structured missingness, we show that the prediction in the target domain can still be recovered. Specifically, we rigorously derive both background-specific and overall prediction models for the target domain. For practical implementation, we propose the distribution matching method to estimate the subpopulation proportions. We provide theoretical guarantees for the asymptotic behavior of our estimator, and establish an upper bound on the prediction error. Experiments on both synthetic and real-world datasets show that our method outperforms the naive benchmark that does not account for this unobservable source subpopulation.
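The distribution-matching step can be illustrated numerically: approximate the target feature distribution as a mixture of the source subpopulation distributions and solve for the mixture weights by least squares on histograms. This is our own illustrative stand-in, not the paper's estimator; the function name and histogram choice are assumptions:

```python
import numpy as np

def estimate_proportions(source_groups, target, bins=20):
    """Histogram-based distribution matching (illustrative): find nonnegative
    mixture weights w summing to 1 so that sum_k w_k * hist(group_k)
    approximates the target histogram."""
    lo = min(min(g.min() for g in source_groups), target.min())
    hi = max(max(g.max() for g in source_groups), target.max())
    edges = np.linspace(lo, hi, bins + 1)
    H = np.stack([np.histogram(g, bins=edges)[0] / len(g)
                  for g in source_groups], axis=1)       # (bins, K)
    t = np.histogram(target, bins=edges)[0] / len(target)
    w, _, _, _ = np.linalg.lstsq(H, t, rcond=None)
    w = np.clip(w, 0, None)
    return w / w.sum()

rng = np.random.default_rng(0)
g0 = rng.normal(0.0, 1.0, 5000)     # source subpopulation with A = 0
g1 = rng.normal(4.0, 1.0, 5000)     # source subpopulation with A = 1
target = np.concatenate([rng.normal(0.0, 1.0, 3000),
                         rng.normal(4.0, 1.0, 7000)])
w = estimate_proportions([g0, g1], target)
print(w)                             # close to the true mixture [0.3, 0.7]
```

On well-separated subpopulations the recovered weights track the true target mixture closely; the paper additionally handles the unobservable source subpopulation, which this sketch does not model.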


迁移|Zero/Few/One-Shot|自适应(6篇)

【1】AbideGym: Turning Static RL Worlds into Adaptive Challenges
标题:AbideGym:将静态RL世界转变为适应性挑战
链接:https://arxiv.org/abs/2509.21234

作者:, Zac Liu, Aaron Childress
摘要:用强化学习训练的代理通常会学到脆弱的策略,当动态变化时会失败,这是一个被静态基准放大的问题。AbideGym是一个动态的MiniGrid包装器,它引入了代理感知的扰动和可扩展的复杂性,以强制实现回合内适应。通过暴露静态策略的弱点和促进弹性,AbideGym提供了一个模块化、可复现的评估框架,用于推进课程学习、持续学习和鲁棒泛化研究。
摘要:Agents trained with reinforcement learning often develop brittle policies that fail when dynamics shift, a problem amplified by static benchmarks. AbideGym, a dynamic MiniGrid wrapper, introduces agent-aware perturbations and scalable complexity to enforce intra-episode adaptation. By exposing weaknesses in static policies and promoting resilience, AbideGym provides a modular, reproducible evaluation framework for advancing research in curriculum learning, continual learning, and robust generalization.
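The wrapper pattern can be illustrated without any MiniGrid dependency: a toy environment plus a wrapper that perturbs the task mid-episode, so a policy memorized for the static version must adapt. This is a hypothetical sketch of the pattern (class names and API are our own, not AbideGym's actual interface):

```python
import random

class LineEnv:
    """Toy 1-D corridor: step right (+1) until `goal` is reached."""
    def __init__(self, goal=10):
        self.goal, self.pos = goal, 0
    def reset(self):
        self.pos = 0
        return self.pos
    def step(self, action):                 # action in {-1, +1}
        self.pos += action
        done = self.pos >= self.goal
        return self.pos, (1.0 if done else 0.0), done

class PerturbingWrapper:
    """AbideGym-style pattern (hypothetical API): with probability p per
    step, move the goal mid-episode, so a policy tuned to the static task
    must adapt within the episode."""
    def __init__(self, env, p=0.2, seed=0):
        self.env, self.p = env, p
        self.rng = random.Random(seed)
    def reset(self):
        return self.env.reset()
    def step(self, action):
        if self.rng.random() < self.p:
            self.env.goal = max(1, self.env.goal + self.rng.choice([-1, 1]))
        return self.env.step(action)

env = PerturbingWrapper(LineEnv(goal=5), p=0.3)
obs, done, steps = env.reset(), False, 0
while not done and steps < 100:
    obs, reward, done = env.step(+1)        # naive "always right" policy
    steps += 1
print(done, steps)
```

The naive policy still reaches the drifting goal in this trivial corridor, but the same wrapping idea applied to a gridworld makes memorized trajectories fail, which is the evaluation pressure AbideGym is built around.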


【2】Towards Foundation Models for Zero-Shot Time Series Anomaly Detection: Leveraging Synthetic Data and Relative Context Discrepancy
标题:走向Zero-Shot时间序列异常检测的基础模型:利用合成数据和相对上下文差异
链接:https://arxiv.org/abs/2509.21190

作者: Hao Duong Le, Jinbo Li, Wenjun He, Meng Wang, Chenghao Liu, Chen Zhang
摘要:时间序列异常检测(TSAD)是一项关键任务,但开发能以zero-shot方式泛化到未见数据的模型仍然是一项重大挑战。目前TSAD的主流基础模型主要依赖基于重建的目标,这存在根本性的目标不匹配:它们难以识别细微的异常,同时经常误解复杂的正常模式,导致高比例的假阴性和假阳性。为了克服这些局限性,我们引入了新的TSAD基础模型TimeRCD,它建立在一种新的预训练范式之上:相对上下文差异(RCD)。TimeRCD不是学习重建输入,而是通过检测相邻时间窗口之间的显著差异来显式地学习识别异常。这种关系式方法以标准Transformer架构实现,使模型能够捕捉指示异常的上下文变化,而这些变化往往被基于重建的方法遗漏。为了支持这一范式,我们构建了一个带有词元级异常标签的大规模、多样化合成语料库,为有效的预训练提供丰富的监督信号。大量实验表明,TimeRCD在不同数据集的zero-shot TSAD中显著优于现有的通用基础模型和异常检测专用基础模型。我们的结果验证了RCD范式的优越性,并为构建鲁棒且可泛化的时间序列异常检测基础模型开辟了一条新的有效路径。
摘要 :Time series anomaly detection (TSAD) is a critical task, but developing models that generalize to unseen data in a zero-shot manner remains a major challenge. Prevailing foundation models for TSAD predominantly rely on reconstruction-based objectives, which suffer from a fundamental objective mismatch: they struggle to identify subtle anomalies while often misinterpreting complex normal patterns, leading to high rates of false negatives and positives. To overcome these limitations, we introduce \texttt{TimeRCD}, a novel foundation model for TSAD built upon a new pre-training paradigm: Relative Context Discrepancy (RCD). Instead of learning to reconstruct inputs, \texttt{TimeRCD} is explicitly trained to identify anomalies by detecting significant discrepancies between adjacent time windows. This relational approach, implemented with a standard Transformer architecture, enables the model to capture contextual shifts indicative of anomalies that reconstruction-based methods often miss. To facilitate this paradigm, we develop a large-scale, diverse synthetic corpus with token-level anomaly labels, providing the rich supervisory signal necessary for effective pre-training. Extensive experiments demonstrate that \texttt{TimeRCD} significantly outperforms existing general-purpose and anomaly-specific foundation models in zero-shot TSAD across diverse datasets. Our results validate the superiority of the RCD paradigm and establish a new, effective path toward building robust and generalizable foundation models for time series anomaly detection.
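The relative-context-discrepancy idea can be sketched with hand-crafted window statistics in place of the learned discrepancy: score each window boundary by how much the summary statistics of adjacent windows disagree. This is our own simplified stand-in, not TimeRCD's trained model:

```python
import numpy as np

def rcd_scores(x, window=16):
    """Score each boundary between adjacent windows by the standardized
    distance between their summary statistics (mean, std). A hand-crafted
    stand-in for the learned discrepancy in TimeRCD."""
    n = len(x) // window
    w = x[: n * window].reshape(n, window)
    stats = np.stack([w.mean(axis=1), w.std(axis=1)], axis=1)   # (n, 2)
    d = np.linalg.norm(np.diff(stats, axis=0), axis=1)          # (n-1,)
    return (d - d.mean()) / (d.std() + 1e-8)

rng = np.random.default_rng(1)
x = rng.normal(0, 1, 512)
x[300:316] += 6.0                   # inject a level-shift anomaly
scores = rcd_scores(x, window=16)
print(int(np.argmax(scores)))       # a boundary next to the shifted region
```

The highest score lands at a boundary adjacent to the injected shift, illustrating why comparing neighbouring contexts can flag anomalies that a reconstruction objective smooths over.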


【3】EvoMail: Self-Evolving Cognitive Agents for Adaptive Spam and Phishing Email Defense
标题:EvoMail:用于自适应垃圾邮件和网络钓鱼电子邮件防御的自我进化认知代理
链接:https://arxiv.org/abs/2509.21129

作者:, De-Tian Chu, Lin-Yuan Bai, Wei Kang, Hai-Tao Zhang, Bo Li, Zhi-Mo Han, Jing Ge, Hai-Feng Lin
摘要:现代电子邮件垃圾邮件和网络钓鱼攻击已经远远超出了关键字黑名单或简单启发式规则的范畴。攻击者现在会精心设计多模态攻击活动,将自然语言文本与混淆的URL、伪造的标头和恶意附件相结合,并在几天内调整其策略以绕过过滤器。依赖静态规则或单模态模型的传统垃圾邮件检测系统难以整合异构信号或持续适应,导致性能迅速下降。   我们提出EvoMail,一个自我进化的认知代理框架,用于鲁棒检测垃圾邮件和网络钓鱼。EvoMail首先构建了一个统一的异构电子邮件图,它融合了文本内容、元数据(标头、发件人、域名)和嵌入式资源(URL、附件)。由大型语言模型(LLM)增强的认知图神经网络在这些来源之间执行上下文感知推理,以识别协同的垃圾邮件活动。最关键的是,EvoMail参与了一个对抗性自我进化循环:“红队”代理生成新的规避策略(例如字符混淆或AI生成的钓鱼文本),而“蓝队”检测器从失败中学习,将经验压缩到记忆模块中,并在未来的推理中复用。   对真实世界数据集(Enron-Spam,Ling-Spam,SpamAssassin和TREC)和合成对抗变体的广泛实验表明,EvoMail在检测准确性、对不断演化的垃圾邮件策略的适应性以及推理轨迹的可解释性方面始终优于最先进的基线。这些结果凸显了EvoMail作为针对下一代垃圾邮件和网络钓鱼威胁的弹性且可解释的防御框架的潜力。
摘要:Modern email spam and phishing attacks have evolved far beyond keyword blacklists or simple heuristics. Adversaries now craft multi-modal campaigns that combine natural-language text with obfuscated URLs, forged headers, and malicious attachments, adapting their strategies within days to bypass filters. Traditional spam detection systems, which rely on static rules or single-modality models, struggle to integrate heterogeneous signals or to continuously adapt, leading to rapid performance degradation.   We propose EvoMail, a self-evolving cognitive agent framework for robust detection of spam and phishing. EvoMail first constructs a unified heterogeneous email graph that fuses textual content, metadata (headers, senders, domains), and embedded resources (URLs, attachments). A Cognitive Graph Neural Network enhanced by a Large Language Model (LLM) performs context-aware reasoning across these sources to identify coordinated spam campaigns. Most critically, EvoMail engages in an adversarial self-evolution loop: a ''red-team'' agent generates novel evasion tactics -- such as character obfuscation or AI-generated phishing text -- while the ''blue-team'' detector learns from failures, compresses experiences into a memory module, and reuses them for future reasoning.   Extensive experiments on real-world datasets (Enron-Spam, Ling-Spam, SpamAssassin, and TREC) and synthetic adversarial variants demonstrate that EvoMail consistently outperforms state-of-the-art baselines in detection accuracy, adaptability to evolving spam tactics, and interpretability of reasoning traces. These results highlight EvoMail's potential as a resilient and explainable defense framework against next-generation spam and phishing threats.


【4】SPREAD: Sampling-based Pareto front Refinement via Efficient Adaptive Diffusion
标题:SPREAD:通过高效自适应扩散进行基于采样的帕累托前沿细化
链接:https://arxiv.org/abs/2509.21058

作者:lomon Hotegni, Sebastian Peitz
摘要:开发有效的多目标优化方法来计算冲突目标之间的帕累托最优折衷集仍然是一个关键的挑战,特别是对于大规模和昂贵的问题。为了弥合这一差距,我们引入了SPREAD,一个基于去噪扩散概率模型(DDPMs)的生成框架。SPREAD首先在从决策空间采样的点上学习条件扩散过程,然后在每个反向扩散步骤中,通过一个采样方案来细化候选项,该方案使用自适应多梯度下降更新来快速收敛,同时使用基于高斯RBF的排斥项来实现多样性。在多目标优化基准(包括离线和基于贝叶斯代理模型的设置)上的实证结果表明,SPREAD在效率、可扩展性和帕累托前沿覆盖方面匹配或超越领先的基线。
摘要:Developing efficient multi-objective optimization methods to compute the Pareto set of optimal compromises between conflicting objectives remains a key challenge, especially for large-scale and expensive problems. To bridge this gap, we introduce SPREAD, a generative framework based on Denoising Diffusion Probabilistic Models (DDPMs). SPREAD first learns a conditional diffusion process over points sampled from the decision space and then, at each reverse diffusion step, refines candidates via a sampling scheme that uses an adaptive multiple gradient descent-inspired update for fast convergence alongside a Gaussian RBF-based repulsion term for diversity. Empirical results on multi-objective optimization benchmarks, including offline and Bayesian surrogate-based settings, show that SPREAD matches or exceeds leading baselines in efficiency, scalability, and Pareto front coverage.
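The Gaussian-RBF repulsion ingredient is easy to isolate: it is the negative gradient of a pairwise RBF energy, which pushes nearby candidates apart so the refined set stays diverse. A minimal sketch under our own choice of bandwidth and step size (not SPREAD's full refinement step, which also includes the multiple-gradient-descent update):

```python
import numpy as np

def rbf_repulsion(X, h=0.5):
    """Negative gradient of a Gaussian-RBF pairwise energy: pushes nearby
    candidate solutions apart to keep the refined set diverse."""
    diff = X[:, None, :] - X[None, :, :]          # x_i - x_j, shape (n, n, d)
    sq = (diff ** 2).sum(-1)
    k = np.exp(-sq / (2 * h ** 2))
    np.fill_diagonal(k, 0.0)
    return (k[:, :, None] * diff).sum(axis=1) / h ** 2

X = np.array([[0.0, 0.0], [0.05, 0.0], [2.0, 2.0]])
X_new = X + 0.1 * rbf_repulsion(X)                # one repulsion step
d_before = np.linalg.norm(X[0] - X[1])
d_after = np.linalg.norm(X_new[0] - X_new[1])
print(d_before, d_after)                          # the close pair spreads out
```

Because the kernel decays quickly, distant candidates (like the third point here) are essentially unaffected, so repulsion acts locally where candidates crowd together.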


【5】LiLAW: Lightweight Learnable Adaptive Weighting to Meta-Learn Sample Difficulty and Improve Noisy Training
标题:LiLAW:用于元学习样本难度并改善含噪训练的轻量级可学习自适应加权
链接:https://arxiv.org/abs/2509.20786

作者:Moturu, Anna Goldenberg, Babak Taati
摘要:在存在噪声标签和数据异质性的情况下训练深度神经网络是一个重大挑战。我们引入了轻量级可学习自适应加权(LiLAW),这是一种新方法,它根据每个训练样本不断变化的难度水平(分为容易、中等或困难)动态调整其损失权重。仅使用三个可学习的参数,LiLAW通过在每个训练小批量之后使用验证集上的单个小批量梯度下降步骤来更新这些权重,从而在整个训练过程中自适应地优先考虑信息量大的样本,而不需要过多的超参数调整或干净的验证集。在多个常规和医学成像数据集、噪声水平和类型、损失函数以及有和没有预训练的架构上进行的广泛实验表明,LiLAW即使在高噪声环境中也能始终如一地提高性能。它是有效的,不严重依赖数据增强或高级正则化,突出了其实用性。它提供了一种计算效率高的解决方案,可以在任何神经网络训练设置中提高模型的泛化能力和鲁棒性。
摘要:Training deep neural networks in the presence of noisy labels and data heterogeneity is a major challenge. We introduce Lightweight Learnable Adaptive Weighting (LiLAW), a novel method that dynamically adjusts the loss weight of each training sample based on its evolving difficulty level, categorized as easy, moderate, or hard. Using only three learnable parameters, LiLAW adaptively prioritizes informative samples throughout training by updating these weights using a single mini-batch gradient descent step on the validation set after each training mini-batch, without requiring excessive hyperparameter tuning or a clean validation set. Extensive experiments across multiple general and medical imaging datasets, noise levels and types, loss functions, and architectures with and without pretraining demonstrate that LiLAW consistently enhances performance, even in high-noise environments. It is effective without heavy reliance on data augmentation or advanced regularization, highlighting its practicality. It offers a computationally efficient solution to boost model generalization and robustness in any neural network training setup.
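A first-order sketch of the meta-weighting idea: three weights (easy/moderate/hard, bucketed by per-sample loss) scale the training gradient, and after each mini-batch they are nudged by the gradient of a validation loss through the SGD step. This toy uses a linear model and a first-order meta-gradient approximation of our own; the paper applies the scheme to deep networks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear regression with 20% noisy-label ("hard") samples.
X = rng.normal(size=(200, 5))
theta_true = rng.normal(size=5)
y = X @ theta_true
y[:40] += rng.normal(0, 5.0, 40)          # corrupted labels
Xv = rng.normal(size=(50, 5))
yv = Xv @ theta_true                       # clean validation set

theta = np.zeros(5)
w = np.ones(3)                             # easy / moderate / hard weights
lr, meta_lr = 0.01, 0.5
init_val = np.mean((Xv @ theta - yv) ** 2)

def bucket(losses):
    q1, q2 = np.quantile(losses, [0.33, 0.66])
    return np.digitize(losses, [q1, q2])   # 0 easy, 1 moderate, 2 hard

for _ in range(300):
    idx = rng.choice(len(X), 32)
    xb, yb = X[idx], y[idx]
    res = xb @ theta - yb
    b = bucket(res ** 2)
    # per-bucket mean-squared-loss gradients g_k
    g = np.stack([(2 * res[b == k] @ xb[b == k]) / max((b == k).sum(), 1)
                  for k in range(3)])
    theta = theta - lr * (w @ g)           # weighted training step
    # first-order meta-gradient through the SGD step: dL_val/dw_k = -lr * g_k . g_val
    g_val = 2 * (Xv @ theta - yv) @ Xv / len(Xv)
    w = np.clip(w - meta_lr * (-lr * g @ g_val), 0.1, 3.0)

final_val = np.mean((Xv @ theta - yv) ** 2)
print(w, final_val < init_val)
```

The chain rule through one SGD step gives the `-lr * g_k . g_val` expression: a bucket whose gradient conflicts with the validation gradient gets down-weighted, which in this noisy toy tends to suppress the corrupted "hard" bucket.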


【6】Adaptive Approach to Enhance Machine Learning Scheduling Algorithms During Runtime Using Reinforcement Learning in Metascheduling Applications
标题:在元调度应用中使用强化学习在运行时增强机器学习调度算法的自适应方法
链接:https://arxiv.org/abs/2509.20520

作者:haer, Ala Khalifeh, Roman Obermaisser
备注:18 pages, 21 figures
摘要:时间触发架构中的元调度对于适应动态和不可预测的环境,确保任务执行的可靠性和效率至关重要。然而,传统方法在离线训练人工智能(AI)调度推理时面临着重大挑战,特别是由于构建一个全面的多调度图(MSG)所涉及的复杂性,该图考虑了所有可能的场景。生成捕获巨大概率空间的MSG的过程是资源密集型的,并且通常是不可行的,特别是在考虑诸如硬件故障、松弛变化或模式改变之类的上下文事件时。为了解决这些挑战,我们提出了一个集成在元调度器中的自适应在线学习单元,以提高实时性能。开发此单元的主要动机源于离线训练的局限性,其中创建的MSG本质上是完整空间的子集,仅关注最可能和最关键的上下文事件。在在线模式下,强化学习(RL)通过不断探索和发现新的调度解决方案发挥着关键作用,从而随着时间的推移扩展MSG并提高系统性能。这种动态适应允许系统更有效地处理意外事件和复杂的调度场景。在在线学习单元中实现了几个RL模型,每个模型都旨在解决调度中的特定挑战。这些模型不仅有助于发现新的解决方案,而且还可以优化现有的调度器,特别是在引入更严格的截止日期或新的性能标准时。通过实时训练不断完善人工智能推理,系统保持灵活性,能够满足不断变化的需求,从而确保在大规模安全关键环境中的鲁棒性和效率。
摘要:Metascheduling in time-triggered architectures has been crucial in adapting to dynamic and unpredictable environments, ensuring the reliability and efficiency of task execution. However, traditional approaches face significant challenges when training Artificial Intelligence (AI) scheduling inferences offline, particularly due to the complexities involved in constructing a comprehensive Multi-Schedule Graph (MSG) that accounts for all possible scenarios. The process of generating an MSG that captures the vast probability space, especially when considering context events like hardware failures, slack variations, or mode changes, is resource-intensive and often infeasible. To address these challenges, we propose an adaptive online learning unit integrated within the metascheduler to enhance performance in real-time. The primary motivation for developing this unit stems from the limitations of offline training, where the MSG created is inherently a subset of the complete space, focusing only on the most probable and critical context events. In the online mode, Reinforcement Learning (RL) plays a pivotal role by continuously exploring and discovering new scheduling solutions, thus expanding the MSG and enhancing system performance over time. This dynamic adaptation allows the system to handle unexpected events and complex scheduling scenarios more effectively. Several RL models were implemented within the online learning unit, each designed to address specific challenges in scheduling. These models not only facilitate the discovery of new solutions but also optimize existing schedulers, particularly when stricter deadlines or new performance criteria are introduced. By continuously refining the AI inferences through real-time training, the system remains flexible and capable of meeting evolving demands, thus ensuring robustness and efficiency in large-scale, safety-critical environments.


强化学习(7篇)

【1】Inverse Reinforcement Learning Using Just Classification and a Few Regressions
标题:仅使用分类和少量回归的反向强化学习
链接:https://arxiv.org/abs/2509.21172

作者:der Laan, Nathan Kallus, Aurélien Bibaut
摘要:反向强化学习(IRL)旨在通过揭示潜在的奖励来解释观察到的行为。在最大熵或Gumbel冲击奖励框架中,这相当于拟合一个奖励函数和一个软值函数,它们共同满足软贝尔曼一致性条件,并最大化观察到的动作的似然。虽然这种观点对机器人的模仿学习和理解经济学中的动态选择产生了巨大的影响,但实际的学习算法通常涉及微妙的内环优化、重复的动态规划或对抗训练,所有这些都使神经网络和boosting等现代高表达力函数逼近器的使用变得复杂。我们重新审视softmax IRL,并表明总体最大似然解由一个涉及行为策略的线性不动点方程刻画。这一观察将IRL简化为两个现成的监督学习问题:估计行为策略的概率分类,以及求解不动点的迭代回归。由此产生的方法简单,并且对函数逼近类和算法是模块化的。我们给出了最优解的精确刻画、一个通用的基于oracle的算法、有限样本误差界,以及显示其性能可与MaxEnt IRL竞争或更优的实证结果。
摘要 :Inverse reinforcement learning (IRL) aims to explain observed behavior by uncovering an underlying reward. In the maximum-entropy or Gumbel-shocks-to-reward frameworks, this amounts to fitting a reward function and a soft value function that together satisfy the soft Bellman consistency condition and maximize the likelihood of observed actions. While this perspective has had enormous impact in imitation learning for robotics and understanding dynamic choices in economics, practical learning algorithms often involve delicate inner-loop optimization, repeated dynamic programming, or adversarial training, all of which complicate the use of modern, highly expressive function approximators like neural nets and boosting. We revisit softmax IRL and show that the population maximum-likelihood solution is characterized by a linear fixed-point equation involving the behavior policy. This observation reduces IRL to two off-the-shelf supervised learning problems: probabilistic classification to estimate the behavior policy, and iterative regression to solve the fixed point. The resulting method is simple and modular across function approximation classes and algorithms. We provide a precise characterization of the optimal solution, a generic oracle-based algorithm, finite-sample error bounds, and empirical results showing competitive or superior performance to MaxEnt IRL.
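The two-step reduction can be sketched in a toy tabular setting of our own construction: known transitions, a reference-action normalization r(s, a0) = 0 (in the spirit of the Hotz-Miller inversion from dynamic discrete choice), a behaviour policy in place of the classification step, and a contraction iteration as the "iterative regression". The paper's estimator is more general than this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 4, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))     # P[s, a] = next-state distribution
r_true = rng.normal(size=(S, A))
r_true[:, 0] = 0.0                              # reference-action normalization

# Generate a soft-optimal behaviour policy via soft value iteration.
V = np.zeros(S)
for _ in range(500):
    Q = r_true + gamma * P @ V
    V = np.log(np.exp(Q).sum(axis=1))           # logsumexp over actions
pi_b = np.exp(Q - V[:, None])

# Step 1 ("classification"): pi_b would be estimated from demonstrations
# with any probabilistic classifier; here we use it directly.
# Step 2 (iterative regression on a linear fixed point in V):
#   V(s) = -log pi_b(a0|s) + gamma * E[V(s') | s, a0]
V_hat = np.zeros(S)
for _ in range(500):
    V_hat = -np.log(pi_b[:, 0]) + gamma * P[:, 0] @ V_hat
# Step 3: read off the reward from the behaviour policy and V.
r_hat = np.log(pi_b) + V_hat[:, None] - gamma * P @ V_hat
print(np.abs(r_hat - r_true).max())             # ~0 up to iteration error
```

The fixed-point loop is a gamma-contraction, so plain iteration converges; with function approximation each iteration becomes one supervised regression, which is the modularity the abstract emphasizes.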


【2】Teaching RL Agents to Act Better: VLM as Action Advisor for Online Reinforcement Learning
标题:教RL代理更好地行动:VLM作为在线强化学习的行动顾问
链接:https://arxiv.org/abs/2509.21126

作者:u, Jing Zhao, Shu Zhang, Mingyu Hu
摘要:复杂任务中的在线强化学习是耗时的,因为需要大量的交互步骤来学习最优Q函数。视觉语言动作(VLA)策略代表了解决不同任务的一个有前途的方向;然而,它们在低级别控制上的性能仍然有限,有效的部署通常需要特定任务的专家演示进行微调。在本文中,我们提出了\textbf{VARL}(\textbf{V}LM作为在线学习的\textbf{A}顾问),这是一个利用视觉语言模型(VLM)的领域知识为强化学习代理提供行动建议的框架。与以前的方法不同,VARL提供行动建议,而不是设计启发式奖励,从而保证不变的最优性和收敛性。建议的操作增加了样本多样性,并最终提高了样本效率,特别是在稀疏奖励任务中。为了验证VARL的有效性,我们在不同的环境和代理设置中对其进行评估。结果表明,VARL大大提高了采样效率,而不引入显着的计算开销。这些优点使VARL成为在线强化学习的通用框架,并使在现实环境中从头开始直接应用强化学习成为可能。
摘要:Online reinforcement learning in complex tasks is time-consuming, as massive interaction steps are needed to learn the optimal Q-function. Vision-language action (VLA) policies represent a promising direction for solving diverse tasks; however, their performance on low-level control remains limited, and effective deployment often requires task-specific expert demonstrations for fine-tuning. In this paper, we propose \textbf{VARL} (\textbf{V}LM as \textbf{A}ction advisor for online \textbf{R}einforcement \textbf{L}earning), a framework that leverages the domain knowledge of vision-language models (VLMs) to provide action suggestions for reinforcement learning agents. Unlike previous methods, VARL provides action suggestions rather than designing heuristic rewards, thereby guaranteeing unchanged optimality and convergence. The suggested actions increase sample diversity and ultimately improve sample efficiency, especially in sparse-reward tasks. To validate the effectiveness of VARL, we evaluate it across diverse environments and agent settings. Results show that VARL greatly improves sample efficiency without introducing significant computational overhead. These advantages make VARL a general framework for online reinforcement learning and make it feasible to directly apply reinforcement learning from scratch in real-world environments.


【3】MPC-based Deep Reinforcement Learning Method for Space Robotic Control with Fuel Sloshing Mitigation
标题:基于MPC的深度强化学习空间机器人控制方法及燃料晃动抑制
链接:https://arxiv.org/abs/2509.21045

作者:ezani, M. Amin Alandihallaj, Barış Can Yalçın, Miguel Angel Olivares Mendez, Holger Voos
备注:Pre-print version submitted to IEEE IROS
摘要:本文提出了一种集成强化学习(RL)与模型预测控制(MPC)的框架,用于部分注满燃料箱的卫星自主对接。由于微重力下的燃料晃动会产生影响稳定性的不可预测的力,传统的对接控制面临挑战。为了解决这个问题,我们将近端策略优化(PPO)和软演员-评论家(SAC)RL算法与MPC集成,利用MPC的预测能力来加速RL训练并提高控制鲁棒性。通过SnT零重力实验室的平面稳定实验和含燃料晃动动力学的6自由度对接高保真数值模拟,验证了所提出的方法。仿真结果表明,SAC-MPC实现了更高的对接精度、更高的成功率和更低的控制开销,优于独立的RL和PPO-MPC方法。这项研究推进了燃料高效和抗扰动的卫星对接,提高了在轨加注和服务任务的可行性。
摘要:This paper presents an integrated Reinforcement Learning (RL) and Model Predictive Control (MPC) framework for autonomous satellite docking with a partially filled fuel tank. Traditional docking control faces challenges due to fuel sloshing in microgravity, which induces unpredictable forces affecting stability. To address this, we integrate Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) RL algorithms with MPC, leveraging MPC's predictive capabilities to accelerate RL training and improve control robustness. The proposed approach is validated through Zero-G Lab of SnT experiments for planar stabilization and high-fidelity numerical simulations for 6-DOF docking with fuel sloshing dynamics. Simulation results demonstrate that SAC-MPC achieves superior docking accuracy, higher success rates, and lower control effort, outperforming standalone RL and PPO-MPC methods. This study advances fuel-efficient and disturbance-resilient satellite docking, enhancing the feasibility of on-orbit refueling and servicing missions.


【4】Model-Based Reinforcement Learning under Random Observation Delays
标题:随机观察延迟下基于模型的强化学习
链接:https://arxiv.org/abs/2509.20869

作者:amzade, Kyungmin Kim, JB Lanier, Davide Corsi, Roy Fox
摘要:延迟经常出现在现实环境中,然而标准的强化学习(RL)算法通常假设对环境的感知是即时的。我们研究POMDP中的随机传感器延迟,其中观测可能乱序到达,这一设置此前未在RL中得到研究。我们分析了这种延迟的结构,并证明诸如堆叠过去观测之类的朴素方法不足以获得可靠的性能。为了解决这个问题,我们提出了一个基于模型的滤波过程,根据传入的观测流顺序更新信念状态。然后,我们引入一个简单的延迟感知框架,将这一思想融入基于模型的强化学习,使代理能够有效地处理随机延迟。将此框架应用于Dreamer,我们将我们的方法与为MDP开发的延迟感知基线进行比较。我们的方法始终优于这些基线,并对部署过程中延迟分布的变化表现出鲁棒性。此外,我们还展示了在模拟机器人任务上的实验,将我们的方法与常见的实用启发式方法进行比较,并强调显式建模观测延迟的重要性。
摘要 :Delays frequently occur in real-world environments, yet standard reinforcement learning (RL) algorithms often assume instantaneous perception of the environment. We study random sensor delays in POMDPs, where observations may arrive out-of-sequence, a setting that has not been previously addressed in RL. We analyze the structure of such delays and demonstrate that naive approaches, such as stacking past observations, are insufficient for reliable performance. To address this, we propose a model-based filtering process that sequentially updates the belief state based on an incoming stream of observations. We then introduce a simple delay-aware framework that incorporates this idea into model-based RL, enabling agents to effectively handle random delays. Applying this framework to Dreamer, we compare our approach to delay-aware baselines developed for MDPs. Our method consistently outperforms these baselines and demonstrates robustness to delay distribution shifts during deployment. Additionally, we present experiments on simulated robotic tasks, comparing our method to common practical heuristics and emphasizing the importance of explicitly modeling observation delays.
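The out-of-sequence bookkeeping can be sketched independently of the learned model: buffer timestamped observations that may arrive in any order and apply the belief update strictly in timestamp order. The filter below uses exponential smoothing as a stand-in for the paper's model-based belief update; class and method names are our own:

```python
import heapq

class DelayAwareFilter:
    """Buffer (timestamp, obs) pairs that may arrive in any order and apply
    the belief update strictly in timestamp order. Exponential smoothing is
    an illustrative stand-in for the model-based filtering step."""
    def __init__(self, alpha=0.5):
        self.heap, self.belief, self.next_t = [], 0.0, 0
        self.alpha = alpha
    def receive(self, t, obs):
        heapq.heappush(self.heap, (t, obs))
        # consume the buffer only while the next expected timestamp is present
        while self.heap and self.heap[0][0] == self.next_t:
            _, o = heapq.heappop(self.heap)
            self.belief = (1 - self.alpha) * self.belief + self.alpha * o
            self.next_t += 1
        return self.belief

f = DelayAwareFilter()
for t, obs in [(0, 1.0), (2, 3.0), (1, 2.0)]:   # the t=2 reading arrives early
    f.receive(t, obs)
print(f.next_t, f.belief)                        # 3 2.125
```

The early `t=2` observation is held back until `t=1` arrives, after which both updates fire in order; a plain observation stack would have applied them in arrival order and corrupted the belief.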


【5】Leveraging Temporally Extended Behavior Sharing for Multi-task Reinforcement Learning
标题:利用时间扩展行为共享进行多任务强化学习
链接:https://arxiv.org/abs/2509.20766

作者: (1), Daesol Cho (1), H. Jin Kim (1) ((1) Seoul National University)
备注:Accepted for publication in the proceedings of the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
摘要:多任务强化学习(MTRL)提供了一种很有前途的方法,通过跨多个任务训练代理来提高样本效率和泛化能力,从而实现它们之间的知识共享。然而,将MTRL应用于机器人技术仍然具有挑战性,因为收集不同任务数据的成本很高。为了解决这个问题,我们提出了MT-Lévy,一种新的探索策略,通过将跨任务的行为共享与受Lévy飞行启发的时间扩展探索相结合,提高了MTRL环境中的样本效率。MT-Lévy利用在相关任务上训练的策略来引导对关键状态的探索,同时基于任务成功率动态调整探索程度。即使在复杂的机器人环境中,这种方法也能实现更有效的状态空间覆盖。实证结果表明,MT-Lévy显著提高了探索和样本效率,并有定量和定性分析的支持。消融研究进一步强调了每个组件的贡献,表明将行为共享与自适应探索策略相结合可以显著提高MTRL在机器人应用中的实用性。
摘要:Multi-task reinforcement learning (MTRL) offers a promising approach to improve sample efficiency and generalization by training agents across multiple tasks, enabling knowledge sharing between them. However, applying MTRL to robotics remains challenging due to the high cost of collecting diverse task data. To address this, we propose MT-Lévy, a novel exploration strategy that enhances sample efficiency in MTRL environments by combining behavior sharing across tasks with temporally extended exploration inspired by Lévy flight. MT-Lévy leverages policies trained on related tasks to guide exploration towards key states, while dynamically adjusting exploration levels based on task success ratios. This approach enables more efficient state-space coverage, even in complex robotics environments. Empirical results demonstrate that MT-Lévy significantly improves exploration and sample efficiency, supported by quantitative and qualitative analyses. Ablation studies further highlight the contribution of each component, showing that combining behavior sharing with adaptive exploration strategies can significantly improve the practicality of MTRL in robotics applications.
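The heavy-tailed ingredient of Lévy-flight exploration can be illustrated by sampling action-commitment durations from a truncated power law: mostly short commitments with occasional very long excursions. The parameters below are our own illustrative choices, not the paper's:

```python
import numpy as np

def levy_durations(n, alpha=1.5, d_max=64, rng=None):
    """Sample temporally-extended exploration durations from a truncated
    power law p(d) proportional to d^(-alpha): the heavy-tailed 'Levy
    flight' ingredient gives mostly short commitments with occasional
    very long excursions. (Illustrative; parameters are our choices.)"""
    rng = rng or np.random.default_rng()
    d = np.arange(1, d_max + 1)
    p = d ** (-float(alpha))
    return rng.choice(d, size=n, p=p / p.sum())

rng = np.random.default_rng(0)
dur = levy_durations(10000, rng=rng)
print(dur.mean(), (dur > 16).mean())   # heavy tail: rare long excursions
```

Committing to an exploratory behaviour for a sampled duration, instead of re-deciding every step, is what gives the deeper state-space coverage the abstract refers to; the shared cross-task policy then picks which behaviour to commit to.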


【6】CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning
标题:CE-GPPO:在强化学习中通过保留梯度的裁剪策略优化控制熵
链接:https://arxiv.org/abs/2509.20712

作者:Su, Leiyu Pan, Minxuan Lv, Yuntao Li, Wenping Hu, Fuzheng Zhang, Kun Gai, Guorui Zhou
摘要:强化学习(RL)已经成为优化大型语言模型(LLM)以处理复杂推理任务的强大范式。这一过程的核心挑战在于管理策略熵,它反映了训练期间探索和利用之间的平衡。现有的方法,如近端策略优化(PPO)及其变体,由于裁剪机制而丢弃了来自低概率词元的有价值的梯度信号。我们系统地分析了熵动力学,并揭示了这些被裁剪的词元在调节熵演化中起着关键但被忽视的作用。我们提出了通过保留梯度的策略优化控制熵(Controlling Entropy via Gradient-Preserving Policy Optimization,CE-GPPO),这是一种新算法,它以温和且有界的方式重新引入原生PPO中被裁剪词元的梯度。通过控制来自裁剪区间之外词元的梯度幅度,CE-GPPO能够实现探索-利用的权衡。我们提供的理论依据和经验证据表明,CE-GPPO有效地缓解了熵的不稳定性。在数学推理基准上的广泛实验表明,CE-GPPO在不同的模型规模上始终优于强基线。
摘要:Reinforcement learning (RL) has become a powerful paradigm for optimizing large language models (LLMs) to handle complex reasoning tasks. A core challenge in this process lies in managing policy entropy, which reflects the balance between exploration and exploitation during training. Existing methods, such as proximal policy optimization (PPO) and its variants, discard valuable gradient signals from low-probability tokens due to the clipping mechanism. We systematically analyze the entropy dynamics and reveal that these clipped tokens play a critical yet overlooked role in regulating entropy evolution. We propose \textbf{C}ontrolling \textbf{E}ntropy via \textbf{G}radient-\textbf{P}reserving \textbf{P}olicy \textbf{O}ptimization (CE-GPPO), a novel algorithm that reintroduces gradients from clipped tokens in native PPO in a gentle and bounded manner. By controlling the magnitude of gradients from tokens outside the clipping interval, CE-GPPO is able to achieve an exploration-exploitation trade-off. We provide theoretical justification and empirical evidence showing that CE-GPPO effectively mitigates entropy instability. Extensive experiments on mathematical reasoning benchmarks show that CE-GPPO consistently outperforms strong baselines across different model scales.
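The contrast with standard PPO clipping can be shown on the per-token surrogate gradient with respect to the importance ratio: PPO's gradient is exactly zero once the ratio leaves the trust interval, while a gradient-preserving variant keeps a small bounded signal there. The `beta` knob below is our illustrative stand-in for the paper's bounded reintroduction, not its exact schedule:

```python
import numpy as np

def ppo_clip_grad(r, A, eps=0.2):
    """d/dr of the per-token PPO-clip objective min(r*A, clip(r)*A):
    zero once the ratio leaves the trust interval."""
    clipped = ((A > 0) & (r > 1 + eps)) | ((A < 0) & (r < 1 - eps))
    return np.where(clipped, 0.0, A)

def gradient_preserving_grad(r, A, eps=0.2, beta=0.3):
    """CE-GPPO-style sketch (beta is our illustrative knob): clipped
    tokens keep a small, bounded gradient beta*A instead of being
    silenced outright."""
    clipped = ((A > 0) & (r > 1 + eps)) | ((A < 0) & (r < 1 - eps))
    return np.where(clipped, beta * A, A)

r = np.array([0.5, 1.0, 1.5])
A = np.array([1.0, 1.0, 1.0])
print(ppo_clip_grad(r, A))              # [1. 1. 0.]
print(gradient_preserving_grad(r, A))   # [1.  1.  0.3]
```

Keeping the out-of-interval gradient bounded (rather than the raw `A`) is what lets those tokens keep shaping entropy without reintroducing the instability that clipping was meant to prevent.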


【7】Offline Goal-conditioned Reinforcement Learning with Quasimetric Representations
标题:具有准度量表示的离线目标条件强化学习
链接:https://arxiv.org/abs/2509.20478

作者:rs, Bill Chunyuan Zheng, Benjamin Eysenbach, Sergey Levine
摘要:目标条件强化学习(GCRL)的方法通常使用学习到的状态表示来提取达到目标的策略。两类表示结构框架产生了特别有效的GCRL算法:(1)对比表示,这类方法以对未来结果进行推断的对比目标学习“后继特征”;(2)时间距离,将表示空间中的(准度量)距离与从状态到目标的转移时间联系起来。我们提出了一种统一这两个框架的方法,利用准度量表示空间的结构(三角不等式)并施加恰当的附加约束,学习能够实现最优目标到达的后继表示。与过去的工作不同,我们的方法能够利用准度量距离参数化来学习最优的目标到达距离,即使在次优数据和随机环境中也是如此。这给了我们两全其美的效果:我们保留了Monte Carlo对比RL方法的稳定性和长时程能力,同时免费获得了准度量网络参数化的拼接能力。在现有的离线GCRL基准测试中,我们的表示学习目标在对比学习方法难以应对的拼接任务上,以及在准度量网络方法难以应对的高噪声、高维环境中,均提升了性能。
摘要 :Approaches for goal-conditioned reinforcement learning (GCRL) often use learned state representations to extract goal-reaching policies. Two frameworks for representation structure have yielded particularly effective GCRL algorithms: (1) *contrastive representations*, in which methods learn "successor features" with a contrastive objective that performs inference over future outcomes, and (2) *temporal distances*, which link the (quasimetric) distance in representation space to the transit time from states to goals. We propose an approach that unifies these two frameworks, using the structure of a quasimetric representation space (triangle inequality) with the right additional constraints to learn successor representations that enable optimal goal-reaching. Unlike past work, our approach is able to exploit a **quasimetric** distance parameterization to learn **optimal** goal-reaching distances, even with **suboptimal** data and in **stochastic** environments. This gives us the best of both worlds: we retain the stability and long-horizon capabilities of Monte Carlo contrastive RL methods, while getting the free stitching capabilities of quasimetric network parameterizations. On existing offline GCRL benchmarks, our representation learning objective improves performance on stitching tasks where methods based on contrastive learning struggle, and on noisy, high-dimensional environments where methods based on quasimetric networks struggle.


分层学习(1篇)

【1】Learning Greens Operators through Hierarchical Neural Networks Inspired by the Fast Multipole Method
标题:受快速多极法启发,通过分层神经网络学习格林运算符
链接:https://arxiv.org/abs/2509.20591

作者:Allister Fognini, Marta M. Betcke, Ben T. Cox
备注:Previously under review at ICLR 2025, originally submitted on the 12th of May 2025. The OpenReview page can be found at: this http URL
摘要:快速多极子方法(FMM)是计算引力场和静电场中N体问题中长程力的一种有效的数值算法。该方法利用了底层动力系统固有的格林函数的多极展开。尽管FMM在物理学和工程学中有着广泛的应用,但它与现代机器学习架构的集成仍然没有得到充分的探索。在这项工作中,我们提出了一种新的神经网络架构,神经FMM,它集成了FMM的信息流到一个分层的机器学习框架学习的椭圆PDE的格林算子。我们的神经FMM架构利用FMM方法的分层计算流程来分割本地和远场交互,并有效地学习它们各自的表示。
摘要:The Fast Multipole Method (FMM) is an efficient numerical algorithm for computation of long-ranged forces in $N$-body problems within gravitational and electrostatic fields. This method utilizes multipole expansions of the Green's function inherent to the underlying dynamical systems. Despite its widespread application in physics and engineering, the integration of FMM with modern machine learning architectures remains underexplored. In this work, we propose a novel neural network architecture, the Neural FMM, that integrates the information flow of the FMM into a hierarchical machine learning framework for learning the Green's operator of an Elliptic PDE. Our Neural FMM architecture leverages a hierarchical computation flow of the FMM method to split up the local and far-field interactions and efficiently learn their respective representations.
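The near-field/far-field split that the Neural FMM mirrors can be shown with a classical two-level sketch: neighbouring cells are evaluated exactly, far cells through a single monopole (total charge at the cell centre). This is our own 1-D toy with a 1/r kernel, not the paper's architecture or a full multi-level FMM:

```python
import numpy as np

rng = np.random.default_rng(0)
n, ncell = 200, 8
x = rng.random(n)                          # particle positions in [0, 1)
q = rng.random(n)                          # positive charges
cell = np.minimum((x * ncell).astype(int), ncell - 1)
centers = (np.arange(ncell) + 0.5) / ncell

def kernel(r):
    return 1.0 / np.maximum(r, 1e-9)

# Exact O(n^2) potentials (no self-interaction), for reference.
d = np.abs(x[:, None] - x[None, :])
K = kernel(d)
np.fill_diagonal(K, 0.0)
exact = K @ q

# Two-level split: neighbouring cells exactly, far cells via one monopole.
Q = np.bincount(cell, weights=q, minlength=ncell)
approx = np.zeros(n)
for c in range(ncell):
    idx = np.where(cell == c)[0]
    near = np.isin(cell, [c - 1, c, c + 1])
    dn = np.abs(x[idx][:, None] - x[near][None, :])
    Kn = kernel(dn)
    Kn[dn < 1e-12] = 0.0                   # drop self terms
    approx[idx] = Kn @ q[near]
    for cc in range(ncell):
        if abs(cc - c) > 1:                # far field: monopole only
            approx[idx] += Q[cc] * kernel(np.abs(x[idx] - centers[cc]))

rel_err = np.abs(approx - exact).max() / np.abs(exact).max()
print(rel_err)                             # small: the far field is smooth
```

The singular near field carries most of the potential and is computed exactly, while the smooth far field tolerates a coarse summary; the Neural FMM learns representations for exactly this kind of split rather than using fixed multipole expansions.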


医学相关(4篇)

【1】Robust Multi-Omics Integration from Incomplete Modalities Significantly Improves Prediction of Alzheimer's Disease
标题:来自不完整模态的稳健多组学整合显著改善阿尔茨海默病的预测
链接:https://arxiv.org/abs/2509.20842

作者:Park, Kyungwook Lee, Soorin Yim, Doyeong Hwang, Dongyun Kim, Soonyoung Lee, Amy Dunn, Daniel Gatti, Elissa Chesler, Kristen O'Connell, Kiyoung Kim
摘要:多组学数据捕获复杂的生物分子相互作用,并提供对代谢和疾病的洞察。然而,缺失的模态阻碍了跨异质组学的整合分析。为了解决这个问题,我们提出了MOIRA(对缺失模态具有鲁棒性的多组学整合),一种早期整合方法,通过表示对齐和自适应聚合实现对不完整组学数据的鲁棒学习。MOIRA将每个组学数据集投影到一个共享的嵌入空间,并用一个可学习的加权机制在该空间中将它们融合,从而利用所有样本,包括那些缺少模态的样本。在阿尔茨海默病(AD)的Religious Order Study and Memory and Aging Project(ROSMAP)数据集上进行评估,MOIRA优于现有方法,进一步的消融研究证实了各模态的贡献。特征重要性分析显示的AD相关生物标志物与先前文献一致,突出了我们方法的生物学相关性。
摘要:Multi-omics data capture complex biomolecular interactions and provide insights into metabolism and disease. However, missing modalities hinder integrative analysis across heterogeneous omics. To address this, we present MOIRA (Multi-Omics Integration with Robustness to Absent modalities), an early integration method enabling robust learning from incomplete omics data via representation alignment and adaptive aggregation. MOIRA leverages all samples, including those with missing modalities, by projecting each omics dataset onto a shared embedding space where a learnable weighting mechanism fuses them. Evaluated on the Religious Order Study and Memory and Aging Project (ROSMAP) dataset for Alzheimer's Disease (AD), MOIRA outperformed existing approaches, and further ablation studies confirmed modality-wise contributions. Feature importance analysis revealed AD-related biomarkers consistent with prior literature, highlighting the biological relevance of our approach.
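The missing-modality fusion step can be sketched as masked softmax weighting over per-modality embeddings: weights are renormalized per sample over the modalities that are actually present. A minimal sketch with our own shapes and function name (MOIRA additionally learns the projections and alignment):

```python
import numpy as np

def fuse(embeddings, mask, logits):
    """Masked learnable-weight fusion sketch: softmax weights over modalities,
    restricted per sample to the modalities that are actually present.
    Assumes every sample has at least one available modality."""
    w = np.exp(logits) * mask                    # zero out missing modalities
    w = w / w.sum(axis=1, keepdims=True)         # renormalize per sample
    return (w[:, :, None] * embeddings).sum(axis=1)

rng = np.random.default_rng(0)
n, K, d = 4, 3, 5
E = rng.normal(size=(n, K, d))                   # per-modality embeddings
mask = np.array([[1, 1, 1], [1, 0, 1], [0, 1, 0], [1, 1, 0]], float)
logits = np.zeros(K)                             # learnable in the real model
Z = fuse(E, mask, logits)
print(Z.shape)                                   # (4, 5): all samples fused
```

A sample with a single available modality simply passes that embedding through, which is how every sample, complete or not, contributes to training.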


【2】Generalizable Diabetes Risk Stratification via Hybrid Machine Learning Models
标题:通过混合机器学习模型进行可推广的糖尿病风险分层
链接:https://arxiv.org/abs/2509.20565

作者:vez, Muhammad Jawad Mufti
摘要:背景/目的:糖尿病影响着全世界超过5.37亿人,预计到2045年将达到7.83亿人。早期风险分层可以从机器学习中受益。我们比较了两个混合分类器,并在外部队列上评估其泛化能力。   方法:构建两种混合模型:(i)XGBoost +随机森林(XGB-RF)和(ii)支持向量机+逻辑回归(SVM-LR)。将泄漏安全的标准化管道(编码、插补、最小-最大缩放;仅对训练折叠的SMOTE; SVM的概率校准)拟合在主要数据集上并冻结。评价优先考虑阈值无关的区分(AUROC/AUPRC)和校准(Brier,斜率/截距)。外部验证使用具有冻结管道的PIMA队列(N=768);在默认规则tau = 0.5下计算PIMA的任何阈值化度量。   结果:在主要数据集(PR基线= 0.50)上,XGB-RF达到AUROC ~0.995和AUPRC ~0.998,优于SVM-LR(AUROC ~0.978; AUPRC ~0.947)。在PIMA(PR基线~0.349)上,XGB-RF保持了较强的性能(AUROC ~0.990; AUPRC ~0.959); SVM-LR较低(AUROC ~0.963; AUPRC ~0.875)。在tau = 0.5下,PIMA上的阈值化指标为XGB-RF(准确度0.960;精度0.941;召回率0.944; F1 0.942)和SVM-LR(准确度0.900;精度0.855;召回率0.858; F1 0.857)。   结论:在内部和外部队列中,XGB-RF始终优于SVM-LR,并在ROC/PR上表现出较小的外部衰减,校准可接受。这些结果支持基于梯度提升的混合模型作为糖尿病风险分层的一种稳健、可迁移的方法,并支持开展前瞻性、多站点验证,并在部署时根据临床权衡选择阈值。
摘要:Background/Purpose: Diabetes affects over 537 million people worldwide and is projected to reach 783 million by 2045. Early risk stratification can benefit from machine learning. We compare two hybrid classifiers and assess their generalizability on an external cohort.   Methods: Two hybrids were built: (i) XGBoost + Random Forest (XGB-RF) and (ii) Support Vector Machine + Logistic Regression (SVM-LR). A leakage-safe, standardized pipeline (encoding, imputation, min-max scaling; SMOTE on training folds only; probability calibration for SVM) was fit on the primary dataset and frozen. Evaluation prioritized threshold-independent discrimination (AUROC/AUPRC) and calibration (Brier, slope/intercept). External validation used the PIMA cohort (N=768) with the frozen pipeline; any thresholded metrics on PIMA were computed at the default rule tau = 0.5.   Results: On the primary dataset (PR baseline = 0.50), XGB-RF achieved AUROC ~0.995 and AUPRC ~0.998, outperforming SVM-LR (AUROC ~0.978; AUPRC ~0.947). On PIMA (PR baseline ~0.349), XGB-RF retained strong performance (AUROC ~0.990; AUPRC ~0.959); SVM-LR was lower (AUROC ~0.963; AUPRC ~0.875). Thresholded metrics on PIMA at tau = 0.5 were XGB-RF (Accuracy 0.960; Precision 0.941; Recall 0.944; F1 0.942) and SVM-LR (Accuracy 0.900; Precision 0.855; Recall 0.858; F1 0.857).   Conclusions: Across internal and external cohorts, XGB-RF consistently dominated SVM-LR and exhibited smaller external attenuation on ROC/PR with acceptable calibration. These results support gradient-boosting-based hybridization as a robust, transferable approach for diabetes risk stratification and motivate prospective, multi-site validation with deployment-time threshold selection based on clinical trade-offs.


【3】Intercept Cancer: Cancer Pre-Screening with Large Scale Healthcare Foundation Models
标题:拦截癌症:利用大规模医疗保健基础模型进行癌症预筛查
链接:https://arxiv.org/abs/2506.00209

作者:, Hao-Ren Yao, Gary Gao, Ophir Frieder, Chenyan Xiong
摘要:癌症筛查带来早期发现,从而挽救生命。不幸的是,现有的筛查技术需要昂贵且有创的医疗程序,并非在全球范围内可用,导致太多本可挽救的生命丧失。我们提出了CATCH-FM(CATch Cancer early with Healthcare Foundation Models),一种仅根据患者的历史医疗记录识别高风险患者以进行进一步筛查的癌症预筛查方法。借助数以百万计的电子健康记录(EHR),我们建立了在医疗代码序列上预训练的EHR基础模型的缩放律,预训练了多达24亿参数的计算最优基础模型,并在临床医生策划的癌症风险预测队列上对其进行微调。在我们对3万例患者的回顾性评价中,CATCH-FM实现了较强的疗效(60%的灵敏度)和较低的风险(99%的特异性和阴性预测值),大大优于基于特征的树模型以及通用和医学大型语言模型。尽管存在显著的人口统计学、医疗保健系统和EHR编码差异,CATCH-FM在EHRSHOT少样本排行榜上实现了最先进的胰腺癌风险预测,优于使用现场患者数据预训练的EHR基础模型。我们的分析证明了CATCH-FM在各种患者分布中的鲁棒性、在ICD代码空间中操作的益处,以及其捕获非平凡癌症风险因素的能力。我们的代码将开源。
摘要:Cancer screening, leading to early detection, saves lives. Unfortunately, existing screening techniques require expensive and intrusive medical procedures that are not globally available, resulting in too many lost would-be-saved lives. We present CATCH-FM, CATch Cancer early with Healthcare Foundation Models, a cancer pre-screening methodology that identifies high-risk patients for further screening solely based on their historical medical records. With millions of electronic healthcare records (EHR), we establish the scaling law of EHR foundation models pretrained on medical code sequences, pretrain compute-optimal foundation models of up to 2.4 billion parameters, and finetune them on clinician-curated cancer risk prediction cohorts. In our retrospective evaluation comprising thirty thousand patients, CATCH-FM achieved strong efficacy (60% sensitivity) with low risk (99% specificity and Negative Predictive Value), outperforming feature-based tree models as well as general and medical large language models by large margins. Despite significant demographic, healthcare system, and EHR coding differences, CATCH-FM achieves state-of-the-art pancreatic cancer risk prediction on the EHRSHOT few-shot leaderboard, outperforming EHR foundation models pretrained using on-site patient data. Our analysis demonstrates the robustness of CATCH-FM in various patient distributions, the benefits of operating in the ICD code space, and its ability to capture non-trivial cancer risk factors. Our code will be open-sourced.


【4】Copycats: the many lives of a publicly available medical imaging dataset
标题:模仿者:公开医疗成像数据集的许多生命
链接:https://arxiv.org/abs/2402.06353

作者:ménez-Sánchez, Natalia-Rozalia Avlona, Dovile Juodelyte, Théo Sourget, Caroline Vang-Larsen, Anna Rogers, Hubert Dariusz Zając, Veronika Cheplygina
备注:NeurIPS 2024 Track on Datasets and Benchmarks. Please note that v1 has a different title
摘要:医学成像(MI)数据集是医疗保健领域人工智能的基础。诊断算法的准确性、鲁棒性和公平性取决于用于训练和评估模型的数据(及其质量)。MI数据集过去是专有的,但现在越来越多地向公众开放,包括在Kaggle或HuggingFace等社区贡献平台(CCP)上。虽然开放数据对于提高数据公共价值的再分配很重要,但我们发现,当前的CCP治理模式未能维护共享,记录和评估数据集所需的质量和推荐的实践。在本文中,我们对CCP上公开可用的机器学习数据集进行了分析,讨论了数据集的背景,并确定了当前CCP格局中的局限性和差距。我们强调了MI和计算机视觉数据集之间的差异,特别是在推荐的数据集管理实践的不良采用所带来的潜在有害下游影响方面。我们在多个维度上比较分析的数据集,包括数据共享,数据文档和维护。我们发现模糊的许可证,缺乏持久性标识符和存储,重复和丢失的元数据,以及平台之间的差异。我们的研究有助于为医疗保健提供负责任的数据管理和人工智能算法。
摘要:Medical Imaging (MI) datasets are fundamental to artificial intelligence in healthcare. The accuracy, robustness, and fairness of diagnostic algorithms depend on the data (and its quality) used to train and evaluate the models. MI datasets used to be proprietary, but have become increasingly available to the public, including on community-contributed platforms (CCPs) like Kaggle or HuggingFace. While open data is important to enhance the redistribution of data's public value, we find that the current CCP governance model fails to uphold the quality needed and recommended practices for sharing, documenting, and evaluating datasets. In this paper, we conduct an analysis of publicly available machine learning datasets on CCPs, discussing datasets' context, and identifying limitations and gaps in the current CCP landscape. We highlight differences between MI and computer vision datasets, particularly in the potentially harmful downstream effects from poor adoption of recommended dataset management practices. We compare the analyzed datasets across several dimensions, including data sharing, data documentation, and maintenance. We find vague licenses, lack of persistent identifiers and storage, duplicates, and missing metadata, with differences between the platforms. Our research contributes to efforts in responsible data curation and AI algorithms for healthcare.


蒸馏|知识提取(1篇)

【1】FERD: Fairness-Enhanced Data-Free Robustness Distillation
标题:FERD:公平增强的无数据稳健蒸馏
链接:https://arxiv.org/abs/2509.20793

作者: Li, Liming Lu, Xu Zheng, Siyuan Liang, Zhenghan Chen, Yongbin Zhou, Shuchao Pang
摘要:无数据鲁棒性蒸馏(DFRD)旨在将鲁棒性从教师转移到学生,而无需访问训练数据。虽然现有的方法集中在整体鲁棒性上,但它们忽略了鲁棒公平性问题,导致不同类别之间鲁棒性的严重差异。在本文中,我们发现两个关键问题:(1)用等类别比例数据蒸馏得到的学生模型在不同类别上表现差异显著;(2)学生模型的鲁棒性在不同攻击目标下不稳定。为了弥合这些差距,我们提出了第一个公平增强的无数据鲁棒性蒸馏(FERD)框架,以调整对抗性示例的比例和分布。对于比例,FERD采用鲁棒性指导的类别权重调整策略,为鲁棒性较差的类别合成更多的样本,从而提高其鲁棒性。对于分布,FERD生成补充数据样本,用于高级鲁棒性蒸馏。它通过对特征级预测实施一致性约束来生成公平感知示例(FAE),这抑制了特定于类的非鲁棒特征的主导地位,从而在所有类别中提供更平衡的表示。然后,FERD通过应用统一的目标类约束,从FAE中构造统一目标对抗示例(UTAE),以避免有偏见的攻击方向,将攻击目标分布在所有类别中,并防止过度拟合特定的脆弱类别。在三个公开数据集上的大量实验表明,FERD在所有对抗性攻击下均取得了最先进的最差类鲁棒性(例如,在CIFAR-10上使用MobileNet-V2时,FGSM和AutoAttack下的最差类鲁棒性分别提高了15.1%和6.4%),在鲁棒性和公平性方面都表现出了优异的性能。
摘要:Data-Free Robustness Distillation (DFRD) aims to transfer the robustness from the teacher to the student without accessing the training data. While existing methods focus on overall robustness, they overlook the robust fairness issues, leading to severe disparity of robustness across different categories. In this paper, we find two key problems: (1) the student model distilled with equal class proportion data behaves significantly differently across distinct categories; and (2) the robustness of the student model is not stable across different attack targets. To bridge these gaps, we present the first Fairness-Enhanced data-free Robustness Distillation (FERD) framework to adjust the proportion and distribution of adversarial examples. For the proportion, FERD adopts a robustness-guided class reweighting strategy to synthesize more samples for the less robust categories, thereby improving their robustness. For the distribution, FERD generates complementary data samples for advanced robustness distillation. It generates Fairness-Aware Examples (FAEs) by enforcing a uniformity constraint on feature-level predictions, which suppresses the dominance of class-specific non-robust features, providing a more balanced representation across all categories. Then, FERD constructs Uniform-Target Adversarial Examples (UTAEs) from FAEs by applying a uniform target class constraint to avoid biased attack directions, which distributes the attack targets across all categories and prevents overfitting to specific vulnerable categories. Extensive experiments on three public datasets show that FERD achieves state-of-the-art worst-class robustness under all adversarial attacks (e.g., the worst-class robustness under FGSM and AutoAttack is improved by 15.1\% and 6.4\% using MobileNet-V2 on CIFAR-10), demonstrating superior performance in both robustness and fairness aspects.
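摘要中"鲁棒性指导的类别权重调整"的思路可以用如下草图说明:鲁棒精度越低的类别,分得的对抗样本合成比例越大。具体加权形式(对鲁棒性差距做softmax)与温度参数均为示意性假设,并非FERD的原始设计。

```python
import math

def reweight_by_robustness(robust_acc, temperature=1.0):
    # Robustness-guided class reweighting (sketch): classes with lower
    # robust accuracy receive a larger share of synthesized adversarial
    # examples, via a softmax over the robustness gap (1 - accuracy).
    gaps = [math.exp(temperature * (1.0 - a)) for a in robust_acc]
    total = sum(gaps)
    return [g / total for g in gaps]
```

权重之和为1,鲁棒精度最低的类别获得最大份额。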


推荐(2篇)

【1】IntSR: An Integrated Generative Framework for Search and Recommendation
标题:IntSR:一个用于搜索和推荐的集成生成框架
链接:https://arxiv.org/abs/2509.21179

作者:n, Longfei Xu, Junjie Sun, Ni Ou, Wei Luo, Xing Tan, Ran Cheng, Kaikui Liu, Xiangxiang Chu
摘要:生成式推荐已经成为一种很有前途的模式,在学术基准和工业应用中都取得了显著的成果。然而,现有的系统主要集中在统一的检索和排名,而忽略了搜索和推荐(S&R)任务的集成。搜索和推荐的不同之处在于查询的形成方式:搜索使用显式的用户请求,而推荐依赖于隐式的用户兴趣。至于检索与排名,区别归结为查询是否是目标项本身。认识到作为中心元素的查询,我们提出了IntSR,一个集成的生成框架S&R。IntSR使用不同的查询方式集成这些不同的任务。它还解决了与集成的S&R行为相关的计算复杂性增加以及动态变化的语料库引入的错误模式学习。IntSR已成功部署在Amap的各种场景中,使数字资产的GMV(+3.02%),POI建议的CTR(+2.76%)和旅行模式建议的ACC(+5.13%)大幅改善。
摘要:Generative recommendation has emerged as a promising paradigm, demonstrating remarkable results in both academic benchmarks and industrial applications. However, existing systems predominantly focus on unifying retrieval and ranking while neglecting the integration of search and recommendation (S&R) tasks. What makes search and recommendation different is how queries are formed: search uses explicit user requests, while recommendation relies on implicit user interests. As for retrieval versus ranking, the distinction comes down to whether the queries are the target items themselves. Recognizing the query as central element, we propose IntSR, an integrated generative framework for S&R. IntSR integrates these disparate tasks using distinct query modalities. It also addresses the increased computational complexity associated with integrated S&R behaviors and the erroneous pattern learning introduced by a dynamically changing corpus. IntSR has been successfully deployed across various scenarios in Amap, leading to substantial improvements in digital asset's GMV(+3.02%), POI recommendation's CTR(+2.76%), and travel mode suggestion's ACC(+5.13%).


【2】RecIS: Sparse to Dense, A Unified Training Framework for Recommendation Models
标题:RecIS:从稀疏到密集,推荐模型的统一训练框架
链接:https://arxiv.org/abs/2509.20883

作者: Qingtao Zeng, Zhengxiong Zhou, Zhihua Han, Zhensong Yan, Mingjie Liu, Hechen Sun, Jiawei Liu, Yiwen Hu, Qi Wang, YiHan Xian, Wenjie Guo, Houyuan Xiang, Zhiyuan Zeng, Xiangrong Sheng, Bencheng Yan, Nan Hu, Yuheng Huang, Jinqing Lian, Ziru Xu, Yan Zhang, Ju Huang, Siran Yang, Huimin Yi, Jiamang Wang, Pengjie Wang, Han Zhu, Jian Wu, Dan Ou, Jian Xu, Haihong Tang, Yuning Jiang, Bo Zheng, Lin Qu
摘要:在本文中,我们提出了RecIS,一个统一的稀疏-密集训练框架,旨在实现两个主要目标:1. 统一框架:基于PyTorch生态系统创建统一的稀疏-密集训练框架,满足与大模型集成的工业级推荐模型的训练需求。2. 系统优化:优化稀疏组件,提供优于基于TensorFlow的推荐模型的效率;密集组件则利用PyTorch生态系统中现有的优化技术。目前,RecIS正在阿里巴巴用于大量的大模型增强推荐训练任务,一些传统的稀疏模型也开始在其中训练。
摘要:In this paper, we propose RecIS, a unified Sparse-Dense training framework designed to achieve two primary goals: 1. Unified Framework: to create a unified sparse-dense training framework based on the PyTorch ecosystem that meets the training needs of industrial-grade recommendation models integrated with large models. 2. System Optimization: to optimize the sparse component, offering superior efficiency over TensorFlow-based recommendation models; the dense component, meanwhile, leverages existing optimization technologies within the PyTorch ecosystem. Currently, RecIS is being used at Alibaba for numerous large-model-enhanced recommendation training tasks, and some traditional sparse models have also begun training in it.


聚类(1篇)

【1】Beyond Visual Similarity: Rule-Guided Multimodal Clustering with explicit domain rules
标题:超越视觉相似性:具有显式领域规则的规则引导多模态聚类
链接:https://arxiv.org/abs/2509.20501

作者:tta Gupta, Mohd Ariful Haque, Marufa Kamal, Ahmed Rafi Hasan, Md. Mahfuzur Rahman, Roy George
备注:12 pages, 9 figures
摘要:传统的聚类技术通常仅依赖于输入数据的相似性,限制了它们捕获在许多领域中至关重要的结构或语义约束的能力。我们介绍了域感知规则触发变分自动编码器(DARTVAE),规则引导的多模态聚类框架,将域特定的约束直接到表示学习过程。DARTVAE通过将显式规则、语义表示和数据驱动特征嵌入到统一的潜在空间中来扩展VAE架构,同时通过损失函数中的规则一致性和违规惩罚来强制约束合规。与仅依赖于视觉相似性或将规则作为事后过滤器的传统聚类方法不同,DARTVAE将规则视为第一类学习信号。这些规则由LLM生成,结构化为知识图,并通过结合重构、KL发散、一致性和违规惩罚的损失函数来执行。在飞机和汽车数据集上的实验表明,规则引导的聚类产生了更具操作意义和可解释性的聚类,例如,隔离无人机,统一隐形飞机,或将SUV与轿车分离,同时改进了传统的聚类指标。然而,该框架面临着挑战:LLM生成的规则可能会产生幻觉或冲突,过多的规则有过拟合的风险,扩展到复杂的领域会增加计算和一致性的困难。通过将规则编码与学习表示相结合,DARTVAE实现了比纯粹数据驱动模型更有意义和一致的聚类结果,突出了约束引导多模态聚类在复杂知识密集型环境中的实用性。
摘要:Traditional clustering techniques often rely solely on similarity in the input data, limiting their ability to capture structural or semantic constraints that are critical in many domains. We introduce the Domain Aware Rule Triggered Variational Autoencoder (DARTVAE), a rule guided multimodal clustering framework that incorporates domain specific constraints directly into the representation learning process. DARTVAE extends the VAE architecture by embedding explicit rules, semantic representations, and data driven features into a unified latent space, while enforcing constraint compliance through rule consistency and violation penalties in the loss function. Unlike conventional clustering methods that rely only on visual similarity or apply rules as post hoc filters, DARTVAE treats rules as first class learning signals. The rules are generated by LLMs, structured into knowledge graphs, and enforced through a loss function combining reconstruction, KL divergence, consistency, and violation penalties. Experiments on aircraft and automotive datasets demonstrate that rule guided clustering produces more operationally meaningful and interpretable clusters for example, isolating UAVs, unifying stealth aircraft, or separating SUVs from sedans while improving traditional clustering metrics. However, the framework faces challenges: LLM generated rules may hallucinate or conflict, excessive rules risk overfitting, and scaling to complex domains increases computational and consistency difficulties. By combining rule encodings with learned representations, DARTVAE achieves more meaningful and consistent clustering outcomes than purely data driven models, highlighting the utility of constraint guided multimodal clustering for complex, knowledge intensive settings.
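下面是对摘要所述损失构成(重建 + KL散度 + 一致性 + 违规惩罚)的一个极简草图,其中规则被表示为作用在(特征, 簇标签)上的谓词。各项权重与规则形式均为自拟示意,并非DARTVAE的原始实现。

```python
def violation_penalty(assignments, rules):
    # Fraction of (sample, rule) checks that fail; each rule is a
    # predicate over (features, cluster_id).
    checks = [rule(x, c) for (x, c) in assignments for rule in rules]
    return 1.0 - sum(checks) / len(checks)

def dartvae_objective(recon, kl, consistency, violation,
                      beta=1.0, lam_c=0.5, lam_v=2.0):
    # Composite loss: reconstruction + KL divergence + rule-consistency
    # + violation penalty; the weights here are illustrative placeholders.
    return recon + beta * kl + lam_c * consistency + lam_v * violation
```

例如,一条假设性规则"簇0只应包含某特征大于0.5的样本"可写成谓词 `lambda x, c: (c == 0) == (x[0] > 0.5)`,其违规比例即惩罚项的输入。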


超分辨率|去噪|去模糊|去雾(2篇)

【1】Deterministic Discrete Denoising
标题:确定性离散去噪
链接:https://arxiv.org/abs/2509.20896

作者:Suzuki, Hiroshi Yamashita
备注:9 pages, 1 figure
摘要:提出了一种基于马尔可夫链的离散状态扩散模型的确定性去噪算法。通过引入具有弱混沌动力学的羊群算法的变体,从而诱导确定性的离散状态转换,使生成的反向过程去随机化。我们的方法是随机去噪过程的直接替代,既不需要再训练,也不需要连续状态嵌入。我们在文本和图像生成任务的效率和样本质量方面都有了持续的改进。因此,这种简单的去随机化方法有望提高离散扩散在生成建模中的重要性。此外,我们的研究结果表明,确定性的反向过程,以及建立在连续扩散,也可以是有效的离散状态空间。
摘要:We propose a deterministic denoising algorithm for discrete-state diffusion models based on Markov chains. The generative reverse process is derandomized by introducing a variant of the herding algorithm with weakly chaotic dynamics, which induces deterministic discrete state transitions. Our approach is a direct replacement for the stochastic denoising process, requiring neither retraining nor continuous state embeddings. We demonstrate consistent improvements in both efficiency and sample quality on text and image generation tasks. Thus, this simple derandomization approach is expected to enhance the significance of discrete diffusion in generative modeling. Furthermore, our results reveal that deterministic reverse processes, well established in continuous diffusion, can also be effective in discrete state spaces.
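摘要中"羊群(herding)算法变体诱导确定性状态转移"的核心思想,可用对类别分布的确定性采样来示意:累积每个类别的概率并输出"欠采样"最多的类别,使经验频率在无随机性的情况下逼近目标分布。以下仅为该去随机化思想的草图,并非论文算法本身。

```python
def herded_sequence(probs, n):
    # Deterministic herding: accumulate each category's probability and
    # emit the most "under-served" category, then pay back one unit.
    # Empirical frequencies track `probs` without any randomness.
    w = [0.0] * len(probs)
    out = []
    for _ in range(n):
        w = [wi + pi for wi, pi in zip(w, probs)]
        i = max(range(len(w)), key=w.__getitem__)
        w[i] -= 1.0
        out.append(i)
    return out
```

对均匀二元分布,该过程产生确定性的交替序列,而非随机抽样。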


【2】Implicit Augmentation from Distributional Symmetry in Turbulence Super-Resolution
标题:湍流超分辨率中分布对称性的隐式增强
链接:https://arxiv.org/abs/2509.20683

作者:la, Jeremiah Bailey, Ali Backour, Elyssa Hofgard, Tommi Jaakkola, Tess Smidt, Ryley McConkey
备注:Accepted to Machine Learning and the Physical Sciences Workshop at NeurIPS 2025
摘要:模拟湍流的巨大计算成本促使机器学习方法用于超分辨湍流。一个核心挑战是确保学习的模型尊重物理对称性,例如旋转等变性。我们表明,标准卷积神经网络(CNN)可以部分获得这种对称性,而无需显式增强或专门的架构,因为湍流本身在时间和空间上都提供了隐式旋转增强。使用具有不同各向异性的3D通道流子域,我们发现在各向同性的中平面数据上训练的模型比在边界层数据上训练的模型实现了更低的等方差误差,并且更大的时间或空间采样进一步降低了这种误差。我们显示了一个明显的尺度依赖性的等方差误差,无论数据集的各向异性是一致的Kolmogorov的局部各向同性假设。这些结果澄清了何时必须将旋转对称性明确纳入学习算法,以及何时可以直接从湍流中获得,从而实现更有效和更敏感的超分辨率。
摘要:The immense computational cost of simulating turbulence has motivated the use of machine learning approaches for super-resolving turbulent flows. A central challenge is ensuring that learned models respect physical symmetries, such as rotational equivariance. We show that standard convolutional neural networks (CNNs) can partially acquire this symmetry without explicit augmentation or specialized architectures, as turbulence itself provides implicit rotational augmentation in both time and space. Using 3D channel-flow subdomains with differing anisotropy, we find that models trained on more isotropic mid-plane data achieve lower equivariance error than those trained on boundary layer data, and that greater temporal or spatial sampling further reduces this error. We show a distinct scale-dependence of equivariance error that occurs regardless of dataset anisotropy that is consistent with Kolmogorov's local isotropy hypothesis. These results clarify when rotational symmetry must be explicitly incorporated into learning algorithms and when it can be obtained directly from turbulence, enabling more efficient and symmetry-aware super-resolution.
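摘要中的"等变误差"可以这样度量:比较model(rot(x))与rot(model(x))的相对差异。下面是针对标量场的极简草图(向量场还需同时旋转分量),仅用于说明概念,并非论文的实验代码。

```python
def rot90(field):
    # Rotate a square 2D field (list of rows) by 90 degrees clockwise.
    n = len(field)
    return [[field[n - 1 - c][r] for c in range(n)] for r in range(n)]

def equivariance_error(model, field):
    # Relative discrepancy between model(rot(x)) and rot(model(x));
    # exactly zero for a rotation-equivariant model.
    a, b = model(rot90(field)), rot90(model(field))
    num = sum((a[i][j] - b[i][j]) ** 2
              for i in range(len(a)) for j in range(len(a)))
    den = sum(v * v for row in b for v in row) or 1.0
    return (num / den) ** 0.5
```

恒等映射与逐点缩放都是旋转等变的,其误差应精确为零;训练得到的CNN则会给出非零值。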


联邦学习|隐私保护|加密(4篇)

【1】Emerging Paradigms for Securing Federated Learning Systems
标题:保护联邦学习系统的新兴范式
链接:https://arxiv.org/abs/2509.21147

作者: Abouelmagd, Amr Hilal
摘要:联邦学习(FL)促进了协作模型训练,同时保持原始数据分散,使其成为利用物联网设备的力量的管道,同时保持本地收集数据的隐私。然而,现有的隐私保护技术存在显著的障碍。诸如多方计算(MPC)、同态加密(HE)和差分隐私(DP)之类的方法通常招致高计算成本并且遭受有限的可扩展性。本调查研究了有望提高FL隐私和效率的新兴方法,包括可信执行环境(TEE),物理不可克隆功能(PUF),量子计算(QC),基于混沌的加密(CBE),神经形态计算(NC)和群智能(SI)。对于每一种模式,我们评估其相关性的FL管道,概述其优势,局限性和实际考虑。最后,我们强调了开放的挑战和前瞻性的研究途径,为推进安全和可扩展的FL系统提供了详细的路线图。
摘要:Federated Learning (FL) facilitates collaborative model training while keeping raw data decentralized, making it a conduit for leveraging the power of IoT devices while maintaining privacy of the locally collected data. However, existing privacy-preserving techniques present notable hurdles. Methods such as Multi-Party Computation (MPC), Homomorphic Encryption (HE), and Differential Privacy (DP) often incur high computational costs and suffer from limited scalability. This survey examines emerging approaches that hold promise for enhancing both privacy and efficiency in FL, including Trusted Execution Environments (TEEs), Physical Unclonable Functions (PUFs), Quantum Computing (QC), Chaos-Based Encryption (CBE), Neuromorphic Computing (NC), and Swarm Intelligence (SI). For each paradigm, we assess its relevance to the FL pipeline, outlining its strengths, limitations, and practical considerations. We conclude by highlighting open challenges and prospective research avenues, offering a detailed roadmap for advancing secure and scalable FL systems.


【2】Improving Early Sepsis Onset Prediction Through Federated Learning
标题:通过联邦学习改进早期脓毒症发病预测
链接:https://arxiv.org/abs/2509.20885

作者: Düsing, Philipp Cimiano
备注:Accepted at the 1st Workshop on Artificial Intelligence for Biomedical Data (AIBio) 2025
摘要:脓毒症发病的早期和准确预测仍然是重症监护的主要挑战,及时发现和后续干预可以显着改善患者的预后。虽然机器学习模型在这一领域显示出了希望,但它们的成功往往受到单个医院和重症监护室(ICU)可用的训练数据的数量和多样性的限制。联邦学习(FL)通过在不需要数据共享的情况下实现跨机构的协作模型训练来解决这个问题,从而保护患者隐私。在这项工作中,我们提出了一种用于脓毒症发作预测的联合、注意力增强的长短期记忆模型,并根据多中心ICU数据进行训练。与依赖固定预测窗口的现有方法不同,我们的模型支持可变预测范围,从而在单个统一模型中实现短期和长期预测。在分析过程中,我们特别强调了通过我们的方法在早期脓毒症检测方面的改进,即,通过进行深入的时间分析来进行大预测窗口的预测。我们的研究结果证明,使用FL不仅提高了整体预测性能(性能接近集中模型),而且对早期脓毒症发作预测特别有益。最后,我们表明,我们选择采用可变的预测窗口,而不是一个固定的窗口不会显着损害性能,但减少了计算,通信和组织开销。
摘要:Early and accurate prediction of sepsis onset remains a major challenge in intensive care, where timely detection and subsequent intervention can significantly improve patient outcomes. While machine learning models have shown promise in this domain, their success is often limited by the amount and diversity of training data available to individual hospitals and Intensive Care Units (ICUs). Federated Learning (FL) addresses this issue by enabling collaborative model training across institutions without requiring data sharing, thus preserving patient privacy. In this work, we propose a federated, attention-enhanced Long Short-Term Memory model for sepsis onset prediction, trained on multi-centric ICU data. Unlike existing approaches that rely on fixed prediction windows, our model supports variable prediction horizons, enabling both short- and long-term forecasting in a single unified model. During analysis, we put particular emphasis on the improvements through our approach in terms of early sepsis detection, i.e., predictions with large prediction windows by conducting an in-depth temporal analysis. Our results prove that using FL does not merely improve overall prediction performance (with performance approaching that of a centralized model), but is particularly beneficial for early sepsis onset prediction. Finally, we show that our choice of employing a variable prediction window rather than a fixed window does not hurt performance significantly but reduces computational, communicational, and organizational overhead.
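摘要中"单一模型支持可变预测窗口"的关键在于训练样本的构造:把预测时域h作为输入的一部分,对每个时间点生成(时间, 时域, 标签)样本。下面是一个自拟的标注草图,onset_idx与horizons等命名均为假设,并非论文的数据管道。

```python
def make_horizon_samples(n_steps, onset_idx, horizons):
    # Variable-horizon labeling (sketch): from one patient series with a
    # known sepsis onset index, emit (time, horizon, label) samples so a
    # single model can be conditioned on the prediction horizon instead
    # of being trained for one fixed window.
    samples = []
    for t in range(n_steps):
        for h in horizons:
            label = int(onset_idx is not None and t < onset_idx <= t + h)
            samples.append((t, h, label))
    return samples
```

同一时间点在不同时域下可得到不同标签,模型因此学会对时域条件化。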


【3】Distribution-Controlled Client Selection to Improve Federated Learning Strategies
标题:分布控制的客户端选择以改进联邦学习策略
链接:https://arxiv.org/abs/2509.20877

作者: Düsing, Philipp Cimiano
备注:Accepted at the 2nd Workshop on Advancements in Federated Learning (WAFL@ECML-PKDD 2024)
摘要:联邦学习(FL)是一种分布式学习范式,允许多个客户端联合训练共享模型,同时保持数据隐私。尽管它在具有严格数据隐私要求的领域具有巨大的潜力,但客户端之间数据不平衡的存在对FL的成功构成威胁,因为它会导致共享模型的性能下降。为了解决这个问题,各种研究提出了对现有FL策略的增强,特别是通过减轻数据不平衡不利影响的客户端选择方法。在本文中,我们提出了对现有FL策略的一种扩展,它选择使当前标签分布与两个目标分布之一(即平衡分布或联邦的组合标签分布)最佳对齐的活跃客户端。随后,我们在三种常见FL策略和两个数据集上,通过分布控制的客户端选择实证验证了改进效果。我们的结果表明,在面临局部不平衡时,将标签分布与平衡分布对齐产生最大的改进,而在全局不平衡情况下,与联邦的组合标签分布对齐则更优。
摘要:Federated learning (FL) is a distributed learning paradigm that allows multiple clients to jointly train a shared model while maintaining data privacy. Despite its great potential for domains with strict data privacy requirements, the presence of data imbalance among clients is a threat to the success of FL, as it causes the performance of the shared model to decrease. To address this, various studies have proposed enhancements to existing FL strategies, particularly through client selection methods that mitigate the detrimental effects of data imbalance. In this paper, we propose an extension to existing FL strategies, which selects active clients that best align the current label distribution with one of two target distributions, namely a balanced distribution or the federation's combined label distribution. Subsequently, we empirically verify the improvements through our distribution-controlled client selection on three common FL strategies and two datasets. Our results show that while aligning the label distribution with a balanced distribution yields the greatest improvements facing local imbalance, alignment with the federation's combined label distribution is superior for global imbalance.
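摘要所述的分布控制客户端选择,可以用一个贪心草图来说明:每一轮加入使累计标签分布(按L1距离)最接近目标分布的客户端。具体选择规则为示意性假设,并非论文采用的精确算法。

```python
def select_clients(client_label_counts, k, target):
    # Greedy sketch: repeatedly add the client that brings the running
    # label distribution closest (L1 distance) to the target distribution
    # (e.g., a balanced distribution, or the federation's combined one).
    def dist(counts):
        total = sum(counts)
        if total == 0:
            return sum(target)
        return sum(abs(c / total - t) for c, t in zip(counts, target))
    selected, running = [], [0] * len(target)
    remaining = dict(enumerate(client_label_counts))
    for _ in range(k):
        best = min(remaining, key=lambda i: dist(
            [r + c for r, c in zip(running, remaining[i])]))
        running = [r + c for r, c in zip(running, remaining.pop(best))]
        selected.append(best)
    return selected
```

以两类任务为例,若目标是均衡分布,拥有均衡标签计数的客户端会被优先选中。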


【4】Personalized Federated Dictionary Learning for Modeling Heterogeneity in Multi-site fMRI Data
标题:用于多站点fMRI数据中异质性建模的个性化联邦字典学习
链接:https://arxiv.org/abs/2509.20627

作者:g, Chengshuo Zhang, Ziyu Zhou, Gang Qu, Hao Zheng, Yuping Wang, Hui Shen, Hongwen Deng
摘要:数据隐私约束对大规模神经成像分析提出了重大挑战,特别是在多位点功能磁共振成像(fMRI)研究中,其中位点特异性异质性导致非独立同分布(非IID)数据。这些因素阻碍了可推广模型的发展。为了解决这些挑战,我们提出了个性化联邦字典学习(PFedDL),一种新的联邦学习框架,可以在不共享原始数据的情况下跨站点进行协作建模。PFedDL在每个站点执行独立的字典学习,将每个站点特定的字典分解为共享的全局组件和个性化的本地组件。全局原子通过联邦聚合进行更新,以促进跨站点的一致性,而局部原子则独立地进行细化,以捕获特定于站点的可变性,从而增强下游分析。在ABIDE数据集上的实验表明,PFedDL在非IID数据集上的准确性和鲁棒性优于现有方法。
摘要:Data privacy constraints pose significant challenges for large-scale neuroimaging analysis, especially in multi-site functional magnetic resonance imaging (fMRI) studies, where site-specific heterogeneity leads to non-independent and identically distributed (non-IID) data. These factors hinder the development of generalizable models. To address these challenges, we propose Personalized Federated Dictionary Learning (PFedDL), a novel federated learning framework that enables collaborative modeling across sites without sharing raw data. PFedDL performs independent dictionary learning at each site, decomposing each site-specific dictionary into a shared global component and a personalized local component. The global atoms are updated via federated aggregation to promote cross-site consistency, while the local atoms are refined independently to capture site-specific variability, thereby enhancing downstream analysis. Experiments on the ABIDE dataset demonstrate that PFedDL outperforms existing methods in accuracy and robustness across non-IID datasets.
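PFedDL的联邦聚合步骤(只平均共享的全局原子,保留各站点的个性化局部原子)可以用如下草图说明;字典以"原子列表"表示,接口与命名均为自拟,并非论文实现。

```python
def aggregate_global_atoms(site_dicts, n_global):
    # Federated step (sketch): average the first n_global atoms across
    # sites; the personalized local atoms after them are left untouched.
    n_sites = len(site_dicts)
    averaged = [
        [sum(site[j][d] for site in site_dicts) / n_sites
         for d in range(len(site_dicts[0][j]))]
        for j in range(n_global)
    ]
    return [averaged + site[n_global:] for site in site_dicts]
```

聚合后所有站点共享相同的全局原子,而局部原子仍保留各自的站点特异性。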


推理|分析|理解|解释(6篇)

【1】ScaleDiff: Scaling Difficult Problems for Advanced Mathematical Reasoning
标题:ScaleDiff:为高级数学推理扩展难题生成
链接:https://arxiv.org/abs/2509.21070

作者:, Zhuoshi Pan, Honglin Lin, Xin Gao, Yu Li, Zinan Tang, Conghui He, Rui Yan, Lijun Wu
备注:15 pages
摘要:大型推理模型(LRM)在解决复杂问题方面表现出令人印象深刻的能力,通常受益于对刺激复杂推理的困难数学问题的训练。最近的工作探索了通过提示(prompting)专有模型或大规模开源模型,从种子数据或固有数学概念出发自动合成数学问题。然而,由于这些方法的高计算/API成本、提示的复杂性以及生成的问题的难度水平有限,因此按比例放大这些方法仍然具有挑战性。为了克服这些限制,我们提出了ScaleDiff,这是一个简单而有效的管道,旨在扩展困难问题的创建。我们使用自适应思维模型从现有数据集中有效地识别困难问题,只需一次向前传递,该模型可以感知问题难度并自动在"思考"和"NoThinking"模式之间切换。然后,我们在这个过滤后的困难数据上训练一个专门的困难问题生成器(DiffGen-8B),它可以大规模地生成新的困难问题,从而消除了对复杂的每个实例提示的需要及其相关的高API成本。在ScaleDiff-Math数据集上微调Qwen2.5-Math-7B-Instruct,与原始数据集相比,性能大幅提升了11.3%,在AIME'24、AIME'25、HMMT-Feb'25、BRUMO'25和MATH500上的平均准确率达到了65.9%,优于OpenThinker3等最近强劲的LRM。值得注意的是,这种性能是使用具有成本效益的Qwen3-8B模型作为教师来实现的,这表明我们的管道可以有效地迁移高级推理能力,而无需依赖更大、更昂贵的教师模型。此外,我们观察到一个明显的缩放现象:随着困难问题数量的增加,模型在困难基准上的性能随之提升。代码:https://github.com/QizhiPei/ScaleDiff。
摘要:Large Reasoning Models (LRMs) have shown impressive capabilities in complex problem-solving, often benefiting from training on difficult mathematical problems that stimulate intricate reasoning. Recent efforts have explored automated synthesis of mathematical problems by prompting proprietary models or large-scale open-source models from seed data or inherent mathematical concepts. However, scaling up these methods remains challenging due to their high computational/API cost, complexity of prompting, and limited difficulty level of the generated problems. To overcome these limitations, we propose ScaleDiff, a simple yet effective pipeline designed to scale the creation of difficult problems. We efficiently identify difficult problems from existing datasets with only a single forward pass using an adaptive thinking model, which can perceive problem difficulty and automatically switch between "Thinking" and "NoThinking" modes. We then train a specialized difficult problem generator (DiffGen-8B) on this filtered difficult data, which can produce new difficult problems in large scale, eliminating the need for complex, per-instance prompting and its associated high API costs. Fine-tuning Qwen2.5-Math-7B-Instruct on the ScaleDiff-Math dataset yields a substantial performance increase of 11.3% compared to the original dataset and achieves a 65.9% average accuracy on AIME'24, AIME'25, HMMT-Feb'25, BRUMO'25, and MATH500, outperforming recent strong LRMs like OpenThinker3. Notably, this performance is achieved using the cost-efficient Qwen3-8B model as a teacher, demonstrating that our pipeline can effectively transfer advanced reasoning capabilities without relying on larger, more expensive teacher models. Furthermore, we observe a clear scaling phenomenon in model performance on difficult benchmarks as the quantity of difficult problems increases. Code: https://github.com/QizhiPei/ScaleDiff.


【2】Toward Robust and Efficient ML-Based GPU Caching for Modern Inference
标题:迈向稳健高效的基于ML的GPU缓存以实现现代推理
链接:https://arxiv.org/abs/2509.20979

作者:, Jiaji Zhang, Hailiang Zhao, Yirong Zhang, Jiahong Yu, Xueyan Tang, Yixuan Wang, Hao Li, Jianping Zou, Gang Xiong, Kingsum Chow, Shuibing He, Shuiguang Deng
摘要:在现代GPU推理中,缓存效率仍然是一个主要瓶颈。在推荐模型中,嵌入命中率在很大程度上决定了吞吐量,而在大型语言模型中,KV缓存未命中大大增加了首次标记时间(TTFT)。启发式策略(如\textsc{LRU})通常在结构化访问模式下难以使用。基于学习的方法很有前途,但在实践中面临两个主要的限制:当预测不准确时,它们会急剧下降,或者由于保守的设计,即使预测准确,它们也几乎没有收获。有些还导致高开销,进一步限制了实用性。   我们提出了\textsc{LCR},这是一个实用的基于学习的GPU缓存框架,可以在确保鲁棒性和效率的同时提高性能。其核心算法\textsc{LARU}通过机器学习预测增强\textsc{LRU},并通过在线误差估计动态适应预测精度。当预测准确时,\textsc{LARU}可实现接近最佳的性能。如果预测不准确,它会优雅地降低到接近\textsc{LRU}的性能。有了LCR,我们弥合了基于学习的缓存的经验进步和理论进步之间的差距。   实验表明,\textsc{LCR}在现实条件下提供一致的增益。在DLRM和LLM场景中,它将吞吐量提高了24.2%,并将P99 TTFT降低了28.3%,优于广泛使用的推理系统。即使在预测不佳的情况下,其性能仍然稳定,表现出实际的鲁棒性。
摘要:In modern GPU inference, cache efficiency remains a major bottleneck. In recommendation models, embedding hit rates largely determine throughput, while in large language models, KV-cache misses substantially increase time-to-first-token (TTFT). Heuristic policies such as \textsc{LRU} often struggle under structured access patterns. Learning-based approaches are promising, but in practice face two major limitations: they degrade sharply when predictions are inaccurate, or they gain little even with accurate predictions due to conservative designs. Some also incur high overhead, further limiting practicality.   We present \textsc{LCR}, a practical framework for learning-based GPU caching that delivers performance gains while ensuring robustness and efficiency. Its core algorithm, \textsc{LARU}, enhances \textsc{LRU} with machine-learned predictions and dynamically adapts to prediction accuracy through online error estimation. When predictions are accurate, \textsc{LARU} achieves near-optimal performance. With inaccurate predictions, it degrades gracefully to near-\textsc{LRU} performance. With \textsc{LCR}, we bridge the gap between empirical progress and theoretical advances in learning-based caching.   Experiments show that \textsc{LCR} delivers consistent gains under realistic conditions. In DLRM and LLM scenarios, it improves throughput by up to 24.2\% and reduces P99 TTFT by up to 28.3\%, outperforming widely used inference systems. Even under poor predictions, its performance remains stable, demonstrating practical robustness.
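LARU"用机器学习预测增强LRU并通过在线误差估计自适应"的思想,可用下面的草图说明:当对预测器的信任度下降时,淘汰策略回退为普通LRU。预测器接口与信任度更新规则均为示意性假设,并非论文的精确设计。

```python
from collections import OrderedDict

class LearnedCache:
    """Sketch of an LRU cache augmented with learned reuse-distance
    predictions; `trust` is an online estimate of predictor accuracy."""

    def __init__(self, capacity, predictor, trust_lr=0.1):
        self.capacity = capacity
        self.predictor = predictor   # key -> predicted reuse distance
        self.data = OrderedDict()    # recency order doubles as LRU queue
        self.trust = 1.0
        self.trust_lr = trust_lr

    def _victim(self):
        if self.trust < 0.5:
            return next(iter(self.data))   # fall back to plain LRU
        # trust the model: evict the key predicted to be reused furthest away
        return max(self.data, key=self.predictor)

    def access(self, key):
        hit = key in self.data
        if hit:
            self.data.move_to_end(key)
        else:
            if len(self.data) >= self.capacity:
                del self.data[self._victim()]
            self.data[key] = True
        # online error estimate: was "predicted near reuse" consistent
        # with whether the access actually hit?
        predicted_near = self.predictor(key) <= self.capacity
        correct = (predicted_near == hit)
        self.trust += self.trust_lr * (float(correct) - self.trust)
        return hit
```

当预测准确时,策略接近预测最优;预测反复出错会把trust压到阈值以下,从而优雅退化为LRU。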


【3】Decoupled-Value Attention for Prior-Data Fitted Networks: GP Inference for Physical Equations
标题:先验数据拟合网络的解耦值注意力:物理方程的GP推理
链接:https://arxiv.org/abs/2509.20950

作者:Sharma, Simardeep Singh, Parikshit Pareek
摘要:先验数据拟合网络(PFN)是一种很有前途的替代方案,可以替代耗时的高斯过程(GP)推理,用于创建物理系统的快速代理。PFN通过将GP中的贝叶斯推理替换为学习预测模型的单次前向传递来减少GP训练的计算负担。然而,在标准的Transformer注意力下,PFN在高维回归任务上显示出有限的有效性。我们引入了解耦值注意力(DVA),其动机来自GP的性质:函数空间完全由输入上的核刻画,且预测均值是训练目标的加权和。DVA仅从输入计算相似性,并仅通过值传播标签。因此,所提出的DVA反映了高斯过程更新,同时保持无核。我们证明,缩放PFN的关键因素是注意力规则,而不是架构本身。具体来说,我们的研究结果表明:(a)局部化注意力一致地降低了不同维度设置中PFN的样本外验证损失,在五维和十维情况下,验证损失减少了50%以上;(b)注意力的作用比主干架构的选择更具决定性,表明基于CNN的PFN可以与基于Transformer的PFN相媲美。所提出的PFN对64维潮流方程进行近似,平均绝对误差量级为1E-3,同时比精确GP推理快80倍以上。
摘要:Prior-data fitted networks (PFNs) are a promising alternative to time-consuming Gaussian Process (GP) inference for creating fast surrogates of physical systems. PFN reduces the computational burden of GP-training by replacing Bayesian inference in GP with a single forward pass of a learned prediction model. However, with standard Transformer attention, PFNs show limited effectiveness on high-dimensional regression tasks. We introduce Decoupled-Value Attention (DVA)-- motivated by the GP property that the function space is fully characterized by the kernel over inputs and the predictive mean is a weighted sum of training targets. DVA computes similarities from inputs only and propagates labels solely through values. Thus, the proposed DVA mirrors the Gaussian-process update while remaining kernel-free. We demonstrate that the crucial factor for scaling PFNs is the attention rule rather than the architecture itself. Specifically, our results demonstrate that (a) localized attention consistently reduces out-of-sample validation loss in PFNs across different dimensional settings, with validation loss reduced by more than 50% in five- and ten-dimensional cases, and (b) the role of attention is more decisive than the choice of backbone architecture, showing that CNN-based PFNs can perform at par with their Transformer-based counterparts. The proposed PFNs provide 64-dimensional power flow equation approximations with a mean absolute error of the order of 1E-3, while being over 80x faster than exact GP inference.
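解耦值注意力的核心:注意力权重只由输入相似度决定(此处用RBF型得分),训练标签只作为值参与加权求和,从而模仿GP预测均值是训练目标加权和的结构。下面是一个纯Python草图,核形式与长度尺度均为示意性假设。

```python
import math

def dva_predict(train_x, train_y, query_x, lengthscale=1.0):
    # Decoupled-value attention (sketch): similarity scores are computed
    # from inputs only; labels enter only as values, so the prediction is
    # a softmax-weighted sum of training targets.
    def score(a, b):
        sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return -sq / (2 * lengthscale ** 2)
    preds = []
    for q in query_x:
        logits = [score(q, x) for x in train_x]
        m = max(logits)                      # stabilize the softmax
        weights = [math.exp(l - m) for l in logits]
        z = sum(weights)
        preds.append(sum(w * y for w, y in zip(weights, train_y)) / z)
    return preds
```

查询点靠近某个训练输入时预测趋近其标签,处于两点正中时恰好给出标签均值。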


【4】GenFacts-Generative Counterfactual Explanations for Multi-Variate Time Series
标题:针对多变量时间序列的GenFacts生成反事实解释
链接:https://arxiv.org/abs/2509.20936

作者:fi, Anass Ibrahimi, Tobias Sukianto, Cecilia Carbonelli, Lorenzo Servadei, Robert Wille
备注:5 pages
摘要:反事实解释旨在通过展示如何最小限度地改变输入来改变预测,从而提高模型的透明度。对于多变量时间序列,现有的方法通常会产生无效的、难以置信的或不直观的反事实。我们介绍GenFacts,一个基于类判别变分自动编码器的生成框架。它集成了对比和分类一致性目标、基于原型的初始化和现实性约束优化。我们将雷达手势数据作为工业用例评估GenFacts,并将手写字母轨迹作为直观的基准。在这两个数据集中,GenFacts在合理性(plausibility)方面优于最先进的基线(+18.7%),并在人类研究中获得了最高的可解释性分数。这些结果强调,对于时间序列数据中可操作的反事实而言,关键在于合理性和以用户为中心的可解释性,而不仅仅是稀疏性。
摘要:Counterfactual explanations aim to enhance model transparency by showing how inputs can be minimally altered to change predictions. For multivariate time series, existing methods often generate counterfactuals that are invalid, implausible, or unintuitive. We introduce GenFacts, a generative framework based on a class-discriminative variational autoencoder. It integrates contrastive and classification-consistency objectives, prototype-based initialization, and realism-constrained optimization. We evaluate GenFacts on radar gesture data as an industrial use case and handwritten letter trajectories as an intuitive benchmark. Across both datasets, GenFacts outperforms state-of-the-art baselines in plausibility (+18.7%) and achieves the highest interpretability scores in a human study. These results highlight that plausibility and user-centered interpretability, rather than sparsity alone, are key to actionable counterfactuals in time series data.
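摘要提到的"基于原型的初始化"可以这样示意:把查询样本向目标类别原型插值,作为反事实搜索的起点,使搜索从接近真实数据的区域出发。以下函数与参数名均为自拟草图,并非GenFacts的原始实现。

```python
def init_from_prototype(x, prototypes, target_class, alpha=0.5):
    # Prototype-based initialization (sketch): start the counterfactual
    # search by interpolating the query toward the target-class prototype.
    proto = prototypes[target_class]
    return [(1 - alpha) * xi + alpha * pi for xi, pi in zip(x, proto)]
```

alpha=0时保持原输入,alpha=1时直接落在原型上;实际系统会在此起点上继续做约束优化。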


【5】A Recovery Theory for Diffusion Priors: Deterministic Analysis of the Implicit Prior Algorithm
Link: https://arxiv.org/abs/2509.20511

Authors: ng, Yann Traonmilin
Abstract: Recovering high-dimensional signals from corrupted measurements is a central challenge in inverse problems. Recent advances in generative diffusion models have shown remarkable empirical success in providing strong data-driven priors, but rigorous recovery guarantees remain limited. In this work, we develop a theoretical framework for analyzing deterministic diffusion-based algorithms for inverse problems, focusing on a deterministic version of the algorithm proposed by Kadkhodaie & Simoncelli [kadkhodaie2021stochastic]. First, we show that when the underlying data distribution concentrates on a low-dimensional model set, the associated noise-convolved scores can be interpreted as time-varying projections onto such a set. This leads to interpreting previous algorithms that use diffusion priors for inverse problems as generalized projected gradient descent methods with varying projections. When the sensing matrix satisfies a restricted isometry property over the model set, we can derive quantitative convergence rates that depend explicitly on the noise schedule. We apply our framework to two instructive data distributions: uniform distributions over low-dimensional compact, convex sets, and low-rank Gaussian mixture models. In the latter setting, we can establish global convergence guarantees despite the nonconvexity of the underlying model set.
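The "generalized projected gradient descent" reading can be illustrated on a toy problem in which the model set is a known 1-D subspace, so the projection is exact and fixed. The paper's setting replaces this with time-varying, score-based projections driven by the noise schedule; everything below is an editorial sketch under that simplifying assumption:

```python
def project(x):
    # Projection onto the model set, here the 1-D subspace {(t, 0)}.
    return (x[0], 0.0)

def recover(A, y, steps=200, eta=0.1):
    """Projected gradient descent sketch: a gradient step on the data
    fidelity ||Ax - y||^2 followed by a projection onto the model set.
    A is a 2x2 matrix as nested lists; y is a length-2 tuple."""
    x = (0.0, 0.0)
    for _ in range(steps):
        # residual r = Ax - y
        r = (A[0][0] * x[0] + A[0][1] * x[1] - y[0],
             A[1][0] * x[0] + A[1][1] * x[1] - y[1])
        # gradient of the data fidelity: A^T r
        g = (A[0][0] * r[0] + A[1][0] * r[1],
             A[0][1] * r[0] + A[1][1] * r[1])
        x = (x[0] - eta * g[0], x[1] - eta * g[1])
        x = project(x)
    return x
```

With measurements generated from a signal on the model set, the iterates converge to that signal, which is the behavior the paper's restricted-isometry analysis quantifies.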


【6】A Comparative Analysis of Ensemble-Based Machine Learning Approaches with Explainable AI for Multi-Class Intrusion Detection in Drone Networks
Link: https://arxiv.org/abs/2509.20391

Authors: ir Hossain, Waqas Ishtiaq, Md. Samiul Islam
Note: 27 pages, 18 figures, 10 tables
Abstract: The growing integration of drones into civilian, commercial, and defense sectors introduces significant cybersecurity concerns, particularly the increased risk of network-based intrusions targeting drone communication protocols. Detecting and classifying these intrusions is inherently challenging due to the dynamic nature of drone traffic and the presence of multiple sophisticated attack vectors such as spoofing, injection, replay, and man-in-the-middle (MITM) attacks. This research aims to develop a robust and interpretable intrusion detection framework tailored for drone networks, with a focus on handling multi-class classification and model explainability. We present a comparative analysis of ensemble-based machine learning models, namely Random Forest, Extra Trees, AdaBoost, CatBoost, and XGBoost, trained on a labeled dataset comprising benign traffic and nine distinct intrusion types. Comprehensive data preprocessing was performed, including missing-value imputation, scaling, and categorical encoding, followed by model training and extensive evaluation using metrics such as macro F1-score, ROC AUC, Matthews Correlation Coefficient, and Log Loss. Random Forest achieved the highest performance, with a macro F1-score of 0.9998 and ROC AUC of 1.0000. To validate the superiority of the models, statistical tests, including Friedman's test, the Wilcoxon signed-rank test with Holm correction, and bootstrapped confidence intervals, were applied. Furthermore, explainable AI methods, SHAP and LIME, were integrated to interpret both global and local feature importance, enhancing model transparency and decision trustworthiness. The proposed approach not only delivers near-perfect accuracy but also ensures interpretability, making it highly suitable for real-time and safety-critical drone operations.


Detection (3 papers)

【1】The Unwinnable Arms Race of AI Image Detection
Link: https://arxiv.org/abs/2509.21135

Authors: l, Lorenzo Vettor, Andreas Plesner, Roger Wattenhofer
Abstract: The rapid progress of image-generative AI has blurred the boundary between synthetic and real images, fueling an arms race between generators and discriminators. This paper investigates the conditions under which discriminators are most disadvantaged in this competition. We analyze two key factors: data dimensionality and data complexity. While increased dimensionality often strengthens the discriminator's ability to detect subtle inconsistencies, complexity introduces a more nuanced effect. Using Kolmogorov complexity as a measure of intrinsic dataset structure, we show that both very simple and highly complex datasets reduce the detectability of synthetic images; generators can learn simple datasets almost perfectly, whereas extreme diversity masks imperfections. In contrast, intermediate-complexity datasets create the most favorable conditions for detection, as generators fail to fully capture the distribution and their errors remain visible.


【2】Every Character Counts: From Vulnerability to Defense in Phishing Detection
Link: https://arxiv.org/abs/2509.20589

Authors: per, Radu Tudor Ionescu
Note: Accepted at ICTAI 2025
Abstract: Phishing attacks targeting both organizations and individuals are becoming an increasingly significant threat as technology advances. Current automatic detection methods often lack explainability and robustness in detecting new phishing attacks. In this work, we investigate the effectiveness of character-level deep learning models for phishing detection, which can provide both robustness and interpretability. We evaluate three neural architectures adapted to operate at the character level, namely CharCNN, CharGRU, and CharBiLSTM, on a custom-built email dataset that combines data from multiple sources. Their performance is analyzed under three scenarios: (i) standard training and testing, (ii) standard training and testing under adversarial attacks, and (iii) training and testing with adversarial examples. Aiming to develop a tool that operates as a browser extension, we test all models under limited computational resources. In this constrained setup, CharGRU proves to be the best-performing model across all scenarios. All models show vulnerability to adversarial attacks, but adversarial training substantially improves their robustness. In addition, by adapting the Gradient-weighted Class Activation Mapping (Grad-CAM) technique to character-level inputs, we are able to visualize which parts of each email influence the decision of each model. Our open-source code and data are released at https://github.com/chipermaria/every-character-counts.
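The character-level input representation shared by models like CharCNN, CharGRU, and CharBiLSTM can be sketched as follows. The vocabulary construction, padding index, and maximum length here are illustrative assumptions, not the paper's actual preprocessing:

```python
def build_vocab(texts):
    """Build a character-to-index map; index 0 is reserved for padding
    and index 1 for out-of-vocabulary characters."""
    chars = sorted({c for t in texts for c in t})
    return {c: i + 2 for i, c in enumerate(chars)}

def encode(text, vocab, max_len=16):
    """Map an email body to a fixed-length sequence of character indices,
    truncating long texts and zero-padding short ones to max_len."""
    ids = [vocab.get(c, 1) for c in text[:max_len]]
    return ids + [0] * (max_len - len(ids))
```

A sequence encoded this way is what a character-level embedding layer would consume, and it is also the granularity at which a Grad-CAM adaptation can attribute the decision back to individual characters.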


【3】Leveraging NTPs for Efficient Hallucination Detection in VLMs
Link: https://arxiv.org/abs/2509.20379

Authors: hi, Kfir Eliyahu, Eyal El Ani, Rom Himelstein, Roi Reichart, Yuval Pinter, Nitay Calderon
Abstract: Hallucinations of vision-language models (VLMs), which are misalignments between visual content and generated text, undermine the reliability of VLMs. One common approach for detecting them employs the same VLM, or a different one, to assess generated outputs. This process is computationally intensive and increases model latency. In this paper, we explore an efficient on-the-fly method for hallucination detection by training traditional ML models over signals based on the VLM's next-token probabilities (NTPs). NTPs provide a direct quantification of model uncertainty. We hypothesize that high uncertainty (i.e., a low NTP value) is strongly associated with hallucinations. To test this, we introduce a dataset of 1,400 human-annotated statements derived from VLM-generated content, each labeled as hallucinated or not, and use it to test our NTP-based lightweight method. Our results demonstrate that NTP-based features are valuable predictors of hallucinations, enabling fast and simple ML models to achieve performance comparable to that of strong VLMs. Furthermore, augmenting these NTPs with linguistic NTPs, computed by feeding only the generated text back into the VLM, enhances hallucination detection performance. Finally, integrating hallucination prediction scores from VLMs into the NTP-based models led to better performance than using either VLMs or NTPs alone. We hope this study paves the way for simple, lightweight solutions that enhance the reliability of VLMs.
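The kind of NTP-derived uncertainty features such a lightweight detector might consume can be sketched as below. The specific feature set and threshold are illustrative assumptions; the paper trains ML models on its own signals rather than using a fixed cutoff:

```python
import math

def ntp_features(token_probs):
    """Summarize a sequence of next-token probabilities (NTPs) into simple
    uncertainty features: minimum NTP, mean NTP, and mean negative
    log-likelihood of the generated tokens."""
    nll = [-math.log(p) for p in token_probs]
    return {
        "min_prob": min(token_probs),
        "mean_prob": sum(token_probs) / len(token_probs),
        "mean_nll": sum(nll) / len(nll),
    }

def flag_hallucination(token_probs, min_prob_threshold=0.05):
    # The hypothesis under test: very low NTPs (high model uncertainty)
    # are strongly associated with hallucinated statements.
    return ntp_features(token_probs)["min_prob"] < min_prob_threshold
```

In practice these per-statement features would be fed to a small classifier (e.g., logistic regression) trained on labeled statements like the 1,400 annotated here.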


Classification & Recognition (6 papers)

【1】Learning Conformal Explainers for Image Classifiers
Link: https://arxiv.org/abs/2509.21209

Authors: tib, Stephanie Lowry
Abstract: Feature attribution methods are widely used for explaining image-based predictions, as they provide feature-level insights that can be intuitively visualized. However, such explanations often vary in their robustness and may fail to faithfully reflect the reasoning of the underlying black-box model. To address these limitations, we propose a novel conformal prediction-based approach that enables users to directly control the fidelity of the generated explanations. The method identifies a subset of salient features that is sufficient to preserve the model's prediction, regardless of the information carried by the excluded features, and without demanding access to ground-truth explanations for calibration. Four conformity functions are proposed to quantify the extent to which explanations conform to the model's predictions. The approach is empirically evaluated using five explainers across six image datasets. The empirical results demonstrate that FastSHAP consistently outperforms the competing methods in terms of both fidelity and informational efficiency, the latter measured by the size of the explanation regions. Furthermore, the results reveal that conformity measures based on super-pixels are more effective than their pixel-wise counterparts.


【2】Design, Implementation and Evaluation of a Novel Programming Language Topic Classification Workflow
Link: https://arxiv.org/abs/2509.20631

Authors: hang, Yuan Tian, Mariam Guizani
Abstract: As software systems grow in scale and complexity, understanding the distribution of programming language topics within source code becomes increasingly important for guiding technical decisions, improving onboarding, and informing tooling and education. This paper presents the design, implementation, and evaluation of a novel programming language topic classification workflow. Our approach combines a multi-label Support Vector Machine (SVM) with a sliding-window and voting strategy to enable fine-grained localization of core language concepts such as operator overloading, virtual functions, inheritance, and templates. Trained on the IBM Project CodeNet dataset, our model achieves an average F1 score of 0.90 across topics and 0.75 in code-topic highlighting. Our findings contribute empirical insights and a reusable pipeline for researchers and practitioners interested in code analysis and data-driven software engineering.


【3】Region-of-Interest Augmentation for Mammography Classification under Patient-Level Cross-Validation
Link: https://arxiv.org/abs/2509.20585

Authors: gdeli, Mohsen Mohammadagha, Ali Bigdeli
Note: 5 pages, 5 figures, 2 tables
Abstract: Breast cancer screening with mammography remains central to early detection and mortality reduction. Deep learning has shown strong potential for automating mammogram interpretation, yet limited-resolution datasets and small sample sizes continue to restrict performance. We revisit the Mini-DDSM dataset (9,684 images; 2,414 patients) and introduce a lightweight region-of-interest (ROI) augmentation strategy. During training, full images are probabilistically replaced with random ROI crops sampled from a precomputed, label-free bounding-box bank, with optional jitter to increase variability. We evaluate under strict patient-level cross-validation and report ROC-AUC, PR-AUC, and training-time efficiency metrics (throughput and GPU memory). Because ROI augmentation is training-only, inference-time cost remains unchanged. On Mini-DDSM, ROI augmentation (best: p_roi = 0.10, alpha = 0.10) yields modest average ROC-AUC gains, with performance varying across folds; PR-AUC is flat to slightly lower. These results demonstrate that simple, data-centric ROI strategies can enhance mammography classification in constrained settings without requiring additional labels or architectural modifications.


【4】Innovative Deep Learning Architecture for Enhanced Altered Fingerprint Recognition
Link: https://arxiv.org/abs/2509.20537

Authors: dullah, Dana Rasul Hamad, Bishar Rasheed Ibrahim, Sirwan Abdulwahid Aula, Aso Khaleel Ameen, Sabat Salih Hamadamin
Abstract: Altered fingerprint recognition (AFR) is challenging for biometric verification in applications such as border control, forensics, and fiscal admission. Adversaries can deliberately modify ridge patterns to evade detection, so robust recognition of altered prints is essential. We present DeepAFRNet, a deep learning recognition model that matches and recognizes distorted fingerprint samples. The approach uses a VGG16 backbone to extract high-dimensional features and cosine similarity to compare embeddings. We evaluate on the SOCOFing Real-Altered subset with three difficulty levels (Easy, Medium, Hard). With strict thresholds, DeepAFRNet achieves accuracies of 96.7 percent, 98.76 percent, and 99.54 percent for the three levels. A threshold-sensitivity study shows that relaxing the threshold from 0.92 to 0.72 sharply degrades accuracy to 7.86 percent, 27.05 percent, and 29.51 percent, underscoring the importance of threshold selection in biometric systems. By using real altered samples and reporting per-level metrics, DeepAFRNet addresses limitations of prior work based on synthetic alterations or limited verification protocols, and indicates readiness for real-world deployments where both security and recognition resilience are critical.


【5】A Compound Classification System Based on Fuzzy Relations Applied to the Noise-Tolerant Control of a Bionic Hand via EMG Signal Recognition
Link: https://arxiv.org/abs/2509.20523

Authors: jdos, Marek Kurzynski
Abstract: Modern anthropomorphic upper-limb bioprostheses are typically controlled by electromyographic (EMG) biosignals using a pattern recognition scheme. Unfortunately, many factors, originating both from the human source of the objects to be classified and from the human-prosthesis interface, make it difficult to obtain an acceptable classification quality. One of these factors is the high susceptibility of biosignals to contamination, which can considerably reduce the classification quality of a recognition system. In this paper, the authors propose a new recognition system intended for EMG-based control of a hand prosthesis, with detection of contaminated biosignals in order to mitigate their adverse effect. The system consists of two ensembles: a set of one-class classifiers (OCC) to assess the degree of contamination of individual channels, and an ensemble of K-nearest-neighbour (KNN) classifiers to recognize the patient's intent. For the whole recognition system, an original, coherent fuzzy model was developed, which allows the use of a uniform soft (fuzzy) decision scheme throughout the recognition process. The experimental evaluation was conducted using real biosignals from a public repository. The goal was to provide an experimental comparative analysis of the parameters and procedures on which the quality of the developed recognition system depends. The proposed fuzzy recognition system was also compared with similar systems described in the literature.


【6】Speaker Style-Aware Phoneme Anchoring for Improved Cross-Lingual Speech Emotion Recognition
Link: https://arxiv.org/abs/2509.20373

Authors: Upadhyay, Carlos Busso, Chi-Chun Lee
Abstract: Cross-lingual speech emotion recognition (SER) remains a challenging task due to differences in phonetic variability and speaker-specific expressive styles across languages. Effectively capturing emotion under such diverse conditions requires a framework that can align the externalization of emotions across different speakers and languages. To address this problem, we propose a speaker-style-aware phoneme anchoring framework that aligns emotional expression at the phonetic and speaker levels. Our method builds emotion-specific speaker communities via graph-based clustering to capture shared speaker traits. Using these groups, we apply dual-space anchoring in speaker and phonetic spaces to enable better emotion transfer across languages. Evaluations on the MSP-Podcast (English) and BIIC-Podcast (Taiwanese Mandarin) corpora demonstrate improved generalization over competitive baselines and provide valuable insights into the commonalities of cross-lingual emotion representation.


Representation Learning (2 papers)

【1】Alignment Unlocks Complementarity: A Framework for Multiview Circuit Representation Learning
Link: https://arxiv.org/abs/2509.20968

Authors: Shi, Jingxin Wang, Wentao Jiang, Chengyu Ma, Ziyang Zheng, Zhufei Chu, Weikang Qian, Qiang Xu
Abstract: Multiview learning on Boolean circuits holds immense promise, as different graph-based representations offer complementary structural and semantic information. However, the vast structural heterogeneity between views, such as an And-Inverter Graph (AIG) versus an XOR-Majority Graph (XMG), poses a critical barrier to effective fusion, especially for self-supervised techniques like masked modeling. Naively applying such methods fails, as the cross-view context is perceived as noise. Our key insight is that functional alignment is a necessary precondition to unlock the power of multiview self-supervision. We introduce MixGate, a framework built on a principled training curriculum that first teaches the model a shared, function-aware representation space via an Equivalence Alignment Loss. Only then do we introduce a multiview masked modeling objective, which can now leverage the aligned views as a rich, complementary signal. Extensive experiments, including a crucial ablation study, demonstrate that our alignment-first strategy transforms masked modeling from an ineffective technique into a powerful performance driver.


【2】Function Spaces Without Kernels: Learning Compact Hilbert Space Representations
Link: https://arxiv.org/abs/2509.20605

Authors: w, Quentin Rommel, Kevin S. Miller, Adam J. Thorpe, Ufuk Topcu
Note: Submitted to ICLR 2026
Abstract: Function encoders are a recent technique that learns neural network basis functions to form compact, adaptive representations of Hilbert spaces of functions. We show that function encoders provide a principled connection to feature learning and kernel methods by defining a kernel through an inner product of the learned feature map. This kernel-theoretic perspective explains their ability to scale independently of dataset size while adapting to the intrinsic structure of data, and it enables kernel-style analysis of neural models. Building on this foundation, we develop two training algorithms that learn compact bases: a progressive training approach that constructively grows bases, and a train-then-prune approach that offers a computationally efficient alternative after training. Both approaches use principles from PCA to reveal the intrinsic dimension of the learned space. In parallel, we derive finite-sample generalization bounds using Rademacher complexity and PAC-Bayes techniques, providing inference-time guarantees. We validate our approach on a polynomial benchmark with a known intrinsic dimension, and on nonlinear dynamical systems including a Van der Pol oscillator and a two-body orbital model, demonstrating that the same accuracy can be achieved with substantially fewer basis functions. This work suggests a path toward neural predictors with kernel-level guarantees, enabling adaptable models that are both efficient and principled at scale.
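The central construction, a kernel defined as the inner product of a learned feature map phi(x) = (g_1(x), ..., g_k(x)), can be sketched directly. Polynomial basis functions stand in here for trained neural network bases:

```python
def make_kernel(basis):
    """Function-encoder kernel sketch: given basis functions g_1..g_k, the
    induced kernel is k(x, y) = sum_i g_i(x) * g_i(y), i.e. the inner
    product of the feature map phi(x) = (g_1(x), ..., g_k(x))."""
    def kernel(x, y):
        return sum(g(x) * g(y) for g in basis)
    return kernel

# Illustrative basis: 1, x, x^2 (standing in for learned network bases).
basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]
k = make_kernel(basis)
```

By construction the kernel is symmetric and positive semidefinite, which is what makes kernel-style analysis (and pruning of low-variance basis directions, PCA-style) applicable to the neural model.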


3D & 3D Reconstruction (2 papers)

【1】Large Pre-Trained Models for Bimanual Manipulation in 3D
Link: https://arxiv.org/abs/2509.20579

Authors: chyk, Wei-Di Chang, Gregory Dudek, David Meger
Note: Accepted to the 2025 IEEE-RAS 24th International Conference on Humanoid Robots
Abstract: We investigate the integration of attention maps from a pre-trained Vision Transformer into voxel representations to enhance bimanual robotic manipulation. Specifically, we extract attention maps from DINOv2, a self-supervised ViT model, and interpret them as pixel-level saliency scores over RGB images. These maps are lifted into a 3D voxel grid, resulting in voxel-level semantic cues that are incorporated into a behavior cloning policy. When integrated into a state-of-the-art voxel-based policy, our attention-guided featurization yields an average absolute improvement of 8.2% and a relative gain of 21.9% across all tasks in the RLBench bimanual benchmark.


【2】SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent
Link: https://arxiv.org/abs/2509.20414

Authors: ng, Baoxiong Jia, Shujie Zhang, Siyuan Huang
Note: Accepted by NeurIPS 2025, 26 pages
Abstract: Indoor scene synthesis has become increasingly important with the rise of Embodied AI, which requires 3D environments that are not only visually realistic but also physically plausible and functionally diverse. While recent approaches have advanced visual fidelity, they often remain constrained to fixed scene categories, lack sufficient object-level detail and physical consistency, and struggle to align with complex user instructions. In this work, we present SceneWeaver, a reflective agentic framework that unifies diverse scene synthesis paradigms through tool-based iterative refinement. At its core, SceneWeaver employs a language model-based planner to select from a suite of extensible scene generation tools, ranging from data-driven generative models to visual- and LLM-based methods, guided by self-evaluation of physical plausibility, visual realism, and semantic alignment with user input. This closed-loop reason-act-reflect design enables the agent to identify semantic inconsistencies, invoke targeted tools, and update the environment over successive iterations. Extensive experiments on both common and open-vocabulary room types demonstrate that SceneWeaver not only outperforms prior methods on physical, visual, and semantic metrics, but also generalizes effectively to complex scenes with diverse instructions, marking a step toward general-purpose 3D environment generation. Project website: https://scene-weaver.github.io/.


Encoders (1 paper)

【1】Behind RoPE: How Does Causal Mask Encode Positional Information?
Link: https://arxiv.org/abs/2509.21042

Authors: Xiao Liu, Zhenghao Lin, Lei Ji, Yeyun Gong, Edward Choi
Note: Code available at: this https URL
Abstract: While explicit positional encodings such as RoPE are a primary source of positional information in Transformer decoders, the causal mask also provides positional information. In this work, we prove that the causal mask can induce position-dependent patterns in attention scores, even without parameters or causal dependency in the input. Our theoretical analysis indicates that the induced attention pattern tends to favor nearby query-key pairs, mirroring the behavior of common positional encodings. Empirical analysis confirms that trained models exhibit the same behavior, with learned parameters further amplifying these patterns. Notably, we found that the interaction of the causal mask and RoPE distorts RoPE's relative attention-score patterns into non-relative ones. We consistently observed this effect in modern large language models, suggesting the importance of considering the causal mask as a source of positional information alongside explicit positional encodings.
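The parameter-free core of the claim is easy to demonstrate: even with identical raw attention scores, the causal mask makes the normalized weights depend on position. A minimal sketch (not the paper's analysis):

```python
import math

def causal_attention_row(i, scores):
    """Attention weights for query position i under a causal mask: only
    keys 0..i are visible. With identical raw scores (no parameters, no
    input dependence), softmax over the visible prefix gives each key
    weight 1/(i+1), so the weight pattern varies with position i."""
    visible = scores[:i + 1]
    m = max(visible)
    exps = [math.exp(s - m) for s in visible]
    z = sum(exps)
    return [e / z for e in exps]
```

Row 0 puts all its mass on key 0 while row i spreads mass as 1/(i+1), so early keys accumulate more total attention across rows, the position-dependent pattern the paper formalizes.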


Optimization & Convergence (6 papers)

【1】Optimal Robust Recourse with $L^p$-Bounded Model Change
Link: https://arxiv.org/abs/2509.21293

Authors: w, Kshitij Kayastha, Shahin Jabbari
Abstract: Recourse provides individuals who received undesirable labels (e.g., denied a loan) from algorithmic decision-making systems with a minimum-cost improvement suggestion to achieve the desired outcome. However, in practice, models often get updated to reflect changes in the data distribution or environment, invalidating the recourse recommendations (i.e., following the recourse will not lead to the desirable outcome). The robust recourse literature addresses this issue by providing a framework for computing recourses whose validity is resilient to slight changes in the model. However, since the optimization problem of computing robust recourse is non-convex (even for linear models), most current approaches have no theoretical guarantee on the optimality of the recourse. Recent work by Kayastha et al. provides the first provably optimal algorithm for robust recourse with respect to generalized linear models when the model changes are measured using the $L^{\infty}$ norm. However, using the $L^{\infty}$ norm can lead to recourse solutions with a high price. To address this shortcoming, we consider more constrained model changes defined by the $L^p$ norm, where $p \geq 1$ but $p \neq \infty$, and provide a new algorithm that provably computes the optimal robust recourse for generalized linear models. Empirically, for both linear and non-linear models, we demonstrate that our algorithm achieves a significantly lower price of recourse (up to several orders of magnitude) compared to prior work and also exhibits a better trade-off between the implementation cost of recourse and its validity. Our empirical analysis also shows that our approach provides sparser recourses compared to prior work and remains resilient to post-processing approaches that guarantee feasibility.
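For a linear classifier f(x) = w.x + b, validity that survives every model change with ||dw||_p <= delta reduces, by Holder's inequality, to w.x + b >= delta * ||x||_q with q the dual norm of p. The sketch below checks that condition and finds a recourse by a naive line search along w; this is an editorial illustration of robust validity only, not the paper's provably optimal algorithm:

```python
import math

def robust_valid(w, b, x, delta, q=2):
    """True iff the positive prediction at x survives every weight
    perturbation dw with ||dw||_p <= delta (q is the dual norm of p):
    worst case is w.x + b - delta * ||x||_q >= 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_q = sum(abs(xi) ** q for xi in x) ** (1.0 / q)
    return score >= delta * norm_q

def line_search_recourse(w, b, x, delta, step=1e-3, max_steps=100000):
    """Naive recourse sketch: walk along the model's weight direction
    (steepest increase of the score) until robust validity holds."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    unit = [wi / norm_w for wi in w]
    cur = list(x)
    for _ in range(max_steps):
        if robust_valid(w, b, cur, delta):
            return cur
        cur = [ci + step * ui for ci, ui in zip(cur, unit)]
    return None
```

Note the robust condition is harder to satisfy than plain validity because moving x also grows ||x||_q, which is one source of the non-convexity discussed above.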


【2】humancompatible.train: Implementing Optimization Algorithms for Stochastically-Constrained Stochastic Optimization Problems
标题:humancompatible.train:实现随机约束随机优化问题的优化算法
链接:https://arxiv.org/abs/2509.21254

作者:iachkin, Jana Lepšová, Gilles Bareilles, Jakub Mareček
备注:Accepted at NeurIPS workshop COML 2025
摘要:最近,人们对深度神经网络(DNN)的约束训练产生了相当大的兴趣,用于公平性和安全性等应用。已经为这项任务提出了几个工具包,但仍然没有行业标准。我们提出了humancompatible.train(https://github.com/humancompatible/train),这是一个易于扩展的基于PyTorch的Python包,用于训练具有随机约束的DNN。我们实现了多个以前未实现的随机约束随机优化算法。我们通过在具有公平性约束的深度学习任务上比较两种算法来演示该工具包的使用。
摘要:There has been a considerable interest in constrained training of deep neural networks (DNNs) recently for applications such as fairness and safety. Several toolkits have been proposed for this task, yet there is still no industry standard. We present humancompatible.train (https://github.com/humancompatible/train), an easily-extendable PyTorch-based Python package for training DNNs with stochastic constraints. We implement multiple previously unimplemented algorithms for stochastically constrained stochastic optimization. We demonstrate the toolkit use by comparing two algorithms on a deep learning task with fairness constraints.


【3】Bispectral OT: Dataset Comparison using Symmetry-Aware Optimal Transport
标题:双谱OT:使用对称性感知最优传输的数据集比较
链接:https://arxiv.org/abs/2509.20678

作者:a, Kaiying Hou, David Alvarez-Melis, Melanie Weber
备注:Accepted to NeurIPS 2025 Workshop on Symmetry and Geometry in Neural Representations (NeurReps)
摘要:最优传输(OT)是机器学习、图形学和视觉中广泛使用的技术,它利用两个分布或数据集的相对几何来对齐它们。然而,在对称性丰富的环境中,仅基于原始特征之间成对几何距离的OT对齐可能会忽略数据的内在相干结构。我们介绍了双谱最优传输(Bispectral Optimal Transport),这是离散OT的一个对称性感知扩展,它使用元素的双谱表示来比较元素;双谱是一种群傅里叶不变量,在仅去除由群作用引起的变化的同时保留所有信号结构。经验上,我们证明了用双谱OT计算的传输方案在经视觉对称变换的基准数据集上比朴素特征OT实现了更高的类保留精度,提高了捕获数据集中底层语义标签结构的有意义对应的质量,同时去除了不影响类别或内容的扰动变化。
摘要 :Optimal transport (OT) is a widely used technique in machine learning, graphics, and vision that aligns two distributions or datasets using their relative geometry. In symmetry-rich settings, however, OT alignments based solely on pairwise geometric distances between raw features can ignore the intrinsic coherence structure of the data. We introduce Bispectral Optimal Transport, a symmetry-aware extension of discrete OT that compares elements using their representation using the bispectrum, a group Fourier invariant that preserves all signal structure while removing only the variation due to group actions. Empirically, we demonstrate that the transport plans computed with Bispectral OT achieve greater class preservation accuracy than naive feature OT on benchmark datasets transformed with visual symmetries, improving the quality of meaningful correspondences that capture the underlying semantic label structure in the dataset while removing nuisance variation not affecting class or content.


【4】Complexity-Driven Policy Optimization
标题:复杂性驱动的策略优化
链接:https://arxiv.org/abs/2509.20509

作者:ilippi, Giorgio Franceschelli, Antonio Corradi, Mirco Musolesi
摘要:策略梯度方法通常通过熵最大化来平衡利用与探索。然而,最大化熵会将策略推向均匀随机分布,这代表一种非结构化且有时低效的探索策略。在这项工作中,我们建议用更鲁棒的复杂度奖励取代熵奖励。特别地,我们采用一种复杂度度量,定义为香农熵与非平衡度的乘积,后者量化与均匀分布的距离。这个正则化项鼓励在随机性(高熵)与结构(高非平衡度)之间取得平衡的策略,引导智能体走向能够涌现有用、非平凡行为的区域。之所以会涌现这种行为,是因为正则化项抑制了两个极端(最大无序与完全有序),从而为智能体发现结构化但又有适应性的策略创造了压力。从近端策略优化(PPO)出发,我们引入了复杂度驱动的策略优化(CDPO),这是一种用复杂度取代熵的新学习算法。我们在一系列离散动作空间任务上的经验结果表明,CDPO对复杂度系数选择的鲁棒性优于PPO对熵系数的鲁棒性,尤其是在需要更多探索的环境中。
摘要:Policy gradient methods often balance exploitation and exploration via entropy maximization. However, maximizing entropy pushes the policy towards a uniform random distribution, which represents an unstructured and sometimes inefficient exploration strategy. In this work, we propose replacing the entropy bonus with a more robust complexity bonus. In particular, we adopt a measure of complexity, defined as the product of Shannon entropy and disequilibrium, where the latter quantifies the distance from the uniform distribution. This regularizer encourages policies that balance stochasticity (high entropy) with structure (high disequilibrium), guiding agents toward regimes where useful, non-trivial behaviors can emerge. Such behaviors arise because the regularizer suppresses both extremes, e.g., maximal disorder and complete order, creating pressure for agents to discover structured yet adaptable strategies. Starting from Proximal Policy Optimization (PPO), we introduce Complexity-Driven Policy Optimization (CDPO), a new learning algorithm that replaces entropy with complexity. We show empirically across a range of discrete action space tasks that CDPO is more robust to the choice of the complexity coefficient than PPO is with the entropy coefficient, especially in environments requiring greater exploration.
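作为示意,下面用NumPy实现摘要所述"香农熵×非平衡度"形式的复杂度度量(类似LMC复杂度);论文中CDPO使用的具体归一化可能不同,此处仅为假设性草图。

```python
import numpy as np

def complexity(p, eps=1e-12):
    """Shannon entropy times disequilibrium (squared distance to the
    uniform distribution). Both extremes -- a one-hot policy (order)
    and a uniform policy (disorder) -- score zero; the exact
    normalisation used by CDPO may differ from this sketch."""
    p = np.asarray(p, dtype=float)
    n = p.size
    H = -np.sum(p * np.log(p + eps))      # entropy: 0 for a one-hot policy
    D = np.sum((p - 1.0 / n) ** 2)        # disequilibrium: 0 for uniform
    return H * D

print(complexity([1.0, 0.0, 0.0, 0.0]))      # ~0: fully ordered
print(complexity([0.25, 0.25, 0.25, 0.25]))  # 0: fully random
print(complexity([0.7, 0.1, 0.1, 0.1]))      # > 0: structured yet stochastic
```

与单独的熵奖励不同,该度量在均匀策略与确定性策略处同时取零,只对"既随机又有结构"的策略给出正奖励。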


【5】Breaking the curse of dimensionality for linear rules: optimal predictors over the ellipsoid
标题:打破线性规则的维度诅咒:椭圆体上的最佳预测器
链接:https://arxiv.org/abs/2509.21174

作者:me, Bruno Loureiro
摘要:在这项工作中,我们回答以下问题:需要什么样的最小结构假设,才能防止统计学习界随维数增加而退化?我们在从$n$个独立线性观测$Y_i = X_i^{\top}\theta + \epsilon_i$估计信号这一经典统计设定中研究这个问题。我们关注一大类可表示为训练标签线性组合的预测器$f(X)= \sum_{i=1}^{n} l_{i}(X) Y_i$的泛化性能。这一类预测器通常被称为线性预测规则,涵盖了广泛流行的参数与非参数估计器,包括岭回归、梯度下降和核方法。我们的贡献有两方面。首先,在贝叶斯预测器$\theta$位于一个椭球内的假设下,我们为这一类推导出泛化误差的非渐近上界与下界。其次,当贝叶斯预测器固定时,我们为旋转不变线性预测规则这一子类建立了下界。我们的分析强调了风险的两个基本组成:(a)捕获数据内在维度的类方差项;(b)无噪声误差,一个专门在高维区域出现的项。这些发现揭示了结构性假设在缓解维数灾难中的作用。
摘要:In this work, we address the following question: What minimal structural assumptions are needed to prevent the degradation of statistical learning bounds with increasing dimensionality? We investigate this question in the classical statistical setting of signal estimation from $n$ independent linear observations $Y_i = X_i^{\top}\theta + \epsilon_i$. Our focus is on the generalization properties of a broad family of predictors that can be expressed as linear combinations of the training labels, $f(X) = \sum_{i=1}^{n} l_{i}(X) Y_i$. This class -- commonly referred to as linear prediction rules -- encompasses a wide range of popular parametric and non-parametric estimators, including ridge regression, gradient descent, and kernel methods. Our contributions are twofold. First, we derive non-asymptotic upper and lower bounds on the generalization error for this class under the assumption that the Bayes predictor $\theta$ lies in an ellipsoid. Second, we establish a lower bound for the subclass of rotationally invariant linear prediction rules when the Bayes predictor is fixed. Our analysis highlights two fundamental contributions to the risk: (a) a variance-like term that captures the intrinsic dimensionality of the data; (b) the noiseless error, a term that arises specifically in the high-dimensional regime. These findings shed light on the role of structural assumptions in mitigating the curse of dimensionality.
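为说明"线性预测规则"这一类(预测可写成训练标签的线性组合$f(X)=\sum_i l_i(X)Y_i$),下面的草图验证岭回归的预测确实可以写成这种形式,其中权重$l_i(x)$只依赖输入而不依赖标签;仅为示意,并非论文代码。

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 50, 5, 0.1
Xtr = rng.normal(size=(n, d))
y = rng.normal(size=n)
x_new = rng.normal(size=d)

# Direct ridge fit: theta = (Xtr'Xtr + lam I)^{-1} Xtr' y
theta = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(d), Xtr.T @ y)
f_direct = x_new @ theta

# The same predictor as a linear prediction rule f(x) = sum_i l_i(x) y_i,
# with weights l(x) depending on the inputs only -- not on the labels.
l = x_new @ np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(d), Xtr.T)
f_rule = l @ y

print(np.isclose(f_direct, f_rule))  # True
```

梯度下降迭代和核方法的预测同样可以展开成这种标签的线性组合,这正是论文统一分析它们的出发点。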


【6】PALQO: Physics-informed Model for Accelerating Large-scale Quantum Optimization
标题:PALQO:加速大规模量子优化的物理信息模型
链接:https://arxiv.org/abs/2509.20733

作者:ang, Yajie Hao, Jing Zhou, Xiao Yuan, Xiaoting Wang, Yuxuan Du
摘要 :变分量子算法(VQA)是实现近期量子器件实用化的主要策略。然而,量子力学中的不可克隆定理排除了标准的反向传播,导致在将VQA应用于大规模任务时,量子资源成本过高。为了应对这一挑战,我们将VQA的训练动态重新表述为非线性偏微分方程,并提出了一种新的协议,该协议利用物理信息神经网络(PINN)来有效地对该动态系统进行建模。考虑到从量子设备收集的少量训练轨迹数据,我们的协议在经典侧的多次迭代中预测VQA的参数更新,大大降低了量子资源成本。通过系统的数值实验,我们证明,与传统方法相比,我们的方法实现了高达30倍的加速,并将涉及多达40个量子比特的任务的量子资源成本降低了90%,包括不同量子系统的基态准备,同时保持了具有竞争力的准确性。我们的方法补充了旨在提高VQA效率的现有技术,并进一步加强了其实际应用的潜力。
摘要:Variational quantum algorithms (VQAs) are leading strategies to reach practical utilities of near-term quantum devices. However, the no-cloning theorem in quantum mechanics precludes standard backpropagation, leading to prohibitive quantum resource costs when applying VQAs to large-scale tasks. To address this challenge, we reformulate the training dynamics of VQAs as a nonlinear partial differential equation and propose a novel protocol that leverages physics-informed neural networks (PINNs) to model this dynamical system efficiently. Given a small amount of training trajectory data collected from quantum devices, our protocol predicts the parameter updates of VQAs over multiple iterations on the classical side, dramatically reducing quantum resource costs. Through systematic numerical experiments, we demonstrate that our method achieves up to a 30x speedup compared to conventional methods and reduces quantum resource costs by as much as 90\% for tasks involving up to 40 qubits, including ground state preparation of different quantum systems, while maintaining competitive accuracy. Our approach complements existing techniques aimed at improving the efficiency of VQAs and further strengthens their potential for practical applications.


预测|估计(10篇)

【1】A Causality-Aware Spatiotemporal Model for Multi-Region and Multi-Pollutant Air Quality Forecasting
标题:用于多区域多污染物空气质量预测的因果感知时空模型
链接:https://arxiv.org/abs/2509.21260

作者:, Shiliang Sun
备注:25 pages, 8 figures
摘要:空气污染是一个紧迫的全球问题,威胁着公共健康、环境可持续性和气候稳定。由于复杂的多污染物相互作用、不断变化的气象条件和特定区域的空间异质性,在空间分布的监测站之间实现准确和可扩展的预测具有挑战性。为了应对这一挑战,我们提出了AirPCM,一种新的深度时空预测模型,它集成了多区域,多污染物动态与显式气象-污染物因果关系建模。与现有的方法局限于单一污染物或局部区域,AirPCM采用统一的架构,共同捕捉跨站的空间相关性,时间自相关性,气象污染物动态因果关系。这使得能够在不同的地理和时间尺度上进行精细的、可解释的多污染物预测,包括突发污染事件。对多尺度真实世界数据集的广泛评估表明,AirPCM在预测准确性和泛化能力方面始终超过最先进的基线。此外,AirPCM的长期预测能力为未来空气质量趋势和潜在的高风险窗口提供了可操作的见解,为循证环境治理和碳减排规划提供了及时的支持。
摘要:Air pollution, a pressing global problem, threatens public health, environmental sustainability, and climate stability. Achieving accurate and scalable forecasting across spatially distributed monitoring stations is challenging due to intricate multi-pollutant interactions, evolving meteorological conditions, and region specific spatial heterogeneity. To address this challenge, we propose AirPCM, a novel deep spatiotemporal forecasting model that integrates multi-region, multi-pollutant dynamics with explicit meteorology-pollutant causality modeling. Unlike existing methods limited to single pollutants or localized regions, AirPCM employs a unified architecture to jointly capture cross-station spatial correlations, temporal auto-correlations, and meteorology-pollutant dynamic causality. This empowers fine-grained, interpretable multi-pollutant forecasting across varying geographic and temporal scales, including sudden pollution episodes. Extensive evaluations on multi-scale real-world datasets demonstrate that AirPCM consistently surpasses state-of-the-art baselines in both predictive accuracy and generalization capability. Moreover, the long-term forecasting capability of AirPCM provides actionable insights into future air quality trends and potential high-risk windows, offering timely support for evidence-based environmental governance and carbon mitigation planning.


【2】Differential-Integral Neural Operator for Long-Term Turbulence Forecasting
标题:用于长期湍流预测的微分-积分神经算子
链接:https://arxiv.org/abs/2509.21196

作者:uan Gao, Fan Xu, Fan Zhang, Qingsong Wen, Kun Wang, Xiaomeng Huang, Xian Wu
摘要:准确预测湍流的长期演化是科学计算中的一项重大挑战,对从气候建模到航空航天工程的应用都至关重要。现有的深度学习方法,特别是神经算子,在长期自回归预测中常常失效,遭受灾难性的误差累积和物理保真度的损失。这种失效源于它们无法同时捕获支配湍流动力学的两类不同数学结构:局部耗散效应与全局非局部相互作用。在本文中,我们从算子分解的第一性原理出发,提出了微分-积分神经算子(Differential-Integral Neural Operator)这一新框架。该方法通过学习不同物理算子的并行分支显式建模湍流演化:一个局部微分算子,由可证明收敛到导数的受约束卷积网络实现;以及一个全局积分算子,由学习数据驱动全局核的Transformer架构捕获。这种基于物理的分解赋予了该方法卓越的稳定性和鲁棒性。通过在具有挑战性的二维Kolmogorov流基准上的大量实验,我们证明该方法在长期预测中显著优于最先进的模型。它成功抑制了数百个时间步上的误差累积,在涡量场和能谱上均保持高保真度,并为物理一致的长程湍流预测建立了新的基准。
摘要 :Accurately forecasting the long-term evolution of turbulence represents a grand challenge in scientific computing and is crucial for applications ranging from climate modeling to aerospace engineering. Existing deep learning methods, particularly neural operators, often fail in long-term autoregressive predictions, suffering from catastrophic error accumulation and a loss of physical fidelity. This failure stems from their inability to simultaneously capture the distinct mathematical structures that govern turbulent dynamics: local, dissipative effects and global, non-local interactions. In this paper, we propose the {\textbf{\underline{D}}}ifferential-{\textbf{\underline{I}}}ntegral {\textbf{\underline{N}}}eural {\textbf{\underline{O}}}perator (\method{}), a novel framework designed from a first-principles approach of operator decomposition. \method{} explicitly models the turbulent evolution through parallel branches that learn distinct physical operators: a local differential operator, realized by a constrained convolutional network that provably converges to a derivative, and a global integral operator, captured by a Transformer architecture that learns a data-driven global kernel. This physics-based decomposition endows \method{} with exceptional stability and robustness. Through extensive experiments on the challenging 2D Kolmogorov flow benchmark, we demonstrate that \method{} significantly outperforms state-of-the-art models in long-term forecasting. It successfully suppresses error accumulation over hundreds of timesteps, maintains high fidelity in both the vorticity fields and energy spectra, and establishes a new benchmark for physically consistent, long-range turbulence forecast.


【3】Deep Learning for Crime Forecasting: The Role of Mobility at Fine-grained Spatiotemporal Scales
标题:深度学习用于犯罪预测:移动性在细粒度时空尺度上的作用
链接:https://arxiv.org/abs/2509.20913

作者:lbors Zumel, Michele Tizzoni, Gian Maria Campedelli
备注:64 pages, 33 figures, and 6 tables (including appendix)
摘要:目的:开发一个深度学习框架,以评估在历史犯罪与社会人口数据之外纳入微观层面的流动性特征,是否以及如何在细粒度的空间和时间分辨率上提高犯罪预测的预测性能。   方法:我们聚焦于四个美国城市(即巴尔的摩、芝加哥、洛杉矶和费城),以推进计算方法与犯罪预测方面的文献。我们使用从每个城市警察局获得的犯罪事件数据,结合美国社区调查的社会人口数据以及Advan从2019年到2023年收集的人口流动数据。这些数据被聚合到由等大小单元格(每格0.077平方英里,即0.2平方公里)组成的网格中,并用于训练我们的深度学习预测模型,即一种卷积长短期记忆(ConvLSTM)网络,它使用14天和2天的输入序列提前12小时预测犯罪发生。我们还将其性能与三种基线模型进行了比较:逻辑回归、随机森林和标准LSTM。   结果:纳入流动性特征提高了预测性能,特别是在使用较短输入序列时。然而值得注意的是,同时使用流动性和社会人口特征时可获得最佳结果:我们的深度学习模型在所有四个城市中实现了最高的召回率、精确率和F1得分,优于其他方法。在这种配置下,较长的输入序列增强了对暴力犯罪的预测,而较短的序列对财产犯罪更有效。   结论:这些研究结果强调了整合包括流动性在内的多样化数据源对时空犯罪预测的重要性,也凸显了深度学习在处理细粒度空间和时间尺度时的优势(与局限)。
摘要:Objectives: To develop a deep learning framework to evaluate if and how incorporating micro-level mobility features, alongside historical crime and sociodemographic data, enhances predictive performance in crime forecasting at fine-grained spatial and temporal resolutions.   Methods: We advance the literature on computational methods and crime forecasting by focusing on four U.S. cities (i.e., Baltimore, Chicago, Los Angeles, and Philadelphia). We employ crime incident data obtained from each city's police department, combined with sociodemographic data from the American Community Survey and human mobility data from Advan, collected from 2019 to 2023. This data is aggregated into grids with equally sized cells of 0.077 sq. miles (0.2 sq. kms) and used to train our deep learning forecasting model, a Convolutional Long Short-Term Memory (ConvLSTM) network, which predicts crime occurrences 12 hours ahead using 14-day and 2-day input sequences. We also compare its performance against three baseline models: logistic regression, random forest, and standard LSTM.   Results: Incorporating mobility features improves predictive performance, especially when using shorter input sequences. Noteworthy, however, the best results are obtained when both mobility and sociodemographic features are used together, with our deep learning model achieving the highest recall, precision, and F1 score in all four cities, outperforming alternative methods. With this configuration, longer input sequences enhance predictions for violent crimes, while shorter sequences are more effective for property crimes.   Conclusion: These findings underscore the importance of integrating diverse data sources for spatiotemporal crime forecasting, mobility included. They also highlight the advantages (and limits) of deep learning when dealing with fine-grained spatial and temporal scales.


【4】IConv: Focusing on Local Variation with Channel Independent Convolution for Multivariate Time Series Forecasting
标题:IConv:专注于局部变化,并通过通道独立卷积进行多元时间序列预测
链接:https://arxiv.org/abs/2509.20783

作者:, Hanbyeol Park, Minseop Kim, Dohee Kim, Hyerim Bae
备注:Submitted to AAAI
摘要:真实世界的时间序列数据通常表现出非平稳性,包括变化的趋势,不规则的季节性和残差。就变化趋势而言,最近提出的基于多层感知器(MLP)的模型由于其计算效率和捕获长期依赖性的能力而表现出优异的性能。然而,MLP架构的线性性质在应用于具有不同分布的信道时带来了限制,导致局部变化,例如季节性模式和残差分量被忽略。然而,卷积神经网络(CNN)可以有效地结合这些变化。为了解决MLP的局限性,我们建议将其与CNN相结合。使用MLP对总体趋势进行建模,以考虑长期依赖性。CNN使用不同的内核来结合MLP趋势预测对细粒度的局部模式进行建模。为了专注于对局部变化进行建模,我们提出了IConv,这是一种新的卷积架构,它独立地处理时间依赖通道,并通过不同的层考虑通道间的关系。独立通道处理使得能够对不同的局部时间依赖性进行建模,并采用大的内核大小。不同的通道间考虑降低了计算成本。通过对时间序列数据集的大量实验,对所提出的模型进行了评估。结果表明,该方法对多变量时间序列预测的优越性。
摘要:Real-world time-series data often exhibit non-stationarity, including changing trends, irregular seasonality, and residuals. In terms of changing trends, recently proposed multi-layer perceptron (MLP)-based models have shown excellent performance owing to their computational efficiency and ability to capture long-term dependency. However, the linear nature of MLP architectures poses limitations when applied to channels with diverse distributions, resulting in local variations such as seasonal patterns and residual components being ignored. However, convolutional neural networks (CNNs) can effectively incorporate these variations. To resolve the limitations of MLP, we propose combining them with CNNs. The overall trend is modeled using an MLP to consider long-term dependencies. The CNN uses diverse kernels to model fine-grained local patterns in conjunction with MLP trend predictions. To focus on modeling local variation, we propose IConv, a novel convolutional architecture that processes the temporal dependency channel independently and considers the inter-channel relationship through distinct layers. Independent channel processing enables the modeling of diverse local temporal dependencies and the adoption of a large kernel size. Distinct inter-channel considerations reduce computational cost. The proposed model is evaluated through extensive experiments on time-series datasets. The results reveal the superiority of the proposed method for multivariate time-series forecasting.
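通道独立卷积的思想(每个通道用自己的核做一维卷积,不做跨通道混合,类似depthwise卷积)可以用NumPy示意如下;这只是概念草图,并非IConv的实现。

```python
import numpy as np

def channel_independent_conv(x, kernels):
    """Convolve each channel of x (shape [channels, time]) with its own
    kernel -- no cross-channel mixing, so per-channel local patterns
    (seasonality, residuals) are modelled independently, in the spirit of
    the channel-independent design the abstract describes. 'same' output."""
    return np.stack([np.convolve(xc, k, mode="same")
                     for xc, k in zip(x, kernels)])

x = np.arange(12, dtype=float).reshape(3, 4)   # 3 channels, 4 time steps
kernels = [np.array([0.5, 0.5]),               # short averaging kernel
           np.array([1.0, 0.0, -1.0]),         # difference kernel
           np.array([0.25, 0.5, 0.25])]        # smoothing kernel
y = channel_independent_conv(x, kernels)
print(y.shape)  # (3, 4)
```

每个通道可以使用不同长度的核,这对应摘要中"独立通道处理使得可以采用多样化的核与大核尺寸"的设计动机;跨通道关系则留给单独的层处理。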


【5】Guiding Application Users via Estimation of Computational Resources for Massively Parallel Chemistry Computations
标题:通过评估大规模并行化学计算的计算资源来指导应用程序用户
链接:https://arxiv.org/abs/2509.20667

作者:abassum, Omer Subasi, Ajay Panyala, Epiya Ebiapia, Gerald Baumgartner, Erdal Mutlu, P. (Saday)Sadayappan, Karol Kowalski
摘要:在这项工作中,我们开发了基于机器学习(ML)的策略来预测大规模并行化学计算(例如耦合簇方法)所需的资源(成本),以便在应用程序用户投入到超级计算机上运行昂贵的实验之前为其提供指导。通过预测应用程序的执行时间,我们确定最佳的运行时参数值,如节点数和瓦片大小。我们回答了用户感兴趣的两个关键问题。第一个是最短时间问题:用户希望知道在给定问题规模和目标超级计算机下,实现最短执行时间的参数配置(节点数和瓦片大小)。第二个是最便宜运行问题:用户希望最小化资源使用,即找到使给定问题规模的节点小时数最小的节点数和瓦片大小。   我们评估了一系列丰富的ML模型和策略,它们基于在能源部(DOE)Frontier和Aurora超级计算机上执行CCSD(耦合簇单双激发)应用程序所收集的运行时参数值开发。我们的实验表明,在预测CCSD迭代的总执行时间时,梯度提升(GB)ML模型在Aurora和Frontier上分别实现了0.023和0.073的平均绝对百分比误差(MAPE)。在仅为收集数据点而运行实验代价高昂的情况下,我们证明主动学习只需从Aurora和Frontier收集约450次实验即可达到约0.2的MAPE。
摘要:In this work, we develop machine learning (ML) based strategies to predict resources (costs) required for massively parallel chemistry computations, such as coupled-cluster methods, to guide application users before they commit to running expensive experiments on a supercomputer. By predicting application execution time, we determine the optimal runtime parameter values such as number of nodes and tile sizes. Two key questions of interest to users are addressed. The first is the shortest-time question, where the user is interested in knowing the parameter configurations (number of nodes and tile sizes) to achieve the shortest execution time for a given problem size and a target supercomputer. The second is the cheapest-run question in which the user is interested in minimizing resource usage, i.e., finding the number of nodes and tile size that minimizes the number of node-hours for a given problem size.   We evaluate a rich family of ML models and strategies, developed based on the collections of runtime parameter values for the CCSD (Coupled Cluster with Singles and Doubles) application executed on the Department of Energy (DOE) Frontier and Aurora supercomputers. Our experiments show that when predicting the total execution time of a CCSD iteration, a Gradient Boosting (GB) ML model achieves a Mean Absolute Percentage Error (MAPE) of 0.023 and 0.073 for Aurora and Frontier, respectively. In the case where it is expensive to run experiments just to collect data points, we show that active learning can achieve a MAPE of about 0.2 with just around 450 experiments collected from Aurora and Frontier.
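摘要中报告的评估指标MAPE(平均绝对百分比误差)可按如下方式计算;其中的执行时间数值纯属虚构示例。

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, the metric quoted in the abstract
    (e.g. 0.023 means predictions are off by ~2.3% on average)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true))

# Hypothetical execution times (seconds) for a few CCSD iterations.
actual    = np.array([120.0, 300.0, 450.0])
predicted = np.array([115.0, 310.0, 440.0])
print(mape(actual, predicted))  # ~0.032, i.e. ~3.2% average error
```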


【6】MMG: Mutual Information Estimation via the MMSE Gap in Diffusion
标题:MMG:通过扩散中的MMSE差距进行互信息估计
链接:https://arxiv.org/abs/2509.20609

作者:Yu, Xing Shi, Xianghao Kong, Tong Jia, Greg Ver Steeg
备注:Accepted to the SPIGM Workshop at NeurIPS 2025
摘要:互信息(MI)是衡量随机变量之间关系的最通用方法之一,但对复杂系统估计这一量具有挑战性。去噪扩散模型最近为密度估计树立了新标杆,因此很自然地要考虑这些方法是否也可用于改进MI估计。利用最近提出的去噪扩散模型的信息论表述,我们表明扩散模型可以以一种简单直接的方式用于估计MI。特别地,MI对应于条件扩散与无条件扩散之间最小均方误差(MMSE)差距的一半,在加噪过程的所有信噪比(SNR)上积分。我们的方法不仅通过了自一致性测试,而且优于传统的和基于分数的扩散MI估计器。此外,我们的方法利用自适应重要性采样实现可扩展的MI估计,即使在MI很高时也能保持强劲的性能。
摘要:Mutual information (MI) is one of the most general ways to measure relationships between random variables, but estimating this quantity for complex systems is challenging. Denoising diffusion models have recently set a new bar for density estimation, so it is natural to consider whether these methods could also be used to improve MI estimation. Using the recently introduced information-theoretic formulation of denoising diffusion models, we show the diffusion models can be used in a straightforward way to estimate MI. In particular, the MI corresponds to half the gap in the Minimum Mean Square Error (MMSE) between conditional and unconditional diffusion, integrated over all Signal-to-Noise-Ratios (SNRs) in the noising process. Our approach not only passes self-consistency tests but also outperforms traditional and score-based diffusion MI estimators. Furthermore, our method leverages adaptive importance sampling to achieve scalable MI estimation, while maintaining strong performance even when the MI is high.
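摘要给出的恒等式(MI等于条件与无条件MMSE之差在所有SNR上积分的一半)可在二元高斯情形下做数值验证,此时两个MMSE都有闭式;下面是一个假设性的验证草图,并非论文估计器的实现。

```python
import numpy as np
from scipy.integrate import quad

# Numeric check of MI = 1/2 * integral over SNR of the MMSE gap,
# for jointly Gaussian (X, Y) with correlation rho, where both the
# conditional and unconditional MMSE of denoising X are closed-form.
rho = 0.8
v = 1.0 - rho**2                                 # Var(X | Y)

mmse_uncond = lambda s: 1.0 / (1.0 + s)          # X ~ N(0, 1)
mmse_cond = lambda s: v / (1.0 + v * s)          # X | Y ~ N(rho*Y, v)

gap_integral, _ = quad(lambda s: mmse_uncond(s) - mmse_cond(s), 0, np.inf)
mi_mmse = 0.5 * gap_integral
mi_exact = -0.5 * np.log(1.0 - rho**2)           # Gaussian MI, in nats

print(mi_mmse, mi_exact)  # both ~0.5108
```

两个数值一致,与摘要所述的I-MMSE型恒等式吻合;实际方法中这两个MMSE由条件/无条件扩散模型的去噪误差给出。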


【7】Auto-Regressive U-Net for Full-Field Prediction of Shrinkage-Induced Damage in Concrete
标题:自回归U型网络用于混凝土收缩损伤的全场预测
链接:https://arxiv.org/abs/2509.20507

作者:utdinova, Petr Havlásek, Ondřej Rokoš, Fleur Hendriks, Martin Doškář
摘要:本文介绍了一种用于预测混凝土中随时间变化的全场损伤的深度学习方法。该研究使用自回归U-网模型来预测给定微观结构几何形状的单胞中的标量损伤场的演变和施加的收缩轮廓的演变。通过顺序地使用预测的损伤输出作为后续预测的输入,该模型有利于损伤进展的连续评估。作为补充,卷积神经网络(CNN)利用损伤估计来预测关键的机械性能,包括观察到的收缩和残余刚度。所提出的双网络架构在合成数据集上表现出高计算效率和鲁棒的预测性能。该方法减少了传统上与全场损伤评估相关的计算负荷,并用于深入了解骨料特性(如形状、尺寸和分布)与有效收缩和刚度降低之间的关系。最终,这可以帮助优化混凝土配合比设计,从而提高耐久性并减少内部损坏。
摘要:This paper introduces a deep learning approach for predicting time-dependent full-field damage in concrete. The study uses an auto-regressive U-Net model to predict the evolution of the scalar damage field in a unit cell given microstructural geometry and evolution of an imposed shrinkage profile. By sequentially using the predicted damage output as input for subsequent predictions, the model facilitates the continuous assessment of damage progression. Complementarily, a convolutional neural network (CNN) utilises the damage estimations to forecast key mechanical properties, including observed shrinkage and residual stiffness. The proposed dual-network architecture demonstrates high computational efficiency and robust predictive performance on the synthesised datasets. The approach reduces the computational load traditionally associated with full-field damage evaluations and is used to gain insights into the relationship between aggregate properties, such as shape, size, and distribution, and the effective shrinkage and reduction in stiffness. Ultimately, this can help to optimize concrete mix designs, leading to improved durability and reduced internal damage.
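摘要描述的自回归评估方式(把预测的损伤场作为下一步输入)可以用一个通用的rollout循环示意;下面的"模型"只是占位的玩具函数,并非论文的U-Net。

```python
import numpy as np

def rollout(model, damage0, shrinkage_steps):
    """Auto-regressive rollout: at each step the model consumes the
    previous damage field plus the imposed shrinkage increment, and its
    prediction becomes the next input. `model` stands in for the U-Net."""
    fields = [damage0]
    for shrink in shrinkage_steps:
        fields.append(model(fields[-1], shrink))
    return np.stack(fields)

# Stand-in "model": damage grows with shrinkage and never heals,
# clipped to [0, 1] like a scalar damage variable.
toy_model = lambda dmg, shrink: np.clip(dmg + 0.1 * shrink, 0.0, 1.0)

damage0 = np.zeros((8, 8))                     # pristine unit cell
history = rollout(toy_model, damage0,
                  shrinkage_steps=np.linspace(0, 1, 5))
print(history.shape)             # (6, 8, 8): initial field + 5 predictions
print(history.max(axis=(1, 2)))  # monotone non-decreasing damage
```

整条损伤演化轨迹由此得到连续评估;论文中CNN再基于这些损伤估计预测收缩与残余刚度等力学量。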


【8】Conditionally Whitened Generative Models for Probabilistic Time Series Forecasting
标题:用于概率时间序列预测的条件白化生成模型
链接:https://arxiv.org/abs/2509.20928

作者:ang, Siwei Chen, Pingping Hu, Zhaotong Shen, Yingjie Zhang, Zhuoran Sun, Shuai Li, Ziqi Chen, Kenji Fukumizu
摘要:由于非平稳性、变量间依赖关系和分布偏移,多变量时间序列的概率预测具有挑战性。虽然最近的扩散和流匹配模型已经显示出前景,但它们经常忽略条件均值和协方差等信息先验。在这项工作中,我们提出了条件白化生成模型(CW-Gen),一个通过条件白化结合先验信息的框架。理论上,我们建立了充分条件:在这些条件下,将扩散模型传统的终端分布(即标准多元正态分布)替换为由条件均值和协方差估计量参数化的多元正态分布可以提高样本质量。在此分析的指导下,我们设计了一种新的联合均值-协方差估计器(JMCE),同时学习条件均值和滑动窗口协方差。在JMCE的基础上,我们引入了条件白化扩散模型(CW-Diff),并将其扩展为条件白化流匹配(CW-Flow)。在五个真实世界数据集上与六个最先进生成模型进行的实验表明,CW-Gen始终增强预测性能,比无先验方法更有效地捕捉非平稳动态和变量间相关性。实证结果进一步表明,CW-Gen可以有效缓解分布偏移的影响。
摘要:Probabilistic forecasting of multivariate time series is challenging due to non-stationarity, inter-variable dependencies, and distribution shifts. While recent diffusion and flow matching models have shown promise, they often ignore informative priors such as conditional means and covariances. In this work, we propose Conditionally Whitened Generative Models (CW-Gen), a framework that incorporates prior information through conditional whitening. Theoretically, we establish sufficient conditions under which replacing the traditional terminal distribution of diffusion models, namely the standard multivariate normal, with a multivariate normal distribution parameterized by estimators of the conditional mean and covariance improves sample quality. Guided by this analysis, we design a novel Joint Mean-Covariance Estimator (JMCE) that simultaneously learns the conditional mean and sliding-window covariance. Building on JMCE, we introduce Conditionally Whitened Diffusion Models (CW-Diff) and extend them to Conditionally Whitened Flow Matching (CW-Flow). Experiments on five real-world datasets with six state-of-the-art generative models demonstrate that CW-Gen consistently enhances predictive performance, capturing non-stationary dynamics and inter-variable correlations more effectively than prior-free approaches. Empirical results further demonstrate that CW-Gen can effectively mitigate the effects of distribution shift.
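作为示意,下面演示条件白化这一核心步骤:给定条件均值$\mu$与协方差$\Sigma$的估计,将样本映射为$z=L^{-1}(x-\mu)$(其中$LL^\top=\Sigma$),使残差近似服从标准正态分布;这只是概念草图,并非CW-Gen的实现。

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "conditional" distribution with known mean and covariance,
# standing in for the JMCE estimates of the paper.
mu = np.array([2.0, -1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 0.5]])
L = np.linalg.cholesky(Sigma)

X = rng.multivariate_normal(mu, Sigma, size=20000)

# Whitening: z = L^{-1} (x - mu); if mu and Sigma are accurate,
# z is approximately standard normal -- the kind of terminal
# distribution a conditionally-whitened generative model can assume.
Z = np.linalg.solve(L, (X - mu).T).T

print(np.round(Z.mean(axis=0), 2))   # ~[0, 0]
print(np.round(np.cov(Z.T), 2))      # ~identity
```

生成时再做逆变换$x=\mu+Lz$即可把标准正态样本映回原分布,这就是先验信息进入生成模型的方式。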


【9】Fast Estimation of Wasserstein Distances via Regression on Sliced Wasserstein Distances
标题:通过切片Wasserstein距离回归快速估计Wasserstein距离
链接:https://arxiv.org/abs/2509.20508

作者:en, Hai Nguyen, Nhat Ho
备注:35 pages, 20 figures, 4 tables
摘要:我们解决为从一个元分布中抽取的多对分布高效计算Wasserstein距离的问题。为此,我们提出了一种基于将Wasserstein距离对切片Wasserstein(SW)距离进行回归的快速估计方法。具体来说,我们同时利用提供下界的标准SW距离和提供上界的提升(lifted)SW距离,作为真实Wasserstein距离的预测变量。为了确保简约性,我们引入了两个线性模型:具有闭式最小二乘解的无约束模型,以及仅使用一半参数的约束模型。我们表明,准确的模型可以从少量分布对中学习得到。一旦估计完成,该模型可以通过SW距离的线性组合预测任何一对分布之间的Wasserstein距离,使其非常高效。经验上,我们在多样化的任务上验证了我们的方法,包括高斯混合、点云分类以及三维点云的Wasserstein空间可视化。在MNIST点云、ShapeNetV2、MERFISH Cell Niches和scRNA-seq等各种数据集上,我们的方法始终比最先进的Wasserstein嵌入模型Wasserstein Wormhole提供更好的Wasserstein距离近似,尤其是在低数据状态下。最后,我们证明我们的估计器还可以加速Wormhole训练,得到RG-Wormhole。
摘要:We address the problem of efficiently computing Wasserstein distances for multiple pairs of distributions drawn from a meta-distribution. To this end, we propose a fast estimation method based on regressing Wasserstein distance on sliced Wasserstein (SW) distances. Specifically, we leverage both standard SW distances, which provide lower bounds, and lifted SW distances, which provide upper bounds, as predictors of the true Wasserstein distance. To ensure parsimony, we introduce two linear models: an unconstrained model with a closed-form least-squares solution, and a constrained model that uses only half as many parameters. We show that accurate models can be learned from a small number of distribution pairs. Once estimated, the model can predict the Wasserstein distance for any pair of distributions via a linear combination of SW distances, making it highly efficient. Empirically, we validate our approach on diverse tasks, including Gaussian mixtures, point-cloud classification, and Wasserstein-space visualizations for 3D point clouds. Across various datasets such as MNIST point clouds, ShapeNetV2, MERFISH Cell Niches, and scRNA-seq, our method consistently provides a better approximation of Wasserstein distance than the state-of-the-art Wasserstein embedding model, Wasserstein Wormhole, particularly in low-data regimes. Finally, we demonstrate that our estimator can also accelerate Wormhole training, yielding \textit{RG-Wormhole}.
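下面是该思路的一个假设性最小草图:对若干对轴对齐高斯分布(其精确$W_2$有闭式),用随机投影的蒙特卡洛估计计算切片Wasserstein距离,再用闭式最小二乘拟合从SW预测$W_2$的线性模型;论文使用的特征(含提升SW距离)与模型更丰富,此处仅示意。

```python
import numpy as np

rng = np.random.default_rng(0)

def sliced_w2(a, b, n_proj=200):
    """Monte-Carlo sliced 2-Wasserstein between equal-size samples:
    project onto random directions, then use the sorted-sample form
    of the 1D Wasserstein distance."""
    d = a.shape[1]
    thetas = rng.normal(size=(n_proj, d))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
    pa = np.sort(a @ thetas.T, axis=0)
    pb = np.sort(b @ thetas.T, axis=0)
    return np.sqrt(np.mean((pa - pb) ** 2))

def gaussian_pair():
    """Axis-aligned 2D Gaussians, so the exact W2 is closed-form."""
    m1, m2 = rng.normal(size=2), rng.normal(size=2)
    s1, s2 = rng.uniform(0.5, 2.0, size=2), rng.uniform(0.5, 2.0, size=2)
    w2 = np.sqrt(np.sum((m1 - m2) ** 2) + np.sum((s1 - s2) ** 2))
    a = m1 + s1 * rng.normal(size=(1500, 2))
    b = m2 + s2 * rng.normal(size=(1500, 2))
    return sliced_w2(a, b), w2

pairs = [gaussian_pair() for _ in range(40)]
sw = np.array([p[0] for p in pairs])
w2 = np.array([p[1] for p in pairs])

# Closed-form least squares: predict W2 from a linear model in SW.
A = np.stack([sw, np.ones_like(sw)], axis=1)
coef, *_ = np.linalg.lstsq(A, w2, rcond=None)
rel_err = np.mean(np.abs(A @ coef - w2) / w2)
print(rel_err)  # small: SW is a cheap, nearly-linear proxy for W2
```

拟合一次之后,对任何新的分布对只需计算廉价的SW距离再做线性组合,这正是该方法高效的原因。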


【10】Objective Evaluation of Prosody and Intelligibility in Speech Synthesis via Conditional Prediction of Discrete Tokens
标题:通过离散令牌的条件预测客观评估语音合成中的韵律和可理解度
链接:https://arxiv.org/abs/2509.20485

作者:sim Ulgen, Zongyang Du, Junchen Lu, Philipp Koehn, Berrak Sisman
备注:Under review for IEEE OJSP
摘要 :合成语音的客观评估对于推进语音生成系统至关重要,但现有的可懂度和韵律指标范围仍然有限,并且与人类感知的相关性较弱。词错误率(WER)只提供了一个粗略的基于文本的可理解性的测量,而F0-RMSE和相关的基于音高的度量提供了一个狭窄的,依赖于参考的韵律视图。为了解决这些限制,我们提出了TTScore,一个有针对性的和无参考的评估框架的基础上的离散语音标记的条件预测。TTScore采用两个以输入文本为条件的序列到序列预测器:TTScore-int,它通过内容令牌测量可理解性,TTScore-pro,它通过韵律令牌评估韵律。对于每个合成的话语,预测器计算相应的令牌序列的可能性,产生可解释的分数,捕获与预期的语言内容和韵律结构的对齐。SOMOS,VoiceMOS和TTSArena基准测试的实验表明,TTScore-int和TTScore-pro提供了可靠的,特定方面的评估,并实现了更强的相关性与人类判断的整体质量比现有的可理解性和韵律为重点的指标。
摘要:Objective evaluation of synthesized speech is critical for advancing speech generation systems, yet existing metrics for intelligibility and prosody remain limited in scope and weakly correlated with human perception. Word Error Rate (WER) provides only a coarse text-based measure of intelligibility, while F0-RMSE and related pitch-based metrics offer a narrow, reference-dependent view of prosody. To address these limitations, we propose TTScore, a targeted and reference-free evaluation framework based on conditional prediction of discrete speech tokens. TTScore employs two sequence-to-sequence predictors conditioned on input text: TTScore-int, which measures intelligibility through content tokens, and TTScore-pro, which evaluates prosody through prosody tokens. For each synthesized utterance, the predictors compute the likelihood of the corresponding token sequences, yielding interpretable scores that capture alignment with intended linguistic content and prosodic structure. Experiments on the SOMOS, VoiceMOS, and TTSArena benchmarks demonstrate that TTScore-int and TTScore-pro provide reliable, aspect-specific evaluation and achieve stronger correlations with human judgments of overall quality than existing intelligibility and prosody-focused metrics.


其他神经网络|深度学习|模型|建模(31篇)

【1】From Physics to Machine Learning and Back: Part II - Learning and Observational Bias in PHM
标题:从物理学到机器学习再回来:第二部分-PHM中的学习和观察偏差
链接:https://arxiv.org/abs/2509.21207

作者:, Ismail Nejjar, Vinay Sharma, Keivan Faghih Niresi, Han Sun, Hao Dong, Chenghao Xu, Amaury Wei, Arthur Bizzi, Raffael Theiler, Yuan Tian, Leandro Von Krannichfeldt, Zhan Ma, Sergei Garmaev, Zepeng Zhang, Mengjie Zhao
摘要:预测与健康管理(PHM)通过实现故障检测、预判设备故障并优化整个资产生命周期内的维护活动,确保复杂工程系统的可靠性、安全性和效率。然而,现实世界的PHM面临持续的挑战:传感器数据往往有噪声或不完整,可用标签有限,退化行为和系统相互依赖关系可能高度复杂且非线性。物理信息机器学习已成为一种有前景的方法,通过将物理知识嵌入数据驱动的模型来解决这些限制。本综述考察了如何通过物理信息建模和数据策略来纳入学习偏置与观测偏置,从而引导模型给出物理一致且可靠的预测。学习偏置通过物理信息损失函数和控制方程,或通过结合单调性等属性,将物理约束嵌入模型训练中。观测偏置影响数据选择与合成,以确保模型捕获真实的系统行为,途径包括用于估计未测量状态的虚拟传感、用于数据增强的基于物理的仿真,以及多传感器融合策略。随后,本综述考察了这些方法如何通过强化学习实现从被动预测到主动决策的过渡:强化学习允许智能体在优化运营目标的同时学习遵守物理约束的维护策略。这在基于模型的预测、仿真和实际系统运行之间形成了闭环,从而支持自适应决策。最后,本综述讨论了将PHM解决方案从单个资产扩展到整个机队部署的关键挑战,并一并回顾了包括元学习和少样本学习在内的快速适应方法以及领域泛化技术。
摘要:Prognostics and Health Management ensures the reliability, safety, and efficiency of complex engineered systems by enabling fault detection, anticipating equipment failures, and optimizing maintenance activities throughout an asset lifecycle. However, real-world PHM presents persistent challenges: sensor data is often noisy or incomplete, available labels are limited, and degradation behaviors and system interdependencies can be highly complex and nonlinear. Physics-informed machine learning has emerged as a promising approach to address these limitations by embedding physical knowledge into data-driven models. This review examines how incorporating learning and observational biases through physics-informed modeling and data strategies can guide models toward physically consistent and reliable predictions. Learning biases embed physical constraints into model training through physics-informed loss functions and governing equations, or by incorporating properties like monotonicity. Observational biases influence data selection and synthesis to ensure models capture realistic system behavior through virtual sensing for estimating unmeasured states, physics-based simulation for data augmentation, and multi-sensor fusion strategies. The review then examines how these approaches enable the transition from passive prediction to active decision-making through reinforcement learning, which allows agents to learn maintenance policies that respect physical constraints while optimizing operational objectives. This closes the loop between model-based predictions, simulation, and actual system operation, empowering adaptive decision-making. Finally, the review addresses the critical challenge of scaling PHM solutions from individual assets to fleet-wide deployment. Fast adaptation methods including meta-learning and few-shot learning are reviewed alongside domain generalization techniques ...


【2】Closed-form $\ell_r$ norm scaling with data for overparameterized linear regression and diagonal linear networks under $\ell_p$ bias
标题:$\ell_p$偏置下超参数化线性回归与对角线性网络的$\ell_r$范数随数据量的闭式缩放
链接:https://arxiv.org/abs/2509.21181

作者:Zhang, Ard Louis
摘要:对于具有各向同性高斯设计和最小$\ell_p$范数插值器($p\in(1,2]$)的超参数化线性回归,我们给出了参数范数族$\{\lVert \widehat{w_p} \rVert_r\}_{r \in [1,p]}$随样本量缩放规律的统一的高概率刻画。   我们通过一个简单的对偶射线(dual-ray)分析解决了这个基本但尚未解决的问题。该分析揭示了$X^\top Y$中信号尖峰(spike)与零坐标主体(bulk)之间的竞争,并产生如下封闭形式的预测:(i)依赖于数据的转变点$n_\star$("肘点"),以及(ii)一个普适阈值$r_\star=2(p-1)$,它将趋于平台的$\lVert \widehat{w_p} \rVert_r$与以显式指数继续增长的范数区分开来。   这一统一解刻画了在$\ell_p$偏置插值下族$r\in[1,p]$中*所有*$\ell_r$范数的缩放规律,并在一幅图中解释了随着$n$的增长哪些范数饱和、哪些范数继续增加。   随后我们研究了通过梯度下降训练的对角线性网络(DLN)。通过DLN的可分离势将初始化尺度$\alpha$校准为有效的$p_{\mathrm{eff}}(\alpha)$,我们的实验表明DLN继承了相同的肘点/阈值规律,在显式偏置与隐式偏置之间架起了一座预测性桥梁。   鉴于许多泛化代理量依赖于$\lVert \widehat{w_p} \rVert_r$,我们的结果表明它们的预测能力将敏感地依赖于所使用的$\ell_r$范数。
摘要:For overparameterized linear regression with isotropic Gaussian design and minimum-$\ell_p$ interpolator, $p\in(1,2]$, we give a unified, high-probability characterization for the scaling of the family of parameter norms $\{ \lVert \widehat{w_p} \rVert_r \}_{r \in [1,p]}$ with sample size.   We solve this basic, but unresolved question through a simple dual-ray analysis, which reveals a competition between a signal *spike* and a *bulk* of null coordinates in $X^\top Y$, yielding closed-form predictions for (i) a data-dependent transition $n_\star$ (the "elbow"), and (ii) a universal threshold $r_\star=2(p-1)$ that separates $\lVert \widehat{w_p} \rVert_r$'s which plateau from those that continue to grow with an explicit exponent.   This unified solution resolves the scaling of *all* $\ell_r$ norms within the family $r\in [1,p]$ under $\ell_p$-biased interpolation, and explains in one picture which norms saturate and which increase as $n$ grows.   We then study diagonal linear networks (DLNs) trained by gradient descent. By calibrating the initialization scale $\alpha$ to an effective $p_{\mathrm{eff}}(\alpha)$ via the DLN separable potential, we show empirically that DLNs inherit the same elbow/threshold laws, providing a predictive bridge between explicit and implicit bias.   Given that many generalization proxies depend on $\lVert \widehat{w_p} \rVert_r$, our results suggest that their predictive power will depend sensitively on which $\ell_r$ norm is used.


【3】A Unified Framework for Diffusion Model Unlearning with f-Divergence
标题:基于f-散度的扩散模型去学习统一框架
链接:https://arxiv.org/abs/2509.21167

作者:vello, Federico Fontana, Luigi Cinque, Deniz Gunduz, Andrea M. Tonello
摘要:机器去学习旨在从训练好的模型中删除特定知识。虽然扩散模型(DM)已展现出显著的生成能力,但现有的文本到图像(T2I)模型去学习方法往往依赖于最小化目标概念与锚概念输出分布之间的均方误差(MSE)。我们表明,这种基于MSE的方法是一个统一的基于$f$-散度的框架的特例,在该框架中可以使用任何$f$-散度。我们分析了使用不同$f$-散度的好处,它们主要影响算法的收敛性质和去学习的质量。所提出的统一框架提供了一个灵活的范式,允许为特定应用选择最优的散度,在激进去学习与概念保留之间做出不同的权衡。
摘要:Machine unlearning aims to remove specific knowledge from a trained model. While diffusion models (DMs) have shown remarkable generative capabilities, existing unlearning methods for text-to-image (T2I) models often rely on minimizing the mean squared error (MSE) between the output distribution of a target and an anchor concept. We show that this MSE-based approach is a special case of a unified $f$-divergence-based framework, in which any $f$-divergence can be utilized. We analyze the benefits of using different $f$-divergences, that mainly impact the convergence properties of the algorithm and the quality of unlearning. The proposed unified framework offers a flexible paradigm that allows to select the optimal divergence for a specific application, balancing different trade-offs between aggressive unlearning and concept preservation.
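作为示意,下面用几行Python展示摘要所述$f$-散度族的通式$D_f(P\|Q)=\sum_x q(x)\,f(p(x)/q(x))$,并以KL散度和平方Hellinger距离两个生成函数为实例(这只是对该数学框架的最小草图,分布取值为虚构示例,并非论文的去学习实现):

```python
import math

def f_divergence(p, q, f):
    """D_f(P||Q) = sum_x q(x) * f(p(x) / q(x)); requires f(1) == 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def f_kl(t):
    """Generator f(t) = t*log(t), which recovers KL(P||Q)."""
    return t * math.log(t)

def f_hellinger2(t):
    """Generator f(t) = (sqrt(t) - 1)^2, the squared Hellinger distance."""
    return (math.sqrt(t) - 1.0) ** 2

p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]
assert f_divergence(p, p, f_kl) < 1e-12   # D_f(P||P) = 0
assert f_divergence(p, q, f_kl) > 0.0     # f-divergences are nonnegative
assert f_divergence(p, q, f_hellinger2) > 0.0
```

不同的生成函数$f$对大/小似然比的惩罚不同,这正对应摘要中"收敛性质与去学习质量随所选散度变化"的观察。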


【4】Mixture of Thoughts: Learning to Aggregate What Experts Think, Not Just What They Say
标题:思想混合:学会汇总专家的想法,而不仅仅是他们所说的
链接:https://arxiv.org/abs/2509.21164

作者:n-Ashley, Dhruv Parikh, Rajgopal Kannan, Viktor Prasanna
摘要:开源大型语言模型(LLM)越来越多地按领域(例如数学、代码、通用推理)进行专业化,这激励人们构建利用跨模型互补优势的系统。先前的多LLM方法要么(i)将查询路由到一个或几个专家并独立生成,要么(ii)通过昂贵的多轮交换聚合每个模型的输出,要么(iii)将权重融合到单个模型中(通常要求架构同质)。我们提出思想混合(Mixture of Thoughts, MoT),一种在全局路由方案下实现异构专家之间潜空间协作的简单方法。对于每个查询,一个轻量级路由器选择前$K$个专家并指定一个主专家;均匀放置的交互层将隐藏状态投影到共享潜空间,主专家在其中对活跃(被选中)的同伴执行交叉注意力。预训练专家保持冻结;只有路由器和轻量级交互层通过一个新颖的联合训练目标进行训练,该目标同时改进专家选择和专家间协作。在五个分布内(ID)和三个分布外(OOD)基准上,MoT分别以+0.38%和+2.92%超越了当前基于路由和基于聚合的最先进方法Avengers。此外,MoT显著优于表现最好的单一模型。它通过单次前向推理实现这一点,运行时间与路由基线相当,且没有迭代聚合的开销。MoT为组合异构LLM提供了一个简单的潜空间机制,是迈向更广泛的多LLM协作的实际一步。我们的代码可在https://github.com/jacobfa/mot上公开获取。
摘要:Open-source Large Language Models (LLMs) increasingly specialize by domain (e.g., math, code, general reasoning), motivating systems that leverage complementary strengths across models. Prior multi-LLM approaches either (i) route a query to one or a few experts and generate independently, (ii) aggregate outputs from each model via costly multi-turn exchanges, or (iii) fuse weights into a single model-typically requiring architectural homogeneity. We introduce Mixture of Thoughts (MoT), a simple method for latent-level collaboration among heterogeneous experts under a global routing scheme. For each query, a lightweight router selects top-$K$ experts and designates a primary expert; uniformly placed interaction layers project hidden states into a shared latent space where the primary expert performs cross-attention over its active (selected) peers. Pre-trained experts remain frozen; only the router and the lightweight interaction layers are trained with a novel joint training objective that improves both the expert selection and inter-expert collaboration. Across five in-distribution (ID) and three out-of-distribution (OOD) benchmarks, MoT surpasses the current routing and aggregation-based state-of-the-art, Avengers, by $+0.38\%$ and $+2.92\%$, respectively. Further, MoT significantly outperforms the best-performing single model. It achieves this with single-pass inference, runtime comparable to routing baselines, and none of the overheads of iterative aggregation. MoT offers a simple latent-space mechanism for combining heterogeneous LLMs, a practical step toward broader multi-LLM collaboration. Our code is publicly available at https://github.com/jacobfa/mot.
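下面用几行Python勾勒MoT式全局路由的第一步(仅为假设性草图:评分数值与专家数量均为虚构,论文中的共享潜空间与交叉注意力交互层在此省略):路由器对各专家打分,保留前$K$名作为活跃集,并把得分最高者指定为主专家。

```python
def route(scores, k):
    """Rank experts by router score, keep the top-k as the active set,
    and designate the highest-scoring one as the primary expert."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    active = ranked[:k]
    return active[0], active

# Four hypothetical experts scored by the router for one query.
primary, active = route([0.1, 0.7, 0.3, 0.5], k=2)
assert primary == 1           # expert 1 has the highest score
assert set(active) == {1, 3}  # top-2 experts stay active
```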


【5】DATS: Distance-Aware Temperature Scaling for Calibrated Class-Incremental Learning
标题:DATS:用于校准类增量学习的距离感知温度缩放
链接:https://arxiv.org/abs/2509.21161

作者:Serra, Florian Buettner
摘要:持续学习(CL)最近越来越受到关注,因为它能够使单个模型从一系列新类中增量学习。在这种情况下,重要的是要在所有类别中保持一致的预测性能,并防止所谓的灾难性遗忘(CF)。然而,在安全关键型应用中,仅预测性能是不够的。预测模型还应该能够以校准的方式可靠地传达其不确定性-也就是说,置信度得分与目标事件的真实频率一致。CL中的现有方法主要从以数据为中心的角度解决校准问题,依赖于所有任务共享的单个温度。这种解决方案忽略了特定于任务的差异,导致任务之间的校准误差波动很大。出于这个原因,我们认为,一个更有原则的方法应该适应温度根据距离到当前的任务。然而,在测试时/部署期间任务信息的不可用对实现预期目标构成了重大挑战。为此,我们提出了距离感知温度缩放(DATS),它结合了基于原型的距离估计与距离感知校准来推断任务接近度,并在没有先前任务信息的情况下分配自适应温度。通过对标准基准和来自生物医学领域的真实世界不平衡数据集进行广泛的实证评估,与最先进的方法相比,我们的方法在减少任务之间的校准误差方面表现出稳定,可靠和一致性。
摘要 :Continual Learning (CL) is recently gaining increasing attention for its ability to enable a single model to learn incrementally from a sequence of new classes. In this scenario, it is important to keep consistent predictive performance across all the classes and prevent the so-called Catastrophic Forgetting (CF). However, in safety-critical applications, predictive performance alone is insufficient. Predictive models should also be able to reliably communicate their uncertainty in a calibrated manner - that is, with confidence scores aligned to the true frequencies of target events. Existing approaches in CL address calibration primarily from a data-centric perspective, relying on a single temperature shared across all tasks. Such solutions overlook task-specific differences, leading to large fluctuations in calibration error across tasks. For this reason, we argue that a more principled approach should adapt the temperature according to the distance to the current task. However, the unavailability of the task information at test time/during deployment poses a major challenge to achieve the intended objective. For this, we propose Distance-Aware Temperature Scaling (DATS), which combines prototype-based distance estimation with distance-aware calibration to infer task proximity and assign adaptive temperatures without prior task information. Through extensive empirical evaluation on both standard benchmarks and real-world, imbalanced datasets taken from the biomedical domain, our approach demonstrates to be stable, reliable and consistent in reducing calibration error across tasks compared to state-of-the-art approaches.
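下面是一个假设性的最小草图,演示"按与任务原型的距离自适应调节温度"这一思路;其中距离到温度的线性映射、原型与logits取值均为示意性假设,并非DATS的实际公式:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature = flatter confidence."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def task_distance(z, prototypes):
    """Distance from feature vector z to the nearest task prototype."""
    return min(math.dist(z, p) for p in prototypes)

def adaptive_temperature(z, prototypes, t_base=1.0, alpha=0.5):
    """Assumed mapping: temperature grows with distance to the closest
    prototype, softening confidence on inputs far from every seen task."""
    return t_base + alpha * task_distance(z, prototypes)

prototypes = [[0.0, 0.0], [3.0, 3.0]]       # one prototype per seen task
logits = [2.0, 0.5, -1.0]
z_near, z_far = [0.1, 0.1], [10.0, 10.0]

p_near = softmax(logits, adaptive_temperature(z_near, prototypes))
p_far = softmax(logits, adaptive_temperature(z_far, prototypes))
assert abs(sum(p_near) - 1.0) < 1e-9
assert max(p_far) < max(p_near)   # far-from-task input gets flatter confidence
```

这正对应摘要的核心动机:测试时没有任务标签,但与原型的距离可以作为任务接近度的代理来选取温度。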


【6】GRPO is Secretly a Process Reward Model
标题:GRPO秘密地是一种流程奖励模型
链接:https://arxiv.org/abs/2509.21154

作者:ullivan
备注:14 pages, 6 figures; under review at ICLR 2026
摘要:我们从理论上证明,在关于组内各补全之间令牌序列重叠的一定假设下,GRPO强化学习算法会诱导出一个非平凡的过程奖励模型(PRM)。随后我们通过实验表明,这些假设在真实条件下得到满足:GRPO确实诱导出一个非平凡的PRM。利用GRPO-as-a-PRM这一框架,我们发现了GRPO目标中的一个缺陷:非均匀分布的过程步骤会(在不同条件下)分别阻碍探索和利用。我们提出了对算法的一个简单修改($\lambda$-GRPO)来缓解这一缺陷,并表明使用$\lambda$-GRPO训练的LLM在验证准确率和下游推理任务上的表现均高于使用标准GRPO训练的LLM,并且更快达到峰值性能。我们的结果对为GRPO构建昂贵的、显式定义的PRM的优势提出了质疑:我们表明,可以转而利用原始(vanilla)GRPO算法中隐藏的内置PRM结构来提升模型性能,而对训练时间和成本的影响可以忽略不计。
摘要:We prove theoretically that the GRPO RL algorithm induces a non-trivial process reward model (PRM), under certain assumptions regarding within-group overlap of token sequences across completions. We then show empirically that these assumptions are met under real-world conditions: GRPO does in fact induce a non-trivial PRM. Leveraging the framework of GRPO-as-a-PRM, we identify a flaw in the GRPO objective: non-uniformly distributed process steps hinder both exploration and exploitation (under different conditions). We propose a simple modification to the algorithm to mitigate this defect ($\lambda$-GRPO), and show that LLMs trained with $\lambda$-GRPO achieve higher validation accuracy and performance on downstream reasoning tasks$-$and reach peak performance more rapidly$-$than LLMs trained with standard GRPO. Our results call into question the advantage of costly, explicitly-defined PRMs for GRPO: we show that it is possible to instead leverage the hidden, built-in PRM structure within the vanilla GRPO algorithm to boost model performance with a negligible impact on training time and cost.
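作为背景示意,下面用纯Python给出GRPO中组相对优势(group-relative advantage)的标准计算草图;论文提出的$\lambda$-GRPO修改细节未在摘要中给出,故此处仅展示未经修改的基线形式,奖励数值为虚构:

```python
def group_advantages(rewards, eps=1e-8):
    """Standard GRPO-style advantage: z-score each completion's reward
    within its sampled group (mean-centered, std-normalized)."""
    n = len(rewards)
    mu = sum(rewards) / n
    std = (sum((r - mu) ** 2 for r in rewards) / n) ** 0.5
    return [(r - mu) / (std + eps) for r in rewards]

# A group of 4 sampled completions with binary rewards.
adv = group_advantages([1.0, 0.0, 0.0, 1.0])
assert abs(sum(adv)) < 1e-6        # advantages are centered within the group
assert adv[0] > 0 and adv[1] < 0   # rewarded completions get positive advantage
```

论文的观察是:当各补全共享令牌前缀时,这些序列级优势会在重叠的过程步骤上产生类似PRM的逐步信号。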


【7】Physics of Learning: A Lagrangian perspective to different learning paradigms
标题:学习物理学:不同学习范式的拉格朗日视角
链接:https://arxiv.org/abs/2509.21049

作者:o, Bernhard Schölkopf
备注:Work in progress
摘要:我们研究构建一个高效学习系统的问题。高效学习在最少的时间内处理信息,即构建一个以最少的观测数量达到期望误差阈值的系统。基于物理学中的最小作用量原理,我们从第一性原理,即学习$\textit{拉格朗日量}$(Learning Lagrangian),推导出经典学习算法、强化学习中的Bellman最优性方程以及生成模型中的Adam优化器。我们假设学习是在该拉格朗日量中搜索驻定路径,并且学习算法可以通过寻找驻定轨迹来导出。
摘要:We study the problem of building an efficient learning system. Efficient learning processes information in the least time, i.e., building a system that reaches a desired error threshold with the least number of observations. Building upon least action principles from physics, we derive classic learning algorithms, Bellman's optimality equation in reinforcement learning, and the Adam optimizer in generative models from first principles, i.e., the Learning $\textit{Lagrangian}$. We postulate that learning searches for stationary paths in the Lagrangian, and learning algorithms are derivable by seeking the stationary trajectories.


【8】Mechanism of Task-oriented Information Removal in In-context Learning
标题:情境学习中任务导向信息删除机制
链接:https://arxiv.org/abs/2509.21012

作者:o, Haolin Yang, Gouki Minegishi, Naoya Inoue
备注:67 pages, 70 figures, 7 tables
摘要:上下文学习(ICL)是一种基于现代语言模型(LM)的新兴Few-Shot学习范式,但其内在机制尚不清楚。在本文中,我们从信息删除这一新颖视角研究该机制。具体来说,我们证明在zero-shot场景下,LM将查询编码为隐藏状态中的非选择性表示,其中包含所有可能任务的信息,导致输出任意而不聚焦于预期任务,准确率接近于零。同时,我们发现,通过低秩滤波器从隐藏状态中选择性地删除特定信息,可以有效地将LM引导向预期任务。基于这些发现,通过在精心设计的指标上测量隐藏状态,我们观察到Few-Shot ICL有效地模拟了这种面向任务的信息删除过程:它从纠缠的非选择性表示中选择性地删除冗余信息,并基于演示改进输出,这构成了ICL的关键机制。此外,我们识别出诱导删除操作的关键注意力头,称为去噪头(Denoising Heads)。消融实验在推理中阻断信息删除操作后,ICL的准确率显著下降,尤其是当正确标签在Few-Shot演示中缺失时,这证实了信息删除机制和去噪头的关键作用。
摘要 :In-context Learning (ICL) is an emerging few-shot learning paradigm based on modern Language Models (LMs), yet its inner mechanism remains unclear. In this paper, we investigate the mechanism through a novel perspective of information removal. Specifically, we demonstrate that in the zero-shot scenario, LMs encode queries into non-selective representations in hidden states containing information for all possible tasks, leading to arbitrary outputs without focusing on the intended task, resulting in near-zero accuracy. Meanwhile, we find that selectively removing specific information from hidden states by a low-rank filter effectively steers LMs toward the intended task. Building on these findings, by measuring the hidden states on carefully designed metrics, we observe that few-shot ICL effectively simulates such task-oriented information removal processes, selectively removing the redundant information from entangled non-selective representations, and improving the output based on the demonstrations, which constitutes a key mechanism underlying ICL. Moreover, we identify essential attention heads inducing the removal operation, termed Denoising Heads, which enables the ablation experiments blocking the information removal operation from the inference, where the ICL accuracy significantly degrades, especially when the correct label is absent from the few-shot demonstrations, confirming both the critical role of the information removal mechanism and denoising heads.


【9】Lossless Compression: A New Benchmark for Time Series Model Evaluation
标题:无损压缩:时间序列模型评估的新基准
链接:https://arxiv.org/abs/2509.21002

作者: Benxi Tian, Jue Wang, Cui Hui, Ningming Nie, Tiantian Liu, Zongguo Wang, Cao Rongqiang, Peng Shi, Yangang Wang
备注:24 pages
摘要:时间序列模型的评估传统上集中在四个典型的任务:预测,插补,异常检测和分类。虽然这些任务推动了重大进展,但它们主要评估特定任务的性能,而不是严格衡量模型是否捕获了数据的完整生成分布。我们引入无损压缩作为一个新的范例评估时间序列模型,基于香农的信源编码定理。这种观点建立了最佳压缩长度和负对数似然之间的直接等价关系,为建模能力提供了严格统一的信息论标准。然后,我们定义了一个标准化的评估协议和指标。我们进一步提出并开源了一个全面的评估框架TSCom-Bench,它可以快速适应时间序列模型作为无损压缩的骨干。在最先进的模型(包括TimeXer,iTransformer和PatchTST)上进行的不同数据集的实验表明,压缩揭示了经典基准测试所忽视的分布弱点。这些发现将无损压缩定位为一项原则性任务,可以补充和扩展现有的时间序列建模评估。
摘要:The evaluation of time series models has traditionally focused on four canonical tasks: forecasting, imputation, anomaly detection, and classification. While these tasks have driven significant progress, they primarily assess task-specific performance and do not rigorously measure whether a model captures the full generative distribution of the data. We introduce lossless compression as a new paradigm for evaluating time series models, grounded in Shannon's source coding theorem. This perspective establishes a direct equivalence between optimal compression length and the negative log-likelihood, providing a strict and unified information-theoretic criterion for modeling capacity. Then We define a standardized evaluation protocol and metrics. We further propose and open-source a comprehensive evaluation framework TSCom-Bench, which enables the rapid adaptation of time series models as backbones for lossless compression. Experiments across diverse datasets on state-of-the-art models, including TimeXer, iTransformer, and PatchTST, demonstrate that compression reveals distributional weaknesses overlooked by classic benchmarks. These findings position lossless compression as a principled task that complements and extends existing evaluation for time series modeling.
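摘要所依据的"最优压缩长度等价于负对数似然"这一香农信源编码对应关系,可以用几行Python直观说明(示意性草图,忽略实际算术编码的常数项开销;逐步概率为虚构示例):

```python
import math

def codelength_bits(stepwise_probs):
    """Ideal (arithmetic-coding) length in bits for a sequence, given the
    probability the model assigned to the observed symbol at each step."""
    return -sum(math.log2(p) for p in stepwise_probs)

probs = [0.9, 0.8, 0.5]                        # per-step model probabilities
bits = codelength_bits(probs)
nll_nats = -sum(math.log(p) for p in probs)    # negative log-likelihood (nats)
assert abs(bits - nll_nats / math.log(2)) < 1e-9   # bits = nats / ln 2
assert abs(bits + math.log2(0.9 * 0.8 * 0.5)) < 1e-9  # = -log2 of seq. prob.
```

模型对数据分布拟合得越好,逐步概率越高,压缩后的比特数就越少;这正是把无损压缩作为评估指标的原理。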


【10】Learning Ising Models under Hard Constraints using One Sample
标题:使用一个样本在硬约束下学习伊辛模型
链接:https://arxiv.org/abs/2509.20993

作者:uhan, Ioannis Panageas
摘要:我们考虑使用单个样本估计$n$维截断伊辛模型的逆温度参数$\beta$的问题。给定一个有$n$个顶点的图$G=(V,E)$,截断伊辛模型是$n$维超立方体$\{-1,1\}^n$上的概率分布,其中每个配置$\mathbf{\sigma}$被约束位于截断集$S\subseteq\{-1,1\}^n$内,且概率为$\Pr(\mathbf{\sigma})\propto\exp(\beta\mathbf{\sigma}^\top A\mathbf{\sigma})$,其中$A$是$G$的邻接矩阵。我们采用[Galanis等人,SODA'24]的最新设定,其中截断集$S$可表示为某个$k$-SAT公式的可满足赋值集合。给定来自截断伊辛模型的单个样本,其真实逆温度参数为$\beta^*$,底层图$G$的度有界为$\Delta$,且$S$表示为某个$k$-SAT公式的可满足赋值集合,当$k\gtrsim\log(d^2k)\Delta^3$时,我们在近$O(n)$时间内设计了一个与真实参数$\beta^*$保持$O(\Delta^3/\sqrt{n})$-一致的估计量$\hat{\beta}$。   我们的估计量基于伪似然(pseudolikelihood)的最大化,这一概念已在无截断[Chatterjee, Annals of Statistics '07]或有截断[Galanis等人,SODA '24]的各类概率模型中得到广泛分析。我们的方法推广了[Daskalakis等人,STOC '19; Galanis等人,SODA '24]的最新技术,以应对截断伊辛模型这一更具挑战性的设定。
摘要:We consider the problem of estimating inverse temperature parameter $\beta$ of an $n$-dimensional truncated Ising model using a single sample. Given a graph $G = (V,E)$ with $n$ vertices, a truncated Ising model is a probability distribution over the $n$-dimensional hypercube $\{-1,1\}^n$ where each configuration $\mathbf{\sigma}$ is constrained to lie in a truncation set $S \subseteq \{-1,1\}^n$ and has probability $\Pr(\mathbf{\sigma}) \propto \exp(\beta\mathbf{\sigma}^\top A\mathbf{\sigma})$ with $A$ being the adjacency matrix of $G$. We adopt the recent setting of [Galanis et al. SODA'24], where the truncation set $S$ can be expressed as the set of satisfying assignments of a $k$-SAT formula. Given a single sample $\mathbf{\sigma}$ from a truncated Ising model, with inverse parameter $\beta^*$, underlying graph $G$ of bounded degree $\Delta$ and $S$ being expressed as the set of satisfying assignments of a $k$-SAT formula, we design in nearly $O(n)$ time an estimator $\hat{\beta}$ that is $O(\Delta^3/\sqrt{n})$-consistent with the true parameter $\beta^*$ for $k \gtrsim \log(d^2k)\Delta^3.$   Our estimator is based on the maximization of the pseudolikelihood, a notion that has received extensive analysis for various probabilistic models without [Chatterjee, Annals of Statistics '07] or with truncation [Galanis et al. SODA '24]. Our approach generalizes recent techniques from [Daskalakis et al. STOC '19, Galanis et al. SODA '24], to confront the more challenging setting of the truncated Ising model.
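下面给出(无截断)伊辛模型伪似然估计的一个示意性草图;论文处理的是更难的截断情形,此处为简化假设,图与样本均为虚构的玩具例子。利用条件分布$\Pr(\sigma_i=+1\mid\sigma_{-i})=\mathrm{sigmoid}(4\beta\sum_j A_{ij}\sigma_j)$,对各坐标的条件对数概率求和即得伪似然,再对$\beta$做网格搜索最大化:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_pseudolikelihood(beta, sigma, adj):
    """sum_i log P(sigma_i | sigma_{-i}) for P(s) ~ exp(beta * s^T A s):
    P(sigma_i = +1 | rest) = sigmoid(4 * beta * m_i), m_i = sum_j A_ij sigma_j."""
    ll = 0.0
    for i, neighbours in enumerate(adj):
        m = sum(sigma[j] for j in neighbours)
        p_plus = sigmoid(4.0 * beta * m)
        ll += math.log(p_plus if sigma[i] == 1 else 1.0 - p_plus)
    return ll

# Toy 4-cycle graph and one fully aligned sample: the grid-search maximizer
# of the pseudolikelihood is positive, i.e. a ferromagnetic coupling.
adj = [[1, 3], [0, 2], [1, 3], [0, 2]]   # adjacency lists of a 4-cycle
sigma = [1, 1, 1, 1]                     # the single observed sample
grid = [i / 100.0 for i in range(-100, 101)]
beta_hat = max(grid, key=lambda b: log_pseudolikelihood(b, sigma, adj))
assert beta_hat > 0
```

论文的贡献在于证明:即使在$k$-SAT截断下,这一单样本伪似然最大化仍然以$O(\Delta^3/\sqrt{n})$的精度一致,并可在近线性时间内完成。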


【11】Unlocking Noise-Resistant Vision: Key Architectural Secrets for Robust Models
标题:解锁抗噪愿景:稳健模型的关键架构秘密
链接:https://arxiv.org/abs/2509.20939

作者:im, Makoto Kawano, Yusuke Iwasawa, Yutaka Matsuo
备注:30 pages, 5 figures
摘要:虽然视觉模型的鲁棒性经常被测量,但它们对特定架构设计选择的依赖性很少被剖析。我们研究为什么某些视觉架构对加性高斯噪声具有天然更强的鲁棒性,并将这些经验见解转化为简单、可操作的设计规则。具体来说,我们对1,174个预训练视觉模型进行了广泛评估,根据经验确定了四种能提高对高斯噪声鲁棒性的一致设计模式:更大的stem卷积核、更小的输入分辨率、平均池化,以及使用监督训练的视觉Transformer(ViT)而非CLIP ViT;这些模式带来了高达506位的排名提升和21.6个百分点的准确率增益。然后,我们开展理论分析来解释这些发现,将观察到的相关性转化为因果机制。首先,我们证明低通stem卷积核对噪声的衰减增益随核尺寸的平方而下降,且抗混叠下采样使噪声能量大致按下采样因子的平方成比例降低。其次,我们证明平均池化是无偏的,并按池化窗口面积成比例地抑制噪声,而最大池化会引入随窗口大小缓慢增长的正偏差,从而产生相对更高的均方误差和更大的最坏情况敏感度。第三,我们通过像素空间Lipschitz界揭示并解释了CLIP ViT的脆弱性:相对于监督ViT中常见的Inception式预处理,CLIP预处理中使用的较小归一化标准差将最坏情况敏感度放大了最多1.91倍。我们的结果共同将鲁棒性分解为可解释的模块,提供了解释所观察趋势的理论,并为设计对高斯噪声更鲁棒的视觉模型建立了实用的即插即用指南。
摘要:While the robustness of vision models is often measured, their dependence on specific architectural design choices is rarely dissected. We investigate why certain vision architectures are inherently more robust to additive Gaussian noise and convert these empirical insights into simple, actionable design rules. Specifically, we performed extensive evaluations on 1,174 pretrained vision models, empirically identifying four consistent design patterns for improved robustness against Gaussian noise: larger stem kernels, smaller input resolutions, average pooling, and supervised vision transformers (ViTs) rather than CLIP ViTs, which yield up to 506 rank improvements and 21.6\%p accuracy gains. We then develop a theoretical analysis that explains these findings, converting observed correlations into causal mechanisms. First, we prove that low-pass stem kernels attenuate noise with a gain that decreases quadratically with kernel size and that anti-aliased downsampling reduces noise energy roughly in proportion to the square of the downsampling factor. Second, we demonstrate that average pooling is unbiased and suppresses noise in proportion to the pooling window area, whereas max pooling incurs a positive bias that grows slowly with window size and yields a relatively higher mean-squared error and greater worst-case sensitivity. Third, we reveal and explain the vulnerability of CLIP ViTs via a pixel-space Lipschitz bound: The smaller normalization standard deviations used in CLIP preprocessing amplify worst-case sensitivity by up to 1.91 times relative to the Inception-style preprocessing common in supervised ViTs. Our results collectively disentangle robustness into interpretable modules, provide a theory that explains the observed trends, and build practical, plug-and-play guidelines for designing vision models more robust against Gaussian noise.
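摘要中关于平均池化与最大池化的论断可以用一个纯Python蒙特卡洛小实验直观验证(示意性草图,窗口面积与试验次数为任意选取):对零均值高斯噪声,面积为$K$的平均池化无偏且方差约降为$1/K$,而纯噪声上的最大池化均值为正,即存在正偏差。

```python
import random
import statistics

random.seed(0)
K = 9            # pooling window area, e.g. a 3x3 window
TRIALS = 20000

avg_out, max_out = [], []
for _ in range(TRIALS):
    window = [random.gauss(0.0, 1.0) for _ in range(K)]   # pure noise input
    avg_out.append(sum(window) / K)                       # average pooling
    max_out.append(max(window))                           # max pooling

assert abs(statistics.mean(avg_out)) < 0.05               # unbiased
assert abs(statistics.variance(avg_out) - 1.0 / K) < 0.02 # variance ~ 1/K
assert statistics.mean(max_out) > 1.0                     # positive bias
```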


【12】Nuclear Diffusion Models for Low-Rank Background Suppression in Videos
标题:视频中低等级背景抑制的核扩散模型
链接:https://arxiv.org/abs/2509.20886

作者:.W. Stevens, Oisín Nolan, Jean-Luc Robert, Ruud J.G. van Sloun
备注:5 pages, 4 figures, preprint
摘要:视频序列通常包含结构化噪声和背景伪影,它们会模糊动态内容,给准确分析和恢复带来挑战。鲁棒主成分方法通过将数据分解为低秩和稀疏分量来解决这一问题。然而,稀疏性假设通常无法捕捉真实视频数据中存在的丰富可变性。为克服这一限制,我们提出了一个将低秩时间建模与扩散后验采样相结合的混合框架。所提出的方法称为核扩散(Nuclear Diffusion),在一个真实世界的医学成像问题(即心脏超声去雾)上进行了评估,与传统RPCA相比,在对比度增强(gCNR)和信号保留(KS统计量)方面均表现出更好的去雾性能。这些结果突出了将基于模型的时间模型与深度生成先验相结合用于高保真视频恢复的潜力。
摘要:Video sequences often contain structured noise and background artifacts that obscure dynamic content, posing challenges for accurate analysis and restoration. Robust principal component methods address this by decomposing data into low-rank and sparse components. Still, the sparsity assumption often fails to capture the rich variability present in real video data. To overcome this limitation, a hybrid framework that integrates low-rank temporal modeling with diffusion posterior sampling is proposed. The proposed method, Nuclear Diffusion, is evaluated on a real-world medical imaging problem, namely cardiac ultrasound dehazing, and demonstrates improved dehazing performance compared to traditional RPCA concerning contrast enhancement (gCNR) and signal preservation (KS statistic). These results highlight the potential of combining model-based temporal models with deep generative priors for high-fidelity video restoration.


【13】Actively Learning Halfspaces without Synthetic Data
标题:在没有合成数据的情况下积极学习半空间
链接:https://arxiv.org/abs/2509.20848

作者:ack, Kasper Green Larsen, Arya Mazumdar, Barna Saha, Geelon So
摘要:在经典的点定位问题中,给定一个由$n$个点组成的任意数据集$X\subset\mathbb{R}^d$,以及对未知半空间$f:\mathbb{R}^d\to\{0,1\}$的查询访问,目标是学习$X$中每个点的标签。这个问题已被深入研究,Hopkins-Kane-Lovett-Mahajan(FOCS 2020)给出了一个近乎最优的$\widetilde{O}(d\log n)$查询算法。然而,他们的算法被赋予了查询$X$之外任意点的能力(点合成);事实上,若没有这种能力,Dasgupta(NeurIPS 2004)证明了存在$\Omega(n)$的查询下界。   在这项工作中,我们的目标是设计无需点合成即可学习半空间的高效算法。为了绕过$\Omega(n)$下界,我们考虑法向量来自大小为$D$的集合的半空间,并给出了$\Theta(D+\log n)$的紧界。作为推论,我们得到了轴对齐半空间的最优$O(d+\log n)$查询确定性学习器,关闭了此前$O(d\log n)$与$\Omega(d+\log n)$之间的差距。事实上,我们的算法解决了一个更一般的问题:学习定义在$n$个元素上的布尔函数$f$,该函数在所给定的$D$个排序中的至少一个排序下是单调的。我们的技术洞见是利用这些排序中的结构并行地执行二分搜索,而不是顺序地逐一考虑每个排序;我们相信这一方法可能具有更广泛的意义。   此外,我们利用精确学习算法得到了近乎最优的PAC学习算法。我们证明,$O(\min(D+\log(1/\varepsilon),1/\varepsilon)\cdot\log D)$次查询足以将$f$学习到误差$\varepsilon$以内,即使在$f$可以在$c\varepsilon$-比例的点上被对抗性破坏(其中$c$为足够小的常数)的设定下也是如此。该界在相差$\log D$因子的意义下是最优的,包括在可实现设定中。
摘要 :In the classic point location problem, one is given an arbitrary dataset $X \subset \mathbb{R}^d$ of $n$ points with query access to an unknown halfspace $f : \mathbb{R}^d \to \{0,1\}$, and the goal is to learn the label of every point in $X$. This problem is extremely well-studied and a nearly-optimal $\widetilde{O}(d \log n)$ query algorithm is known due to Hopkins-Kane-Lovett-Mahajan (FOCS 2020). However, their algorithm is granted the power to query arbitrary points outside of $X$ (point synthesis), and in fact without this power there is an $\Omega(n)$ query lower bound due to Dasgupta (NeurIPS 2004).   In this work our goal is to design efficient algorithms for learning halfspaces without point synthesis. To circumvent the $\Omega(n)$ lower bound, we consider learning halfspaces whose normal vectors come from a set of size $D$, and show tight bounds of $\Theta(D + \log n)$. As a corollary, we obtain an optimal $O(d + \log n)$ query deterministic learner for axis-aligned halfspaces, closing a previous gap of $O(d \log n)$ vs. $\Omega(d + \log n)$. In fact, our algorithm solves the more general problem of learning a Boolean function $f$ over $n$ elements which is monotone under at least one of $D$ provided orderings. Our technical insight is to exploit the structure in these orderings to perform a binary search in parallel rather than considering each ordering sequentially, and we believe our approach may be of broader interest.   Furthermore, we use our exact learning algorithm to obtain nearly optimal algorithms for PAC-learning. We show that $O(\min(D + \log(1/\varepsilon), 1/\varepsilon) \cdot \log D)$ queries suffice to learn $f$ within error $\varepsilon$, even in a setting when $f$ can be adversarially corrupted on a $c\varepsilon$-fraction of points, for a sufficiently small constant $c$. This bound is optimal up to a $\log D$ factor, including in the realizable setting.


【14】Aligning Inductive Bias for Data-Efficient Generalization in State Space Models
标题:调整归纳偏差以实现状态空间模型中的数据高效概括
链接:https://arxiv.org/abs/2509.20789

作者:, Guozhang Chen
摘要:大规模模型的显著成功从根本上与标度定律相关,但高质量数据的有限性带来了迫在眉睫的挑战。建模的下一个前沿之一是数据效率:从更少的数据中学到更多的能力。模型的归纳偏差是实现这一点的关键杠杆,但像状态空间模型(SSM)这样的基础序列模型依赖于固定的偏差。当任务的底层结构与之不匹配时,这种固定先验的样本效率低下。在这项工作中,我们引入了一个有原则的框架来解决这个问题。我们首先通过SSM诱导的核来形式化线性时不变SSM的归纳偏差,在数学上和经验上证明其谱直接由模型的频率响应决定。此外,我们提出了一种任务相关初始化(Task-Dependent Initialization, TDI)方法:功率谱匹配,这是一种快速有效的方法,可以在大规模训练之前将模型的归纳偏差与任务的谱特征对齐。我们在一组多样的真实世界基准上的实验表明,TDI显著提高了泛化能力和样本效率,特别是在低数据条件下。这项工作提供了一个理论与实践工具,以创建数据效率更高的模型,这是迈向可持续扩展的关键一步。
摘要:The remarkable success of large-scale models is fundamentally tied to scaling laws, yet the finite nature of high-quality data presents a looming challenge. One of the next frontiers in modeling is data efficiency: the ability to learn more from less. A model's inductive bias is a critical lever for this, but foundational sequence models like State Space Models (SSMs) rely on a fixed bias. This fixed prior is sample-inefficient when a task's underlying structure does not match. In this work, we introduce a principled framework to solve this problem. We first formalize the inductive bias of linear time-invariant SSMs through an SSM-induced kernel, mathematically and empirically proving its spectrum is directly governed by the model's frequency response. Further, we propose a method of Task-Dependent Initialization (TDI): power spectrum matching, a fast and efficient method that aligns the model's inductive bias with the task's spectral characteristics before large-scale training. Our experiments on a diverse set of real-world benchmarks show that TDI significantly improves generalization and sample efficiency, particularly in low-data regimes. This work provides a theoretical and practical tool to create more data-efficient models, a crucial step towards sustainable scaling.
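作为TDI中"功率谱匹配"可观测一侧的示意,下面用朴素DFT估计任务信号的功率谱(示意性草图;信号与长度为虚构,论文中与SSM核频率响应进行匹配的后续步骤在此省略):

```python
import cmath
import math

def power_spectrum(x):
    """Naive O(n^2) DFT power spectrum |X_k|^2 / n of a real sequence."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))) ** 2 / n
            for k in range(n)]

n = 32
tone = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]  # pure tone, bin 4
spec = power_spectrum(tone)
peak = max(range(n // 2), key=lambda k: spec[k])
assert peak == 4          # spectral energy concentrates at the tone's frequency
assert spec[4] > spec[5]  # neighbouring bins carry (numerically) no energy
```

估计出的任务功率谱随后即可用于在训练前选择与之对齐的SSM初始化。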


【15】Sig2Model: A Boosting-Driven Model for Updatable Learned Indexes
标题:Sig2模型:可更新学习索引的助推驱动模型
链接:https://arxiv.org/abs/2509.20781

作者:eidari, Amirhossein Ahmad, Wei Zhang, Ying Xiong
备注:22 pages, 11 figures
摘要:学习索引(LI)代表了传统索引结构的范式转变,它采用机器学习模型来近似排序数据的累积分布函数(CDF)。虽然LI在静态数据集上实现了显著的效率,但它们的性能在动态更新下会下降:保持CDF不变(F(k)的总和等于1)需要全局模型重新训练,这会阻塞查询并限制每秒查询数(QPS)指标。目前的方法无法有效地解决这些再训练成本,使其不适合频繁更新的现实工作负载。Sig2Model是一种高效的自适应学习索引,它通过三个关键技术来最小化再训练成本:(1)sigmoid boosting逼近技术,通过局部sigmoid函数逼近更新引起的数据分布变化来动态调整索引模型,同时保持有界误差保证并推迟完全再训练;(2)通过高斯混合模型(GMM)进行主动更新训练,该模型识别高更新概率区域以进行策略性的占位符分配,从而加速更新;(3)神经联合优化框架,该框架通过基于梯度的学习不断改进sigmoid系综和GMM参数。我们在现实世界和合成工作负载上,将Sig2Model与最先进的可更新学习索引进行了对比评估,结果表明Sig2Model将再训练成本降低了20倍,QPS提高了3倍,内存使用量减少了1000倍。
摘要:Learned Indexes (LIs) represent a paradigm shift from traditional index structures by employing machine learning models to approximate the cumulative distribution function (CDF) of sorted data. While LIs achieve remarkable efficiency for static datasets, their performance degrades under dynamic updates: maintaining the CDF invariant (sum of F(k) equals 1) requires global model retraining, which blocks queries and limits the queries-per-second (QPS) metric. Current approaches fail to address these retraining costs effectively, rendering them unsuitable for real-world workloads with frequent updates. In this paper, we present Sig2Model, an efficient and adaptive learned index that minimizes retraining cost through three key techniques: (1) a sigmoid boosting approximation technique that dynamically adjusts the index model by approximating update-induced shifts in data distribution with localized sigmoid functions while preserving bounded error guarantees and deferring full retraining; (2) proactive update training via Gaussian mixture models (GMMs) that identifies high-update-probability regions for strategic placeholder allocation to speed up updates; and (3) a neural joint optimization framework that continuously refines both the sigmoid ensemble and GMM parameters via gradient-based learning. We evaluate Sig2Model against state-of-the-art updatable learned indexes on real-world and synthetic workloads, and show that Sig2Model reduces retraining cost by up to 20x, achieves up to 3x higher QPS, and uses up to 1000x less memory.
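下面用一个玩具例子示意"用局部sigmoid吸收更新导致的分布偏移"这一思想(纯属假设性草图,所有常数均为虚构,并非Sig2Model的实际算法):陈旧的线性CDF模型加上一个位于更新热点附近的局部sigmoid阶跃,再整体归一,使校正后的CDF无需重训基础模型即可保持以1收尾且局部单调。

```python
import math

def base_cdf(key):
    """Stale linear model of the sorted-key CDF over keys in [0, 100]."""
    return min(max(key / 100.0, 0.0), 1.0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

MASS = 0.2  # hypothetical fraction of new keys inserted near the hotspot

def corrected_cdf(key, center=60.0, sharpness=0.5):
    """Base model plus a local sigmoid step of height MASS near `center`,
    renormalized so the corrected CDF still reaches 1 at the right end."""
    bump = MASS * sigmoid(sharpness * (key - center))
    return (base_cdf(key) + bump) / (1.0 + MASS)

assert corrected_cdf(0.0) < 0.01                  # still ~0 at the left end
assert abs(corrected_cdf(100.0) - 1.0) < 0.01     # still ~1 at the right end
assert corrected_cdf(59.0) < corrected_cdf(61.0)  # remains monotone locally
```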


【16】Cryptographic Backdoor for Neural Networks: Boon and Bane
标题:神经网络的密码后门:布恩和贝恩
链接:https://arxiv.org/abs/2509.20714

作者:o, Anupam Chattopadhyay, Subhamoy Maitra
备注:Preprint
摘要:在本文中,我们表明,神经网络(NN)中的密码后门在两个方向上都非常有效,即发动攻击和构建防御。在攻击方面,精心植入的密码后门可以对NN发动强大而隐蔽的攻击。在防御方面,我们给出了三项应用:第一,可证明鲁棒的NN水印方案;第二,保证用户身份验证的协议;第三,跟踪NN知识产权(IP)未经授权共享的协议。从更广泛的理论视角出发,借鉴Goldwasser等人[FOCS 2022]的思想,我们的主要贡献是表明所有这些实例化的实际协议实现都是可证明鲁棒的。水印、身份验证和IP跟踪协议可以抵抗对NN具有黑盒访问权限的对手,而在标准假设下,由后门实现的对抗性攻击是无法防止的。虽然我们的攻击所使用的理论工具与Goldwasser等人的思想基本一致,但与防御相关的证明问题还有待进一步研究。最后,所有这些协议都在最先进的NN架构上实现,实证结果印证了理论主张。此外,人们可以利用后量子原语来实现密码后门,为机器学习(ML)中的量子时代应用奠定基础。
摘要:In this paper we show that cryptographic backdoors in a neural network (NN) can be highly effective in two directions, namely mounting attacks as well as presenting defenses. On the attack side, a carefully planted cryptographic backdoor enables powerful and invisible attack on the NN. Considering the defense, we present applications: first, a provably robust NN watermarking scheme; second, a protocol for guaranteeing user authentication; and third, a protocol for tracking unauthorized sharing of the NN intellectual property (IP). From a broader theoretical perspective, borrowing the ideas from Goldwasser et al. [FOCS 2022], our main contribution is to show that all these instantiated practical protocol implementations are provably robust. The protocols for watermarking, authentication and IP tracking resist an adversary with black-box access to the NN, whereas the backdoor-enabled adversarial attack is impossible to prevent under the standard assumptions. While the theoretical tools used for our attack are mostly in line with the Goldwasser et al. ideas, the proofs related to the defense need further studies. Finally, all these protocols are implemented on state-of-the-art NN architectures with empirical results corroborating the theoretical claims. Further, one can utilize post-quantum primitives for implementing the cryptographic backdoors, laying out foundations for quantum-era applications in machine learning (ML).


【17】Learning to Align Molecules and Proteins: A Geometry-Aware Approach to Binding Affinity
标题:学习对齐分子和蛋白质:一种具有几何意识的结合亲和力方法
链接:https://arxiv.org/abs/2509.20693

作者:aleh Refahi, Bahrad A. Sokhansanj, James R. Brown, Gail Rosen
备注:10 pages, 2 figures
摘要:准确预测药物-靶标结合亲和力,可以在昂贵的湿实验筛选之前优先考虑有希望的化合物,从而加速药物发现。虽然深度学习已经推进了这一任务,但大多数模型通过简单拼接来融合配体和蛋白质表示,且缺乏明确的几何正则化,导致跨化学空间和时间的泛化能力差。我们提出FIRM-DTI,一个轻量级框架,通过特征级线性调制(FiLM)层将分子嵌入以蛋白质嵌入为条件,并用三元组损失强制度量结构。在嵌入距离上操作的RBF回归头产生平滑、可解释的亲和力预测。尽管模型规模不大,FIRM-DTI仍在Therapeutics Data Commons的DTI-DG基准上达到了最先进的性能,广泛的消融研究和域外评估证明了这一点。我们的结果强调了条件化和度量学习对稳健的药物-靶标亲和力预测的价值。
摘要:Accurate prediction of drug-target binding affinity can accelerate drug discovery by prioritizing promising compounds before costly wet-lab screening. While deep learning has advanced this task, most models fuse ligand and protein representations via simple concatenation and lack explicit geometric regularization, resulting in poor generalization across chemical space and time. We introduce FIRM-DTI, a lightweight framework that conditions molecular embeddings on protein embeddings through a feature-wise linear modulation (FiLM) layer and enforces metric structure with a triplet loss. An RBF regression head operating on embedding distances yields smooth, interpretable affinity predictions. Despite its modest size, FIRM-DTI achieves state-of-the-art performance on the Therapeutics Data Commons DTI-DG benchmark, as demonstrated by an extensive ablation study and out-of-domain evaluation. Our results underscore the value of conditioning and metric learning for robust drug-target affinity prediction.
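为便于理解摘要中的三个组件(FiLM条件调制、三元组损失、基于嵌入距离的RBF回归头),下面给出一个极简的NumPy示意。这不是论文的官方实现,维度、权重与函数名均为说明性假设:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

def film(h_ligand, z_protein, W_gamma, W_beta):
    # FiLM:由蛋白质嵌入生成逐特征的缩放gamma与偏移beta,调制配体嵌入
    gamma = z_protein @ W_gamma
    beta = z_protein @ W_beta
    return gamma * h_ligand + beta

def triplet_loss(anchor, pos, neg, margin=1.0):
    # 三元组损失:拉近anchor与正样本的距离,推远anchor与负样本的距离
    d_pos = np.linalg.norm(anchor - pos)
    d_neg = np.linalg.norm(anchor - neg)
    return max(0.0, d_pos - d_neg + margin)

def rbf_head(dist, centers, weights, sigma=1.0):
    # RBF回归头:对嵌入距离做径向基展开后线性回归,得到平滑的亲和力预测
    phi = np.exp(-(dist - centers) ** 2 / (2 * sigma ** 2))
    return float(phi @ weights)

h = rng.normal(size=d)   # 配体嵌入(示意)
z = rng.normal(size=d)   # 蛋白质嵌入(示意)
h_cond = film(h, z, 0.1 * rng.normal(size=(d, d)), 0.1 * rng.normal(size=(d, d)))

centers = np.linspace(0.0, 5.0, 10)
affinity = rbf_head(np.linalg.norm(h_cond - z), centers, rng.normal(size=10))
```

实际训练中,三元组损失与回归损失会联合优化;此处仅演示各组件的数学形式。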


【18】Theoretical Bounds for Stable In-Context Learning
标题:稳定上下文学习的理论界限
链接:https://arxiv.org/abs/2509.20677

作者:ng, Zhuoyang Xia
摘要:情境学习(ICL)十分灵活,但其可靠性对提示长度高度敏感。本文建立了一个非渐近下界,在固定的高维次高斯表示下,将演示样本的最小数量与ICL稳定性联系起来。该界以协方差的谱性质给出显式充分条件,为实践提供了可计算的判据。在此分析的基础上,我们提出了一个带一次性校准的两阶段可观测估计器,无需分布先验即可产生可供实践者直接使用的提示长度估计。在不同数据集、编码器和生成器上的实验表明,预测阈值与经验拐点高度吻合,理论起到保守但可靠的上界作用;校准变体进一步缩小了这一差距。这些结果将谱覆盖与稳定的ICL联系起来,弥合了理论与部署之间的差距,并提高了现实有限样本情形下大规模提示的可解释性和可靠性。
摘要:In-context learning (ICL) is flexible but its reliability is highly sensitive to prompt length. This paper establishes a non-asymptotic lower bound that links the minimal number of demonstrations to ICL stability under fixed high-dimensional sub-Gaussian representations. The bound gives explicit sufficient conditions in terms of spectral properties of the covariance, providing a computable criterion for practice. Building on this analysis, we propose a two-stage observable estimator with a one-shot calibration that produces practitioner-ready prompt-length estimates without distributional priors. Experiments across diverse datasets, encoders, and generators show close alignment between the predicted thresholds and empirical knee-points, with the theory acting as a conservative but reliable upper bound; the calibrated variant further tightens this gap. These results connect spectral coverage to stable ICL, bridge theory and deployment, and improve the interpretability and reliability of large-scale prompting in realistic finite-sample regimes.


【19】Policy Compatible Skill Incremental Learning via Lazy Learning Interface
标题:通过懒惰学习界面的策略兼容技能增量学习
链接:https://arxiv.org/abs/2509.20612

作者:e, Dongsu Lee, TaeYoon Kwack, Wonje Choi, Honguk Woo
摘要:技能增量学习(SIL)是指具身智能体通过利用与环境交互所获得的经验或整合额外数据,随时间扩展并完善其技能集的过程。SIL有助于针对下游任务高效获取以可重用技能为基础的分层策略。然而,随着技能库的演化,它可能破坏与现有基于技能的策略的兼容性,限制其可重用性和泛化能力。在这项工作中,我们提出了SIL-C,一个确保技能与策略兼容性的新框架,使增量学习技能的改进能够提升下游策略的性能,而无需重新训练策略或调整其结构。SIL-C采用一种双向懒惰学习的映射技术,将策略所引用的子任务空间与解码为智能体行为的技能空间动态对齐。这使得由策略对复杂任务分解得到的每个子任务,都能基于轨迹分布相似性选择适当的技能来执行。我们在不同的SIL场景中评估SIL-C,并证明它在整个学习过程中保持不断演化的技能与下游策略之间的兼容性,同时确保效率。
摘要:Skill Incremental Learning (SIL) is the process by which an embodied agent expands and refines its skill set over time by leveraging experience gained through interaction with its environment or by the integration of additional data. SIL facilitates efficient acquisition of hierarchical policies grounded in reusable skills for downstream tasks. However, as the skill repertoire evolves, it can disrupt compatibility with existing skill-based policies, limiting their reusability and generalization. In this work, we propose SIL-C, a novel framework that ensures skill-policy compatibility, allowing improvements in incrementally learned skills to enhance the performance of downstream policies without requiring policy re-training or structural adaptation. SIL-C employs a bilateral lazy learning-based mapping technique to dynamically align the subtask space referenced by policies with the skill space decoded into agent behaviors. This enables each subtask, derived from the policy's decomposition of a complex task, to be executed by selecting an appropriate skill based on trajectory distribution similarity. We evaluate SIL-C across diverse SIL scenarios and demonstrate that it maintains compatibility between evolving skills and downstream policies while ensuring efficiency throughout the learning process.
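摘要中"基于轨迹分布相似性选择技能"的思路,可以用一个非参数(懒惰学习)的最近邻选择来粗略示意。以下NumPy草图用经验均值之间的距离代替真正的分布相似度(例如MMD),数据与技能名均为虚构假设,并非SIL-C的实际实现:

```python
import numpy as np

def select_skill(subtask_traj, skill_trajs):
    # 懒惰学习式映射:选出轨迹分布(此处简化为经验均值)最接近子任务的技能
    sub_mean = subtask_traj.mean(axis=0)
    best, best_d = None, np.inf
    for name, trajs in skill_trajs.items():
        dist = np.linalg.norm(trajs.mean(axis=0) - sub_mean)
        if dist < best_d:
            best, best_d = name, dist
    return best

rng = np.random.default_rng(1)
skills = {
    "grasp": rng.normal(loc=0.0, size=(50, 4)),  # 技能"grasp"的历史轨迹特征
    "push": rng.normal(loc=3.0, size=(50, 4)),   # 技能"push"的历史轨迹特征
}
subtask = rng.normal(loc=2.9, size=(20, 4))      # 策略分解出的子任务轨迹特征
```

由于映射是懒惰(非参数)的,技能库更新后无需重新训练任何参数化映射,这正是"兼容性"得以保持的直觉所在。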


【20】TSKAN: Interpretable Machine Learning for QoE modeling over Time Series Data
标题:TSKAN:用于时间序列数据QOE建模的可解释机器学习
链接:https://arxiv.org/abs/2509.20595

作者:gh, Priyanka Rawat, Sami Marouani, Baptiste Jeudy
摘要:体验质量(QoE)建模对于优化视频流服务至关重要,需要捕获不同特征与用户体验之间的复杂关系。我们提出了一种新方法,在视频流应用中基于原始时间序列数据,使用可解释机器学习(ML)技术进行QoE建模。与传统的黑盒方法不同,我们的方法将Kolmogorov-Arnold网络(KAN)作为紧凑频域特征之上的可解释读出层,使我们能够捕获时间信息,同时保持模型透明、可解释。我们在常用数据集上评估了该方法,证明其在QoE预测中具有更高的准确性,同时兼具透明性和可解释性。
摘要:Quality of Experience (QoE) modeling is crucial for optimizing video streaming services to capture the complex relationships between different features and user experience. We propose a novel approach to QoE modeling in video streaming applications using interpretable Machine Learning (ML) techniques over raw time series data. Unlike traditional black-box approaches, our method combines Kolmogorov-Arnold Networks (KANs) as an interpretable readout on top of compact frequency-domain features, allowing us to capture temporal information while retaining a transparent and explainable model. We evaluate our method on popular datasets and demonstrate its enhanced accuracy in QoE prediction, while offering transparency and interpretability.
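摘要中"紧凑频域特征之上的可解释读出"可以用如下NumPy草图示意:先用rFFT提取低频幅值特征,再在其上接可解释的读出层(论文用的是KAN,此处仅演示特征提取部分;函数与参数均为示意假设):

```python
import numpy as np

def freq_features(x, k=16):
    # 紧凑频域特征:取rFFT前k个非直流分量的归一化幅值
    spec = np.abs(np.fft.rfft(x)) / len(x)
    return spec[1:k + 1]

# 玩具示例:纯正弦序列的能量集中在单一频点
t = np.arange(256)
x = np.sin(2 * np.pi * 8 * t / 256)   # 每256个样本8个周期
f = freq_features(x)
```

在实际流程中,这些特征会作为KAN(或其他可解释模型)的输入;正弦示例说明该特征对周期性模式(如规律性卡顿)非常敏感。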


【21】The Sensitivity of Variational Bayesian Neural Network Performance to Hyperparameters
标题:变分Bayesian神经网络性能对超参数的敏感性
链接:https://arxiv.org/abs/2509.20574

作者:rmer, Natalie Klein
备注:18 pages, 6 figures
摘要:在科学应用中,如果没有准确的不确定性量化(UQ)来指示模型何时可以外推或何时需要收集更多数据,预测建模的使用通常是有限的。贝叶斯神经网络(BNN)通过在神经网络(NN)权重中传播不确定性来产生预测不确定性,并提供了不仅获得准确预测模型而且获得准确UQ的承诺。然而,在实践中,使用BNN获得准确的UQ是困难的,部分原因是用于实际模型训练的近似值,部分原因是需要选择一组合适的超参数;这些超参数的数量超过了传统NN所需的数量,并且通常对结果产生不透明的影响。我们的目标是揭示超参数选择的BNN的影响,通过执行不同的超参数设置下的BNN性能的全局灵敏度分析。我们的研究结果表明,许多超参数相互作用,影响预测精度和UQ。为了改进BNN在实际应用中的使用,我们建议使用全局敏感性分析或贝叶斯优化等相关方法来帮助降维和选择超参数,以确保BNN中准确的UQ。
摘要:In scientific applications, predictive modeling is often of limited use without accurate uncertainty quantification (UQ) to indicate when a model may be extrapolating or when more data needs to be collected. Bayesian Neural Networks (BNNs) produce predictive uncertainty by propagating uncertainty in neural network (NN) weights and offer the promise of obtaining not only an accurate predictive model but also accurate UQ. However, in practice, obtaining accurate UQ with BNNs is difficult due in part to the approximations used for practical model training and in part to the need to choose a suitable set of hyperparameters; these hyperparameters outnumber those needed for traditional NNs and often have opaque effects on the results. We aim to shed light on the effects of hyperparameter choices for BNNs by performing a global sensitivity analysis of BNN performance under varying hyperparameter settings. Our results indicate that many of the hyperparameters interact with each other to affect both predictive accuracy and UQ. For improved usage of BNNs in real-world applications, we suggest that global sensitivity analysis, or related methods such as Bayesian optimization, should be used to aid in dimensionality reduction and selection of hyperparameters to ensure accurate UQ in BNNs.
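摘要所说的全局敏感性分析,可用如下NumPy草图粗略示意:对一个虚构的"性能曲面"估计一阶Sobol指数 S_i = Var(E[Y|X_i])/Var(Y),其中条件期望用分箱近似。性能函数与各超参数的含义均为示意假设,并非论文的实验设置:

```python
import numpy as np

def first_order_sensitivity(f, n=20000, d=3, bins=20, seed=0):
    # 一阶Sobol指数的分箱估计:S_i = Var(E[Y|X_i]) / Var(Y)
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n, d))
    Y = f(X)
    var_y = Y.var()
    S = []
    for i in range(d):
        idx = np.minimum((X[:, i] * bins).astype(int), bins - 1)
        cond_means = np.array([Y[idx == b].mean() for b in range(bins)])
        S.append(float(cond_means.var() / var_y))
    return np.array(S)

# 虚构的"BNN性能"曲面:第一个超参数占主导,第三个几乎无影响
perf = lambda X: np.sin(2 * np.pi * X[:, 0]) + 0.3 * X[:, 1] + 0.05 * X[:, 2]
S = first_order_sensitivity(perf)
```

真实分析还需考虑超参数间的交互效应(高阶/总效应指数),这正是摘要中"许多超参数相互作用"这一发现所对应的量。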


【22】PIRF: Physics-Informed Reward Fine-Tuning for Diffusion Models
标题:PIRF:扩散模型的物理知情奖励微调
链接:https://arxiv.org/abs/2509.20570

作者:an, Pengfei Jin, Na Li, Quanzheng Li
备注:18 pages, 6 figures; NeurIPS 2025 AI for science workshop
摘要:扩散模型在科学领域表现出强大的生成能力,但往往产生违反物理定律的输出。我们提出了一个新的视角,将物理约束生成表述为一个稀疏奖励优化问题,其中对物理约束的遵循被视为奖励信号。该表述将先前的方法统一在基于奖励的范式下,并揭示了一个共同的瓶颈:依赖于扩散后验采样(DPS)风格的价值函数近似,这会引入不可忽略的误差,并导致训练不稳定和推理低效。为了克服这一点,我们引入了物理信息奖励微调(PIRF),该方法通过计算轨迹级奖励并直接反向传播其梯度来绕过价值近似。然而,朴素的实现存在采样效率低和数据保真度受损的问题。PIRF通过两个关键策略缓解这些问题:(1)逐层截断反向传播方法,利用基于物理的奖励在时空上的局部化性质;(2)基于权重的正则化方案,其效率优于传统的基于蒸馏的方法。在五个PDE基准上,PIRF在高效采样机制下始终实现了更优的物理约束执行,凸显了奖励微调在推进科学生成建模方面的潜力。
摘要:Diffusion models have demonstrated strong generative capabilities across scientific domains, but often produce outputs that violate physical laws. We propose a new perspective by framing physics-informed generation as a sparse reward optimization problem, where adherence to physical constraints is treated as a reward signal. This formulation unifies prior approaches under a reward-based paradigm and reveals a shared bottleneck: reliance on diffusion posterior sampling (DPS)-style value function approximations, which introduce non-negligible errors and lead to training instability and inference inefficiency. To overcome this, we introduce Physics-Informed Reward Fine-tuning (PIRF), a method that bypasses value approximation by computing trajectory-level rewards and backpropagating their gradients directly. However, a naive implementation suffers from low sample efficiency and compromised data fidelity. PIRF mitigates these issues through two key strategies: (1) a layer-wise truncated backpropagation method that leverages the spatiotemporally localized nature of physics-based rewards, and (2) a weight-based regularization scheme that improves efficiency over traditional distillation-based methods. Across five PDE benchmarks, PIRF consistently achieves superior physical enforcement under efficient sampling regimes, highlighting the potential of reward fine-tuning for advancing scientific generative modeling.
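摘要中"将物理约束的遵循程度作为奖励信号"的想法,可以用有限差分残差来示意。下面的NumPy草图以一维Laplace方程 u''=0 为例,把负的均方残差当作奖励(方程与网格均为玩具假设,并非论文使用的PDE基准):

```python
import numpy as np

def physics_reward(u, dx=1.0):
    # 物理奖励:一维Laplace方程 u''=0 的中心差分残差的负均方值
    res = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx ** 2
    return -float(np.mean(res ** 2))

x = np.linspace(0.0, 1.0, 64)
u_exact = 2 * x + 1                     # 线性场恰好满足 u''=0,奖励接近0
u_bad = u_exact + 0.1 * np.sin(20 * x)  # 违反物理的扰动会被惩罚
```

PIRF的关键在于对生成轨迹直接反向传播此类奖励的梯度并做逐层截断;此处只演示奖励本身的构造。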


【23】MDBench: Benchmarking Data-Driven Methods for Model Discovery
标题:MDBench:对模型发现的数据驱动方法进行基准测试
链接:https://arxiv.org/abs/2509.20529

作者:mad Ziaei Bideh, Aleksandra Georgievska, Jonathan Gryak
摘要:模型发现旨在直接从实验数据中揭示动力系统的控制微分方程。对这些方法进行基准测试,对于跟踪进展和理解该领域中的利弊权衡至关重要。以往的工作主要集中于识别单个方程(通常表述为符号回归问题),而用于发现动力学模型的全面基准仍然缺乏。为了解决这个问题,我们引入MDBench,一个用于评估动力系统模型发现方法的开源基准框架。MDBench在不同噪声水平下,在14个偏微分方程(PDE)和63个常微分方程(ODE)上评估了12种算法。评估指标包括导数预测精度、模型复杂度和方程保真度。我们还引入了七个来自流体动力学和热力学的具有挑战性的PDE系统,揭示了当前方法的关键局限。结果表明,线性方法和遗传编程方法分别在PDE和ODE上取得最低的预测误差。此外,线性模型总体上对噪声更具鲁棒性。MDBench通过提供严格、可扩展的基准框架和丰富多样的动力系统数据集,支持对方程准确性和鲁棒性进行系统的评估、比较与改进,从而加速模型发现方法的进步。
摘要:Model discovery aims to uncover governing differential equations of dynamical systems directly from experimental data. Benchmarking such methods is essential for tracking progress and understanding trade-offs in the field. While prior efforts have focused mostly on identifying single equations, typically framed as symbolic regression, there remains a lack of comprehensive benchmarks for discovering dynamical models. To address this, we introduce MDBench, an open-source benchmarking framework for evaluating model discovery methods on dynamical systems. MDBench assesses 12 algorithms on 14 partial differential equations (PDEs) and 63 ordinary differential equations (ODEs) under varying levels of noise. Evaluation metrics include derivative prediction accuracy, model complexity, and equation fidelity. We also introduce seven challenging PDE systems from fluid dynamics and thermodynamics, revealing key limitations in current methods. Our findings illustrate that linear methods and genetic programming methods achieve the lowest prediction error for PDEs and ODEs, respectively. Moreover, linear models are in general more robust against noise. MDBench accelerates the advancement of model discovery methods by offering a rigorous, extensible benchmarking framework and a rich, diverse collection of dynamical system datasets, enabling systematic evaluation, comparison, and improvement of equation accuracy and robustness.
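摘要中的评估指标之一"导数预测精度",可以用如下NumPy草图示意:对轨迹做有限差分求导,再与候选模型右端项的预测比较。轨迹与候选模型均为玩具假设,并非MDBench的实际数据:

```python
import numpy as np

def derivative_rmse(ts, xs, rhs):
    # 导数预测误差:轨迹的有限差分导数与候选ODE右端项之间的RMSE
    dxdt = np.gradient(xs, ts)
    return float(np.sqrt(np.mean((dxdt - rhs(xs)) ** 2)))

# 玩具轨迹:x' = -x 的精确解
ts = np.linspace(0.0, 2.0, 400)
xs = np.exp(-ts)
err_true = derivative_rmse(ts, xs, lambda x: -x)       # 正确的候选模型
err_wrong = derivative_rmse(ts, xs, lambda x: -2 * x)  # 错误的候选模型
```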


【24】mloz: A Highly Efficient Machine Learning-Based Ozone Parameterization for Climate Sensitivity Simulations
标题:mloz:一种用于气候敏感性模拟的高效机器学习臭氧参数化
链接:https://arxiv.org/abs/2509.20422

作者:, Nathan Luke Abraham, Stefan Versick, Roland Ruhnke, Andrea Schneidereit, Ulrike Niemeier, Felix Back, Peter Braesicke, Peer Nowack
摘要:大气臭氧是太阳辐射的重要吸收剂,也是一种重要的温室气体。然而,由于大气化学方案的计算成本很高,大多数参与耦合模式比较计划(CMIP)的气候模式仍然缺乏臭氧的交互式表示。在这里,我们介绍一种机器学习参数化(mloz),用于在标准气候敏感性模拟中交互式地模拟对流层和平流层的逐日臭氧变化和趋势,包括臭氧与准两年振荡的双向相互作用。我们展示了它在年代际时间尺度上的高保真度,以及它在英国地球系统模式(UKESM)和德国ICOsahedral Nonhydrostatic(ICON)模式这两个不同气候模式中的灵活在线应用。以大气温度廓线信息作为唯一输入,mloz产生稳定的臭氧预测,比UKESM中的化学方案快约31倍,占各自气候模式总运行时间的不到4%。特别地,我们通过将该参数化从UKESM移植到ICON,证明了它对没有化学方案的不同气候模式的可迁移性。这突显了在缺乏交互式化学的CMIP级气候模式中广泛采用该方法用于未来气候变化评估的潜力,尤其是在侧重气候敏感性模拟时,因为已知臭氧趋势和变率会显著调制大气反馈过程。
摘要:Atmospheric ozone is a crucial absorber of solar radiation and an important greenhouse gas. However, most climate models participating in the Coupled Model Intercomparison Project (CMIP) still lack an interactive representation of ozone due to the high computational costs of atmospheric chemistry schemes. Here, we introduce a machine learning parameterization (mloz) to interactively model daily ozone variability and trends across the troposphere and stratosphere in standard climate sensitivity simulations, including two-way interactions of ozone with the Quasi-Biennial Oscillation. We demonstrate its high fidelity on decadal timescales and its flexible use online across two different climate models -- the UK Earth System Model (UKESM) and the German ICOsahedral Nonhydrostatic (ICON) model. With atmospheric temperature profile information as the only input, mloz produces stable ozone predictions around 31 times faster than the chemistry scheme in UKESM, contributing less than 4 percent of the respective total climate model runtimes. In particular, we also demonstrate its transferability to different climate models without chemistry schemes by transferring the parameterization from UKESM to ICON. This highlights the potential for widespread adoption in CMIP-level climate models that lack interactive chemistry for future climate change assessments, particularly when focusing on climate sensitivity simulations, where ozone trends and variability are known to significantly modulate atmospheric feedback processes.


【25】A Theory of Multi-Agent Generative Flow Networks
标题:多主体生成流网络理论
链接:https://arxiv.org/abs/2509.20408

作者:e Brunswic, Haozhi Wang, Shuang Luo, Jianye Hao, Amir Rasouli, Yinchuan Li
备注:Accepted at SPIGM Workshop NeurIPS 2025
摘要:生成流网络利用流匹配损失来学习用于从动作序列生成对象的随机策略,使得生成某一模式的概率可以与相应的给定奖励成比例。然而,多智能体生成流网络(MA-GFlowNets)的理论框架尚未被提出。本文提出了MA-GFlowNets的理论框架,可应用于多个智能体通过一系列联合动作协同生成对象。我们进一步提出了四种算法:用于MA-GFlowNets集中训练的集中式流网络、用于分散执行的独立流网络、用于实现集中训练分散执行的联合流网络,及其更新的条件版本。联合流训练基于一个局部-全局原则,允许将一组(局部)GFN作为单个(全局)GFN来训练。该原则给出复杂度合理的损失,并允许利用关于GFN的已有结果提供理论保证:独立策略生成样本的概率与奖励函数成正比。实验结果表明,所提框架优于强化学习和基于MCMC的方法。
摘要:Generative flow networks utilize a flow-matching loss to learn a stochastic policy for generating objects from a sequence of actions, such that the probability of generating a pattern can be proportional to the corresponding given reward. However, a theoretical framework for multi-agent generative flow networks (MA-GFlowNets) has not yet been proposed. In this paper, we propose the theory framework of MA-GFlowNets, which can be applied to multiple agents to generate objects collaboratively through a series of joint actions. We further propose four algorithms: a centralized flow network for centralized training of MA-GFlowNets, an independent flow network for decentralized execution, a joint flow network for achieving centralized training with decentralized execution, and its updated conditional version. Joint Flow training is based on a local-global principle allowing to train a collection of (local) GFN as a unique (global) GFN. This principle provides a loss of reasonable complexity and allows to leverage usual results on GFN to provide theoretical guarantees that the independent policies generate samples with probability proportional to the reward function. Experimental results demonstrate the superiority of the proposed framework compared to reinforcement learning and MCMC-based methods.
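GFlowNet的核心性质是:终端对象的采样概率与奖励成正比。下面用一个单步玩具图做NumPy验证,并数值检查轨迹平衡(trajectory balance)损失在最优解处为零(玩具设定,并非论文的多智能体算法实现):

```python
import numpy as np

rewards = np.array([1.0, 3.0])   # 两个终端对象的奖励
Z = rewards.sum()                # 根节点总流量(配分函数)
policy = rewards / Z             # 流匹配解:边流量正比于奖励

def tb_loss(log_Z, log_pf, log_reward):
    # 轨迹平衡损失:(log Z + log P_F(tau) - log R(x))^2,最优处为0
    return (log_Z + log_pf - log_reward) ** 2

rng = np.random.default_rng(0)
samples = rng.choice(2, size=20000, p=policy)
freq = np.bincount(samples) / len(samples)   # 采样频率应接近 [0.25, 0.75]
```

论文的局部-全局原则可理解为:让每个(局部)GFN的流在联合动作下拼合成一个满足上述平衡条件的(全局)GFN。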


【26】Philosophy-informed Machine Learning
标题:基于哲学的机器学习
链接:https://arxiv.org/abs/2509.20370

作者
摘要:Philosophy-informed machine learning(PhIML)将分析哲学的核心思想直接注入ML模型架构、目标和评估协议。因此,PhIML有望通过在设计上尊重哲学概念与价值观的模型带来新的能力。基于这一视角,本文回顾了相关概念基础,以展示哲学上的收益与对齐。此外,我们还通过案例研究说明ML用户/设计者如何将PhIML用作与模型无关的事后工具,或将其内建于ML模型架构之中。最后,本文阐明了尚待解决的技术障碍以及哲学、实践和治理方面的挑战,并勾勒出一条通往安全、具哲学意识且符合伦理责任的PhIML的研究路线图。
摘要:Philosophy-informed machine learning (PhIML) directly infuses core ideas from analytic philosophy into ML model architectures, objectives, and evaluation protocols. Therefore, PhIML promises new capabilities through models that respect philosophical concepts and values by design. From this lens, this paper reviews conceptual foundations to demonstrate philosophical gains and alignment. In addition, we present case studies on how ML users/designers can adopt PhIML as an agnostic post-hoc tool or intrinsically build it into ML model architectures. Finally, this paper sheds light on open technical barriers alongside philosophical, practical, and governance challenges and outlines a research roadmap toward safe, philosophy-aware, and ethically responsible PhIML.


【27】Response to Promises and Pitfalls of Deep Kernel Learning
标题:对深度核心学习的承诺和陷阱的回应
链接:https://arxiv.org/abs/2509.21228

作者:rdon Wilson, Zhiting Hu, Ruslan Salakhutdinov, Eric P. Xing
摘要:本文是对《深度内核学习的承诺和陷阱》(Ober et al., 2021)一文的回应。高斯过程的边际似然可以划分为数据拟合项和复杂度惩罚项。Ober等人(2021)表明,如果核可以乘以一个信号方差系数,那么重新参数化并代入该参数的最大值,会将重新参数化后的数据拟合项固定为一个定值。他们据此论证:复杂度惩罚(即核矩阵的对数行列式)随后在确定其余核超参数时占主导地位,这可能导致数据被过度相关。与之相反,我们表明该重新参数化实际上引入了另一个影响所有其他核超参数的数据拟合项。因此,数据拟合与复杂度之间的平衡在确定核超参数时仍然发挥着重要作用。
摘要:This note responds to "Promises and Pitfalls of Deep Kernel Learning" (Ober et al., 2021). The marginal likelihood of a Gaussian process can be compartmentalized into a data fit term and a complexity penalty. Ober et al. (2021) shows that if a kernel can be multiplied by a signal variance coefficient, then reparametrizing and substituting in the maximized value of this parameter sets a reparametrized data fit term to a fixed value. They use this finding to argue that the complexity penalty, a log determinant of the kernel matrix, then dominates in determining the other values of kernel hyperparameters, which can lead to data overcorrelation. By contrast, we show that the reparametrization in fact introduces another data-fit term which influences all other kernel hyperparameters. Thus, a balance between data fit and complexity still plays a significant role in determining kernel hyperparameters.
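上述论证可以用一个小的NumPy数值实验来验证:对核 c·K 关于信号方差系数 c 最大化边际似然,闭式解为 c* = yᵀK⁻¹y/n;代回后数据拟合项恰被固定为 n/2,但 c* 本身仍随其他超参数(如长度尺度)变化。以下数据与核设置均为示意假设:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
x = np.linspace(0.0, 1.0, n)
y = np.sin(6 * x) + 0.1 * rng.normal(size=n)

def rbf(x, ell):
    # RBF核矩阵,加小抖动保证数值稳定
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-0.5 * d2 / ell ** 2) + 1e-6 * np.eye(len(x))

def c_star(K, y):
    # 对信号方差系数c最大化边际似然的闭式解:c* = y^T K^{-1} y / n
    return float(y @ np.linalg.solve(K, y) / len(y))

K = rbf(x, ell=0.2)
c = c_star(K, y)

# 代入c*后,数据拟合项 0.5 * y^T (c*K)^{-1} y 恰为 n/2(与超参数无关)
data_fit = 0.5 * y @ np.linalg.solve(c * K, y)

# 但c*本身随长度尺度变化:数据仍通过 log det(c*K) = n*log(c*) + log det(K)
# 影响其余超参数,这正是本文回应的核心论点
c_other = c_star(rbf(x, ell=0.05), y)
```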


【28】Data-driven Neural Networks for Windkessel Parameter Calibration
标题:用于Windkessel参数校准的数据驱动神经网络
链接 :https://arxiv.org/abs/2509.21206

作者:Hoock, Tobias Köppl
备注:32 pages, 15 figures, for associated git see this https URL, submitted to International Journal for Numerical Methods in Biomedical Engineering
摘要:In this work, we propose a novel method for calibrating Windkessel (WK) parameters in a dimensionally reduced 1D-0D coupled blood flow model. To this end, we design a data-driven neural network (NN)trained on simulated blood pressures in the left brachial artery. Once trained, the NN emulates the pressure pulse waves across the entire simulated domain, i.e., over time, space and varying WK parameters, with negligible error and computational effort. To calibrate the WK parameters on a measured pulse wave, the NN is extended by dummy neurons and retrained only on these. The main objective of this work is to assess the effectiveness of the method in various scenarios -- particularly, when the exact measurement location is unknown or the data are affected by noise.


【29】Physics Informed Neural Networks for design optimisation of diamond particle detectors for charged particle fast-tracking at high luminosity hadron colliders
标题:物理知识神经网络用于金刚石粒子探测器的设计优化,以在高亮度强子对撞机上实现带电粒子快速跟踪
链接:https://arxiv.org/abs/2509.21123

作者:o Bombini, Alessandro Rosa, Clarissa Buti, Giovanni Passaleva, Lucio Anderlini
备注:9 pages; 3 figures; conference paper submitted to EUCAIFCON 2025
摘要:Future high-luminosity hadron colliders demand tracking detectors with extreme radiation tolerance, high spatial precision, and sub-nanosecond timing. 3D diamond pixel sensors offer these capabilities due to diamond's radiation hardness and high carrier mobility. Conductive electrodes, produced via femtosecond IR laser pulses, exhibit high resistivity that delays signal propagation. This effect necessitates extending the classical Ramo-Shockley weighting potential formalism. We model the phenomenon through a 3rd-order, 3+1D PDE derived as a quasi-stationary approximation of Maxwell's equations. The PDE is solved numerically and coupled with charge transport simulations for realistic 3D sensor geometries. A Mixture-of-Experts Physics-Informed Neural Network, trained on Spectral Method data, provides a meshless solver to assess timing degradation from electrode resistance.


【30】RAPTOR-GEN: RApid PosTeriOR GENerator for Bayesian Learning in Biomanufacturing
标题:RAPTOR-Gen:生物制造中Bayesian学习的Rapid PosTeriOR GENerator
链接:https://arxiv.org/abs/2509.20753

作者: Wei Xie
备注:80 pages, 6 figures
摘要:Biopharmaceutical manufacturing is vital to public health but lacks the agility for rapid, on-demand production of biotherapeutics due to the complexity and variability of bioprocesses. To overcome this, we introduce RApid PosTeriOR GENerator (RAPTOR-GEN), a mechanism-informed Bayesian learning framework designed to accelerate intelligent digital twin development from sparse and heterogeneous experimental data. This framework is built on a multi-scale probabilistic knowledge graph (pKG), formulated as a stochastic differential equation (SDE)-based foundational model that captures the nonlinear dynamics of bioprocesses. RAPTOR-GEN consists of two ingredients: (i) an interpretable metamodel integrating linear noise approximation (LNA) that exploits the structural information of bioprocessing mechanisms and a sequential learning strategy to fuse heterogeneous and sparse data, enabling inference of latent state variables and explicit approximation of the intractable likelihood function; and (ii) an efficient Bayesian posterior sampling method that utilizes Langevin diffusion (LD) to accelerate posterior exploration by exploiting the gradients of the derived likelihood. It generalizes the LNA approach to circumvent the challenge of step size selection, facilitating robust learning of mechanistic parameters with provable finite-sample performance guarantees. We develop a fast and robust RAPTOR-GEN algorithm with controllable error. Numerical experiments demonstrate its effectiveness in uncovering the underlying regulatory mechanisms of biomanufacturing processes.
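摘要提到利用Langevin扩散(LD)借助似然梯度加速后验探索。其最基本的形式是无调整Langevin算法(ULA):x ← x + ε·∇log p(x) + √(2ε)·ξ。下面用一个一维高斯后验做NumPy示意(目标分布与步长均为玩具假设,并非RAPTOR-GEN的生物过程模型):

```python
import numpy as np

def langevin_samples(grad_log_post, x0, step, n_steps, seed=0):
    # 无调整Langevin算法:沿对数后验梯度漂移,叠加高斯扰动
    rng = np.random.default_rng(seed)
    x, out = x0, []
    for _ in range(n_steps):
        x = x + step * grad_log_post(x) + np.sqrt(2 * step) * rng.normal()
        out.append(x)
    return np.array(out)

# 玩具后验 N(mu=2, var=0.5):grad log p(x) = -(x - 2) / 0.5
xs = langevin_samples(lambda x: -(x - 2.0) / 0.5, x0=0.0, step=0.05, n_steps=50000)
kept = xs[5000:]   # 丢弃burn-in段
```

注意ULA存在随步长增大的离散化偏差;摘要指出RAPTOR-GEN正是通过推广LNA方法来规避步长选择这一难题。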


【31】Neural Networks as Surrogate Solvers for Time-Dependent Accretion Disk Dynamics
标题:神经网络作为时间相关吸积盘动力学的替代求解器
链接:https://arxiv.org/abs/2509.20447

作者:Mao, Weiqi Wang, Sifan Wang, Ruobing Dong, Lu Lu, Kwang Moo Yi, Paris Perdikaris, Andrea Isella, Sébastien Fabbro, Lile Wang
备注:Astrophysical Journal Letters accepted; associated animations are available at this https URL
摘要:Accretion disks are ubiquitous in astrophysics, appearing in diverse environments from planet-forming systems to X-ray binaries and active galactic nuclei. Traditionally, modeling their dynamics requires computationally intensive (magneto)hydrodynamic simulations. Recently, Physics-Informed Neural Networks (PINNs) have emerged as a promising alternative. This approach trains neural networks directly on physical laws without requiring data. We for the first time demonstrate PINNs for solving the two-dimensional, time-dependent hydrodynamics of non-self-gravitating accretion disks. Our models provide solutions at arbitrary times and locations within the training domain, and successfully reproduce key physical phenomena, including the excitation and propagation of spiral density waves and gap formation from disk-companion interactions. Notably, the boundary-free approach enabled by PINNs naturally eliminates the spurious wave reflections at disk edges, which are challenging to suppress in numerical simulations. These results highlight how advanced machine learning techniques can enable physics-driven, data-free modeling of complex astrophysical systems, potentially offering an alternative to traditional numerical simulations in the future.


其他(42篇)

【1】RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards
标题:RLBFF:二元灵活反馈,在人类反馈和可验证奖励之间架起桥梁
链接:https://arxiv.org/abs/2509.21319

作者:ng, Jiaqi Zeng, Olivier Delalleau, Ellie Evans, Daniel Egert, Hoo-Chang Shin, Felipe Soares, Yi Dong, Oleksii Kuchaiev
摘要:Reinforcement Learning with Human Feedback (RLHF) and Reinforcement Learning with Verifiable Rewards (RLVR) are the main RL paradigms used in LLM post-training, each offering distinct advantages. However, RLHF struggles with interpretability and reward hacking because it relies on human judgments that usually lack explicit criteria, whereas RLVR is limited in scope by its focus on correctness-based verifiers. We propose Reinforcement Learning with Binary Flexible Feedback (RLBFF), which combines the versatility of human-driven preferences with the precision of rule-based verification, enabling reward models to capture nuanced aspects of response quality beyond mere correctness. RLBFF extracts principles that can be answered in a binary fashion (e.g. accuracy of information: yes, or code readability: no) from natural language feedback. Such principles can then be used to ground Reward Model training as an entailment task (response satisfies or does not satisfy an arbitrary principle). We show that Reward Models trained in this manner can outperform Bradley-Terry models when matched for data and achieve top performance on RM-Bench (86.2%) and JudgeBench (81.4%, #1 on leaderboard as of September 24, 2025). Additionally, users can specify principles of interest at inference time to customize the focus of our reward models, in contrast to Bradley-Terry models. Finally, we present a fully open source recipe (including data) to align Qwen3-32B using RLBFF and our Reward Model, to match or exceed the performance of o3-mini and DeepSeek R1 on general alignment benchmarks of MT-Bench, WildBench, and Arena Hard v2 (at <5% of the inference cost).


【2】Does FLUX Already Know How to Perform Physically Plausible Image Composition?
标题:FLUX是否已经知道如何进行物理上合理的图像合成?
链接:https://arxiv.org/abs/2509.21278

作者:, Zhuming Lian, Zihan Zhou, Shaocong Zhang, Chen Zhao, Adams Wai-Kin Kong
备注:Preprint
摘要:Image composition aims to seamlessly insert a user-specified object into a new scene, but existing models struggle with complex lighting (e.g., accurate shadows, water reflections) and diverse, high-resolution inputs. Modern text-to-image diffusion models (e.g., SD3.5, FLUX) already encode essential physical and resolution priors, yet lack a framework to unleash them without resorting to latent inversion, which often locks object poses into contextually inappropriate orientations, or brittle attention surgery. We propose SHINE, a training-free framework for Seamless, High-fidelity Insertion with Neutralized Errors. SHINE introduces manifold-steered anchor loss, leveraging pretrained customization adapters (e.g., IP-Adapter) to guide latents for faithful subject representation while preserving background integrity. Degradation-suppression guidance and adaptive background blending are proposed to further eliminate low-quality outputs and visible seams. To address the lack of rigorous benchmarks, we introduce ComplexCompo, featuring diverse resolutions and challenging conditions such as low lighting, strong illumination, intricate shadows, and reflective surfaces. Experiments on ComplexCompo and DreamEditBench show state-of-the-art performance on standard metrics (e.g., DINOv2) and human-aligned scores (e.g., DreamSim, ImageReward, VisionReward). Code and benchmark will be publicly available upon publication.


【3】Federated Flow Matching
标题:联邦流匹配
链接:https://arxiv.org/abs/2509.21250

作者:g, Anqi Dong, Mahmoud Selim, Michael M. Zavlanos, Karl H. Johansson
摘要:Data today is decentralized, generated and stored across devices and institutions where privacy, ownership, and regulation prevent centralization. This motivates the need to train generative models directly from distributed data locally without central aggregation. In this paper, we introduce Federated Flow Matching (FFM), a framework for training flow matching models under privacy constraints. Specifically, we first examine FFM-vanilla, where each client trains locally with independent source and target couplings, preserving privacy but yielding curved flows that slow inference. We then develop FFM-LOT, which employs local optimal transport couplings to improve straightness within each client but lacks global consistency under heterogeneous data. Finally, we propose FFM-GOT, a federated strategy based on the semi-dual formulation of optimal transport, where a shared global potential function coordinates couplings across clients. Experiments on synthetic and image datasets show that FFM enables privacy-preserving training while enhancing both the flow straightness and sample quality in federated settings, with performance comparable to the centralized baseline.
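流匹配的训练目标可以用线性插值路径来示意:取 x_t = (1−t)·x₀ + t·x₁,速度场的回归目标为 v* = x₁ − x₀。下面的NumPy草图模拟单个客户端在独立耦合(对应摘要中的FFM-vanilla设定)下构造训练对(数据为虚构假设):

```python
import numpy as np

def cfm_targets(x0, x1, t):
    # 线性插值流匹配:x_t = (1-t)*x0 + t*x1,速度场回归目标为 x1 - x0
    xt = (1 - t)[:, None] * x0 + t[:, None] * x1
    return xt, x1 - x0

rng = np.random.default_rng(0)
x0 = rng.normal(size=(256, 2))          # 源分布(噪声)样本
x1 = rng.normal(size=(256, 2)) + 4.0    # 某客户端的本地目标数据
t = rng.uniform(size=256)
xt, v_star = cfm_targets(x0, x1, t)

mean_v = v_star.mean(axis=0)  # 独立耦合下,平均目标速度从源均值指向目标均值
```

独立耦合会产生弯曲的流(推理慢);摘要中的FFM-LOT/FFM-GOT正是通过(半对偶)最优传输耦合来拉直这些路径。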


【4】Can Less Precise Be More Reliable? A Systematic Evaluation of Quantization's Impact on CLIP Beyond Accuracy
标题:不太精确可以更可靠吗?系统评估量化对CLIP超出准确性的影响
链接:https://arxiv.org/abs/2509.21173

作者:guerra, Daniel Montoya, Alexandra Gomez-Villa, Fabio Arnez, Chokri Mraidha
摘要:The powerful zero-shot generalization capabilities of vision-language models (VLMs) like CLIP have enabled new paradigms for safety-related tasks such as out-of-distribution (OOD) detection. However, additional aspects crucial for the computationally efficient and reliable deployment of CLIP are still overlooked. In particular, the impact of quantization on CLIP's performance beyond accuracy remains underexplored. This work presents a large-scale evaluation of quantization on CLIP models, assessing not only in-distribution accuracy but a comprehensive suite of reliability metrics and revealing counterintuitive results driven by pre-training source. We demonstrate that quantization consistently improves calibration for typically underconfident pre-trained models, while often degrading it for overconfident variants. Intriguingly, this degradation in calibration does not preclude gains in other reliability metrics; we find that OOD detection can still improve for these same poorly calibrated models. Furthermore, we identify specific quantization-aware training (QAT) methods that yield simultaneous gains in zero-shot accuracy, calibration, and OOD robustness, challenging the view of a strict efficiency-performance trade-off. These findings offer critical insights for navigating the multi-objective problem of deploying efficient, reliable, and robust VLMs by utilizing quantization beyond its conventional role.
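摘要中的"校准"通常以期望校准误差(ECE)衡量:按置信度分箱后,对 |准确率 − 平均置信度| 做加权平均。下面是一个纯Python的最小实现与玩具数据(数据为示意假设,并非论文的评测设置):

```python
def ece(confidences, correct, n_bins=10):
    # 期望校准误差:等宽置信度分箱,加权平均 |bin内准确率 - bin内平均置信度|
    total = len(confidences)
    err = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        err += len(idx) / total * abs(acc - conf)
    return err

# 校准良好(置信度0.75,正确率3/4)与过度自信(置信度0.95,正确率1/2)的对比
well_calibrated = ece([0.75] * 4, [1, 1, 1, 0])
overconfident = ece([0.95] * 4, [1, 1, 0, 0])
```

摘要的发现据此可读作:量化会降低过度自信模型的平均置信度,从而对原本欠自信的模型改善ECE、对过度自信的变体反而可能恶化ECE。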


【5】CAD-Tokenizer: Towards Text-based CAD Prototyping via Modality-Specific Tokenization
标题:CAD-Tokenizer:通过特定模式的Tokenizer迈向基于文本的CAD原型
链接:https://arxiv.org/abs/2509.21150

作者:g, Shizhao Sun, Weijian Ma, Jiang Bian
摘要:Computer-Aided Design (CAD) is a foundational component of industrial prototyping, where models are defined not by raw coordinates but by construction sequences such as sketches and extrusions. This sequential structure enables both efficient prototype initialization and subsequent editing. Text-guided CAD prototyping, which unifies Text-to-CAD generation and CAD editing, has the potential to streamline the entire design pipeline. However, prior work has not explored this setting, largely because standard large language model (LLM) tokenizers decompose CAD sequences into natural-language word pieces, failing to capture primitive-level CAD semantics and hindering attention modules from modeling geometric structure. We conjecture that a multimodal tokenization strategy, aligned with CAD's primitive and structural nature, can provide more effective representations. To this end, we propose CAD-Tokenizer, a framework that represents CAD data with modality-specific tokens using a sequence-based VQ-VAE with primitive-level pooling and constrained decoding. This design produces compact, primitive-aware representations that align with CAD's structural nature. Applied to unified text-guided CAD prototyping, CAD-Tokenizer significantly improves instruction following and generation quality, achieving better quantitative and qualitative performance over both general-purpose LLMs and task-specific baselines.


【6】GraphUniverse: Enabling Systematic Evaluation of Inductive Generalization
标题:GraphUniverse:实现归纳泛化的系统性评估
链接:https://arxiv.org/abs/2509.21097

作者: Langendonck, Guillermo Bernárdez, Nina Miolane, Pere Barlet-Ros
摘要:A fundamental challenge in graph learning is understanding how models generalize to new, unseen graphs. While synthetic benchmarks offer controlled settings for analysis, existing approaches are confined to single-graph, transductive settings where models train and test on the same graph structure. Addressing this gap, we introduce GraphUniverse, a framework for generating entire families of graphs to enable the first systematic evaluation of inductive generalization at scale. Our core innovation is the generation of graphs with persistent semantic communities, ensuring conceptual consistency while allowing fine-grained control over structural properties like homophily and degree distributions. This enables crucial but underexplored robustness tests, such as performance under controlled distribution shifts. Benchmarking a wide range of architectures -- from GNNs to graph transformers and topological architectures -- reveals that strong transductive performance is a poor predictor of inductive generalization. Furthermore, we find that robustness to distribution shift is highly sensitive not only to model architecture choice but also to the initial graph regime (e.g., high vs. low homophily). Beyond benchmarking, GraphUniverse's flexibility and scalability can facilitate the development of robust and truly generalizable architectures -- including next-generation graph foundation models. An interactive demo is available at https://graphuniverse.streamlit.app.


【7】TyphoonMLA: A Mixed Naive-Absorb MLA Kernel For Shared Prefix
标题:TyphoonMLA:一种面向共享前缀的朴素-吸收混合MLA内核
链接:https://arxiv.org/abs/2509.21081

作者:er Yüzügüler, Ahmet Çelik, Jiawei Zhuang, Lukas Cavigelli
摘要:Multi-Head Latent Attention (MLA) is a recent attention mechanism adopted in state-of-the-art LLMs such as DeepSeek-v3 and Kimi K2. Thanks to its novel formulation, MLA allows two functionally equivalent but computationally distinct kernel implementations: naive and absorb. While the naive kernels (e.g., FlashAttention) are typically preferred in training and prefill for their computational efficiency, existing decoding kernels (e.g., FlashMLA) rely on the absorb method to minimize HBM bandwidth usage. However, the compute-bound nature of the absorb implementations prohibits performance benefits from data reuse opportunities in attention calculations, such as shared prefixes. In this work, we introduce TyphoonMLA, a hybrid approach that combines naive and absorb formulations to harness the strengths of both. TyphoonMLA effectively leverages the shared prefix by applying the naive formulation to the compute-bound parts of attention calculations, while reducing the bandwidth requirements for non-shared parts by using the absorb formulation. As a result, TyphoonMLA improves the throughput of attention calculations in MLA architectures by up to 3x and 3.24x on NPU and GPUs, with only a 3% overhead in HBM size.


【8】Combinatorial Creativity: A New Frontier in Generalization Abilities
标题:组合创造力:泛化能力的新前沿
链接:https://arxiv.org/abs/2509.21043

作者:hapiro, Sumuk Shashidhar, Alexi Gladstone, Jonah Black, Royce Moon, Dilek Hakkani-Tur, Lav R. Varshney
备注:Preprint. The first two authors contributed equally
摘要:Artificial intelligence (AI) systems, and large language models (LLMs) in particular, are increasingly employed for creative tasks like scientific idea generation, constituting a form of generalization from training data unaddressed by existing conceptual frameworks. Though in many ways similar to forms of compositional generalization (CG), combinatorial creativity (CC) is an open-ended ability. Instead of evaluating for accuracy or correctness against fixed targets, which would contradict the open-ended nature of CC, we propose a theoretical framework and algorithmic task for evaluating outputs by their degrees of novelty and utility. From here, we make several important empirical contributions: (1) We obtain the first insights into the scaling behavior of creativity for LLMs. (2) We discover that, for fixed compute budgets, there exist optimal model depths and widths for creative ability. (3) We find that the ideation-execution gap, whereby LLMs excel at generating novel scientific ideas but struggle to ensure their practical feasibility, may be explained by a more fundamental novelty-utility tradeoff characteristic of creativity algorithms in general. Importantly, this tradeoff remains persistent even at scale, casting doubt on the long-term creative potential of LLMs in their current form. Together, our conceptual framework and empirical findings provide a foundation for understanding and improving creativity in modern AI models, marking a new frontier in generalization abilities.


【9】Actor-Critic without Actor
标题:没有演员的演员评论家
链接:https://arxiv.org/abs/2509.21022

作者: Ki, Hee-Jun Ahn, Kyungyoon Kim, Byung-Jun Lee
摘要:Actor-critic methods constitute a central paradigm in reinforcement learning (RL), coupling policy evaluation with policy improvement. While effective across many domains, these methods rely on separate actor and critic networks, which makes training vulnerable to architectural decisions and hyperparameter tuning. Such complexity limits their scalability in settings that require large function approximators. Recently, diffusion models have been proposed as expressive policies that capture multi-modal behaviors and improve exploration, but they introduce additional design choices and computational burdens, hindering efficient deployment. We introduce Actor-Critic without Actor (ACA), a lightweight framework that eliminates the explicit actor network and instead generates actions directly from the gradient field of a noise-level critic. This design removes the algorithmic and computational overhead of actor training while keeping policy improvement tightly aligned with the critic's latest value estimates. Moreover, ACA retains the ability to capture diverse, multi-modal behaviors without relying on diffusion-based actors, combining simplicity with expressiveness. Through extensive experiments on standard online RL benchmarks, ACA achieves more favorable learning curves and competitive performance compared to both standard actor-critic and state-of-the-art diffusion-based methods, providing a simple yet powerful solution for online RL.


【10】Efficient Ensemble Conditional Independence Test Framework for Causal Discovery
标题:用于因果发现的高效集成条件独立性检验框架
链接:https://arxiv.org/abs/2509.21021

作者: Guan, Kun Kuang
摘要:Constraint-based causal discovery relies on numerous conditional independence tests (CITs), but its practical applicability is severely constrained by the prohibitive computational cost, especially as CITs themselves have high time complexity with respect to the sample size. To address this key bottleneck, we introduce the Ensemble Conditional Independence Test (E-CIT), a general and plug-and-play framework. E-CIT operates on an intuitive divide-and-aggregate strategy: it partitions the data into subsets, applies a given base CIT independently to each subset, and aggregates the resulting p-values using a novel method grounded in the properties of stable distributions. This framework reduces the computational complexity of a base CIT to linear in the sample size when the subset size is fixed. Moreover, our tailored p-value combination method offers theoretical consistency guarantees under mild conditions on the subtests. Experimental results demonstrate that E-CIT not only significantly reduces the computational burden of CITs and causal discovery but also achieves competitive performance. Notably, it exhibits an improvement in complex testing scenarios, particularly on real-world datasets.
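The abstract's p-value aggregation is "grounded in the properties of stable distributions"; one well-known combiner of this kind is the Cauchy combination test (the Cauchy law is stable). Whether E-CIT uses exactly this rule is our assumption; the sketch below only illustrates the divide-and-aggregate pattern:

```python
import math

def cauchy_combine(pvalues, weights=None):
    """Cauchy combination test: T = sum_i w_i * tan(pi * (0.5 - p_i)).
    Under the null, T follows a standard Cauchy distribution, so the
    combined p-value is the Cauchy survival function at T."""
    k = len(pvalues)
    weights = weights if weights is not None else [1.0 / k] * k
    t = sum(w * math.tan(math.pi * (0.5 - p)) for w, p in zip(weights, pvalues))
    return 0.5 - math.atan(t) / math.pi

# Divide-and-aggregate: run the base CIT once per data subset, then combine.
print(cauchy_combine([0.5, 0.5, 0.5]))            # 0.5: no evidence
print(cauchy_combine([0.01, 0.02, 0.01]) < 0.05)  # True: consistent signal
```

Because each subtest runs on a fixed-size subset, the base CIT's superlinear cost in sample size is paid only on small chunks, matching the linear overall complexity claimed above.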


【11】RollPacker: Mitigating Long-Tail Rollouts for Fast, Synchronous RL Post-Training
标题:RollPacker:缓解长尾rollout以实现快速、同步的RL后训练
链接:https://arxiv.org/abs/2509.21009

作者:Yuheng Zhao, Dakai An, Tianyuan Wu, Lunxi Cao, Shaopan Xiong, Ju Huang, Weixun Wang, Siran Yang, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng, Wei Wang
备注:16 pages, 14 figures
摘要:Reinforcement Learning (RL) is a pivotal post-training technique for enhancing the reasoning capabilities of Large Language Models (LLMs). However, synchronous RL post-training often suffers from significant GPU underutilization, referred to as bubbles, caused by imbalanced response lengths within rollout steps. Many RL systems attempt to alleviate this problem by relaxing synchronization, but this can compromise training accuracy. In this paper, we introduce tail batching, a novel rollout scheduling strategy for synchronous RL that systematically consolidates prompts leading to long-tail responses into a small subset of rollout steps (long rounds), while ensuring that the majority of steps (short rounds) involve only balanced, short rollouts. By excluding long responses from short rounds and rescheduling them into a few designated long rounds, tail batching effectively reduces GPU idle time during rollouts and significantly accelerates RL training without sacrificing accuracy. We present RollPacker, a system that fully harnesses the benefits of tail batching through holistic optimizations across all three RL stages: elastic parallelism adaptation for rollout, dynamic resource allocation and scheduling for reward, and stream-based training. Empirical results show that RollPacker achieves a 2.03x-2.56x end-to-end training time reduction compared to veRL and up to 2.24x speedup compared to RLHFuse for the Qwen2.5 family of LLMs on up to 128 H800 GPUs.
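The tail-batching idea of consolidating long-tail prompts into dedicated rounds can be sketched in a few lines (the threshold, predicted lengths, and greedy packing are our simplifications; RollPacker's scheduler is far more elaborate):

```python
def tail_batch(predicted_lengths, tail_threshold, batch_size):
    """Schedule prompts into rollout rounds.

    predicted_lengths: dict prompt_id -> predicted response length.
    Prompts expected to produce long-tail responses are excluded from the
    short rounds and consolidated into a few dedicated long rounds, so the
    short rounds stay balanced and GPU bubbles shrink.
    """
    short = [p for p, n in predicted_lengths.items() if n <= tail_threshold]
    long_ = [p for p, n in predicted_lengths.items() if n > tail_threshold]
    rounds = [short[i:i + batch_size] for i in range(0, len(short), batch_size)]
    rounds += [long_[i:i + batch_size] for i in range(0, len(long_), batch_size)]
    return rounds

lengths = {"a": 100, "b": 120, "c": 4000, "d": 90, "e": 3500}
rounds = tail_batch(lengths, tail_threshold=1024, batch_size=2)
print(rounds)  # [['a', 'b'], ['d'], ['c', 'e']]: short rounds first, one long round
```

Because every prompt still completes within the same training epoch, this rescheduling preserves synchronous semantics, unlike approaches that relax synchronization.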


【12】Feature Augmentation of GNNs for ILPs: Local Uniqueness Suffices
标题:面向ILP的GNN特征增强:局部唯一性足矣
链接:https://arxiv.org/abs/2509.21000

作者:n, Qian Li, Linxin Yang, Qian Chen, Qingjiang Shi, Ruoyu Sun
备注:9 pages, 6 Tables
摘要:Integer Linear Programs (ILPs) are central to real-world optimizations but notoriously difficult to solve. Learning to Optimize (L2O) has emerged as a promising paradigm, with Graph Neural Networks (GNNs) serving as the standard backbone. However, standard anonymous GNNs are limited in expressiveness for ILPs, and the common enhancement of augmenting nodes with globally unique identifiers (UIDs) typically introduces spurious correlations that severely harm generalization. To address this tradeoff, we propose a parsimonious Local-UID scheme based on d-hop uniqueness coloring, which ensures identifiers are unique only within each node's d-hop neighborhood. Building on this scheme, we introduce ColorGNN, which incorporates color information via color-conditioned embeddings, and ColorUID, a lightweight feature-level variant. We prove that for d-layer networks, Local-UIDs achieve the expressive power of Global-UIDs while offering stronger generalization. Extensive experiments show that our approach (i) yields substantial gains on three ILP benchmarks, (ii) exhibits strong OOD generalization on linear programming datasets, and (iii) further improves a general graph-level task when paired with a state-of-the-art method.
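One simple way to realize a Local-UID scheme with d-hop uniqueness (our illustrative construction, not necessarily the paper's algorithm): greedily give each node the smallest color unused in its 2d-hop ball. Any two nodes that co-occur in some node's d-hop neighborhood are at most 2d apart, so they receive distinct identifiers:

```python
from collections import deque

def k_hop_ball(adj, v, k):
    """All nodes within k hops of v (excluding v itself), via BFS."""
    seen, out, frontier = {v}, set(), deque([(v, 0)])
    while frontier:
        u, dist = frontier.popleft()
        if dist == k:
            continue
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                out.add(w)
                frontier.append((w, dist + 1))
    return out

def local_uid_coloring(adj, d):
    """Greedy: each node takes the smallest color unused in its 2d-hop ball,
    which guarantees uniqueness inside every node's d-hop neighborhood."""
    colors = {}
    for v in adj:
        taken = {colors[u] for u in k_hop_ball(adj, v, 2 * d) if u in colors}
        c = 0
        while c in taken:
            c += 1
        colors[v] = c
    return colors

# Path graph 0-1-2-3 with d=1: colors repeat once nodes are far enough apart.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(local_uid_coloring(adj, d=1))  # {0: 0, 1: 1, 2: 2, 3: 0}
```

The reuse of color 0 at node 3 is exactly the parsimony the scheme is after: identifiers stay small and shared across distant parts of the graph, avoiding the spurious global correlations of Global-UIDs.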


【13】Fast-SEnSeI: Lightweight Sensor-Independent Cloud Masking for On-board Multispectral Sensors
标题:Fast-SEnSeI:用于机载多光谱传感器的轻量级传感器无关云掩膜
链接:https://arxiv.org/abs/2509.20991

作者:k, Jonáš Herec, Rado Pitoňák
备注:This is a preprint of a paper accepted for the EDHPC 2025 Conference
摘要:Cloud segmentation is a critical preprocessing step for many Earth observation tasks, yet most models are tightly coupled to specific sensor configurations and rely on ground-based processing. In this work, we propose Fast-SEnSeI, a lightweight, sensor-independent encoder module that enables flexible, on-board cloud segmentation across multispectral sensors with varying band configurations. Building upon SEnSeI-v2, Fast-SEnSeI integrates an improved spectral descriptor, lightweight architecture, and robust padding-band handling. It accepts arbitrary combinations of spectral bands and their wavelengths, producing fixed-size feature maps that feed into a compact, quantized segmentation model based on a modified U-Net. The module runs efficiently on embedded CPUs using Apache TVM, while the segmentation model is deployed on FPGA, forming a CPU-FPGA hybrid pipeline suitable for space-qualified hardware. Evaluations on Sentinel-2 and Landsat 8 datasets demonstrate accurate segmentation across diverse input configurations.


【14】Flow Matching in the Low-Noise Regime: Pathologies and a Contrastive Remedy
标题:低噪声区的流匹配:病态行为与对比式补救
链接:https://arxiv.org/abs/2509.20952

作者:g, Yichao Yan
摘要:Flow matching has recently emerged as a powerful alternative to diffusion models, providing a continuous-time formulation for generative modeling and representation learning. Yet, we show that this framework suffers from a fundamental instability in the low-noise regime. As noise levels approach zero, arbitrarily small perturbations in the input can induce large variations in the velocity target, causing the condition number of the learning problem to diverge. This ill-conditioning not only slows optimization but also forces the encoder to reallocate its limited Jacobian capacity toward noise directions, thereby degrading semantic representations. We provide the first theoretical analysis of this phenomenon, which we term the low-noise pathology, establishing its intrinsic link to the structure of the flow matching objective. Building on these insights, we propose Local Contrastive Flow (LCF), a hybrid training protocol that replaces direct velocity regression with contrastive feature alignment at small noise levels, while retaining standard flow matching at moderate and high noise. Empirically, LCF not only improves convergence speed but also stabilizes representation quality. Our findings highlight the critical importance of addressing low-noise pathologies to unlock the full potential of flow matching for both generation and representation learning.
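The low-noise pathology is visible directly in the conditional flow-matching target. For the standard linear path x_t = (1 - t) x0 + t x1, the target velocity given x_t and the data point x1 is (x1 - x_t)/(1 - t), so a perturbation of size delta in the input moves the target by delta/(1 - t), which diverges as t approaches 1. A numeric check (the linear-path parametrization is our choice for illustration):

```python
def velocity_target(x_t, x1, t):
    """Conditional flow-matching target for the linear path
    x_t = (1 - t) * x0 + t * x1, expressed via x_t and the data point x1."""
    return (x1 - x_t) / (1.0 - t)

x1, delta = 1.0, 1e-3
for t in (0.5, 0.9, 0.99, 0.999):
    x_t = t * x1  # the point on the path with x0 = 0
    amp = abs(velocity_target(x_t + delta, x1, t) - velocity_target(x_t, x1, t)) / delta
    print(t, amp)  # grows like 1/(1 - t): ~2, 10, 100, 1000
```

This unbounded input-to-target sensitivity is the diverging condition number the abstract describes, and it motivates replacing direct velocity regression with contrastive alignment precisely in this regime.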


【15】Reverse Faà di Bruno's Formula for Cartesian Reverse Differential Categories
标题:笛卡尔反向微分范畴的反向Faà di Bruno公式
链接:https://arxiv.org/abs/2509.20931

作者:gin (Macquarie University), Jean-Simon Pacaud Lemay (Macquarie University)
备注:In Proceedings ACT 2024, arXiv:2509.18357
摘要:Reverse differentiation is an essential operation for automatic differentiation. Cartesian reverse differential categories axiomatize reverse differentiation in a categorical framework, where one of the primary axioms is the reverse chain rule, which is the formula that expresses the reverse derivative of a composition. Here, we present the reverse differential analogue of Faa di Bruno's Formula, which gives a higher-order reverse chain rule in a Cartesian reverse differential category. To properly do so, we also define partial reverse derivatives and higher-order reverse derivatives in a Cartesian reverse differential category.
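In autodiff terms, the (first-order) reverse chain rule the abstract axiomatizes reads R[g ∘ f](x, dy) = R[f](x, R[g](f(x), dy)). A concrete vector-Jacobian-product sketch (a standard reverse-mode reading, not the paper's categorical formalism):

```python
# Each map is a pair (forward, reverse), where reverse(x, dy) returns the
# vector-Jacobian product dy * J_f(x) -- the reverse derivative at x.
def f(x):        # f(x) = x^2
    return x * x
def Rf(x, dy):   # J_f(x) = 2x
    return 2 * x * dy

def g(y):        # g(y) = 3y
    return 3 * y
def Rg(y, dy):   # J_g(y) = 3
    return 3 * dy

def reverse_compose(fwd_f, rev_f, rev_g):
    """Reverse chain rule: R[g.f](x, dy) = R[f](x, R[g](f(x), dy))."""
    def rev(x, dy):
        return rev_f(x, rev_g(fwd_f(x), dy))
    return rev

R_gf = reverse_compose(f, Rf, Rg)
# d/dx 3x^2 = 6x, so the reverse derivative at x = 5 with dy = 1 is 30.
print(R_gf(5.0, 1.0))  # 30.0
```

The paper's contribution is the higher-order analogue: iterating this rule produces nested terms whose bookkeeping is exactly what a reverse Faà di Bruno formula organizes.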


【16】Energy saving in off-road vehicles using leakage compensation technique
标题:利用泄漏补偿技术实现越野车节能
链接:https://arxiv.org/abs/2509.20926

作者:, J. Das
摘要:The article focuses on enhancing the energy efficiency of linear actuators used in heavy earth moving equipment, particularly in the booms of excavation equipment. Two hydraulic circuits are compared in terms of energy efficiency, with one using a conventional proportional directional control valve (PDCV) and the other using an innovative solution of a proportional flow control valve (PFCV) with artificial leakage between the two ends of the actuator. The PFCV reduces energy loss in the form of heat by bypassing the extra flow from the pump during position control, unlike the PDCV that uses a pressure relief valve. The hydraulic circuit using PFCV is found to be 8.5% more energy efficient than the conventional circuit using PDCV. The article also discusses the position control of the actuator, which is achieved using a PID controller tuned by a fuzzy controller. The simulation of the hydraulic circuit is carried out using MATLAB/Simulink, and the results are compared with experiments. Overall, the proposed approach could lead to significant improvements in the energy efficiency of linear actuators used in heavy earth moving equipment, thereby reducing their environmental impact and operating costs.
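The position-control loop described above can be sketched at the PID level (the gains, time step, and first-order toy plant are our stand-ins, not the paper's MATLAB/Simulink hydraulic model, and the fuzzy gain tuner is omitted):

```python
def pid_step(error, state, kp, ki, kd, dt):
    """One discrete PID update; state carries (integral, previous error)."""
    integral, prev_err = state
    integral += error * dt
    derivative = (error - prev_err) / dt
    u = kp * error + ki * integral + kd * derivative
    return u, (integral, error)

# Toy first-order actuator: position integrates the control signal,
# driven from 0 toward a setpoint of 1.0.
pos, state, dt = 0.0, (0.0, 0.0), 0.01
for _ in range(5000):
    u, state = pid_step(1.0 - pos, state, kp=5.0, ki=1.0, kd=0.1, dt=dt)
    pos += u * dt
print(round(pos, 3))  # ~1.0: settled at the setpoint
```

In the paper's scheme a fuzzy supervisor would adjust kp, ki, and kd online rather than keeping them fixed as here.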


【17】Federated Markov Imputation: Privacy-Preserving Temporal Imputation in Multi-Centric ICU Environments
标题:联邦马尔科夫插补:多中心ICU环境中保护隐私的时间插补
链接:https://arxiv.org/abs/2509.20867

作者: Düsing, Philipp Cimiano
备注:Accepted at the 1st International ECML-PKDD Workshop-Tutorial on Learning on Real and Synthetic Medical Time Series Data (MED-TIME)
摘要:Missing data is a persistent challenge in federated learning on electronic health records, particularly when institutions collect time-series data at varying temporal granularities. To address this, we propose Federated Markov Imputation (FMI), a privacy-preserving method that enables Intensive Care Units (ICUs) to collaboratively build global transition models for temporal imputation. We evaluate FMI on a real-world sepsis onset prediction task using the MIMIC-IV dataset and show that it outperforms local imputation baselines, especially in scenarios with irregular sampling intervals across ICUs.
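A hedged sketch of what federated Markov imputation might look like: each ICU shares only aggregate transition counts over discretized states, the server sums and row-normalizes them into a global transition model, and gaps are filled from that model (the discretization and most-likely-successor rule are our simplifying assumptions):

```python
import numpy as np

def local_transition_counts(states, n_states):
    """Per-ICU transition counts over discretized states (None = missing).
    Only these aggregate counts leave the site, not patient-level data."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        if a is not None and b is not None:
            counts[a, b] += 1
    return counts

def global_transition_matrix(count_list):
    """Server side: sum the sites' counts and row-normalize."""
    total = sum(count_list)
    n = total.shape[0]
    P = np.empty_like(total)
    for i in range(n):
        row = total[i].sum()
        P[i] = total[i] / row if row > 0 else 1.0 / n
    return P

def impute(states, P):
    """Fill each gap with the most likely successor of its predecessor."""
    out = list(states)
    for i in range(1, len(out)):
        if out[i] is None and out[i - 1] is not None:
            out[i] = int(P[out[i - 1]].argmax())
    return out

icu_a = [0, 0, 1, 1, None, 1]
icu_b = [0, 1, 1, 1, 0]
P = global_transition_matrix([local_transition_counts(s, 2) for s in (icu_a, icu_b)])
print(impute(icu_a, P))  # [0, 0, 1, 1, 1, 1]
```

Sharing counts rather than raw series is what makes the scheme federated; handling the irregular sampling intervals highlighted in the abstract would additionally require per-interval transition models.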


【18】ImaginationPolicy: Towards Generalizable, Precise and Reliable End-to-End Policy for Robotic Manipulation
标题:ImaginationPolicy:迈向通用、精确和可靠的机器人操纵端到端策略
链接:https://arxiv.org/abs/2509.20841

作者: Wei Gao, Kui Jia
备注:First two authors contribute equally. Project page: this https URL
摘要:End-to-end robot manipulation policies offer significant potential for enabling embodied agents to understand and interact with the world. Unlike traditional modular pipelines, end-to-end learning mitigates key limitations such as information loss between modules and feature misalignment caused by isolated optimization targets. Despite these advantages, existing end-to-end neural networks for robotic manipulation--including those based on large VLM/VLA models--remain insufficiently performant for large-scale practical deployment. In this paper, we take a step towards an end-to-end manipulation policy that is generalizable, accurate and reliable. To achieve this goal, we propose a novel Chain of Moving Oriented Keypoints (CoMOK) formulation for robotic manipulation. Our formulation is used as the action representation of a neural policy, which can be trained in an end-to-end fashion. Such an action representation is general, as it extends the standard end-effector pose action representation and supports a diverse set of manipulation tasks in a unified manner. The oriented keypoint in our method enables natural generalization to objects with different shapes and sizes, while achieving sub-centimeter accuracy. Moreover, our formulation can easily handle multi-stage tasks, multi-modal robot behaviors, and deformable objects. Extensive simulated and hardware experiments demonstrate the effectiveness of our method.


【19】Shaping Initial State Prevents Modality Competition in Multi-modal Fusion: A Two-stage Scheduling Framework via Fast Partial Information Decomposition
标题:塑造初始状态以防止多模态融合中的模态竞争:通过快速部分信息分解的两阶段调度框架
链接:https://arxiv.org/abs/2509.20840

作者:g, Yinsong Xu, Yang Liu, Qingchao Chen
摘要:Multi-modal fusion often suffers from modality competition during joint training, where one modality dominates the learning process, leaving others under-optimized. Overlooking the critical impact of the model's initial state, most existing methods address this issue during the joint learning stage. In this study, we introduce a two-stage training framework to shape the initial states through unimodal training before the joint training. First, we propose the concept of Effective Competitive Strength (ECS) to quantify a modality's competitive strength. Our theoretical analysis further reveals that properly shaping the initial ECS by unimodal training achieves a provably tighter error bound. However, ECS is computationally intractable in deep neural networks. To bridge this gap, we develop a framework comprising two core components: a fine-grained computable diagnostic metric and an asynchronous training controller. For the metric, we first prove that mutual information (MI) is a principled proxy for ECS. Considering MI is induced by per-modality marginals and thus treats each modality in isolation, we further propose FastPID, a computationally efficient and differentiable solver for partial information decomposition, which decomposes the joint distribution's information into fine-grained measurements: modality-specific uniqueness, redundancy, and synergy. Guided by these measurements, our asynchronous controller dynamically balances modalities by monitoring uniqueness and locates the ideal initial state to start joint training by tracking peak synergy. Experiments on diverse benchmarks demonstrate that our method achieves state-of-the-art performance. Our work establishes that shaping the pre-fusion models' initial state is a powerful strategy that eases competition before it starts, reliably unlocking synergistic multi-modal fusion.


【20】Explaining Grokking and Information Bottleneck through Neural Collapse Emergence
标题:通过神经坍缩的涌现解释Grokking与信息瓶颈
链接:https://arxiv.org/abs/2509.20829

作者:akamoto, Issei Sato
备注:Code is available at this https URL
摘要:The training dynamics of deep neural networks often defy expectations, even as these models form the foundation of modern machine learning. Two prominent examples are grokking, where test performance improves abruptly long after the training loss has plateaued, and the information bottleneck principle, where models progressively discard input information irrelevant to the prediction task as training proceeds. However, the mechanisms underlying these phenomena and their relations remain poorly understood. In this work, we present a unified explanation of such late-phase phenomena through the lens of neural collapse, which characterizes the geometry of learned representations. We show that the contraction of population within-class variance is a key factor underlying both grokking and information bottleneck, and relate this measure to the neural collapse measure defined on the training set. By analyzing the dynamics of neural collapse, we show that distinct time scales between fitting the training set and the progression of neural collapse account for the behavior of the late-phase phenomena. Finally, we validate our theoretical findings on multiple datasets and architectures.
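The within-class variance contraction central to the argument can be tracked with a simple ratio of within-class to total feature variance (our simplified proxy for the paper's neural-collapse measure):

```python
import numpy as np

def within_class_variance_ratio(features, labels):
    """Fraction of total feature variance lying within classes; neural
    collapse drives this toward 0 as per-class features concentrate
    around their class means."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    total = features.var(axis=0).sum()
    within = 0.0
    for c in np.unique(labels):
        x = features[labels == c]
        within += (len(x) / len(features)) * x.var(axis=0).sum()
    return within / total

# Nearly collapsed features: each class concentrated around its own mean.
feats = [[1.0, 0.0], [1.01, 0.0], [0.0, 1.0], [0.0, 1.01]]
print(within_class_variance_ratio(feats, [0, 0, 1, 1]))  # ~0: collapsed
print(within_class_variance_ratio(feats, [0, 1, 0, 1]))  # ~1: no collapse
```

Plotting this ratio over training on held-out data is one way to see the delayed contraction that, per the abstract, accompanies grokking and progressive input-information discarding.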


【21】Extrapolating Phase-Field Simulations in Space and Time with Purely Convolutional Architectures
标题:用纯卷积架构在空间和时间上外推相场模拟
链接:https://arxiv.org/abs/2509.20770

作者:e Bonneville, Nathan Bieberdorf, Pieterjan Robbe, Mark Asta, Habib N. Najm, Laurent Capolungo, Cosmin Safta
摘要:Phase-field models of liquid metal dealloying (LMD) can resolve rich microstructural dynamics but become intractable for large domains or long time horizons. We present a conditionally parameterized, fully convolutional U-Net surrogate that generalizes far beyond its training window in both space and time. The design integrates convolutional self-attention and physics-aware padding, while parameter conditioning enables variable time-step skipping and adaptation to diverse alloy systems. Although trained only on short, small-scale simulations, the surrogate exploits the translational invariance of convolutions to extend predictions to much longer horizons than traditional solvers. It accurately reproduces key LMD physics, with relative errors typically under 5% within the training regime and below 10% when extrapolating to larger domains and later times. The method accelerates computations by up to 16,000 times, cutting weeks of simulation down to seconds, and marks an early step toward scalable, high-fidelity extrapolation of LMD phase-field models.


【22】Identifying Group Anchors in Real-World Group Interactions Under Label Scarcity
标题:在标签稀缺下识别现实世界群体互动中的群体锚点
链接:https://arxiv.org/abs/2509.20762

作者:u, Geon Lee, Minyoung Choe, Kijung Shin
备注:IEEE International Conference on Data Mining (ICDM) 2025
摘要:Group interactions occur in various real-world contexts, e.g., co-authorship, email communication, and online Q&A. In each group, there is often a particularly significant member, around whom the group is formed. Examples include the first or last author of a paper, the sender of an email, and the questioner in a Q&A session. In this work, we discuss the existence of such individuals in real-world group interactions. We call such individuals group anchors and study the problem of identifying them. First, we introduce the concept of group anchors and the identification problem. Then, we discuss our observations on group anchors in real-world group interactions. Based on our observations, we develop AnchorRadar, a fast and effective method for group anchor identification under realistic settings with label scarcity, i.e., when only a few groups have known anchors. AnchorRadar is a semi-supervised method using information from groups both with and without known group anchors. Finally, through extensive experiments on thirteen real-world datasets, we demonstrate the empirical superiority of AnchorRadar over various baselines w.r.t. accuracy and efficiency. In most cases, AnchorRadar achieves higher accuracy in group anchor identification than all the baselines, while using 10.2$\times$ less training time than the fastest baseline and 43.6$\times$ fewer learnable parameters than the most lightweight baseline on average.


【23】The Impact of Audio Watermarking on Audio Anti-Spoofing Countermeasures
标题:音频水印对音频反欺骗对策的影响
链接:https://arxiv.org/abs/2509.20736

作者:Zhang, Xueping Zhang, Yechen Wang, Liwei Jin, Ming Li
备注:5 pages, submitted to ICASSP 2026
摘要:This paper presents the first study on the impact of audio watermarking on spoofing countermeasures. While anti-spoofing systems are essential for securing speech-based applications, the influence of widely used audio watermarking, originally designed for copyright protection, remains largely unexplored. We construct watermark-augmented training and evaluation datasets, named the Watermark-Spoofing dataset, by applying diverse handcrafted and neural watermarking methods to existing anti-spoofing datasets. Experiments show that watermarking consistently degrades anti-spoofing performance, with higher watermark density correlating with higher Equal Error Rates (EERs). To mitigate this, we propose the Knowledge-Preserving Watermark Learning (KPWL) framework, enabling models to adapt to watermark-induced shifts while preserving their original-domain spoofing detection capability. These findings reveal audio watermarking as a previously overlooked domain shift and establish the first benchmark for developing watermark-resilient anti-spoofing systems. All related protocols are publicly available at https://github.com/Alphawarheads/Watermark_Spoofing.git


【24】Scaling Laws are Redundancy Laws
标题:缩放定律是冗余定律
链接:https://arxiv.org/abs/2509.20721

作者:Vince D Calhoun
摘要:Scaling laws, a defining feature of deep learning, reveal a striking power-law improvement in model performance with increasing dataset and model size. Yet, their mathematical origins, especially the scaling exponent, have remained elusive. In this work, we show that scaling laws can be formally explained as redundancy laws. Using kernel regression, we show that a polynomial tail in the data covariance spectrum yields an excess risk power law with exponent alpha = 2s / (2s + 1/beta), where beta controls the spectral tail and 1/beta measures redundancy. This reveals that the learning curve's slope is not universal but depends on data redundancy, with steeper spectra accelerating returns to scale. We establish the law's universality across boundedly invertible transformations, multi-modal mixtures, finite-width approximations, and Transformer architectures in both linearized (NTK) and feature-learning regimes. This work delivers the first rigorous mathematical explanation of scaling laws as finite-sample redundancy laws, unifying empirical observations with theoretical foundations.
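The claimed exponent is easy to explore numerically; a quick reading of alpha = 2s / (2s + 1/beta), where beta controls the spectral tail and 1/beta measures redundancy:

```python
def scaling_exponent(s, beta):
    """Excess-risk power-law exponent alpha = 2s / (2s + 1/beta): beta
    controls the polynomial tail of the data covariance spectrum, and
    1/beta measures redundancy."""
    return 2 * s / (2 * s + 1 / beta)

# More redundancy (small beta, large 1/beta) flattens the learning curve;
# steeper spectra (large beta) push alpha toward its ceiling of 1.
for beta in (0.5, 1.0, 4.0):
    print(beta, scaling_exponent(s=1.0, beta=beta))  # 0.5, then 2/3, then 8/9
```

This is the paper's claim that the learning curve's slope is not universal: holding the smoothness parameter s fixed, alpha increases monotonically in beta, i.e., less redundant data buys faster returns to scale.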


【25】A Genetic Algorithm for Navigating Synthesizable Molecular Spaces
标题:导航可合成分子空间的遗传算法
链接:https://arxiv.org/abs/2509.20719

作者:, Connor W. Coley, Wojciech Matusik
摘要:Inspired by the effectiveness of genetic algorithms and the importance of synthesizability in molecular design, we present SynGA, a simple genetic algorithm that operates directly over synthesis routes. Our method features custom crossover and mutation operators that explicitly constrain it to synthesizable molecular space. By modifying the fitness function, we demonstrate the effectiveness of SynGA on a variety of design tasks, including synthesizable analog search and sample-efficient property optimization, for both 2D and 3D objectives. Furthermore, by coupling SynGA with a machine learning-based filter that focuses the building block set, we boost SynGA to state-of-the-art performance. For property optimization, this manifests as a model-based variant SynGBO, which employs SynGA and block filtering in the inner loop of Bayesian optimization. Since SynGA is lightweight and enforces synthesizability by construction, our hope is that SynGA can not only serve as a strong standalone baseline but also as a versatile module that can be incorporated into larger synthesis-aware workflows in the future.
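SynGA's outer loop is a standard genetic algorithm; a generic sketch on toy bit-strings (SynGA's crossover and mutation act on synthesis routes and its fitness on molecular objectives, which we replace here with trivial stand-ins):

```python
import random

def genetic_search(fitness, crossover, mutate, init_pop, n_gens=50, elite=2):
    """Minimal GA loop: keep the elites, refill by crossover + mutation."""
    pop = list(init_pop)
    for _ in range(n_gens):
        pop.sort(key=fitness, reverse=True)
        nxt = pop[:elite]
        while len(nxt) < len(pop):
            a, b = random.sample(pop[: max(4, elite)], 2)  # parents from the top
            nxt.append(mutate(crossover(a, b)))
        pop = nxt
    return max(pop, key=fitness)

# Toy task: maximize the number of ones in a 12-bit string.
random.seed(0)
L = 12
best = genetic_search(
    fitness=sum,
    crossover=lambda a, b: a[: L // 2] + b[L // 2:],
    mutate=lambda x: [bit ^ 1 if random.random() < 0.1 else bit for bit in x],
    init_pop=[[random.randint(0, 1) for _ in range(L)] for _ in range(20)],
)
print(sum(best))  # close to the optimum of 12
```

In SynGA the constraint to synthesizable space is enforced structurally: crossover and mutation recombine and edit synthesis routes, so every candidate comes with a synthesis plan by construction.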


【26】Wonder Wins Ways: Curiosity-Driven Exploration through Multi-Agent Contextual Calibration
标题:Wonder Wins Ways:通过多智能体上下文校准进行好奇心驱动的探索
链接:https://arxiv.org/abs/2509.20648

作者:n, Zhe Liu, Hesheng Wang
摘要:Autonomous exploration in complex multi-agent reinforcement learning (MARL) with sparse rewards critically depends on providing agents with effective intrinsic motivation. While artificial curiosity offers a powerful self-supervised signal, it often confuses environmental stochasticity with meaningful novelty. Moreover, existing curiosity mechanisms exhibit a uniform novelty bias, treating all unexpected observations equally. However, peer-behavior novelty, which encodes latent task dynamics, is often overlooked, resulting in suboptimal exploration in decentralized, communication-free MARL settings. To this end, inspired by how human children adaptively calibrate their own exploratory behaviors by observing peers, we propose a novel approach to enhance multi-agent exploration. We introduce CERMIC, a principled framework that empowers agents to robustly filter noisy surprise signals and guide exploration by dynamically calibrating their intrinsic curiosity with inferred multi-agent context. Additionally, CERMIC generates theoretically-grounded intrinsic rewards, encouraging agents to explore state transitions with high information gain. We evaluate CERMIC on benchmark suites including VMAS, Meltingpot, and SMACv2. Empirical results demonstrate that exploration with CERMIC significantly outperforms SoTA algorithms in sparse-reward environments.


【27】Latent Twins
标题:潜在孪生(Latent Twins)
链接:https://arxiv.org/abs/2509.20615

作者:Chung, Deepanshu Verma, Max Collins, Amit N. Subrahmanya, Varuni Katti Sastry, Vishwas Rao
备注:38 pages, 22 figures, 1 table
摘要:Over the past decade, scientific machine learning has transformed the development of mathematical and computational frameworks for analyzing, modeling, and predicting complex systems. From inverse problems to numerical PDEs, dynamical systems, and model reduction, these advances have pushed the boundaries of what can be simulated. Yet they have often progressed in parallel, with representation learning and algorithmic solution methods evolving largely as separate pipelines. With \emph{Latent Twins}, we propose a unifying mathematical framework that creates a hidden surrogate in latent space for the underlying equations. Whereas digital twins mirror physical systems in the digital world, Latent Twins mirror mathematical systems in a learned latent space governed by operators. Through this lens, classical modeling, inversion, model reduction, and operator approximation all emerge as special cases of a single principle. We establish the fundamental approximation properties of Latent Twins for both ODEs and PDEs and demonstrate the framework across three representative settings: (i) canonical ODEs, capturing diverse dynamical regimes; (ii) a PDE benchmark using the shallow-water equations, contrasting Latent Twin simulations with DeepONet and forecasts with a 4D-Var baseline; and (iii) a challenging real-data geopotential reanalysis dataset, reconstructing and forecasting from sparse, noisy observations. Latent Twins provide a compact, interpretable surrogate for solution operators that evaluate across arbitrary time gaps in a single-shot, while remaining compatible with scientific pipelines such as assimilation, control, and uncertainty quantification. Looking forward, this framework offers scalable, theory-grounded surrogates that bridge data-driven representation learning and classical scientific modeling across disciplines.


【28】Experience Deploying Containerized GenAI Services at an HPC Center
标题:在HPC中心部署容器化GenAI服务的经验
链接:https://arxiv.org/abs/2509.20603

作者:Beltre, Jeff Ogden, Kevin Pedretti
备注:10 pages, 12 figures
摘要:Generative Artificial Intelligence (GenAI) applications are built from specialized components -- inference servers, object storage, vector and graph databases, and user interfaces -- interconnected via web-based APIs. While these components are often containerized and deployed in cloud environments, such capabilities are still emerging at High-Performance Computing (HPC) centers. In this paper, we share our experience deploying GenAI workloads within an established HPC center, discussing the integration of HPC and cloud computing environments. We describe our converged computing architecture that integrates HPC and Kubernetes platforms running containerized GenAI workloads, helping with reproducibility. A case study illustrates the deployment of the Llama Large Language Model (LLM) using a containerized inference server (vLLM) across both Kubernetes and HPC platforms using multiple container runtimes. Our experience highlights practical considerations and opportunities for the HPC container community, guiding future research and tool development.


【29】Explicit and Effectively Symmetric Schemes for Neural SDEs
标题:神经SDE的显式且有效对称格式
链接:https://arxiv.org/abs/2509.20599

作者:melev, Cristopher Salvi
摘要 :Backpropagation through (neural) SDE solvers is traditionally approached in two ways: discretise-then-optimise, which offers accurate gradients but incurs prohibitive memory costs due to storing the full computational graph (even when mitigated by checkpointing); and optimise-then-discretise, which achieves constant memory cost by solving an auxiliary backward SDE, but suffers from slower evaluation and gradient approximation errors. Algebraically reversible solvers promise both memory efficiency and gradient accuracy, yet existing methods such as the Reversible Heun scheme are often unstable under complex models and large step sizes. We address these limitations by introducing a novel class of stable, near-reversible Runge--Kutta schemes for neural SDEs. These Explicit and Effectively Symmetric (EES) schemes retain the benefits of reversible solvers while overcoming their instability, enabling memory-efficient training without severe restrictions on step size or model complexity. Through numerical experiments, we demonstrate the superior stability and reliability of our schemes, establishing them as a practical foundation for scalable and accurate training of neural SDEs.


【30】Myosotis: structured computation for attention like layer
标题:Myosotis:面向类注意力层的结构化计算
链接:https://arxiv.org/abs/2509.20503

作者:gorov, Hanno Ackermann, Markus Nagel, Hong Cai
摘要:Attention layers apply a sequence-to-sequence mapping whose parameters depend on the pairwise interactions of the input elements. However, without any structural assumptions, memory and compute scale quadratically with the sequence length. The two main ways to mitigate this are to introduce sparsity by ignoring a sufficient amount of pairwise interactions or to introduce recurrent dependence along them, as SSM does. Although both approaches are reasonable, they both have disadvantages. We propose a novel algorithm that combines the advantages of both concepts. Our idea is based on the efficient inversion of tree-structured matrices.


【31】Efficiently Attacking Memorization Scores
标题:高效攻击记忆分数
链接:https://arxiv.org/abs/2509.20463

作者:arun Chandrasekaran, Daniel Alabi
摘要:Influence estimation tools -- such as memorization scores -- are widely used to understand model behavior, attribute training data, and inform dataset curation. However, recent applications in data valuation and responsible machine learning raise the question: can these scores themselves be adversarially manipulated? In this work, we present a systematic study of the feasibility of attacking memorization-based influence estimators. We characterize attacks for producing highly memorized samples as highly sensitive queries in the regime where a trained algorithm is accurate. Our attack (calculating the pseudoinverse of the input) is practical, requiring only black-box access to model outputs and incurring modest computational overhead. We empirically validate our attack across a wide suite of image classification tasks, showing that even state-of-the-art proxies are vulnerable to targeted score manipulations. In addition, we provide a theoretical analysis of the stability of memorization scores under adversarial perturbations, revealing conditions under which influence estimates are inherently fragile. Our findings highlight critical vulnerabilities in influence-based attribution and suggest the need for robust defenses. All code can be found at https://anonymous.4open.science/r/MemAttack-5413/


【32】Document Summarization with Conformal Importance Guarantees
标题:具有共形重要性保证的文档摘要
链接:https://arxiv.org/abs/2509.20461

作者:ahara, Chen-Yuan Lin, Xiao Shi Huang, Kin Kwan Leung, Jullian Arta Yapeter, Ilya Stanevich, Felipe Perez, Jesse C. Cresswell
备注:NeurIPS 2025. Code is available at this https URL
摘要:Automatic summarization systems have advanced rapidly with large language models (LLMs), yet they still lack reliable guarantees on inclusion of critical content in high-stakes domains like healthcare, law, and finance. In this work, we introduce Conformal Importance Summarization, the first framework for importance-preserving summary generation which uses conformal prediction to provide rigorous, distribution-free coverage guarantees. By calibrating thresholds on sentence-level importance scores, we enable extractive document summarization with user-specified coverage and recall rates over critical content. Our method is model-agnostic, requires only a small calibration set, and seamlessly integrates with existing black-box LLMs. Experiments on established summarization benchmarks demonstrate that Conformal Importance Summarization achieves the theoretically assured information coverage rate. Our work suggests that Conformal Importance Summarization can be combined with existing techniques to achieve reliable, controllable automatic summarization, paving the way for safer deployment of AI summarization tools in critical applications. Code is available at https://github.com/layer6ai-labs/conformal-importance-summarization.
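The calibration step behind this kind of split-conformal guarantee can be sketched in a few lines. The recipe below thresholds sentence-level importance scores at an empirical quantile with the usual (n+1) conformal correction; the function names and the exact scoring interface are illustrative assumptions, not the paper's implementation:

```python
import math

def calibrate_threshold(critical_scores, alpha=0.1):
    """Split-conformal style calibration: choose tau so that, under
    exchangeability, a critical sentence scores >= tau with probability
    at least 1 - alpha. A sketch of the general recipe only."""
    s = sorted(critical_scores)
    n = len(s)
    # lower empirical quantile index, with the (n + 1) conformal correction
    k = max(0, math.floor(alpha * (n + 1)) - 1)
    return s[k]

def extract_summary(sentences, scores, tau):
    """Keep every sentence whose importance score clears the threshold."""
    return [sent for sent, sc in zip(sentences, scores) if sc >= tau]
```

With 100 calibration scores and alpha = 0.1, the threshold lands at roughly the 10th-smallest critical score, so about 90% of future critical sentences clear it.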


【33】Bridging Privacy and Utility: Synthesizing anonymized EEG with constraining utility functions
标题:隐私与效用的桥梁:具有约束效用函数的匿名EEG合成
链接:https://arxiv.org/abs/2509.20454

作者:eister, Arne Pelzer, Fabian Radke, Julia Lechinger, Mahzad Gharleghi, Thomas Köllmer, Insa Wolf
摘要:Electroencephalography (EEG) is widely used for recording brain activity and has seen numerous applications in machine learning, such as detecting sleep stages and neurological disorders. Several studies have successfully shown the potential of EEG data for re-identification and leakage of other personal information. Therefore, the increasing availability of EEG consumer devices raises concerns about user privacy, motivating us to investigate how to safeguard this sensitive data while retaining its utility for EEG applications. To address this challenge, we propose a transformer-based autoencoder to create EEG data that does not allow for subject re-identification while still retaining its utility for specific machine learning tasks. We apply our approach to automatic sleep staging by evaluating the re-identification and utility potential of EEG data before and after anonymization. The results show that the re-identifiability of the EEG signal can be substantially reduced while preserving its utility for machine learning.


【34】FastEagle: Cascaded Drafting for Accelerating Speculative Decoding
标题:FastEagle:加速推测解码的级联起草
链接:https://arxiv.org/abs/2509.20416

作者:ang, Jiangcheng Song, Wenzhe Zhao, Pengju Ren
摘要:Speculative decoding accelerates generation by drafting candidates and verifying them in parallel, yet state-of-the-art drafters (e.g., EAGLE) still require N sequential passes to propose N tokens. We present FastEagle, a non-autoregressive cascaded drafter that emits an entire draft in a single forward pass. FastEagle replaces temporal steps with a lightweight layer cascade and trains with layer-wise supervision to mitigate error accumulation. Coupled with a constrained draft tree that preserves lossless verification cost, FastEagle delivers substantial wall-clock speedups over strong autoregressive drafters while maintaining competitive acceptance behavior. Across multiple LLMs (Vicuna-13B, LLaMA-Instruct 3.x, and DeepSeek-R1-Distill-LLaMA) and tasks (MT-Bench, HumanEval, GSM8K, CNN/DM, Alpaca), FastEagle consistently outperforms EAGLE-3 in speedup under both greedy and stochastic decoding, with comparable average acceptance lengths. These results indicate that removing sequential dependencies in drafting is a practical path toward lossless LLM inference acceleration.
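The draft-then-verify loop that speculative decoding relies on can be illustrated with a toy greedy verifier. Here `target_next_token` is a hypothetical stand-in for one forward pass of the large model; this sketch covers only the greedy case and ignores the stochastic accept/reject rule used with sampling:

```python
def verify_greedy(draft_tokens, target_next_token):
    """Greedy speculative-decoding verification: accept the longest prefix
    of the draft that the target model would itself have produced, then
    append the target's own correction at the first mismatch."""
    accepted = []
    for tok in draft_tokens:
        expected = target_next_token(accepted)
        if tok == expected:
            accepted.append(tok)
        else:
            accepted.append(expected)  # target's token replaces the miss
            break
    return accepted
```

A drafter that emits an entire draft in one pass, as FastEagle does, saves the N sequential drafting passes; the verification cost above is unchanged.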


【35】Maxout Polytopes
标题:Maxout多边形
链接:https://arxiv.org/abs/2509.21286

作者:lakin, Shelby Cox, Georg Loho, Bernd Sturmfels
备注:24 pages, 3 figures
摘要:Maxout polytopes are defined by feedforward neural networks with maxout activation function and non-negative weights after the first layer. We characterize the parameter spaces and extremal f-vectors of maxout polytopes for shallow networks, and we study the separating hypersurfaces which arise when a layer is added to the network. We also show that maxout polytopes are cubical for generic networks without bottlenecks.


【36】WISER: Segmenting watermarked region - an epidemic change-point perspective
标题:WISER:分割水印区域-流行病变点视角
链接:https://arxiv.org/abs/2509.21160

作者:nerjee, Sayar Karmakar, Subhrajyoty Roy
摘要:With the increasing popularity of large language models, concerns over content authenticity have led to the development of myriad watermarking schemes. These schemes can be used to detect a machine-generated text via an appropriate key, while being imperceptible to readers with no such keys. The corresponding detection mechanisms usually take the form of statistical hypothesis testing for the existence of watermarks, spurring extensive research in this direction. However, the finer-grained problem of identifying which segments of a mixed-source text are actually watermarked, is much less explored; the existing approaches either lack scalability or theoretical guarantees robust to paraphrase and post-editing. In this work, we introduce a unique perspective to such watermark segmentation problems through the lens of epidemic change-points. By highlighting the similarities as well as differences of these two problems, we motivate and propose WISER: a novel, computationally efficient, watermark segmentation algorithm. We theoretically validate our algorithm by deriving finite sample error-bounds, and establishing its consistency in detecting multiple watermarked segments in a single text. Complementing these theoretical results, our extensive numerical experiments show that WISER outperforms state-of-the-art baseline methods, both in terms of computational speed as well as accuracy, on various benchmark datasets embedded with diverse watermarking schemes. Our theoretical and empirical findings establish WISER as an effective tool for watermark localization in most settings. It also shows how insights from a classical statistical problem can lead to a theoretically valid and computationally efficient solution of a modern and pertinent problem.
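The epidemic change-point view can be made concrete with a brute-force scan for the segment whose mean deviates most from the global baseline. This O(n^2) sketch is illustrative only; WISER itself is designed to be computationally efficient and carries finite-sample guarantees the scan below does not:

```python
def epidemic_scan(x, penalty=0.0):
    """Find the segment [s, e) whose mean deviates most from the global
    baseline, treating a watermarked stretch as an 'epidemic' of elevated
    scores. Brute-force and illustrative, not the WISER algorithm."""
    n = len(x)
    baseline = sum(x) / n
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    best, best_seg = float("-inf"), (0, n)
    for s in range(n):
        for e in range(s + 1, n + 1):
            length = e - s
            # centered segment sum, scaled to penalize short noisy runs
            stat = (prefix[e] - prefix[s] - baseline * length) / length ** 0.5
            if stat - penalty > best:
                best, best_seg = stat - penalty, (s, e)
    return best_seg
```

On a score sequence with an elevated run in the middle, the scan recovers the run's endpoints; the `penalty` knob is a hypothetical guard against spurious short segments.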


【37】Best-of-$\infty$ -- Asymptotic Performance of Test-Time Compute
标题:Best-of-$\infty$--测试时计算的渐近性能
链接:https://arxiv.org/abs/2509.21091

作者:miyama, Daisuke Oba, Masafumi Oyamada
摘要:We study best-of-$N$ for large language models (LLMs) where the selection is based on majority voting. In particular, we analyze the limit $N \to \infty$, which we denote as Best-of-$\infty$. While this approach achieves impressive performance in the limit, it requires an infinite test-time budget. To address this, we propose an adaptive generation scheme that selects $N$ based on answer agreement, thereby efficiently allocating inference-time computation. Beyond adaptivity, we extend the framework to weighted ensembles of multiple LLMs, showing that such mixtures can outperform any individual model. The optimal ensemble weighting is formulated and efficiently computed as a mixed-integer linear program. Extensive experiments demonstrate the effectiveness of our approach.
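The majority-vote selection and the agreement-based adaptive stopping described in the abstract can be sketched as follows. The minimum sample count and the agreement threshold are illustrative choices, not the paper's, and `sample` stands in for one LLM generation:

```python
from collections import Counter

def best_of_n(answers):
    """Majority vote over a list of sampled answers."""
    winner, _ = Counter(answers).most_common(1)[0]
    return winner

def adaptive_best_of(sample, max_n=64, agreement=0.6, min_n=4):
    """Keep sampling until the leading answer's share reaches `agreement`
    (after at least `min_n` draws) or the budget `max_n` is exhausted.
    Thresholds are illustrative; the paper selects N adaptively from
    answer agreement rather than these particular constants."""
    answers = []
    for _ in range(max_n):
        answers.append(sample())
        top, k = Counter(answers).most_common(1)[0]
        if len(answers) >= min_n and k / len(answers) >= agreement:
            return top, len(answers)
    return best_of_n(answers), len(answers)
```

With a sampler that always agrees, the loop stops at the minimum sample count; a noisier sampler spends more of the budget before committing.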


【38】Empirical PAC-Bayes bounds for Markov chains
标题:Markov链的经验PAC-Bayes界
链接:https://arxiv.org/abs/2509.20985

作者:gulyan, Pierre Alquier
摘要:The core of generalization theory was developed for independent observations. Some PAC and PAC-Bayes bounds are available for data that exhibit a temporal dependence. However, there are constants in these bounds that depend on properties of the data-generating process: mixing coefficients, mixing time, spectral gap... Such constants are unknown in practice. In this paper, we prove a new PAC-Bayes bound for Markov chains. This bound depends on a quantity called the pseudo-spectral gap. The main novelty is that we can provide an empirical bound on the pseudo-spectral gap when the state space is finite. Thus, we obtain the first fully empirical PAC-Bayes bound for Markov chains. This extends beyond the finite case, although this requires additional assumptions. On simulated experiments, the empirical version of the bound is essentially as tight as the non-empirical one.


【39】Real-Time System for Audio-Visual Target Speech Enhancement
标题:视听目标语音增强实时系统
链接:https://arxiv.org/abs/2509.20741

作者:ndra Ma, Sile Yin, Li-Chia Yang, Shuo Zhang
备注:Accepted into WASPAA 2025 demo session
摘要:We present a live demonstration for RAVEN, a real-time audio-visual speech enhancement system designed to run entirely on a CPU. In single-channel, audio-only settings, speech enhancement is traditionally approached as the task of extracting clean speech from environmental noise. More recent work has explored the use of visual cues, such as lip movements, to improve robustness, particularly in the presence of interfering speakers. However, to our knowledge, no prior work has demonstrated an interactive system for real-time audio-visual speech enhancement operating on CPU hardware. RAVEN fills this gap by using pretrained visual embeddings from an audio-visual speech recognition model to encode lip movement information. The system generalizes across environmental noise, interfering speakers, transient sounds, and even singing voices. In this demonstration, attendees will be able to experience live audio-visual target speech enhancement using a microphone and webcam setup, with clean speech playback through headphones.


【40】A Gapped Scale-Sensitive Dimension and Lower Bounds for Offset Rademacher Complexity
标题:带间隙的尺度敏感维度与偏移Rademacher复杂度的下界
链接:https://arxiv.org/abs/2509.20618

作者: Yury Polyanskiy, Alexander Rakhlin
摘要:We study gapped scale-sensitive dimensions of a function class in both sequential and non-sequential settings. We demonstrate that covering numbers for any uniformly bounded class are controlled above by these gapped dimensions, generalizing the results of \cite{anthony2000function,alon1997scale}. Moreover, we show that the gapped dimensions lead to lower bounds on offset Rademacher averages, thereby strengthening existing approaches for proving lower bounds on rates of convergence in statistical and online learning.


【41】Sample completion, structured correlation, and Netflix problems
标题:样本完成、结构化相关性和Netflix问题
链接:https://arxiv.org/abs/2509.20404

作者:N. Coregliano, Maryanthe Malliaris
备注:97 pages, 1 figure
摘要:We develop a new high-dimensional statistical learning model which can take advantage of structured correlation in data even in the presence of randomness. We completely characterize learnability in this model in terms of VCN${}_{k,k}$-dimension (essentially $k$-dependence from Shelah's classification theory). This model suggests a theoretical explanation for the success of certain algorithms in the 2006 Netflix Prize competition.


【42】An Analytical and AI-discovered Stable, Accurate, and Generalizable Subgrid-scale Closure for Geophysical Turbulence
标题:地球物理湍流的解析与AI发现的稳定、准确且可推广的亚格子尺度闭合
链接:https://arxiv.org/abs/2509.20365

作者:har, Yifei Guan, Pedram Hassanzadeh
摘要:By combining AI and fluid physics, we discover a closed-form closure for 2D turbulence from small direct numerical simulation (DNS) data. Large-eddy simulation (LES) with this closure is accurate and stable, reproducing DNS statistics including those of extremes. We also show that the new closure could be derived from a 4th-order truncated Taylor expansion. Prior analytical and AI-based work only found the 2nd-order expansion, which led to unstable LES. The additional terms emerge only when inter-scale energy transfer is considered alongside standard reconstruction criterion in the sparse-equation discovery.


机器翻译由腾讯交互翻译提供,仅供参考
