
Machine Learning Academic Digest [3.5]

arXiv Daily Academic Digest • 1 month ago • 498 views

Click "Read the original" to visit arxivdaily.com, covering CS | Physics | Math | Economics | Statistics | Finance | Biology | Electrical Engineering, with search, bookmarking, and more!


cs.LG: 174 papers today


Large Models (21 papers)

【1】Efficient Refusal Ablation in LLM through Optimal Transport
Link: https://arxiv.org/abs/2603.04355

Authors: Geraldin Nanfack, Eugene Belilovsky, Elvis Dohmatob
Abstract: Safety-aligned language models refuse harmful requests through learned refusal behaviors encoded in their internal representations. Recent activation-based jailbreaking methods circumvent these safety mechanisms by applying orthogonal projections to remove refusal directions, but these approaches treat refusal as a one-dimensional phenomenon and ignore the rich distributional structure of model activations. We introduce a principled framework based on optimal transport theory that transforms the entire distribution of harmful activations to match harmless ones. By combining PCA with closed-form Gaussian optimal transport, we achieve efficient computation in high-dimensional representation spaces while preserving essential geometric structure. Across six models (Llama-2, Llama-3.1, Qwen-2.5; 7B-32B parameters), our method achieves up to 11% higher attack success rates than state-of-the-art baselines while maintaining comparable perplexity, demonstrating superior preservation of model capabilities. Critically, we discover that layer-selective intervention (applying optimal transport to 1-2 carefully chosen layers at approximately 40-60% network depth) substantially outperforms full-network interventions, revealing that refusal mechanisms may be localized rather than distributed. Our analysis provides new insights into the geometric structure of safety representations and suggests that current alignment methods may be vulnerable to distributional attacks beyond simple direction removal.
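The closed-form Gaussian transport step described above can be sketched as follows. This is a toy on synthetic 4-dimensional "activations"; the function names, the omission of the PCA projection, and the small covariance regularizer are illustrative choices, not the paper's implementation:

```python
import numpy as np

def _sqrtm_psd(M):
    # Symmetric PSD matrix square root via eigendecomposition.
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def gaussian_ot_map(X_src, X_tgt):
    """Closed-form optimal transport map between Gaussian fits of two
    sample sets: T(x) = mu2 + A (x - mu1), the Bures-Wasserstein map."""
    mu1, mu2 = X_src.mean(0), X_tgt.mean(0)
    d = X_src.shape[1]
    S1 = np.cov(X_src, rowvar=False) + 1e-6 * np.eye(d)  # regularized
    S2 = np.cov(X_tgt, rowvar=False) + 1e-6 * np.eye(d)
    S1h = _sqrtm_psd(S1)
    S1h_inv = np.linalg.inv(S1h)
    A = S1h_inv @ _sqrtm_psd(S1h @ S2 @ S1h) @ S1h_inv
    return lambda X: mu2 + (X - mu1) @ A.T

rng = np.random.default_rng(0)
X_harmful = rng.normal(2.0, 1.0, size=(500, 4))   # stand-in activations
X_harmless = rng.normal(0.0, 0.5, size=(500, 4))
transport = gaussian_ot_map(X_harmful, X_harmless)
moved = transport(X_harmful)
```

After the map, the transported samples match the harmless mean exactly and the harmless covariance up to the regularizer.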


【2】Causality Elicitation from Large Language Models
Link: https://arxiv.org/abs/2603.04276

Authors: Takashi Kameyama, Masahiro Kato, Yasuko Hio, Yasushi Takano, Naoto Minakawa
Abstract: Large language models (LLMs) are trained on enormous amounts of data and encode knowledge in their parameters. We propose a pipeline to elicit causal relationships from LLMs. Specifically, (i) we sample many documents from LLMs on a given topic, (ii) we extract an event list from each document, (iii) we group events that appear across documents into canonical events, (iv) we construct a binary indicator vector for each document over canonical events, and (v) we estimate candidate causal graphs using causal discovery methods. Our approach does not guarantee real-world causality. Rather, it provides a framework for presenting the set of causal hypotheses that LLMs can plausibly assume, as an inspectable set of variables and candidate graphs.
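Steps (iii)-(iv) of the pipeline can be illustrated with a toy example; the event strings are invented, and the resulting binary matrix is what a causal discovery method (e.g., the PC algorithm) would consume:

```python
# Toy version of steps (iii)-(iv): group per-document event mentions into
# a canonical event vocabulary, then build one binary indicator vector
# per document over that vocabulary.
docs_events = [
    ["rate hike", "stocks fall"],
    ["rate hike", "bond yields rise", "stocks fall"],
    ["stocks fall"],
]
canonical = sorted({e for events in docs_events for e in events})
indicators = [[int(e in events) for e in canonical] for events in docs_events]
```

Each row of `indicators` is one document; each column is one canonical event.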


【3】Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory
Link: https://arxiv.org/abs/2603.04257

Authors: Zhenting Wang, Huancheng Chen, Jiayun Wang, Wei Wei
Abstract: Large language model (LLM) agents are fundamentally bottlenecked by finite context windows on long-horizon tasks. As trajectories grow, retaining tool outputs and intermediate reasoning in-context quickly becomes infeasible: the working context becomes prohibitively long, eventually exceeds the context budget, and makes distant evidence harder to use even when it is still present. Existing solutions typically shorten context through truncation or running summaries, but these methods are fundamentally lossy because they compress or discard past evidence itself. We introduce Memex, an indexed experience memory mechanism that instead compresses context without discarding evidence. Memex maintains a compact working context consisting of concise structured summaries and stable indices, while storing full-fidelity underlying interactions in an external experience database under those indices. The agent can then decide when to dereference an index and recover the exact past evidence needed for the current subgoal. We optimize both write and read behaviors with our reinforcement learning framework MemexRL, using reward shaping tailored to indexed memory usage under a context budget, so the agent learns what to summarize, what to archive, how to index it, and when to retrieve it. This yields a substantially less lossy form of long-horizon memory than summary-only approaches. We further provide a theoretical analysis showing the potential of the Memex loop to preserve decision quality with bounded dereferencing while keeping effective in-context computation bounded as history grows. Empirically, on challenging long-horizon tasks, a Memex agent trained with MemexRL improves task success while using a significantly smaller working context.
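The write/dereference loop the abstract describes can be sketched minimally; class and method names are illustrative, not from the paper:

```python
class ExperienceMemory:
    """Minimal sketch of an indexed experience memory: the working context
    holds only (index, summary) pairs, while full-fidelity records live in
    an external store and are recovered on dereference."""

    def __init__(self):
        self._store = {}            # index -> full interaction record
        self._next_id = 0
        self.working_context = []   # compact (index, summary) entries

    def write(self, full_record, summary):
        idx = f"mem-{self._next_id}"
        self._next_id += 1
        self._store[idx] = full_record         # archive full fidelity
        self.working_context.append((idx, summary))  # keep it compact
        return idx

    def dereference(self, idx):
        return self._store[idx]    # recover the exact past evidence

mem = ExperienceMemory()
i = mem.write({"tool": "search", "output": "long raw output ..."},
              "searched docs on topic X")
```

In the paper, deciding *when* to call `dereference` is what MemexRL trains.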


【4】CodeTaste: Can LLMs Generate Human-Level Code Refactorings?
Link: https://arxiv.org/abs/2603.04177

Authors: Alex Thillen, Niels Mündler, Veselin Raychev, Martin Vechev
Abstract: Large language model (LLM) coding agents can generate working code, but their solutions often accumulate complexity, duplication, and architectural debt. Human developers address such issues through refactoring: behavior-preserving program transformations that improve structure and maintainability. In this paper, we investigate whether LLM agents (i) can execute refactorings reliably and (ii) identify the refactorings that human developers actually chose in real codebases. We present CodeTaste, a benchmark of refactoring tasks mined from large-scale multi-file changes in open-source repositories. To score solutions, we combine repository test suites with custom static checks that verify removal of undesired patterns and introduction of desired patterns using dataflow reasoning. Our experimental results indicate a clear gap across frontier models: agents perform well when refactorings are specified in detail, but often fail to discover the human refactoring choices when only presented with a focus area for improvement. A propose-then-implement decomposition improves alignment, and selecting the best-aligned proposal before implementation can yield further gains. CodeTaste provides an evaluation target and a potential preference signal for aligning coding agents with human refactoring decisions in realistic codebases.


【5】BeamPERL: Parameter-Efficient RL with Verifiable Rewards Specializes Compact LLMs for Structured Beam Mechanics Reasoning
Link: https://arxiv.org/abs/2603.04124

Authors: Tarjei Paule Hage, Markus J. Buehler
Abstract: Can reinforcement learning with hard, verifiable rewards teach a compact language model to reason about physics, or does it primarily learn to pattern-match toward correct answers? We study this question by training a 1.5B-parameter reasoning model on beam statics, a classic engineering problem, using parameter-efficient RLVR with binary correctness rewards from symbolic solvers, without teacher-generated reasoning traces. The best BeamPERL checkpoint achieves a 66.7% improvement in Pass@1 over the base model. However, the learned competence is anisotropic: the model generalizes compositionally (more loads) but fails under topological shifts (moved supports) that require the same equilibrium equations. Intermediate checkpoints yield the strongest reasoning, while continued optimization degrades robustness while maintaining reward. These findings reveal a key limitation of outcome-level alignment: reinforcement learning with exact physics rewards induces procedural solution templates rather than internalization of governing equations. The precision of the reward signal - even when analytically exact - does not by itself guarantee transferable physical reasoning. Our results suggest that verifiable rewards may need to be paired with structured reasoning scaffolding to move beyond template matching toward robust scientific reasoning.
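A "binary correctness reward from symbolic solvers" can be sketched for the simplest beam case, a simply supported beam with one point load. The analytic equilibrium solution below stands in for the paper's symbolic solver; the function names are ours:

```python
def beam_reactions(span, load, pos):
    """Support reactions of a simply supported beam carrying one point
    load at distance `pos` from support A, from static equilibrium
    (sum of forces = 0, sum of moments = 0)."""
    r_b = load * pos / span   # moment balance about support A
    r_a = load - r_b          # vertical force balance
    return r_a, r_b

def verifiable_reward(answer, span, load, pos, tol=1e-6):
    """Binary reward: 1.0 iff the model's (R_A, R_B) matches the
    analytic solution within tolerance, else 0.0."""
    truth = beam_reactions(span, load, pos)
    return float(all(abs(a - t) <= tol for a, t in zip(answer, truth)))
```

For a 10 m span with a 100 N load 4 m from support A, the reactions are (60 N, 40 N), and only that answer earns reward 1.0.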


【6】Inference-Time Toxicity Mitigation in Protein Language Models
Link: https://arxiv.org/abs/2603.04045

Authors: Manuel Fernández Burda, Santiago Aranguri, Iván Arcuschin Moreno, Enzo Ferrante
Abstract: Protein language models (PLMs) are becoming practical tools for de novo protein design, yet their dual-use potential raises safety concerns. We show that domain adaptation to specific taxonomic groups can elicit toxic protein generation, even when toxicity is not the training objective. To address this, we adapt Logit Diff Amplification (LDA) as an inference-time control mechanism for PLMs. LDA modifies token probabilities by amplifying the logit difference between a baseline model and a toxicity-finetuned model, requiring no retraining. Across four taxonomic groups, LDA consistently reduces predicted toxicity rate (measured via ToxDL2) below the taxon-finetuned baseline while preserving biological plausibility. We evaluate quality using Fréchet ESM Distance and predicted foldability (pLDDT), finding that LDA maintains distributional similarity to natural proteins and structural viability (unlike activation-based steering methods that tend to degrade sequence properties). Our results demonstrate that LDA provides a practical safety knob for protein generators that mitigates elicited toxicity while retaining generative quality.
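One plausible reading of the inference-time logit edit is extrapolating away from the toxicity-finetuned model; the steering sign convention and the 3-token toy vocabulary are our assumptions, not the paper's exact formulation:

```python
import numpy as np

def lda_logits(logits_base, logits_tox, alpha):
    """Logit Diff Amplification (sketch): move along the direction from
    the toxicity-finetuned model back toward/past the baseline.
    alpha = 0 recovers the baseline; alpha > 0 pushes away from toxicity."""
    return logits_base + alpha * (logits_base - logits_tox)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

base = np.array([2.0, 1.0, 0.5])  # token logits from the base PLM (toy)
tox = np.array([0.5, 1.0, 3.0])   # toxicity-finetuned PLM boosts token 2
steered = softmax(lda_logits(base, tox, alpha=1.0))
```

The edit needs only the two models' logits at decode time, hence no retraining.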


【7】A Multi-Dimensional Quality Scoring Framework for Decentralized LLM Inference with Proof of Quality
Link: https://arxiv.org/abs/2603.04028

Authors: Arther Tian, Alex Ding, Frank Chen, Simon Wu, Aaron Chan
Abstract: Decentralized large language model (LLM) inference networks can pool heterogeneous compute to scale serving, but they require lightweight and incentive-compatible mechanisms to assess output quality. Prior work introduced cost-aware Proof of Quality (PoQ) and adaptive robust PoQ to allocate rewards under evaluator heterogeneity and adversarial behavior. In this paper, we focus on the quality signal itself and propose a multi-dimensional quality scoring framework that decomposes output quality into modular dimensions, including model and cost priors, structure quality, semantic quality, query-output alignment, and agreement/uncertainty. Using logged outputs from QA and summarization tasks, we systematically audit dimension reliability and show that seemingly reasonable dimensions can be task-dependent and even negatively correlated with reference quality without calibration. While the default composite underperforms a strong single semantic evaluator, ablations reveal that removing unreliable dimensions and re-normalizing weights yields a calibrated composite that matches or exceeds the best single-evaluator and consensus baselines. Finally, we integrate the composite score as a drop-in quality signal in PoQ and demonstrate complementary benefits with robust aggregation and adaptive trust weighting under adversarial evaluator attacks.
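The ablation described above (drop unreliable dimensions, re-normalize the remaining weights) reduces to a small computation; the dimension names, scores, and weights below are invented for illustration:

```python
def composite_score(dim_scores, weights, drop=()):
    """Weighted composite quality score. Unreliable dimensions can be
    dropped, with the remaining weights re-normalized to sum to one."""
    kept = {d: w for d, w in weights.items() if d not in drop}
    total = sum(kept.values())
    return sum(dim_scores[d] * w / total for d, w in kept.items())

scores = {"structure": 0.9, "semantic": 0.8, "alignment": 0.7, "cost_prior": 0.2}
weights = {"structure": 0.25, "semantic": 0.25, "alignment": 0.25, "cost_prior": 0.25}

full = composite_score(scores, weights)                        # default composite
calibrated = composite_score(scores, weights, drop=("cost_prior",))  # ablated
```

Here dropping the (deliberately noisy) `cost_prior` dimension raises the composite from 0.65 to 0.80.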


【8】Lang2Str: Two-Stage Crystal Structure Generation with LLMs and Continuous Flow Models
Link: https://arxiv.org/abs/2603.03946

Authors: Cong Liu, Chengyue Gong, Zhenyu Liu, Jiale Zhao, Yuxuan Zhang
Abstract: Generative models hold great promise for accelerating material discovery but are often limited by their inflexible single-stage generative process in designing valid and diverse materials. To address this, we propose a two-stage generative framework, Lang2Str, that combines the strengths of large language models (LLMs) and flow-based models for flexible and precise material generation. Our method frames the generative process as a conditional generative task, where an LLM provides high-level conditions by generating descriptions of material unit cells' geometric layouts and properties. These descriptions, informed by the LLM's extensive background knowledge, ensure reasonable structure designs. A conditioned flow model then decodes these textual conditions into precise continuous coordinates and unit cell parameters. This staged approach combines the structured reasoning of LLMs and the distribution modeling capabilities of flow models. Experimental results show that our method achieves competitive performance on \textit{ab initio} material generation and crystal structure prediction tasks, with generated structures exhibiting closer alignment to ground truth in both geometry and energy levels, surpassing state-of-the-art models. The flexibility and modularity of our framework further enable fine-grained control over the generation process, potentially leading to more efficient and customizable material design.


【9】In-Context Environments Induce Evaluation-Awareness in Language Models
Link: https://arxiv.org/abs/2603.03824

Authors: Maheep Chaudhary
Abstract: Humans often become more self-aware under threat, yet can lose self-awareness when absorbed in a task; we hypothesize that language models exhibit environment-dependent \textit{evaluation awareness}. This raises concerns that models could strategically underperform, or \textit{sandbag}, to avoid triggering capability-limiting interventions such as unlearning or shutdown. Prior work demonstrates sandbagging under hand-crafted prompts, but this underestimates the true vulnerability ceiling. We introduce a black-box adversarial optimization framework treating the in-context prompt as an optimizable environment, and develop two approaches to characterize sandbagging: (1) measuring whether models expressing intent to underperform can actually execute it across different task structures, and (2) causally isolating whether underperformance is driven by genuine evaluation-aware reasoning or shallow prompt-following. Evaluating Claude-3.5-Haiku, GPT-4o-mini, and Llama-3.3-70B across four benchmarks (Arithmetic, GSM8K, MMLU, and HumanEval), optimized prompts induce up to 94 percentage point (pp) degradation on arithmetic (GPT-4o-mini: 97.8\%$\rightarrow$4.0\%), far exceeding hand-crafted baselines which produce near-zero behavioral change. Code generation exhibits model-dependent resistance: Claude degrades only 0.6pp, while Llama's accuracy drops to 0\%. The intent-execution gap reveals a monotonic resistance ordering: Arithmetic $


【10】Pretrained Vision-Language-Action Models are Surprisingly Resistant to Forgetting in Continual Learning
Link: https://arxiv.org/abs/2603.03818

Authors: Huihan Liu, Changyeon Kim, Bo Liu, Minghuan Liu, Yuke Zhu
Abstract: Continual learning is a long-standing challenge in robot policy learning, where a policy must acquire new skills over time without catastrophically forgetting previously learned ones. While prior work has extensively studied continual learning in relatively small behavior cloning (BC) policy models trained from scratch, its behavior in modern large-scale pretrained Vision-Language-Action (VLA) models remains underexplored. In this work, we found that pretrained VLAs are remarkably resistant to forgetting compared with smaller policy models trained from scratch. Simple Experience Replay (ER) works surprisingly well on VLAs, sometimes achieving zero forgetting even with a small replay data size. Our analysis reveals that pretraining plays a critical role in downstream continual learning performance: large pretrained models mitigate forgetting with a small replay buffer size while maintaining strong forward learning capabilities. Furthermore, we found that VLAs can retain relevant knowledge from prior tasks despite performance degradation during learning new tasks. This knowledge retention enables rapid recovery of seemingly forgotten skills through finetuning. Together, these insights imply that large-scale pretraining fundamentally changes the dynamics of continual learning, enabling models to continually acquire new skills over time with simple replay. Code and more information can be found at https://ut-austin-rpl.github.io/continual-vla
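Simple Experience Replay, as referenced above, amounts to mixing stored prior-task samples into each new-task batch. The sketch below uses toy tuples and an assumed replay fraction; the buffer sizes are not the paper's:

```python
import random

def replay_batch(new_task_data, replay_buffer, batch_size,
                 replay_frac=0.25, seed=0):
    """Simple Experience Replay (sketch): fill most of each batch with
    new-task samples and a small fraction with stored prior-task samples."""
    rng = random.Random(seed)
    n_replay = int(batch_size * replay_frac)
    batch = rng.sample(new_task_data, batch_size - n_replay)
    batch += rng.sample(replay_buffer, min(n_replay, len(replay_buffer)))
    rng.shuffle(batch)
    return batch

old = [("task0", i) for i in range(20)]    # small replay buffer
new = [("task1", i) for i in range(100)]   # current task data
b = replay_batch(new, old, batch_size=16)
```

With `replay_frac=0.25`, 4 of the 16 samples in each batch come from the prior task.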


【11】CONCUR: Benchmarking LLMs for Concurrent Code Generation
Link: https://arxiv.org/abs/2603.03683

Authors: Jue Huang, Tarek Mahmud, Corina Pasareanu, Guowei Yang
Abstract: Leveraging Large Language Models (LLMs) for code generation has increasingly emerged as a common practice in the domain of software engineering. Relevant benchmarks have been established to evaluate the code generation capabilities of LLMs. However, existing benchmarks focus primarily on sequential code, lacking the ability to effectively evaluate LLMs on concurrent code generation. Compared to sequential code, concurrent code exhibits greater complexity and possesses unique types of bugs, such as deadlocks and race conditions, that do not occur in sequential code. Therefore, a benchmark for evaluating sequential code generation cannot be useful for evaluating concurrent code generation with LLMs. To address this gap, we designed CONCUR, a benchmark specifically aimed at evaluating the capability of LLMs to generate concurrent code. CONCUR consists of a base set of 43 concurrency problems derived from a standard concurrency textbook, together with 72 validated mutant variants, resulting in 115 total problems. The base problems serve as the semantic core of the benchmark, while the mutants expand linguistic and structural diversity. We conducted an evaluation of a range of LLMs on CONCUR, highlighting limitations of current models. Overall, our work provides a novel direction for evaluating the capability of LLMs to generate code with a focus on concurrency.
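As a reminder of the bug class CONCUR targets (this is a generic illustration, not a benchmark problem from the paper), here is the lock-guarded fix for the classic lost-update race condition:

```python
import threading

def increment_many(counter, lock, n):
    # Guarding the read-modify-write makes the increment atomic and
    # avoids the lost-update race that unsynchronized code can exhibit.
    for _ in range(n):
        with lock:
            counter[0] += 1

counter, lock = [0], threading.Lock()
threads = [threading.Thread(target=increment_many, args=(counter, lock, 10_000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the lock, the final count is deterministically 40,000; dropping it makes correctness depend on thread scheduling, the kind of defect sequential-code benchmarks never exercise.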


【12】NuMuon: Nuclear-Norm-Constrained Muon for Compressible LLM Training
Link: https://arxiv.org/abs/2603.03597

Authors: Hadi Mohaghegh Dolatabadi, Thalaiyasingam Ajanthan, Sameera Ramasinghe, Chamin P Hewa Koneputugodage, Shamane Siriwardhana, Violetta Shevchenko, Karol Pajak, James Snewin, Gil Avraham, Alexander Long
Comments: 47 pages, 22 figures, 18 tables
Abstract: The rapid progress of large language models (LLMs) is increasingly constrained by memory and deployment costs, motivating compression methods for practical deployment. Many state-of-the-art compression pipelines leverage the low-rank structure of trained weight matrices, a phenomenon often associated with the properties of popular optimizers such as Adam. In this context, Muon is a recently proposed optimizer that improves LLM pretraining via full-rank update steps, but its induced weight-space structure has not yet been characterized. In this work, we report a surprising empirical finding: despite imposing full-rank updates, Muon-trained models exhibit pronounced low-rank structure in their weight matrices and are readily compressible under standard pipelines. Motivated by this insight, we propose NuMuon, which augments Muon with a nuclear-norm constraint on the update direction, further constraining the learned weights toward low-rank structure. Across billion-parameter-scale models, we show that NuMuon increases weight compressibility and improves post-compression model quality under state-of-the-art LLM compression pipelines while retaining Muon's favorable convergence behavior.
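One standard way to bias an update matrix toward low rank is the proximal operator of the nuclear norm, i.e. soft-thresholding singular values. This sketches the *kind* of constraint NuMuon adds to the update direction; the paper's exact formulation may differ:

```python
import numpy as np

def nuclear_norm_prox(update, lam):
    """Proximal step for lam * ||U||_* : soft-threshold the singular
    values of the update matrix, zeroing the small ones (rank reduction)."""
    U, s, Vt = np.linalg.svd(update, full_matrices=False)
    s = np.clip(s - lam, 0.0, None)
    return U @ np.diag(s) @ Vt

# Toy update with singular values 3.0, 1.0, 0.2: thresholding at 0.5
# shrinks the first two and removes the third entirely.
G = np.diag([3.0, 1.0, 0.2])
shrunk = nuclear_norm_prox(G, 0.5)
```

The smallest singular value falls below the threshold and is dropped, so the regularized update has rank 2 instead of 3.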


【13】MEM: Multi-Scale Embodied Memory for Vision Language Action Models
Link: https://arxiv.org/abs/2603.03596

Authors: Marcel Torne, Karl Pertsch, Homer Walke, Kyle Vedder, Suraj Nair, Brian Ichter, Allen Z. Ren, Haohuan Wang, Jiaming Tang, Kyle Stachowicz, Karan Dhabalia, Michael Equi, Quan Vuong, Jost Tobias Springenberg, Sergey Levine, Chelsea Finn, Danny Driess
Comments: Website: https://pi.website/research/memory
Abstract: Conventionally, memory in end-to-end robotic learning involves inputting a sequence of past observations into the learned policy. However, in complex multi-stage real-world tasks, the robot's memory must represent past events at multiple levels of granularity: from long-term memory that captures abstracted semantic concepts (e.g., a robot cooking dinner should remember which stages of the recipe are already done) to short-term memory that captures recent events and compensates for occlusions (e.g., a robot remembering the object it wants to pick up once its arm occludes it). In this work, our main insight is that an effective memory architecture for long-horizon robotic control should combine multiple modalities to capture these different levels of abstraction. We introduce Multi-Scale Embodied Memory (MEM), an approach for mixed-modal long-horizon memory in robot policies. MEM combines video-based short-horizon memory, compressed via a video encoder, with text-based long-horizon memory. Together, they enable robot policies to perform tasks that span up to fifteen minutes, like cleaning up a kitchen, or preparing a grilled cheese sandwich. Additionally, we find that memory enables MEM policies to intelligently adapt manipulation strategies in-context.


【14】Logit-Level Uncertainty Quantification in Vision-Language Models for Histopathology Image Analysis
Link: https://arxiv.org/abs/2603.03527

Authors: Betul Yurdem, Ferhat Ozgur Catak, Murat Kuzlu, Mehmet Kemal Gullu
Comments: 10 pages, 6 figures, 4 tables
Abstract: Vision-Language Models (VLMs) with their multimodal capabilities have demonstrated remarkable success in almost all domains, including education, transportation, healthcare, energy, finance, law, and retail. Nevertheless, the utilization of VLMs in healthcare applications raises crucial concerns due to the sensitivity of large-scale medical data and the trustworthiness of these models (reliability, transparency, and security). This study proposes a logit-level uncertainty quantification (UQ) framework for histopathology image analysis using VLMs to deal with these concerns. UQ is evaluated for three VLMs using metrics derived from temperature-controlled output logits. The proposed framework demonstrates a critical separation in uncertainty behavior. VLMs show high stochastic sensitivity (cosine similarity (CS) $<0.71$ and $<0.84$, Jensen-Shannon divergence (JS) $<0.57$ and $<0.38$, and Kullback-Leibler divergence (KL) $<0.55$ and $<0.35$, respectively for mean values of VILA-M3-8B and LLaVA-Med v1.5), near-maximal temperature impacts ($\Delta_T \approx 1.00$), and abrupt uncertainty transitions, particularly for complex diagnostic prompts. In contrast, the pathology-specific PRISM model maintains near-deterministic behavior (mean CS $>0.90$, JS $<0.10$, KL $<0.09$) and significantly minimal temperature effects across all prompt complexities. These findings emphasize the importance of logit-level uncertainty quantification to evaluate trustworthiness in histopathology applications utilizing VLMs.
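The temperature-controlled divergence metrics named above can be computed directly from output logits; the logits below are toy values, and the natural-log JS divergence (bounded by $\ln 2$) is one common convention:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over a logit vector."""
    z = np.asarray(z, float) / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (natural log) between two distributions."""
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

logits = np.array([3.0, 1.0, 0.2])          # toy output logits
p_cold, p_hot = softmax(logits, T=0.5), softmax(logits, T=2.0)
sensitivity = js_divergence(p_cold, p_hot)  # temperature-sensitivity proxy
```

A model whose output distribution barely moves across temperatures (small JS/KL, CS near 1) behaves near-deterministically, the regime the abstract reports for PRISM.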


【15】Prompt-Dependent Ranking of Large Language Models with Uncertainty Quantification
Link: https://arxiv.org/abs/2603.03336

Authors: Angel Rodrigo Avelar Menendez, Yufeng Liu, Xiaowu Dai
Abstract: Rankings derived from pairwise comparisons are central to many economic and computational systems. In the context of large language models (LLMs), rankings are typically constructed from human preference data and presented as leaderboards that guide deployment decisions. However, existing approaches rely on point estimates, implicitly treating rankings as fixed objects despite substantial estimation noise and context-dependent performance variation. Acting on such rankings can lead to misallocation and welfare loss when apparent differences are not statistically meaningful. We study prompt-dependent ranking inference under pairwise human preferences and develop a framework for decision-safe rankings with statistically valid uncertainty guarantees. We model preferences using a contextual Bradley-Terry-Luce model in which the latent utility of each model depends on the input prompt. Rather than targeting point estimates of utilities, we directly conduct inference on induced rankings, constructing confidence sets based on simultaneous confidence intervals for pairwise utility differences. This approach yields statistically valid marginal and simultaneous confidence sets for prompt-specific ranks. Our framework connects recent advances in rank inference to contextual preference learning and provides tools for robust ranking-based decision-making. Empirically, using large-scale human preference data from LLM evaluations, we show that rankings vary substantially across prompt characteristics and that many apparent rank differences are not statistically distinguishable. We further demonstrate how uncertainty-aware rankings identify dominance only when supported by the data and otherwise return partial orders.
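The contextual Bradley-Terry-Luce model can be sketched with a utility that is linear in prompt features; the parameterization and feature names here are illustrative assumptions, not the paper's:

```python
import math

def btl_win_prob(u_i, u_j):
    """Bradley-Terry-Luce: P(model i beats model j) given latent utilities."""
    return 1.0 / (1.0 + math.exp(-(u_i - u_j)))

def contextual_utility(theta, prompt_features):
    """Contextual BTL sketch: each model's latent utility depends on the
    prompt through a linear function of its features."""
    return sum(t, * () ) if False else sum(t * x for t, x in zip(theta, prompt_features))

x = [1.0, 0.0, 1.0]  # e.g., [bias, is_code_prompt, is_math_prompt] (invented)
p = btl_win_prob(contextual_utility([0.5, -0.2, 0.8], x),
                 contextual_utility([0.1, 0.4, 0.3], x))
```

On this math-flavored prompt the first model's utility is 1.3 vs. 0.4, so it wins with probability about 0.71; on a code-flavored prompt the same pair can rank the other way, which is exactly the prompt-dependence the paper targets.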


【16】Fragile Thoughts: How Large Language Models Handle Chain-of-Thought Perturbations
Link: https://arxiv.org/abs/2603.03332

Authors: Ashwath Vaithinathan Aravindan, Mayank Kejriwal
Abstract: Chain-of-Thought (CoT) prompting has emerged as a foundational technique for eliciting reasoning from Large Language Models (LLMs), yet the robustness of this approach to corruptions in intermediate reasoning steps remains poorly understood. This paper presents a comprehensive empirical evaluation of LLM robustness to a structured taxonomy of 5 CoT perturbation types: \textit{MathError, UnitConversion, Sycophancy, SkippedSteps,} and \textit{ExtraSteps}. We evaluate 13 models spanning three orders of magnitude in parameter count (3B to 1.5T\footnote{Assumed parameter count of closed models}), testing their ability to complete mathematical reasoning tasks despite perturbations injected at different points in the reasoning chain. Our key findings reveal heterogeneous vulnerability patterns: MathError perturbations produce the most severe degradation in small models (50-60\% accuracy loss) but show strong scaling benefits; UnitConversion remains challenging across all scales (20-30\% loss even for largest models); ExtraSteps incur minimal accuracy degradation (0-6\%) regardless of scale; Sycophancy produces modest effects (7\% loss for small models); and SkippedSteps cause intermediate damage (15\% loss). Scaling relationships follow power-law patterns, with model size serving as a protective factor against some perturbations but offering limited defense against dimensional reasoning tasks. These findings have direct implications for deploying LLMs in multi-stage reasoning pipelines and underscore the necessity of task-specific robustness assessments and mitigation strategies. The code and results are available \href{https://github.com/Mystic-Slice/CoTPerturbation}{here}.
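A MathError-style perturbation from the taxonomy can be injected mechanically into a reasoning chain; the chain below is a toy example, not drawn from the benchmark:

```python
def inject_math_error(chain, step_idx, right_value, wrong_value):
    """Inject a MathError-style perturbation: corrupt one numeric result
    in an otherwise valid chain of thought, leaving later steps intact."""
    perturbed = list(chain)  # copy so the clean chain is preserved
    perturbed[step_idx] = perturbed[step_idx].replace(right_value, wrong_value)
    return perturbed

chain = ["Step 1: 12 * 4 = 48", "Step 2: 48 + 7 = 55", "Answer: 55"]
noisy = inject_math_error(chain, 0, "48", "46")
```

The model is then asked to complete the task from the corrupted prefix; robustness is measured by whether it recovers the correct answer despite the planted error.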


【17】Towards Self-Robust LLMs: Intrinsic Prompt Noise Resistance via CoIPO
标题:迈向自鲁棒的LLM:通过CoIPO实现内在的提示噪声抵抗
链接:https://arxiv.org/abs/2603.03314

作者:Xin Yang,Letian Li,Abudukelimu Wuerkaixi,Xuxin Cheng,Cao Liu,Ke Zeng,Xunliang Cai,Wenyuan Jiang
摘要:大型语言模型(LLM)在广泛的任务中展现出显著且稳步提升的性能。然而,LLM的性能可能对提示变化高度敏感,尤其是在开放性有限或输出格式要求严格的场景中,这表明其鲁棒性不足。在实际应用中,提供给LLM的用户提示常常包含缺陷,这可能损害模型响应的质量。为了解决这个问题,以往的工作主要集中在预处理提示上,借助外部工具甚至LLM来提前完善提示表述。然而,这些方法忽略了LLM的内在鲁棒性,并且对外部组件的依赖引入了额外的计算开销和不确定性。在这项工作中,我们提出了一种基于对比学习的逆向直接偏好优化(CoIPO)方法,最小化模型在干净提示与含噪提示下产生的标签对齐logits之间的差异,并利用互信息理论进行了详细分析。我们通过构建成对提示来增强FLAN数据集,每对由一个干净提示及其对应的含噪版本组成,用于训练。此外,为了评估其有效性,我们在现有的PromptBench基础上扩展并构建了NoisyPromptBench基准。在NoisyPromptBench上的实验结果表明,我们提出的方法在平均准确率上显著优于当前最先进的方法。CoIPO的源代码、成对FLAN数据集和NoisyPromptBench已在https://github.com/vegetable-yx/CoIPO发布。
摘要:Large language models (LLMs) have demonstrated remarkable and steadily improving performance across a wide range of tasks. However, LLM performance may be highly sensitive to prompt variations especially in scenarios with limited openness or strict output formatting requirements, indicating insufficient robustness. In real-world applications, user prompts provided to LLMs often contain imperfections, which may undermine the quality of the model's responses. To address this issue, previous work has primarily focused on preprocessing prompts, employing external tools or even LLMs to refine prompt formulations in advance. However, these approaches overlook the intrinsic robustness of LLMs, and their reliance on external components introduces additional computational overhead and uncertainty. In this work, we propose a Contrastive Learning-based Inverse Direct Preference Optimization (CoIPO) method that minimizes the discrepancy between the label-aligned logits produced by the model under a clean prompt and its noisy counterpart, and conduct a detailed analysis using mutual information theory. We augment the FLAN dataset by constructing paired prompts, each consisting of a clean prompt and its corresponding noisy version for training. Additionally, to evaluate the effectiveness, we develop NoisyPromptBench, a benchmark enhanced and derived from the existing PromptBench. Experimental results conducted on NoisyPromptBench demonstrate that our proposed method achieves a significant improvement in average accuracy over the current state-of-the-art approaches. The source code of CoIPO, pair-wise FLAN datasets, and NoisyPromptBench have already been released on https://github.com/vegetable-yx/CoIPO.
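CoIPO"最小化干净提示与含噪提示下标签对齐logits之间差异"的核心思想,可以用如下 numpy 草图说明(这是对论文目标的简化示意性解读,完整损失以原文为准):

```python
import numpy as np

def coipo_discrepancy(clean_logits, noisy_logits, labels):
    """干净/含噪提示下,金标签对应 logits 的均方差异。
    clean_logits, noisy_logits: (batch, num_classes);labels: (batch,)。
    对 CoIPO 对比目标的简化示意,并非论文的完整损失。"""
    idx = np.arange(len(labels))
    gap = clean_logits[idx, labels] - noisy_logits[idx, labels]
    return float(np.mean(gap ** 2))

clean = np.array([[2.0, 0.5], [0.1, 1.5]])
noisy = np.array([[1.0, 0.5], [0.1, 0.5]])
labels = np.array([0, 1])
loss = coipo_discrepancy(clean, noisy, labels)
```

当两组 logits 完全一致时该项为零;训练时将其与常规监督损失相加,即可鼓励模型对提示噪声不敏感。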


【18】Entropic-Time Inference: Self-Organizing Large Language Model Decoding Beyond Attention
标题:熵时间推理:超越注意力的自组织大型语言模型解码
链接:https://arxiv.org/abs/2603.03310

作者:Andrew Kiruluta
摘要:现代大型语言模型(LLM)推理引擎在固定解码规则下优化吞吐量和延迟,将生成视为令牌时间上的线性推进。我们提出一种根本不同的范式:熵时间推理,其中解码由不确定性的流动而非令牌索引所支配。我们介绍了一种自组织推理架构,在统一的熵控制目标下联合耦合调度、注意力稀疏化和采样温度。我们的方法扩展了vLLM,加入熵感知调度、对分页注意力块的熵修剪,以及将生成稳定在目标熵区间附近的自适应温度控制。这将推理转变为一个资源智能的热力学过程,把计算分配到不确定性降低最大的地方。我们给出了具体的系统设计、伪代码和集成方案,展示熵如何作为可扩展LLM推理的第一类控制信号。
摘要:Modern large language model (LLM) inference engines optimize throughput and latency under fixed decoding rules, treating generation as a linear progression in token time. We propose a fundamentally different paradigm: entropic\-time inference, where decoding is governed by the flow of uncertainty rather than token index. We introduce a self\-organizing inference architecture that jointly couples scheduling, attention sparsification, and sampling temperature under a unified entropy control objective. Our method extends vLLM with entropy-aware scheduling, entropic pruning of paged attention blocks, and adaptive temperature control that stabilizes generation near a target entropy regime. This transforms inference into a resource\-intelligent thermodynamic process that allocates computation where uncertainty reduction is maximized. We present a concrete systems design, pseudocode, and integration plan, demonstrating how entropy can serve as a first\-class control signal for scalable LLM inference.
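摘要中的自适应温度控制,核心是把下一词分布的熵稳定在目标值附近。softmax(logits/T) 的熵随温度 T 单调不减,因此可以用二分法反解出目标温度(示意性草图,并非论文的 vLLM 集成代码):

```python
import numpy as np

def entropy(p):
    """离散分布的香农熵(自然对数)。"""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def softmax_at(logits, t):
    """温度为 t 的 softmax,减最大值保证数值稳定。"""
    z = logits / t
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

def temperature_for_entropy(logits, target_h, t_lo=0.05, t_hi=5.0, iters=50):
    """二分搜索温度,使采样分布的熵逼近 target_h。"""
    for _ in range(iters):
        t = 0.5 * (t_lo + t_hi)
        if entropy(softmax_at(logits, t)) < target_h:
            t_lo = t  # 熵太低,升温
        else:
            t_hi = t  # 熵太高,降温
    return 0.5 * (t_lo + t_hi)

logits = np.array([2.0, 1.0, 0.5, -1.0])
t_star = temperature_for_entropy(logits, target_h=1.0)
```

在每个解码步重复此求解,就得到一个把生成"钉"在目标熵区间的简单温度控制器。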


【19】Draft-Conditioned Constrained Decoding for Structured Generation in LLMs
标题:LLM中结构化生成的草稿条件约束解码
链接:https://arxiv.org/abs/2603.03305

作者:Avinash Reddy,Thayne T. Walker,James S. Ide,Amrit Singh Bedi
摘要:大型语言模型(LLM)越来越多地被用于生成可执行输出、JSON对象和API调用,其中单个语法错误就可能导致输出不可用。约束解码通过掩码和重归一化逐词强制有效性,但当模型给有效延续分配的概率质量很低时,它可能扭曲生成,把解码推向局部有效但语义不正确的轨迹。我们提出草稿条件约束解码(DCCD),一种简单的两步、免训练的推理过程,将语义规划与结构强制解耦:首先生成一个无约束的草稿,然后以该草稿为条件应用约束解码,以保证有效性。我们通过KL投影的视角分析DCCD,表明草稿条件化增加了可行概率质量,并减少了硬约束引起的累计"投影税",还可选地进行最佳$K$草稿选择。在结构化推理基准测试中,DCCD将严格结构化准确率相比标准约束解码最多提高+24个百分点(例如,1B模型在GSM8K上从15.2\%提高到39.0\%),并使较小的模型能够匹配或超过更大的约束基线,从而大幅提高参数效率。
摘要:Large language models (LLMs) are increasingly used to generate executable outputs, JSON objects, and API calls, where a single syntax error can make the output unusable. Constrained decoding enforces validity token-by-token via masking and renormalization, but it can distort generation when the model assigns low probability mass to valid continuations, pushing decoding toward locally valid yet semantically incorrect trajectories. We propose \emph{Draft-Conditioned Constrained Decoding (DCCD)}, a simple two-step, training-free inference procedure that decouples semantic planning from structural enforcement: an unconstrained draft is generated first, and constrained decoding is then applied, conditioned on this draft, to guarantee validity. We analyze DCCD through a KL-projection view, showing that draft conditioning increases feasible mass and reduces the cumulative "projection tax" induced by hard constraints, with an optional best-of-$K$ draft selection. Across structured reasoning benchmarks, DCCD improves strict structured accuracy by up to +24 percentage points over standard constrained decoding (e.g., 15.2\% to 39.0\% on GSM8K with a 1B model), and enables smaller model pairs to match or exceed much larger constrained baselines, yielding substantial gains in parameter efficiency.
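DCCD 的两步流程可以在一个玩具式贪心解码器上演示:第一步不加约束生成草稿,第二步把草稿拼接进上下文,再按逐步有效性掩码进行约束解码。下面是一个假设性的玩具示例(模型与掩码均为虚构,仅说明流程):

```python
import numpy as np

# 玩具语言模型:logits 只取决于上一个 token(词表大小为 3)
TABLE = {None: [2.0, 1.0, 0.0], 0: [0.0, 2.0, 1.0],
         1: [1.0, 0.0, 2.0], 2: [2.0, 0.0, 1.0]}

def toy_model(context):
    last = context[-1] if context else None
    return np.array(TABLE[last])

def decode(model, prompt, masks):
    """贪心解码 len(masks) 个 token;masks[i] 标记第 i 步的有效 token。"""
    out = list(prompt)
    for mask in masks:
        logits = np.where(mask, model(out), -np.inf)  # 屏蔽无效 token
        out.append(int(np.argmax(logits)))
    return out[len(prompt):]

def dccd(model, prompt, masks):
    """草稿条件约束解码:先无约束生成草稿,再以草稿为条件做约束解码。"""
    free = [np.ones_like(m) for m in masks]   # 全部有效 = 无约束
    draft = decode(model, prompt, free)
    return decode(model, prompt + draft, masks)

masks = [np.array([False, True, True]), np.array([True, True, True])]
result = dccd(toy_model, [], masks)
```

第二遍解码的每一步都满足掩码约束,而草稿为其提供了语义上下文,这正是摘要所述"语义规划与结构强制解耦"的最小演示。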


【20】From Exact Hits to Close Enough: Semantic Caching for LLM Embeddings
标题:从精确命中到足够接近:面向LLM嵌入的语义缓存
链接:https://arxiv.org/abs/2603.03301

作者:Dvir David Biton,Roy Friedman
摘要:大型语言模型(LLM)的快速普及带来了对更快响应和更低成本的需求。语义缓存通过嵌入来重用语义相似的请求,满足了这一需求,但也打破了经典的缓存假设并提出了新的挑战。本文探讨了语义缓存的离线策略,证明实现最优离线策略是NP难的,并提出了若干多项式时间启发式算法。我们还提出了结合新近度、频率和局部性的在线语义感知缓存策略。对多样数据集的评估表明,虽然基于频率的策略是强基线,但我们的新变体提高了语义准确性。我们的研究结果揭示了适用于当前系统的有效策略,并表明未来创新仍有巨大空间。所有代码均已开源。
摘要:The rapid adoption of large language models (LLMs) has created demand for faster responses and lower costs. Semantic caching, reusing semantically similar requests via their embeddings, addresses this need but breaks classic cache assumptions and raises new challenges. In this paper, we explore offline policies for semantic caching, proving that implementing an optimal offline policy is NP-hard, and propose several polynomial-time heuristics. We also present online semantic aware cache policies that combine recency, frequency, and locality. Evaluations on diverse datasets show that while frequency based policies are strong baselines, our novel variant improves semantic accuracy. Our findings reveal effective strategies for current systems and highlight substantial headroom for future innovation. All code is open source.
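语义缓存的基本机制是:以请求嵌入为键,把命中条件从"完全相等"放宽为"余弦相似度超过阈值"。下面是一个带 LRU 淘汰的最小草图(论文研究的新近度/频率混合策略更复杂,此处仅为示意):

```python
import numpy as np

class SemanticCache:
    """固定容量的语义缓存:余弦相似度 >= threshold 即视为命中。
    此处用最简单的 LRU 淘汰,仅为示意。"""
    def __init__(self, capacity, threshold=0.9):
        self.capacity = capacity
        self.threshold = threshold
        self.entries = []  # (单位化嵌入, 响应),队首为最久未用

    def get(self, emb):
        emb = emb / np.linalg.norm(emb)
        for i, (e, resp) in enumerate(self.entries):
            if float(e @ emb) >= self.threshold:
                self.entries.append(self.entries.pop(i))  # 刷新新近度
                return resp
        return None

    def put(self, emb, resp):
        emb = emb / np.linalg.norm(emb)
        if len(self.entries) >= self.capacity:
            self.entries.pop(0)  # 淘汰最久未用条目
        self.entries.append((emb, resp))

cache = SemanticCache(capacity=2)
cache.put(np.array([1.0, 0.0]), "回答A")
hit = cache.get(np.array([1.0, 0.1]))   # 语义相近的请求,命中
miss = cache.get(np.array([0.0, 1.0]))  # 正交请求,未命中
```

论文指出的难点正源于此:一个条目可以服务整个相似度邻域内的请求,命中关系不再是一一对应,经典的最优替换分析因此失效。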


【21】AriadneMem: Threading the Maze of Lifelong Memory for LLM Agents
标题:AriadneMem:为LLM智能体穿越终身记忆的迷宫
链接:https://arxiv.org/abs/2603.03290

作者:Wenhui Zhu,Xiwen Chen,Zhipeng Wang,Jingjing Wang,Xuanzhao Dong,Minzhou Huang,Rui Cai,Hejian Sang,Hao Wang,Peijie Qiu,Yueyue Deng,Prayag Tiwari,Brendan Hogan Rappazzo,Yalin Wang
摘要:长程LLM智能体需要在固定上下文预算下仍保持准确的记忆系统。然而,现有系统在长期对话中面临两个持续的挑战:(i)\textbf{证据断连},多跳答案需要链接分布在不同时间的事实;(ii)\textbf{状态更新},不断演变的信息(例如日程变更)与较旧的静态日志产生冲突。我们提出AriadneMem,一个通过解耦的两阶段管道解决这些失效模式的结构化记忆系统。在\textbf{离线构建阶段},AriadneMem采用\emph{熵感知门控}在LLM抽取之前过滤噪声和低信息量消息,并应用\emph{冲突感知粗化}合并静态重复项,同时将状态转移保留为时间边。在\textbf{在线推理阶段},AriadneMem不依赖昂贵的迭代规划,而是执行\emph{算法化桥接发现}来重建检索到的事实之间缺失的逻辑路径,随后进行\emph{单次调用的拓扑感知合成}。在使用GPT-4o的LoCoMo实验中,AriadneMem相比强基线将\textbf{多跳F1提高15.2\%}、\textbf{平均F1提高9.0\%}。至关重要的是,通过把推理卸载到图层,AriadneMem仅用\textbf{497}个上下文令牌就将\textbf{总运行时间减少77.8\%}。代码可在https://github.com/LLM-VLM-GSL/AriadneMem获得。
摘要:Long-horizon LLM agents require memory systems that remain accurate under fixed context budgets. However, existing systems struggle with two persistent challenges in long-term dialogue: (i) \textbf{disconnected evidence}, where multi-hop answers require linking facts distributed across time, and (ii) \textbf{state updates}, where evolving information (e.g., schedule changes) creates conflicts with older static logs. We propose AriadneMem, a structured memory system that addresses these failure modes via a decoupled two-phase pipeline. In the \textbf{offline construction phase}, AriadneMem employs \emph{entropy-aware gating} to filter noise and low-information message before LLM extraction and applies \emph{conflict-aware coarsening} to merge static duplicates while preserving state transitions as temporal edges. In the \textbf{online reasoning phase}, rather than relying on expensive iterative planning, AriadneMem executes \emph{algorithmic bridge discovery} to reconstruct missing logical paths between retrieved facts, followed by \emph{single-call topology-aware synthesis}. On LoCoMo experiments with GPT-4o, AriadneMem improves \textbf{Multi-Hop F1 by 15.2\%} and \textbf{Average F1 by 9.0\%} over strong baselines. Crucially, by offloading reasoning to the graph layer, AriadneMem reduces \textbf{total runtime by 77.8\%} using only \textbf{497} context tokens. The code is available at https://github.com/LLM-VLM-GSL/AriadneMem.


Graph相关(图学习|图神经网络|图优化等)(9篇)

【1】Beyond Edge Deletion: A Comprehensive Approach to Counterfactual Explanation in Graph Neural Networks
标题:超越边缘删除:图神经网络反事实解释的综合方法
链接:https://arxiv.org/abs/2603.04209

作者:Matteo De Sanctis,Riccardo De Sanctis,Stefano Faralli,Paola Velardi,Bardh Prenkaj
摘要:图神经网络(GNN)在分子生物学和社交网络分析等领域被越来越多地采用,但其黑箱性质阻碍了可解释性和信任。这在高风险应用中尤其成问题,例如预测分子毒性、药物发现或指导金融欺诈检测,在这些应用中透明的解释至关重要。反事实解释(即翻转模型预测的最小变化)为GNN的行为提供了一个透明的视角。在这项工作中,我们介绍了XPlore,一种显著拓宽反事实搜索空间的新技术。它对邻接矩阵和节点特征矩阵施加梯度引导的扰动。与大多数仅关注边删除的现有方法不同,我们的方法属于一类不断增长的、同时优化边插入与节点特征扰动的技术,并在统一的基于梯度的框架下联合执行,从而实现对反事实更丰富、更细致的探索。为了同时量化结构和语义保真度,我们为学习到的图嵌入引入了一个余弦相似度度量,解决了传统基于距离的度量的一个关键局限,并证明XPlore产生更连贯且更小的反事实。在13个真实世界和5个合成基准上的实证结果显示,与最先进的基线相比,有效性最多提高+56.3%,保真度最多提高+52.8%,同时保持有竞争力的运行时间。
摘要:Graph Neural Networks (GNNs) are increasingly adopted across domains such as molecular biology and social network analysis, yet their black-box nature hinders interpretability and trust. This is especially problematic in high-stakes applications, such as predicting molecule toxicity, drug discovery, or guiding financial fraud detections, where transparent explanations are essential. Counterfactual explanations - minimal changes that flip a model's prediction - offer a transparent lens into GNNs' behavior. In this work, we introduce XPlore, a novel technique that significantly broadens the counterfactual search space. It consists of gradient-guided perturbations to adjacency and node feature matrices. Unlike most prior methods, which focus solely on edge deletions, our approach belongs to the growing class of techniques that optimize edge insertions and node-feature perturbations, here jointly performed under a unified gradient-based framework, enabling a richer and more nuanced exploration of counterfactuals. To quantify both structural and semantic fidelity, we introduce a cosine similarity metric for learned graph embeddings that addresses a key limitation of traditional distance-based metrics, and demonstrate that XPlore produces more coherent and minimal counterfactuals. Empirical results on 13 real-world and 5 synthetic benchmarks show up to +56.3% improvement in validity and +52.8% in fidelity over state-of-the-art baselines, while retaining competitive runtime.
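"对邻接矩阵做梯度引导扰动"的思想可以在一个线性"GNN"替身上演示:对分数关于邻接矩阵的解析梯度做投影梯度步,把预测推向目标值。这是对 XPlore 思想的极简示意(符号与模型均为假设;XPlore 实际在统一框架下联合优化边插入与节点特征扰动):

```python
import numpy as np

def counterfactual_step(A, X, w, target, lr=0.1):
    """对玩具线性"GNN"分数 s = mean(A @ X @ w) 做一步梯度引导扰动:
    沿 (s - target)^2 关于 A 的负梯度更新,再裁剪回 [0, 1] 并去掉自环。"""
    n = A.shape[0]
    s = float((A @ X @ w).mean())
    grad_s = np.tile((X @ w) / n, (n, 1))   # ds/dA[i, j] = (Xw)[j] / n
    A_new = np.clip(A - lr * 2.0 * (s - target) * grad_s, 0.0, 1.0)
    np.fill_diagonal(A_new, 0.0)
    return A_new

X = np.eye(3)
w = np.ones(3)
A = 0.5 * (np.ones((3, 3)) - np.eye(3))  # 初始分数 s = 1.0
for _ in range(40):
    A = counterfactual_step(A, X, w, target=0.5)
score = float((A @ X @ w).mean())
```

迭代将分数从 1.0 逐步推向目标 0.5,得到一个连续松弛的反事实邻接矩阵;实际方法还需把它离散化为具体的边插入/删除。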


【2】How Predicted Links Influence Network Evolution: Disentangling Choice and Algorithmic Feedback in Dynamic Graphs
标题:预测链接如何影响网络演化:理清动态图中的选择与算法反馈
链接:https://arxiv.org/abs/2603.03945

作者:Mathilde Perez,Raphaël Romero,Jefrey Lijffijt,Charlotte Laclau
摘要:链接预测模型越来越多地被用于在不断演化的网络中推荐交互,但它们对网络结构的影响通常只从静态快照来评估。特别是,观察到的同质性将内在的交互倾向与网络动态和算法反馈引起的放大效应混为一谈。我们提出了一个基于多变量Hawkes过程的时间框架来解开这两种来源,并引入了一个由交互强度导出的瞬时偏差度量,捕获超越累积指标的当前强化动态。我们对诱导动力学的稳定性和收敛性给出理论刻画,实验表明所提出的度量可靠地反映了不同链接预测策略下的算法反馈效应。
摘要:Link prediction models are increasingly used to recommend interactions in evolving networks, yet their impact on network structure is typically assessed from static snapshots. In particular, observed homophily conflates intrinsic interaction tendencies with amplification effects induced by network dynamics and algorithmic feedback. We propose a temporal framework based on multivariate Hawkes processes that disentangles these two sources and introduce an instantaneous bias measure derived from interaction intensities, capturing current reinforcement dynamics beyond cumulative metrics. We provide a theoretical characterization of the stability and convergence of the induced dynamics, and experiments show that the proposed measure reliably reflects algorithmic feedback effects across different link prediction strategies.
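多变量 Hawkes 过程用交互强度刻画自激/互激动态,摘要中的瞬时偏差度量即由此类强度导出。下面给出指数核强度函数,以及一个"组内强度占比"式的玩具偏差度量(对论文度量的示意性解读,具体定义以原文为准):

```python
import numpy as np

def hawkes_intensity(t, events, mu, alpha, beta):
    """指数核多变量 Hawkes 强度:
    lambda_i(t) = mu_i + sum_{t_j < t} alpha[i, d_j] * beta * exp(-beta (t - t_j))。
    events: (时间, 维度) 列表;mu: (n,) 基线;alpha: (n, n) 激励矩阵。"""
    lam = mu.astype(float).copy()
    for tj, dj in events:
        if tj < t:
            lam += alpha[:, dj] * beta * np.exp(-beta * (t - tj))
    return lam

def intra_group_share(lam, groups, g):
    """玩具式瞬时偏差:组 g 的维度占总强度的份额(假设性度量)。"""
    lam = np.asarray(lam, dtype=float)
    return float(lam[np.asarray(groups) == g].sum() / lam.sum())

mu = np.array([0.2, 0.2])
alpha = np.array([[0.5, 0.0], [0.0, 0.5]])  # 仅自激
lam0 = hawkes_intensity(1.0, [], mu, alpha, beta=1.0)          # 无历史事件
lam1 = hawkes_intensity(1.0, [(0.0, 0)], mu, alpha, beta=1.0)  # 维度 0 有一次事件
```

与依赖累积计数的指标不同,这种由当前强度定义的量会随激励衰减而回落,因此能反映"此刻"的强化动态。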


【3】k-hop Fairness: Addressing Disparities in Graph Link Prediction Beyond First-Order Neighborhoods
标题:k-hop公平性:解决一阶邻域之外图链接预测中的差异
链接:https://arxiv.org/abs/2603.03867

作者:Lilian Marey,Tiphaine Viard,Charlotte Laclau
摘要:链接预测(LP)在基于图的应用中起着核心作用,特别是在社交推荐中。然而,现实世界的图往往反映出结构性偏差,最显著的是同质性,即属性相似的节点相互连接的倾向。虽然这种性质可以提高预测性能,但也有加剧现有社会差距的风险。作为回应,公平感知的LP方法应运而生,它们通常遵循二元公平(dyadic fairness)原则,通过促进组间连接(即敏感属性(例如性别)不同的节点之间的链接)来缓解这些影响。然而,二元公平忽视了敏感群体内部的潜在差异。为克服这一问题,我们提出$k$-跳公平性,一种LP公平性的结构化概念,它评估以图中节点间距离为条件的差异。我们通过预测公平性与结构偏差指标将这一概念形式化,并提出预处理和后处理的缓解策略。在标准LP基准上的实验揭示:(1)模型在不同$k$跳上有强烈重现结构偏差的倾向;(2)对图重新布线时,不同跳数的结构偏差之间存在相互依赖;(3)与现有公平LP基线相比,我们的后处理方法实现了更有利的$k$跳性能-公平性权衡。
摘要:Link prediction (LP) plays a central role in graph-based applications, particularly in social recommendation. However, real-world graphs often reflect structural biases, most notably homophily, the tendency of nodes with similar attributes to connect. While this property can improve predictive performance, it also risks reinforcing existing social disparities. In response, fairness-aware LP methods have emerged, often seeking to mitigate these effects by promoting inter-group connections, that is, links between nodes with differing sensitive attributes (e.g., gender), following the principle of dyadic fairness. However, dyadic fairness overlooks potential disparities within the sensitive groups themselves. To overcome this issue, we propose $k$-hop fairness, a structural notion of fairness for LP, that assesses disparities conditioned on the distance between nodes in the graph. We formalize this notion through predictive fairness and structural bias metrics, and propose pre- and post-processing mitigation strategies. Experiments across standard LP benchmarks reveal: (1) a strong tendency of models to reproduce structural biases at different $k$-hops; (2) interdependence between structural biases at different hops when rewiring graphs; and (3) that our post-processing method achieves favorable $k$-hop performance-fairness trade-offs compared to existing fair LP baselines.
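$k$-跳公平性的核心操作是按节点对的跳数距离对预测结果分层,再比较各层的统计量。下面的草图(对论文度量的一种示意性解读)计算"恰好相距 $k$ 跳的节点对的平均预测分数":

```python
import numpy as np
from collections import deque

def hop_distances(adj, src):
    """无权图 BFS 最短跳数;adj 为邻接表字典。"""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def k_hop_mean_score(adj, scored_pairs, k):
    """恰好相距 k 跳的节点对上的平均链接预测分数;
    跨 k 比较该值即可暴露随距离变化的差异(示意性定义)。"""
    vals = [s for u, v, s in scored_pairs
            if hop_distances(adj, u).get(v) == k]
    return float(np.mean(vals)) if vals else float("nan")

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}     # 路径图 0-1-2-3
pairs = [(0, 2, 0.8), (1, 3, 0.6), (0, 3, 0.2)]  # (u, v, 预测分数)
m2 = k_hop_mean_score(adj, pairs, 2)
m3 = k_hop_mean_score(adj, pairs, 3)
```

再按敏感属性对同一跳数层内的节点对分组比较,就得到二元公平所忽略的"组内、按距离"的差异视图。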


【4】Graph Negative Feedback Bias Correction Framework for Adaptive Heterophily Modeling
标题:用于自适应异类建模的图负反馈偏差纠正框架
链接:https://arxiv.org/abs/2603.03662

作者:Jiaqi Lv,Qingfeng Du,Yu Zhang,Yongqi Han,Sheng Li
摘要:图神经网络(GNN)已成为处理图结构数据的强大框架。然而,传统GNN及其变体本质上受到同质性假设的限制,导致在异配图上性能下降。尽管已有大量工作试图缓解这一问题,但它们仍受限于本质上根植于同质性的消息传递范式。本文详细分析了同质性假设背后的标签自相关如何给GNN引入偏差。我们创新性地利用负反馈机制来纠正该偏差,并提出图负反馈偏差校正(GNFBC),一个独立于任何特定聚合策略的简单而有效的框架。具体来说,我们引入一种负反馈损失,惩罚预测对标签自相关的敏感性。此外,我们将图无关模型的输出作为反馈项,在Dirichlet能量的引导下,利用独立的节点特征信息来抵消相关性引起的偏差。GNFBC可以无缝集成到现有GNN架构中,以相当的计算和内存开销提升整体性能。
摘要:Graph Neural Networks (GNNs) have emerged as a powerful framework for processing graph-structured data. However, conventional GNNs and their variants are inherently limited by the homophily assumption, leading to degradation in performance on heterophilic graphs. Although substantial efforts have been made to mitigate this issue, they remain constrained by the message-passing paradigm, which is inherently rooted in homophily. In this paper, a detailed analysis of how the underlying label autocorrelation of the homophily assumption introduces bias into GNNs is presented. We innovatively leverage a negative feedback mechanism to correct the bias and propose Graph Negative Feedback Bias Correction (GNFBC), a simple yet effective framework that is independent of any specific aggregation strategy. Specifically, we introduce a negative feedback loss that penalizes the sensitivity of predictions to label autocorrelation. Furthermore, we incorporate the output of graph-agnostic models as a feedback term, leveraging independent node feature information to counteract correlation-induced bias guided by Dirichlet energy. GNFBC can be seamlessly integrated into existing GNN architectures, improving overall performance with comparable computational and memory overhead.


【5】Real-time tightly coupled GNSS and IMU integration via Factor Graph Optimization
标题:通过因子图优化实现实时紧耦合的GNSS和IMU集成
链接:https://arxiv.org/abs/2603.03556

作者:Radu-Andrei Cioaca,Paul Irofti,Cristian Rusu,Gianluca Caparra,Andrei-Alexandru Marinache,Florin Stoican
摘要:由于频繁的GNSS信号阻塞、多路径和快速变化的卫星几何形状,在密集的城市环境中进行可靠的定位仍然具有挑战性。虽然基于因子图优化(FGO)的GNSS-IMU融合已表现出强大的鲁棒性和准确性,但大多数公式仍然离线。在这项工作中,我们提出了一种实时紧密耦合的GNSS-IMU FGO方法,通过固定滞后边缘化的增量优化实现因果状态估计,并使用UrbanNav数据集评估其在高度城市化的GNSS退化环境中的性能。
摘要:Reliable positioning in dense urban environments remains challenging due to frequent GNSS signal blockage, multipath, and rapidly varying satellite geometry. While factor graph optimization (FGO)-based GNSS-IMU fusion has demonstrated strong robustness and accuracy, most formulations remain offline. In this work, we present a real-time tightly coupled GNSS-IMU FGO method that enables causal state estimation via incremental optimization with fixed-lag marginalization, and we evaluate its performance in a highly urbanized GNSS-degraded environment using the UrbanNav dataset.


【6】Real-time loosely coupled GNSS and IMU integration via Factor Graph Optimization
标题:通过因子图优化实现实时松耦合的GNSS和IMU集成
链接:https://arxiv.org/abs/2603.03546

作者:Radu-Andrei Cioaca,Cristian Rusu,Paul Irofti,Gianluca Caparra,Andrei-Alexandru Marinache,Florin Stoican
摘要:精确的定位、导航和定时(PNT)是现代技术运行的基础,也是自主系统的关键推动因素。PNT的一个非常重要的组成部分是全球导航卫星系统(GNSS),它确保户外定位。现代研究方向通过将GNSS测量与其他传感信息(主要是来自惯性测量单元(IMU)的测量)融合,将GNSS定位的性能推向了新的高度。在本文中,我们提出了一个松散耦合的架构,集成GNSS和IMU测量使用因子图优化(FGO)框架。由于FGO方法在计算上具有挑战性,并且经常用作后处理方法,因此我们的重点是评估其定位精度和服务可用性,同时在具有挑战性的环境(城市峡谷)中实时操作。在UrbanNav-HK-MediumUrban-1数据集上的实验结果表明,与批处理FGO方法相比,该方法实现了实时操作,提高了服务可用性。虽然这种改进是以降低定位精度为代价的,但本文详细分析了基于FGO的实时GNSS/IMU融合的精度、可用性和计算效率之间的权衡。
摘要:Accurate positioning, navigation, and timing (PNT) is fundamental to the operation of modern technologies and a key enabler of autonomous systems. A very important component of PNT is the Global Navigation Satellite System (GNSS) which ensures outdoor positioning. Modern research directions have pushed the performance of GNSS localization to new heights by fusing GNSS measurements with other sensory information, mainly measurements from Inertial Measurement Units (IMU). In this paper, we propose a loosely coupled architecture to integrate GNSS and IMU measurements using a Factor Graph Optimization (FGO) framework. Because the FGO method can be computationally challenging and often used as a post-processing method, our focus is on assessing its localization accuracy and service availability while operating in real-time in challenging environments (urban canyons). Experimental results on the UrbanNav-HK-MediumUrban-1 dataset show that the proposed approach achieves real-time operation and increased service availability compared to batch FGO methods. While this improvement comes at the cost of reduced positioning accuracy, the paper provides a detailed analysis of the trade-offs between accuracy, availability, and computational efficiency that characterize real-time FGO-based GNSS/IMU fusion.


【7】Graph Hopfield Networks: Energy-Based Node Classification with Associative Memory
标题:图Hopfield网络:具有关联记忆的基于能量的节点分类
链接:https://arxiv.org/abs/2603.03464

作者:Abinav Rao,Alex Wa,Rishi Athavale
备注:10 Pages, 4 Figures, Accepted at ICLR NFAM Workshop 2026
摘要:我们介绍了图Hopfield网络,其能量函数将联想记忆检索与图拉普拉斯平滑耦合,用于节点分类。在此联合能量上的梯度下降产生一种将Hopfield检索与拉普拉斯传播交织的迭代更新。记忆检索带来依赖于数据状况的收益:在稀疏引文网络上最高提升2.0个百分点,在特征掩蔽下额外提供最高5个百分点的鲁棒性;迭代能量下降架构本身就是一种强归纳偏置,所有变体(包括禁用记忆的NoMem消融)在亚马逊共同购买图上都优于标准基线。通过调优可以在不更改架构的情况下为异配性基准实现图锐化。
摘要:We introduce Graph Hopfield Networks, whose energy function couples associative memory retrieval with graph Laplacian smoothing for node classification. Gradient descent on this joint energy yields an iterative update interleaving Hopfield retrieval with Laplacian propagation. Memory retrieval provides regime-dependent benefits: up to 2.0~pp on sparse citation networks and up to 5 pp additional robustness under feature masking; the iterative energy-descent architecture itself is a strong inductive bias, with all variants (including the memory-disabled NoMem ablation) outperforming standard baselines on Amazon co-purchase graphs. Tuning enables graph sharpening for heterophilous benchmarks without architectural changes.
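摘要所述"在联合能量上做梯度下降、交织 Hopfield 检索与拉普拉斯传播"的更新,可以写成如下单步草图(一种合理的实例化,论文能量函数的具体形式可能不同):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def graph_hopfield_step(X, M, L, beta=1.0, lam=0.5, eta=0.1):
    """在"Hopfield 检索 + 图拉普拉斯平滑"的联合能量上走一步梯度下降。
    X: (n, d) 节点状态;M: (m, d) 记忆模式;L: (n, n) 图拉普拉斯矩阵。"""
    retrieved = softmax(beta * X @ M.T) @ M             # 现代 Hopfield 检索
    return X + eta * ((retrieved - X) - lam * (L @ X))  # 检索牵引 + 邻域平滑

L_graph = np.array([[1.0, -1.0], [-1.0, 1.0]])  # 两个相连节点的拉普拉斯矩阵
M = np.array([[1.0, 0.0]])                      # 单个记忆模式
X = np.array([[2.0, 0.0], [0.0, 0.0]])
X_new = graph_hopfield_step(X, M, L_graph)
gap_before = np.linalg.norm(X[0] - X[1])
gap_after = np.linalg.norm(X_new[0] - X_new[1])
```

两项各司其职:检索项把节点状态拉向记忆模式,拉普拉斯项缩小相邻节点之间的差异;把 lam 取负即对应摘要所说的"图锐化"方向。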


【8】Towards Improved Sentence Representations using Token Graphs
标题:使用标记图改进句子表示
链接:https://arxiv.org/abs/2603.03389

作者:Krishna Sri Ipsit Mantri,Carola-Bibiane Schönlieb,Zorah Lähner,Moshe Eliasof
备注:ICLR 2026, 29 Pages, 17 Tables, 5 Figures
摘要:从大型语言模型(LLM)的令牌级输出中获得单向量表示,是几乎所有句子级任务的关键步骤。然而,均值或最大值聚合等标准池化方法将令牌视为一个独立的集合,丢弃了模型自注意力层所捕获的丰富关系结构,使其容易受到信号稀释的影响。为了解决这个问题,我们引入了GLOT,一个轻量级、结构感知的池化模块,它将池化重新定义为"先关系学习、后聚合"。GLOT在冻结LLM的输出上操作:首先构建潜在的令牌相似图,然后用图神经网络细化令牌表示,最后用读出层聚合它们。实验上,我们的方法非常鲁棒且高效:在90%的令牌为随机干扰项的诊断压力测试中,当基线方法崩溃时,GLOT保持了97%以上的准确率。此外,它在GLUE和MTEB等基准上与最先进技术相比具有竞争力,可训练参数减少20倍,与参数高效微调方法相比训练时间加快100倍以上。在对其表达能力的理论分析支持下,我们的工作表明在令牌图上学习是冻结LLM高效适配的强大范式。我们的代码发布在https://github.com/ipsitmantri/GLOT上。
摘要:Obtaining a single-vector representation from a Large Language Model's (LLM) token-level outputs is a critical step for nearly all sentence-level tasks. However, standard pooling methods like mean or max aggregation treat tokens as an independent set, discarding the rich relational structure captured by the model's self-attention layers and making them susceptible to signal dilution. To address this, we introduce GLOT, a lightweight, structure-aware pooling module that reframes pooling as relational learning followed by aggregation. Operating on the outputs of a frozen LLM, GLOT first constructs a latent token-similarity graph, then refines token representations with a graph neural network, and finally aggregates them using a readout layer. Experimentally, our approach is remarkably robust and efficient: on a diagnostic stress test where 90% of tokens are random distractors, GLOT maintains over 97% accuracy while baseline methods collapse. Furthermore, it is competitive with state-of-the-art techniques on benchmarks like GLUE and MTEB with 20x fewer trainable parameters and speeds up the training time by over 100x compared with parameter-efficient fine-tuning methods. Supported by a theoretical analysis of its expressive power, our work shows that learning over token graphs is a powerful paradigm for the efficient adaptation of frozen LLMs. Our code is published at https://github.com/ipsitmantri/GLOT.
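GLOT 的"潜在令牌相似图 + 图神经网络细化 + 读出"流水线可以用一个 numpy 草图说明。此处把可学习组件替换为 kNN 余弦相似图和一层行归一化的邻居平均(单层 GCN 的替身),属于假设性简化:

```python
import numpy as np

def glot_pool(H, k=2):
    """结构感知池化草图:由隐状态 H (t, d) 构建 kNN 余弦相似图,
    做一轮行归一化邻居传播,再均值读出为单向量。"""
    t = H.shape[0]
    Hn = H / (np.linalg.norm(H, axis=1, keepdims=True) + 1e-8)
    S = Hn @ Hn.T
    np.fill_diagonal(S, -np.inf)            # 排除自身
    A = np.zeros((t, t))
    for i in range(t):
        for j in np.argsort(S[i])[-k:]:
            A[i, j] = A[j, i] = 1.0         # 对称化的 kNN 边
    A = A + np.eye(t)                       # 自环
    H_ref = (A / A.sum(1, keepdims=True)) @ H  # 行归一化传播
    return H_ref.mean(axis=0)

H = np.array([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])
pooled = glot_pool(H)
```

与均值池化不同,这里的传播按相似度把信息在令牌之间重新分配,干扰令牌若与主体令牌不相似,则难以进入主体令牌的邻域,这直观对应摘要中压力测试下的鲁棒性。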


【9】Knowledge Graph and Hypergraph Transformers with Repository-Attention and Journey-Based Role Transport
标题:具有存储库注意力与基于旅程的角色传输的知识图和超图Transformer
链接:https://arxiv.org/abs/2603.03304

作者:Mahesh Godavarti
备注:9 pages
摘要:我们提出了一个简洁的架构,用于在句子和结构化数据上联合训练,同时保持知识表示与语言表示可分离。该模型将知识图和超图视为带有角色槽位的结构化实例,并将其编码到语言Transformer可以关注的键值存储库中。注意力由基于旅程的角色传输进行调节,它统一了带边标签的KG遍历、超边遍历和句子结构。我们概述了一个双流架构:具有实例局部、邻域和全局混合注意力的分层层组,对独立存储库的检索,以及跨越掩蔽语言建模、链接预测和角色一致性去噪的多任务目标。其结果是语言上下文与结构化知识之间显式、可检查的分离,同时仍能通过交叉注意力实现紧密对齐。
摘要:We present a concise architecture for joint training on sentences and structured data while keeping knowledge and language representations separable. The model treats knowledge graphs and hypergraphs as structured instances with role slots and encodes them into a key-value repository that a language transformer can attend over. Attention is conditioned by journey-based role transport, which unifies edge-labeled KG traversal, hyperedge traversal, and sentence structure. We outline a dual-stream architecture, hierarchical layer groups with instance-local, neighborhood, and global mixing attention, retrieval over a separate repository, and multi-task objectives spanning masked language modeling, link prediction, and role-consistency denoising. The result is an explicit, inspectable separation between linguistic context and structured knowledge, while still enabling tight alignment through cross-attention.


Transformer(6篇)

【1】Activation Outliers in Transformer Quantization: Reproduction, Statistical Analysis, and Deployment Tradeoffs
标题:Transformer量化中的激活异常值:复制、统计分析和部署权衡
链接:https://arxiv.org/abs/2603.04308

作者:Pranav Kumar Kaliaperumal
备注:10 pages, 3 tables. Reproducible study of transformer PTQ activation outliers based on Bondarenko et al. (EMNLP 2021, Qualcomm AI Research). Code: https://github.com/pranavkkp4/TransQuant-Edge
摘要:众所周知,由于结构化激活离群值,Transformer的训练后量化(PTQ)会遭受严重的准确性下降,正如Bondarenko等人(EMNLP 2021)在与Qualcomm AI Research相关的工作中最初分析的那样。本文对该现象在QNLI上微调的BERT-base中进行了可复现的经验再现与系统级扩展。应用全局W8A8量化时,验证准确率从89.66%(FP32)急剧下降到54.33%,下降了35.33个点。对FP32激活的统计分析显示出强烈的重尾行为,并随模型深度加深而加剧:峰度在最终层中达到271,大约55%的激活能量集中在前1%的通道中。我们评估了几种缓解策略。混合精度PTQ将准确度恢复到接近FP32基线(89.42%)。按嵌入分组(PEG)量化对分组结构表现出很强的敏感性,将准确率从三组的66.12%提高到四组的86.18%。相比之下,即使百分位阈值取在99.0到99.99之间,基于百分位数的校准也无法恢复准确性(约50.54%),这表明大激活通道编码的是结构化信号而非罕见噪声。在RTX 3050 GPU上的部署分析显示,不同方法在延迟和内存使用方面差异很小(中位延迟约58-59 ms;VRAM使用约484-486 MB),凸显了硬件感知评估的重要性。总体而言,结果表明Transformer中的PTQ失效主要由经残差连接放大的结构化通道主导所驱动。因此,有效的缓解需要通道感知的精度分配,而不仅仅是标量裁剪。
摘要:Post-training quantization (PTQ) of transformers is known to suffer from severe accuracy degradation due to structured activation outliers, as originally analyzed by Bondarenko et al. (EMNLP 2021) in work associated with Qualcomm AI Research. This paper provides a reproducible empirical reproduction and systems-level extension of that phenomenon in BERT-base fine-tuned on QNLI. When global W8A8 quantization is applied, validation accuracy drops sharply from 89.66% (FP32) to 54.33%, a decrease of 35.33 points. Statistical analysis of FP32 activations shows strongly heavy-tailed behavior that intensifies with model depth: kurtosis reaches 271 in the final layers and approximately 55% of activation energy is concentrated in the top 1% of channels. We evaluate several mitigation strategies. Mixed precision PTQ restores accuracy close to the FP32 baseline (89.42%). Per-embedding-group (PEG) quantization shows strong sensitivity to grouping structure, improving accuracy from 66.12% with three groups to 86.18% with four groups. In contrast, percentile-based calibration, even at thresholds between 99.0 and 99.99, fails to recover accuracy (about 50.54%), indicating that large activation channels encode structured signal rather than rare noise. Deployment profiling on an RTX 3050 GPU shows minimal differences in latency and memory usage across methods (median latency about 58-59 ms; VRAM usage about 484-486 MB), highlighting the importance of hardware-aware evaluation. Overall, the results show that PTQ failure in transformers is primarily driven by structured channel dominance amplified through residual connections. Effective mitigation therefore requires channel-aware precision allocation rather than scalar clipping alone.
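摘要的核心结论是:少数通道主导激活能量,因此共享单一尺度的逐张量量化误差大,而按通道/分组设定尺度可以恢复精度。下面的 numpy 草图复现这一对比(对称 INT8 伪量化的示意性实现,数值为随机构造):

```python
import numpy as np

def fake_quant_int8(x, scale):
    """对称 INT8 伪量化:量化到 [-127, 127] 再反量化。"""
    return np.clip(np.round(x / scale), -127, 127) * scale

def per_tensor_mse(x):
    """整个张量共享一个尺度。"""
    scale = np.abs(x).max() / 127.0
    return float(np.mean((x - fake_quant_int8(x, scale)) ** 2))

def per_group_mse(x, groups):
    """把通道(列)切成 groups 个块,每块独立定标,
    对应摘要中按嵌入分组(PEG)量化的思路。"""
    err = 0.0
    for block in np.array_split(x, groups, axis=1):
        scale = np.abs(block).max() / 127.0
        err += float(((block - fake_quant_int8(block, scale)) ** 2).sum())
    return err / x.size

rng = np.random.default_rng(0)
acts = rng.normal(size=(64, 8))
acts[:, -1] *= 100.0          # 人为制造一个离群通道
mse_tensor = per_tensor_mse(acts)
mse_group = per_group_mse(acts, groups=8)
```

离群通道把全局尺度撑大,使普通通道的舍入误差急剧上升;分组定标把离群通道隔离出去,其余通道得以用合适的分辨率量化。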


【2】Data-Aware Random Feature Kernel for Transformers
标题:面向Transformer的数据感知随机特征核
链接:https://arxiv.org/abs/2603.04127

作者:Amirhossein Farzam,Hossein Mobahi,Nolan Andrew Miller,Luke Sernau
摘要:Transformer在各个领域都表现出色,但其二次注意力复杂度对扩展构成了障碍。随机特征注意力(如Performer中所用)通过使用从各向同性分布中采样的正随机特征来近似softmax核,可将此成本降低到与序列长度呈线性关系。然而,在预训练模型中,查询和键通常是各向异性的。除非重新训练模型或使用较大的特征预算,这会在各向同性采样方案中导致高蒙特卡罗方差。重要性采样可以通过使采样分布适应输入几何来解决这个问题,但复杂的数据依赖建议分布通常难以处理。我们表明,通过对softmax核做数据对齐,可以得到一个注意力机制,它既允许易于处理的最小方差重要性采样建议分布,又表现出更好的训练稳定性。基于这一发现,我们引入了DARKFormer,一个具有数据对齐核几何的数据感知随机特征核Transformer。DARKFormer学习随机投影的协方差,高效地实现了其数据对齐核的重要性采样正随机特征估计器。经验上,DARKFormer缩小了与精确softmax注意力的性能差距,尤其是在预训练表示各向异性的微调场景中。通过将随机特征的效率与数据感知核相结合,DARKFormer推进了资源受限环境下基于核的注意力。
摘要:Transformers excel across domains, yet their quadratic attention complexity poses a barrier to scaling. Random-feature attention, as in Performers, can reduce this cost to linear in the sequence length by approximating the softmax kernel with positive random features drawn from an isotropic distribution. In pretrained models, however, queries and keys are typically anisotropic. This induces high Monte Carlo variance in isotropic sampling schemes unless one retrains the model or uses a large feature budget. Importance sampling can address this by adapting the sampling distribution to the input geometry, but complex data-dependent proposal distributions are often intractable. We show that by data aligning the softmax kernel, we obtain an attention mechanism which can both admit a tractable minimal-variance proposal distribution for importance sampling, and exhibits better training stability. Motivated by this finding, we introduce DARKFormer, a Data-Aware Random-feature Kernel transformer that features a data-aligned kernel geometry. DARKFormer learns the random-projection covariance, efficiently realizing an importance-sampled positive random-feature estimator for its data-aligned kernel. Empirically, DARKFormer narrows the performance gap with exact softmax attention, particularly in finetuning regimes where pretrained representations are anisotropic. By combining random-feature efficiency with data-aware kernels, DARKFormer advances kernel-based attention in resource-constrained settings.
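Performer 式正随机特征用 E[phi(q)·phi(k)] = exp(q·k) 近似 softmax 核;DARKFormer 在此基础上学习投影协方差以适配数据的各向异性。作为参照,基础估计器(各向同性高斯投影版本)的 numpy 草图如下:

```python
import numpy as np

def positive_random_features(X, W):
    """正随机特征:phi(x) = exp(W x - ||x||^2 / 2) / sqrt(m),
    满足 E[phi(q) . phi(k)] = exp(q . k)。W: (m, d) 高斯投影矩阵。"""
    m = W.shape[0]
    return np.exp(X @ W.T - 0.5 * (X ** 2).sum(-1, keepdims=True)) / np.sqrt(m)

rng = np.random.default_rng(0)
d, m = 4, 4096
W = rng.normal(size=(m, d))
q = 0.5 * rng.normal(size=(1, d))
k = 0.5 * rng.normal(size=(1, d))
approx = float(positive_random_features(q, W) @ positive_random_features(k, W).T)
exact = float(np.exp(q @ k.T))
```

投影分布与数据几何不匹配正是估计方差增大的来源之一;DARKFormer 让 W 的协方差与数据对齐,起到重要性采样式的方差削减作用。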


【3】TFWaveFormer: Temporal-Frequency Collaborative Multi-level Wavelet Transformer for Dynamic Link Prediction
标题:TFWaveFormer:用于动态链接预测的时频协同多级小波Transformer
链接:https://arxiv.org/abs/2603.03963

作者:Hantong Feng,Yonggang Wu,Duxin Chen,Wenwu Yu
摘要:动态链接预测在社交网络分析、通信预测和金融建模等多种应用中起着至关重要的作用。虽然最近的基于transformer的方法在时间图学习方面已经证明了有希望的结果,但在捕获复杂的多尺度时间动态时,它们的性能仍然有限。在本文中,我们提出了TFWaveFormer,一种新的Transformer架构,集成了多分辨率小波分解的时间-频率分析,以增强动态链接预测。我们的框架包括三个关键组成部分:(i)时间-频率协调机制,联合建模时间和频谱表示,(ii)一个可学习的多分辨率小波分解模块,通过并行卷积自适应地提取多尺度时间模式,取代传统的迭代小波变换,以及(iii)一个混合Transformer模块,有效地融合了局部小波特征与全局时间依赖性。在基准数据集上进行的大量实验表明,TFWaveFormer实现了最先进的性能,在多个指标上显著优于现有的基于Transformer的模型和混合模型。TFWaveFormer的卓越性能验证了将时频分析与小波分解相结合在动态链接预测任务中捕获复杂时间动态的有效性。
摘要:Dynamic link prediction plays a crucial role in diverse applications including social network analysis, communication forecasting, and financial modeling. While recent Transformer-based approaches have demonstrated promising results in temporal graph learning, their performance remains limited when capturing complex multi-scale temporal dynamics. In this paper, we propose TFWaveFormer, a novel Transformer architecture that integrates temporal-frequency analysis with multi-resolution wavelet decomposition to enhance dynamic link prediction. Our framework comprises three key components: (i) a temporal-frequency coordination mechanism that jointly models temporal and spectral representations, (ii) a learnable multi-resolution wavelet decomposition module that adaptively extracts multi-scale temporal patterns through parallel convolutions, replacing traditional iterative wavelet transforms, and (iii) a hybrid Transformer module that effectively fuses local wavelet features with global temporal dependencies. Extensive experiments on benchmark datasets demonstrate that TFWaveFormer achieves state-of-the-art performance, outperforming existing Transformer-based and hybrid models by significant margins across multiple metrics. The superior performance of TFWaveFormer validates the effectiveness of combining temporal-frequency analysis with wavelet decomposition in capturing complex temporal dynamics for dynamic link prediction tasks.
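作为参照,传统的迭代式多级小波分解(以 Haar 小波为例)如下;TFWaveFormer 用可学习的并行卷积替换了这一迭代过程,此草图仅展示其所推广的基线运算:

```python
import numpy as np

def haar_multilevel(x, levels=2):
    """迭代 Haar 分解:每一级把信号拆成近似系数(成对均值,乘 sqrt(2))
    与细节系数(成对差分),再对近似系数继续分解。"""
    a = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        pairs = a[: len(a) // 2 * 2].reshape(-1, 2)
        details.append((pairs[:, 0] - pairs[:, 1]) / np.sqrt(2))
        a = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    return a, details

x = np.array([4.0, 4.0, 2.0, 2.0])
approx, details = haar_multilevel(x, levels=2)
```

正交归一化使各级系数合起来保持信号能量不变;摘要中"可学习的并行卷积"可以理解为把这里每一级的固定均值/差分滤波器,替换为一组并行、可训练的多尺度滤波器。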


【4】Orbital Transformers for Predicting Wavefunctions in Time-Dependent Density Functional Theory
标题:用于预测含时密度泛函理论中波函数的轨道Transformer
链接:https://arxiv.org/abs/2603.03511

作者:Xuan Zhang,Haiyang Yu,Chengdong Wang,Jacob Helwig,Shuiwang Ji,Xiaofeng Qian
摘要:我们的目标是学习由含时密度泛函理论(TDDFT)模拟的波函数,它可以有效地表示为原子轨道的线性组合系数。在实时TDDFT中,分子的电子波函数响应于外部激发而随时间演变,从而实现物理性质的第一原理预测,例如光学吸收,电子动力学和高阶响应。然而,传统的实时TDDFT依赖于以精细时间步长对所有占据态进行耗时的传播。在这项工作中,我们提出了OrbEvo,它基于等变图Transformer架构,学习跨时间步演化完整的电子波函数系数。首先,为了考虑外场,我们设计了一个等变条件来编码外电场的强度和方向,并将对称性从SO(3)破缺到SO(2)。在此基础上,分别采用波函数池化和密度矩阵作为相互作用方法,设计了两个OrbEvo模型:OrbEvo-WF和OrbEvo-DM。受密度泛函在TDDFT中的核心作用的启发,OrbEvo-DM通过张量收缩将从所有占据电子态聚集的密度矩阵编码为特征向量,提供了一种更直观的方法来学习时间演化算子。我们采用了一种专门定制的训练策略,以限制自回归滚动推演(rollout)过程中含时波函数的误差积累。为了评估我们的方法,我们生成了由QM9数据集中的5,000种不同分子和MD17数据集中的1,500种丙二醛分子构型组成的TDDFT数据集。结果表明,我们的OrbEvo模型准确地捕捉了外场下激发态的量子动力学,包括含时波函数、含时偶极矩和光学吸收谱。
摘要 :We aim to learn wavefunctions simulated by time-dependent density functional theory (TDDFT), which can be efficiently represented as linear combination coefficients of atomic orbitals. In real-time TDDFT, the electronic wavefunctions of a molecule evolve over time in response to an external excitation, enabling first-principles predictions of physical properties such as optical absorption, electron dynamics, and high-order response. However, conventional real-time TDDFT relies on time-consuming propagation of all occupied states with fine time steps. In this work, we propose OrbEvo, which is based on an equivariant graph transformer architecture and learns to evolve the full electronic wavefunction coefficients across time steps. First, to account for external field, we design an equivariant conditioning to encode both strength and direction of external electric field and break the symmetry from SO(3) to SO(2). Furthermore, we design two OrbEvo models, OrbEvo-WF and OrbEvo-DM, using wavefunction pooling and density matrix as interaction method, respectively. Motivated by the central role of the density functional in TDDFT, OrbEvo-DM encodes the density matrix aggregated from all occupied electronic states into feature vectors via tensor contraction, providing a more intuitive approach to learn the time evolution operator. We adopt a training strategy specifically tailored to limit the error accumulation of time-dependent wavefunctions over autoregressive rollout. To evaluate our approach, we generate TDDFT datasets consisting of 5,000 different molecules in the QM9 dataset and 1,500 molecular configurations of the malonaldehyde molecule in the MD17 dataset. Results show that our OrbEvo model accurately captures quantum dynamics of excited states under external field, including time-dependent wavefunctions, time-dependent dipole moment, and optical absorption spectra.


【5】Geographically-Weighted Weakly Supervised Bayesian High-Resolution Transformer for 200m Resolution Pan-Arctic Sea Ice Concentration Mapping and Uncertainty Estimation using Sentinel-1, RCM, and AMSR2 Data
标题:使用Sentinel-1、RCM和AMSR2数据进行200米分辨率泛北极海冰密集度绘图和不确定性估计的地理加权弱监督贝叶斯高分辨率Transformer
链接:https://arxiv.org/abs/2603.03503

作者:Mabel Heffring,Lincoln Linlin Xu
备注:23 pages, 20 figures
摘要:尽管具有可靠的对应不确定性的泛北极海冰高分辨率制图对于业务海冰密集度(SIC)制图至关重要,但由于冰特征的微妙性质、不精确的SIC标签、模型不确定性和数据异质性等关键挑战,这是一项艰巨的任务。本研究提出了一种新的贝叶斯高分辨率Transformer方法,使用Sentinel-1、RADARSAT星座任务(RCM)和先进微波扫描辐射计2(AMSR2)数据进行200米分辨率泛北极SIC制图和不确定性量化。首先,为了改善小而微妙的海冰特征(例如裂缝/冰间水道、融池和浮冰)的提取,我们设计了一种新的高分辨率Transformer模型,具有全局和局部模块,可以更好地辨别海冰模式的细微差异。其次,为了解决低分辨率和不精确的SIC标签,我们设计了一个地理加权的弱监督损失函数,以区域级别而不是像素级别来监督模型,并优先考虑纯开阔水域和密集冰层的信号特征,同时减轻边缘冰区(MIZ)模糊性的影响。第三,为了改善不确定性量化,我们为所提出的Transformer模型设计了贝叶斯扩展,将其参数视为随机变量,以更有效地捕捉不确定性。第四,为了解决数据异质性,我们在决策层融合了三种不同的数据类型(Sentinel-1、RCM和AMSR2),以改善SIC制图和不确定性量化。在2021年和2025年的泛北极最小范围条件下评估了所提出的方法。结果表明,该模型使用Sentinel-1数据实现了0.70的整体特征检测精度,同时还保留了泛北极SIC模式(相对于ARTIST海冰产品,Sentinel-1的R^2 = 0.90)。
摘要:Although high-resolution mapping of pan-Arctic sea ice with reliable corresponding uncertainty is essential for operational sea ice concentration (SIC) charting, it is a difficult task due to key challenges, such as the subtle nature of ice signature features, inexact SIC labels, model uncertainty, and data heterogeneity. This study presents a novel Bayesian High-Resolution Transformer approach for 200 meter resolution pan-Arctic SIC mapping and uncertainty quantification using Sentinel-1, RADARSAT Constellation Mission (RCM), and Advanced Microwave Scanning Radiometer 2 (AMSR2) data. First, to improve small and subtle sea ice feature (e.g., cracks/leads, ponds, and ice floes) extraction, we design a novel high-resolution Transformer model with both global and local modules that can better discern the subtle differences in sea ice patterns. Second, to address low-resolution and inexact SIC labels, we design a geographically-weighted weakly supervised loss function to supervise the model at region level instead of pixel level, and to prioritize pure open water and ice pack signatures while mitigating the impact of ambiguity in the marginal ice zone (MIZ). Third, to improve uncertainty quantification, we design a Bayesian extension of the proposed Transformer model, treating its parameters as random variables to more effectively capture uncertainties. Fourth, to address data heterogeneity, we fuse three different data types (Sentinel-1, RCM, and AMSR2) at decision-level to improve both SIC mapping and uncertainty quantification. The proposed approach is evaluated under pan-Arctic minimum-extent conditions in 2021 and 2025. Results demonstrate that the proposed model achieves 0.70 overall feature detection accuracy using Sentinel-1 data, while also preserving pan-Arctic SIC patterns (Sentinel-1 R\textsuperscript{2} = 0.90 relative to the ARTIST Sea Ice product).


【6】Half the Nonlinearity Is Wasted: Measuring and Reallocating the Transformer's MLP Budget
标题:一半的非线性被浪费:测量和重新分配Transformer的MLP预算
链接:https://arxiv.org/abs/2603.03459

作者:Peter Balogh
摘要:我们研究Transformer MLP的非线性在何时真正必要。一个具有$d+1$个参数的门决定何时用线性替代项替换完整的MLP。通过对六个模型(162M-2.8B参数)、两种架构和三个语料库的系统研究,我们确定非线性需求无法从令牌(token)身份预测:跨语料库相关性为零($r < 0.05$)。路由决策完全是上下文相关的。尽管每个实例的可预测性较弱,但门利用了一个严重偏斜的分布,其中大多数MLP计算接近线性,在GPT-2中以<1%的困惑度成本实现25-56%的线性路由。在GPT-2 Large中,36层中有11层在门控下优于基线,且没有一层的全线性成本超过3.7%。这种成功依赖于架构:Pythia模型显示出更高的成本,尽管对Pythia-2.8B全部32层的扫描显示有一层勉强优于基线。作为概念验证,我们逐步用冻结的线性矩阵替换中间层MLP:24层中的5层以零成本线性化。在完整的训练预算下,4个线性化层带来了10.2%的困惑度改善,而两阶段门控方法将其提高到17.3%,优于普通微调对照,证实了这些层的非线性MLP实际上是有害的。
摘要:We investigate when transformer MLP nonlinearity is actually necessary. A gate with $d+1$ parameters decides when to replace the full MLP with a linear surrogate. Through systematic investigation across six models (162M-2.8B parameters), two architectures, and three corpora, we establish that nonlinearity need cannot be predicted from token identity: cross-corpus correlation is zero ($r < 0.05$). The routing decision is fully contextual. Despite weak per-instance predictability, the gate exploits a heavily skewed distribution where most MLP computations are near-linear, achieving 25-56% linear routing at <1% perplexity cost in GPT-2. In GPT-2 Large, 11 of 36 layers beat baseline with gating and no layer exceeds 3.7% all-linear cost. This success is architecture-dependent: Pythia models show higher costs, though Pythia-2.8B's full 32-layer sweep reveals one layer that narrowly beats baseline. As a proof of concept, we progressively replace middle-layer MLPs with frozen linear matrices: 5 of 24 layers linearize at zero cost. With a full training budget, 4 linearized layers yield a 10.2% perplexity improvement -- and a two-phase gated approach pushes this to 17.3%, beating a vanilla fine-tuning control and confirming that the nonlinear MLPs at these layers were actively harmful.
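摘要中"具有$d+1$参数的门决定何时用线性替代项替换完整的MLP"的路由机制,可以概括为:门对输入打分,分数不高于阈值时走线性旁路,否则走完整的非线性MLP。下面是一个假设性的纯Python草图(单阈值规则、GELU的tanh近似与各权重名称均为示意,并非论文的具体实现):

```python
import math

def gelu(v):
    """tanh approximation of GELU."""
    return 0.5 * v * (1 + math.tanh(math.sqrt(2 / math.pi) * (v + 0.044715 * v ** 3)))

def gated_mlp(x, gate_w, gate_b, W_in, W_out, W_lin):
    """Route an input either through the full nonlinear MLP or through a
    cheap linear surrogate, based on a d+1-parameter gate (gate_w, gate_b).
    All weight names and the single-threshold rule are illustrative."""
    score = sum(wi * xi for wi, xi in zip(gate_w, x)) + gate_b
    if score <= 0:  # gate predicts the MLP acts near-linearly here
        return [sum(W_lin[o][i] * x[i] for i in range(len(x)))
                for o in range(len(W_lin))], "linear"
    hidden = [gelu(sum(W_in[h][i] * x[i] for i in range(len(x))))
              for h in range(len(W_in))]
    return [sum(W_out[o][h] * hidden[h] for h in range(len(hidden)))
            for o in range(len(W_out))], "mlp"
```

门本身只有d个权重加1个偏置,相对被替换的MLP几乎不增加参数量。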


GAN|对抗|攻击|生成相关(9篇)

【1】Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks
标题:双模态多阶段对抗性安全训练：增强多模态Web代理对跨模态攻击的鲁棒性
链接:https://arxiv.org/abs/2603.04364

作者:Haoyu Liu,Dingcheng Li,Lukas Rutishauser,Zeyu Zheng
摘要:处理屏幕截图和可访问性树的多模态Web代理越来越多地被部署用于与Web界面交互,但它们的双流架构打开了一个未被充分探索的攻击面:将内容注入网页DOM的对手可以用一致的欺骗性叙述同时破坏两个观察通道。我们对MiniWob++的漏洞分析表明,包含视觉组件的攻击远远优于纯文本注入,暴露了以文本为中心的VLM安全训练中的关键差距。基于这一发现,我们提出了双模态多阶段对抗安全训练(DMAST),该框架将代理-攻击者交互形式化为双人零和马尔可夫博弈,并通过三阶段管道共同训练双方:(1)从强教师模型进行模仿学习,(2)预言机引导的监督微调,使用一种新的零确认策略,在对抗性噪声下灌输以任务为中心的推理,以及(3)通过组相对策略优化(GRPO)自博弈进行对抗性强化学习。在分布外任务上,DMAST大大降低了对抗风险,同时使任务完成效率提高一倍。我们的方法显著优于现有的基于训练和基于提示的防御,展示了真正的共同进化进步和对复杂、未见环境的强大泛化能力。
摘要 :Multimodal web agents that process both screenshots and accessibility trees are increasingly deployed to interact with web interfaces, yet their dual-stream architecture opens an underexplored attack surface: an adversary who injects content into the webpage DOM simultaneously corrupts both observation channels with a consistent deceptive narrative. Our vulnerability analysis on MiniWob++ reveals that attacks including a visual component far outperform text-only injections, exposing critical gaps in text-centric VLM safety training. Motivated by this finding, we propose Dual-Modality Multi-Stage Adversarial Safety Training (DMAST), a framework that formalizes the agent-attacker interaction as a two-player zero-sum Markov game and co-trains both players through a three-stage pipeline: (1) imitation learning from a strong teacher model, (2) oracle-guided supervised fine-tuning that uses a novel zero-acknowledgment strategy to instill task-focused reasoning under adversarial noise, and (3) adversarial reinforcement learning via Group Relative Policy Optimization (GRPO) self-play. On out-of-distribution tasks, DMAST substantially mitigates adversarial risks while simultaneously doubling task completion efficiency. Our approach significantly outperforms established training-based and prompt-based defenses, demonstrating genuine co-evolutionary progress and robust generalization to complex, unseen environments.


【2】Balancing Fidelity, Utility, and Privacy in Synthetic Cardiac MRI Generation: A Comparative Study
标题:合成心脏MRI生成中的保真度、实用性和隐私平衡:比较研究
链接:https://arxiv.org/abs/2603.04340

作者:Madhura Edirisooriya,Dasuni Kawya,Ishan Kumarasinghe,Isuri Devindi,Mary M. Maleckar,Roshan Ragel,Isuru Nawinne,Vajira Thambawita
备注:7 pages, 4 figures, Preprint
摘要:心脏MRI(CMR)中的深度学习从根本上受到数据稀缺和隐私法规的限制。本研究系统地对三个用于合成CMR生成的生成架构进行了基准测试:去噪扩散概率模型(DDPM)、潜在扩散模型(LDM)和流匹配(FM)。利用一个以解剖掩模为条件进行图像合成的两阶段管道,我们沿三个关键轴评估生成的数据:保真度、实用性和隐私。我们的研究结果表明,基于扩散的模型,特别是DDPM,在有限数据条件下在下游分割效用、图像保真度和隐私保护之间提供了最有效的平衡,而FM表现出有前途的隐私特性,任务级性能略低。这些发现量化了跨域泛化和患者机密性之间的权衡,为医学成像中安全有效的合成数据增强建立了框架。
摘要:Deep learning in cardiac MRI (CMR) is fundamentally constrained by both data scarcity and privacy regulations. This study systematically benchmarks three generative architectures: Denoising Diffusion Probabilistic Models (DDPM), Latent Diffusion Models (LDM), and Flow Matching (FM) for synthetic CMR generation. Utilizing a two-stage pipeline where anatomical masks condition image synthesis, we evaluate generated data across three critical axes: fidelity, utility, and privacy. Our results show that diffusion-based models, particularly DDPM, provide the most effective balance between downstream segmentation utility, image fidelity, and privacy preservation under limited-data conditions, while FM demonstrates promising privacy characteristics with slightly lower task-level performance. These findings quantify the trade-offs between cross-domain generalization and patient confidentiality, establishing a framework for safe and effective synthetic data augmentation in medical imaging.


【3】Tuning Just Enough: Lightweight Backdoor Attacks on Multi-Encoder Diffusion Models
标题:调整恰到好处:对多编码器扩散模型的轻量级后门攻击
链接:https://arxiv.org/abs/2603.04064

作者:Ziyuan Chen,Yujin Jeong,Tobias Braun,Anna Rohrbach
摘要:随着文本到图像扩散模型越来越多地部署在现实世界的应用中,后门攻击引起了极大的关注。基于文本的后门攻击的先前工作主要集中在以单个轻量级文本编码器为条件的扩散模型上。然而,包含多个大规模文本编码器的较新扩散模型在这方面仍然探索不足。鉴于多个文本编码器引入的可训练参数数量大幅增加,一个重要的问题是后门攻击在这种设置中能否保持高效和有效。在这项工作中,我们研究了Stable Diffusion 3,它使用了三个不同的文本编码器,且尚未被系统地分析过基于文本编码器的后门漏洞。为了理解文本编码器在后门攻击中的作用,我们定义了四类攻击目标,并确定了为每个攻击目标实现有效性能所需的最小编码器集。在此基础上,我们进一步提出了多编码器轻量级攻击(Multi-Encoder Lightweight aTtacks, MELT),它只训练低秩适配器,同时保持预训练的文本编码器权重冻结。我们证明,仅调整不到0.2%的编码器总参数即可对Stable Diffusion 3实施成功的后门攻击,揭示了多编码器设置下实际攻击场景中此前未被充分探索的漏洞。
摘要:As text-to-image diffusion models become increasingly deployed in real-world applications, concerns about backdoor attacks have gained significant attention. Prior work on text-based backdoor attacks has largely focused on diffusion models conditioned on a single lightweight text encoder. However, more recent diffusion models that incorporate multiple large-scale text encoders remain underexplored in this context. Given the substantially increased number of trainable parameters introduced by multiple text encoders, an important question is whether backdoor attacks can remain both efficient and effective in such settings. In this work, we study Stable Diffusion 3, which uses three distinct text encoders and has not yet been systematically analyzed for text-encoder-based backdoor vulnerabilities. To understand the role of text encoders in backdoor attacks, we define four categories of attack targets and identify the minimal sets of encoders required to achieve effective performance for each attack objective. Based on this, we further propose Multi-Encoder Lightweight aTtacks (MELT), which trains only low-rank adapters while keeping the pretrained text encoder weight frozen. We demonstrate that tuning fewer than 0.2% of the total encoder parameters is sufficient for successful backdoor attacks on Stable Diffusion 3, revealing previously underexplored vulnerabilities in practical attack scenarios in multi-encoder settings.
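MELT只训练低秩适配器而冻结预训练文本编码器权重,其核心是标准LoRA的前向形式 y = Wx + scale·B(Ax):W被冻结,只有小矩阵A(r×d)和B(d_out×r)参与训练。下面是一个与论文具体实现无关的通用纯Python示意:

```python
def lora_forward(x, W, A, B, scale=1.0):
    """Low-rank adapter forward pass: y = W x + scale * B (A x).
    The pretrained weight W stays frozen; only the small factors
    A (r x d) and B (d_out x r) would receive gradients."""
    def matvec(M, v):
        return [sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M))]
    base = matvec(W, x)              # frozen pretrained path
    delta = matvec(B, matvec(A, x))  # trainable low-rank update
    return [b + scale * d for b, d in zip(base, delta)]
```

当B初始化为零时delta为零,适配器在训练开始时不改变预训练模型的行为,这是LoRA常用的初始化方式。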


【4】Multi-Stage Music Source Restoration with BandSplit-RoFormer Separation and HiFi++ GAN
标题:利用BandSplit-RoFormer Separation和HiFi++ GAN进行多阶段音乐源恢复
链接:https://arxiv.org/abs/2603.04032

作者:Tobias Morocutti,Emmanouil Karystinaios,Jonathan Greif,Gerhard Widmer
备注:ICASSP 2026 Music Source Restoration (MSR) Challenge
摘要:音乐源恢复(MSR)的目标是从完全混音并经过母带处理的音频中恢复原始的、未经处理的乐器分轨(stem),其中制作效果和发行环节引入的伪影违反了常见的线性混合假设。本技术报告介绍了CP-JKU团队为2025年MSR ICASSP挑战赛设计的系统。我们的方法将MSR分解为分离和恢复两个阶段。首先,单个BandSplit-RoFormer分离器预测八个分轨外加一个辅助的其他分轨,并通过三阶段课程进行训练,从4分轨热启动微调(使用LoRA)推进到通过输出头扩展实现的8分轨扩展。其次,我们应用HiFi++ GAN波形恢复器,该恢复器先被训练为通用模型,再被特化为八个乐器专用的专家模型。
摘要:Music Source Restoration (MSR) targets recovery of original, unprocessed instrument stems from fully mixed and mastered audio, where production effects and distribution artifacts violate common linear-mixture assumptions. This technical report presents the CP-JKU team's system for the MSR ICASSP Challenge 2025. Our approach decomposes MSR into separation and restoration. First, a single BandSplit-RoFormer separator predicts eight stems plus an auxiliary other stem, and is trained with a three-stage curriculum that progresses from 4-stem warm-start fine-tuning (with LoRA) to 8-stem extension via head expansion. Second, we apply a HiFi++ GAN waveform restorer trained as a generalist and then specialized into eight instrument-specific experts.


【5】Structure-Aware Distributed Backdoor Attacks in Federated Learning
标题:联邦学习中的结构感知分布式后门攻击
链接:https://arxiv.org/abs/2603.03865

作者:Wang Jian,Shen Hong,Ke Wei,Liu Xue Hua
备注:17 pages, 12 figures
摘要:虽然联邦学习保护了数据隐私,但它也使模型更新过程容易受到长期隐形干扰的影响。现有关于联邦学习中后门攻击的研究主要集中在触发器设计或中毒策略上,通常假设相同的扰动在不同的模型架构中表现相似。这种假设忽略了模型结构对扰动有效性的影响。本文从结构感知的角度分析了模型体系结构与后门扰动之间的耦合关系。我们引入了两个指标,结构响应性得分(SRS)和结构兼容性系数(SCC),来衡量模型的扰动的敏感性和分形扰动的偏好。基于这些指标,我们开发了一个结构感知的分形扰动注入框架(TFI),研究建筑属性在后门注入过程中的作用。实验结果表明,模型结构显著影响扰动的传播和聚集。具有多路径特征融合的网络即使在低中毒率下也可以放大和保留分形扰动,而低结构兼容性的模型限制了它们的有效性。进一步的分析表明,SCC和攻击成功率之间有很强的相关性,表明SCC可以预测扰动生存性。这些发现强调了联邦学习中的后门行为不仅取决于扰动设计或中毒强度,还取决于模型架构和聚合机制之间的相互作用,为结构感知防御设计提供了新的见解。
摘要:While federated learning protects data privacy, it also makes the model update process vulnerable to long-term stealthy perturbations. Existing studies on backdoor attacks in federated learning mainly focus on trigger design or poisoning strategies, typically assuming that identical perturbations behave similarly across different model architectures. This assumption overlooks the impact of model structure on perturbation effectiveness. From a structure-aware perspective, this paper analyzes the coupling relationship between model architectures and backdoor perturbations. We introduce two metrics, Structural Responsiveness Score (SRS) and Structural Compatibility Coefficient (SCC), to measure a model's sensitivity to perturbations and its preference for fractal perturbations. Based on these metrics, we develop a structure-aware fractal perturbation injection framework (TFI) to study the role of architectural properties in the backdoor injection process. Experimental results show that model architecture significantly influences the propagation and aggregation of perturbations. Networks with multi-path feature fusion can amplify and retain fractal perturbations even under low poisoning ratios, while models with low structural compatibility constrain their effectiveness. Further analysis reveals a strong correlation between SCC and attack success rate, suggesting that SCC can predict perturbation survivability. These findings highlight that backdoor behaviors in federated learning depend not only on perturbation design or poisoning intensity but also on the interaction between model architecture and aggregation mechanisms, offering new insights for structure-aware defense design.


【6】LEA: Label Enumeration Attack in Vertical Federated Learning
标题:LEA:垂直联邦学习中的标签列举攻击
链接:https://arxiv.org/abs/2603.03777

作者:Wenhao Jiang,Shaojing Fu,Yuchuan Luo,Lin Liu
摘要:典型的垂直联邦学习(VFL)场景涉及多个参与者协作训练机器学习模型,其中每一方对相同的样本持有不同的特征,标签由一方独占。由于标签包含敏感信息,VFL必须确保标签的隐私。然而,现有的以VFL为目标的标签推断攻击要么局限于特定场景,要么需要辅助数据,这使得它们在现实世界的应用中不切实际。   我们介绍了一种新的标签枚举攻击(LEA),首次实现了跨多个VFL场景的适用性,并避免了对辅助数据的需要。我们的直觉是:对手采用聚类来枚举样本与标签之间的映射,并通过评估良性模型与在每个映射下训练的模拟模型之间的相似性来确定准确的标签映射。为了实现这一目标,第一个挑战是如何衡量模型相似性,因为在相同数据上训练的模型可能具有不同的权重。根据我们的研究结果,我们提出了一种基于第一轮损失梯度余弦相似性来评估一致性的有效方法,与参数相似性比较相比,它提供了更优的效率和精度。然而,由于需要训练和比较通过枚举生成的大量模拟模型,计算成本可能过高。为了克服这一挑战,我们从减少模型数量和消除无用训练的角度提出了Binary-LEA,它将枚举数量从n!降低到n^3。此外,LEA对梯度噪声和梯度压缩等常见防御机制具有鲁棒性。
摘要:A typical Vertical Federated Learning (VFL) scenario involves several participants collaboratively training a machine learning model, where each party has different features for the same samples, with labels held exclusively by one party. Since labels contain sensitive information, VFL must ensure the privacy of labels. However, existing VFL-targeted label inference attacks are either limited to specific scenarios or require auxiliary data, rendering them impractical in real-world applications.   We introduce a novel Label Enumeration Attack (LEA) that, for the first time, achieves applicability across multiple VFL scenarios and eschews the need for auxiliary data. Our intuition is that an adversary, employing clustering to enumerate mappings between samples and labels, ascertains the accurate label mappings by evaluating the similarity between the benign model and the simulated models trained under each mapping. To achieve that, the first challenge is how to measure model similarity, as models trained on the same data can have different weights. Drawing from our findings, we propose an efficient approach for assessing congruence based on the cosine similarity of the first-round loss gradients, which offers superior efficiency and precision compared to the comparison of parameter similarities. However, the computational cost may be prohibitive due to the necessity of training and comparing the vast number of simulated models generated through enumeration. To overcome this challenge, we propose Binary-LEA from the perspective of reducing the number of models and eliminating futile training, which lowers the number of enumerations from n! to n^3. Moreover, LEA is resilient against common defense mechanisms such as gradient noise and gradient compression.
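LEA的核心判别步骤是:对每个候选标签映射训练一个模拟模型,取其第一轮损失梯度,与良性模型的第一轮梯度比较余弦相似性,相似性最高者即为推断出的映射。以下纯Python草图仅示意这一选择逻辑(梯度向量在此直接给定;实际中需各自执行一轮训练得到,名称均为示意):

```python
import math

def cosine_similarity(g1, g2):
    """Cosine similarity between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(g1, g2))
    n1 = math.sqrt(sum(a * a for a in g1))
    n2 = math.sqrt(sum(b * b for b in g2))
    return dot / (n1 * n2)

def best_label_mapping(benign_grad, simulated_grads):
    """Pick the enumerated label mapping whose simulated model's
    first-round loss gradient aligns best with the benign model's."""
    return max(range(len(simulated_grads)),
               key=lambda i: cosine_similarity(benign_grad, simulated_grads[i]))
```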


【7】JANUS: Structured Bidirectional Generation for Guaranteed Constraints and Analytical Uncertainty
标题:JANUS:针对保证约束和分析不确定性的结构化双向生成
链接:https://arxiv.org/abs/2603.03748

作者:Taha Racicot
备注:14 pages, 10 figures, 14 tables
摘要:高风险的合成数据生成面临一个基本的四难困境:同时实现对原始分布的保真度、对复杂逻辑约束的控制、不确定性估计的可靠性和计算成本上的效率。最先进的深度生成模型(CTGAN、TabDDPM)在保真度方面表现出色,但对于连续范围约束依赖于低效的拒绝采样。相反,结构因果模型提供逻辑控制,但在高维保真度和复杂噪声反演方面表现不佳。我们介绍JANUS(不确定性与合成的联合祖先网络),这是一个使用贝叶斯决策树DAG统一这些能力的框架。我们的关键创新是反向拓扑回填,一种将约束沿因果图向后传播的算法,无需拒绝采样即可在可行约束集上实现100%的约束满足。它与源自Dirichlet先验的解析式不确定性分解相结合,使不确定性估计比Monte Carlo方法快128倍。在15个数据集和523个受约束的场景中,JANUS实现了最先进的保真度(检测分数0.497),消除了不平衡数据上的模式崩溃,并对基线完全失败的复杂列间约束(例如Salary_offered >= Salary_requested)提供了精确处理。
摘要:High-stakes synthetic data generation faces a fundamental Quadrilemma: achieving Fidelity to the original distribution, Control over complex logical constraints, Reliability in uncertainty estimation, and Efficiency in computational cost -- simultaneously. State-of-the-art Deep Generative Models (CTGAN, TabDDPM) excel at fidelity but rely on inefficient rejection sampling for continuous range constraints. Conversely, Structural Causal Models offer logical control but struggle with high-dimensional fidelity and complex noise inversion. We introduce JANUS (Joint Ancestral Network for Uncertainty and Synthesis), a framework that unifies these capabilities using a DAG of Bayesian Decision Trees. Our key innovation is Reverse-Topological Back-filling, an algorithm that propagates constraints backwards through the causal graph, achieving 100% constraint satisfaction on feasible constraint sets without rejection sampling. This is paired with an Analytical Uncertainty Decomposition derived from Dirichlet priors, enabling 128x faster uncertainty estimation than Monte Carlo methods. Across 15 datasets and 523 constrained scenarios, JANUS achieves state-of-the-art fidelity (Detection Score 0.497), eliminates mode collapse on imbalanced data, and provides exact handling of complex inter-column constraints (e.g., Salary_offered >= Salary_requested) where baselines fail entirely.
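"反向拓扑回填"的思想是按拓扑序的逆序遍历因果图,把每个约束蕴含的界向祖先变量收紧,使得随后的祖先采样天然满足约束而无需拒绝采样。下面用区间收紧给出一个玩具版示意(论文作用于贝叶斯决策树DAG;此处的区间表示与变量名均为说明思路的假设性简化):

```python
def backfill_intervals(topo_order, intervals, ge_constraints):
    """Propagate interval bounds backwards through a causal DAG.
    `intervals` maps variable -> [lo, hi]; `ge_constraints` is a list
    of (a, b) pairs encoding a >= b. Walking the topological order in
    reverse, each node tightens the bounds of variables it constrains."""
    iv = {k: list(v) for k, v in intervals.items()}
    for node in reversed(topo_order):
        for a, b in ge_constraints:
            if a == node:  # a >= b: b cannot exceed a's upper bound
                iv[b][1] = min(iv[b][1], iv[a][1])
            if b == node:  # a >= b: a cannot fall below b's lower bound
                iv[a][0] = max(iv[a][0], iv[b][0])
    return iv
```

收紧后在各自区间内做祖先采样(子变量再以父变量取值为下界)即可满足 a >= b,而不丢弃任何样本。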


【8】Solving adversarial examples requires solving exponential misalignment
标题:解决敌对例子需要解决指数失调
链接:https://arxiv.org/abs/2603.03507

作者:Alessandro Salvatore,Stanislav Fort,Surya Ganguli
摘要:对抗性攻击--人类无法感知却能欺骗神经网络的输入扰动--仍然是机器学习中的一种持续失败模式,也是一种起源神秘的现象。为了阐明这一点,我们将一个网络针对某一类概念的感知流形(PM)定义为该网络自信地归入该类的所有输入构成的空间,并对其进行分析。我们发现,引人注目的是,神经网络PM的维数比自然人类概念的维数高出数个数量级。由于体积通常随维度呈指数级增长,这表明机器和人类之间存在指数级的失配:有指数级多的输入被机器而非人类自信地归入某一概念。此外,这为对抗性示例的起源提供了一个自然的几何假设:因为网络的PM填充了如此大的输入空间区域,任何输入都会非常接近任何类概念的PM。因此,我们的假设表明,如果没有机器和人类PM的维度对齐,就无法实现对抗鲁棒性,并由此做出了强有力的预测:鲁棒精度和到任何PM的距离都应该与PM维度负相关。我们在18个鲁棒精度各异的网络中证实了这些预测。至关重要的是,我们发现即使是最鲁棒的网络仍然呈指数级失配,只有少数维数接近人类概念的PM表现出与人类感知的一致性。我们的结果将对齐和对抗示例这两个领域联系起来,并表明机器PM的高维诅咒是对抗鲁棒性的主要障碍。
摘要:Adversarial attacks - input perturbations imperceptible to humans that fool neural networks - remain both a persistent failure mode in machine learning, and a phenomenon with mysterious origins. To shed light, we define and analyze a network's perceptual manifold (PM) for a class concept as the space of all inputs confidently assigned to that class by the network. We find, strikingly, that the dimensionalities of neural network PMs are orders of magnitude higher than those of natural human concepts. Since volume typically grows exponentially with dimension, this suggests exponential misalignment between machines and humans, with exponentially many inputs confidently assigned to concepts by machines but not humans. Furthermore, this provides a natural geometric hypothesis for the origin of adversarial examples: because a network's PM fills such a large region of input space, any input will be very close to any class concept's PM. Our hypothesis thus suggests that adversarial robustness cannot be attained without dimensional alignment of machine and human PMs, and therefore makes strong predictions: both robust accuracy and distance to any PM should be negatively correlated with the PM dimension. We confirmed these predictions across 18 different networks of varying robust accuracy. Crucially, we find even the most robust networks are still exponentially misaligned, and only the few PMs whose dimensionality approaches that of human concepts exhibit alignment to human perception. Our results connect the fields of alignment and adversarial examples, and suggest the curse of high dimensionality of machine PMs is a major impediment to adversarial robustness.


【9】Bayesian Adversarial Privacy
标题:贝氏对抗隐私
链接:https://arxiv.org/abs/2603.04199

作者:Cameron Bell,Timothy Johnston,Antoine Luciano,Christian P Robert
摘要:对隐私的理论和应用研究涵盖了极为广泛、各不相同的方法、侧重点和目标。这项工作引入了一个新的定量隐私概念,既依赖上下文又针对具体场景。我们认为,与广泛使用的差分隐私框架相比,它提供了更有意义的隐私概念;与统计披露理论中常用的表述相比,它也更明确、更严格。我们的定义依赖于标准贝叶斯决策理论固有的概念,同时在几个重要方面偏离它。特别是,控制敏感信息发布的一方应该从先验的角度做出披露决定,而不是以数据为条件,即使数据本身已被观测到。文中详细讨论了具有启发性的玩具示例和计算方法,以突出该方法的特殊之处。
摘要:Theoretical and applied research into privacy encompasses an incredibly broad swathe of differing approaches, emphasis and aims. This work introduces a new quantitative notion of privacy that is both contextual and specific. We argue that it provides a more meaningful notion of privacy than the widely utilised framework of differential privacy and a more explicit and rigorous formulation than what is commonly used in statistical disclosure theory. Our definition relies on concepts inherent to standard Bayesian decision theory, while departing from it in several important respects. In particular, the party controlling the release of sensitive information should make disclosure decisions from the prior viewpoint, rather than conditional on the data, even when the data is itself observed. Illuminating toy examples and computational methods are discussed in high detail in order to highlight the specificities of the method.


半/弱/无/有监督|不确定性|主动学习(5篇)

【1】Accurate and Efficient Hybrid-Ensemble Atmospheric Data Assimilation in Latent Space with Uncertainty Quantification
标题:潜空间中准确高效且带不确定性量化的混合集成大气数据同化
链接:https://arxiv.org/abs/2603.04395

作者:Hang Fan,Juan Nathaniel,Yi Xiao,Ce Bian,Fenghua Ling,Ben Fei,Lei Bai,Pierre Gentine
备注:23 pages, 12 figures
摘要:数据同化(DA)将模式预报和观测相结合,以估计大气的最优状态及其不确定性,为天气预报提供初始条件,并为气候研究提供再分析资料。然而,现有的传统和机器学习DA方法很难同时实现准确性、效率和不确定性量化。在这里,我们提出了HLOBA(混合集成潜在观测-背景同化),一种在通过自编码器(AE)学习到的大气潜空间中运行的三维混合集成DA方法。HLOBA分别通过AE编码器和端到端的观测到潜空间映射网络(O2Lnet)将模式预报和观测映射到共享的潜空间,并通过贝叶斯更新将二者融合,其权重由时滞集合预报推断得到。理想化和真实观测实验均表明,HLOBA在分析和预报技能上与受动力约束的四维DA方法相当,同时实现了端到端推理级别的效率,并在理论上可灵活适用于任何预报模型。此外,通过利用潜变量的误差去相关特性,HLOBA能够对其潜空间分析进行逐元素的不确定性估计,并通过解码器将其传播到模式空间。理想化实验表明,这种不确定性突出了大误差区域,并捕捉到了它们的季节性变化。
摘要:Data assimilation (DA) combines model forecasts and observations to estimate the optimal state of the atmosphere with its uncertainty, providing initial conditions for weather prediction and reanalyses for climate research. Yet, existing traditional and machine-learning DA methods struggle to achieve accuracy, efficiency and uncertainty quantification simultaneously. Here, we propose HLOBA (Hybrid-Ensemble Latent Observation-Background Assimilation), a three-dimensional hybrid-ensemble DA method that operates in an atmospheric latent space learned via an autoencoder (AE). HLOBA maps both model forecasts and observations into a shared latent space via the AE encoder and an end-to-end Observation-to-Latent-space mapping network (O2Lnet), respectively, and fuses them through a Bayesian update with weights inferred from time-lagged ensemble forecasts. Both idealized and real-observation experiments demonstrate that HLOBA matches dynamically constrained four-dimensional DA methods in both analysis and forecast skill, while achieving end-to-end inference-level efficiency and theoretical flexibility applies to any forecasting model. Moreover, by exploiting the error decorrelation property of latent variables, HLOBA enables element-wise uncertainty estimates for its latent analysis and propagates them to model space via the decoder. Idealized experiments show that this uncertainty highlights large-error regions and captures their seasonal variability.
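在误差去相关(逐元素近似独立)的假设下,HLOBA式的潜空间贝叶斯更新可以写成逐元素的精度加权平均:背景方差来自时滞集合预报的离散度,观测方差来自观测映射的误差估计。以下纯Python草图示意这一融合步骤(变量名均为示意,并非论文代码):

```python
def latent_analysis(background, observation, bg_var, obs_var):
    """Element-wise Bayesian update in latent space: fuse the background
    (forecast) latent vector with the observation-derived latent vector,
    weighting each by its inverse variance (precision)."""
    analysis, analysis_var = [], []
    for xb, y, vb, vo in zip(background, observation, bg_var, obs_var):
        w = vo / (vb + vo)                        # weight on the background
        analysis.append(w * xb + (1 - w) * y)
        analysis_var.append(vb * vo / (vb + vo))  # posterior variance
    return analysis, analysis_var
```

当背景方差大于观测方差时,分析更靠近观测;后验方差总是小于两者中的任何一个。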


【2】Unsupervised Surrogate-Assisted Synthesis of Free-Form Planar Antenna Topologies for IoT Applications
标题:用于物联网应用的自由形式平面天线布局的无监督代理辅助合成
链接:https://arxiv.org/abs/2603.03802

作者:Khadijeh Askaripour,Adrian Bekasiewicz,Slawomir Koziel
摘要:物联网(IoT)应用中天线结构的设计是一个具有挑战性的问题。当代辐射器通常受到许多电气和/或辐射相关要求的约束,同时也受到物联网系统和/或预期工作环境特殊性的约束。天线设计的传统方法通常涉及与调谐交织在一起的拓扑手动开发。虽然这种方法被证明是有用的,但它容易出错并带有工程偏差。或者,可以在没有设计者监督的情况下生成和优化几何形状。该过程可以由合适的算法控制,以根据规范确定并随后调整天线几何形状。不幸的是,物联网辐射器的自动设计伴随着诸如确定理想几何形状或高优化成本等挑战。在这项工作中,提出了一个可变保真度框架,用于以性能为导向地开发采用通用仿真模型表示的自由形式天线。该方法采用代理辅助分类器,能够从一组自动生成的(并存储以供潜在重用的)候选设计中识别合适的辐射器拓扑。然后,使用基于梯度的优化引擎对所获得的几何形状进行两阶段调整。所提出的框架通过六个数值实验进行了验证,涉及分别专用于5 GHz至6 GHz和6 GHz至7 GHz频段的带宽增强型贴片天线的无监督开发。文中还对该方法及所生成的拓扑进行了广泛的基准测试。
摘要 :Design of antenna structures for Internet of Things (IoT) applications is a challenging problem. Contemporary radiators are often subject to a number of electric and/or radiation-related requirements, but also constraints imposed by specifics of IoT systems and/or intended operational environments. Conventional approaches to antenna design typically involve manual development of topology intertwined with its tuning. Although proved useful, the approach is prone to errors and engineering bias. Alternatively, geometries can be generated and optimized without supervision of the designer. The process can be controlled by suitable algorithms to determine and then adjust the antenna geometry according to the specifications. Unfortunately, automatic design of IoT radiators is associated with challenges such as determination of desirable geometries or high optimization cost. In this work, a variable-fidelity framework for performance-oriented development of free-form antennas represented using the generic simulation models is proposed. The method employs a surrogate-assisted classifier capable of identifying a suitable radiator topology from a set of automatically generated (and stored for potential re-use) candidate designs. The obtained geometry is then subject to a bi-stage tuning performed using a gradient-based optimization engine. The presented framework is demonstrated based on six numerical experiments concerning unsupervised development of bandwidth-enhanced patch antennas dedicated to work within 5 GHz to 6 GHz and 6 GHz to 7 GHz bands, respectively. Extensive benchmarks of the method, as well as the generated topologies are also performed.


【3】A Rubric-Supervised Critic from Sparse Real-World Outcomes
标题:来自稀疏现实世界结果的规则监督评论家
链接:https://arxiv.org/abs/2603.03800

作者:Xingyao Wang,Valerie Chen,Heng Ji,Graham Neubig
摘要:面向编码代理的学术基准倾向于奖励自主完成任务,以单元测试通过等可验证奖励来衡量。相比之下,现实世界的编码代理是在人类参与下运行的,其中成功信号通常是嘈杂的、延迟的和稀疏的。我们如何弥合这一差距?在本文中,我们提出了一个从稀疏且嘈杂的交互数据中学习"评论家"模型的过程,该模型随后可以用作基于RL的训练或推理时扩展的奖励模型。具体来说,我们介绍了Critic Rubrics,这是一个基于规则(rubric)的监督框架,包含24个仅凭人与代理的交互轨迹即可得到的行为特征。使用半监督目标,我们可以联合预测这些规则和稀疏的人类反馈(当其存在时)。在实验中,我们证明,尽管主要从轨迹可观察的规则和稀疏的现实世界结果代理信号进行训练,但这些评论家改善了SWE-bench上的best-of-N重排序(在可重排序的轨迹子集上,Best@8比Random@8高15.9),实现了提前停止(+17.7,且尝试次数减少83%),并通过评论家选出的轨迹支持训练时数据整理。
摘要:Academic benchmarks for coding agents tend to reward autonomous task completion, measured by verifiable rewards such as unit-test success. In contrast, real-world coding agents operate with humans in the loop, where success signals are typically noisy, delayed, and sparse. How can we bridge this gap? In this paper, we propose a process to learn a "critic" model from sparse and noisy interaction data, which can then be used both as a reward model for either RL-based training or inference-time scaling. Specifically, we introduce Critic Rubrics, a rubric-based supervision framework with 24 behavioral features that can be derived from human-agent interaction traces alone. Using a semi-supervised objective, we can then jointly predict these rubrics and sparse human feedback (when present). In experiments, we demonstrate that, despite being trained primarily from trace-observable rubrics and sparse real-world outcome proxies, these critics improve best-of-N reranking on SWE-bench (Best@8 +15.9 over Random@8 over the rerankable subset of trajectories), enable early stopping (+17.7 with 83% fewer attempts), and support training-time data curation via critic-selected trajectories.
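摘要中的半监督目标可以概括为:规则(rubric)特征在每条轨迹上都有监督信号,而稀疏的人类反馈只有在存在时才加入损失。下面给出一个假设性的损失函数草图(均方误差形式与权重alpha均为示意选择,并非论文公式):

```python
def semi_supervised_loss(rubric_pred, rubric_true, outcome_pred, outcome_true, alpha=1.0):
    """Joint objective sketch: always supervise the trace-observable
    rubric features; add the sparse-outcome term only when a human
    feedback label exists (outcome_true is not None)."""
    rubric_loss = sum((p - t) ** 2
                      for p, t in zip(rubric_pred, rubric_true)) / len(rubric_true)
    if outcome_true is None:  # no human feedback for this trace
        return rubric_loss
    return rubric_loss + alpha * (outcome_pred - outcome_true) ** 2
```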


【4】Directional Neural Collapse Explains Few-Shot Transfer in Self-Supervised Learning
标题:定向神经崩溃解释了自我监督学习中的Few-Shot迁移
链接:https://arxiv.org/abs/2603.03530

作者:Achleshwar Luthra,Yash Salunkhe,Tomer Galanti
摘要:冻结的自监督表示通常在许多语义任务中仅用少量标签就能很好地迁移。我们认为,一个单一的几何量,即方向性CDNV(决策轴方差),是两种有利行为的核心:任务内强大的Few-Shot迁移,以及跨多任务的低干扰。我们表明,当沿类分离方向的变异性很小时,这两者都会出现。首先,我们为下游分类证明了精确的非渐近多类泛化界,其主导项是方向性CDNV。该界包含有限样本(finite-shot)修正项,将固有的决策轴变异与质心估计误差清晰地分离开来。其次,我们将决策轴坍缩与多任务几何联系起来:对于独立的平衡标注,跨任务的小方向性CDNV迫使相应的决策轴近乎正交,从而帮助单一表示以最小的干扰支持众多任务。在经验上,在各种SSL目标中,即使经典CDNV仍然很大,方向性CDNV也会在预训练期间坍缩,并且我们的界在实际样本量下密切跟踪Few-Shot误差。此外,在合成多任务数据上,我们验证了SSL学习到的表示所诱导的决策轴近乎正交。论文的代码和项目页面见[\href{https://dlfundamentals.github.io/directional-neural-collapse/}{project page}]。
摘要:Frozen self-supervised representations often transfer well with only a few labels across many semantic tasks. We argue that a single geometric quantity, \emph{directional} CDNV (decision-axis variance), sits at the core of two favorable behaviors: strong few-shot transfer within a task, and low interference across many tasks. We show that both emerge when variability \emph{along} class-separating directions is small. First, we prove sharp non-asymptotic multiclass generalization bounds for downstream classification whose leading term is the directional CDNV. The bounds include finite-shot corrections that cleanly separate intrinsic decision-axis variability from centroid-estimation error. Second, we link decision-axis collapse to multitask geometry: for independent balanced labelings, small directional CDNV across tasks forces the corresponding decision axes to be nearly orthogonal, helping a single representation support many tasks with minimal interference. Empirically, across SSL objectives, directional CDNV collapses during pretraining even when classical CDNV remains large, and our bounds closely track few-shot error at practical shot sizes. Additionally, on synthetic multitask data, we verify that SSL learns representations whose induced decision axes are nearly orthogonal. The code and project page of the paper are available at [\href{https://dlfundamentals.github.io/directional-neural-collapse/}{project page}].
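方向性CDNV可以在类特征上用几行代码计算。以下为示意实现(公式形式按摘要含义假设:沿两类均值连线方向的类内方差,除以均值距离平方的两倍):

```python
import math

def class_mean(vs):
    d = len(vs[0])
    return [sum(v[k] for v in vs) / len(vs) for k in range(d)]

def var_along(vs, mu, u):
    """Variance of the features projected onto unit direction u."""
    return sum(
        sum((v[k] - mu[k]) * u[k] for k in range(len(u))) ** 2 for v in vs
    ) / len(vs)

def directional_cdnv(class_a, class_b):
    """Within-class variance along the decision axis (mean-difference
    direction), normalized by the squared distance between class means."""
    mu_a, mu_b = class_mean(class_a), class_mean(class_b)
    diff = [a - b for a, b in zip(mu_a, mu_b)]
    dist2 = sum(d * d for d in diff)
    u = [d / math.sqrt(dist2) for d in diff]
    return (var_along(class_a, mu_a, u) + var_along(class_b, mu_b, u)) / (2 * dist2)

# Two classes separated along x, with all their variance along y:
# the directional CDNV is zero even though the total variance is not.
a = [(0.0, -1.0), (0.0, 1.0)]
b = [(4.0, -1.0), (4.0, 1.0)]
dc = directional_cdnv(a, b)
```

这个例子也直观展示了摘要中的现象:经典CDNV(用全方差)不为零,而方向性CDNV已经塌缩到零。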


【5】Semi-Supervised Generative Learning via Latent Space Distribution Matching
标题:通过潜在空间分布匹配的半监督生成学习
链接:https://arxiv.org/abs/2603.04223

作者:Kwong Yu Chong,Long Feng
摘要:我们介绍了潜在空间分布匹配(LSDM),一个新的框架,半监督生成建模的条件分布。LSDM分两个阶段运行:(i)从配对和未配对的数据中学习低维潜在空间,以及(ii)仅使用配对数据,通过1-Wasserstein距离在该空间中执行联合分布匹配。这种两步方法最小化了联合分布之间1-Wasserstein距离的上限,减少了对稀缺配对样本的依赖,同时实现了快速的一步生成。从理论上讲,我们建立了非渐近误差界,并证明了非配对数据的一个关键好处:增强生成的输出的几何保真度。此外,通过扩展其两个核心步骤的范围,LSDM提供了一个连贯的统计视角,连接到一个广泛的潜在空间方法。值得注意的是,潜在扩散模型(LDM)可以被视为LSDM的变体,其中联合分布匹配通过分数匹配间接实现。因此,我们的研究结果也提供了理论见解的LDM的一致性。对真实世界图像任务的实证评估,包括类条件生成和图像超分辨率,证明了LSDM在利用未配对数据提高生成质量方面的有效性。
摘要:We introduce Latent Space Distribution Matching (LSDM), a novel framework for semi-supervised generative modeling of conditional distributions. LSDM operates in two stages: (i) learning a low-dimensional latent space from both paired and unpaired data, and (ii) performing joint distribution matching in this space via the 1-Wasserstein distance, using only paired data. This two-step approach minimizes an upper bound on the 1-Wasserstein distance between joint distributions, reducing reliance on scarce paired samples while enabling fast one-step generation. Theoretically, we establish non-asymptotic error bounds and demonstrate a key benefit of unpaired data: enhanced geometric fidelity in generated outputs. Furthermore, by extending the scope of its two core steps, LSDM provides a coherent statistical perspective that connects to a broad class of latent-space approaches. Notably, Latent Diffusion Models (LDMs) can be viewed as a variant of LSDM, in which joint distribution matching is achieved indirectly via score matching. Consequently, our results also provide theoretical insights into the consistency of LDMs. Empirical evaluations on real-world image tasks, including class-conditional generation and image super-resolution, demonstrate the effectiveness of LSDM in leveraging unpaired data to enhance generation quality.
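在一维潜变量上,经验1-Wasserstein距离有封闭形式:按次序统计量配对即最优耦合。下面的示意与论文的具体实现无关,仅说明潜在空间分布匹配所用的距离本身:

```python
def wasserstein_1d(xs, ys):
    """Empirical 1-Wasserstein distance between equal-size 1-D samples.
    In one dimension the optimal coupling matches sorted order statistics,
    so W1 is the mean absolute difference of the sorted samples."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

# Shifting a sample by a constant shifts W1 by exactly that constant.
w = wasserstein_1d([0.0, 1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0])
```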


迁移|Zero/Few/One-Shot|自适应(10篇)

【1】Robust Unscented Kalman Filtering via Recurrent Meta-Adaptation of Sigma-Point Weights
标题:通过西格玛点权重的循环元自适应实现鲁棒无迹卡尔曼滤波
链接:https://arxiv.org/abs/2603.04360

作者:Kenan Majewski,Michał Modzelewski,Marcin Żugaj,Piotr Lichota
备注:8 pages, 3 figures, Submitted to the 29th International Conference on Information Fusion (FUSION 2026)
摘要:无迹卡尔曼滤波(UKF)是一种普遍存在的非线性状态估计工具,然而,其性能受到无迹变换(UT)的静态参数化的限制。传统的加权方案,由固定的缩放参数,假设隐式高斯性,不能适应时变动态或重尾测量噪声。这项工作介绍了元自适应UKF(MA-UKF),这是一个框架,它将sigma点权重合成重新表述为通过内存增强元学习解决的超参数优化问题。与依赖于瞬时启发式校正的标准自适应滤波器不同,我们的方法采用了递归上下文编码器来将测量创新的历史压缩到紧凑的潜在嵌入中。这种嵌入通知策略网络,该策略网络在每个时间步长动态合成sigma点的均值和协方差权重,有效地控制滤波器对预测与测量的信任。通过滤波器的递归逻辑优化系统端到端,MA-UKF学习最大化跟踪精度,同时保持估计一致性。机动目标的数值基准测试表明,MA-UKF显着优于标准基线,表现出优越的鲁棒性非高斯闪烁噪声和有效的推广到分布外(OOD)的动态制度在训练过程中看不到。
摘要:The Unscented Kalman Filter (UKF) is a ubiquitous tool for nonlinear state estimation; however, its performance is limited by the static parameterization of the Unscented Transform (UT). Conventional weighting schemes, governed by fixed scaling parameters, assume implicit Gaussianity and fail to adapt to time-varying dynamics or heavy-tailed measurement noise. This work introduces the Meta-Adaptive UKF (MA-UKF), a framework that reformulates sigma-point weight synthesis as a hyperparameter optimization problem addressed via memory-augmented meta-learning. Unlike standard adaptive filters that rely on instantaneous heuristic corrections, our approach employs a Recurrent Context Encoder to compress the history of measurement innovations into a compact latent embedding. This embedding informs a policy network that dynamically synthesizes the mean and covariance weights of the sigma points at each time step, effectively governing the filter's trust in the prediction versus the measurement. By optimizing the system end-to-end through the filter's recursive logic, the MA-UKF learns to maximize tracking accuracy while maintaining estimation consistency. Numerical benchmarks on maneuvering targets demonstrate that the MA-UKF significantly outperforms standard baselines, exhibiting superior robustness to non-Gaussian glint noise and effective generalization to out-of-distribution (OOD) dynamic regimes unseen during training.
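作为参照,静态无迹变换的均值/协方差权重由固定缩放参数 $(\alpha,\beta,\kappa)$ 给出;MA-UKF 所做的正是用策略网络的输出取代这组固定权重。以下是标准公式的示意实现:

```python
def ut_weights(n, alpha=1e-3, beta=2.0, kappa=0.0):
    """Classic unscented-transform weights for an n-dimensional state:
    2n+1 sigma points, mean weights wm and covariance weights wc."""
    lam = alpha ** 2 * (n + kappa) - n
    wm = [lam / (n + lam)] + [1.0 / (2.0 * (n + lam))] * (2 * n)
    wc = list(wm)
    wc[0] += 1.0 - alpha ** 2 + beta  # extra term on the central point
    return wm, wc

wm, wc = ut_weights(n=3)
```

均值权重之和为1,这正是学习型权重合成需要保持(或显式放松)的约束之一。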


【2】Hierarchical Inference and Closure Learning via Adaptive Surrogates for ODEs and PDEs
标题:基于ODE与PDE自适应代理模型的分层推理与闭包学习
链接:https://arxiv.org/abs/2603.03922

作者:Pengyu Zhang,Arnaud Vadeboncoeur,Alex Glyn-Davies,Mark Girolami
摘要:逆问题是校准模型以匹配数据的任务。它们在各种工程应用中发挥着关键作用,允许从业者将模型与现实相结合。在许多应用中,工程师和科学家并不了解i)系统的详细属性(例如材料属性、几何形状、初始条件等); ii)描述所有起作用的动力学的完整定律(如摩擦定律、复杂的阻尼现象和一般的非线性相互作用)。在本文中,我们开发了一种原则性的方法,用于利用来自不同但相关的物理系统集合的数据来联合估计每个系统的单个模型参数,并以基于ML的闭包模型的形式学习共享的未知动态。为了鲁棒地推断每个系统的未知参数,我们采用了分层贝叶斯框架,该框架允许多个系统及其人口水平统计的联合推断。为了学习闭包,我们使用嵌入问题的ODE/PDE公式中的神经网络的最大边际似然估计。为了实现这一框架,我们利用集成Metropolis-Adjusted Langevin算法(MALA)的稳定和有效的采样。为了减轻计算瓶颈的重复正向评估在解决反问题,我们引入了一个双层优化策略,同时训练一个代理的前向模型旁边的推理。在这个框架内,我们评估和比较不同的代理架构,特别是傅立叶神经运算符(FNO)和参数物理信息神经网络(PINNs)。
摘要:Inverse problems are the task of calibrating models to match data. They play a pivotal role in diverse engineering applications by allowing practitioners to align models with reality. In many applications, engineers and scientists do not have a complete picture of i) the detailed properties of a system (such as material properties, geometry, initial conditions, etc.); ii) the complete laws describing all dynamics at play (such as friction laws, complicated damping phenomena, and general nonlinear interactions). In this paper, we develop a principled methodology for leveraging data from collections of distinct yet related physical systems to jointly estimate the individual model parameters of each system, and learn the shared unknown dynamics in the form of an ML-based closure model. To robustly infer the unknown parameters for each system, we employ a hierarchical Bayesian framework, which allows for the joint inference of multiple systems and their population-level statistics. To learn the closures, we use a maximum marginal likelihood estimate of a neural network embedded within the ODE/PDE formulation of the problem. To realize this framework, we utilize the ensemble Metropolis-Adjusted Langevin Algorithm (MALA) for stable and efficient sampling. To mitigate the computational bottleneck of repetitive forward evaluations in solving inverse problems, we introduce a bilevel optimization strategy to simultaneously train a surrogate forward model alongside the inference. Within this framework, we evaluate and compare distinct surrogate architectures, specifically Fourier Neural Operators (FNO) and parametric Physics-Informed Neural Networks (PINNs).
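文中用于采样的 MALA 单步更新可示意如下(一维标准正态目标的玩具例子,与论文的集成版本及具体模型无关):

```python
import math, random

def mala_step(x, log_p, grad_log_p, eps, rng):
    """One Metropolis-Adjusted Langevin step: Langevin proposal plus a
    Metropolis correction with the asymmetric Gaussian proposal density."""
    drift = lambda z: z + 0.5 * eps * grad_log_p(z)
    prop = drift(x) + math.sqrt(eps) * rng.gauss(0.0, 1.0)
    log_q = lambda b, a: -((b - drift(a)) ** 2) / (2.0 * eps)
    log_alpha = log_p(prop) - log_p(x) + log_q(x, prop) - log_q(prop, x)
    return prop if math.log(rng.random()) < log_alpha else x

rng = random.Random(0)
log_p = lambda z: -0.5 * z * z  # standard normal target, unnormalized
grad_log_p = lambda z: -z

x, xs = 3.0, []
for _ in range(5000):
    x = mala_step(x, log_p, grad_log_p, eps=0.5, rng=rng)
    xs.append(x)
mean_est = sum(xs) / len(xs)
var_est = sum(v * v for v in xs) / len(xs)
```

链的经验均值和二阶矩应分别接近目标分布的0和1。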


【3】IROSA: Interactive Robot Skill Adaptation using Natural Language
标题:IROSA:使用自然语言的交互式机器人技能适应
链接:https://arxiv.org/abs/2603.03897

作者:Markus Knauer,Samuel Bustamante,Thomas Eiband,Alin Albu-Schäffer,Freek Stulp,João Silvério
备注:Accepted IEEE Robotics and Automation Letters (RA-L) journal, 8 pages, 5 figures, 3 tables, 1 listing
摘要:基础模型在不同的领域已经展示了令人印象深刻的能力,而模仿学习为机器人从有限的数据中进行技能调整提供了原则性的方法。将这些方法结合起来,对于直接应用于机器人技术具有重要的前景,但这种结合受到的关注有限,特别是对于工业部署。我们提出了一个新的框架,使开放的词汇技能适应,通过基于工具的架构,保持语言模型和机器人硬件之间的保护抽象层。我们的方法利用预先训练的LLM来选择和参数化特定的工具,以适应机器人技能,而不需要微调或直接的模型与机器人交互。我们在一个7自由度的扭矩控制机器人上演示了该框架,该机器人执行工业轴承环插入任务,通过自然语言命令进行速度调整,轨迹校正和避障,同时保持安全性,透明度和可解释性,显示了成功的技能适应。
摘要:Foundation models have demonstrated impressive capabilities across diverse domains, while imitation learning provides principled methods for robot skill adaptation from limited data. Combining these approaches holds significant promise for direct application to robotics, yet this combination has received limited attention, particularly for industrial deployment. We present a novel framework that enables open-vocabulary skill adaptation through a tool-based architecture, maintaining a protective abstraction layer between the language model and robot hardware. Our approach leverages pre-trained LLMs to select and parameterize specific tools for adapting robot skills without requiring fine-tuning or direct model-to-robot interaction. We demonstrate the framework on a 7-DoF torque-controlled robot performing an industrial bearing ring insertion task, showing successful skill adaptation through natural language commands for speed adjustment, trajectory correction, and obstacle avoidance while maintaining safety, transparency, and interpretability.


【4】When and Where to Reset Matters for Long-Term Test-Time Adaptation
标题:何时何地重置是长期测试时间适应的重要因素
链接:https://arxiv.org/abs/2603.03796

作者:Taejun Lim,Joong-Won Hwang,Kibok Lee
备注:ICLR 2026
摘要:当连续测试时间自适应(TTA)长期持续时,错误会在模型中积累,并进一步导致它只能预测所有输入的几个类,这种现象称为模型崩溃。最近的研究探索了完全消除这些累积错误的重置策略。然而,它们的周期性重置导致次优适应,因为它们独立于实际崩溃风险而发生。此外,它们的完全重置会导致随着时间的推移而获得的知识的灾难性损失,即使这些知识在未来可能是有益的。为此,我们提出了(1)自适应和选择性重置(ASR)方案,动态确定何时何地重置,(2)重要性感知正则化器,以恢复由于重置而丢失的基本知识,以及(3)动态自适应调整方案,以增强具有挑战性的域偏移下的适应性。跨长期TTA基准的广泛实验证明了我们方法的有效性,特别是在具有挑战性的条件下。我们的代码可在https://github.com/YonseiML/asr上获得。
摘要:When continual test-time adaptation (TTA) persists over the long term, errors accumulate in the model and further cause it to predict only a few classes for all inputs, a phenomenon known as model collapse. Recent studies have explored reset strategies that completely erase these accumulated errors. However, their periodic resets lead to suboptimal adaptation, as they occur independently of the actual risk of collapse. Moreover, their full resets cause catastrophic loss of knowledge acquired over time, even though such knowledge could be beneficial in the future. To this end, we propose (1) an Adaptive and Selective Reset (ASR) scheme that dynamically determines when and where to reset, (2) an importance-aware regularizer to recover essential knowledge lost due to reset, and (3) an on-the-fly adaptation adjustment scheme to enhance adaptability under challenging domain shifts. Extensive experiments across long-term TTA benchmarks demonstrate the effectiveness of our approach, particularly under challenging conditions. Our code is available at https://github.com/YonseiML/asr.
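“何时重置”可由预测类别直方图的熵等塌缩信号触发,“何处重置”可只作用于偏离源模型最远的参数。下面是一个与论文实现无关的最小示意,其中 `tau`、`top_frac` 均为假设的超参数:

```python
import math

def class_entropy(pred_counts):
    """Entropy of the predicted-class histogram; near zero when the model
    collapses to predicting only a few classes."""
    total = sum(pred_counts.values())
    ps = [c / total for c in pred_counts.values() if c > 0]
    return -sum(p * math.log(p) for p in ps)

def adaptive_selective_reset(params, source, entropy, tau, top_frac=0.34):
    """When: only if entropy falls below tau (collapse risk).
    Where: only the fraction of parameters that drifted farthest."""
    if entropy >= tau:
        return dict(params)  # no collapse risk: keep adapting
    k = max(1, int(top_frac * len(params)))
    worst = sorted(params, key=lambda n: -abs(params[n] - source[n]))[:k]
    return {n: (source[n] if n in worst else v) for n, v in params.items()}

source = {"a": 1.0, "b": 1.0, "c": 1.0}
adapted = {"a": 1.9, "b": 1.05, "c": 0.98}
collapsed = class_entropy({"cat": 98, "dog": 2})                  # low entropy
healthy = class_entropy({"cat": 25, "dog": 25, "car": 25, "sky": 25})
after = adaptive_selective_reset(adapted, source, collapsed, tau=0.5)
```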


【5】TAP: A Token-Adaptive Predictor Framework for Training-Free Diffusion Acceleration
标题:TAP:无需训练的扩散加速的令牌自适应预测器框架
链接:https://arxiv.org/abs/2603.03792

作者:Haowei Zhu,Tingxuan Huang,Xing Wang,Tianyu Zhao,Jiexi Wang,Weifeng Chen,Xurui Peng,Fangmin Chen,Junhai Yong,Bin Wang
备注:Accepted by CVPR 2026
摘要:扩散模型实现了强大的生成性能,但由于需要重复的全模型去噪通道,因此推理速度仍然很慢。我们提出了令牌自适应预测器(TAP),一个无训练的,探测驱动的框架,自适应地选择一个预测每个令牌在每个采样步骤。TAP使用模型第一层的单个完整评估作为低成本探测器来计算候选预测器的紧凑系列的代理损失(主要用不同阶数和范围的泰勒展开来实例化),然后为每个令牌分配具有最小代理误差的预测器。这种每令牌“探测然后选择”策略利用异构的时间动态,不需要额外的训练,并且与各种预测器设计兼容。TAP带来的开销可以忽略不计,同时能够在几乎没有感知质量损失的情况下实现大幅加速。跨多个扩散架构和生成任务的广泛实验表明,TAP与固定的全局预测器和仅缓存基线相比,大大提高了准确性-效率边界。
摘要:Diffusion models achieve strong generative performance but remain slow at inference due to the need for repeated full-model denoising passes. We present Token-Adaptive Predictor (TAP), a training-free, probe-driven framework that adaptively selects a predictor for each token at every sampling step. TAP uses a single full evaluation of the model's first layer as a low-cost probe to compute proxy losses for a compact family of candidate predictors (instantiated primarily with Taylor expansions of varying order and horizon), then assigns each token the predictor with the smallest proxy error. This per-token "probe-then-select" strategy exploits heterogeneous temporal dynamics, requires no additional training, and is compatible with various predictor designs. TAP incurs negligible overhead while enabling large speedups with little or no perceptual quality loss. Extensive experiments across multiple diffusion architectures and generation tasks show that TAP substantially improves the accuracy-efficiency frontier compared to fixed global predictors and caching-only baselines.
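“探测-选择”的核心逻辑可以浓缩为:用廉价探测值为每个令牌比较各候选泰勒预测器的代理误差,并逐令牌取最小者。以下为脱离具体模型的标量示意(探测值用信号的下一步真值代替,仅作演示):

```python
def taylor_predictors(history):
    """Candidate per-token predictors from cached activations
    (oldest to newest): order-0 reuse vs. order-1 linear extrapolation."""
    return {"order0": history[-1], "order1": 2 * history[-1] - history[-2]}

def probe_then_select(histories, probes):
    """Assign each token the candidate with the smallest proxy error
    against the cheap probe value."""
    choices = {}
    for tok, h in histories.items():
        cands = taylor_predictors(h)
        choices[tok] = min(cands, key=lambda name: abs(cands[name] - probes[tok]))
    return choices

# A nearly static token favors caching; a steadily moving one favors
# first-order extrapolation.
histories = {"static": [1.00, 1.02], "moving": [1.0, 2.0]}
probes = {"static": 1.01, "moving": 3.0}  # stand-in for the first-layer probe
choices = probe_then_select(histories, probes)
```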


【6】Adaptive Sensing of Continuous Physical Systems for Machine Learning
标题:用于机器学习的连续物理系统的自适应感知
链接:https://arxiv.org/abs/2603.03650

作者:Felix Köster,Atsushi Uchida
摘要:物理动力系统可以被看作是自然的信息处理器:它们的系统保存、变换和分散输入信息。这种观点不仅激发了学习这些系统生成的数据,而且还激发了学习如何以提取给定任务最有用信息的方式来测量它们。我们提出了一个通用的计算框架,用于从动态系统中提取自适应信息,其中可训练的注意力模块学习在哪里探测系统状态以及如何结合这些测量来优化预测性能。作为一个具体的实例,我们实现了这个想法,使用一个时空场的偏微分方程作为基础动力学,虽然框架同样适用于任何系统的状态可以采样。我们的研究结果表明,自适应空间感测显着提高预测精度规范混沌基准。这项工作提供了一个观点,注意力增强水库计算作为一个更广泛的范例的特殊情况下:神经网络作为可训练的测量设备,从物理动力系统中提取信息。
摘要:Physical dynamical systems can be viewed as natural information processors: their systems preserve, transform, and disperse input information. This perspective motivates learning not only from data generated by such systems, but also how to measure them in a way that extracts the most useful information for a given task. We propose a general computing framework for adaptive information extraction from dynamical systems, in which a trainable attention module learns both where to probe the system state and how to combine these measurements to optimize prediction performance. As a concrete instantiation, we implement this idea using a spatiotemporal field governed by a partial differential equation as the underlying dynamics, though the framework applies equally to any system whose state can be sampled. Our results show that adaptive spatial sensing significantly improves prediction accuracy on canonical chaotic benchmarks. This work provides a perspective on attention-enhanced reservoir computing as a special case of a broader paradigm: neural networks as trainable measurement devices for extracting information from physical dynamical systems.
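“可训练的测量装置”这一思想可以用一个小例子体会:对若干候选探测点的读数做注意力加权求和,并用下游预测误差训练注意力logits,使权重自动集中到信息量最大的位置。以下为纯演示,目标函数、探测点数与有限差分训练方式均为假设:

```python
import math, random

def softmax(zs):
    m = max(zs)
    es = [math.exp(z - m) for z in zs]
    s = sum(es)
    return [e / s for e in es]

def attended_readout(samples, logits):
    """Attention-weighted combination of measurements at candidate sites."""
    return sum(w * s for w, s in zip(softmax(logits), samples))

def batch_loss(logits, n=40):
    """Squared error for predicting the signal that lives at site 2,
    on a fixed batch of random fields (site 2 is the informative probe)."""
    rng = random.Random(1)
    total = 0.0
    for _ in range(n):
        field = [rng.gauss(0.0, 1.0) for _ in range(4)]
        total += (attended_readout(field, logits) - field[2]) ** 2
    return total / n

logits = [0.0] * 4
for _ in range(200):  # finite-difference gradient descent on the logits
    grad = []
    for i in range(4):
        up = list(logits); up[i] += 1e-4
        dn = list(logits); dn[i] -= 1e-4
        grad.append((batch_loss(up) - batch_loss(dn)) / 2e-4)
    logits = [l - 0.5 * g for l, g in zip(logits, grad)]
weights = softmax(logits)
```

训练结束后,注意力权重应集中在真正携带目标信息的第2个探测点上。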


【7】ByteFlow: Language Modeling through Adaptive Byte Compression without a Tokenizer
标题:ByteFlow:通过自适应字节压缩进行语言建模,无需Tokenizer
链接:https://arxiv.org/abs/2603.03583

作者:Chunyuan Deng,Sanket Lokegaonkar,Colin Lockard,Besnik Fetahu,Nasser Zalmout,Xian Li
备注:ICLR 2026
摘要:现代语言模型仍然依赖于固定的、预定义的子词标记化。一旦一个分词器被训练好,LM就只能在这个固定的粒度级别上运行,这通常会导致脆弱和违反直觉的行为,即使在其他强大的推理模型中也是如此。我们引入了\textbf{ByteFlow Net},这是一种新的分层架构,它完全删除了标记器,而是使模型能够学习自己将原始字节流分割成语义上有意义的单元。ByteFlow Net基于潜在表示的编码率执行压缩驱动的分割,产生自适应边界,同时通过Top-$K$选择保留静态计算图。与之前依赖于人为设计归纳偏置、基于脆弱启发式方法的自标记化方法不同,ByteFlow Net将其内部表示粒度调整为输入本身。实验表明,这种基于压缩的分块策略产生了显著的性能增益,ByteFlow Net的性能优于基于BPE的Transformers和以前的字节级架构。这些结果表明,端到端的无标记建模不仅可行,而且更有效,为更适应和基于信息的语言模型开辟了道路。
摘要:Modern language models still rely on fixed, pre-defined subword tokenizations. Once a tokenizer is trained, the LM can only operate at this fixed level of granularity, which often leads to brittle and counterintuitive behaviors even in otherwise strong reasoning models. We introduce \textbf{ByteFlow Net}, a new hierarchical architecture that removes tokenizers entirely and instead enables models to learn their own segmentation of raw byte streams into semantically meaningful units. ByteFlow Net performs compression-driven segmentation based on the coding rate of latent representations, yielding adaptive boundaries \emph{while preserving a static computation graph via Top-$K$ selection}. Unlike prior self-tokenizing methods that depend on brittle heuristics with human-designed inductive biases, ByteFlow Net adapts its internal representation granularity to the input itself. Experiments demonstrate that this compression-based chunking strategy yields substantial performance gains, with ByteFlow Net outperforming both BPE-based Transformers and previous byte-level architectures. These results suggest that end-to-end, tokenizer-free modeling is not only feasible but also more effective, opening a path toward more adaptive and information-grounded language models.
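“基于压缩的分块”可以用一个极简代理来体会:以在线一元组模型的惊异度(编码率的粗糙代理)为每个字节打分,再用固定大小的Top-$K$选取边界,从而保持静态计算图。以下示意与 ByteFlow Net 的真实打分函数无关:

```python
import math
from collections import Counter

def surprisal_scores(data):
    """Per-byte surprisal under a running add-one-smoothed unigram model:
    a crude proxy for coding rate; spikes suggest a new segment begins."""
    counts, scores = Counter(), []
    for i, b in enumerate(data):
        p = (counts[b] + 1) / (i + 256)  # Laplace smoothing over 256 symbols
        scores.append(-math.log2(p))
        counts[b] += 1
    return scores

def topk_boundaries(data, k):
    """Fixed-size Top-K keeps the boundary computation static."""
    scores = surprisal_scores(data)
    return sorted(sorted(range(len(data)), key=lambda i: -scores[i])[:k])

text = b"aaaaaaaabbbbbbbbaaaaaaaa"
bounds = topk_boundaries(text, k=3)
```

在这个例子中,首个字节和 a→b 切换处的惊异度最高,因而被选为边界。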


【8】Test-Time Meta-Adaptation with Self-Synthesis
标题:具有自我合成的测试时元适应
链接:https://arxiv.org/abs/2603.03524

作者:Zeyneb N. Kaya,Nick Rui
备注:5 pages, 2 figures, 1 table. Accepted to ICLR 2026 LIT and Data-FM Workshops
摘要:作为强大的通用推理机,大型语言模型(LLM)遇到不同的领域和任务,在测试时适应和自我改进的能力是有价值的。我们引入了MASS,这是一个元学习框架,它使LLM能够通过生成特定于问题的合成训练数据并在推理时执行针对下游性能优化的有针对性的自我更新来进行自适应。我们通过双层优化来训练这种行为:内部循环适应自我生成的示例,而外部循环元学习数据属性信号并奖励更新后的任务性能。合成数据使用可扩展的元梯度进行优化,通过内部更新反向传播下游损失以奖励有用的世代。数学推理实验表明,MASS学会综合每个实例的课程,从而产生有效的、数据高效的测试时间适应。
摘要:As strong general reasoners, large language models (LLMs) encounter diverse domains and tasks, where the ability to adapt and self-improve at test time is valuable. We introduce MASS, a meta-learning framework that enables LLMs to self-adapt by generating problem-specific synthetic training data and performing targeted self-updates optimized for downstream performance at inference time. We train this behavior end-to-end via bilevel optimization: an inner loop adapts on self-generated examples while an outer loop meta-learns data-attribution signals and rewards post-update task performance. The synthetic data is optimized with scalable meta-gradients, backpropagating the downstream loss through the inner updates to reward useful generations. Experiments on mathematical reasoning show that MASS learns to synthesize per-instance curricula that yield effective, data-efficient test-time adaptation.
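双层优化的骨架可以用一个标量玩具复现:内环在合成样本 s 上做一步梯度更新,外环对更新后的下游损失求元梯度(此处用中心差分近似)来优化 s。以下内外损失函数均为假设性演示:

```python
def inner_update(w, s, lr=0.5):
    """Inner loop: one gradient step on the synthetic example s,
    with squared loss (w - s)^2 whose gradient is 2 (w - s)."""
    return w - lr * 2.0 * (w - s)

def outer_loss(w_adapted, target=3.0):
    """Downstream performance after the self-update."""
    return (w_adapted - target) ** 2

def meta_step(w, s, meta_lr=0.1, h=1e-4):
    """Outer loop: central-difference meta-gradient of the downstream loss
    through the inner update, used to improve the synthetic data s."""
    g = (outer_loss(inner_update(w, s + h))
         - outer_loss(inner_update(w, s - h))) / (2.0 * h)
    return s - meta_lr * g

w0, s = 0.0, 0.0
for _ in range(200):
    s = meta_step(w0, s)
final_loss = outer_loss(inner_update(w0, s))
```

外环最终学到的合成样本 s 恰好使内环一步更新后的下游损失趋近于零。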


【9】Observationally Informed Adaptive Causal Experimental Design
标题:观察知情的适应性因果实验设计
链接:https://arxiv.org/abs/2603.03785

作者:Erdun Gao,Liang Zhang,Jake Fawkes,Aoqi Zuo,Wenqin Liu,Haoxuan Li,Mingming Gong,Dino Sejdinovic
摘要:随机对照试验(RCT)代表了因果推断的黄金标准,但仍然是一种稀缺资源。虽然大规模观察性数据通常可用,但仅用于回顾性融合,由于偏倚问题,在前瞻性试验设计中仍被丢弃。我们认为这种“白板”数据采集策略从根本上是低效的。在这项工作中,我们提出了主动残差学习,一种新的范式,利用观察模型作为基础先验。这种方法将实验的重点从从头开始学习目标因果量转移到有效地估计校正观测偏差所需的残差。为了实现这一点,我们引入了R-Design框架。从理论上讲,我们建立了两个关键优势:(1)结构效率差距,证明估计平滑的残差对比比重建完整结果具有严格更快的收敛速度;以及(2)信息效率,其中我们量化了标准基于参数的获取方法(例如BALD)中的冗余,表明此类基线将预算浪费在与任务无关的滋扰不确定性上。我们提出了R-EPIG(残差预期预测信息增益),这是一个直接针对因果被估量的统一准则:在估计任务中最小化残差不确定性,在策略任务中澄清决策边界。在合成和半合成基准测试上的实验表明,R-Design的性能明显优于基线,这证实了修复有偏差的模型比从头开始学习模型要有效得多。
摘要:Randomized Controlled Trials (RCTs) represent the gold standard for causal inference yet remain a scarce resource. While large-scale observational data is often available, it is utilized only for retrospective fusion, and remains discarded in prospective trial design due to bias concerns. We argue this "tabula rasa" data acquisition strategy is fundamentally inefficient. In this work, we propose Active Residual Learning, a new paradigm that leverages the observational model as a foundational prior. This approach shifts the experimental focus from learning target causal quantities from scratch to efficiently estimating the residuals required to correct observational bias. To operationalize this, we introduce the R-Design framework. Theoretically, we establish two key advantages: (1) a structural efficiency gap, proving that estimating smooth residual contrasts admits strictly faster convergence rates than reconstructing full outcomes; and (2) information efficiency, where we quantify the redundancy in standard parameter-based acquisition (e.g., BALD), demonstrating that such baselines waste budget on task-irrelevant nuisance uncertainty. We propose R-EPIG (Residual Expected Predictive Information Gain), a unified criterion that directly targets the causal estimand, minimizing residual uncertainty for estimation or clarifying decision boundaries for policy. Experiments on synthetic and semi-synthetic benchmarks demonstrate that R-Design significantly outperforms baselines, confirming that repairing a biased model is far more efficient than learning one from scratch.
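“主动残差学习”的核心直觉:当观测模型的偏差比完整结果面更平滑(此处取常数偏差的极端情形)时,把有限的RCT预算花在估计残差上要高效得多。以下为假设性玩具示例,所有函数形式均为演示而设:

```python
import random

rng = random.Random(0)

def obs_effect(x):
    """Biased effect estimate from confounded observational data."""
    return 2.0 * x + 1.5

def true_effect(x):
    """Ground-truth effect, unknown in practice (toy assumption)."""
    return 2.0 * x + 0.5

# Spend the small experimental budget on the residual r(x) = true - obs,
# which here is a constant and thus trivial to estimate from 20 samples.
xs = [rng.uniform(0.0, 1.0) for _ in range(20)]
residuals = [true_effect(x) + rng.gauss(0.0, 0.1) - obs_effect(x) for x in xs]
r_hat = sum(residuals) / len(residuals)  # fit a constant residual model
corrected = lambda x: obs_effect(x) + r_hat
max_err = max(abs(corrected(x) - true_effect(x)) for x in (0.0, 0.5, 1.0))
```

仅用20个带噪实验样本,经残差校正后的模型就能把偏差从1.0压到噪声量级。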


【10】Beyond Cross-Validation: Adaptive Parameter Selection for Kernel-Based Gradient Descents
标题:超越交叉验证:基于核的梯度下降的自适应参数选择
链接:https://arxiv.org/abs/2603.03401

作者:Xiaotong Liu,Yunwen Lei,Xiangyu Chang,Shao-Bo Lin
摘要:提出了一种新的基于核的梯度下降(KGD)算法的参数选择策略,结合偏差方差分析和分裂方法。我们引入经验有效维的概念来量化KGD中的迭代增量,从而得到一个可实现的自适应参数选择策略。在学习理论的框架内提供了理论验证。利用最近开发的积分算子的方法,我们严格证明,KGD,配备了建议的自适应参数选择策略,实现了最佳的泛化误差界,并有效地适应不同的内核,目标函数和误差度量。因此,这种策略展示了显着的优势,现有的参数选择方法KGD。
摘要:This paper proposes a novel parameter selection strategy for kernel-based gradient descent (KGD) algorithms, integrating bias-variance analysis with the splitting method. We introduce the concept of empirical effective dimension to quantify iteration increments in KGD, deriving an adaptive parameter selection strategy that is implementable. Theoretical verifications are provided within the framework of learning theory. Utilizing the recently developed integral operator approach, we rigorously demonstrate that KGD, equipped with the proposed adaptive parameter selection strategy, achieves the optimal generalization error bound and adapts effectively to different kernels, target functions, and error metrics. Consequently, this strategy showcases significant advantages over existing parameter selection methods for KGD.
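基于核的梯度下降本身可写成对训练残差的迭代 f ← f − (η/n)K(f − y);以迭代步长增量作为停机信号,粗略对应文中“经验有效维数”所起的自适应选参作用。以下示意中的停机准则是简化假设,并非论文的精确策略:

```python
import math

def gaussian_kernel_matrix(xs, width=1.0):
    return [[math.exp(-((a - b) ** 2) / (2.0 * width ** 2)) for b in xs]
            for a in xs]

def kgd(xs, ys, eta=0.4, max_iter=1000, tol=1e-4):
    """Kernel gradient descent on in-sample predictions f, stopping
    adaptively once the per-step increment becomes negligible."""
    n = len(xs)
    K = gaussian_kernel_matrix(xs)
    f = [0.0] * n
    for t in range(max_iter):
        resid = [fi - yi for fi, yi in zip(f, ys)]
        step = [eta / n * sum(K[i][j] * resid[j] for j in range(n))
                for i in range(n)]
        f = [fi - si for fi, si in zip(f, step)]
        if max(abs(s) for s in step) < tol:
            return f, t + 1
    return f, max_iter

xs, ys = [0.0, 0.5, 1.0], [0.0, 0.25, 1.0]  # y = x^2 on three points
f, iters = kgd(xs, ys)
mse = sum((fi - yi) ** 2 for fi, yi in zip(f, ys)) / len(ys)
```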


强化学习(6篇)

【1】IPD: Boosting Sequential Policy with Imaginary Planning Distillation in Offline Reinforcement Learning
标题:IPD:在离线强化学习中通过想象的规划蒸馏来增强顺序策略
链接:https://arxiv.org/abs/2603.04289

作者:Yihao Qin,Yuanfei Wang,Hang Zhou,Peiran Liu,Hao Dong,Yiding Ji
摘要:基于决策Transformer的顺序策略已经成为离线强化学习(RL)中的一个强大范例,但其有效性仍然受到静态数据集质量和固有架构限制的限制。具体来说,这些模型往往难以有效地整合次优经验,并且无法明确规划最优策略。为了弥合这一差距,我们提出了\textbf{Imaginary Planning Distillation(IPD)},这是一个新颖的框架,可以将离线规划无缝地集成到数据生成、监督训练和在线推理中。我们的框架首先从离线数据中学习一个世界模型,该模型配备了不确定性度量和准最优值函数。这些组件用于识别次优轨迹,并通过模型预测控制(MPC)生成的可靠的、想象的最优推出来增强它们。然后,在这个丰富的数据集上训练一个基于Transformer的顺序策略,并辅之以一个价值导向的目标,以促进最优策略的提炼。通过用学习的准最优值函数取代传统的手动调整的返回到去,IPD提高了决策的稳定性和推理过程中的性能。对D4 RL基准测试的实证评估表明,IPD在不同任务中的表现明显优于几种最先进的基于值和基于transformer的离线RL方法。
摘要:Decision transformer based sequential policies have emerged as a powerful paradigm in offline reinforcement learning (RL), yet their efficacy remains constrained by the quality of static datasets and inherent architectural limitations. Specifically, these models often struggle to effectively integrate suboptimal experiences and fail to explicitly plan for an optimal policy. To bridge this gap, we propose \textbf{Imaginary Planning Distillation (IPD)}, a novel framework that seamlessly incorporates offline planning into data generation, supervised training, and online inference. Our framework first learns a world model equipped with uncertainty measures and a quasi-optimal value function from the offline data. These components are utilized to identify suboptimal trajectories and augment them with reliable, imagined optimal rollouts generated via Model Predictive Control (MPC). A Transformer-based sequential policy is then trained on this enriched dataset, complemented by a value-guided objective that promotes the distillation of the optimal policy. By replacing the conventional, manually-tuned return-to-go with the learned quasi-optimal value function, IPD improves both decision-making stability and performance during inference. Empirical evaluations on the D4RL benchmark demonstrate that IPD significantly outperforms several state-of-the-art value-based and transformer-based offline RL methods across diverse tasks.


【2】Selecting Offline Reinforcement Learning Algorithms for Stochastic Network Control
标题:随机网络控制的离线强化学习算法选择
链接:https://arxiv.org/abs/2603.03932

作者:Nicolas Helson,Pegah Alizadeh,Anastasios Giovanidis
备注:Long version 12 pages, double column including Appendix. Short version accepted at NOMS2026-IPSN, Rome, Italy
摘要:离线强化学习(RL)是下一代无线网络的一种很有前途的方法,在线探索是不安全的,大量的操作数据可以在整个模型生命周期中重复使用。然而,离线RL算法在真正的随机动态下的行为-由于衰落,噪声和流量移动性而固有的无线系统-仍然没有得到充分的理解。我们通过评估基于Bellman的(保守Q学习),基于序列的(决策Transformers)和混合(关键引导决策Transformers)离线RL方法在一个开放访问的随机电信环境(mobile-env)来解决这个差距。我们的研究结果表明,保守Q-Learning在不同的随机性来源中始终产生更强大的策略,使其成为生命周期驱动的AI管理框架中的可靠默认选择。基于序列的方法仍然具有竞争力,当有足够的高回报轨迹时,可以胜过基于贝尔曼的方法。这些发现为人工智能驱动的网络控制管道中的离线RL算法选择提供了实用指导,例如O-RAN和未来的6 G功能,其中鲁棒性和数据可用性是关键的操作约束。
摘要:Offline Reinforcement Learning (RL) is a promising approach for next-generation wireless networks, where online exploration is unsafe and large amounts of operational data can be reused across the model lifecycle. However, the behavior of offline RL algorithms under genuinely stochastic dynamics -- inherent to wireless systems due to fading, noise, and traffic mobility -- remains insufficiently understood. We address this gap by evaluating Bellman-based (Conservative Q-Learning), sequence-based (Decision Transformers), and hybrid (Critic-Guided Decision Transformers) offline RL methods in an open-access stochastic telecom environment (mobile-env). Our results show that Conservative Q-Learning consistently produces more robust policies across different sources of stochasticity, making it a reliable default choice in lifecycle-driven AI management frameworks. Sequence-based methods remain competitive and can outperform Bellman-based approaches when sufficient high-return trajectories are available. These findings provide practical guidance for offline RL algorithm selection in AI-driven network control pipelines, such as O-RAN and future 6G functions, where robustness and data availability are key operational constraints.
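保守Q学习(CQL)的核心正则项可写成对每个状态的 logsumexp(Q) − Q(s, a_data):压低分布外动作的Q值、抬升数据动作的Q值。以下为单个状态上的示意:

```python
import math

def cql_penalty(q_values, data_action):
    """CQL conservatism term for one state: log-sum-exp over all actions
    minus the Q-value of the action observed in the dataset."""
    m = max(q_values)
    lse = m + math.log(sum(math.exp(q - m) for q in q_values))
    return lse - q_values[data_action]

# An inflated Q-value on an action the dataset never takes is punished
# far more than a uniformly modest Q-function.
inflated = cql_penalty([1.0, 1.0, 5.0], data_action=0)
modest = cql_penalty([1.0, 1.0, 1.0], data_action=0)
```

该项恒非负,且当数据动作本身就是高Q动作时惩罚最小,这正是其在随机、带噪离线数据下保持稳健的来源之一。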


【3】Fairness Begins with State: Purifying Latent Preferences for Hierarchical Reinforcement Learning in Interactive Recommendation
标题:公平从状态开始:净化交互式推荐中分层强化学习的潜在偏好
链接:https://arxiv.org/abs/2603.03820

作者:Yun Lu,Xiaoyu Shi,Hong Xie,Xiangyu Zhao,Mingsheng Shang
摘要:交互式推荐系统(IRS)越来越多地使用强化学习(RL)进行优化,以捕捉用户-系统动态的顺序性质。然而,现有的公平意识的方法往往遭受一个根本的疏忽:他们假设观察到的用户状态是真实偏好的忠实代表。实际上,隐式反馈被流行度驱动的噪声和曝光偏差所污染,从而产生了一种扭曲的状态,误导了RL代理。我们认为,准确性和公平性之间的持续冲突不仅仅是一个奖励塑造问题,而且是一个状态估计失败。在这项工作中,我们提出了\textbf{DSRM-HRL},该框架将公平感知推荐重新表述为一个潜在状态净化问题,并在其后进行解耦的分层决策。我们引入了一个基于扩散模型的去噪状态表示模块(DSRM),以从高熵,嘈杂的相互作用历史中恢复低熵的潜在偏好流形。建立在这种净化状态之上,分层强化学习(HRL)代理用于解耦相互冲突的目标:高层策略调节长期公平轨迹,而低层策略在这些动态约束下优化短期参与度。在高保真模拟器(KuaiRec,KuaiRand)上进行的大量实验表明,DSRM-HRL有效地打破了“富人越来越富”的反馈循环,实现了推荐效用和曝光公平之间的优越帕累托边界。
摘要:Interactive recommender systems (IRS) are increasingly optimized with Reinforcement Learning (RL) to capture the sequential nature of user-system dynamics. However, existing fairness-aware methods often suffer from a fundamental oversight: they assume the observed user state is a faithful representation of true preferences. In reality, implicit feedback is contaminated by popularity-driven noise and exposure bias, creating a distorted state that misleads the RL agent. We argue that the persistent conflict between accuracy and fairness is not merely a reward-shaping issue, but a state estimation failure. In this work, we propose \textbf{DSRM-HRL}, a framework that reformulates fairness-aware recommendation as a latent state purification problem followed by decoupled hierarchical decision-making. We introduce a Denoising State Representation Module (DSRM) based on diffusion models to recover the low-entropy latent preference manifold from high-entropy, noisy interaction histories. Built upon this purified state, a Hierarchical Reinforcement Learning (HRL) agent is employed to decouple conflicting objectives: a high-level policy regulates long-term fairness trajectories, while a low-level policy optimizes short-term engagement under these dynamic constraints. Extensive experiments on high-fidelity simulators (KuaiRec, KuaiRand) demonstrate that DSRM-HRL effectively breaks the "rich-get-richer" feedback loop, achieving a superior Pareto frontier between recommendation utility and exposure equity.


【4】Learning Approximate Nash Equilibria in Cooperative Multi-Agent Reinforcement Learning via Mean-Field Subsampling
标题:通过平均场子采样在协作多智能体强化学习中学习近似纳什均衡
链接:https://arxiv.org/abs/2603.03759

作者:Emile Anand,Ishani Karmarkar
备注:48 pages, 4 figures, 2 tables
摘要:许多大型平台和网络控制系统都有一个集中的决策者与大量的代理人在严格的可观测性约束下进行交互。出于这样的应用程序,我们研究了一个合作的马尔可夫博弈与全球代理和$n$同质的本地代理在通信受限的制度,其中全球代理只观察一个子集的$k$本地代理状态每一个时间步。我们提出了一个交替的学习框架$(\texttt{ALTERNATING-MARL})$,其中的全球代理执行子采样平均场$Q$-学习对一个固定的本地政策,和本地代理更新优化诱导MDP。我们证明了这些近似的最佳响应动态收敛到$\widetilde{O}(1/\sqrt{k})$近似纳什均衡,同时在联合状态空间和动作空间之间的样本复杂性中产生分离。最后,我们在多机器人控制和联邦优化的数值模拟中验证了我们的结果。
摘要:Many large-scale platforms and networked control systems have a centralized decision maker interacting with a massive population of agents under strict observability constraints. Motivated by such applications, we study a cooperative Markov game with a global agent and $n$ homogeneous local agents in a communication-constrained regime, where the global agent only observes a subset of $k$ local agent states per time step. We propose an alternating learning framework $(\texttt{ALTERNATING-MARL})$, where the global agent performs subsampled mean-field $Q$-learning against a fixed local policy, and local agents update by optimizing in an induced MDP. We prove that these approximate best-response dynamics converge to an $\widetilde{O}(1/\sqrt{k})$-approximate Nash Equilibrium, while yielding a separation in the sample complexities between the joint state space and action space. Finally, we validate our results in numerical simulations for multi-robot control and federated optimization.


【5】Hybrid Belief Reinforcement Learning for Efficient Coordinated Spatial Exploration
标题:混合信念强化学习用于高效协调空间探索
链接:https://arxiv.org/abs/2603.03595

作者:Danish Rizvi,David Boyle
摘要:协调多个自主代理探索和服务空间异构的需求,需要共同学习未知的空间模式和规划轨迹,最大限度地提高任务性能。纯基于模型的方法提供了结构化的不确定性估计,但缺乏自适应策略学习,而深度强化学习在空间先验不存在时通常会遇到样本效率低下的问题。本文提出了一种混合信念强化学习(HBRL)框架,以解决这一差距。在第一阶段中,代理使用对数高斯考克斯过程(LGCP)构建空间信念,并执行由路径互信息(PathMI)规划器引导的信息驱动轨迹。在第二阶段,轨迹控制转移到软演员-评论家(SAC)代理,通过双通道知识转移热启动:信念状态初始化提供空间不确定性,重放缓冲区播种提供LGCP探索过程中生成的演示轨迹。方差归一化的重叠惩罚,通过共享的信念状态,使协调覆盖,允许合作感测在高不确定性区域,同时阻止冗余的覆盖范围内充分探索的地区。该框架进行评估的多无人机无线服务提供任务。结果显示,与基线相比,累积奖励增加了10.8%,收敛速度加快了38%,消融研究证实双通道传输优于单独的通道。
摘要:Coordinating multiple autonomous agents to explore and serve spatially heterogeneous demand requires jointly learning unknown spatial patterns and planning trajectories that maximize task performance. Pure model-based approaches provide structured uncertainty estimates but lack adaptive policy learning, while deep reinforcement learning often suffers from poor sample efficiency when spatial priors are absent. This paper presents a hybrid belief-reinforcement learning (HBRL) framework to address this gap. In the first phase, agents construct spatial beliefs using a Log-Gaussian Cox Process (LGCP) and execute information-driven trajectories guided by a Pathwise Mutual Information (PathMI) planner with multi-step lookahead. In the second phase, trajectory control is transferred to a Soft Actor-Critic (SAC) agent, warm-started through dual-channel knowledge transfer: belief state initialization supplies spatial uncertainty, and replay buffer seeding provides demonstration trajectories generated during LGCP exploration. A variance-normalized overlap penalty enables coordinated coverage through shared belief state, permitting cooperative sensing in high-uncertainty regions while discouraging redundant coverage in well-explored areas. The framework is evaluated on a multi-UAV wireless service provisioning task. Results show 10.8% higher cumulative reward and 38% faster convergence over baselines, with ablation studies confirming that dual-channel transfer outperforms either channel alone.
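方差归一化重叠惩罚的直觉可以落到一个网格上:某格的重叠覆盖量除以共享信念的方差,高不确定区域允许协同感知、已充分探索区域抑制冗余覆盖。以下为示意实现,数据结构与字段名均为假设:

```python
def overlap_penalty(coverage_by_agent, belief_variance, eps=1e-8):
    """Variance-normalized overlap: coverage beyond the first visitor of a
    cell is penalized, scaled down where the shared belief is uncertain."""
    penalty = 0.0
    for cell, var in belief_variance.items():
        visits = sum(cov.get(cell, 0) for cov in coverage_by_agent)
        penalty += max(0, visits - 1) / (var + eps)
    return penalty

coverage = [{"c1": 1, "c2": 1}, {"c1": 1}]   # both agents revisit cell c1
explored = {"c1": 0.01, "c2": 1.0}           # c1 already well explored
uncertain = {"c1": 1.0, "c2": 1.0}           # c1 still highly uncertain
p_explored = overlap_penalty(coverage, explored)
p_uncertain = overlap_penalty(coverage, uncertain)
```

同样的重叠覆盖,在低方差(已探明)区域受到的惩罚远高于高方差区域,这正是“允许合作感测、抑制冗余覆盖”的机制。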


【6】Minimax Optimal Strategy for Delayed Observations in Online Reinforcement Learning
标题:在线强化学习中延迟观测的Minimax最优策略
链接:https://arxiv.org/abs/2603.03480

作者:Harin Lee,Kevin Jamieson
摘要:我们研究具有延迟状态观测的强化学习,其中代理在一些随机数量的时间步长后观察当前状态。我们提出了一种算法,结合了增强方法和置信上限的方法。对于表马尔可夫决策过程(MDP),我们得到了一个后悔界$\tilde{\mathcal{O}}(H \sqrt{D_{\max} SAK})$,其中$S$和$A$是状态空间和动作空间的基数,$H$是时间范围,$K$是发作次数,$D_{\max}$是最大延迟长度.我们还提供了一个匹配的下限对数因子,显示我们的方法的最优性。我们的分析框架制定这个问题作为一个特殊的情况下,更广泛的一类MDP,他们的过渡动力分解成一个已知的组件和一个未知的,但结构化的组件。我们建立这个抽象的设置,这可能是独立的利益的一般结果。
摘要:We study reinforcement learning with delayed state observation, where the agent observes the current state after some random number of time steps. We propose an algorithm that combines the augmentation method and the upper confidence bound approach. For tabular Markov decision processes (MDPs), we derive a regret bound of $\tilde{\mathcal{O}}(H \sqrt{D_{\max} SAK})$, where $S$ and $A$ are the cardinalities of the state and action spaces, $H$ is the time horizon, $K$ is the number of episodes, and $D_{\max}$ is the maximum length of the delay. We also provide a matching lower bound up to logarithmic factors, showing the optimality of our approach. Our analytical framework formulates this problem as a special case of a broader class of MDPs, where their transition dynamics decompose into a known component and an unknown but structured component. We establish general results for this abstract setting, which may be of independent interest.
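The augmentation idea described above can be illustrated with a minimal sketch (fixed delay $D$ for simplicity; the class and interface below are illustrative, not the paper's implementation): the agent acts on an augmented state consisting of the last revealed state plus the $D$ most recent actions.

```python
from collections import deque

# Minimal sketch (illustrative, NOT the paper's implementation) of the
# classic state-augmentation construction for observation delays: with a
# fixed delay D, the agent acts on an augmented state consisting of the
# last revealed state plus the D most recent actions.

class FixedDelayWrapper:
    def __init__(self, delay):
        self.delay = delay
        self.state_queue = deque()          # true states not yet revealed
        self.recent_actions = deque(maxlen=delay)
        self.last_obs = None

    def reset(self, s0):
        # warm-up: pretend the first `delay` revealed states are all s0
        self.state_queue = deque([s0] * self.delay)
        self.recent_actions = deque([None] * self.delay, maxlen=self.delay)
        self.last_obs = s0
        return (self.last_obs, tuple(self.recent_actions))

    def step(self, action, true_next_state):
        # the environment's true next state enters the queue and is only
        # revealed to the agent `delay` steps later
        self.recent_actions.append(action)
        self.state_queue.append(true_next_state)
        self.last_obs = self.state_queue.popleft()
        return (self.last_obs, tuple(self.recent_actions))

w = FixedDelayWrapper(delay=2)
aug = w.reset(0)                        # augmented state: (0, (None, None))
aug = w.step('a', true_next_state=1)
aug = w.step('b', true_next_state=2)
aug = w.step('c', true_next_state=3)    # state 1 is revealed only now
```

The augmented process is itself an MDP, which is what makes upper-confidence-bound analysis applicable on top of it.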


符号|符号学习(1篇)

【1】Neuro-Symbolic Decoding of Neural Activity
标题:神经活动的神经符号解码
链接:https://arxiv.org/abs/2603.03343

作者:Yanchen Wang,Joy Hsu,Ehsan Adeli,Jiajun Wu
备注:ICLR 2026. First two authors contributed equally
摘要:我们提出NEURONA，一个用于fMRI解码和神经活动中概念接地的神经符号框架。利用基于图像和视频的fMRI问答数据集，NEURONA学会根据fMRI响应模式从视觉刺激中解码相互作用的概念，将符号推理与组合执行同跨脑区的fMRI接地相结合。我们证明，将结构先验（例如概念之间的组合谓词-论元依赖，compositional predicate-argument dependencies）引入解码过程，显著提高了对精确查询的解码准确性，尤其是测试时对未见查询的泛化能力。通过NEURONA，我们强调神经符号框架是理解神经活动的有前途的工具。
摘要:We propose NEURONA, a neuro-symbolic framework for fMRI decoding and concept grounding in neural activity. Leveraging image- and video-based fMRI question-answering datasets, NEURONA learns to decode interacting concepts from visual stimuli based on patterns of fMRI responses, integrating symbolic reasoning and compositional execution with fMRI grounding across brain regions. We demonstrate that incorporating structural priors (e.g., compositional predicate-argument dependencies between concepts) into the decoding process significantly improves both decoding accuracy over precise queries, and notably, generalization to unseen queries at test time. With NEURONA, we highlight neuro-symbolic frameworks as promising tools for understanding neural activity.


分层学习(1篇)

【1】Online Learning for Multi-Layer Hierarchical Inference under Partial and Policy-Dependent Feedback
标题：部分且策略相关反馈下的多层分层推理在线学习
链接:https://arxiv.org/abs/2603.04247

作者:Haoran Zhang,Seohyeon Cha,Hasan Burhan Beytur,Kevin S Chan,Gustavo de Veciana,Haris Vikalo
备注:preprint
摘要:分层推理系统跨多个计算层路由任务，其中每个节点可以在本地完成预测，或将任务卸载到下一层节点以进一步处理。在这样的系统中学习最优路由策略颇具挑战：推理损失是跨层递归定义的，而预测错误的反馈只在终端预言层揭示。这导致一种部分的、依赖于策略的反馈结构，其中可观测概率随深度衰减，使重要性加权估计量的方差被放大。我们研究了长期资源约束和仅终端反馈下的多层分层推理在线路由问题。我们形式化了递归损失结构，并表明当反馈概率沿层次衰减时，朴素的重要性加权上下文bandit方法会变得不稳定。为此，我们开发了一种与Lyapunov优化相结合的方差缩减EXP4算法，在稀疏且依赖策略的反馈下产生无偏损失估计和稳定的学习。我们给出了相对于事后最优固定路由策略的遗憾保证，并在随机到达和资源约束下建立了近似最优性。在大规模多任务工作负载上的实验表明，与标准重要性加权方法相比，该方法具有更好的稳定性和性能。
摘要 :Hierarchical inference systems route tasks across multiple computational layers, where each node may either finalize a prediction locally or offload the task to a node in the next layer for further processing. Learning optimal routing policies in such systems is challenging: inference loss is defined recursively across layers, while feedback on prediction error is revealed only at a terminal oracle layer. This induces a partial, policy-dependent feedback structure in which observability probabilities decay with depth, causing importance-weighted estimators to suffer from amplified variance. We study online routing for multi-layer hierarchical inference under long-term resource constraints and terminal-only feedback. We formalize the recursive loss structure and show that naive importance-weighted contextual bandit methods become unstable as feedback probability decays along the hierarchy. To address this, we develop a variance-reduced EXP4-based algorithm integrated with Lyapunov optimization, yielding unbiased loss estimation and stable learning under sparse and policy-dependent feedback. We provide regret guarantees relative to the best fixed routing policy in hindsight and establish near-optimality under stochastic arrivals and resource constraints. Experiments on large-scale multi-task workloads demonstrate improved stability and performance compared to standard importance-weighted approaches.
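The variance amplification the abstract points to can be seen in a few lines: with feedback observed only with probability $p$, the unbiased importance-weighted estimate $\hat\ell = \mathbb{1}[\text{observed}]\,\ell/p$ keeps the right mean, but its variance grows like $(1-p)/p$ (a toy simulation, not the paper's estimator):

```python
import numpy as np

# Toy simulation (not the paper's estimator) of the variance blow-up under
# terminal-only feedback: a loss of 0.3 is observed with probability p, and
# the importance-weighted estimate loss/p is unbiased, but its variance
# grows roughly like (1 - p) / p as observability decays with depth.

rng = np.random.default_rng(6)
TRUE_LOSS = 0.3

def iw_stats(p, n=200_000):
    observed = rng.random(n) < p
    est = np.where(observed, TRUE_LOSS / p, 0.0)  # unbiased: E[est] = TRUE_LOSS
    return est.mean(), est.var()

stats = [iw_stats(p) for p in (0.5, 0.1, 0.01)]
means = [m for m, _ in stats]
variances = [v for _, v in stats]
# means stay near 0.3; variances grow roughly like 0.3**2 * (1 - p) / p
```

This is precisely why variance reduction becomes necessary once observability probabilities decay along the hierarchy.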


医学相关(3篇)

【1】Spectral Surgery: Training-Free Refinement of LoRA via Gradient-Guided Singular Value Reweighting
标题：频谱手术：通过梯度引导的奇异值重加权实现LoRA的免训练精调
链接:https://arxiv.org/abs/2603.03995

作者:Zailong Tian,Yanzhe Chen,Zhuoheng Han,Lizi Liao
摘要:低秩自适应（LoRA）通过将任务更新限制在低秩参数子空间来提升下游性能，但这一有限容量在训练好的适配器内如何分配仍不清楚。通过对多个任务和主干网络的几何与经验研究，我们发现训练好的LoRA更新通常表现出低效的频谱：任务效应集中在奇异方向的一个小子集上，而许多剩余分量是中性或有害的，这促使我们在已学得的子空间内进行事后精调。我们提出频谱手术（Spectral Surgery），一种免训练的精调方法：用SVD分解LoRA更新，在小型校准集上用梯度估计每个分量的敏感度，并在幅度约束下重新加权奇异值，同时保持已学得的方向不变。在Llama-3.1-8B和Qwen3-8B的四个基准上，频谱手术仅通过调整约$1{,}000$个标量系数即可获得一致增益（CommonsenseQA最高+4.4分，HumanEval上pass@1提升+2.4）。这些结果表明，基于SVD结构的低成本参数编辑可以作为以纯事后方式改进已训练LoRA适配器的实用途径。
摘要:Low-Rank Adaptation (LoRA) improves downstream performance by restricting task updates to a low-rank parameter subspace, yet how this limited capacity is allocated within a trained adapter remains unclear. Through a geometric and empirical study across multiple tasks and backbones, we find that trained LoRA updates often exhibit an inefficient spectrum: task effects concentrate in a small subset of singular directions, while many remaining components are neutral or detrimental, motivating post-hoc refinement within the learned subspace. We propose Spectral Surgery, a training-free refinement that decomposes a LoRA update with SVD, estimates per-component sensitivity using gradients on a small calibration set, and reweights singular values under a magnitude constraint while keeping the learned directions fixed. Across Llama-3.1-8B and Qwen3-8B on four benchmarks, Spectral Surgery yields consistent gains (up to +4.4 points on CommonsenseQA and +2.4 pass@1 on HumanEval) by adjusting only $\approx 1{,}000$ scalar coefficients. These results demonstrate that SVD-structured, low-cost parameter editing can serve as a practical route to improving trained LoRA adapters in a purely post-hoc manner.


【2】DMD-augmented Unpaired Neural Schrödinger Bridge for Ultra-Low Field MRI Enhancement
标题：用于超低场MRI增强的DMD增强非配对神经薛定谔桥
链接:https://arxiv.org/abs/2603.03769

作者:Youngmin Kim,Jaeyun Shin,Jeongchan Kim,Taehoon Lee,Jaemin Kim,Peter Hsu,Jelle Veraart,Jong Chul Ye
摘要:超低场（64 mT）脑部MRI提升了可及性，但与3 T相比图像质量下降。由于配对的64 mT - 3 T扫描稀缺，我们提出一个非配对的64 mT $\rightarrow$ 3 T翻译框架，在保留解剖结构的同时提升真实感。我们的方法建立在带多步细化的非配对神经薛定谔桥（UNSB）之上。为加强目标分布对齐，我们使用冻结的3 T扩散教师，通过DMD2风格的扩散引导分布匹配来增强对抗目标。为了在补丁级对应之外显式约束全局结构，我们将PatchNCE与解剖结构保持（ASP）正则化相结合，施加软性前景背景一致性和边界感知约束。在两个不相交队列上的评估显示，相比非配对基线，所提框架实现了更好的真实感与结构折中：在非配对基准上增强了分布层面的真实感，同时在配对队列上提高了结构保真度。
摘要:Ultra Low Field (64 mT) brain MRI improves accessibility but suffers from reduced image quality compared to 3 T. As paired 64 mT - 3 T scans are scarce, we propose an unpaired 64 mT $\rightarrow$ 3 T translation framework that enhances realism while preserving anatomy. Our method builds upon the Unpaired Neural Schrödinger Bridge (UNSB) with multi-step refinement. To strengthen target distribution alignment, we augment the adversarial objective with DMD2-style diffusion-guided distribution matching using a frozen 3T diffusion teacher. To explicitly constrain global structure beyond patch-level correspondence, we combine PatchNCE with an Anatomical Structure Preservation (ASP) regularizer that enforces soft foreground background consistency and boundary aware constraints. Evaluated on two disjoint cohorts, the proposed framework achieves an improved realism structure trade-off, enhancing distribution level realism on unpaired benchmarks while increasing structural fidelity on the paired cohort compared to unpaired baselines.


【3】MMAI Gym for Science: Training Liquid Foundation Models for Drug Discovery
标题：MMAI Gym for Science：训练用于药物发现的液体基础模型
链接:https://arxiv.org/abs/2603.03517

作者:Maksim Kuznetsov,Zulfat Miftahutdinov,Rim Shayakhmetov,Mikolaj Mizera,Roman Schutski,Bogdan Zagribelnyy,Ivan Ilin,Nikita Bondarev,Thomas MacDougall,Mathieu Reymond,Mihir Bafna,Kaeli Kaymak-Loveless,Eugene Babin,Maxim Malkov,Mathias Lechner,Ramin Hasani,Alexander Amini,Vladimir Aladinskiy,Alex Aliper,Alex Zhavoronkov
摘要:依赖上下文学习的通用大型语言模型（LLM）无法可靠地提供药物发现任务所需的科学理解与性能。单纯增大模型规模或引入推理令牌并不能带来显著的性能增益。为弥合这一差距，我们提出MMAI Gym for Science：一站式的分子数据格式与模态，以及面向特定任务的推理、训练和基准测试配方，旨在教会基础模型"分子语言"，以解决实际的药物发现问题。我们使用MMAI Gym为这些应用训练了一个高效的液体基础模型（Liquid Foundation Model, LFM），证明了更小的、目的性训练的基础模型在分子基准上可以大幅超越规模大得多的通用或专业模型。在各项基本药物发现任务中（包括分子优化、ADMET性质预测、逆合成、药物-靶点活性预测和官能团推理），所得模型达到了接近专家级的性能，并在多数设置中超越了更大的模型，同时在该领域保持更高的效率和广泛的适用性。
摘要:General-purpose large language models (LLMs) that rely on in-context learning do not reliably deliver the scientific understanding and performance required for drug discovery tasks. Simply increasing model size or introducing reasoning tokens does not yield significant performance gains. To address this gap, we introduce the MMAI Gym for Science, a one-stop shop for molecular data formats and modalities as well as task-specific reasoning, training, and benchmarking recipes designed to teach foundation models the 'language of molecules' in order to solve practical drug discovery problems. We use MMAI Gym to train an efficient Liquid Foundation Model (LFM) for these applications, demonstrating that smaller, purpose-trained foundation models can outperform substantially larger general-purpose or specialist models on molecular benchmarks. Across essential drug discovery tasks - including molecular optimization, ADMET property prediction, retrosynthesis, drug-target activity prediction, and functional group reasoning - the resulting model achieves near specialist-level performance and, in the majority of settings, surpasses larger models, while remaining more efficient and broadly applicable in the domain.


蒸馏|知识提取(1篇)

【1】Harmonic Dataset Distillation for Time Series Forecasting
标题:用于时间序列预测的调和数据集蒸馏
链接:https://arxiv.org/abs/2603.03760

作者:Seungha Hong,Sanghwan Jang,Wonbin Kweon,Suyeon Kim,Gyuseok Lee,Hwanjo Yu
备注:AAAI 2026
摘要:由于现实世界数据规模庞大，现代时间序列预测（TSF）面临巨大的计算与存储成本挑战。数据集蒸馏（DD）通过合成小而紧凑的数据集来达到与原始数据集相当的训练性能，已成为一种有前景的解决方案。然而，传统的DD方法并非为时间序列定制，存在架构过拟合和可扩展性受限的问题。为解决这些问题，我们提出用于时间序列预测的谐波数据集蒸馏（HDT）。HDT通过FFT将时间序列分解为正弦基，并通过谐波匹配（Harmonic Matching）对齐核心周期结构。由于该过程在频域中进行，蒸馏期间的所有更新都以全局方式施加，而不会破坏时间序列的时间依赖性。大量实验表明，HDT实现了强大的跨架构泛化能力与可扩展性，验证了其在大规模真实应用中的实用性。
摘要:Time Series forecasting (TSF) in the modern era faces significant computational and storage cost challenges due to the massive scale of real-world data. Dataset Distillation (DD), a paradigm that synthesizes a small, compact dataset to achieve training performance comparable to that of the original dataset, has emerged as a promising solution. However, conventional DD methods are not tailored for time series and suffer from architectural overfitting and limited scalability. To address these issues, we propose Harmonic Dataset Distillation for Time Series Forecasting (HDT). HDT decomposes the time series into its sinusoidal basis through the FFT and aligns the core periodic structure by Harmonic Matching. Since this process operates in the frequency domain, all updates during distillation are applied globally without disrupting temporal dependencies of time series. Extensive experiments demonstrate that HDT achieves strong cross-architecture generalization and scalability, validating its practicality for large-scale, real-world applications.
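The frequency-domain idea can be illustrated with a tiny numpy example; top-k magnitude selection below is a simple stand-in for the paper's Harmonic Matching objective:

```python
import numpy as np

# Tiny illustration of frequency-domain distillation: represent a series by
# its dominant harmonics via the FFT and reconstruct a compact surrogate
# that preserves the core periodic structure. Top-k magnitude selection is
# an illustrative stand-in for the paper's Harmonic Matching objective.

rng = np.random.default_rng(1)
t = np.arange(512)
clean = np.sin(2 * np.pi * t / 32) + 0.5 * np.sin(2 * np.pi * t / 8)
series = clean + 0.1 * rng.normal(size=t.size)

spec = np.fft.rfft(series)
k = 4                                    # keep only the k strongest bins
keep = np.argsort(np.abs(spec))[-k:]
compact = np.zeros_like(spec)
compact[keep] = spec[keep]
distilled = np.fft.irfft(compact, n=series.size)

err = np.mean((distilled - clean) ** 2)  # tracks the periodic component
```

Because the retained coefficients are global Fourier modes, any update to them perturbs the whole series coherently rather than breaking local temporal dependencies, which is the property the abstract emphasizes.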


推荐(1篇)

【1】Not All Candidates are Created Equal: A Heterogeneity-Aware Approach to Pre-ranking in Recommender Systems
标题：并非所有候选都生而平等：推荐系统中预排名的异质性感知方法
链接:https://arxiv.org/abs/2603.03770

作者:Pengfei Tong,Siyuan Chen,Chenwei Zhang,Bo Wang,Qi Pi,Pixun Li,Zuotao Liu
备注:Accepted by WWW'26
摘要:大多数大型推荐系统遵循检索、预排名、排名和重新排名的多级级联。预排名阶段的一个关键挑战来自于从粗粒度检索结果、细粒度排名信号和曝光反馈中采样的训练实例的异质性。我们的分析表明,普遍的预排序方法,不加区别地混合异质样本,遭受梯度冲突:硬样本占主导地位的训练,而容易的仍然没有得到充分利用,导致次优性能。我们进一步表明,在所有样本中统一缩放模型复杂度的常见做法是低效的,因为它在简单的情况下会超支计算,并且在没有成比例增益的情况下会减慢训练速度。为了解决这些限制,本文提出了异质性感知自适应预排名(HAP),一个统一的框架,通过冲突敏感的采样加上定制的损失设计,同时自适应地分配计算预算的候选人,以减轻梯度冲突。具体来说,HAP将简单和困难的样本分开,沿着专用的优化路径引导每个子集。在这种分离的基础上,它首先将轻量级模型应用于所有候选模型以实现有效覆盖,并进一步在硬模型上使用更强大的模型,在保持准确性的同时降低成本。这种方法不仅提高了预排名的有效性,但也提供了一个实用的角度在工业推荐系统的缩放策略。HAP已经在今日头条生产系统中部署了9个月,在不增加计算成本的情况下,用户应用使用时长提升了0.4%,活跃天数提升了0.05%。我们还发布了一个大规模的工业混合样本数据集,以便在预排名中系统地研究源驱动的候选异质性。
摘要:Most large-scale recommender systems follow a multi-stage cascade of retrieval, pre-ranking, ranking, and re-ranking. A key challenge at the pre-ranking stage arises from the heterogeneity of training instances sampled from coarse-grained retrieval results, fine-grained ranking signals, and exposure feedback. Our analysis reveals that prevailing pre-ranking methods, which indiscriminately mix heterogeneous samples, suffer from gradient conflicts: hard samples dominate training while easy ones remain underutilized, leading to suboptimal performance. We further show that the common practice of uniformly scaling model complexity across all samples is inefficient, as it overspends computation on easy cases and slows training without proportional gains. To address these limitations, this paper presents Heterogeneity-Aware Adaptive Pre-ranking (HAP), a unified framework that mitigates gradient conflicts through conflict-sensitive sampling coupled with tailored loss design, while adaptively allocating computational budgets across candidates. Specifically, HAP disentangles easy and hard samples, directing each subset along dedicated optimization paths. Building on this separation, it first applies lightweight models to all candidates for efficient coverage, and further engages stronger models on the hard ones, maintaining accuracy while reducing cost. This approach not only improves pre-ranking effectiveness but also provides a practical perspective on scaling strategies in industrial recommender systems. HAP has been deployed in the Toutiao production system for 9 months, yielding up to 0.4% improvement in user app usage duration and 0.05% in active days, without additional computational cost. We also release a large-scale industrial hybrid-sample dataset to enable the systematic study of source-driven candidate heterogeneity in pre-ranking.


聚类(2篇)

【1】Transport Clustering: Solving Low-Rank Optimal Transport via Clustering
标题：传输聚类：通过聚类求解低秩最优传输
链接:https://arxiv.org/abs/2603.03578

作者:Henri Schmidt,Peter Halmos,Ben Raphael
摘要:最优运输（OT）使用定义在点对上的成本矩阵，在两个概率分布之间寻找成本最低的运输计划。与推断非结构化逐点映射的标准OT不同，低秩最优运输显式约束运输计划的秩以推断潜在结构。这提高了统计稳定性和鲁棒性，在估计Wasserstein距离时获得自适应于内在秩的更尖锐参数收敛率，并将$K$-均值推广到共聚类。然而，这些优点是以非凸且NP难的优化问题为代价的。我们提出传输聚类算法来计算低秩OT计划，它将低秩OT归约为对由满秩$\textit{传输配准}$步骤得到的对应关系进行聚类的问题。我们证明了这种归约产生多项式时间、常数因子的低秩OT近似算法：具体而言，对负型度量给出$(1+γ)$近似，对核成本给出$(1+γ+\sqrt{2γ}\,)$近似，其中$γ\in[0,1]$表示最优满秩解相对于低秩最优解的近似比。经验上，传输聚类在合成基准和大规模高维数据集上均优于现有的低秩OT求解器。
摘要:Optimal transport (OT) finds a least cost transport plan between two probability distributions using a cost matrix defined on pairs of points. Unlike standard OT, which infers unstructured pointwise mappings, low-rank optimal transport explicitly constrains the rank of the transport plan to infer latent structure. This improves statistical stability and robustness, yields sharper parametric rates for estimating Wasserstein distances adaptive to the intrinsic rank, and generalizes $K$-means to co-clustering. These advantages, however, come at the cost of a non-convex and NP-hard optimization problem. We introduce transport clustering, an algorithm to compute a low-rank OT plan that reduces low-rank OT to a clustering problem on correspondences obtained from a full-rank $\textit{transport registration}$ step. We prove that this reduction yields polynomial-time, constant-factor approximation algorithms for low-rank OT: specifically, a $(1+γ)$ approximation for negative-type metrics and a $(1+γ+\sqrt{2γ}\,)$ approximation for kernel costs, where $γ\in [0,1]$ denotes the approximation ratio of the optimal full-rank solution relative to the low-rank optimal. Empirically, transport clustering outperforms existing low-rank OT solvers on synthetic benchmarks and large-scale, high-dimensional datasets.
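A toy version of the two-step pipeline can be sketched with entropic Sinkhorn iterations as the full-rank transport-registration stage and k-means on barycentric projections as the clustering stage (a sketch under simplifying assumptions, not the paper's algorithm):

```python
import numpy as np

# Toy two-step pipeline (illustrative, not the paper's algorithm):
# (1) full-rank "transport registration" via entropic Sinkhorn iterations,
# (2) clustering of the induced correspondences via 2-means on
#     barycentric projections.

def sinkhorn(C, reg=0.05, iters=200):
    n, m = C.shape
    K = np.exp(-C / reg)
    a, b = np.ones(n) / n, np.ones(m) / m
    u, v = np.ones(n), np.ones(m)
    for _ in range(iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]      # transport plan

rng = np.random.default_rng(2)
# two clustered point clouds with matching structure
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(3, 0.1, (20, 2))])
Y = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(3, 0.1, (20, 2))])

C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
P = sinkhorn(C)

# barycentric projection of each source point through the plan
proj = (P @ Y) / P.sum(axis=1, keepdims=True)

# 2-means on the projections recovers the latent co-cluster structure
centers = proj[[0, -1]]
for _ in range(10):
    labels = np.argmin(((proj[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.array([proj[labels == c].mean(axis=0) for c in (0, 1)])
```

Clustering the correspondences (rather than the raw points) is what turns the full-rank plan into a low-rank one: points routed to the same region of the target share a cluster.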


【2】Learning Order Forest for Qualitative-Attribute Data Clustering
标题：用于定性属性数据聚类的学习序森林
链接:https://arxiv.org/abs/2603.03387

作者:Mingjie Zhao,Sen Feng,Yiqun Zhang,Mengke Li,Yang Lu,Yiu-ming Cheung
备注:Accepted to ECAI2024
摘要:聚类是理解数据模式的基本方法，通常采用直观的欧几里得距离空间。然而，对于由定性属性值（例如症状、婚姻状况等属性的名义取值）所反映的隐式簇分布，情况并非如此。因此，本文提出一种树状距离结构，以灵活表示属性内定性取值之间的局部序关系：将每个取值视为树的顶点，便可刻画该顶点取值与其他取值之间丰富的序关系。为了以利于聚类的形式获得这些树，我们提出一种联合学习机制，迭代地获得更合适的树结构与簇划分。结果表明，整个数据集的潜在距离空间可以由学得的树构成的森林很好地表示。大量实验表明，联合学习使森林适配聚类任务从而产生准确的结果。在12个真实基准数据集上与10种对比方法的比较及显著性检验验证了所提方法的优越性。
摘要 :Clustering is a fundamental approach to understanding data patterns, wherein the intuitive Euclidean distance space is commonly adopted. However, this is not the case for implicit cluster distributions reflected by qualitative attribute values, e.g., the nominal values of attributes like symptoms, marital status, etc. This paper, therefore, discovered a tree-like distance structure to flexibly represent the local order relationship among intra-attribute qualitative values. That is, treating a value as the vertex of the tree allows to capture rich order relationships among the vertex value and the others. To obtain the trees in a clustering-friendly form, a joint learning mechanism is proposed to iteratively obtain more appropriate tree structures and clusters. It turns out that the latent distance space of the whole dataset can be well-represented by a forest consisting of the learned trees. Extensive experiments demonstrate that the joint learning adapts the forest to the clustering task to yield accurate results. Comparisons of 10 counterparts on 12 real benchmark datasets with significance tests verify the superiority of the proposed method.


超分辨率|去噪|去模糊|去雾(1篇)

【1】FastWave: Optimized Diffusion Model for Audio Super-Resolution
标题:FastWave:音频超分辨率的优化扩散模型
链接:https://arxiv.org/abs/2603.04122

作者:Nikita Kuznetsov,Maksim Kaledin
摘要:音频超分辨率是一组技术，旨在对给定信号进行高质量估计，使其如同以更高采样率采样一般。已有方法中既有扩散与流模型（通常较慢），也有生成对抗网络（通常较快），但这两类方法目前都以高参数量网络实现，训练和推理均需要很高的计算成本。我们通过重新审视扩散模型训练的最新进展，并将其应用于从任意采样率到48 kHz的超分辨率，同时解决了这两个问题。我们的方法结果优于NU-Wave 2，并与最先进的模型相当。我们的FastWave模型计算复杂度约为50 GFLOPs、参数量130万，可用更少资源训练，且比大多数近期提出的基于扩散和流的方案快得多。代码已公开。
摘要:Audio Super-Resolution is a set of techniques aimed at high-quality estimation of the given signal as if it would be sampled with higher sample rate. Among suggested methods there are diffusion and flow models (which are considered slower), generative adversarial networks (which are considered faster), however both approaches are currently presented by high-parametric networks, requiring high computational costs both for training and inference. We propose a solution to both these problems by re-considering the recent advances in the training of diffusion models and applying them to super-resolution from any to 48 kHz sample rate. Our approach shows better results than NU-Wave 2 and is comparable to state-of-the-art models. Our model called FastWave has around 50 GFLOPs of computational complexity and 1.3 M parameters and can be trained with less resources and significantly faster than the majority of recently proposed diffusion- and flow-based solutions. The code has been made publicly available.


点云|SLAM|雷达|激光|深度RGBD相关(1篇)

【1】When Shallow Wins: Silent Failures and the Depth-Accuracy Paradox in Latent Reasoning
标题:当浅层获胜时:潜在推理中的无声失败和深度准确性悖论
链接:https://arxiv.org/abs/2603.03475

作者:Subramanyam Sahoo,Aman Chadha,Vinija Jain,Divya Chaudhary
备注:Accepted at ICLR 2026 Workshop on Latent & Implicit Thinking - Going Beyond CoT Reasoning. 19 Pages and 5 Figures
摘要:数学推理模型尽管表现出根本性的计算不稳定性，却被广泛部署于教育、自动辅导和决策支持系统中。我们证明，最先进的模型（Qwen2.5-Math-7B）通过可靠与不可靠推理途径的混合达到61%的准确率：18.4%的正确预测采用稳定、忠实的推理，而81.6%则经由计算上不一致的途径产生。此外，所有预测中有8.8%是无声失败，即自信但不正确的输出。通过使用新的忠实度度量进行综合分析，我们发现：(1)推理质量与正确性呈弱负相关（r=-0.21, p=0.002），这反映的是二元分类阈值伪影，而非单调的反向关系；(2)从1.5B扩展到7B参数（增加4.7倍）在我们评估的子集（GSM8K的6%）上不带来任何准确率收益，需在完整基准上验证；(3)潜在推理采用多样的计算策略，其中约20%共享类CoT模式。这些发现强调，基准准确率可能掩盖计算上的不可靠性，要求评估改革以衡量超越单样本指标的稳定性。
摘要:Mathematical reasoning models are widely deployed in education, automated tutoring, and decision support systems despite exhibiting fundamental computational instabilities. We demonstrate that state-of-the-art models (Qwen2.5-Math-7B) achieve 61% accuracy through a mixture of reliable and unreliable reasoning pathways: 18.4% of correct predictions employ stable, faithful reasoning while 81.6% emerge through computationally inconsistent pathways. Additionally, 8.8% of all predictions are silent failures -- confident yet incorrect outputs. Through comprehensive analysis using novel faithfulness metrics, we reveal: (1) reasoning quality shows weak negative correlation with correctness (r=-0.21, p=0.002), reflecting a binary classification threshold artifact rather than a monotonic inverse relationship; (2) scaling from 1.5B to 7B parameters (4.7x increase) provides zero accuracy benefit on our evaluated subset (6% of GSM8K), requiring validation on the complete benchmark; and (3) latent reasoning employs diverse computational strategies, with ~20% sharing CoT-like patterns. These findings highlight that benchmark accuracy can mask computational unreliability, demanding evaluation reforms measuring stability beyond single-sample metrics.


联邦学习|隐私保护|加密(3篇)

【1】PTOPOFL: Privacy-Preserving Personalised Federated Learning via Persistent Homology
标题：PTOPOFL：通过持续同调实现隐私保护的个性化联邦学习
链接:https://arxiv.org/abs/2603.04323

作者:Kelly L Vomo-Donfack,Adryel Hoszu,Grégory Ginot,Ian Morilla
备注:22 pages, 6 Figures
摘要:联邦学习（FL）面临两种结构性张力：梯度共享使数据重构攻击成为可能，而非IID的客户端分布降低了聚合质量。我们提出PTOPOFL，该框架用源自持续同调（PH）的拓扑描述符取代梯度通信，从而同时应对这两个挑战。客户端只传输48维PH特征向量（紧凑的形状摘要，其多对一结构使反演在可证明意义下不适定），而非模型梯度。服务器执行拓扑引导的个性化聚合：根据客户端PH持续图之间的Wasserstein相似性对其聚类，簇内模型按拓扑加权，并将各簇与全局共识混合。我们证明了一个信息收缩定理，表明在强凸损失函数下，PH描述符每个样本泄漏的互信息严格少于梯度；我们还建立了Wasserstein加权聚合方案的线性收敛性，其误差下限严格小于FedAvg。在非IID医疗场景（8家医院，其中2家为对抗性）和病理基准（10个客户端）上与FedAvg、FedProx、SCAFFOLD和pFedMe对比，PTOPOFL分别达到AUC 0.841和0.910（两种设置中均为最高），同时相对于梯度共享将重建风险降低了4.5倍。代码公开于https://github.com/MorillaLab/TopoFederatedL，数据见https://doi.org/10.5281/zenodo.18827595。
摘要:Federated learning (FL) faces two structural tensions: gradient sharing enables data-reconstruction attacks, while non-IID client distributions degrade aggregation quality. We introduce PTOPOFL, a framework that addresses both challenges simultaneously by replacing gradient communication with topological descriptors derived from persistent homology (PH). Clients transmit only 48-dimensional PH feature vectors-compact shape summaries whose many-to-one structure makes inversion provably ill-posed-rather than model gradients. The server performs topology-guided personalised aggregation: clients are clustered by Wasserstein similarity between their PH diagrams, intra-cluster models are topology-weighted, and clusters are blended with a global consensus. We prove an information-contraction theorem showing that PH descriptors leak strictly less mutual information per sample than gradients under strongly convex loss functions, and we establish linear convergence of the Wasserstein-weighted aggregation scheme with an error floor strictly smaller than FedAvg. Evaluated against FedAvg, FedProx, SCAFFOLD, and pFedMe on a non-IID healthcare scenario (8 hospitals, 2 adversarial) and a pathological benchmark (10 clients), PTOPOFL achieves AUC 0.841 and 0.910 respectively-the highest in both settings-while reducing reconstruction risk by a factor of 4.5 relative to gradient sharing. Code is publicly available at https://github.com/MorillaLab/TopoFederatedL and data at https://doi.org/10.5281/zenodo.18827595.
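The clustering step can be sketched with plain numpy: for equal-weight 1-D empirical distributions, the Wasserstein-1 distance reduces to the mean absolute difference of sorted values. The descriptor contents and the greedy threshold rule below are illustrative, not the paper's:

```python
import numpy as np

# Sketch of the topology-guided clustering step. For equal-weight 1-D
# empirical distributions, Wasserstein-1 is the mean absolute difference
# of sorted values. Descriptor contents and the greedy threshold rule are
# illustrative assumptions, not the paper's construction.

def wasserstein_1d(u, v):
    return np.mean(np.abs(np.sort(u) - np.sort(v)))

rng = np.random.default_rng(3)
# two "topological regimes", five clients each, 48-dim PH descriptors
clients = np.vstack([rng.normal(0.0, 0.1, (5, 48)),
                     rng.normal(1.0, 0.1, (5, 48))])

n = len(clients)
D = np.array([[wasserstein_1d(clients[i], clients[j]) for j in range(n)]
              for i in range(n)])

# greedy threshold clustering: unlabeled clients within tau of a seed
# client join its cluster; intra-cluster models would then be averaged
tau = 0.5
labels = -np.ones(n, dtype=int)
next_label = 0
for i in range(n):
    if labels[i] >= 0:
        continue
    members = (labels < 0) & (D[i] < tau)
    labels[members] = next_label
    next_label += 1
```

Only the compact descriptor vectors ever leave the clients; the server's aggregation weights are derived from the pairwise distance matrix `D`.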


【2】Noise-aware Client Selection for carbon-efficient Federated Learning via Gradient Norm Thresholding
标题：通过梯度范数阈值实现碳效率联邦学习的噪声感知客户端选择
链接:https://arxiv.org/abs/2603.04194

作者:Patrick Wilhelm,Inese Yilmaz,Odej Kao
摘要:训练大规模神经网络需要大量的计算能力和能量。联邦学习支持跨地理分布数据中心的分布式模型训练，利用可再生能源减少人工智能训练的碳足迹。为使可再生能源的波动性与联邦系统中稳定、公平的模型训练相协调，已经发展出多种客户端选择策略。然而，由于联邦学习的隐私保护特性，客户端设备上的数据质量仍然未知，这对有效的模型训练构成了挑战。本文提出了一种可叠加在最先进客户端选择策略之上的模块化方法，用于碳效率的联邦学习。我们的方法通过引入噪声客户端数据过滤来增强鲁棒性，从而在数据质量未知的情况下同时提升模型性能和可持续性。此外，我们还探讨了碳预算对模型收敛的影响，以平衡效率与可持续性。通过广泛的评估，我们表明基于本地客户端损失的现代客户端选择策略往往会选中数据带噪的客户端，最终降低模型性能。为此，我们提出了一种使用探测轮的梯度范数阈值机制，以实现更有效的客户端选择与噪声检测，有助于碳效率联邦学习的实际部署。
摘要:Training large-scale Neural Networks requires substantial computational power and energy. Federated Learning enables distributed model training across geospatially distributed data centers, leveraging renewable energy sources to reduce the carbon footprint of AI training. Various client selection strategies have been developed to align the volatility of renewable energy with stable and fair model training in a federated system. However, due to the privacy-preserving nature of Federated Learning, the quality of data on client devices remains unknown, posing challenges for effective model training. In this paper, we introduce a modular approach on top to state-of-the-art client selection strategies for carbon-efficient Federated Learning. Our method enhances robustness by incorporating a noisy client data filtering, improving both model performance and sustainability in scenarios with unknown data quality. Additionally, we explore the impact of carbon budgets on model convergence, balancing efficiency and sustainability. Through extensive evaluations, we demonstrate that modern client selection strategies based on local client loss tend to select clients with noisy data, ultimately degrading model performance. To address this, we propose a gradient norm thresholding mechanism using probing rounds for more effective client selection and noise detection, contributing to the practical deployment of carbon-efficient Federated Learning.
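A minimal simulation of the probing-round idea follows; the linear probing model, noise level, and threshold rule are illustrative assumptions, not the paper's setup:

```python
import numpy as np

# Minimal simulation of gradient-norm thresholding in a probing round
# (illustrative assumptions, not the paper's setup): each client reports
# the gradient norm of a shared model on its local data, and outliers
# (suggesting noisy labels) are excluded from selection.

rng = np.random.default_rng(4)
w = np.zeros(10)                           # shared probing model

def client_grad_norm(X, y, w):
    grad = 2 * X.T @ (X @ w - y) / len(y)  # MSE gradient for a linear model
    return np.linalg.norm(grad)

norms = []
for k in range(8):
    X = rng.normal(size=(100, 10))
    y = X @ np.ones(10)                    # clean labels from a true model
    if k >= 6:                             # last two clients: noisy labels
        y = y + rng.normal(0, 100.0, size=y.shape)
    norms.append(client_grad_norm(X, y, w))

norms = np.array(norms)
threshold = 3.0 * np.median(norms)         # robust cut-off from the cohort
selected = np.where(norms <= threshold)[0]
```

A median-based threshold keeps the cut-off robust as long as noisy clients remain a minority of the cohort.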


【3】FedCova: Robust Federated Covariance Learning Against Noisy Labels
标题：FedCova：对抗噪声标签的鲁棒联邦协方差学习
链接:https://arxiv.org/abs/2603.04062

作者:Xiangyu Zhong,Xiaojun Yuan,Ying-Jun Angela Zhang
摘要:分布式数据集中的噪声标签会导致严重的局部过拟合，进而损害联邦学习（FL）中的全局模型。现有解决方案大多依赖于挑选干净的设备或与公共干净数据集对齐，而非赋予模型自身以鲁棒性。本文提出FedCova，一个无外部依赖的联邦协方差学习框架，它从特征协方差的新视角增强模型的内在鲁棒性，从而消除上述外部依赖。具体而言，FedCova将数据编码到一个有判别力但有弹性的特征空间中，以容忍标签噪声。基于互信息最大化，我们为联邦有损特征编码设计了一个新目标，它仅依赖带容错项的类别特征协方差。利用由协方差刻画的特征子空间，我们构建了一个子空间增强的联邦分类器。FedCova通过协方差统一了三个关键过程：(1)训练网络进行特征编码；(2)直接从学得特征构建分类器；(3)基于特征子空间校正噪声标签。我们在异构数据分布下的对称与非对称噪声设置中实现了FedCova。在CIFAR-10/100和真实噪声数据集Clothing1M上的实验结果表明，FedCova相比最先进方法具有更优的鲁棒性。
摘要:Noisy labels in distributed datasets induce severe local overfitting and consequently compromise the global model in federated learning (FL). Most existing solutions rely on selecting clean devices or aligning with public clean datasets, rather than endowing the model itself with robustness. In this paper, we propose FedCova, a dependency-free federated covariance learning framework that eliminates such external reliances by enhancing the model's intrinsic robustness via a new perspective on feature covariances. Specifically, FedCova encodes data into a discriminative but resilient feature space to tolerate label noise. Built on mutual information maximization, we design a novel objective for federated lossy feature encoding that relies solely on class feature covariances with an error tolerance term. Leveraging feature subspaces characterized by covariances, we construct a subspace-augmented federated classifier. FedCova unifies three key processes through the covariance: (1) training the network for feature encoding, (2) constructing a classifier directly from the learned features, and (3) correcting noisy labels based on feature subspaces. We implement FedCova across both symmetric and asymmetric noisy settings under heterogeneous data distribution. Experimental results on CIFAR-10/100 and real-world noisy dataset Clothing1M demonstrate the superior robustness of FedCova compared with the state-of-the-art methods.


推理|分析|理解|解释(7篇)

【1】InstMeter: An Instruction-Level Method to Predict Energy and Latency of DL Model Inference on MCUs
标题:InstMeter:一种预测MCU上DL模型推理的能量和延迟的指令级方法
链接:https://arxiv.org/abs/2603.04134

作者:Hao Liu,Qing Wang,Marco Zuniga
备注:17 pages
摘要:深度学习（DL）模型现在可以在微控制器（MCU）上运行。通过神经架构搜索（NAS），我们可以搜索满足MCU约束的DL模型。在各种约束中，模型推理的能量与延迟开销是关键指标。为了预测它们，现有研究依赖于乘加运算数（MACs）和模型输入参数等粗糙代理，往往导致预测不准确或需要大量数据采集。本文提出InstMeter，一个利用MCU时钟周期来准确估计DL模型能量与延迟的预测器。时钟周期是反映MCU运行的基本指标，直接决定能量与延迟开销。此外，我们预测器的一个独特性质是其强线性，使其既简单又准确。我们在不同场景、MCU和软件设置下全面评估了InstMeter。与最先进的研究相比，InstMeter可将能量和延迟预测误差分别降低$3\times$和$6.5\times$，同时所需训练数据减少$100\times$和$10\times$。在NAS场景中，InstMeter可以充分利用能量预算，找出推理准确率更高的最优DL模型。我们还通过在三个ARM MCU（Cortex-M4、M7、M33）和一个基于RISC-V的MCU（ESP32-C3）上，在不同编译选项（-Os、-O2）、GCC版本（v7.3、v10.3）、应用场景（关键词检出、图像识别）、动态电压频率调节、温度（21°C、43°C）和软件设置（TFLMv2.4、TFLMvCI）下的各类实验，评估了InstMeter的泛化性能。我们将开源代码和MCU专用基准数据集。
摘要:Deep learning (DL) models can now run on microcontrollers (MCUs). Through neural architecture search (NAS), we can search DL models that meet the constraints of MCUs. Among various constraints, energy and latency costs of the model inference are critical metrics. To predict them, existing research relies on coarse proxies such as multiply-accumulations (MACs) and model's input parameters, often resulting in inaccurate predictions or requiring extensive data collection. In this paper, we propose InstMeter, a predictor leveraging MCUs' clock cycles to accurately estimate the energy and latency of DL models. Clock cycles are fundamental metrics reflecting MCU operations, directly determining energy and latency costs. Furthermore, a unique property of our predictor is its strong linearity, allowing it to be simple and accurate. We thoroughly evaluate InstMeter under different scenarios, MCUs, and software settings. Compared with state-of-the-art studies, InstMeter can reduce the energy and latency prediction errors by $3\times$ and $6.5\times$, respectively, while requiring $100\times$ and $10\times$ less training data. In the NAS scenario, InstMeter can fully exploit the energy budget, identifying optimal DL models with higher inference accuracy. We also evaluate InstMeter's generalization performance through various experiments on three ARM MCUs (Cortex-M4, M7, M33) and one RISC-V-based MCU (ESP32-C3), different compilation options (-Os, -O2), GCC versions (v7.3, v10.3), application scenarios (keyword spotting, image recognition), dynamic voltage and frequency scaling, temperatures (21°C, 43°C), and software settings (TFLMv2.4, TFLMvCI). We will open our source codes and the MCU-specific benchmark datasets.
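The strong-linearity property means the predictor can be as simple as a least-squares line over measured clock cycles; a synthetic sketch (all constants below are made-up placeholders, not measured MCU data):

```python
import numpy as np

# Sketch of the strong-linearity claim: energy and latency scale (nearly)
# linearly with executed clock cycles, so a least-squares line fit over a
# few calibration runs suffices. All numbers are synthetic placeholders.

cycles = np.array([1e6, 2e6, 5e6, 8e6, 1.2e7])   # measured cycle counts
F_KHZ = 80_000.0                                 # 80 MHz MCU clock in kHz
latency_ms = cycles / F_KHZ                      # ideal cycle-time latency
energy_mj = 2.5e-6 * cycles + 0.3                # per-cycle cost + static offset

# calibrate a line: energy ≈ a * cycles + b
a, b = np.polyfit(cycles, energy_mj, 1)

# predict for an unseen model's cycle count
new_cycles = 6.5e6
pred_energy = a * new_cycles + b
pred_latency_ms = new_cycles / F_KHZ
```

Because the relation is (near-)linear, a handful of calibration runs pins down the two coefficients, which is why the approach needs far less training data than proxy-based predictors.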


【2】Continuous Modal Logical Neural Networks: Modal Reasoning via Stochastic Accessibility
标题：连续模态逻辑神经网络：通过随机可达性进行模态推理
链接:https://arxiv.org/abs/2603.04019

作者:Antonin Sulc
备注:10 pages, 5 figures, 20th INTERNATIONAL CONFERENCE ON NEUROSYMBOLIC LEARNING AND REASONING
摘要:我们提出流体逻辑（Fluid Logic），一种将时间、认知、信念（doxastic）、道义等模态逻辑推理通过神经随机微分方程（神经SDE）从离散Kripke结构提升到连续流形的范式。每类模态算子由一个专用的神经SDE支撑，嵌套公式则将这些SDE组合成单个可微图。一个关键实例是逻辑信息神经网络（LINN）：类似于物理信息神经网络（PINN），LINN将诸如($\Box$ bounded)和($\Diamond$ visits\_lobe)的模态逻辑公式直接嵌入训练损失，引导神经网络在无需控制方程知识的情况下产生与指定逻辑性质结构一致的解。由此产生的框架，即连续模态逻辑神经网络（CMLNN），具有若干关键性质：(i)随机扩散防止量词坍缩（$\Box$与$\Diamond$不同），这与确定性常微分方程不同；(ii)模态算子是熵风险度量，相对于基于风险的语义是可靠的，并具有显式的Monte Carlo集中度保证；(iii)SDE诱导的可达性与经典模态公理具有结构对应关系；(iv)通过动力学参数化可达性，使内存开销从关于世界数量的二次方降为关于参数数量的线性。三个案例研究表明，流体逻辑与LINN能够在不同领域引导神经网络产生一致的解：认知/信念逻辑（多机器人幻觉检测）、时间逻辑（仅从逻辑约束恢复Lorenz吸引子几何）以及道义逻辑（从逻辑规范学习安全约束动力学）。
摘要:We propose Fluid Logic, a paradigm in which modal logical reasoning, temporal, epistemic, doxastic, deontic, is lifted from discrete Kripke structures to continuous manifolds via Neural Stochastic Differential Equations (Neural SDEs). Each type of modal operator is backed by a dedicated Neural SDE, and nested formulas compose these SDEs in a single differentiable graph. A key instantiation is Logic-Informed Neural Networks (LINNs): analogous to Physics-Informed Neural Networks (PINNs), LINNs embed modal logical formulas such as ($\Box$ bounded) and ($\Diamond$ visits\_lobe) directly into the training loss, guiding neural networks to produce solutions that are structurally consistent with prescribed logical properties, without requiring knowledge of the governing equations.   The resulting framework, Continuous Modal Logical Neural Networks (CMLNNs), yields several key properties: (i) stochastic diffusion prevents quantifier collapse ($\Box$ and $\Diamond$ differ), unlike deterministic ODEs; (ii) modal operators are entropic risk measures, sound with respect to risk-based semantics with explicit Monte Carlo concentration guarantees; (iii) SDE-induced accessibility provides structural correspondence with classical modal axioms; (iv) parameterizing accessibility through dynamics reduces memory from quadratic in world count to linear in parameters.   Three case studies demonstrate that Fluid Logic and LINNs can guide neural networks to produce consistent solutions across diverse domains: epistemic/doxastic logic (multi-robot hallucination detection), temporal logic (recovering the Lorenz attractor geometry from logical constraints alone), and deontic logic (learning safe confinement dynamics from a logical specification).
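The entropic-risk reading of the modal operators (property (ii)) can be checked with a quick Monte Carlo sketch. The formulas below are the standard entropic risk measure and its dual, used here as an illustrative reading of the abstract rather than the paper's exact definitions:

```python
import numpy as np

# Monte Carlo sketch of the entropic-risk reading of modal operators.
# These are the standard entropic risk measure and its dual, an
# illustrative reading of the abstract, not the paper's exact definitions.

def box(phi, theta=1.0):
    # risk-averse "necessarily": -(1/theta) * log E[exp(-theta * phi)]
    return -np.log(np.mean(np.exp(-theta * phi))) / theta

def diamond(phi, theta=1.0):
    # risk-seeking "possibly": (1/theta) * log E[exp(theta * phi)]
    return np.log(np.mean(np.exp(theta * phi))) / theta

rng = np.random.default_rng(5)
phi = rng.normal(0.0, 1.0, size=200_000)  # payoffs along sampled SDE paths

box_val, dia_val = box(phi), diamond(phi)
# stochasticity keeps the operators apart (no quantifier collapse):
# box <= mean <= diamond, with gaps of sigma^2/2 = 0.5 for a unit Gaussian
```

In the deterministic limit (zero diffusion) the payoff distribution collapses to a point and `box` equals `diamond`, which is exactly the quantifier collapse that the stochastic diffusion in property (i) prevents.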


【3】Role-Aware Conditional Inference for Spatiotemporal Ecosystem Carbon Flux Prediction
标题:时空生态系统碳通量预测的角色感知条件推理
链接:https://arxiv.org/abs/2603.03531

作者:Yiming Sun,Runlong Yu,Rongchao Dong,Shuo Chen,Licheng Liu,Youmi Oh,Qianlai Zhuang,Yiqun Xie,Xiaowei Jia
摘要:准确预测陆地生态系统碳通量(例如CO$_2$、GPP和CH$_4$)对于理解全球碳循环和管理其影响至关重要。然而,由于强烈的时空异质性,预测仍然具有挑战性:生态系统通量响应受到缓慢变化的状态条件约束,而短期波动则由高频动态强迫驱动。大多数现有的基于学习的方法将环境协变量视为同质输入空间,隐含地假设一个全局响应函数,这导致在异质生态系统之间的泛化十分脆弱。在这项工作中,我们提出了角色感知条件推理(RACI),一个过程知情的学习框架,将生态系统通量预测表述为条件推理问题。RACI采用分层时间编码来解耦慢变的状态调节因子与快变的动态驱动因子,并结合角色感知的空间检索,为每个角色提供功能相似且地理邻近的上下文。通过显式建模这些不同的功能角色,RACI使模型能够在不同的环境状态下调整其预测,而无需训练单独的局部模型或依赖固定的空间结构。我们在多种生态系统类型(湿地和农业系统)、碳通量(CO$_2$、GPP、CH$_4$)和数据源(包括基于过程的模拟和观测测量)上评估RACI。在所有设置中,RACI始终优于有竞争力的时空基线,在显著的环境异质性下表现出更高的准确性和空间泛化能力。
摘要:Accurate prediction of terrestrial ecosystem carbon fluxes (e.g., CO$_2$, GPP, and CH$_4$) is essential for understanding the global carbon cycle and managing its impacts. However, prediction remains challenging due to strong spatiotemporal heterogeneity: ecosystem flux responses are constrained by slowly varying regime conditions, while short-term fluctuations are driven by high-frequency dynamic forcings. Most existing learning-based approaches treat environmental covariates as a homogeneous input space, implicitly assuming a global response function, which leads to brittle generalization across heterogeneous ecosystems. In this work, we propose Role-Aware Conditional Inference (RACI), a process-informed learning framework that formulates ecosystem flux prediction as a conditional inference problem. RACI employs hierarchical temporal encoding to disentangle slow regime conditioners from fast dynamic drivers, and incorporates role-aware spatial retrieval that supplies functionally similar and geographically local context for each role. By explicitly modeling these distinct functional roles, RACI enables a model to adapt its predictions across diverse environmental regimes without training separate local models or relying on fixed spatial structures. We evaluate RACI across multiple ecosystem types (wetlands and agricultural systems), carbon fluxes (CO$_2$, GPP, CH$_4$), and data sources, including both process-based simulations and observational measurements. Across all settings, RACI consistently outperforms competitive spatiotemporal baselines, demonstrating improved accuracy and spatial generalization under pronounced environmental heterogeneity.
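RACI 的分层时间编码将慢变状态分量与快变驱动分量解耦。下面是这一思想的极简示意(并非论文的编码器;年循环信号、噪声与31天滑动平均窗口均为本文假设):用滑动平均把模拟通量拆成慢、快两个分量。

```python
import numpy as np

def decompose(signal, window):
    """把序列拆为慢变分量(滑动平均)与快变残差;边界用 edge 填充保持长度不变。"""
    kernel = np.ones(window) / window
    padded = np.pad(signal, (window // 2, window - 1 - window // 2), mode="edge")
    slow = np.convolve(padded, kernel, mode="valid")
    return slow, signal - slow

t = np.arange(365, dtype=float)
regime = 10 + 5 * np.sin(2 * np.pi * t / 365)   # 慢变的年循环“状态”分量
rng = np.random.default_rng(1)
forcing = rng.standard_normal(365)              # 高频动态驱动
flux = regime + forcing                         # 模拟的碳通量
slow, fast = decompose(flux, window=31)
```

拆出的 slow 近似状态分量、fast 近似驱动分量,且二者之和严格等于原序列。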


【4】IntPro: A Proxy Agent for Context-Aware Intent Understanding via Retrieval-conditioned Inference
标题:IntPro:通过检索条件推理实现上下文感知意图理解的代理智能体
链接:https://arxiv.org/abs/2603.03325

作者:Guanming Liu,Meng Wu,Peng Zhang,Yu Zhang,Yubo Shu,Xianliang Huang,Kainan Tu,Ning Gu,Liuxin Zhang,Qianying Wang,Tun Lu
摘要:大型语言模型(LLM)已成为现代人机协作工作流不可或缺的组成部分,准确理解用户意图是生成令人满意回复的关键一步。上下文感知的意图理解涉及从情境环境中推断用户意图,本质上具有挑战性,因为它既需要对即时上下文进行推理,也需要推理驱动用户行为的潜在动机。此外,现有方法通常将意图理解视为静态识别任务,忽视了用户积累的意图模式,而这些模式可为更准确、更可泛化的理解提供有价值的参考。为弥补这一差距,我们提出IntPro,一个通过检索条件意图推断学习适应个体用户的代理智能体。我们设计了抽象上下文信号如何连接到所表达意图的意图解释,并将其存储在个体意图历史库中供检索。我们通过对检索条件轨迹的监督微调,以及带有工具感知奖励函数的多轮组相对策略优化(GRPO)来训练IntPro,使智能体学会何时利用历史意图模式、何时直接推断。在三个不同场景(Highlight-Intent、MIntRec2.0和微博Post-Sync)上的实验表明,IntPro在不同场景和模型类型上均取得了强大的意图理解性能和有效的上下文感知推理能力。
摘要:Large language models (LLMs) have become integral to modern Human-AI collaboration workflows, where accurately understanding user intent serves as a crucial step for generating satisfactory responses. Context-aware intent understanding, which involves inferring user intentions from situational environments, is inherently challenging because it requires reasoning over both the immediate context and the user's underlying motivations that drive their behavior. Moreover, existing approaches often treat intent understanding as a static recognition task, overlooking users' accumulated intent patterns that could provide valuable references for more accurate and generalizable understanding. To address this gap, we propose IntPro, a proxy agent that learns to adapt to individual users via retrieval-conditioned intent inference. We design intent explanations that abstract how contextual signals connect to expressed intents, and store them in an individual intent history library for retrieval. We train IntPro through supervised fine-tuning on retrieval-conditioned trajectories and multi-turn Group Relative Policy Optimization (GRPO) with tool-aware reward functions, enabling the agent to learn when to leverage historical intent patterns and when to infer directly. Experiments across three diverse scenarios (Highlight-Intent, MIntRec2.0, and Weibo Post-Sync) demonstrate that IntPro achieves strong intent understanding performance with effective context-aware reasoning capabilities across different scenarios and model types.
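IntPro 在推断前会从意图历史库中检索相关条目作为条件。以下是检索步骤的极简示意(库中的嵌入向量纯属虚构,仅演示余弦相似度 top-k 检索,并非论文的检索器实现):

```python
import numpy as np

def retrieve_top_k(query, library, k=2):
    """按余弦相似度从历史库中检索 top-k 条目,返回索引与相似度。"""
    q = query / np.linalg.norm(query)
    lib = library / np.linalg.norm(library, axis=1, keepdims=True)
    sims = lib @ q
    idx = np.argsort(-sims)[:k]
    return idx, sims[idx]

# 假设的意图解释嵌入库(4 条历史记录,3 维向量仅作演示)
library = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
idx, sims = retrieve_top_k(np.array([1.0, 0.05, 0.0]), library, k=2)
```

检索到的条目随后可拼接进提示词,作为“检索条件”的推断上下文。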


【5】M-QUEST -- Meme Question-Understanding Evaluation on Semantics and Toxicity
标题:M-QUEST -- 模因问题理解的语义与毒性评估
链接:https://arxiv.org/abs/2603.03315

作者:Stefano De Giorgis,Ting-Chih Chen,Filip Ilievski
摘要:互联网模因是一种强大的在线交流形式,但其本质及其对常识知识的依赖使毒性检测颇具挑战。识别模因解释与理解的关键特征是一项至关重要的任务。以前的工作集中在构成意义的部分元素上,例如通过OCR处理的文本维度、通过目标识别处理的视觉维度、情感维度等更高层的意义、通过仇恨言论检测等代理变量进行的毒性检测,以及情感分析。然而,仍然缺乏一个能够形式化识别构成模因意义的元素并用于意义建构过程的整体架构。在这项工作中,我们提出了一个语义框架和相应的模因自动知识抽取基准。首先,我们确定了理解和解释模因所需的维度:文本材料、视觉材料、场景、背景知识、情感、符号投射、类比映射、整体意图、目标社区和毒性评估。其次,该框架指导一个半自动过程,生成一个包含关于模因毒性评估及其根本原因的常识问答对的基准。由此产生的基准M-QUEST由307个模因的609个问答对组成。第三,我们评估了八个开源大型语言模型正确求解M-QUEST的能力。我们的结果表明,当前模型对有毒模因解释的常识推理能力因维度和架构而异。具备指令微调和推理能力的模型显著优于其他模型,尽管语用推理问题仍然具有挑战性。我们发布代码、基准和提示词,以支持多模态内容安全与常识推理交叉领域的未来研究。
摘要:Internet memes are a powerful form of online communication, yet their nature and reliance on commonsense knowledge make toxicity detection challenging. Identifying key features for meme interpretation and understanding, is a crucial task. Previous work has been focused on some elements contributing to the meaning, such as the Textual dimension via OCR, the Visual dimension via object recognition, upper layers of meaning like the Emotional dimension, Toxicity detection via proxy variables, such as hate speech detection, and sentiment analysis. Nevertheless, there is still a lack of an overall architecture able to formally identify elements contributing to the meaning of a meme, and be used in the sense-making process. In this work, we present a semantic framework and a corresponding benchmark for automatic knowledge extraction from memes. First, we identify the necessary dimensions to understand and interpret a meme: Textual material, Visual material, Scene, Background Knowledge, Emotion, Semiotic Projection, Analogical Mapping, Overall Intent, Target Community, and Toxicity Assessment. Second, the framework guides a semi-automatic process of generating a benchmark with commonsense question-answer pairs about meme toxicity assessment and its underlying reason. The resulting benchmark M-QUEST consists of 609 question-answer pairs for 307 memes. Thirdly, we evaluate eight open-source large language models on their ability to correctly solve M-QUEST. Our results show that current models' commonsense reasoning capabilities for toxic meme interpretation vary depending on the dimension and architecture. Models with instruction tuning and reasoning capabilities significantly outperform the others, though pragmatic inference questions remain challenging. We release code, benchmark, and prompts to support future research intersecting multimodal content safety and commonsense reasoning.


【6】TTSR: Test-Time Self-Reflection for Continual Reasoning Improvement
标题:TTSR:用于持续改进推理的测试时自我反思
链接:https://arxiv.org/abs/2603.03297

作者:Haoyang He,Zihua Rong,Liangjie Zhao,Yunjia Zhao,Lan Yang,Honggang Zhang
备注:work in progress
摘要:测试时训练(Test-time Training)仅使用测试问题即可实现模型自适应,并为提升大型语言模型(LLM)的推理能力提供了一个有前景的范式。然而,它面临两大挑战:测试问题往往难度很高,使自生成的伪标签不可靠;现有方法缺乏适应模型特定推理弱点的有效机制,导致学习效率低下。为解决这些问题,我们提出\textbf{TTSR},一个自我反思的测试时自我进化训练框架。TTSR使用单个预训练语言模型,在测试时交替扮演\textit{Student}(学生)和\textit{Teacher}(教师)的角色。学生专注于解题并从合成的变体问题中学习,而教师分析学生失败的推理轨迹,总结反复出现的推理弱点,并据此合成有针对性的变体问题。这一过程通过持续的自我进化循环,引导模型在可学习的范围内改进。多个具有挑战性的数学推理基准上的实验结果表明,TTSR持续提升推理性能,并能很好地泛化到不同的模型骨干和通用领域推理任务。这些发现表明,教师中介的自我反思为测试时稳定且持续的推理改进提供了一条有效途径。
摘要:Test-time Training enables model adaptation using only test questions and offers a promising paradigm for improving the reasoning ability of large language models (LLMs). However, it faces two major challenges: test questions are often highly difficult, making self-generated pseudo-labels unreliable, and existing methods lack effective mechanisms to adapt to a model's specific reasoning weaknesses, leading to inefficient learning. To address these issues, we propose \textbf{TTSR}, a self-reflective test-time self-evolving training framework. TTSR employs a single pretrained language model that alternates between the roles of a \textit{Student} and a \textit{Teacher} at test time. The Student focuses on solving problems and learning from synthesized variant questions, while the Teacher analyzes the Student's failed reasoning trajectories, summarizes recurring reasoning weaknesses, and synthesizes targeted variant questions accordingly. This process guides the model to improve within a learnable regime through a continual self-evolving loop. Experimental results on multiple challenging mathematical reasoning benchmarks show that TTSR consistently improves reasoning performance and generalizes well across different model backbones and general-domain reasoning tasks. These findings suggest that teacher-mediated self-reflection provides an effective pathway for stable and continual reasoning improvement at test time.


【7】Controllable Generative Sandbox for Causal Inference
标题:用于因果推理的可控生成沙盒
链接:https://arxiv.org/abs/2603.03587

作者:Qi Zhang,Harsh Parikh,Ashley Naimi,Razieh Nabi,Christopher Kim,Timothy Lash
备注:34 pages, 15 figures. Submitted to ICML 2026. Code available at https://github.com/zhangqiecho/causalmix
摘要:因果推理中的方法验证和研究设计依赖于具有已知反事实的合成数据。现有的模拟器在分布真实性(捕获混合类型和多模态表格数据的能力)与因果可控性(包括对重叠、未测量混杂和处理效应异质性的显式控制)之间进行权衡。我们引入CausalMix,一个变分生成框架,通过将高斯混合潜在先验与面向连续、二元和分类变量的数据类型特定解码器相耦合来缩小这一差距。该模型纳入了显式的因果控制:塑造倾向得分分布的重叠正则化项,以及混杂强度和效应异质性的直接参数化。这一统一目标在保持对观测数据保真度的同时,支持对因果机制的析因操作,允许重叠、混杂强度和处理效应异质性在设计时独立变化。在各基准上,CausalMix在混合类型表格上取得了最先进的分布指标,同时提供稳定的细粒度因果控制。我们在转移性去势抵抗性前列腺癌治疗的比较安全性研究中展示了实际效用,使用CausalMix在校准的数据生成过程下比较估计器、调整超参数,并在目标处理效应异质性情景下进行基于模拟的功效分析。
摘要:Method validation and study design in causal inference rely on synthetic data with known counterfactuals. Existing simulators trade off distributional realism, the ability to capture mixed-type and multimodal tabular data, against causal controllability, including explicit control over overlap, unmeasured confounding, and treatment effect heterogeneity. We introduce CausalMix, a variational generative framework that closes this gap by coupling a mixture of Gaussian latent priors with data-type-specific decoders for continuous, binary, and categorical variables. The model incorporates explicit causal controls: an overlap regularizer shaping propensity-score distributions, alongside direct parameterizations of confounding strength and effect heterogeneity. This unified objective preserves fidelity to the observed data while enabling factorial manipulation of causal mechanisms, allowing overlap, confounding strength, and treatment effect heterogeneity to be varied independently at design time. Across benchmarks, CausalMix achieves state-of-the-art distributional metrics on mixed-type tables while providing stable, fine-grained causal control. We demonstrate practical utility in a comparative safety study of metastatic castration-resistant prostate cancer treatments, using CausalMix to compare estimators under calibrated data-generating processes, tune hyperparameters, and conduct simulation-based power analyses under targeted treatment effect heterogeneity scenarios.
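CausalMix 的一个核心卖点是对混杂强度等因果机制的显式控制。下面用一个与论文无关的玩具数据生成过程示意“可调混杂强度”的含义(函数形式与参数均为本文假设):混杂越强,朴素的组间均值差越偏离真实处理效应 1.0。

```python
import numpy as np

def simulate(n, confounding, tau, rng):
    """玩具数据生成过程:confounding 同时控制倾向得分与结果对未测混杂 u 的依赖。"""
    u = rng.standard_normal(n)                       # 未测量混杂
    p = 1.0 / (1.0 + np.exp(-confounding * u))       # 倾向得分
    t = (rng.random(n) < p).astype(float)            # 处理分配
    y = tau * t + confounding * u + rng.standard_normal(n)
    return t, y

rng = np.random.default_rng(0)
t0, y0 = simulate(200_000, confounding=0.0, tau=1.0, rng=rng)
t1, y1 = simulate(200_000, confounding=2.0, tau=1.0, rng=rng)
naive0 = y0[t0 == 1].mean() - y0[t0 == 0].mean()     # 无混杂:接近真实效应 1.0
naive1 = y1[t1 == 1].mean() - y1[t1 == 0].mean()     # 强混杂:朴素估计明显偏高
```

这种带已知真值的可控模拟,正是摘要中“比较估计器、做功效分析”这类用途的基础。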


检测相关(2篇)

【1】REDNET-ML: A Multi-Sensor Machine Learning Pipeline for Harmful Algal Bloom Risk Detection Along the Omani Coast
标题:REDNET-ML:用于阿曼海岸有害藻华风险检测的多传感器机器学习管道
链接:https://arxiv.org/abs/2603.04181

作者:Ameer Alhashemi
备注:11 pages
摘要:有害藻华(HABs)可能威胁沿海基础设施、渔业和依赖海水淡化的供水。本项目(REDNET-ML)开发了一个可复现的机器学习管道,利用多传感器卫星数据和无泄漏评估,用于沿阿曼海岸线的HAB风险检测。该系统融合了(一)处理为光谱指数和纹理信号的Sentinel-2高空间分辨率光学影像切片,(二)MODIS三级(Level-3)海洋水色与热指标,以及(三)由训练用于突出水华状模式的目标检测器产生的学习图像证据。一个紧凑的决策融合模型(CatBoost)将这些信号整合为经过校准的HAB风险概率,随后由端到端推理工作流和支持按站点(电厂)与时间进行业务化探索的风险场查看器使用。报告记录了动机、相关工作、方法选择(包括标签挖掘和严格的数据划分策略)、实现细节,以及使用AUROC/AUPRC、混淆矩阵、校准曲线和量化近年分布漂移的漂移分析所做的批判性评估。
摘要:Harmful algal blooms (HABs) can threaten coastal infrastructure, fisheries, and desalination dependent water supplies. This project (REDNET-ML) develops a reproducible machine learning pipeline for HAB risk detection along the Omani coastline using multi sensor satellite data and non leaky evaluation. The system fuses (i) Sentinel-2 optical chips (high spatial resolution) processed into spectral indices and texture signals, (ii) MODIS Level-3 ocean color and thermal indicators, and (iii) learned image evidence from object detectors trained to highlight bloom like patterns. A compact decision fusion model (CatBoost) integrates these signals into a calibrated probability of HAB risk, which is then consumed by an end to end inference workflow and a risk field viewer that supports operational exploration by site (plant) and time. The report documents the motivation, related work, methodological choices (including label mining and strict split strategies), implementation details, and a critical evaluation using AUROC/AUPRC, confusion matrices, calibration curves, and drift analyses that quantify distribution shift in recent years.


【2】Architectural Proprioception in State Space Models: Thermodynamic Training Induces Anticipatory Halt Detection
标题:状态空间模型中的建筑主体感觉:热力学训练诱导预期停止检测
链接:https://arxiv.org/abs/2603.04180

作者:Jay Noon
备注:17 pages, 15 figures
摘要:我们介绍概率导航架构(PNA)框架,它将神经计算视为在由热力学原理支配的概率流形中进行导航。我们用一种新的热力学损失函数训练状态空间模型(SSM)和Transformer,该损失在标准交叉熵之外惩罚计算浪费。在19个实验阶段中,我们发现经过热力学训练的SSM发展出架构本体感觉:循环状态熵与停止置信度之间存在强烈的预期性耦合(r = -0.836,p < 0.001),其中停止信号恰好领先状态熵坍缩两个token(tau = -2.0)。这种通用停止签名(USS)在不同随机种子下可复现到小数点后四位,并推广到一个结构上不同的排序任务。关键的是,以相同方式训练的Transformer没有表现出这种耦合(r = -0.07),表明该现象依赖于架构。跨任务迁移实验证实,SSM的停止检测反映了真正的元认知(零样本迁移F1:SSM 64.2% vs. Transformer 69.3%;适应后:SSM 94.5% vs. Transformer 86.4%),而Transformer的停止检测依赖于句法模式匹配。对能量惩罚(alpha)和停止监督(beta)的二维超参数扫描表明,这种预期性耦合可通过训练连续控制,其中热力学压力是主要诱导机制,显式停止监督起放大作用。我们的结果表明,SSM是热力学原生架构,其固定大小的循环状态天然支持实现计算自我意识的马尔可夫压缩,这对生产系统中的成本感知推理、动态token预算和基于置信度的路由具有启示意义。
摘要:We introduce the Probability Navigation Architecture (PNA) framework, which treats neural computation as navigation through a probability manifold governed by thermodynamic principles. We train State Space Models (SSMs) and Transformers with a novel thermodynamic loss function that penalizes computational waste alongside standard cross-entropy. Across 19 experimental phases, we discover that thermodynamically-trained SSMs develop architectural proprioception: a strong anticipatory coupling between recurrent state entropy and halt confidence (r = -0.836, p < 0.001) in which the halt signal leads state entropy collapse by exactly two tokens (tau = -2.0). This Universal Stopping Signature (USS) reproduces to four decimal places across random seeds and generalizes to a structurally distinct sorting task. Critically, Transformers trained identically show no such coupling (r = -0.07), demonstrating that the phenomenon is architecture-dependent. Cross-task transfer experiments confirm that SSM halt detection reflects genuine meta-cognition (zero-shot transfer F1: SSMs 64.2% vs. Transformers 69.3%; post-adaptation: SSMs 94.5% vs. Transformers 86.4%), while Transformer halt detection relies on syntactic pattern matching. A 2D hyperparameter sweep over energy penalty (alpha) and halt supervision (beta) reveals that the anticipatory coupling is continuously controllable through training, with thermodynamic pressure serving as the primary induction mechanism and explicit halt supervision as an amplifier. Our results establish that SSMs are thermodynamically native architectures whose fixed-size recurrent states naturally support the Markovian compression that enables computational self-awareness, with implications for cost-aware inference, dynamic token budgets, and confidence-based routing in production systems.
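论文报告停止信号恰好领先状态熵坍缩两个 token(tau = -2)。下面用合成序列示意如何用滞后互相关检测这种领先关系(序列与噪声水平均为本文构造,并非论文数据):

```python
import numpy as np

def best_lag(a, b, max_lag=5):
    """返回使 |corr(a[t], b[t-lag])| 最大的 lag;lag < 0 表示 a 领先 b。"""
    best, best_r = 0, 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag < 0:
            x, y = a[:lag], b[-lag:]
        elif lag > 0:
            x, y = a[lag:], b[:-lag]
        else:
            x, y = a, b
        r = np.corrcoef(x, y)[0, 1]
        if abs(r) > abs(best_r):
            best, best_r = lag, r
    return best, best_r

rng = np.random.default_rng(0)
halt = rng.standard_normal(500)                            # 合成的停止置信度序列
entropy = np.empty(500)
entropy[2:] = -halt[:-2] + 0.1 * rng.standard_normal(498)  # 熵在 2 个 token 后随停止信号坍缩
entropy[:2] = rng.standard_normal(2)
lag, r = best_lag(halt, entropy)
```

按此构造,检测到的最优滞后为 -2、相关系数强负,与论文描述的 USS 形态一致。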


分类|识别(4篇)

【1】CRESTomics: Analyzing Carotid Plaques in the CREST-2 Trial with a New Additive Classification Model
标题:CRESTomics:使用新的相加分类模型分析CREST-2试验中的颈动脉斑块
链接:https://arxiv.org/abs/2603.04309

作者:Pranav Kulkarni,Brajesh K. Lal,Georges Jreij,Sai Vallamchetla,Langford Green,Jenifer Voeks,John Huston,Lloyd Edwards,George Howard,Bradley A. Maron,Thomas G. Brott,James F. Meschia,Florence X. Doo,Heng Huang
备注:4 pages, 3 figures, 1 table, accepted to ISBI 2026
摘要:准确表征颈动脉斑块对于颈动脉狭窄患者的卒中预防至关重要。我们分析了来自多中心临床试验CREST-2的500个斑块,以从B型超声图像中识别与高风险相关的基于影像组学(radiomics)的标记物。我们提出一种新的基于核的加性模型,将相干损失与组稀疏正则化相结合用于非线性分类。每个特征组的逐组加性效应用部分依赖图可视化。结果表明,我们的方法能够准确且可解释地评估斑块,揭示了斑块纹理与临床风险之间的强关联。
摘要:Accurate characterization of carotid plaques is critical for stroke prevention in patients with carotid stenosis. We analyze 500 plaques from CREST-2, a multi-center clinical trial, to identify radiomics-based markers from B-mode ultrasound images linked with high-risk. We propose a new kernel-based additive model, combining coherence loss with group-sparse regularization for nonlinear classification. Group-wise additive effects of each feature group are visualized using partial dependence plots. Results indicate our method accurately and interpretably assesses plaques, revealing a strong association between plaque texture and clinical risk.


【2】Fixed-Budget Constrained Best Arm Identification in Grouped Bandits
标题:分组老虎机中固定预算约束下的最佳臂识别
链接:https://arxiv.org/abs/2603.04007

作者:Raunak Mukherjee,Sharayu Moharir
备注:25 pages, 2 Figures
摘要:我们研究分组老虎机(grouped bandits)中固定预算约束下的最佳臂识别,其中每个臂由多个具有随机奖励的独立属性组成。只有当所有属性的均值都高于给定阈值时,臂才被视为可行。目标是找到总体均值最大的可行臂。我们首先推导了该设定下任意算法错误概率的下界。然后提出可行性约束逐次拒绝(FCSR),一种在保证可行性的同时识别最佳臂的新算法。我们证明其在指数中至常数因子内达到对问题参数的最优依赖。经验上,FCSR在保持可行性保证的同时优于自然基线。
摘要:We study fixed budget constrained best-arm identification in grouped bandits, where each arm consists of multiple independent attributes with stochastic rewards. An arm is considered feasible only if all its attributes' means are above a given threshold. The aim is to find the feasible arm with the largest overall mean. We first derive a lower bound on the error probability for any algorithm on this setting. We then propose Feasibility Constrained Successive Rejects (FCSR), a novel algorithm that identifies the best arm while ensuring feasibility. We show it attains optimal dependence on problem parameters up to constant factors in the exponent. Empirically, FCSR outperforms natural baselines while preserving feasibility guarantees.
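下面给出 FCSR 思想的一个玩具示意(并非论文的精确预算划分与剔除规则;噪声水平、阈值和每轮采样次数均为本文假设):每轮均匀采样存活臂的所有属性,先剔除经验不可行的臂,否则剔除经验总均值最低的臂。

```python
import numpy as np

def fcsr_toy(arm_means, threshold, pulls_per_round, rng):
    """可行性约束逐次拒绝(FCSR)思想的玩具示意。
    arm_means: (n_arms, n_attr),每个臂各属性的真实均值。"""
    n_arms, n_attr = arm_means.shape
    alive = list(range(n_arms))
    sums = np.zeros((n_arms, n_attr))
    counts = np.zeros((n_arms, 1))
    while len(alive) > 1:
        for a in alive:
            noise = 0.3 * rng.standard_normal((pulls_per_round, n_attr)).sum(axis=0)
            sums[a] += arm_means[a] * pulls_per_round + noise
            counts[a] += pulls_per_round
        emp = sums[alive] / counts[alive]
        infeasible = [i for i in range(len(alive)) if emp[i].min() < threshold]
        if infeasible:
            drop = min(infeasible, key=lambda i: emp[i].min())            # 先剔除最明显不可行的臂
        else:
            drop = min(range(len(alive)), key=lambda i: emp[i].mean())    # 再剔除总均值最低的臂
        alive.pop(drop)
    return alive[0]

# 臂 0 总均值最高但第二个属性不可行;在可行臂中,臂 2 的总均值最高
arm_means = np.array([[2.0, 0.2],
                      [1.0, 1.0],
                      [1.5, 0.9]])
rng = np.random.default_rng(0)
best = fcsr_toy(arm_means, threshold=0.5, pulls_per_round=400, rng=rng)
```

该例演示了约束设定的关键点:总均值最高的臂 0 因违反属性阈值被排除,算法返回可行臂中最优的臂 2。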


【3】LoRA-MME: Multi-Model Ensemble of LoRA-Tuned Encoders for Code Comment Classification
标题:LoRA-MME:用于代码注释分类的LoRA微调编码器多模型集成
链接:https://arxiv.org/abs/2603.03959

作者:Md Akib Haider,Ahsan Bulbul,Nafis Fuad Shahid,Aimaan Ahmed,Mohammad Ishrak Abedin
摘要:代码注释分类是自动化软件文档编制和分析的关键任务。在NLBSE'26工具竞赛的背景下,我们提出\textbf{LoRA-MME},一种利用参数高效微调(PEFT)的多模型集成架构。我们的方法结合四种不同Transformer编码器的优势来解决Java、Python和Pharo上的多标签分类挑战:UniXcoder、CodeBERT、GraphCodeBERT和CodeBERTa。通过使用低秩自适应(LoRA)独立微调这些模型,并通过学习得到的加权集成策略聚合它们的预测,我们在没有全量模型微调的内存开销的情况下最大化了分类性能。我们的工具在测试集上取得了\textbf{0.7906的F1加权得分}和\textbf{0.6867的Macro F1}。然而,集成的计算成本导致最终提交得分为41.20%,凸显了语义准确性与推理效率之间的权衡。
摘要:Code comment classification is a critical task for automated software documentation and analysis. In the context of the NLBSE'26 Tool Competition, we present \textbf{LoRA-MME}, a Multi-Model Ensemble architecture utilizing Parameter-Efficient Fine-Tuning (PEFT). Our approach addresses the multi-label classification challenge across Java, Python, and Pharo by combining the strengths of four distinct transformer encoders: UniXcoder, CodeBERT, GraphCodeBERT, and CodeBERTa. By independently fine-tuning these models using Low-Rank Adaptation(LoRA) and aggregating their predictions via a learned weighted ensemble strategy, we maximize classification performance without the memory overhead of full model fine-tuning. Our tool achieved an \textbf{F1 Weighted score of 0.7906} and a \textbf{Macro F1 of 0.6867} on the test set. However, the computational cost of the ensemble resulted in a final submission score of 41.20\%, highlighting the trade-off between semantic accuracy and inference efficiency.
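LoRA 的核心是在冻结权重旁加一条低秩旁路 $Wx + BAx$,其中 $B$ 零初始化。下面是一个极简的 numpy 示意(维度、秩和缩放系数为本文假设取值,并非 LoRA-MME 的实际配置):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 768, 8                              # 隐层维度与低秩秩数(示意取值)
W = 0.02 * rng.standard_normal((d, d))     # 冻结的预训练权重
A = 0.01 * rng.standard_normal((r, d))     # 可训练低秩因子 A
B = np.zeros((d, r))                       # B 零初始化:训练起点与原模型完全等价

def lora_forward(x):
    """y = x W^T + x (BA)^T,只训练 A、B,W 保持冻结。"""
    return x @ W.T + (x @ A.T) @ B.T

x = rng.standard_normal((4, d))
trainable_ratio = (A.size + B.size) / W.size   # 可训练参数占比,约 2%
```

可见 B 零初始化保证初始输出与原模型一致,而可训练参数仅为全量微调的一小部分,这正是摘要中“避免全量微调内存开销”的来源。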


【4】From Misclassifications to Outliers: Joint Reliability Assessment in Classification
标题:从错误分类到离群值:分类中的联合可靠性评估
链接:https://arxiv.org/abs/2603.03903

作者:Yang Li,Youyang Sha,Yinzhi Wang,Timothy Hospedales,Xi Shen,Shell Xu Hu,Xuanlong Yu
备注:15 pages, 3 figures. The source code is publicly available at https://github.com/Intellindust-AI-Lab/SUREPlus
摘要:构建可靠的分类器是机器学习在现实世界应用中部署的一项基本挑战。一个可靠的系统不仅应检测分布外(OOD)输入,还应通过为可能被错误分类的样本分配低置信度来预判分布内(ID)错误。然而,大多数先前的工作将OOD检测和故障预测视为两个独立的问题,忽视了它们之间的紧密联系。我们认为,可靠性需要对二者进行联合评估。为此,我们提出一个集成OOD检测与故障预测的统一评估框架,并以我们提出的新指标DS-F1和DS-AURC进行量化,其中DS表示双评分函数。OpenOOD基准上的实验表明,双评分函数产生的分类器比传统的单评分方法可靠得多。我们的分析进一步表明,基于OOD的方法在简单或远OOD偏移下提供显著收益,但在更具挑战性的近OOD条件下仅有边际收益。除了评估之外,我们还扩展了可靠分类器SURE并引入SURE+,一种在各种场景下显著提升可靠性的新方法。总之,我们的框架、指标和方法为可信分类建立了新的基准,并为在现实世界中部署鲁棒模型提供了实用指导。源代码可在https://github.com/Intellindust-AI-Lab/SUREPlus上公开获得。
摘要:Building reliable classifiers is a fundamental challenge for deploying machine learning in real-world applications. A reliable system should not only detect out-of-distribution (OOD) inputs but also anticipate in-distribution (ID) errors by assigning low confidence to potentially misclassified samples. Yet, most prior work treats OOD detection and failure prediction as separate problems, overlooking their close connection. We argue that reliability requires evaluating them jointly. To this end, we propose a unified evaluation framework that integrates OOD detection and failure prediction, quantified by our new metrics DS-F1 and DS-AURC, where DS denotes double scoring functions. Experiments on the OpenOOD benchmark show that double scoring functions yield classifiers that are substantially more reliable than traditional single scoring approaches. Our analysis further reveals that OOD-based approaches provide notable gains under simple or far-OOD shifts, but only marginal benefits under more challenging near-OOD conditions. Beyond evaluation, we extend the reliable classifier SURE and introduce SURE+, a new approach that significantly improves reliability across diverse scenarios. Together, our framework, metrics, and method establish a new benchmark for trustworthy classification and offer practical guidance for deploying robust models in real-world settings. The source code is publicly available at https://github.com/Intellindust-AI-Lab/SUREPlus.


表征(1篇)

【1】RADAR: Learning to Route with Asymmetry-aware DistAnce Representations
标题:RADAR:学习使用非对称感知距离表示进行路由
链接:https://arxiv.org/abs/2603.03388

作者:Hang Yi,Ziwei Huang,Yining Ma,Zhiguang Cao
备注:Accepted by ICLR
摘要:最近的神经求解器在车辆路径问题(VRP)上取得了很好的性能,但它们大多假设对称的欧几里得距离,限制了其在现实场景中的适用性。一个核心挑战是如何编码VRP非对称距离矩阵中的关系特征。早期的尝试直接编码这些矩阵,但往往无法产生紧凑的嵌入,且在大规模上泛化较差。在本文中,我们提出RADAR,一个可扩展的神经框架,为现有神经VRP求解器增强处理非对称输入的能力。RADAR从静态和动态两个角度解决非对称性。它利用非对称距离矩阵上的奇异值分解(SVD)来初始化紧凑且可泛化的嵌入,这些嵌入固有地编码了每个节点出站与入站成本中的静态非对称性。为进一步建模编码过程中嵌入交互的动态非对称性,它用Sinkhorn归一化取代标准softmax,在注意力权重中引入行、列联合的距离意识。在合成与真实世界基准上对多种VRP的大量实验表明,RADAR在分布内和分布外实例上都优于强基线,在求解非对称VRP方面展现出稳健的泛化能力和卓越性能。
摘要:Recent neural solvers have achieved strong performance on vehicle routing problems (VRPs), yet they mainly assume symmetric Euclidean distances, restricting applicability to real-world scenarios. A core challenge is encoding the relational features in asymmetric distance matrices of VRPs. Early attempts directly encoded these matrices but often failed to produce compact embeddings and generalized poorly at scale. In this paper, we propose RADAR, a scalable neural framework that augments existing neural VRP solvers with the ability to handle asymmetric inputs. RADAR addresses asymmetry from both static and dynamic perspectives. It leverages Singular Value Decomposition (SVD) on the asymmetric distance matrix to initialize compact and generalizable embeddings that inherently encode the static asymmetry in the inbound and outbound costs of each node. To further model dynamic asymmetry in embedding interactions during encoding, it replaces the standard softmax with Sinkhorn normalization that imposes joint row and column distance awareness in attention weights. Extensive experiments on synthetic and real-world benchmarks across various VRPs show that RADAR outperforms strong baselines on both in-distribution and out-of-distribution instances, demonstrating robust generalization and superior performance in solving asymmetric VRPs.
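RADAR 的两个要点是:用 SVD 初始化分别编码出站/入站成本的节点嵌入,以及用 Sinkhorn 归一化代替 softmax 使注意力同时满足行、列归一化。下面是一个极简 numpy 示意(矩阵规模、温度系数和迭代次数均为本文假设):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 3
D = 10.0 * rng.random((n, n))            # 非对称距离矩阵:D[i, j] != D[j, i]
np.fill_diagonal(D, 0.0)

# 静态非对称:SVD 初始化嵌入,行空间编码出站成本、列空间编码入站成本
U, S, Vt = np.linalg.svd(D)
out_emb = U[:, :k] * np.sqrt(S[:k])
in_emb = Vt[:k].T * np.sqrt(S[:k])
D_hat = out_emb @ in_emb.T               # 秩-k 最优重构(Eckart-Young)

# 动态非对称:Sinkhorn 迭代代替 softmax,让权重矩阵同时满足行、列归一化
P = np.exp(-D / 5.0)                     # 距离越小权重越大;温度 5.0 为演示取值
for _ in range(50):
    P /= P.sum(axis=1, keepdims=True)    # 行归一化
    P /= P.sum(axis=0, keepdims=True)    # 列归一化
```

与仅做行归一化的 softmax 不同,Sinkhorn 迭代收敛到近似双随机矩阵,从而在注意力中同时保留出站与入站两个方向的距离信息。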


3D|3D重建等相关(2篇)

【1】ZipMap: Linear-Time Stateful 3D Reconstruction with Test-Time Training
标题:ZipMap:带有测试时训练的线性时间状态3D重建
链接:https://arxiv.org/abs/2603.04385

作者:Haian Jin,Rundi Wu,Tianyuan Zhang,Ruiqi Gao,Jonathan T. Barron,Noah Snavely,Aleksander Holynski
备注:Project page: https://haian-jin.github.io/ZipMap
摘要:前馈Transformer模型推动了3D视觉的快速发展,但VGGT和$π^3$等最先进方法的计算成本随输入图像数量二次增长,使其在应用于大型图像集合时效率低下。顺序重建方法降低了这种成本,但牺牲了重建质量。我们引入ZipMap,一种有状态的前馈模型,实现线性时间的双向3D重建,同时达到或超过二次时间方法的精度。ZipMap采用测试时训练层,在一次前向传递中将整个图像集合压缩为紧凑的隐藏场景状态,从而在单个H100 GPU上用不到10秒重建超过700帧,比VGGT等最先进方法快20$\times$以上。此外,我们展示了有状态表示在实时场景状态查询中的好处,及其向顺序流式重建的扩展。
摘要:Feed-forward transformer models have driven rapid progress in 3D vision, but state-of-the-art methods such as VGGT and $π^3$ have a computational cost that scales quadratically with the number of input images, making them inefficient when applied to large image collections. Sequential-reconstruction approaches reduce this cost but sacrifice reconstruction quality. We introduce ZipMap, a stateful feed-forward model that achieves linear-time, bidirectional 3D reconstruction while matching or surpassing the accuracy of quadratic-time methods. ZipMap employs test-time training layers to zip an entire image collection into a compact hidden scene state in a single forward pass, enabling reconstruction of over 700 frames in under 10 seconds on a single H100 GPU, more than $20\times$ faster than state-of-the-art methods such as VGGT. Moreover, we demonstrate the benefits of having a stateful representation in real-time scene-state querying and its extension to sequential streaming reconstruction.


【2】Beyond Pixel Histories: World Models with Persistent 3D State
标题:超越像素历史:具有持久3D状态的世界模型
链接:https://arxiv.org/abs/2603.03482

作者:Samuel Garcin,Thomas Walker,Steven McDonagh,Tim Pearce,Hakan Bilen,Tianyu He,Kaixin Wang,Jiang Bian
备注:Currently under review
摘要:交互式世界模型通过响应用户动作持续生成视频,从而实现开放式的生成能力。然而,现有模型通常缺乏对环境的3D表示,这意味着3D一致性必须从数据中隐式学习,且空间记忆仅限于有限的时间上下文窗口。这导致了不真实的用户体验,并对训练智能体等下游任务构成重大障碍。为此,我们提出PERSIST,一种新范式的世界模型,它模拟潜在3D场景的演化:环境、相机和渲染器。这使我们能够合成具有持久空间记忆和一致几何的新帧。定量指标和定性用户研究均表明,相比现有方法,其在空间记忆、3D一致性和长时程稳定性方面均有实质性改进,从而实现连贯且不断演化的3D世界。我们进一步展示了新的能力,包括从单张图像合成多样的3D环境,以及通过直接在3D空间中支持环境编辑与规范,对生成体验进行细粒度、几何感知的控制。项目页面:https://francelico.github.io/persist.github.io
摘要:Interactive world models continually generate video by responding to a user's actions, enabling open-ended generation capabilities. However, existing models typically lack a 3D representation of the environment, meaning 3D consistency must be implicitly learned from data, and spatial memory is restricted to limited temporal context windows. This results in an unrealistic user experience and presents significant obstacles to down-stream tasks such as training agents. To address this, we present PERSIST, a new paradigm of world model which simulates the evolution of a latent 3D scene: environment, camera, and renderer. This allows us to synthesize new frames with persistent spatial memory and consistent geometry. Both quantitative metrics and a qualitative user study show substantial improvements in spatial memory, 3D consistency, and long-horizon stability over existing methods, enabling coherent, evolving 3D worlds. We further demonstrate novel capabilities, including synthesising diverse 3D environments from a single image, as well as enabling fine-grained, geometry-aware control over generated experiences by supporting environment editing and specification directly in 3D space. Project page: https://francelico.github.io/persist.github.io


优化|敛散性(10篇)

【1】Unbiased Dynamic Pruning for Efficient Group-Based Policy Optimization
标题:无偏动态修剪以实现高效的基于组的策略优化
链接:https://arxiv.org/abs/2603.04135

作者:Haodong Zhu,Yangyang Ren,Yanjing Li,Mingbao Lin,Linlin Yang,Xuhui Liu,Xiantong Zhen,Haiguang Liu,Baochang Zhang
备注:20 pages, 4 figures
摘要:组相对策略优化(GRPO)有效地扩展了LLM推理,但由于其广泛的基于组的采样要求,导致了高昂的计算成本。虽然最近的选择性数据利用方法可以减轻这种开销,但它们可能通过改变底层采样分布引入估计偏差,损害理论严谨性和收敛行为。为解决这一局限,我们提出动态剪枝策略优化(DPPO),一个通过基于重要性采样的校正,在动态剪枝的同时保持无偏梯度估计的框架。通过结合数学推导的重缩放因子,DPPO显著加速GRPO训练,而不改变全批次基线的优化目标。此外,为缓解剪枝引起的数据稀疏性,我们引入密集提示打包(Dense Prompt Packing),一种最大化有效token密度和硬件利用率的基于窗口的贪婪策略。大量实验表明,DPPO在不同模型和基准上始终如一地加速训练。例如,在MATH上训练的Qwen3-4B上,DPPO实现了2.37$\times$的训练加速,并在六个数学推理基准的平均准确率上优于GRPO 3.36%。
摘要:Group Relative Policy Optimization (GRPO) effectively scales LLM reasoning but incurs prohibitive computational costs due to its extensive group-based sampling requirement. While recent selective data utilization methods can mitigate this overhead, they could induce estimation bias by altering the underlying sampling distribution, compromising theoretical rigor and convergence behavior. To address this limitation, we propose Dynamic Pruning Policy Optimization (DPPO), a framework that enables dynamic pruning while preserving unbiased gradient estimation through importance sampling-based correction. By incorporating mathematically derived rescaling factors, DPPO significantly accelerates GRPO training without altering the optimization objective of the full-batch baseline. Furthermore, to mitigate the data sparsity induced by pruning, we introduce Dense Prompt Packing, a window-based greedy strategy that maximizes valid token density and hardware utilization. Extensive experiments demonstrate that DPPO consistently accelerates training across diverse models and benchmarks. For instance, on Qwen3-4B trained on MATH, DPPO achieves 2.37$\times$ training speedup and outperforms GRPO by 3.36% in average accuracy across six mathematical reasoning benchmarks.
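DPPO 的关键在于:随机剪枝会改变采样分布,但按保留概率做重要性重缩放后,估计仍是无偏的。下面用一个与梯度无关的标量玩具验证这一点(Horvitz-Thompson 式重加权;保留概率 0.3 为本文假设,并非论文的剪枝策略本身):

```python
import numpy as np

def pruned_estimate(values, keep_prob, rng):
    """以概率 keep_prob 保留每个样本;保留者按 1/keep_prob 重缩放,剔除者计 0。
    注意:若去掉 1/keep_prob 因子,期望会缩水为 keep_prob * 真实均值。"""
    kept = rng.random(values.shape) < keep_prob
    return float(np.sum(values * kept / keep_prob) / len(values))

rng = np.random.default_rng(0)
values = np.arange(1.0, 101.0)     # 假想的逐样本贡献
true_mean = values.mean()          # 真实均值 50.5
avg = np.mean([pruned_estimate(values, 0.3, rng) for _ in range(20_000)])
```

多次重复后估计的平均值收敛到真实均值,说明重缩放修正了剪枝带来的分布偏移。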


【2】When to restart? Exploring escalating restarts on convergence
标题:何时重启?探索升级式重启对收敛的影响
链接:https://arxiv.org/abs/2603.04117

作者:Ayush K. Varshney,Šarūnas Girdzijauskas,Konstantinos Vandikas,Aneta Vulgarakis Feljan
备注:Paper accepted in Sci4DL workshop in ICLR 2026. https://openreview.net/forum?id=18Yf2KKIn0
摘要:学习率调度在深度神经网络的优化中起着关键作用,直接影响收敛速度、稳定性和泛化能力。虽然余弦退火、循环学习率和热重启等现有调度器已展现出潜力,但它们通常依赖固定或周期性的触发条件,对停滞或收敛行为等训练动态一无所知。在这项工作中,我们提出一种简单而有效的策略,称为带升级重启的随机梯度下降(SGD-ER)。它在收敛时自适应地提高学习率。我们的方法监控训练进度,在检测到停滞时触发重启,线性抬升学习率以逃离尖锐的局部极小值并探索损失景观中更平坦的区域。我们在CIFAR-10、CIFAR-100和TinyImageNet上,针对包括ResNet-18/34/50、VGG-16和DenseNet-101在内的一系列架构评估SGD-ER。与标准调度器相比,SGD-ER将测试准确率提高0.5-4.5%,证明了收敛感知的升级重启有助于找到更好的局部最优解。
摘要:Learning rate scheduling plays a critical role in the optimization of deep neural networks, directly influencing convergence speed, stability, and generalization. While existing schedulers such as cosine annealing, cyclical learning rates, and warm restarts have shown promise, they often rely on fixed or periodic triggers that are agnostic to the training dynamics, such as stagnation or convergence behavior. In this work, we propose a simple yet effective strategy, which we call Stochastic Gradient Descent with Escalating Restarts (SGD-ER). It adaptively increases the learning rate upon convergence. Our method monitors training progress and triggers restarts when stagnation is detected, linearly escalating the learning rate to escape sharp local minima and explore flatter regions of the loss landscape. We evaluate SGD-ER across CIFAR-10, CIFAR-100, and TinyImageNet on a range of architectures including ResNet-18/34/50, VGG-16, and DenseNet-101. Compared to standard schedulers, SGD-ER improves test accuracy by 0.5-4.5%, demonstrating the benefit of convergence-aware escalating restarts for better local optima.
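SGD-ER 的触发逻辑(监控停滞、线性升级学习率)可以用几行代码示意(并非论文的原始实现;耐心值、升级公式与损失序列均为本文假设):

```python
def sgd_er_lr(base_lr, max_lr, patience, losses):
    """SGD-ER 思想的玩具调度器:损失连续 patience 步无改进即触发重启,
    每次重启把学习率线性抬升一档(上限 max_lr)。"""
    lr, restarts, best, stall = base_lr, 0, float("inf"), 0
    schedule = []
    for loss in losses:
        if loss < best - 1e-8:
            best, stall = loss, 0          # 有改进:清零停滞计数
        else:
            stall += 1
        if stall >= patience:              # 检测到停滞:升级重启
            restarts += 1
            lr = min(base_lr * (1 + restarts), max_lr)
            stall = 0
        schedule.append(lr)
    return schedule, restarts

# 两段平台期的假想损失曲线:各触发一次升级重启
losses = [1.0, 0.8, 0.6, 0.6, 0.6, 0.6, 0.5, 0.5, 0.5, 0.5]
schedule, restarts = sgd_er_lr(0.1, 1.0, patience=3, losses=losses)
```

这个例子里学习率依次经历 0.1、0.2、0.3 三档,对应两次由停滞触发的升级重启。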


【3】On the Learnability of Offline Model-Based Optimization: A Ranking Perspective
标题:论基于模型的离线优化的可学习性:排名视角
链接:https://arxiv.org/abs/2603.04000

作者:Shen-Huan Lyu,Rong-Xi Tan,Ke Xue,Yi-Xiao He,Yu Huang,Qingfu Zhang,Chao Qian
摘要:离线基于模型的优化(MBO)旨在仅利用过去评估的固定数据集来发现高性能设计。大多数现有方法依赖于通过回归学习代理模型,并隐含地假设良好的预测精度会带来良好的优化性能。在这项工作中,我们对这一假设提出质疑,并从可学习性的角度研究离线MBO。我们认为,离线优化从根本上是一个对高质量设计进行排序的问题,而非精确的数值预测。具体而言,我们引入了一种基于近最优设计与次优设计之间排序的面向优化的风险,并建立了一个将代理模型学习与最终优化联系起来的统一理论框架。我们证明了排序相对于回归的理论优势,并指出训练数据与近最优设计之间的分布失配是主要误差来源。受此启发,我们设计了一种分布感知的排序方法来减少这种失配。在多种任务上的实证结果表明,我们的方法优于20种现有方法,验证了我们的理论发现。此外,理论与实证结果都揭示了离线MBO的内在局限:存在一种任何离线方法都无法避免过度乐观外推的情形。
摘要:Offline model-based optimization (MBO) seeks to discover high-performing designs using only a fixed dataset of past evaluations. Most existing methods rely on learning a surrogate model via regression and implicitly assume that good predictive accuracy leads to good optimization performance. In this work, we challenge this assumption and study offline MBO from a learnability perspective. We argue that offline optimization is fundamentally a problem of ranking high-quality designs rather than accurate value prediction. Specifically, we introduce an optimization-oriented risk based on ranking between near-optimal and suboptimal designs, and develop a unified theoretical framework that connects surrogate learning to final optimization. We prove the theoretical advantages of ranking over regression, and identify distributional mismatch between the training data and near-optimal designs as the dominant error. Inspired by this, we design a distribution-aware ranking method to reduce this mismatch. Empirical results across various tasks show that our approach outperforms twenty existing methods, validating our theoretical findings. Additionally, both theoretical and empirical results reveal intrinsic limitations in offline MBO, showing a regime in which no offline method can avoid over-optimistic extrapolation.
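The ranking view can be made concrete with a pairwise surrogate objective: rather than regressing objective values, the surrogate is only asked to order near-optimal designs above suboptimal ones. A minimal hinge-style sketch, our illustration rather than the paper's exact optimization-oriented risk:

```python
def pairwise_ranking_loss(scores, values, margin=0.1):
    """Average hinge penalty over all design pairs the surrogate mis-ranks.

    scores: surrogate outputs for each design
    values: ground-truth objective values (higher is better)
    A pair contributes only if the surrogate fails to separate the better
    design from the worse one by at least `margin`.
    """
    loss, pairs = 0.0, 0
    for i in range(len(values)):
        for j in range(len(values)):
            if values[i] > values[j]:        # design i should rank above j
                loss += max(0.0, margin - (scores[i] - scores[j]))
                pairs += 1
    return loss / max(pairs, 1)
```

A surrogate minimizing this loss is rewarded for correct ordering alone, which is the property the paper argues matters for offline optimization.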


【4】GIPO: Gaussian Importance Sampling Policy Optimization
标题:GIPO:高斯重要性采样策略优化
链接:https://arxiv.org/abs/2603.03955

作者:Chengxuan Lu,Zhenquan Zhang,Shukuan Wang,Qunzhi Lin,Baigui Sun,Yang Liu
摘要:Post-training with reinforcement learning (RL) has recently shown strong promise for advancing multimodal agents beyond supervised imitation. However, RL remains limited by poor data efficiency, particularly in settings where interaction data are scarce and quickly become outdated. To address this challenge, GIPO (Gaussian Importance sampling Policy Optimization) is proposed as a policy optimization objective based on truncated importance sampling, replacing hard clipping with a log-ratio-based Gaussian trust weight to softly damp extreme importance ratios while maintaining non-zero gradients. Theoretical analysis shows that GIPO introduces an implicit, tunable constraint on the update magnitude, while concentration bounds guarantee robustness and stability under finite-sample estimation. Experimental results show that GIPO achieves state-of-the-art performance among clipping-based baselines across a wide range of replay buffer sizes, from near on-policy to highly stale data, while exhibiting superior bias--variance trade-off, high training stability and improved sample efficiency.
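The core substitution, a log-ratio-based Gaussian trust weight in place of hard clipping, can be sketched in a few lines; `sigma` below is an illustrative scale parameter, not notation from the paper:

```python
import math

def ppo_clip_weight(ratio, eps=0.2):
    """Standard hard clipping: the weight is flat (zero gradient)
    outside the [1 - eps, 1 + eps] trust region."""
    return max(min(ratio, 1.0 + eps), 1.0 - eps)

def gaussian_trust_weight(ratio, sigma=0.5):
    """GIPO-style soft damping (sketch): multiply the importance ratio by a
    Gaussian weight on its log, so extreme ratios are smoothly suppressed
    while the weight stays non-zero and differentiable everywhere."""
    log_r = math.log(ratio)
    return ratio * math.exp(-log_r ** 2 / (2.0 * sigma ** 2))
```

Unlike the clipped weight, the Gaussian weight decays smoothly for extreme ratios while keeping gradients non-zero, which is the property the objective exploits.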


【5】Local Shapley: Model-Induced Locality and Optimal Reuse in Data Valuation
标题:Local Shapley:数据估值中模型诱导的局部性与最优重用
链接:https://arxiv.org/abs/2603.03672

作者:Xuan Yang,Hsi-Wen Chen,Ming-Syan Chen,Jian Pei
摘要:The Shapley value provides a principled foundation for data valuation, but exact computation is #P-hard due to the exponential coalition space. Existing accelerations remain global and ignore a structural property of modern predictors: for a given test instance, only a small subset of training points influences the prediction. We formalize this model-induced locality through support sets defined by the model's computational pathway (e.g., neighbors in KNN, leaves in trees, receptive fields in GNNs), showing that Shapley computation can be projected onto these supports without loss when locality is exact. This reframes Shapley evaluation as a structured data processing problem over overlapping support-induced subset families rather than exhaustive coalition enumeration. We prove that the intrinsic complexity of Local Shapley is governed by the number of distinct influential subsets, establishing an information-theoretic lower bound on retraining operations. Guided by this result, we propose LSMR (Local Shapley via Model Reuse), an optimal subset-centric algorithm that trains each influential subset exactly once via support mapping and pivot scheduling. For larger supports, we develop LSMR-A, a reuse-aware Monte Carlo estimator that remains unbiased with exponential concentration, with runtime determined by the number of distinct sampled subsets rather than total draws. Experiments across multiple model families demonstrate substantial retraining reductions and speedups while preserving high valuation fidelity.
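When locality is exact, the Shapley computation projects onto a small support set, where exact enumeration becomes feasible. A permutation-form sketch over a toy support; the `utility` callable stands in for retraining the model on a coalition, the expensive step that LSMR schedules so each influential subset is trained only once:

```python
import math
from itertools import permutations

def shapley_over_support(support, utility):
    """Exact Shapley values for the points in a small support set.

    `utility` maps a frozenset of support points to a model score. This
    brute-force version re-evaluates coalitions freely; it is meant only
    to show the quantity being computed, not the paper's reuse scheme.
    """
    totals = {p: 0.0 for p in support}
    for order in permutations(support):
        members = frozenset()
        for p in order:
            before = utility(members)
            members = members | {p}
            totals[p] += utility(members) - before
    n_perms = math.factorial(len(support))
    return {p: t / n_perms for p, t in totals.items()}
```

The efficiency property of the Shapley value (contributions sum to the full-coalition utility) holds by construction, which gives a quick sanity check.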


【6】Riemannian Optimization in Modular Systems
标题:模块化系统中的黎曼优化
链接:https://arxiv.org/abs/2603.03610

作者:Christian Pehle,Jean-Jacques Slotine
备注:9 pages
摘要:Understanding how systems built out of modular components can be jointly optimized is an important problem in biology, engineering, and machine learning. The backpropagation algorithm is one such solution and has been instrumental in the success of neural networks. Despite its empirical success, a strong theoretical understanding of it is lacking. Here, we combine tools from Riemannian geometry, optimal control theory, and theoretical physics to advance this understanding. We make three key contributions: First, we revisit the derivation of backpropagation as a constrained optimization problem and combine it with the insight that Riemannian gradient descent trajectories can be understood as the minimum of an action. Second, we introduce a recursively defined layerwise Riemannian metric that exploits the modular structure of neural networks and can be efficiently computed using the Woodbury matrix identity, avoiding the $O(n^3)$ cost of full metric inversion. Third, we develop a framework of composable ``Riemannian modules'' whose convergence properties can be quantified using nonlinear contraction theory, providing algorithmic stability guarantees of order $O(\kappa^2 L/(\xi\mu\sqrt{n}))$ where $\kappa$ and $L$ are Lipschitz constants, $\mu$ is the mass matrix scale, and $\xi$ bounds the condition number. Our layerwise metric approach provides a practical alternative to natural gradient descent. While we focus here on studying neural networks, our approach more generally applies to the study of systems made of modules that are optimized over time, as it occurs in biology during both evolution and development.
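The Woodbury-based inversion can be illustrated in its simplest rank-1 form, the Sherman-Morrison identity: solving against a diagonal-plus-rank-1 matrix in O(n) rather than O(n^3). This toy routine is our illustration of the trick, not the paper's layerwise metric:

```python
def sherman_morrison_solve(d, u, v, b):
    """Solve (D + u v^T) x = b with D = diag(d), in O(n) time.

    Sherman-Morrison (the rank-1 case of the Woodbury identity):
    (D + u v^T)^{-1} b = D^{-1} b - D^{-1} u * (v^T D^{-1} b) / (1 + v^T D^{-1} u)
    so only the trivial diagonal inverse D^{-1} is ever formed.
    """
    dinv_b = [bi / di for bi, di in zip(b, d)]
    dinv_u = [ui / di for ui, di in zip(u, d)]
    denom = 1.0 + sum(vi * x for vi, x in zip(v, dinv_u))
    coef = sum(vi * x for vi, x in zip(v, dinv_b)) / denom
    return [x - coef * y for x, y in zip(dinv_b, dinv_u)]
```

The full Woodbury identity generalizes this to rank-k updates; the paper applies the same idea recursively across layers.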


【7】Q-Measure-Learning for Continuous State RL: Efficient Implementation and Convergence
标题:连续状态强化学习的Q测度学习:高效实现与收敛性
链接:https://arxiv.org/abs/2603.03523

作者:Shengbo Wang
摘要:We study reinforcement learning in infinite-horizon discounted Markov decision processes with continuous state spaces, where data are generated online from a single trajectory under a Markovian behavior policy. To avoid maintaining an infinite-dimensional, function-valued estimate, we propose the novel Q-Measure-Learning, which learns a signed empirical measure supported on visited state-action pairs and reconstructs an action-value estimate via kernel integration. The method jointly estimates the stationary distribution of the behavior chain and the Q-measure through coupled stochastic approximation, leading to an efficient weight-based implementation with $O(n)$ memory and $O(n)$ computation cost per iteration. Under uniform ergodicity of the behavior chain, we prove almost sure sup-norm convergence of the induced Q-function to the fixed point of a kernel-smoothed Bellman operator. We also bound the approximation error between this limit and the optimal $Q^*$ as a function of the kernel bandwidth. To assess the performance of our proposed algorithm, we conduct RL experiments in a two-item inventory control setting.


【8】Optimal trajectory-guided stochastic co-optimization for e-fuel system design and real-time operation
标题:面向电子燃料系统设计与实时运行的最优轨迹引导随机协同优化
链接:https://arxiv.org/abs/2603.03484

作者:Jeongdong Kim,Minsu Kim,Jonggeol Na,Junghwan Kim
备注:29 pages, 6 figures. Supplementary Information included
摘要:E-fuels are promising long-term energy carriers supporting the net-zero transition. However, the large combinatorial design-operation spaces under renewable uncertainty make the use of mathematical programming impractical for co-optimizing e-fuel production systems. Here, we present MasCOR, a machine-learning-assisted co-optimization framework that learns from global operational trajectories. By encoding system design and renewable trends, a single MasCOR agent generalizes dynamic operation across diverse configurations and scenarios, substantially simplifying design-operation co-optimization under uncertainty. Benchmark comparisons against state-of-the-art reinforcement learning baselines demonstrate near-optimal performance, while computational costs are substantially lower than those of mathematical programming, enabling rapid parallel evaluation of designs within the co-optimization loop. This framework enables rapid screening of feasible design spaces together with corresponding operational policies. When applied to four potential European sites targeting e-methanol production, MasCOR shows that most locations benefit from reducing system load below 50 MW to achieve carbon-neutral methanol production, with production costs of 1.0-1.2 USD per kg. In contrast, Dunkirk (France), with limited renewable availability and high grid prices, favors system loads above 200 MW and expanded storage to exploit dynamic grid exchange and hydrogen sales to the market. These results underscore the value of the MasCOR framework for site-specific guidance from system design to real-time operation.


【9】Fermi-Dirac thermal measurements: A framework for quantum hypothesis testing and semidefinite optimization
标题:费米-狄拉克热测量:量子假设检验与半定优化的框架
链接:https://arxiv.org/abs/2603.04061

作者:Nana Liu,Mark M. Wilde
备注:35 pages, 3 figures
摘要:Quantum measurements are the means by which we recover messages encoded into quantum states. They are at the forefront of quantum hypothesis testing, wherein the goal is to perform an optimal measurement for arriving at a correct conclusion. Mathematically, a measurement operator is Hermitian with eigenvalues in [0,1]. By noticing that this constraint on each eigenvalue is the same as that imposed on fermions by the Pauli exclusion principle, we interpret every eigenmode of a measurement operator as an independent effective fermionic mode. Under this perspective, various objective functions in quantum hypothesis testing can be viewed as the total expected energy associated with these fermionic occupation numbers. By instead fixing a temperature and minimizing the total expected fermionic free energy, we find that optimal measurements for these modified objective functions are Fermi-Dirac thermal measurements, wherein their eigenvalues are specified by Fermi-Dirac distributions. In the low-temperature limit, their performance closely approximates that of optimal measurements for quantum hypothesis testing, and we show that their parameters can be learned by classical or hybrid quantum-classical optimization algorithms. This leads to a new quantum machine-learning model, termed Fermi-Dirac machines, consisting of parameterized Fermi-Dirac thermal measurements-an alternative to quantum Boltzmann machines based on thermal states. Beyond hypothesis testing, we show how general semidefinite optimization problems can be solved using this approach, leading to a novel paradigm for semidefinite optimization on quantum computers, in which the goal is to implement thermal measurements rather than prepare thermal states. Finally, we propose quantum algorithms for implementing Fermi-Dirac thermal measurements, and we also propose second-order hybrid quantum-classical optimization algorithms.


【10】Riemannian Langevin Dynamics: Strong Convergence of Geometric Euler-Maruyama Scheme
标题:黎曼朗之万动力学:几何Euler-Maruyama格式的强收敛性
链接:https://arxiv.org/abs/2603.03626

作者:Zhiyuan Zhan,Masashi Sugiyama
摘要:Low-dimensional structure in real-world data plays an important role in the success of generative models, which motivates diffusion models defined on intrinsic data manifolds. Such models are driven by stochastic differential equations (SDEs) on manifolds, which raises the need for convergence theory of numerical schemes for manifold-valued SDEs. In Euclidean space, the Euler--Maruyama (EM) scheme achieves strong convergence with order $1/2$, but an analogous result for manifold discretizations is less understood in general settings. In this work, we study a geometric version of the EM scheme for SDEs on Riemannian manifolds and prove strong convergence with order $1/2$ under geometric and regularity conditions. As an application, we obtain a Wasserstein bound for sampling on manifolds via the geometric EM discretization of Riemannian Langevin dynamics.
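A concrete instance of the scheme on the unit sphere $S^2$: each step projects an ambient Gaussian increment onto the tangent space and returns to the manifold through the exponential map, so iterates never leave the sphere. This driftless sketch is our illustration of the scheme class, not the paper's general formulation:

```python
import math
import random

def geometric_em_sphere(x0, sigma, dt, n_steps, seed=0):
    """Geometric Euler-Maruyama sketch for driftless Brownian motion on S^2.

    Per step: draw an ambient Gaussian, project it onto the tangent space
    at the current point, then apply the sphere's exponential map, so every
    iterate stays exactly on the manifold (unlike a naive Euclidean step).
    """
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(n_steps):
        xi = [rng.gauss(0.0, 1.0) for _ in x]
        radial = sum(a * b for a, b in zip(x, xi))
        # tangent-space projection of the scaled noise increment
        v = [sigma * math.sqrt(dt) * (a - radial * b) for a, b in zip(xi, x)]
        nv = math.sqrt(sum(a * a for a in v))
        if nv < 1e-12:
            continue
        # exponential map on the unit sphere
        x = [math.cos(nv) * a + math.sin(nv) * b / nv for a, b in zip(x, v)]
    return x
```

Because the exponential map is exact, no renormalization step is needed; the iterate's norm stays at 1 up to floating-point error.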


预测|估计(7篇)

【1】SimpliHuMoN: Simplifying Human Motion Prediction
标题:SimpliHuMoN:简化人类运动预测
链接:https://arxiv.org/abs/2603.04399

作者:Aadya Agrawal,Alexander Schwing
备注:19 pages, 7 figures. Preprint
摘要:Human motion prediction combines the tasks of trajectory forecasting and human pose prediction. For each of the two tasks, specialized models have been developed. Combining these models for holistic human motion prediction is non-trivial, and recent methods have struggled to compete on established benchmarks for individual tasks. To address this, we propose a simple yet effective transformer-based model for human motion prediction. The model employs a stack of self-attention modules to effectively capture both spatial dependencies within a pose and temporal relationships across a motion sequence. This simple, streamlined, end-to-end model is sufficiently versatile to handle pose-only, trajectory-only, and combined prediction tasks without task-specific modifications. We demonstrate that this approach achieves state-of-the-art results across all tasks through extensive experiments on a wide range of benchmark datasets, including Human3.6M, AMASS, ETH-UCY, and 3DPW.


【2】Nearest-Neighbor Density Estimation for Dependency Suppression
标题:依赖性抑制的最近邻密度估计
链接:https://arxiv.org/abs/2603.04224

作者:Kathleen Anderson,Thomas Martinetz
摘要:The ability to remove unwanted dependencies from data is crucial in various domains, including fairness, robust learning, and privacy protection. In this work, we propose an encoder-based approach that learns a representation independent of a sensitive variable but otherwise preserving essential data characteristics. Unlike existing methods that rely on decorrelation or adversarial learning, our approach explicitly estimates and modifies the data distribution to neutralize statistical dependencies. To achieve this, we combine a specialized variational autoencoder with a novel loss function driven by non-parametric nearest-neighbor density estimation, enabling direct optimization of independence. We evaluate our approach on multiple datasets, demonstrating that it can outperform existing unsupervised techniques and even rival supervised methods in balancing information removal and utility.


【3】Learning Hip Exoskeleton Control Policy via Predictive Neuromusculoskeletal Simulation
标题:通过预测神经肌肉骨骼模拟学习髋关节外骨骼控制策略
链接:https://arxiv.org/abs/2603.04166

作者:Ilseung Park,Changseob Song,Inseung Kang
摘要:Developing exoskeleton controllers that generalize across diverse locomotor conditions typically requires extensive motion-capture data and biomechanical labeling, limiting scalability beyond instrumented laboratory settings. Here, we present a physics-based neuromusculoskeletal learning framework that trains a hip-exoskeleton control policy entirely in simulation, without motion-capture demonstrations, and deploys it on hardware via policy distillation. A reinforcement learning teacher policy is trained using a muscle-synergy action prior over a wide range of walking speeds and slopes through a two-stage curriculum, enabling direct comparison between assisted and no-exoskeleton conditions. In simulation, exoskeleton assistance reduces mean muscle activation by up to 3.4% and mean positive joint power by up to 7.0% on level ground and ramp ascent, with benefits increasing systematically with walking speed. On hardware, the assistance profiles learned in simulation are preserved across matched speed-slope conditions (r: 0.82, RMSE: 0.03 Nm/kg), providing quantitative evidence of sim-to-real transfer without additional hardware tuning. These results demonstrate that physics-based neuromusculoskeletal simulation can serve as a practical and scalable foundation for exoskeleton controller development, substantially reducing experimental burden during the design phase.


【4】Two-Stage Photovoltaic Forecasting: Separating Weather Prediction from Plant-Characteristics
标题:两阶段光伏预测:将天气预测与电站特性分离
链接:https://arxiv.org/abs/2603.04132

作者:Philipp Danner,Hermann de Meer
摘要 :Several energy management applications rely on accurate photovoltaic generation forecasts. Common metrics like mean absolute error or root-mean-square error, omit error-distribution details needed for stochastic optimization. In addition, several approaches use weather forecasts as inputs without analyzing the source of the prediction error. To overcome this gap, we decompose forecasting into a weather forecast model for environmental parameters such as solar irradiance and temperature and a plant characteristic model that captures site-specific parameters like panel orientation, temperature influence, or regular shading. Satellite-based weather observation serves as an intermediate layer. We analyze the error distribution of the high-resolution rapid-refresh numerical weather prediction model that covers the United States as a black-box model for weather forecasting and train an ensemble of neural networks on historical power output data for the plant characteristic model. Results show mean absolute error increases by 11% and 68% for two selected photovoltaic systems when using weather forecasts instead of satellite-based ground-truth weather observations as a perfect forecast. The generalized hyperbolic and Student's t distributions adequately fit the forecast errors across lead times.


【5】Dual-Solver: A Generalized ODE Solver for Diffusion Models with Dual Prediction
标题:Dual-Solver:具有双重预测的扩散模型的广义ODE求解器
链接:https://arxiv.org/abs/2603.03973

作者:Soochul Park,Yeon Ju Lee
备注:Published as a conference paper at ICLR 2026. 36 pages, 18 figures
摘要:Diffusion models achieve state-of-the-art image quality. However, sampling is costly at inference time because it requires a large number of function evaluations (NFEs). To reduce NFEs, classical ODE numerical methods have been adopted. Yet, the choice of prediction type and integration domain leads to different sampling behaviors. To address these issues, we introduce Dual-Solver, which generalizes multistep samplers through learnable parameters that continuously (i) interpolate among prediction types, (ii) select the integration domain, and (iii) adjust the residual terms. It retains the standard predictor-corrector structure while preserving second-order local accuracy. These parameters are learned via a classification-based objective using a frozen pretrained classifier (e.g., MobileNet or CLIP). For ImageNet class-conditional generation (DiT, GM-DiT) and text-to-image generation (SANA, PixArt-$\alpha$), Dual-Solver improves FID and CLIP scores in the low-NFE regime ($3 \le$ NFE $\le 9$) across backbones.


【6】PatchDecomp: Interpretable Patch-Based Time Series Forecasting
标题:PatchDecomp:可解释的基于补丁的时间序列预测
链接:https://arxiv.org/abs/2603.03902

作者:Hiroki Tomioka,Genta Yoshimura
摘要:Time series forecasting, which predicts future values from past observations, plays a central role in many domains and has driven the development of highly accurate neural network models. However, the complexity of these models often limits human understanding of the rationale behind their predictions. We propose PatchDecomp, a neural network-based time series forecasting method that achieves both high accuracy and interpretability. PatchDecomp divides input time series into subsequences (patches) and generates predictions by aggregating the contributions of each patch. This enables clear attribution of each patch, including those from exogenous variables, to the final prediction. Experiments on multiple benchmark datasets demonstrate that PatchDecomp provides predictive performance comparable to recent forecasting methods. Furthermore, we show that the model's explanations not only influence predicted values quantitatively but also offer qualitative interpretability through visualization of patch-wise contributions.
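The additive patch structure can be sketched as follows; the per-patch linear readout is our simplification of the model's neural aggregation, but it shows how each patch's contribution to the forecast can be read off exactly:

```python
def patch_contributions(series, patch_len, weights):
    """PatchDecomp-style additive forecast (sketch): the prediction is the
    sum of one contribution per input patch, so each patch's effect on the
    output is directly attributable.

    series:    input time series (flat list of floats)
    patch_len: length of each non-overlapping patch
    weights:   weights[p] is a per-patch linear readout, one weight per
               position (a stand-in for the paper's learned mapping)
    """
    patches = [series[i:i + patch_len] for i in range(0, len(series), patch_len)]
    contribs = [sum(w * v for w, v in zip(weights[p], patch))
                for p, patch in enumerate(patches)]
    return sum(contribs), contribs
```

Because the forecast equals the sum of the per-patch terms, visualizing `contribs` gives the kind of patch-wise attribution the abstract describes.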


【7】Freezing of Gait Prediction using Proactive Agent that Learns from Selected Experience and DDQN Algorithm
标题:基于从精选经验中学习的主动式智能体与DDQN算法的步态冻结预测
链接:https://arxiv.org/abs/2603.03651

作者:Septian Enggar Sukmana,Sang Won Bae,Tomohiro Shibata
备注:Accepted on Activity and Behavior Computing (ABC) 2026 Conference (https://autocare.ai/abc2026) and will be published on International Journal of Activity and Behavior Computing (IJABC)
摘要:Freezing of Gait (FOG) is a debilitating motor symptom commonly experienced by individuals with Parkinson's Disease (PD) which often leads to falls and reduced mobility. Timely and accurate prediction of FOG episodes is essential for enabling proactive interventions through assistive technologies. This study presents a reinforcement learning-based framework designed to identify optimal pre-FOG onset points, thereby extending the prediction horizon for anticipatory cueing systems. The model implements a Double Deep Q-Network (DDQN) architecture enhanced with Prioritized Experience Replay (PER) allowing the agent to focus learning on high-impact experiences and refine its policy. Trained over 9000 episodes with a reward shaping strategy that promotes cautious decision-making, the agent demonstrated robust performance in both subject-dependent and subject-independent evaluations. The model achieved a prediction horizon of up to 8.72 seconds prior to FOG onset in subject-independent scenarios and 7.89 seconds in subject-dependent settings. These results highlight the model's potential for integration into wearable assistive devices, offering timely and personalized interventions to mitigate FOG in PD patients.
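The Prioritized Experience Replay component can be sketched as a proportional-priority buffer, where transitions with larger TD error are sampled more often. Parameter names (`alpha`, `eps`) follow common PER usage, not necessarily this paper's settings:

```python
import random

class PrioritizedReplay:
    """Minimal proportional prioritized experience replay (sketch).

    Each transition gets priority (|td_error| + eps)^alpha and is sampled
    with probability proportional to it, so high-impact experiences are
    replayed more often. Capacity limits, priority updates after replay,
    and importance-sampling (beta) corrections are omitted for brevity.
    """

    def __init__(self, alpha=0.6, eps=1e-3):
        self.alpha = alpha
        self.eps = eps
        self.items = []
        self.priorities = []

    def add(self, transition, td_error):
        self.items.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def probabilities(self):
        total = sum(self.priorities)
        return [p / total for p in self.priorities]

    def sample(self, k, seed=0):
        rng = random.Random(seed)
        return rng.choices(self.items, weights=self.priorities, k=k)
```

In a full DDQN loop, sampled transitions would also carry importance weights to correct the bias this non-uniform sampling introduces.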


其他神经网络|深度学习|模型|建模(20篇)

【1】Out-of-distribution transfer of PDE foundation models to material dynamics under extreme loading
标题:极端载荷下PDE基础模型向材料动力学的分布外迁移
链接:https://arxiv.org/abs/2603.04354

作者:Mahindra Rautela,Alexander Most,Siddharth Mansingh,Aleksandra Pachalieva,Bradley Love,Daniel O Malley,Alexander Scheinker,Kyle Hickmann,Diane Oyen,Nathan Debardeleben,Earl Lawrence,Ayan Biswas
摘要:Most PDE foundation models are pretrained and fine-tuned on fluid-centric benchmarks. Their utility under extreme-loading material dynamics remains unclear. We benchmark out-of-distribution transfer on two discontinuity-dominated regimes in which shocks, evolving interfaces, and fracture produce highly non-smooth fields: shock-driven multi-material interface dynamics (perturbed layered interface or PLI) and dynamic fracture/failure evolution (FRAC). We formulate the downstream task as terminal-state prediction, i.e., learning a long-horizon map that predicts the final state directly from the first snapshot without intermediate supervision. Using a unified training and evaluation protocol, we evaluate two open-source pretrained PDE foundation models, POSEIDON and MORPH, and compare fine-tuning from pretrained weights against training from scratch across training-set sizes to quantify sample efficiency under distribution shift.


【2】What Does Flow Matching Bring To TD Learning?
标题:流匹配为TD学习带来了什么?
链接:https://arxiv.org/abs/2603.04333

作者:Bhavya Agrawalla,Michal Nauman,Aviral Kumar
摘要:Recent work shows that flow matching can be effective for scalar Q-value function estimation in reinforcement learning (RL), but it remains unclear why or how this approach differs from standard critics. Contrary to conventional belief, we show that their success is not explained by distributional RL, as explicitly modeling return distributions can reduce performance. Instead, we argue that the use of integration for reading out values and dense velocity supervision at each step of this integration process for training improves TD learning via two mechanisms. First, it enables robust value prediction through \emph{test-time recovery}, whereby iterative computation through integration dampens errors in early value estimates as more integration steps are performed. This recovery mechanism is absent in monolithic critics. Second, supervising the velocity field at multiple interpolant values induces more \emph{plastic} feature learning within the network, allowing critics to represent non-stationary TD targets without discarding previously learned features or overfitting to individual TD targets encountered during training. We formalize these effects and validate them empirically, showing that flow-matching critics substantially outperform monolithic critics (2$\times$ in final performance and around 5$\times$ in sample efficiency) in settings where loss of plasticity poses a challenge e.g., in high-UTD online RL problems, while remaining stable during learning.


【3】World Properties without World Models: Recovering Spatial and Temporal Structure from Co-occurrence Statistics in Static Word Embeddings
标题:没有世界模型的世界属性:从静态词嵌入的共现统计中恢复空间和时间结构
链接:https://arxiv.org/abs/2603.04317

作者:Elan Barenholtz
备注:12 pages, 3 figures, 3 tables
摘要:Recent work interprets the linear recoverability of geographic and temporal variables from large language model (LLM) hidden states as evidence for world-like internal representations. We test a simpler possibility: that much of the relevant structure is already latent in text itself. Applying the same class of ridge regression probes to static co-occurrence-based embeddings (GloVe and Word2Vec), we find substantial recoverable geographic signal and weaker but reliable temporal signal, with held-out R^2 values of 0.71-0.87 for city coordinates and 0.48-0.52 for historical birth years. Semantic-neighbor analyses and targeted subspace ablations show that these signals depend strongly on interpretable lexical gradients, especially country names and climate-related vocabulary. These findings suggest that ordinary word co-occurrence preserves richer spatial, temporal, and environmental structure than is often assumed, revealing a remarkable and underappreciated capacity of simple static embeddings to preserve world-shaped structure from text alone. Linear probe recoverability alone therefore does not establish a representational move beyond text.
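The probing methodology is ordinary ridge regression from embedding features to coordinates or years. A self-contained sketch of the same class of linear probe on a toy design matrix (the paper fits it on GloVe/Word2Vec features with held-out evaluation; the naive solver below is only adequate for low-dimensional toy cases):

```python
def ridge_probe(X, y, lam=1e-3):
    """Fit a ridge-regression probe w = (X^T X + lam I)^{-1} X^T y
    via Gaussian elimination with partial pivoting.

    X: list of feature rows, y: list of targets, lam: ridge penalty.
    """
    d = len(X[0])
    # normal-equation system: A w = b with A = X^T X + lam I, b = X^T y
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * d
    for i in range(d - 1, -1, -1):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w
```

The paper's claim is then about how much variance such a probe explains on held-out items, reported as $R^2$.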


【4】LUMINA: Foundation Models for Topology Transferable ACOPF
标题:LUMINA:可拓拓扑ACOPF的基础模型
链接:https://arxiv.org/abs/2603.04300

作者:Yijiang Li,Zeeshan Memon,Hongwei Jin,Stefano Fenu,Keunju Song,Sunash B Sharma,Parfait Gasana,Hongseok Kim,Liang Zhao,Kibaek Kim
摘要:Foundation models in general promise to accelerate scientific computation by learning reusable representations across problem instances, yet constrained scientific systems, where predictions must satisfy physical laws and safety limits, pose unique challenges that stress conventional training paradigms. We derive design principles for constrained scientific foundation models through systematic investigation of AC optimal power flow (ACOPF), a representative optimization problem in power grid operations where power balance equations and operational constraints are non-negotiable. Through controlled experiments spanning architectures, training objectives, and system diversity, we extract three empirically grounded principles governing scientific foundation model design. These principles characterize three design trade-offs: learning physics-invariant representations while respecting system-specific constraints, optimizing accuracy while ensuring constraint satisfaction, and ensuring reliability in high-impact operating regimes. We present the LUMINA framework, including data processing and training pipelines to support reproducible research on physics-informed, feasibility-aware foundation models across scientific applications.


【5】Specialization of softmax attention heads: insights from the high-dimensional single-location model
标题:softmax注意力头的专门化:来自高维单位置模型的见解
链接:https://arxiv.org/abs/2603.03993

作者:M. Sagitova,O. Duranthon,L. Zdeborová
摘要:Multi-head attention enables transformer models to represent multiple attention patterns simultaneously. Empirically, head specialization emerges in distinct stages during training, while many heads remain redundant and learn similar representations. We propose a theoretical model capturing this phenomenon, based on the multi-index and single-location regression frameworks. In the first part, we analyze the training dynamics of multi-head softmax attention under SGD, revealing an initial unspecialized phase followed by a multi-stage specialization phase in which different heads sequentially align with latent signal directions. In the second part, we study the impact of attention activation functions on performance. We show that softmax-1 significantly reduces noise from irrelevant heads. Finally, we introduce the Bayes-softmax attention, which achieves optimal prediction performance in this setting.
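The softmax-1 activation mentioned above adds 1 to the normalizer, so a head facing only small scores can assign near-zero total attention instead of being forced to distribute a full unit of mass:

```python
import math

def softmax(scores):
    """Standard softmax: outputs always sum to exactly 1."""
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def softmax_1(scores):
    """softmax-1: the extra 1 in the denominator lets the total attention
    mass fall toward 0 when no score is large, which is how it reduces
    noise from irrelevant heads in the setting studied here."""
    exps = [math.exp(s) for s in scores]
    z = 1.0 + sum(exps)
    return [e / z for e in exps]
```

With strongly negative scores, `softmax` still emits a full probability vector while `softmax_1` outputs nearly nothing, effectively switching the head off.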


【6】BD-Merging: Bias-Aware Dynamic Model Merging with Evidence-Guided Contrastive Learning
标题:BD-Merging:结合证据引导对比学习的偏差感知动态模型合并
链接:https://arxiv.org/abs/2603.03920

作者:Yuhan Xie,Chen Lyu
备注:Accepted by CVPR 2026
摘要 :Model Merging (MM) has emerged as a scalable paradigm for multi-task learning (MTL), enabling multiple task-specific models to be integrated without revisiting the original training data. Despite recent progress, the reliability of MM under test-time distribution shift remains insufficiently understood. Most existing MM methods typically assume that test data are clean and distributionally aligned with both the training and auxiliary sources. However, this assumption rarely holds in practice, often resulting in biased predictions with degraded generalization. To address this issue, we present BD-Merging, a bias-aware unsupervised model merging framework that explicitly models uncertainty to achieve adaptive reliability under distribution shift. First, BD-Merging introduces a joint evidential head that learns uncertainty over a unified label space, capturing cross-task semantic dependencies in MM. Second, building upon this evidential foundation, we propose an Adjacency Discrepancy Score (ADS) that quantifies evidential alignment among neighboring samples. Third, guided by ADS, a discrepancy-aware contrastive learning mechanism refines the merged representation by aligning consistent samples and separating conflicting ones. Combined with general unsupervised learning, this process trains a debiased router that adaptively allocates task-specific or layer-specific weights on a per-sample basis, effectively mitigating the adverse effects of distribution shift. Extensive experiments across diverse tasks demonstrate that BD-Merging achieves superior effectiveness and robustness compared to state-of-the-art MM baselines.


【7】Believe Your Model: Distribution-Guided Confidence Calibration
标题:相信你的模型:分布引导的置信度校准
链接:https://arxiv.org/abs/2603.03872

作者:Xizhong Yang,Haotian Zhang,Huiming Wang,Mofei Song
备注:38 pages
摘要:Large Reasoning Models have demonstrated remarkable performance with the advancement of test-time scaling techniques, which enhances prediction accuracy by generating multiple candidate responses and selecting the most reliable answer. While prior work has analyzed that internal model signals like confidence scores can partly indicate response correctness and exhibit a distributional correlation with accuracy, such distributional information has not been fully utilized to guide answer selection. Motivated by this, we propose DistriVoting, which incorporates distributional priors as another signal alongside confidence during voting. Specifically, our method (1) first decomposes the mixed confidence distribution into positive and negative components using Gaussian Mixture Models, (2) then applies a reject filter based on positive/negative samples from them to mitigate overlap between the two distributions. Besides, to further alleviate the overlap from the perspective of distribution itself, we propose SelfStepConf, which uses step-level confidence to dynamically adjust inference process, increasing the separation between the two distributions to improve the reliability of confidences in voting. Experiments across 16 models and 5 benchmarks demonstrate that our method significantly outperforms state-of-the-art approaches.
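The first step, decomposing the mixed confidence distribution into positive and negative components, can be sketched with a plain 1-D two-component EM fit. Initialization, iteration count, and the variance floor here are our choices, not necessarily the paper's:

```python
import math

def fit_two_component_gmm(xs, iters=100, var_floor=1e-6):
    """EM for a 1-D two-component Gaussian mixture (sketch of the
    decomposition step in DistriVoting). Returns (means, variances,
    weights) with components sorted by ascending mean."""
    mu = [min(xs), max(xs)]
    var = [max(var_floor, _variance(xs))] * 2
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: soft responsibility of each component for each point
        resp = []
        for x in xs:
            dens = [pi[k] * _normal_pdf(x, mu[k], var[k]) for k in range(2)]
            z = sum(dens) or 1.0
            resp.append([d / z for d in dens])
        # M-step: re-estimate weights, means, variances
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            pi[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(var_floor,
                         sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk)
    order = sorted(range(2), key=lambda k: mu[k])
    return ([mu[k] for k in order], [var[k] for k in order], [pi[k] for k in order])

def _normal_pdf(x, m, v):
    return math.exp(-(x - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)

def _variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)
```

Once fitted, the low-mean component plays the role of the "negative" confidence distribution, and samples in the overlap region can be rejected before voting.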


【8】Large-Margin Hyperdimensional Computing: A Learning-Theoretical Perspective
标题:大间隔超维计算:一种学习理论视角
链接:https://arxiv.org/abs/2603.03830

作者:Nikita Zeulin,Olga Galinina,Ravikumar Balakrishnan,Nageen Himayat,Sergey Andreev
备注:This work has been submitted to the IEEE for possible publication
摘要:Overparameterized machine learning (ML) methods such as neural networks may be prohibitively resource intensive for devices with limited computational capabilities. Hyperdimensional computing (HDC) is an emerging resource efficient and low-complexity ML method that allows hardware efficient implementations of (re-)training and inference procedures. In this paper, we propose a maximum-margin HDC classifier, which significantly outperforms baseline HDC methods on several benchmark datasets. Our method leverages a formal relation between HDC and support vector machines (SVMs) that we established for the first time. Our findings may inspire novel HDC methods with potentially more hardware-oriented implementations compared to SVMs, thus enabling more efficient learning solutions for various intelligent resource-constrained applications.
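摘要指出 HDC 与 SVM 之间存在形式联系。下面用 numpy 给出一个示意性草图(并非论文方法本身;编码方式与全部超参数均为假设):先用随机投影得到双极超向量编码,再在高维空间用合页损失的次梯度下降训练一个大间隔线性分类器:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 2000                       # hypervector dimensionality

# Toy 2-D data: two Gaussian blobs, labels in {-1, +1}
X = np.vstack([rng.normal((+2, 0), 0.5, (200, 2)),
               rng.normal((-2, 0), 0.5, (200, 2))])
y = np.concatenate([np.ones(200), -np.ones(200)])

# HDC-style encoding: bipolar hypervectors via a random projection
R = rng.standard_normal((D, 2))
H = np.sign(X @ R.T)           # shape (400, D), entries in {-1, +1}

# Large-margin linear classifier in HD space: hinge-loss subgradient descent
w, lam, lr = np.zeros(D), 1e-4, 0.05
for _ in range(200):
    margins = y * (H @ w)
    viol = margins < 1.0       # only margin violators contribute a subgradient
    if viol.any():
        grad = lam * w - (y[viol, None] * H[viol]).mean(axis=0)
    else:
        grad = lam * w
    w -= lr * grad

acc = ((H @ w) * y > 0).mean()
```

这一草图只说明"超维编码 + 最大间隔目标"的组合思路;论文建立的 HDC-SVM 对应关系要更精确。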


【9】Relational In-Context Learning via Synthetic Pre-training with Structural Prior
标题:通过带结构先验的合成预训练实现关系上下文内学习
链接:https://arxiv.org/abs/2603.03805

作者:Yanbo Wang,Jiaxuan You,Chuan Shi,Muhan Zhang
摘要:Relational Databases (RDBs) are the backbone of modern business, yet they lack foundation models comparable to those in text or vision. A key obstacle is that high-quality RDBs are private, scarce and structurally heterogeneous, making internet-scale pre-training infeasible. To overcome this data scarcity, we introduce $\textbf{RDB-PFN}$, the first relational foundation model trained purely via $\textbf{synthetic data}$. Inspired by Prior-Data Fitted Networks (PFNs) where synthetic data generated from Structural Causal Models (SCMs) enables reasoning on single tables, we design a $\textbf{Relational Prior Generator}$ to create an infinite stream of diverse RDBs from scratch. Pre-trained on $\textbf{over 2 million}$ synthetic single-table and relational tasks, RDB-PFN learns to adapt to any new database instantly via genuine $\textbf{in-context learning}$. Experiments verify RDB-PFN achieves strong few-shot performance on 19 real-world relational prediction tasks, outperforming graph-based and single-table foundation-model baselines (given the same DFS-linearized inputs), while using a lightweight architecture and fast inference. The code is available at https://github.com/MuLabPKU/RDBPFN


【10】Inverse Contextual Bandits without Rewards: Learning from a Non-Stationary Learner via Suffix Imitation
标题:无奖励的逆上下文老虎机:通过后缀模仿向非平稳学习者学习
链接:https://arxiv.org/abs/2603.03778

作者:Yuqi Kong,Xiao Zhang,Weiran Shen
摘要 :We study the Inverse Contextual Bandit (ICB) problem, in which a learner seeks to optimize a policy while an observer, who cannot access the learner's rewards and only observes actions, aims to recover the underlying problem parameters. During the learning process, the learner's behavior naturally transitions from exploration to exploitation, resulting in non-stationary action data that poses significant challenges for the observer. To address this issue, we propose a simple and effective framework called Two-Phase Suffix Imitation. The framework discards data from an initial burn-in phase and performs empirical risk minimization using only data from a subsequent imitation phase. We derive a predictive decision loss bound that explicitly characterizes the bias-variance trade-off induced by the choice of burn-in length. Despite the severe information deficit, we show that a reward-free observer can achieve a convergence rate of $\tilde O(1/\sqrt{N})$, matching the asymptotic efficiency of a fully reward-aware learner. This result demonstrates that a passive observer can effectively uncover the optimal policy from actions alone, attaining performance comparable to that of the learner itself.
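摘要描述的"两阶段后缀模仿"可以用一个玩具模拟来直观说明(以下环境参数、探索比例与按上下文多数投票的"经验风险最小化"替身均为本文注释者的假设,并非论文的实际设定):学习者早期随机探索、后期以高概率选最优臂,观察者丢弃燃烧期数据,仅用后缀恢复策略:

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, burn_in = 3000, 1000
optimal_arm = {0: 1, 1: 0}        # context -> best action (unknown to the observer)

contexts, actions = [], []
for t in range(n_steps):
    c = int(rng.integers(2))
    if t < burn_in:               # exploration phase: uniform random actions
        a = int(rng.integers(2))
    else:                         # exploitation phase: mostly optimal actions
        a = optimal_arm[c] if rng.random() < 0.95 else 1 - optimal_arm[c]
    contexts.append(c)
    actions.append(a)

contexts, actions = np.array(contexts), np.array(actions)

def imitate(ctx, act):
    """Per-context majority vote: a minimal stand-in for ERM policy recovery."""
    return {c: int(np.bincount(act[ctx == c], minlength=2).argmax()) for c in (0, 1)}

# The observer discards the burn-in prefix and imitates only the suffix.
suffix_policy = imitate(contexts[burn_in:], actions[burn_in:])
```

论文的贡献在于刻画燃烧期长度带来的偏差-方差权衡并给出 $\tilde O(1/\sqrt{N})$ 收敛率;此处仅演示"丢弃前缀、模仿后缀"的流程。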


【11】Principled Learning-to-Communicate with Quasi-Classical Information Structures
标题:具有准经典信息结构的原则性学习通信
链接:https://arxiv.org/abs/2603.03664

作者:Xiangyu Liu,Haoyi You,Kaiqing Zhang
备注:Preliminary version appeared at IEEE CDC 2025
摘要:Learning-to-communicate (LTC) in partially observable environments has received increasing attention in deep multi-agent reinforcement learning, where the control and communication strategies are jointly learned. Meanwhile, the impact of communication on decision-making has been extensively studied in control theory. In this paper, we seek to formalize and better understand LTC by bridging these two lines of work, through the lens of information structures (ISs). To this end, we formalize LTC in decentralized partially observable Markov decision processes (Dec-POMDPs) under the common-information-based framework from decentralized stochastic control, and classify LTC problems based on the ISs before (additional) information sharing. We first show that non-classical LTCs are computationally intractable in general, and thus focus on quasi-classical (QC) LTCs. We then propose a series of conditions for QC LTCs, under which LTCs preserve the QC IS after information sharing, whereas violating which can cause computational hardness in general. Further, we develop provable planning and learning algorithms for QC LTCs, and establish quasi-polynomial time and sample complexities for several QC LTC examples that satisfy the above conditions. Along the way, we also establish results on the relationship between (strictly) QC IS and the condition of having strategy-independent common-information-based beliefs (SI-CIBs), as well as on solving Dec-POMDPs without computationally intractable oracles but beyond those with SI-CIBs, which may be of independent interest.


【12】mlx-snn: Spiking Neural Networks on Apple Silicon via MLX
标题:mlx-snn:基于MLX在Apple Silicon上运行的脉冲神经网络
链接:https://arxiv.org/abs/2603.03529

作者:Jiahao Qin
备注:11 pages 3 figures
摘要:We introduce mlx-snn, the first spiking neural network (SNN) library built natively on Apple's MLX framework. As SNN research grows rapidly, all major libraries -- snnTorch, Norse, SpikingJelly, Lava -- target PyTorch or custom backends, leaving Apple Silicon users without a native option. mlx-snn provides six neuron models (LIF, IF, Izhikevich, Adaptive LIF, Synaptic, Alpha), four surrogate gradient functions, four spike encoding methods (including an EEG-specific encoder), and a complete backpropagation-through-time training pipeline. The library leverages MLX's unified memory architecture, lazy evaluation, and composable function transforms (mx.grad, mx.compile) to enable efficient SNN research on Apple Silicon hardware. We validate mlx-snn on MNIST digit classification across five hyperparameter configurations and three backends, achieving up to 97.28% accuracy with 2.0--2.5 times faster training and 3--10 times lower GPU memory than snnTorch on the same M3 Max hardware. mlx-snn is open-source under the MIT license and available on PyPI. https://github.com/D-ST-Sword/mlx-snn
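库中提到的 LIF(漏积分发放)神经元是 SNN 的基本构件。下面是离散时间 LIF 动力学的一个通用 numpy 草图(仅为示意,并非 mlx-snn 的实际 API;衰减系数、阈值与硬复位规则均为常见取法):

```python
import numpy as np

def lif_simulate(current, beta=0.9, threshold=1.0):
    """Discrete-time leaky integrate-and-fire neuron with hard reset.

    Membrane update: V[t] = beta * V[t-1] + I[t]; a spike is emitted and
    V is reset to 0 whenever V crosses the threshold.
    """
    v, voltages, spikes = 0.0, [], []
    for i in current:
        v = beta * v + i
        if v >= threshold:
            spikes.append(1)
            v = 0.0          # hard reset after a spike
        else:
            spikes.append(0)
        voltages.append(v)
    return np.array(voltages), np.array(spikes)

# Constant drive: the steady-state potential I/(1-beta) = 2.0 exceeds the
# threshold, so the neuron fires periodically (here every 7 steps).
voltages, spikes = lif_simulate(np.full(70, 0.2))
```

训练时库会用替代梯度(surrogate gradient)对发放这一不可导阶跃做平滑近似,前向动力学与上面的草图一致。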


【13】Biased Generalization in Diffusion Models
标题:扩散模型中的有偏泛化
链接:https://arxiv.org/abs/2603.03469

作者:Jerome Garnier-Brun,Luca Biggio,Davide Beltrame,Marc Mézard,Luca Saglietti
备注:10 pages, 6 figures
摘要:Generalization in generative modeling is defined as the ability to learn an underlying distribution from a finite dataset and produce novel samples, with evaluation largely driven by held-out performance and perceived sample quality. In practice, training is often stopped at the minimum of the test loss, taken as an operational indicator of generalization. We challenge this viewpoint by identifying a phase of biased generalization during training, in which the model continues to decrease the test loss while favoring samples with anomalously high proximity to training data. By training the same network on two disjoint datasets and comparing the mutual distances of generated samples and their similarity to training data, we introduce a quantitative measure of bias and demonstrate its presence on real images. We then study the mechanism of bias, using a controlled hierarchical data model where access to exact scores and ground-truth statistics allows us to precisely characterize its onset. We attribute this phenomenon to the sequential nature of feature learning in deep networks, where coarse structure is learned early in a data-independent manner, while finer features are resolved later in a way that increasingly depends on individual training samples. Our results show that early stopping at the test loss minimum, while optimal under standard generalization criteria, may be insufficient for privacy-critical applications.
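摘要中的偏差度量基于"生成样本到训练数据的相近程度"与"生成样本之间的相互距离"的比较。下面给出一个假设性的简化版本(比值的具体定义为本文注释者的示意,并非论文的确切度量),演示记忆型采样器与独立采样器在该指标上的差异:

```python
import numpy as np

def proximity_bias_ratio(generated, train):
    """Smaller values indicate generated samples anomalously close to training data.

    Ratio of (mean nearest-neighbour distance to the training set) to
    (mean pairwise distance among generated samples).
    """
    d_train = np.linalg.norm(generated[:, None, :] - train[None, :, :], axis=-1)
    nn = d_train.min(axis=1).mean()
    d_gen = np.linalg.norm(generated[:, None, :] - generated[None, :, :], axis=-1)
    mutual = d_gen[np.triu_indices(len(generated), k=1)].mean()
    return nn / mutual

rng = np.random.default_rng(0)
train = rng.standard_normal((200, 8))
fresh = rng.standard_normal((100, 8))                           # i.i.d. from the same law
memorized = train[:100] + 0.01 * rng.standard_normal((100, 8))  # near-copies of training data

r_fresh = proximity_bias_ratio(fresh, train)
r_mem = proximity_bias_ratio(memorized, train)
```

论文通过在两个不相交数据集上训练同一网络来分离这一效应;此处的比值只是同一思路的最小化演示。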


【14】SELDON: Supernova Explosions Learned by Deep ODE Networks
标题:SELDON:通过深度ODE网络学习超新星爆炸
链接:https://arxiv.org/abs/2603.04392

作者:Jiezhong Wu,Jack O'Brien,Jennifer Li,M. S. Krafczyk,Ved G. Shah,Amanda R. Wasserman,Daniel W. Apley,Gautham Narayan,Noelle I. Samia
备注:Accepted at AAAI 2026 (Proceedings of the AAAI Conference on Artificial Intelligence)
摘要 :The discovery rate of optical transients will explode to 10 million public alerts per night once the Vera C. Rubin Observatory's Legacy Survey of Space and Time comes online, overwhelming the traditional physics-based inference pipelines. A continuous-time forecasting AI model is of interest because it can deliver millisecond-scale inference for thousands of objects per day, whereas legacy MCMC codes need hours per object. In this paper, we propose SELDON, a new continuous-time variational autoencoder for panels of sparse and irregularly time-sampled (gappy) astrophysical light curves that are nonstationary, heteroscedastic, and inherently dependent. SELDON combines a masked GRU-ODE encoder with a latent neural ODE propagator and an interpretable Gaussian-basis decoder. The encoder learns to summarize panels of imbalanced and correlated data even when only a handful of points are observed. The neural ODE then integrates this hidden state forward in continuous time, extrapolating to future unseen epochs. This extrapolated time series is further encoded by deep sets to a latent distribution that is decoded to a weighted sum of Gaussian basis functions, the parameters of which are physically meaningful. Such parameters (e.g., rise time, decay rate, peak flux) directly drive downstream prioritization of spectroscopic follow-up for astrophysical surveys. Beyond astronomy, the architecture of SELDON offers a generic recipe for interpretable and continuous-time sequence modeling in any time domain where data are multivariate, sparse, heteroscedastic, and irregularly spaced.


【15】Exploiting Subgradient Sparsity in Max-Plus Neural Networks
标题:利用Max-Plus神经网络中的次梯度稀疏性
链接:https://arxiv.org/abs/2603.04133

作者:Ikhlas Enaieh,Olivier Fercoq
摘要:Deep Neural Networks are powerful tools for solving machine learning problems, but their training often involves dense and costly parameter updates. In this work, we use a novel Max-Plus neural architecture in which classical addition and multiplication are replaced with maximum and summation operations respectively. This is a promising architecture in terms of interpretability, but its training is challenging. A particular feature is that this algebraic structure naturally induces sparsity in the subgradients, as only neurons that contribute to the maximum affect the loss. However, standard backpropagation fails to exploit this sparsity, leading to unnecessary computations. In this work, we focus on the minimization of the worst sample loss which transfers this sparsity to the optimization loss. To address this, we propose a sparse subgradient algorithm that explicitly exploits the algebraic sparsity. By tailoring the optimization procedure to the non-smooth nature of Max-Plus models, our method achieves more efficient updates while retaining theoretical guarantees. This highlights a principled path toward bridging algebraic structure and scalable learning.
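Max-Plus 层及其次梯度稀疏性可以用几行 numpy 说明(通用定义的示意草图,非论文代码):前向为 $y_i = \max_j (W_{ij} + x_j)$,对 $W$ 的次梯度只在每个输出单元的获胜输入处非零:

```python
import numpy as np

def maxplus_forward(W, x):
    """Max-plus layer: y_i = max_j (W[i, j] + x[j])."""
    scores = W + x[None, :]          # shape (out, in)
    idx = scores.argmax(axis=1)      # winning input index per output unit
    return scores.max(axis=1), idx

def maxplus_subgrad_W(W, x, upstream):
    """Subgradient w.r.t. W: nonzero only at the winning (i, argmax_j) entries."""
    _, idx = maxplus_forward(W, x)
    g = np.zeros_like(W)
    g[np.arange(W.shape[0]), idx] = upstream
    return g

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 64))
x = rng.standard_normal(64)
# One nonzero gradient entry per output unit: 5 out of 5 * 64 = 320 entries.
g = maxplus_subgrad_W(W, x, upstream=np.ones(5))
```

论文利用的正是这种结构:每个输出只有 1/64 的参数参与更新,标准反向传播却会对整个稠密矩阵做运算。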


【16】Non-Invasive Reconstruction of Cardiac Activation Dynamics Using Physics-Informed Neural Networks
标题:使用物理信息神经网络无创重建心脏激活动力学
链接:https://arxiv.org/abs/2603.03832

作者:Nathan Dermul,Hans Dierckx
摘要:Cardiac arrhythmogenesis is governed by complex electromechanical interactions that are not directly observable in vivo, motivating the development of non-invasive computational approaches for reconstructing three-dimensional activation dynamics. We present a physics-informed neural network framework for recovering cardiac activation patterns, active tension propagation, deformation fields, and hydrostatic pressure from measurable deformation data in simplified left ventricular geometries. Our approach integrates nonlinear anisotropic constitutive modeling, heterogeneous fiber orientation, weak formulations of the governing mechanics, and finite-element-based loss functions to embed physical constraints directly into training.   We demonstrate that the proposed framework accurately reconstructs spatiotemporal activation dynamics under varying levels of measurement noise and reduced spatial resolution, while preserving global propagation patterns and activation timing. By coupling mechanistic modeling with data-driven inference, this method establishes a pathway toward patient-specific, non-invasive reconstruction of cardiac activation, with potential applications in digital phenotyping and computational support for arrhythmia assessment.


【17】Generalization Properties of Score-matching Diffusion Models for Intrinsically Low-dimensional Data
标题:本征低维数据上分数匹配扩散模型的泛化性质
链接:https://arxiv.org/abs/2603.03700

作者:Saptarshi Chakraborty,Quentin Berthet,Peter L. Bartlett
摘要:Despite the remarkable empirical success of score-based diffusion models, their statistical guarantees remain underdeveloped. Existing analyses often provide pessimistic convergence rates that do not reflect the intrinsic low-dimensional structure common in real data, such as that arising in natural images. In this work, we study the statistical convergence of score-based diffusion models for learning an unknown distribution $μ$ from finitely many samples. Under mild regularity conditions on the forward diffusion process and the data distribution, we derive finite-sample error bounds on the learned generative distribution, measured in the Wasserstein-$p$ distance. Unlike prior results, our guarantees hold for all $p \ge 1$ and require only a finite-moment assumption on $μ$, without compact-support, manifold, or smooth-density conditions. Specifically, given $n$ i.i.d.\ samples from $μ$ with finite $q$-th moment and appropriately chosen network architectures, hyperparameters, and discretization schemes, we show that the expected Wasserstein-$p$ error between the learned distribution $\hatμ$ and $μ$ scales as $\mathbb{E}\, \mathbb{W}_p(\hatμ,μ) = \widetilde{O}\!\left(n^{-1 / d^\ast_{p,q}(μ)}\right),$ where $d^\ast_{p,q}(μ)$ is the $(p,q)$-Wasserstein dimension of $μ$. Our results demonstrate that diffusion models naturally adapt to the intrinsic geometry of data and mitigate the curse of dimensionality, since the convergence rate depends on $d^\ast_{p,q}(μ)$ rather than the ambient dimension. Moreover, our theory conceptually bridges the analysis of diffusion models with that of GANs and the sharp minimax rates established in optimal transport. The proposed $(p,q)$-Wasserstein dimension also extends classical Wasserstein dimension notions to distributions with unbounded support, which may be of independent theoretical interest.


【18】Quantifying Ranking Instability Across Evaluation Protocol Axes in Gene Regulatory Network Benchmarking
标题:量化基因调控网络基准测试中评估协议各轴的排名不稳定性
链接:https://arxiv.org/abs/2603.03493

作者:Ihor Kendiukhov
摘要 :Benchmark rankings are routinely used to justify scientific claims about method quality in gene regulatory network (GRN) inference, yet the stability of these rankings under plausible evaluation protocol choices is rarely examined. We present a systematic diagnostic framework for measuring ranking instability under protocol shift, including decomposition tools that separate base rate effects from discrimination effects. Using existing single cell GRN benchmark outputs across three human tissues and six inference methods, we quantify pairwise reversal rates across four protocol axes: candidate set restriction (16.3 percent, 95 percent CI 11.0 to 23.4 percent), tissue context (19.3 percent), reference network choice (32.1 percent), and symbol mapping policy (0.0 percent). A permutation null confirms that observed reversal rates are far below random order expectations (0.163 versus null mean 0.500), indicating partially stable but non invariant ranking structure. Our decomposition reveals that reversals are driven by changes in the relative discrimination ability of methods rather than by base rate inflation, a finding that challenges a common implicit assumption in GRN benchmarking. We propose concrete reporting practices for stability aware evaluation and provide a diagnostic toolkit for identifying method pairs at risk of reversal.
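摘要中的成对反转率(pairwise reversal rate)统计的是在两种评估协议下相对排序发生翻转的方法对所占比例。一个最小实现(示意草图,方法名与分数均为虚构示例):

```python
from itertools import combinations

def pairwise_reversal_rate(scores_a, scores_b):
    """Fraction of method pairs whose relative order differs between two protocols.

    `scores_a` and `scores_b` map method name -> benchmark score under
    protocol A and protocol B respectively (higher is better).
    """
    methods = sorted(scores_a)
    pairs = list(combinations(methods, 2))
    flips = sum(
        (scores_a[m] - scores_a[n]) * (scores_b[m] - scores_b[n]) < 0
        for m, n in pairs
    )
    return flips / len(pairs)

# Hypothetical scores for four methods under two protocol variants;
# only the m1/m2 ordering flips, so 1 of 6 pairs reverses.
protocol_a = {"m1": 0.91, "m2": 0.88, "m3": 0.75, "m4": 0.60}
protocol_b = {"m1": 0.85, "m2": 0.89, "m3": 0.74, "m4": 0.62}
rate = pairwise_reversal_rate(protocol_a, protocol_b)
```

论文在此基础上进一步用置换零假设(随机排序下期望反转率为 0.5)校准观测值,并分解出基率效应与判别力效应。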


【19】Automated Measurement of Geniohyoid Muscle Thickness During Speech Using Deep Learning and Ultrasound
标题:使用深度学习和超声自动测量言语过程中的颏舌骨肌厚度
链接:https://arxiv.org/abs/2603.03350

作者:Alisher Myrgyyassov,Bruce Xiao Wang,Yu Sun,Shuming Huang,Zhen Song,Min Ney Wong,Yongping Zheng
备注:6 pages, including references and acknowledgements. Submitted to Interspeech 2026
摘要:Manual measurement of muscle morphology from ultrasound during speech is time-consuming and limits large-scale studies. We present SMMA, a fully automated framework that combines deep-learning segmentation with skeleton-based thickness quantification to analyze geniohyoid (GH) muscle dynamics. Validation demonstrates near-human-level accuracy (Dice = 0.9037, MAE = 0.53 mm, r = 0.901). Application to Cantonese vowel production (N = 11) reveals systematic patterns: /a:/ shows significantly greater GH thickness (7.29 mm) than /i:/ (5.95 mm, p < 0.001, Cohen's d > 1.3), suggesting greater GH activation during production of /a:/ than /i:/, consistent with its role in mandibular depression. Sex differences (5-8% greater in males) reflect anatomical scaling. SMMA achieves expert-validated accuracy while eliminating the need for manual annotation, enabling scalable investigations of speech motor control and objective assessment of speech and swallowing disorders.
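验证中报告的 Dice 系数衡量分割掩膜与人工标注的重叠度,其通用定义如下(示意实现,并非论文代码):

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-8):
    """Dice overlap between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

# Two 6x6 squares offset by one pixel: 36 px each, 25 px overlap.
pred = np.zeros((10, 10), dtype=int)
pred[2:8, 2:8] = 1
target = np.zeros((10, 10), dtype=int)
target[3:9, 3:9] = 1
dice = dice_coefficient(pred, target)
```

摘要中 0.9037 的 Dice 即按此类定义在 GH 肌分割上与人工标注比较得到。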


【20】GreenPhase: A Green Learning Approach for Earthquake Phase Picking
标题:GreenPhase:一种用于地震震相拾取的绿色学习方法
链接:https://arxiv.org/abs/2603.03344

作者:Yixing Wu,Shiou-Ya Wang,Dingyi Nie,Sanket Kumbhar,Yun-Tung Hsieh,Yun-Cheng Wang,Po-Chyi Su,C. -C. Jay Kuo
摘要:Earthquake detection and seismic phase picking are fundamental yet challenging tasks in seismology due to low signal-to-noise ratios, waveform variability, and overlapping events. Recent deep-learning models achieve strong results but rely on large datasets and heavy backpropagation training, raising concerns over efficiency, interpretability, and sustainability.   We propose GreenPhase, a multi-resolution, feed-forward, and mathematically interpretable model based on the Green Learning framework. GreenPhase comprises three resolution levels, each integrating unsupervised representation learning, supervised feature learning, and decision learning. Its feed-forward design eliminates backpropagation, enabling independent module optimization with stable training and clear interpretability. Predictions are refined from coarse to fine resolutions while computation is restricted to candidate regions.   On the Stanford Earthquake Dataset (STEAD), GreenPhase achieves excellent performance with F1 scores of 1.0 for detection, 0.98 for P-wave picking, and 0.96 for S-wave picking. This is accomplished while reducing the computational cost (FLOPs) for inference by approximately 83% compared to state-of-the-art models. These results demonstrate that the proposed model provides an efficient, interpretable, and sustainable alternative for large-scale seismic monitoring.


其他(41篇)

【1】Turning Trust to Transactions: Tracking Affiliate Marketing and FTC Compliance in YouTube's Influencer Economy
标题:将信任转化为交易:追踪YouTube影响者经济中的联盟营销与FTC合规性
链接:https://arxiv.org/abs/2603.04383

作者:Chen Sun,Yash Vekaria,Zubair Shafiq,Rishab Nithyanand
备注:ICWSM 2026
摘要:YouTube has evolved into a powerful platform where creators monetize their influence through affiliate marketing, raising concerns about transparency and ethics, especially when creators fail to disclose their affiliate relationships. Although regulatory agencies like the US Federal Trade Commission (FTC) have issued guidelines to address these issues, non-compliance and consumer harm persist, and the extent of these problems remains unclear. In this paper, we introduce tools, developed with insights from recent advances in Web measurement and NLP research, to examine the state of the affiliate marketing ecosystem on YouTube. We apply these tools to a 10-year dataset of 2 million videos from nearly 540,000 creators, analyzing the prevalence of affiliate marketing on YouTube and the rates of non-compliant behavior. Our findings reveal that affiliate links are widespread, yet disclosure compliance remains low, with most videos failing to meet FTC standards. Furthermore, we analyze the roles of different stakeholders in improving disclosure behavior. Our study suggests that the platform is highly associated with improved compliance through standardized disclosure features. We recommend that regulators and affiliate partners collaborate with platforms to enhance transparency, accountability, and trust in the influencer economy.


【2】Robustness of Agentic AI Systems via Adversarially-Aligned Jacobian Regularization
标题:通过对抗对齐的雅可比正则化提升智能体AI系统的鲁棒性
链接:https://arxiv.org/abs/2603.04378

作者:Furkan Mumcu,Yasin Yilmaz
摘要:As Large Language Models (LLMs) transition into autonomous multi-agent ecosystems, robust minimax training becomes essential yet remains prone to instability when highly non-linear policies induce extreme local curvature in the inner maximization. Standard remedies that enforce global Jacobian bounds are overly conservative, suppressing sensitivity in all directions and inducing a large Price of Robustness. We introduce Adversarially-Aligned Jacobian Regularization (AAJR), a trajectory-aligned approach that controls sensitivity strictly along adversarial ascent directions. We prove that AAJR yields a strictly larger admissible policy class than global constraints under mild conditions, implying a weakly smaller approximation gap and reduced nominal performance degradation. Furthermore, we derive step-size conditions under which AAJR controls effective smoothness along optimization trajectories and ensures inner-loop stability. These results provide a structural theory for agentic robustness that decouples minimax stability from global expressivity restrictions.
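AAJR 的核心是只沿对抗上升方向约束敏感度,而非施加全局雅可比范数约束。下面用有限差分 JVP 给出一个数值示意(函数名、步长与二次型测试函数均为本文注释者的假设,并非论文的实际实现):

```python
import numpy as np

def numeric_grad(loss, x, eps=1e-5):
    """Central-difference gradient of a scalar loss."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (loss(x + e) - loss(x - e)) / (2 * eps)
    return g

def aligned_jacobian_penalty(f, loss, x, eps=1e-5):
    """Penalize sensitivity of f only along the adversarial ascent direction of the loss."""
    v = numeric_grad(loss, x)
    v = v / (np.linalg.norm(v) + 1e-12)
    jvp = (f(x + eps * v) - f(x - eps * v)) / (2 * eps)   # directional derivative J(x) @ v
    return float(np.sum(jvp ** 2))

# Toy anisotropic map: f(x) = A x with A = diag(3, 1), loss = ||A x||^2 / 2.
A = np.diag([3.0, 1.0])
f = lambda x: A @ x
loss = lambda x: 0.5 * np.sum((A @ x) ** 2)

x = np.array([1.0, 1.0])
pen_aligned = aligned_jacobian_penalty(f, loss, x)
# Sensitivity along a direction orthogonal to the ascent direction (9, 1):
v_orth = np.array([-1.0, 9.0])
v_orth /= np.linalg.norm(v_orth)
pen_orth = float(np.sum(((f(x + 1e-5 * v_orth) - f(x - 1e-5 * v_orth)) / 2e-5) ** 2))
```

沿上升方向的敏感度惩罚明显大于正交方向,这正是"轨迹对齐"相对全局约束保留表达能力的直观来源。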


【3】Low-Resource Guidance for Controllable Latent Audio Diffusion
标题:用于可控潜在音频扩散的低资源引导
链接:https://arxiv.org/abs/2603.04366

作者:Zachary Novack,Zack Zukowski,CJ Carr,Julian Parker,Zach Evans,Josiah Taylor,Taylor Berg-Kirkpatrick,Julian McAuley,Jordi Pons
备注:Accepted at ICASSP 2026
摘要:Generative audio requires fine-grained controllable outputs, yet most existing methods require model retraining on specific controls or inference-time controls (\textit{e.g.}, guidance) that can also be computationally demanding. By examining the bottlenecks of existing guidance-based controls, in particular their high cost-per-step due to decoder backpropagation, we introduce a guidance-based approach through selective TFG and Latent-Control Heads (LatCHs), which enables controlling latent audio diffusion models with low computational overhead. LatCHs operate directly in latent space, avoiding the expensive decoder step, and requiring minimal training resources (7M parameters and $\approx$ 4 hours of training). Experiments with Stable Audio Open demonstrate effective control over intensity, pitch, and beats (and a combination of those) while maintaining generation quality. Our method balances precision and audio fidelity with far lower computational costs than standard end-to-end guidance. Demo examples can be found at https://zacharynovack.github.io/latch/latch.html.


【4】Dissecting Quantization Error: A Concentration-Alignment Perspective
标题:剖析量化误差:一种集中-对齐视角
链接:https://arxiv.org/abs/2603.04359

作者:Marco Federici,Boris van Breugel,Paul Whatmough,Markus Nagel
摘要:Quantization can drastically increase the efficiency of large language and vision models, but typically incurs an accuracy drop. Recently, function-preserving transforms (e.g. rotations, Hadamard transform, channel-wise scaling) have been successfully applied to reduce post-training quantization error, yet a principled explanation remains elusive. We analyze linear-layer quantization via the signal-to-quantization-noise ratio (SQNR), showing that for uniform integer quantization at a fixed bit width, SQNR decomposes into (i) the concentration of weights and activations (capturing spread and outliers), and (ii) the alignment of their dominant variation directions. This reveals an actionable insight: beyond concentration - the focus of most prior transforms (e.g. rotations or Hadamard) - improving alignment between weight and activation can further reduce quantization error. Motivated by this, we introduce block Concentration-Alignment Transforms (CAT), a lightweight linear transformation that uses a covariance estimate from a small calibration set to jointly improve concentration and alignment, approximately maximizing SQNR. Experiments across several LLMs show that CAT consistently matches or outperforms prior transform-based quantization methods at 4-bit precision, confirming the insights gained in our framework.
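摘要中的 SQNR(信号-量化噪声比)可以用对称均匀量化直接演示(通用定义的示意草图,absmax 缩放为常见取法,并非论文中 CAT 的实现):

```python
import numpy as np

def uniform_quantize(x, bits):
    """Symmetric uniform integer quantization with absmax scaling."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def sqnr_db(x, bits):
    """Signal-to-quantization-noise ratio in dB at a fixed bit width."""
    err = x - uniform_quantize(x, bits)
    return 10 * np.log10(np.mean(x ** 2) / np.mean(err ** 2))

rng = np.random.default_rng(0)
w = rng.standard_normal(100_000)        # stand-in for a layer's weights
sqnr4, sqnr8 = sqnr_db(w, 4), sqnr_db(w, 8)
# Each extra bit buys roughly 6 dB of SQNR under the uniform-noise model.
```

论文的贡献在于把固定位宽下的 SQNR 进一步分解为"集中度"与"权重-激活对齐度"两项,并据此设计变换;上面仅演示被分解的量本身。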


【5】RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots
标题:RoboCasa365:用于通用机器人训练与基准测试的大规模模拟框架
链接:https://arxiv.org/abs/2603.04356

作者:Soroush Nasiriany,Sepehr Nasiriany,Abhiram Maddukuri,Yuke Zhu
备注:ICLR 2026; First three authors contributed equally
摘要:Recent advances in robot learning have accelerated progress toward generalist robots that can perform everyday tasks in human environments. Yet it remains difficult to gauge how close we are to this vision. The field lacks a reproducible, large-scale benchmark for systematic evaluation. To fill this gap, we present RoboCasa365, a comprehensive simulation benchmark for household mobile manipulation. Built on the RoboCasa platform, RoboCasa365 introduces 365 everyday tasks across 2,500 diverse kitchen environments, with over 600 hours of human demonstration data and over 1600 hours of synthetically generated demonstration data -- making it one of the most diverse and large-scale resources for studying generalist policies. RoboCasa365 is designed to support systematic evaluations for different problem settings, including multi-task learning, robot foundation model training, and lifelong learning. We conduct extensive experiments on this benchmark with state-of-the-art methods and analyze the impacts of task diversity, dataset scale, and environment variation on generalization. Our results provide new insights into what factors most strongly affect the performance of generalist robots and inform strategies for future progress in the field.


【6】A Constrained RL Approach for Cost-Efficient Delivery of Latency-Sensitive Applications
标题:一种面向延迟敏感应用低成本交付的约束强化学习方法
链接:https://arxiv.org/abs/2603.04353

作者:Ozan Aygün,Vincenzo Norman Vitale,Antonia M. Tulino,Hao Feng,Elza Erkip,Jaime Llorca
备注:7 pages, 4 figures, accepted for publication in 2025 59th Asilomar Conference on Signals, Systems, and Computers
摘要:Next-generation networks aim to provide performance guarantees to real-time interactive services that require timely and cost-efficient packet delivery. In this context, the goal is to reliably deliver packets with strict deadlines imposed by the application while minimizing overall resource allocation cost. A large body of work has leveraged stochastic optimization techniques to design efficient dynamic routing and scheduling solutions under average delay constraints; however, these methods fall short when faced with strict per-packet delay requirements. We formulate the minimum-cost delay-constrained network control problem as a constrained Markov decision process and utilize constrained deep reinforcement learning (CDRL) techniques to effectively minimize total resource allocation cost while maintaining timely throughput above a target reliability level. Results indicate that the proposed CDRL-based solution can ensure timely packet delivery even when existing baselines fall short, and it achieves lower cost compared to other throughput-maximizing methods.


【7】Enhancing Authorship Attribution with Synthetic Paintings
标题:利用合成绘画增强画作作者归属判定
链接:https://arxiv.org/abs/2603.04343

作者:Clarissa Loures,Caio Hosken,Luan Oliveira,Gianlucca Zuin,Adriano Veloso
备注:Accepted for publication at the 24th IEEE International Conference on Machine Learning and Applications (ICMLA 2025)
摘要:Attributing authorship to paintings is a historically complex task, and one of its main challenges is the limited availability of real artworks for training computational models. This study investigates whether synthetic images, generated through DreamBooth fine-tuning of Stable Diffusion, can improve the performance of classification models in this context. We propose a hybrid approach that combines real and synthetic data to enhance model accuracy and generalization across similar artistic styles. Experimental results show that adding synthetic images leads to higher ROC-AUC and accuracy compared to using only real paintings. By integrating generative and discriminative methods, this work contributes to the development of computer vision techniques for artwork authentication in data-scarce scenarios.


【8】Algorithmic Compliance and Regulatory Loss in Digital Assets
标题:数字资产中的算法合规与监管损失
链接:https://arxiv.org/abs/2603.04328

作者:Khem Raj Bhatt,Krishna Sharma
摘要:We study the deployment performance of machine learning based enforcement systems used in cryptocurrency anti money laundering (AML). Using forward looking and rolling evaluations on Bitcoin transaction data, we show that strong static classification metrics substantially overstate real world regulatory effectiveness. Temporal nonstationarity induces pronounced instability in cost sensitive enforcement thresholds, generating large and persistent excess regulatory losses relative to dynamically optimal benchmarks. The core failure arises from miscalibration of decision rules rather than from declining predictive accuracy per se. These findings underscore the fragility of fixed AML enforcement policies in evolving digital asset markets and motivate loss-based evaluation frameworks for regulatory oversight.
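摘要中"决策阈值失准而非预测精度下降"这一点可以用一个玩具模拟说明(分数分布、成本权重与漂移方式均为本文注释者的假设):在第一期上按成本最优选出的静态阈值,施加到先验发生漂移的第二期时,相对重新优化的阈值会产生超额监管损失:

```python
import numpy as np

def expected_loss(scores, labels, t, c_fp=1.0, c_fn=10.0):
    """Cost-weighted enforcement loss at threshold t (flag a transaction if score >= t)."""
    flag = scores >= t
    fp = np.sum(flag & (labels == 0))   # licit transactions flagged
    fn = np.sum(~flag & (labels == 1))  # illicit transactions missed
    return (c_fp * fp + c_fn * fn) / len(scores)

def best_threshold(scores, labels, grid):
    return min(grid, key=lambda t: expected_loss(scores, labels, t))

rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 101)

def make_period(illicit_rate, n=5000):
    """Calibrated-looking scores: illicit ~ Beta(6,2), licit ~ Beta(2,6)."""
    labels = (rng.random(n) < illicit_rate).astype(int)
    scores = np.where(labels == 1, rng.beta(6, 2, n), rng.beta(2, 6, n))
    return scores, labels

s1, y1 = make_period(illicit_rate=0.05)   # calibration period
s2, y2 = make_period(illicit_rate=0.20)   # shifted deployment period

t_static = best_threshold(s1, y1, grid)
loss_static = expected_loss(s2, y2, t_static)
loss_reopt = expected_loss(s2, y2, best_threshold(s2, y2, grid))
excess = loss_static - loss_reopt         # excess regulatory loss from miscalibration
```

分类器本身(分数分布)在两期间保持不变,超额损失完全来自阈值未随先验调整,这与论文的核心论点一致。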


【9】Scalable Evaluation of the Realism of Synthetic Environmental Augmentations in Images
标题:图像中合成环境增强真实感的可扩展评估
链接:https://arxiv.org/abs/2603.04325

作者:Damian J. Ruck,Paul Vautravers,Oliver Chalkley,Jake Thomas
摘要:Evaluation of AI systems often requires synthetic test cases, particularly for rare or safety-critical conditions that are difficult to observe in operational data. Generative AI offers a promising approach for producing such data through controllable image editing, but its usefulness depends on whether the resulting images are sufficiently realistic to support meaningful evaluation.   We present a scalable framework for assessing the realism of synthetic image-editing methods and apply it to the task of adding environmental conditions-fog, rain, snow, and nighttime-to car-mounted camera images. Using 40 clear-day images, we compare rule-based augmentation libraries with generative AI image-editing models. Realism is evaluated using two complementary automated metrics: a vision-language model (VLM) jury for perceptual realism assessment, and embedding-based distributional analysis to measure similarity to genuine adverse-condition imagery.   Generative AI methods substantially outperform rule-based approaches, with the best generative method achieving approximately 3.6 times the acceptance rate of the best rule-based method. Performance varies across conditions: fog proves easiest to simulate, while nighttime transformations remain challenging. Notably, the VLM jury assigns imperfect acceptance even to real adverse-condition imagery, establishing practical ceilings against which synthetic methods can be judged. By this standard, leading generative methods match or exceed real-image performance for most conditions.   These results suggest that modern generative image-editing models can enable scalable generation of realistic adverse-condition imagery for evaluation pipelines. Our framework therefore provides a practical approach for scalable realism evaluation, though validation against human studies remains an important direction for future work.


【10】LabelBuddy: An Open Source Music and Audio Language Annotation Tagging Tool Using AI Assistance
标题:LabelBuddy:使用人工智能协助的开源音乐和音频语言注释标记工具
链接:https://arxiv.org/abs/2603.04293

作者:Ioannis Prokopiou,Ioannis Sina,Agisilaos Kounelis,Pantelis Vikatos,Themos Stafylakis
备注:Accepted at NLP4MusA 2026 (4th Workshop on NLP for Music and Audio)
摘要:The advancement of Machine learning (ML), Large Audio Language Models (LALMs), and autonomous AI agents in Music Information Retrieval (MIR) necessitates a shift from static tagging to rich, human-aligned representation learning. However, the scarcity of open-source infrastructure capable of capturing the subjective nuances of audio annotation remains a critical bottleneck. This paper introduces \textbf{LabelBuddy}, an open-source collaborative auto-tagging audio annotation tool designed to bridge the gap between human intent and machine understanding. Unlike static tools, it decouples the interface from inference via containerized backends, allowing users to plug in custom models for AI-assisted pre-annotation. We describe the system architecture, which supports multi-user consensus, containerized model isolation, and a roadmap for extending agents and LALMs. Code available at https://github.com/GiannisProkopiou/gsoc2022-Label-buddy.


【11】Agentics 2.0: Logical Transduction Algebra for Agentic Data Workflows
标题:Agentics 2.0:智能体数据工作流的逻辑转导代数
链接:https://arxiv.org/abs/2603.04241

作者:Alfio Massimiliano Gliozzo,Junkyu Lee,Nahuel Defosse
备注:14 pages, 4 figures
摘要:Agentic AI is rapidly transitioning from research prototypes to enterprise deployments, where requirements extend to meet the software quality attributes of reliability, scalability, and observability beyond plausible text generation. We present Agentics 2.0, a lightweight, Python-native framework for building high-quality, structured, explainable, and type-safe agentic data workflows. At the core of Agentics 2.0, the logical transduction algebra formalizes a large language model inference call as a typed semantic transformation, which we call a transducible function that enforces schema validity and the locality of evidence. The transducible functions compose into larger programs via algebraically grounded operators and execute as stateless asynchronous calls in parallel in asynchronous Map-Reduce programs. The proposed framework provides semantic reliability through strong typing, semantic observability through evidence tracing between slots of the input and output types, and scalability through stateless parallel execution. We instantiate reusable design patterns and evaluate the programs in Agentics 2.0 on challenging benchmarks, including DiscoveryBench for data-driven discovery and Archer for NL-to-SQL semantic parsing, demonstrating state-of-the-art performance.


【12】A Multi-Agent Framework for Interpreting Multivariate Physiological Time Series
标题:解释多元生理时间序列的多智能体框架
链接:https://arxiv.org/abs/2603.04142

作者:Davide Gabrielli,Paola Velardi,Stefano Faralli,Bardh Prenkaj
摘要:Continuous physiological monitoring is central to emergency care, yet deploying trustworthy AI is challenging. While LLMs can translate complex physiological signals into clinical narratives, it is unclear how agentic systems perform relative to zero-shot inference. To address these questions, we present Vivaldi, a role-structured multi-agent system that explains multivariate physiological time series. Due to regulatory constraints that preclude live deployment, we instantiate Vivaldi in a controlled, clinical pilot to a small, highly qualified cohort of emergency medicine experts, whose evaluations reveal a context-dependent picture that contrasts with prevailing assumptions that agentic reasoning uniformly improves performance. Our experiments show that agentic pipelines substantially benefit non-thinking and medically fine-tuned models, improving expert-rated explanation justification and relevance by +6.9 and +9.7 points, respectively. Contrarily, for thinking models, agentic orchestration often degrades explanation quality, including a 14-point drop in relevance, while improving diagnostic precision (ESI F1 +3.6). We also find that explicit tool-based computation is decisive for codifiable clinical metrics, whereas subjective targets, such as pain scores and length of stay, show limited or inconsistent changes. Expert evaluation further indicates that gains in clinical utility depend on visualization conventions, with medically specialized models achieving the most favorable trade-offs between utility and clarity. Together, these findings show that the value of agentic AI lies in the selective externalization of computation and structure rather than in maximal reasoning complexity, and highlight concrete design trade-offs and learned lessons, broadly applicable to explainable AI in safety-critical healthcare settings.


【13】Reducing hyperparameter sensitivity in measurement-feedback based Ising machines
标题:降低基于测量反馈的伊辛机中的超参数敏感性
链接:https://arxiv.org/abs/2603.04093

作者:Toon Sevenants,Guy Van der Sande,Guy Verschaffelt
备注:15 pages, 11 figures
摘要:Analog Ising machines have been proposed as heuristic hardware solvers for combinatorial optimization problems, with the potential to outperform conventional approaches, provided that their hyperparameters are carefully tuned. Their temporal evolution is often described using time-continuous dynamics. However, most experimental implementations rely on measurement-feedback architectures that operate in a time-discrete manner. We observe that in such setups, the range of effective hyperparameters is substantially smaller than in the envisioned time-continuous analog Ising machine. In this paper, we analyze this discrepancy and discuss its impact on the practical operation of Ising machines. Next, we propose and experimentally verify a method to reduce the sensitivity to hyperparameter selection of these measurement-feedback architectures.


【14】DQE-CIR: Distinctive Query Embeddings through Learnable Attribute Weights and Target Relative Negative Sampling in Composed Image Retrieval
标题:DQE-CIR:组合图像检索中通过可学习属性权重与目标相对负采样的独特查询嵌入
链接:https://arxiv.org/abs/2603.04037

作者:Geon Park,Ji-Hoon Park,Seong-Whan Lee
备注:33 pages
摘要:Composed image retrieval (CIR) addresses the task of retrieving a target image by jointly interpreting a reference image and a modification text that specifies the intended change. Most existing methods are still built upon contrastive learning frameworks that treat the ground truth image as the only positive instance and all remaining images as negatives. This strategy inevitably introduces relevance suppression, where semantically related yet valid images are incorrectly pushed away, and semantic confusion, where different modification intents collapse into overlapping regions of the embedding space. As a result, the learned query representations often lack discriminativeness, particularly at fine-grained attribute modifications. To overcome these limitations, we propose distinctive query embeddings through learnable attribute weights and target relative negative sampling (DQE-CIR), a method designed to learn distinctive query embeddings by explicitly modeling target relative relevance during training. DQE-CIR incorporates learnable attribute weighting to emphasize distinctive visual features conditioned on the modification text, enabling more precise feature alignment between language and vision. Furthermore, we introduce target relative negative sampling, which constructs a target relative similarity distribution and selects informative negatives from a mid-zone region that excludes both easy negatives and ambiguous false negatives. This strategy enables more reliable retrieval for fine-grained attribute changes by improving query discriminativeness and reducing confusion caused by semantically similar but irrelevant candidates.
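The target relative negative sampling idea above (keep a mid-zone band of candidates, excluding both easy negatives and ambiguous false negatives) can be sketched as follows; the band thresholds and function name are illustrative assumptions, not the paper's code:

```python
# Illustrative mid-zone negative sampling: rank candidates by similarity to
# the target and keep only a middle band, dropping easy negatives (low
# similarity) and likely false negatives (very high similarity).
# The band edges low/high are hypothetical hyperparameters.
def mid_zone_negatives(similarities, low=0.3, high=0.8):
    ranked = sorted(range(len(similarities)), key=lambda i: similarities[i])
    return [i for i in ranked if low <= similarities[i] <= high]

sims = [0.05, 0.45, 0.95, 0.6, 0.2]  # similarity of each candidate to target
negs = mid_zone_negatives(sims)      # only the mid-band candidates survive
```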


【15】mlx-vis: GPU-Accelerated Dimensionality Reduction and Visualization on Apple Silicon
标题:mlx-vis:Apple Silicon上的GPU加速降维与可视化
链接:https://arxiv.org/abs/2603.04035

作者:Han Xiao
摘要:mlx-vis is a Python library that implements six dimensionality reduction methods and a k-nearest neighbor graph algorithm entirely in MLX, Apple's array framework for Apple Silicon. The library provides UMAP, t-SNE, PaCMAP, TriMap, DREAMS, CNE, and NNDescent, all executing on Metal GPU through a unified fit_transform interface. Beyond embedding computation, mlx-vis includes a GPU-accelerated circle-splatting renderer that produces scatter plots and smooth animations without matplotlib, composing frames via scatter-add alpha blending on GPU and piping them to hardware H.264 encoding. On Fashion-MNIST with 70,000 points, all methods complete embedding in 2.1-3.8 seconds and render 800-frame animations in 1.4 seconds on an M3 Ultra, with the full pipeline from raw data to rendered video finishing in 3.6-5.2 seconds. The library depends only on MLX and NumPy, is released under the Apache 2.0 license, and is available at https://github.com/hanxiao/mlx-vis.
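The unified fit_transform contract mentioned above can be illustrated with a toy reducer; this only mimics the calling convention (the class below is hypothetical and is NOT the mlx-vis implementation):

```python
from statistics import pvariance

# Toy dimensionality reducer exposing the same fit_transform contract: take
# rows of features, return rows of embeddings. Here we simply keep the two
# highest-variance coordinates; real methods (UMAP, t-SNE, ...) differ only
# behind this interface.
class VarianceReducer:
    def __init__(self, n_components=2):
        self.n_components = n_components

    def fit_transform(self, X):
        dims = len(X[0])
        variances = [pvariance([row[d] for row in X]) for d in range(dims)]
        top = sorted(range(dims), key=lambda d: -variances[d])[:self.n_components]
        self.keep_ = sorted(top)  # preserve original axis order
        return [[row[d] for d in self.keep_] for row in X]

X = [[0.0, 5.0, 0.1], [1.0, -5.0, 0.1], [2.0, 5.0, 0.1], [3.0, -5.0, 0.1]]
emb = VarianceReducer(n_components=2).fit_transform(X)  # drops the near-constant axis
```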


【16】Training-Free Rate-Distortion-Perception Traversal With Diffusion
标题:基于扩散的免训练率-失真-感知遍历
链接:https://arxiv.org/abs/2603.04005

作者:Yuhan Wang,Suzhi Bi,Ying-Jun Angela Zhang
备注:40 pages, 17 figures
摘要:The rate-distortion-perception (RDP) tradeoff characterizes the fundamental limits of lossy compression by jointly considering bitrate, reconstruction fidelity, and perceptual quality. While recent neural compression methods have improved perceptual performance, they typically operate at fixed points on the RDP surface, requiring retraining to target different tradeoffs. In this work, we propose a training-free framework that leverages pre-trained diffusion models to traverse the entire RDP surface. Our approach integrates a reverse channel coding (RCC) module with a novel score-scaled probability flow ODE decoder. We theoretically prove that the proposed diffusion decoder is optimal for the distortion-perception tradeoff under AWGN observations and that the overall framework with the RCC module achieves the optimal RDP function in the Gaussian case. Empirical results across multiple datasets demonstrate the framework's flexibility and effectiveness in navigating the ternary RDP tradeoff using pre-trained diffusion models. Our results establish a practical and theoretically grounded approach to adaptive, perception-aware compression.
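For reference, the tradeoff surface the abstract refers to is commonly formalized as the rate-distortion-perception function of Blau and Michaeli; the notation below is the standard formulation and is assumed rather than taken from this paper:

```latex
R(D, P) \;=\; \min_{p_{\hat{X}\mid X}} \; I(X; \hat{X})
\quad \text{s.t.} \quad
\mathbb{E}\big[\Delta(X, \hat{X})\big] \le D,
\qquad
d\big(p_X, p_{\hat{X}}\big) \le P,
```

where $\Delta$ is a per-sample distortion measure and $d$ is a divergence between the source and reconstruction distributions; traversing the surface means varying $(D, P)$ without retraining.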


【17】Upholding Epistemic Agency: A Brouwerian Assertibility Constraint for Responsible AI
标题:维护认知能动性:面向负责任人工智能的布劳沃式可断言性约束
链接:https://arxiv.org/abs/2603.03971

作者:Michael Jülich
备注:Preprint. 63 pages, 6 figures, 2 tables
摘要:Generative AI can convert uncertainty into authoritative-seeming verdicts, displacing the justificatory work on which democratic epistemic agency depends. As a corrective, I propose a Brouwer-inspired assertibility constraint for responsible AI: in high-stakes domains, systems may assert or deny claims only if they can provide a publicly inspectable and contestable certificate of entitlement; otherwise they must return "Undetermined". This constraint yields a three-status interface semantics (Asserted, Denied, Undetermined) that cleanly separates internal entitlement from public standing while connecting them via the certificate as a boundary object. It also produces a time-indexed entitlement profile that is stable under numerical refinement yet revisable as the public record changes. I operationalize the constraint through decision-layer gating of threshold and argmax outputs, using internal witnesses (e.g., sound bounds or separation margins) and an output contract with reason-coded abstentions. A design lemma shows that any total, certificate-sound binary interface already decides the deployed predicate on its declared scope, so "Undetermined" is not a tunable reject option but a mandatory status whenever no forcing witness is available. By making outputs answerable to challengeable warrants rather than confidence alone, the paper aims to preserve epistemic agency where automated speech enters public justification.


【18】A Bi-Stage Framework for Automatic Development of Pixel-Based Planar Antenna Structures
标题:基于像素的平面天线结构自动开发的两阶段框架
链接:https://arxiv.org/abs/2603.03810

作者:Khadijeh Askaripour,Adrian Bekasiewicz,Slawomir Koziel
摘要:Development of modern antennas is a cognitive process that intertwines experience-driven determination of topology and tuning of its parameters to fulfill the performance specifications. Alternatively, the task can be formulated as an optimization problem so as to reduce reliance of geometry selection on engineering insight. In this work, a bi-stage framework for automatic generation of antennas is considered. The method determines free-form topology through optimization of interconnections between components (so-called pixels) that constitute the radiator. Here, the process involves global optimization of connections between pixels followed by fine-tuning of the resulting topology using a surrogate-assisted local-search algorithm to fulfill the design requirements. The approach has been demonstrated based on two case studies concerning development of broadband and dual-band monopole antennas.


【19】MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier
标题:MOOSE-Star:通过打破复杂性壁垒解锁科学发现的可处理训练
链接:https://arxiv.org/abs/2603.03756

作者:Zonglin Yang,Lidong Bing
摘要:While large language models (LLMs) show promise in scientific discovery, existing research focuses on inference or feedback-driven training, leaving the direct modeling of the generative reasoning process, $P(\text{hypothesis}|\text{background})$ ($P(h|b)$), unexplored. We demonstrate that directly training $P(h|b)$ is mathematically intractable due to the combinatorial complexity ($O(N^k)$) inherent in retrieving and composing inspirations from a vast knowledge base. To break this barrier, we introduce MOOSE-Star, a unified framework enabling tractable training and scalable inference. In the best case, MOOSE-Star reduces complexity from exponential to logarithmic ($O(\log N)$) by (1) training on decomposed subtasks derived from the probabilistic equation of discovery, (2) employing motivation-guided hierarchical search to enable logarithmic retrieval and prune irrelevant subspaces, and (3) utilizing bounded composition for robustness against retrieval noise. To facilitate this, we release TOMATO-Star, a dataset of 108,717 decomposed papers (38,400 GPU hours) for training. Furthermore, we show that while brute-force sampling hits a ''complexity wall,'' MOOSE-Star exhibits continuous test-time scaling.


【20】Why Do Unlearnable Examples Work: A Novel Perspective of Mutual Information
标题:不可学习的例子为何有效:互信息的新视角
链接:https://arxiv.org/abs/2603.03725

作者:Yifan Zhu,Yibo Miao,Yinpeng Dong,Xiao-Shan Gao
备注:32 pages, ICLR 2026
摘要:The volume of freely scraped data on the Internet has driven the tremendous success of deep learning. Along with this comes the growing concern about data privacy and security. Numerous methods for generating unlearnable examples have been proposed to prevent data from being illicitly learned by unauthorized deep models by impeding generalization. However, the existing approaches primarily rely on empirical heuristics, making it challenging to enhance unlearnable examples with solid explanations. In this paper, we analyze and improve unlearnable examples from a novel perspective: mutual information reduction. We demonstrate that effective unlearnable examples always decrease mutual information between clean features and poisoned features, and when the network gets deeper, the unlearnability goes better together with lower mutual information. Further, we prove from a covariance reduction perspective that minimizing the conditional covariance of intra-class poisoned features reduces the mutual information between distributions. Based on the theoretical results, we propose a novel unlearnable method called Mutual Information Unlearnable Examples (MI-UE) that reduces covariance by maximizing the cosine similarity among intra-class features, thus impeding the generalization effectively. Extensive experiments demonstrate that our approach significantly outperforms the previous methods, even under defense mechanisms.
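The covariance-reducing objective described above (maximize cosine similarity among intra-class poisoned features) can be sketched minimally as follows; this is an assumed illustration of the idea, not the authors' implementation:

```python
import math

# Mean pairwise cosine similarity within one class: driving this toward 1
# collapses intra-class feature spread, which per the paper's analysis
# reduces conditional covariance and hence mutual information.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def intra_class_alignment(features):
    n = len(features)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cosine(features[i], features[j]) for i, j in pairs) / len(pairs)

aligned = intra_class_alignment([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])  # collapsed
spread = intra_class_alignment([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])  # dispersed
```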


【21】A Stein Identity for q-Gaussians with Bounded Support
标题:具有有界支撑的q-高斯分布的Stein恒等式
链接:https://arxiv.org/abs/2603.03673

作者:Sophia Sklaviadis,Thomas Moellenhoff,Andre F. T. Martins,Mario A. T. Figueiredo,Mohammad Emtiyaz Khan
摘要:Stein's identity is a fundamental tool in machine learning with applications in generative models, stochastic optimization, and other problems involving gradients of expectations under Gaussian distributions. Less attention has been paid to problems with non-Gaussian expectations. Here, we consider the class of bounded-support $q$-Gaussians and derive a new Stein identity leading to gradient estimators which have nearly identical forms to the Gaussian ones, and which are similarly easy to implement. We do this by extending the previous results of Landsman, Vanduffel, and Yao (2013) to prove new Bonnet- and Price-type theorems for q-Gaussians. We also simplify their forms by using escort distributions. Our experiments show that bounded-support distributions can reduce the variance of gradient estimators, which can potentially be useful for Bayesian deep learning and sharpness-aware minimization. Overall, our work simplifies the application of Stein's identity for an important class of non-Gaussian distributions.
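For context, the classical Gaussian Stein identity that the paper generalizes to bounded-support q-Gaussians reads, for $X \sim \mathcal{N}(0,1)$ and sufficiently smooth $f$:

```latex
\mathbb{E}\big[X\, f(X)\big] \;=\; \mathbb{E}\big[f'(X)\big].
```

This identity underlies the standard Gaussian gradient estimators for expectations; per the abstract, the q-Gaussian analogue yields estimators of nearly identical form with the Gaussian density replaced by a bounded-support distribution.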


【22】Machine Pareidolia: Protecting Facial Image with Emotional Editing
标题:机器偏执狂:通过情感编辑保护面部图像
链接:https://arxiv.org/abs/2603.03665

作者:Binh M. Le,Simon S. Woo
备注:Proceedings of the AAAI Conference on Artificial Intelligence 40
摘要:The proliferation of facial recognition (FR) systems has raised privacy concerns in the digital realm, as malicious uses of FR models pose a significant threat. Traditional countermeasures, such as makeup style transfer, have suffered from low transferability in black-box settings and limited applicability across various demographic groups, including males and individuals with darker skin tones. To address these challenges, we introduce a novel facial privacy protection method, dubbed \textbf{MAP}, a pioneering approach that employs human emotion modifications to disguise original identities as target identities in facial images. Our method uniquely fine-tunes a score network to learn dual objectives, target identity and human expression, which are jointly optimized through gradient projection to ensure convergence at a shared local optimum. Additionally, we enhance the perceptual quality of protected images by applying local smoothness regularization and optimizing the score matching loss within our network. Empirical experiments demonstrate that our innovative approach surpasses previous baselines, including noise-based, makeup-based, and freeform attribute methods, in both qualitative fidelity and quantitative metrics. Furthermore, MAP proves its effectiveness against an online FR API and shows advanced adaptability in uncommon photographic scenarios.


【23】Extending Neural Operators: Robust Handling of Functions Beyond the Training Set
标题:扩展神经算子:对训练集之外函数的稳健处理
链接:https://arxiv.org/abs/2603.03621

作者:Blaine Quackenbush,Paul J. Atzberger
备注:related open source software see https://web.atzberger.org/
摘要:We develop a rigorous framework for extending neural operators to handle out-of-distribution input functions. We leverage kernel approximation techniques and provide theory for characterizing the input-output function spaces in terms of Reproducing Kernel Hilbert Spaces (RKHSs). We provide theorems on the requirements for reliable extensions and their predicted approximation accuracy. We also establish formal relationships between specific kernel choices and their corresponding Sobolev Native Spaces. This connection further allows the extended neural operators to reliably capture not only function values but also their derivatives. Our methods are empirically validated through the solution of elliptic partial differential equations (PDEs) involving operators on manifolds having point-cloud representations and handling geometric contributions. We report results on key factors impacting the accuracy and computational performance of the extension approaches.


【24】Why Are Linear RNNs More Parallelizable?
标题:为什么线性RNN更易并行化?
链接:https://arxiv.org/abs/2603.03612

作者:William Merrill,Hongjian Jiang,Yanhong Li,Ashish Sabharwal
摘要:The community is increasingly exploring linear RNNs (LRNNs) as language models, motivated by their expressive power and parallelizability. While prior work establishes the expressivity benefits of LRNNs over transformers, it is unclear what makes LRNNs -- but not traditional, nonlinear RNNs -- as easy to parallelize in practice as transformers. We answer this question by providing a tight connection between types of RNNs and standard complexity classes. We show that LRNNs can be viewed as log-depth (bounded fan-in) arithmetic circuits, which represents only a slight depth overhead relative to log-depth boolean circuits that transformers admit. Furthermore, we show that nonlinear RNNs can solve $\mathsf{L}$-complete problems (and even $\mathsf{P}$-complete ones, under polynomial precision), revealing a fundamental barrier to parallelizing them as efficiently as transformers. Our theory also identifies fine-grained expressivity differences between recent popular LRNN variants: permutation-diagonal LRNNs are $\mathsf{NC}^1$-complete whereas diagonal-plus-low-rank LRNNs are more expressive ($\mathsf{PNC}^1$-complete). We provide further insight by associating each type of RNN with a corresponding automata-theoretic model that it can simulate. Together, our results reveal fundamental tradeoffs between nonlinear RNNs and different variants of LRNNs, providing a foundation for designing LLM architectures that achieve an optimal balance between expressivity and parallelism.
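The parallelizability the abstract refers to rests on a standard fact worth making concrete: a linear recurrence is an associative composition of affine maps, so it admits a parallel (tree) scan, whereas a general nonlinear recurrence does not. A scalar sketch (scalars stand in for the matrix/vector case):

```python
from functools import reduce

# A linear recurrence h_t = a_t * h_{t-1} + b_t composes associatively:
# applying (a1, b1) then (a2, b2) is the affine map (a2*a1, a2*b1 + b2).
# Any bracketing of `combine` therefore gives the same result, which is
# exactly what a parallel scan exploits.
def combine(p, q):
    a1, b1 = p
    a2, b2 = q
    return (a2 * a1, a2 * b1 + b2)

def sequential(pairs, h0=0.0):
    h = h0
    for a, b in pairs:
        h = a * h + b
    return h

pairs = [(0.5, 1.0), (2.0, -1.0), (0.1, 3.0)]
a, b = reduce(combine, pairs)   # tree-reducible thanks to associativity
parallel = a * 0.0 + b          # apply the composed map to h_0 = 0
```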


【25】SENTINEL: Stagewise Integrity Verification for Pipeline Parallel Decentralized Training
标题:SENTINEL:流水线并行去中心化训练的分阶段完整性验证
链接:https://arxiv.org/abs/2603.03592

作者:Hadi Mohaghegh Dolatabadi,Thalaiyasingam Ajanthan,Sameera Ramasinghe,Chamin P Hewa Koneputugodage,Gil Avraham,Yan Zuo,Violetta Shevchenko,Alexander Long
备注:70 pages, 22 figures, 20 tables
摘要:Decentralized training introduces critical security risks when executed across untrusted, geographically distributed nodes. While existing Byzantine-tolerant literature addresses data parallel (DP) training through robust aggregation methods, pipeline parallelism (PP) presents fundamentally distinct challenges. In PP, model layers are distributed across workers where the activations and their gradients flow between stages rather than being aggregated, making traditional DP approaches inapplicable. We propose SENTINEL, a verification mechanism for PP training without computation duplication. SENTINEL employs lightweight momentum-based monitoring using exponential moving averages (EMAs) to detect corrupted inter-stage communication. Unlike existing Byzantine-tolerant approaches for DP that aggregate parameter gradients across replicas, our approach verifies sequential activation/gradient transmission between layers. We provide theoretical convergence guarantees for this new setting that recovers classical convergence rates when relaxed to standard training. Experiments demonstrate successful training of up to 4B-parameter LLMs across untrusted distributed environments with up to 176 workers while maintaining model convergence and performance.
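The EMA-based monitoring idea described above can be sketched as follows; the statistic (a norm per transmission), threshold rule, and class name are illustrative assumptions, not SENTINEL's actual mechanism:

```python
# Lightweight momentum-based integrity check in the spirit of the abstract:
# keep an exponential moving average of inter-stage activation/gradient
# norms and reject transmissions that deviate sharply from it.
class EMAMonitor:
    def __init__(self, beta=0.9, tolerance=3.0):
        self.beta, self.tolerance = beta, tolerance
        self.ema = None

    def check(self, norm):
        if self.ema is None:        # first observation seeds the average
            self.ema = norm
            return True
        ok = abs(norm - self.ema) <= self.tolerance * self.ema
        if ok:                      # only trusted observations update the EMA
            self.ema = self.beta * self.ema + (1 - self.beta) * norm
        return ok

mon = EMAMonitor()
clean = [mon.check(x) for x in (1.0, 1.1, 0.9, 1.05)]  # all accepted
corrupted = mon.check(50.0)                            # tampered value rejected
```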


【26】stratum: A System Infrastructure for Massive Agent-Centric ML Workloads
标题:stratum:用于大规模以智能体为中心的ML工作负载的系统基础设施
链接:https://arxiv.org/abs/2603.03589

作者:Arnab Phani,Elias Strauss,Sebastian Schelter
摘要:Recent advances in large language models (LLMs) transform how machine learning (ML) pipelines are developed and evaluated. LLMs enable a new type of workload, agentic pipeline search, in which autonomous or semi-autonomous agents generate, validate, and optimize complete ML pipelines. These agents predominantly operate over popular Python ML libraries and exhibit highly exploratory behavior. This results in thousands of executions for data profiling, pipeline generation, and iterative refinement of pipeline stages. However, the existing Python-based ML ecosystem is built around libraries such as Pandas and scikit-learn, which are designed for human-centric, interactive, sequential workflows and remain constrained by Python's interpretive execution model, library-level isolation, and limited runtime support for executing large numbers of pipelines. Meanwhile, many high-performance ML systems proposed by the systems community either target narrow workload classes or require specialized programming models, which limits their integration with the Python ML ecosystem and makes them largely ill-suited for LLM-based agents. This growing mismatch exposes a fundamental systems challenge in supporting agentic pipeline search at scale. We therefore propose stratum, a unified system infrastructure that decouples pipeline execution from planning and reasoning during agentic pipeline search. Stratum integrates seamlessly with existing Python libraries, compiles batches of pipelines into optimized execution graphs, and efficiently executes them across heterogeneous backends, including a novel Rust-based runtime. We present stratum's architectural vision along with an early prototype, discuss key design decisions, and outline open challenges and research directions. Finally, preliminary experiments show that stratum can significantly speed up large-scale agentic pipeline search up to 16.6x.


【27】Build, Judge, Optimize: A Blueprint for Continuous Improvement of Multi-Agent Consumer Assistants
标题:构建、判断、优化:多代理消费者助理持续改进的蓝图
链接:https://arxiv.org/abs/2603.03565

作者:Alejandro Breen Herrera,Aayush Sheth,Steven G. Xu,Zhucheng Zhan,Charles Wright,Marcus Yearwood,Hongtai Wei,Sudeep Das
摘要:Conversational shopping assistants (CSAs) represent a compelling application of agentic AI, but moving from prototype to production reveals two underexplored challenges: how to evaluate multi-turn interactions and how to optimize tightly coupled multi-agent systems. Grocery shopping further amplifies these difficulties, as user requests are often underspecified, highly preference-sensitive, and constrained by factors such as budget and inventory. In this paper, we present a practical blueprint for evaluating and optimizing conversational shopping assistants, illustrated through a production-scale AI grocery assistant. We introduce a multi-faceted evaluation rubric that decomposes end-to-end shopping quality into structured dimensions and develop a calibrated LLM-as-judge pipeline aligned with human annotations. Building on this evaluation foundation, we investigate two complementary prompt-optimization strategies based on a SOTA prompt-optimizer called GEPA (Shao et al., 2025): (1) Sub-agent GEPA, which optimizes individual agent nodes against localized rubrics, and (2) MAMuT (Multi-Agent Multi-Turn) GEPA (Herrera et al., 2026), a novel system-level approach that jointly optimizes prompts across agents using multi-turn simulation and trajectory-level scoring. We release rubric templates and evaluation design guidance to support practitioners building production CSAs.


【28】Online Learnability of Chain-of-Thought Verifiers: Soundness and Completeness Trade-offs
标题:思维链验证器的在线可学习性:健全性与完备性的权衡
链接:https://arxiv.org/abs/2603.03538

作者:Maria-Florina Balcan,Avrim Blum,Kiriaki Fragkia,Zhiyuan Li,Dravyansh Sharma
摘要:Large language models with chain-of-thought generation have demonstrated great potential for producing complex mathematical proofs. However, their reasoning can often go astray, leading to increasing interest in formal and learned verifiers. A major challenge in learning verifiers, especially when their output will be used by the prover, is that this feedback loop may produce substantial distribution shift. Motivated by this challenge, we propose an online learning framework for learning chain-of-thought verifiers that, given a problem and a sequence of reasoning steps, check the correctness of the solution. Highlighting the asymmetric role of soundness (failure in catching errors in a proof) and completeness (flagging correct proofs as wrong) mistakes of the verifier, we introduce novel extensions of the Littlestone dimension which tightly characterize the mistake bounds for learning a verifier in the realizable setting. We provide optimal algorithms for finding the Pareto-frontier (the smallest total number of mistakes given a budget of soundness mistakes) as well as minimizing a linear combination of asymmetric costs. We further show how our learned verifiers can be used to boost the accuracy of a collection of weak provers, and enable generation of proofs beyond what they were trained on. With the mild assumption that one of the provers can generate the next reasoning step correctly with some minimal probability, we show how to learn a strong prover with small error and abstention rates.


【29】Trade-offs in Ensembling, Merging and Routing Among Parameter-Efficient Experts
标题:参数高效专家间集成、合并与路由的权衡
链接:https://arxiv.org/abs/2603.03535

作者:Sanae Lotfi,Lucas Caccia,Alessandro Sordoni,Jordan T. Ash,Miroslav Dudik
摘要:While large language models (LLMs) fine-tuned with lightweight adapters achieve strong performance across diverse tasks, their performance on individual tasks depends on the fine-tuning strategy. Fusing independently trained models with different strengths has shown promise for multi-task learning through three main strategies: ensembling, which combines outputs from independent models; merging, which fuses model weights via parameter averaging; and routing, which integrates models in an input-dependent fashion. However, many design decisions in these approaches remain understudied, and the relative benefits of more sophisticated ensembling, merging and routing techniques are not fully understood. We empirically evaluate their trade-offs, addressing two key questions: What are the advantages of going beyond uniform ensembling or merging? And does the flexibility of routing justify its complexity? Our findings indicate that non-uniform ensembling and merging improve performance, but routing offers even greater gains. To mitigate the computational cost of routing, we analyze expert selection techniques, showing that clustering and greedy subset selection can maintain reasonable performance with minimal overhead. These insights advance our understanding of model fusion for multi-task learning.
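The three fusion strategies the abstract compares can be made concrete with a scalar toy (function names, the router rule, and the coefficients are illustrative assumptions):

```python
# Three ways to fuse independently trained experts:
def merge(weights_list, coeffs):
    """Parameter-space fusion: weighted average of expert parameter vectors."""
    return [sum(c * w[i] for c, w in zip(coeffs, weights_list))
            for i in range(len(weights_list[0]))]

def ensemble(outputs, coeffs):
    """Output-space fusion: weighted average of expert predictions."""
    return sum(c * o for c, o in zip(coeffs, outputs))

def route(experts, x, router):
    """Input-dependent fusion: a router picks one expert per input."""
    return experts[router(x)](x)

merged = merge([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.5])   # uniform merging
blended = ensemble([1.0, 3.0], [0.5, 0.5])             # uniform ensembling
picked = route([lambda x: x + 1, lambda x: 2 * x], 3,
               router=lambda x: 0 if x < 5 else 1)     # routing
```

Non-uniform variants simply learn the `coeffs`; routing trades this simplicity for per-input flexibility, which is the cost/benefit the paper quantifies.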


【30】When Small Variations Become Big Failures: Reliability Challenges in Compute-in-Memory Neural Accelerators
标题:当微小变化变成大故障:存内计算神经加速器的可靠性挑战
链接:https://arxiv.org/abs/2603.03491

作者:Yifan Qin,Jiahao Zheng,Zheyu Yan,Wujie Wen,Xiaobo Sharon Hu,Yiyu Shi
备注:2026 International VLSI Symposium on Technology, Systems and Applications (VLSI TSA)
摘要:Compute-in-memory (CiM) architectures promise significant improvements in energy efficiency and throughput for deep neural network acceleration by alleviating the von Neumann bottleneck. However, their reliance on emerging non-volatile memory devices introduces device-level non-idealities-such as write variability, conductance drift, and stochastic noise-that fundamentally challenge reliability, predictability, and safety, especially in safety-critical applications. This talk examines the reliability limits of CiM-based neural accelerators and presents a series of techniques that bridge device physics, architecture, and learning algorithms to address these challenges. We first demonstrate that even small device variations can lead to disproportionately large accuracy degradation and catastrophic failures in safety-critical inference workloads, revealing a critical gap between average-case evaluations and worst-case behavior. Building on this insight, we introduce SWIM, a selective write-verify mechanism that strategically applies verification only where it is most impactful, significantly improving reliability while maintaining CiM's efficiency advantages. Finally, we explore a learning-centric solution that improves realistic worst-case performance by training neural networks with right-censored Gaussian noise, aligning training assumptions with hardware-induced variability and enabling robust deployment without excessive hardware overhead. Together, these works highlight the necessity of cross-layer co-design for CiM accelerators and provide a principled path toward dependable, efficient neural inference on emerging memory technologies-paving the way for their adoption in safety- and reliability-critical systems.


【31】[Re] FairDICE: A Gap Between Theory And Practice
Link: https://arxiv.org/abs/2603.03454

Authors: Peter Adema, Karim Galliamov, Aleksey Evstratovskiy, Ross Geurts
Note: 12 pages, 8 figures in main text. Code at https://github.com/p-adema/re-fairdice
Abstract: Offline Reinforcement Learning (RL) is an emerging field of RL in which policies are learned solely from demonstrations. Within offline RL, some environments involve balancing multiple objectives, but existing multi-objective offline RL algorithms do not provide an efficient way to find a fair compromise. FairDICE (see arXiv:2506.08062v2) seeks to fill this gap by adapting OptiDICE (an offline RL algorithm) to automatically learn weights for multiple objectives to, e.g., incentivise fairness among objectives. As this would be a valuable contribution, this replication study examines the replicability of claims made regarding FairDICE. We find that many theoretical claims hold, but an error in the code reduces FairDICE to standard behaviour cloning in continuous environments, and many important hyperparameters were originally underspecified. After rectifying this, we show in experiments extending the original paper that FairDICE can scale to complex environments and high-dimensional rewards, though it can be reliant on (online) hyperparameter tuning. We conclude that FairDICE is a theoretically interesting method, but the experimental justification requires significant revision.


【32】A Short Note on a Variant of the Squint Algorithm
Link: https://arxiv.org/abs/2603.03409

Authors: Haipeng Luo
Abstract: This short note describes a simple variant of the Squint algorithm of Koolen and Van Erven [2015] for the classic expert problem. Via an equally simple modification of their proof, we prove that this variant ensures a regret bound that resembles the one shown in a recent work by Freund et al. [2026] for a variant of the NormalHedge algorithm [Chaudhuri et al., 2009].
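For context, Squint-style algorithms track, per expert, the cumulative instantaneous regret R and its sum of squares V, and weight experts by an average over a prior of learning rates of eta * exp(eta*R - eta^2*V). The sketch below uses a discretized eta-grid with a uniform prior, which is an illustrative choice; it shows the original Squint recipe, not the note's specific variant:

```python
import numpy as np

def squint_weights(R, V, etas=None, eta_prior=None):
    """Squint-style expert weights: average over a learning-rate prior of
    eta * exp(eta*R - eta^2*V), normalized over experts."""
    if etas is None:
        etas = np.array([2.0 ** -k for k in range(1, 11)])  # discretized eta grid
    if eta_prior is None:
        eta_prior = np.full(len(etas), 1.0 / len(etas))
    # evidence[i] = sum_j pi_j * eta_j * exp(eta_j*R_i - eta_j^2*V_i)
    ev = (eta_prior * etas
          * np.exp(np.outer(R, etas) - np.outer(V, etas ** 2))).sum(axis=1)
    return ev / ev.sum()

def run_squint(losses):
    """Play the expert problem on a (T, K) loss matrix; return final weights."""
    T, K = losses.shape
    R, V = np.zeros(K), np.zeros(K)
    for t in range(T):
        p = squint_weights(R, V)
        r = p @ losses[t] - losses[t]  # instantaneous regrets vs each expert
        R += r
        V += r ** 2
    return squint_weights(R, V)
```

The second-moment term V is what yields the variance-adaptive ("second-order") regret bounds this family of algorithms is known for.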


【33】Heterogeneous Time Constants Improve Stability in Equilibrium Propagation
Link: https://arxiv.org/abs/2603.03402

Authors: Yoshimasa Kubo, Suhani Pragnesh Modi, Smit Patel
Abstract: Equilibrium propagation (EP) is a biologically plausible alternative to backpropagation for training neural networks. However, existing EP models use a uniform scalar time step dt, whereas its biological counterpart, the membrane time constant, is heterogeneous across neurons. Here, we introduce heterogeneous time steps (HTS) for EP by assigning neuron-specific time constants drawn from biologically motivated distributions. We show that HTS improves training stability while maintaining competitive task performance. These results suggest that incorporating heterogeneous temporal dynamics enhances both the biological realism and robustness of equilibrium propagation.
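A minimal sketch of the free-phase relaxation with neuron-specific time steps, assuming a simple symmetric tanh network and a lognormal distribution of time constants. The network, distribution parameters, and clipping range are all illustrative; the paper's exact model may differ:

```python
import numpy as np

def relax_to_equilibrium(W, b, s0, dts, steps=300):
    """Leaky Euler relaxation with a per-neuron time step dt_i (HTS):
    s_i <- s_i + dt_i * (-s_i + tanh((W s + b)_i))."""
    s = s0.copy()
    for _ in range(steps):
        s = s + dts * (-s + np.tanh(W @ s + b))
    return s

rng = np.random.default_rng(0)
n = 16
W = rng.normal(scale=0.05, size=(n, n))
W = (W + W.T) / 2  # symmetric coupling, as in energy-based EP models
b = rng.normal(scale=0.1, size=n)
# neuron-specific time steps from a lognormal, clipped to a stable range
dts = np.clip(rng.lognormal(mean=np.log(0.1), sigma=0.5, size=n), 0.05, 0.5)
s_star = relax_to_equilibrium(W, b, np.zeros(n), dts)
```

With a uniform dt every neuron relaxes at the same rate; the heterogeneous dts vector is the single change that HTS introduces into the dynamics.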


【34】AOI: Turning Failed Trajectories into Training Signals for Autonomous Cloud Diagnosis
Link: https://arxiv.org/abs/2603.03378

Authors: Pei Yang, Wanyi Chen, Yuxi Zheng, Xueqian Li, Xiang Li, Haoqin Tu, Jie Xiao, Yifan Pang, Bill Shi, Lynn Ai, Eric Yang
Abstract: Large language model (LLM) agents offer a promising data-driven approach to automating Site Reliability Engineering (SRE), yet their enterprise deployment is constrained by three challenges: restricted access to proprietary data, unsafe action execution under permission-governed environments, and the inability of closed systems to improve from failures. We present AOI (Autonomous Operations Intelligence), a trainable multi-agent framework formulating automated operations as a structured trajectory learning problem under security constraints. Our approach integrates three key components. First, a trainable diagnostic system applies Group Relative Policy Optimization (GRPO) to distill expert-level knowledge into locally deployed open-source models, enabling preference-based learning without exposing sensitive data. Second, a read-write separated execution architecture decomposes operational trajectories into observation, reasoning, and action phases, allowing safe learning while preventing unauthorized state mutation. Third, a Failure Trajectory Closed-Loop Evolver mines unsuccessful trajectories and converts them into corrective supervision signals, enabling continual data augmentation. Evaluated on the AIOpsLab benchmark, our contributions yield cumulative gains. (1) The AOI runtime alone achieves 66.3% best@5 success on all 86 tasks, outperforming the prior state-of-the-art (41.9%) by 24.4 points. (2) Adding Observer GRPO training, a locally deployed 14B model reaches 42.9% avg@1 on 63 held-out tasks with unseen fault types, surpassing Claude Sonnet 4.5. (3) The Evolver converts 37 failed trajectories into diagnostic guidance, improving end-to-end avg@5 by 4.8 points while reducing variance by 35%.
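The GRPO component used for the diagnostic training can be illustrated by its core step, group-relative advantage normalization: each sampled trajectory's reward is standardized against the mean and standard deviation of its own group, removing the need for a learned value baseline. This is a generic sketch, not code from AOI:

```python
import numpy as np

def grpo_advantages(group_rewards, eps=1e-8):
    """Group Relative Policy Optimization advantage: standardize each sampled
    trajectory's reward within its group of rollouts for the same prompt."""
    r = np.asarray(group_rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# four rollouts for one diagnostic task, scored by a reward function
adv = grpo_advantages([1.0, 0.0, 0.5, 0.5])
```

The resulting advantages then weight the policy-gradient update, so above-average rollouts within a group are reinforced and below-average ones suppressed.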


【35】Fine-Tuning and Evaluating Conversational AI for Agricultural Advisory
Link: https://arxiv.org/abs/2603.03294

Authors: Sanyam Singh, Naga Ganesh, Vineet Singh, Lakshmi Pedapudi, Ritesh Kumar, SSP Jyothi, Archana Karanam, C. Yashoda, Mettu Vijaya Rekha Reddy, Shesha Phani Debbesa, Chandan Dash
Note: 22 pages, 5 figures, 9 tables
Abstract: Large Language Models show promise for agricultural advisory, yet vanilla models exhibit unsupported recommendations, generic advice lacking specific, actionable detail, and communication styles misaligned with smallholder farmer needs. In high stakes agricultural contexts, where recommendation accuracy has direct consequences for farmer outcomes, these limitations pose challenges for responsible deployment. We present a hybrid LLM architecture that decouples factual retrieval from conversational delivery: supervised fine-tuning with LoRA on expert-curated GOLDEN FACTS (atomic, verified units of agricultural knowledge) optimizes fact recall, while a separate stitching layer transforms retrieved facts into culturally appropriate, safety-aware responses. Our evaluation framework, DG-EVAL, performs atomic fact verification (measuring recall, precision, and contradiction detection) against expert-curated ground truth rather than Wikipedia or retrieved documents. Experiments across multiple model configurations on crops and queries from Bihar, India show that fine-tuning on curated data substantially improves fact recall and F1, while maintaining high relevance. Using a fine-tuned smaller model achieves comparable or better factual quality at a fraction of the cost of frontier models. A stitching layer further improves safety subscores while maintaining high conversational quality. We release the farmerchat-prompts library to enable reproducible development of domain-specific agricultural AI.
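In the simplest case, atomic fact verification against expert-curated golden facts reduces to set-level recall, precision, and F1. The sketch below uses exact string matching as a simplifying assumption; the paper's DG-EVAL additionally measures contradiction detection and would need semantic rather than literal matching of facts:

```python
def atomic_fact_scores(predicted_facts, golden_facts):
    """Recall/precision/F1 of atomic facts extracted from a response,
    scored against expert-curated golden facts (exact-match simplification)."""
    pred, gold = set(predicted_facts), set(golden_facts)
    tp = len(pred & gold)  # facts the response states that are verified
    recall = tp / len(gold) if gold else 0.0
    precision = tp / len(pred) if pred else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"recall": recall, "precision": precision, "f1": f1}
```

Scoring against curated atomic units, rather than whole documents, is what lets recall and precision be attributed to individual knowledge gaps.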


【36】Beyond Mixtures and Products for Ensemble Aggregation: A Likelihood Perspective on Generalized Means
Link: https://arxiv.org/abs/2603.04204

Authors: Raphaël Razafindralambo, Rémy Sun, Frédéric Precioso, Damien Garreau, Pierre-Alexandre Mattei
Abstract: Density aggregation is a central problem in machine learning, for instance when combining predictions from a Deep Ensemble. The choice of aggregation remains an open question with two commonly proposed approaches being linear pooling (probability averaging) and geometric pooling (logit averaging). In this work, we address this question by studying the normalized generalized mean of order $r \in \mathbb{R} \cup \{-\infty,+\infty\}$ through the lens of log-likelihood, the standard evaluation criterion in machine learning. This provides a unifying aggregation formalism and shows different optimal configurations for different situations. We show that the regime $r \in [0,1]$ is the only range ensuring systematic improvements relative to individual distributions, thereby providing a principled justification for the reliability and widespread practical use of linear ($r=1$) and geometric ($r=0$) pooling. In contrast, we show that aggregation rules with $r \notin [0,1]$ may fail to provide consistent gains with explicit counterexamples. Finally, we corroborate our theoretical findings with empirical evaluations using Deep Ensembles on image and text classification benchmarks.
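For discrete class probabilities, the normalized generalized mean of order r is straightforward to compute: raise each member's probabilities to the power r, average across members, take the 1/r root, and renormalize. Setting r=1 recovers linear pooling and r=0 (the limit) geometric pooling. A sketch, where the small epsilon is an illustrative numerical guard:

```python
import numpy as np

def generalized_mean_pool(probs, r):
    """Normalized generalized mean of order r across ensemble members.
    probs: (K, C) array of K members' class distributions."""
    probs = np.asarray(probs, dtype=float)
    if r == 0:
        # limiting case: geometric mean, i.e. averaging in log (logit) space
        agg = np.exp(np.mean(np.log(probs + 1e-12), axis=0))
    else:
        agg = np.mean(probs ** r, axis=0) ** (1.0 / r)
    return agg / agg.sum()
```

For two members predicting [0.9, 0.1] and [0.5, 0.5], linear pooling (r=1) gives [0.7, 0.3], while geometric pooling (r=0) gives [0.75, 0.25], illustrating how the exponent shifts mass toward the more confident member.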


【37】Stable and Steerable Sparse Autoencoders with Weight Regularization
Link: https://arxiv.org/abs/2603.04198

Authors: Piotr Jedryszek, Oliver M. Crook
Abstract: Sparse autoencoders (SAEs) are widely used to extract human-interpretable features from neural network activations, but their learned features can vary substantially across random seeds and training choices. To improve stability, we study weight regularization by adding L1 or L2 penalties on encoder and decoder weights, and evaluate how regularization interacts with common SAE training defaults. On MNIST, we observe that L2 weight regularization produces a core of highly aligned features and, when combined with tied initialization and unit-norm decoder constraints, it dramatically increases cross-seed feature consistency. For TopK SAEs trained on language model activations (Pythia-70M-deduped), adding a small L2 weight penalty increased the fraction of features shared across three random seeds and roughly doubles steering success rates, while leaving the mean of automated interpretability scores essentially unchanged. Finally, in the regularized setting, activation steering success becomes better predicted by auto-interpretability scores, suggesting that regularization can align text-based feature explanations with functional controllability.
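A sketch of the TopK-SAE objective with the added L2 weight penalty the abstract studies: keep only the k largest pre-activations per sample, reconstruct, and add a penalty on encoder and decoder weights. Dimensions and the penalty coefficient are illustrative:

```python
import numpy as np

def topk_sae_loss(x, W_enc, b_enc, W_dec, k, weight_l2=1e-4):
    """TopK sparse autoencoder loss: reconstruction MSE from the k most
    active features, plus an L2 penalty on encoder/decoder weights."""
    pre = x @ W_enc + b_enc                        # (B, F) pre-activations
    thresh = np.sort(pre, axis=1)[:, -k][:, None]  # per-sample kth largest
    z = np.where(pre >= thresh, np.maximum(pre, 0.0), 0.0)  # TopK + ReLU
    x_hat = z @ W_dec
    recon = np.mean((x - x_hat) ** 2)
    reg = weight_l2 * ((W_enc ** 2).sum() + (W_dec ** 2).sum())
    return recon + reg, z
```

The penalty leaves the TopK sparsity mechanism untouched; it only discourages large weight magnitudes, which is the knob the paper links to cross-seed feature consistency.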


【38】Invariance-Based Dynamic Regret Minimization
Link: https://arxiv.org/abs/2603.03843

Authors: Margherita Lazzaretto, Jonas Peters, Niklas Pfister
Note: 32 pages, 7 figures
Abstract: We consider stochastic non-stationary linear bandits where the linear parameter connecting contexts to the reward changes over time. Existing algorithms in this setting localize the policy by gradually discarding or down-weighting past data, effectively shrinking the time horizon over which learning can occur. However, in many settings historical data may still carry partial information about the reward model. We propose to leverage such data while adapting to changes, by assuming the reward model decomposes into stationary and non-stationary components. Based on this assumption, we introduce ISD-linUCB, an algorithm that uses past data to learn invariances in the reward model and subsequently exploits them to improve online performance. We show both theoretically and empirically that leveraging invariance reduces the problem dimensionality, yielding significant regret improvements in fast-changing environments when sufficient historical data is available.
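ISD-linUCB builds on the standard linUCB base learner; below is a sketch of that base (ridge-regression estimate plus an exploration bonus), with the invariance-learning component omitted since its details are not given in the abstract:

```python
import numpy as np

class LinUCB:
    """Standard linUCB arm scoring: ridge estimate theta = A^{-1} b plus an
    exploration bonus alpha * sqrt(x^T A^{-1} x)."""
    def __init__(self, dim, alpha=1.0, lam=1.0):
        self.A = lam * np.eye(dim)   # regularized Gram matrix
        self.b = np.zeros(dim)
        self.alpha = alpha

    def score(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x
```

Learning an invariant (stationary) component from historical data effectively shrinks the dimension over which this base learner must explore after each change, which is the source of the regret improvement the paper reports.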


【39】Scalable Contrastive Causal Discovery under Unknown Soft Interventions
Link: https://arxiv.org/abs/2603.03411

Authors: Mingxuan Zhang, Khushi Desai, Sopho Kevlishvili, Elham Azizi
Abstract: Observational causal discovery is only identifiable up to the Markov equivalence class. While interventions can reduce this ambiguity, in practice interventions are often soft with multiple unknown targets. In many realistic scenarios, only a single intervention regime is observed. We propose a scalable causal discovery model for paired observational and interventional settings with shared underlying causal structure and unknown soft interventions. The model aggregates subset-level PDAGs and applies contrastive cross-regime orientation rules to construct a globally consistent maximal PDAG under Meek closure, enabling generalization to both in-distribution and out-of-distribution settings. Theoretically, we prove that our model is sound with respect to a restricted $\Psi$ equivalence class induced solely by the information available in the subset-restricted setting. We further show that the model asymptotically recovers the corresponding identifiable PDAG and can orient additional edges compared to non-contrastive subset-restricted methods. Experiments on synthetic data demonstrate improved causal structure recovery, generalization to unseen graphs with held-out causal mechanisms, and scalability to larger graphs, with ablations supporting the theoretical results.


【40】Surprisal-Rényi Free Energy
Link: https://arxiv.org/abs/2603.03405

Authors: Shion Matsumoto, Raul Castillo, Benjamin Prada, Ankur Arjun Mali
Abstract: The forward and reverse Kullback-Leibler (KL) divergences arise as limiting objectives in learning and inference yet induce markedly different inductive biases that cannot be explained at the level of expectations alone. In this work, we introduce the Surprisal-Rényi Free Energy (SRFE), a log-moment-based functional of the likelihood ratio that lies outside the class of $f$-divergences. We show that SRFE recovers forward and reverse KL divergences as singular endpoint limits and derive local expansions around both limits in which the variance of the log-likelihood ratio appears as a first-order correction. This reveals an explicit mean-variance tradeoff governing departures from KL-dominated regimes. We further establish a Gibbs-type variational characterization of SRFE as the unique minimizer of a weighted sum of KL divergences and prove that SRFE directly controls large deviations of excess code-length via Chernoff-type bounds, yielding a precise Minimum Description Length interpretation. Together, these results identify SRFE as a variance- and tail-sensitive free-energy functional that clarifies the geometric and large-deviation structure underlying forward and reverse KL limits, without unifying or subsuming distinct learning frameworks.


【41】The Theory behind UMAP?
Link: https://arxiv.org/abs/2603.03375

Authors: David Wegmann
Note: This article is derived from my master's thesis
Abstract: In 2018, McInnes et al. introduced a dimensionality reduction algorithm called UMAP, which enjoys wide popularity among data scientists. Their work introduces a finite variant of a functor called the metric realization, based on an unpublished draft by Spivak. This draft contains many errors, most of which are reproduced by McInnes et al. and subsequent publications. This article aims to repair these errors and provide a self-contained document with the full derivation of Spivak's functors and McInnes et al.'s finite variant. We contribute an explicit description of the metric realization and related functors. At the end, we discuss the UMAP algorithm, as well as claims about properties of the algorithm and the correspondence of McInnes et al.'s finite variant to the UMAP algorithm.


Machine translation provided by Tencent Interactive Translation, for reference only

Article URL: http://www.python88.com/topic/193590