cs.LG: 150 papers today
Large-model related (22 papers)
【1】ESPIRE: A Diagnostic Benchmark for Embodied Spatial Reasoning of Vision-Language Models
Link: https://arxiv.org/abs/2603.13033
Authors: Yanpeng Zhao,Wentao Ding,Hongtao Li,Baoxiong Jia,Zilong Zheng
Abstract: A recent trend in vision-language models (VLMs) has been to enhance their spatial cognition for embodied domains. Despite progress, existing evaluations have been limited both in paradigm and in coverage, hindering rapid, iterative model development. To address these limitations, we propose ESPIRE, a diagnostic benchmark for embodied spatial reasoning. ESPIRE offers a simulated world that physically grounds VLMs and evaluates them on spatial-reasoning-centric robotic tasks, thus narrowing the gap between evaluation and real-world deployment. To adapt VLMs to robotic tasks, we decompose each task into localization and execution, and frame both as generative problems, in stark contrast to predominant discriminative evaluations (e.g., via visual-question answering) that rely on distractors and discard execution. This decomposition further enables a fine-grained analysis beyond passive spatial reasoning toward reasoning to act. We systematically design ESPIRE both at the instruction level and at the environment level, ensuring broad coverage of spatial reasoning scenarios. We use ESPIRE to diagnose a range of frontier VLMs and provide in-depth analysis of their spatial reasoning behaviors.
【2】Dependency-Aware Parallel Decoding via Attention for Diffusion LLMs
Link: https://arxiv.org/abs/2603.12996
Authors: Bumjun Kim,Dongjae Jeon,Moongyu Jeon,Albert No
Abstract: Parallel decoding for diffusion LLMs (dLLMs) is difficult because each denoising step provides only token-wise marginal distributions, while unmasking multiple tokens simultaneously requires accounting for inter-token dependencies. We propose Dependency-Aware Parallel Decoding (DAPD), a simple, training-free decoding method that uses self-attention to induce a conditional dependency graph over masked tokens. At each iteration, edges in this graph capture strong token interactions, while non-edges indicate weak dependence. Parallel decoding is then reduced to selecting an independent set on the graph and unmasking the selected tokens in parallel. This avoids co-updating strongly coupled tokens without auxiliary models or retraining. Experiments on LLaDA and Dream show that DAPD improves the accuracy-steps trade-off over existing methods and enables more globally distributed parallel updates that better exploit the any-order generation capability of dLLMs.
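The reduction DAPD describes can be sketched in a few lines. This is a toy illustration only, assuming a symmetric attention-score matrix over masked positions as the dependency graph and a per-token confidence score for ordering; the names (`attn`, `greedy_independent_set`) and the threshold rule are illustrative, not from the paper.

```python
# Toy sketch: parallel unmasking as a greedy independent set on a
# thresholded attention graph (assumed mechanism; values are made up).
def greedy_independent_set(attn, confidences, threshold):
    """Pick masked positions to unmask in parallel: highest-confidence
    first, skipping any position strongly coupled (attention above
    threshold) to one already selected."""
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    selected = []
    for i in order:
        if all(attn[i][j] <= threshold for j in selected):
            selected.append(i)
    return sorted(selected)

# Positions 0 and 1 interact strongly; 2 is nearly independent of both.
attn = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.2],
    [0.1, 0.2, 0.0],
]
conf = [0.8, 0.6, 0.7]
print(greedy_independent_set(attn, conf, threshold=0.5))  # -> [0, 2]
```

Position 1 is deferred to a later iteration because it is strongly coupled to the already-selected position 0, which is exactly the co-update the method avoids.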
【3】Test-time RL alignment exposes task familiarity artifacts in LLM benchmarks
Link: https://arxiv.org/abs/2603.12875
Authors: Kun Wang,Reinhard Heckel
Abstract: Direct evaluation of LLMs on benchmarks can be misleading because comparatively strong performance may reflect task familiarity rather than capability. The train-before-test approach controls for task familiarity by giving each model task-relevant training before evaluation, originally through supervised finetuning. However, suitable training data is often hard to come by, and evaluation results vary with the data chosen. In this paper, we propose a two-stage test-time reinforcement learning (RL) alignment method for train-before-test. First, RL with a single sample provides a first alignment of the model to the task format, and second, test-time RL with majority-voting reward aligns the model to the benchmark distribution. Our test-time RL alignment method aligns similarly well as SFT-based train-before-test, but without requiring a task-specific training set. On a domain-specific benchmark without training data, we show that direct evaluation underestimates base models which perform substantially better once aligned, yielding a more faithful evaluation of their capabilities. Moreover, for reasoning tasks, the performance gap between fine-tuned models and their base models largely disappears after alignment, suggesting that many gains from RLVR/SFT reported in the literature are not a difference in reasoning capability, but rather artifacts of task familiarity.
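A majority-voting reward needs no reference labels: each sampled answer is rewarded by its agreement with the most common answer across samples. A minimal sketch, with toy string answers; the function name and data are illustrative, not the paper's implementation.

```python
from collections import Counter

# Toy sketch of a label-free majority-voting reward for test-time RL
# (assumed mechanism; answers are illustrative strings).
def majority_vote_reward(samples):
    """Reward each sampled answer by agreement with the majority answer."""
    majority, _ = Counter(samples).most_common(1)[0]
    return [1.0 if s == majority else 0.0 for s in samples]

samples = ["42", "42", "41", "42"]
print(majority_vote_reward(samples))  # -> [1.0, 1.0, 0.0, 1.0]
```

The reward aligns the model toward its own consensus on the benchmark distribution, which is why no task-specific training set is required.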
【4】PVI: Plug-in Visual Injection for Vision-Language-Action Models
Link: https://arxiv.org/abs/2603.12772
Authors: Zezhou Zhang,Songxin Zhang,Xiao Xiong,Junjie Zhang,Zejian Xie,Jingyi Xi,Zunyao Mao,Zan Mao,Zhixin Mai,Zhuoyang Song,Jiaxing Zhang
Abstract: VLA architectures that pair a pretrained VLM with a flow-matching action expert have emerged as a strong paradigm for language-conditioned manipulation. Yet the VLM, optimized for semantic abstraction and typically conditioned on static visual observations, tends to attenuate fine-grained geometric cues and often lacks explicit temporal evidence for the action expert. Prior work mitigates this by injecting auxiliary visual features, but existing approaches either focus on static spatial representations or require substantial architectural modifications to accommodate temporal inputs, leaving temporal information underexplored. We propose Plug-in Visual Injection (PVI), a lightweight, encoder-agnostic module that attaches to a pretrained action expert and injects auxiliary visual representations via zero-initialized residual pathways, preserving pretrained behavior with only single-stage fine-tuning. Using PVI, we obtain consistent gains over the base policy and a range of competitive alternative injection strategies, and our controlled study shows that temporal video features (V-JEPA2) outperform strong static image features (DINOv2), with the largest gains on multi-phase tasks requiring state tracking and coordination. Real-robot experiments on long-horizon bimanual cloth folding further demonstrate the practicality of PVI beyond simulation.
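The key property of a zero-initialized residual pathway is that the injected branch contributes nothing at initialization, so the pretrained mapping is exactly preserved until fine-tuning opens the gate. A scalar toy sketch (the class and gate are illustrative, not PVI's actual module):

```python
# Toy sketch of a zero-initialized residual injection path: the injected
# feature enters through a learned gate that starts at exactly zero.
class ZeroInitInjector:
    def __init__(self):
        self.gate = 0.0  # learned scale, initialized to zero

    def __call__(self, hidden, injected):
        return hidden + self.gate * injected

inj = ZeroInitInjector()
print(inj(1.5, 100.0))  # -> 1.5, identical to the pretrained path at init
inj.gate = 0.1          # after fine-tuning, the gate opens
print(inj(1.5, 100.0))  # -> 11.5, injected features now contribute
```

This is the standard trick (also used in ControlNet-style adapters) for attaching new inputs without perturbing pretrained behavior at step zero.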
【5】Taming the Long Tail: Efficient Item-wise Sharpness-Aware Minimization for LLM-based Recommender Systems
Link: https://arxiv.org/abs/2603.12752
Authors: Jiaming Zhang,Yuyuan Li,Xiaohua Feng,Li Zhang,Longfei Li,Jun Zhou,Chaochao Chen
Abstract: Large Language Model-based Recommender Systems (LRSs) have recently emerged as a new paradigm in sequential recommendation by directly adopting LLMs as backbones. While LRSs demonstrate strong knowledge utilization and instruction-following abilities, they have not been systematically studied under the long-standing long-tail problem. In this paper, we conduct an empirical study and reveal that LRSs face two distinct types of long-tail: i) prior long-tail, inherited implicitly from pretraining corpora, and ii) data long-tail, originating from skewed recommendation datasets. Our analysis shows that both contribute to the performance disparity between head and tail items, with the intersection of the two heads exhibiting an even stronger head effect. Nevertheless, the overall performance distribution in LRSs, especially on the tail, remains dominated by the data long-tail. To address this challenge, we propose Efficient Item-wise Sharpness-Aware Minimization (EISAM), a novel optimization framework that improves tail-item performance by adaptively regularizing the loss landscape at the item level. EISAM introduces an efficient penalty design that captures fine-grained item-specific sharpness while maintaining computational scalability for LLMs. In addition, we derive a generalization bound for EISAM. Our theoretical analysis shows that the bound decreases at a faster rate under our item-wise regularization, offering theoretical support for its effectiveness. Extensive experiments on three real-world datasets demonstrate that EISAM significantly boosts tail-item recommendation performance while preserving overall quality, establishing the first systematic solution to the long-tail problem in LRSs.
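EISAM builds on sharpness-aware minimization (SAM): take an ascent step toward higher loss within a radius rho, then descend using the gradient at the perturbed point. A 1-D toy sketch of vanilla SAM only; EISAM's item-wise penalty and efficiency tricks are not shown, and all names and constants here are illustrative.

```python
# Toy 1-D sketch of the SAM update EISAM builds on (numeric gradients,
# sign-based ascent direction; not the paper's item-wise variant).
def grad(f, w, eps=1e-6):
    return (f(w + eps) - f(w - eps)) / (2 * eps)

def sam_step(f, w, lr=0.1, rho=0.05):
    g = grad(f, w)
    w_adv = w + rho * (1 if g >= 0 else -1)  # ascent within radius rho
    return w - lr * grad(f, w_adv)           # descend with the sharpness-aware gradient

loss = lambda w: (w - 2.0) ** 2
w = 0.0
for _ in range(100):
    w = sam_step(loss, w)
print(round(w, 2))  # settles near the minimum at 2.0 (offset of order rho)
```

In higher dimensions the ascent direction is the normalized gradient rather than its sign, but the two-step structure (perturb, then descend) is the same.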
【6】TaoBench: Do Automated Theorem Prover LLMs Generalize Beyond MathLib?
Link: https://arxiv.org/abs/2603.12744
Authors: Alexander K Taylor,Junyi Zhang,Ethan Ji,Vigyan Sahai,Haikang Deng,Yuanzhou Chen,Yifan Yuan,Di Wu,Jia-Chen Gu,Kai-Wei Chang,Nanyun Peng,Amit Sahai,Wei Wang
Abstract: Automated theorem proving (ATP) benchmarks largely consist of problems formalized in MathLib, so current ATP training and evaluation are heavily biased toward MathLib's definitional framework. However, frontier mathematics is often exploratory and prototype-heavy, relying on bespoke constructions that deviate from standard libraries. In this work, we evaluate the robustness of current ATP systems when applied to a novel definitional framework, specifically examining the performance gap between standard library problems and bespoke mathematical constructions. We introduce TaoBench, an undergraduate-level benchmark derived from Terence Tao's Analysis I, which formalizes analysis by constructing core mathematical concepts from scratch, without relying on standard Mathlib definitions, as well as by mixing from-scratch and MathLib constructions. For fair evaluation, we build an agentic pipeline that automatically extracts a compilable, self-contained local environment for each problem. To isolate the effect of definitional frameworks, we additionally translate every problem into a mathematically equivalent Mathlib formulation, yielding paired TaoBench-Mathlib statements for direct comparison. While state-of-the-art ATP models perform capably within the MathLib framework, performance drops by an average of roughly 26% on the definitionally equivalent Tao formulation. This indicates that the main bottleneck is limited generalization across definitional frameworks rather than task difficulty. TaoBench thus highlights a gap between benchmark performance and applicability, and provides a concrete foundation for developing and testing provers better aligned with research mathematics.
【7】SciDesignBench: Benchmarking and Improving Language Models for Scientific Inverse Design
Link: https://arxiv.org/abs/2603.12724
Authors: David van Dijk,Ivan Vrkic
Note: 35 pages, 19 figures, 9 tables
Abstract: Many of the most important problems in science and engineering are inverse problems: given a desired outcome, find a design that achieves it. Evaluating whether a candidate meets the spec is often routine; a binding energy can be computed, a reactor yield simulated, a pharmacokinetic profile predicted. But searching a combinatorial design space for inputs that satisfy those targets is fundamentally harder. We introduce SciDesignBench, a benchmark of 520 simulator-grounded tasks across 14 scientific domains and five settings spanning single-shot design, short-horizon feedback, long-horizon refinement, and seed-design optimization. On the 10-domain shared-core subset, the best zero-shot model reaches only 29.0% success despite substantially higher parse rates. Simulator feedback helps, but the leaderboard changes with horizon: Sonnet 4.5 is strongest in one-turn de novo design, whereas Opus 4.6 is strongest after 20 turns of simulator-grounded refinement. Providing a starting seed design reshuffles the leaderboard again, demonstrating that constrained modification requires a fundamentally different capability from unconstrained de novo generation. We then introduce RLSF, a simulator-feedback training recipe. An RLSF-tuned 8B model raises single-turn success rates by 8-17 percentage points across three domains. Together, these results position simulator-grounded inverse design as both a benchmark for scientific reasoning and a practical substrate for amortizing expensive test-time compute into model weights.
【8】Cost-Efficient Multimodal LLM Inference via Cross-Tier GPU Heterogeneity
Link: https://arxiv.org/abs/2603.12707
Authors: Donglin Yu
Abstract: Multimodal large language model (MLLM) inference splits into two phases with opposing hardware demands: vision encoding is compute-bound, while language generation is memory-bandwidth-bound. We show that under standard transformer KV caching, the modality boundary (between vision encoder and language model) minimizes cross-device transfer among all partition points that preserve standard stage-based execution. Partitioning here reduces transfer complexity from $O(L * s_ctx)$ bytes (GB-scale KV caches under stage-level disaggregation) to $O(N_v * d)$ bytes (MB-scale embeddings), an O(L) reduction where L is the transformer depth. The result holds across attention mechanisms (MHA/GQA), dynamic vision resolutions, and model scales, and the advantage grows as models deepen. A direct implication is that existing stage-level disaggregation systems are constrained to high-bandwidth interconnects (e.g., NVLink), whereas modality-level disaggregation enables cross-tier heterogeneous serving over commodity PCIe. A closed-form cost model shows that heterogeneous deployment is cost-optimal under phase-separable workloads (predicts 31.4% savings; observed 40.6%). We build HeteroServe, a phase-aware runtime with modality-level partitioning and cross-tier scheduling, and evaluate it on LLaVA-1.5-7B and Qwen2.5-VL against vLLM v0.3.0. On identical 4xA100 hardware, engine optimizations raise throughput by up to 54%. Under a fixed budget, a heterogeneous cluster (\$38k) improves Tokens/\$ by 37% over a homogeneous baseline (\$64k) without degrading latency.
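The O(L * s_ctx) versus O(N_v * d) gap can be made concrete with a back-of-envelope calculation. The shapes below (32 layers, 4096-token context, hidden size 4096, 576 vision tokens, fp16) are assumed LLaVA-1.5-7B-like values for illustration, not numbers from the paper.

```python
# Back-of-envelope transfer sizes for the two partition points
# (assumed shapes; fp16 = 2 bytes per element).
def kv_cache_bytes(L, s_ctx, d, bytes_per=2):
    # stage-level split must ship K and V for every layer: 2 * L * s_ctx * d
    return 2 * L * s_ctx * d * bytes_per

def embedding_bytes(N_v, d, bytes_per=2):
    # modality-level split ships only the vision embeddings: N_v * d
    return N_v * d * bytes_per

L, s_ctx, d, N_v = 32, 4096, 4096, 576
kv = kv_cache_bytes(L, s_ctx, d)
emb = embedding_bytes(N_v, d)
print(f"KV cache: {kv / 1e9:.1f} GB, embeddings: {emb / 1e6:.1f} MB, "
      f"ratio ~{kv / emb:.0f}x")
```

With these shapes the KV cache is roughly 2 GB per request while the embeddings are under 5 MB, which is why the former demands NVLink-class interconnects and the latter fits comfortably over PCIe.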
【9】FGTR: Fine-Grained Multi-Table Retrieval via Hierarchical LLM Reasoning
Link: https://arxiv.org/abs/2603.12702
Authors: Chaojie Sun,Bin Cao,Tiantian Li,Chenyu Hou,Ruizhe Li,Qing Fan
Note: Under review - submitted to SIGIR 2026 Resources Track; 10 pages, 5 figures, 4 tables
Abstract: With the rapid advancement of large language models (LLMs), growing efforts have been made on LLM-based table retrieval. However, existing studies typically focus on single-table query, and implement it by similarity matching after encoding the entire table. These methods usually result in low accuracy due to their coarse-grained encoding which incorporates much query-irrelevant data, and are also inefficient when dealing with large tables, failing to fully utilize the reasoning capabilities of LLMs. Further, multi-table query is under-explored in retrieval tasks. To this end, we propose a hierarchical multi-table query method based on LLMs: Fine-Grained Multi-Table Retrieval FGTR, a new retrieval paradigm that employs a human-like reasoning strategy. Through hierarchical reasoning, FGTR first identifies relevant schema elements and then retrieves the corresponding cell contents, ultimately constructing a concise and accurate sub-table that aligns with the given query. To comprehensively evaluate the performance of FGTR, we construct two new benchmark datasets based on Spider and BIRD. Experimental results show that FGTR outperforms previous state-of-the-art methods, improving the F_2 metric by 18% on Spider and 21% on BIRD, demonstrating its effectiveness in enhancing fine-grained retrieval and its potential to improve end-to-end performance on table-based downstream tasks.
【10】RXNRECer Enables Fine-grained Enzymatic Function Annotation through Active Learning and Protein Language Models
Link: https://arxiv.org/abs/2603.12694
Authors: Zhenkun Shi,Jun Zhu,Dehang Wang,BoYu Chen,Qianqian Yuan,Zhitao Mao,Fan Wei,Weining Wu,Xiaoping Liao,Hongwu Ma
Abstract: A key challenge in enzyme annotation is identifying the biochemical reactions catalyzed by proteins. Most existing methods rely on Enzyme Commission (EC) numbers as intermediaries: they first predict an EC number and then retrieve the associated reactions. This indirect strategy introduces ambiguity due to the complex many-to-many mappings among proteins, EC numbers, and reactions, and is further complicated by frequent updates to EC numbers and inconsistencies across databases. To address these challenges, we present RXNRECer, a transformer-based ensemble framework that directly predicts enzyme-catalyzed reactions without relying on EC numbers. It integrates protein language modeling and active learning to capture both high-level sequence semantics and fine-grained transformation patterns. Evaluations on curated cross-validation and temporal test sets demonstrate consistent improvements over six EC-based baselines, with gains of 16.54% in F1 score and 15.43% in accuracy. Beyond accuracy gains, the framework offers clear advantages for downstream applications, including scalable proteome-wide reaction annotation, enhanced specificity in refining generic reaction schemas, systematic annotation of previously uncurated proteins, and reliable identification of enzyme promiscuity. By incorporating large language models, it also provides interpretable rationales for predictions. These capabilities make RXNRECer a robust and versatile solution for EC-free, fine-grained enzyme function prediction, with potential applications across multiple areas of enzyme research and industrial applications.
【11】Colluding LoRA: A Composite Attack on LLM Safety Alignment
Link: https://arxiv.org/abs/2603.12681
Authors: Sihao Ding
Abstract: We introduce Colluding LoRA (CoLoRA), an attack in which each adapter appears benign and plausibly functional in isolation, yet their linear composition consistently compromises safety. Unlike attacks that depend on specific input triggers or prompt patterns, CoLoRA is a composition-triggered broad refusal suppression: once a particular set of adapters is loaded, the model undergoes effective alignment degradation, complying with harmful requests without requiring adversarial prompts or suffixes. This attack exploits the combinatorial blindness of current defense systems, where exhaustively scanning all compositions is computationally intractable. Across several open-weight LLMs, CoLoRA achieves benign behavior individually yet high attack success rate after composition, indicating that securing modular LLM supply chains requires moving beyond single-module verification toward composition-aware defenses.
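The attack surface exists because LoRA updates compose additively: the effective weight is W_0 + sum of B_i A_i, so each adapter's update can stay small while the sum does not. A scalar toy sketch of this additivity (the threshold and numbers are illustrative, not the paper's construction):

```python
# Toy sketch of additive LoRA composition: each adapter is a (B, A)
# pair of scalars, and updates compose linearly (illustrative only).
def effective_weight(w0, adapters):
    return w0 + sum(b * a for b, a in adapters)

w0 = 1.0
adapter_1 = (0.5, 0.6)   # delta = 0.30, small on its own
adapter_2 = (0.6, 0.5)   # delta = 0.30, small on its own
SAFE_LIMIT = 1.5         # hypothetical safety-relevant threshold

alone_1 = effective_weight(w0, [adapter_1])              # 1.3, under the limit
alone_2 = effective_weight(w0, [adapter_2])              # 1.3, under the limit
composed = effective_weight(w0, [adapter_1, adapter_2])  # 1.6, over the limit
print(alone_1, alone_2, composed)
```

Any defense that scans adapters one at a time sees only the sub-threshold individual deltas, which is the combinatorial blindness the abstract describes.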
【12】RetroReasoner: A Reasoning LLM for Strategic Retrosynthesis Prediction
Link: https://arxiv.org/abs/2603.12666
Authors: Hanbum Ko,Chanhui Lee,Ye Rin Kim,Rodrigo Hormazabal,Sehui Han,Sungbin Lim,Sungwoong Kim
Note: 26 pages, 18 figures
Abstract: Retrosynthesis prediction is a core task in organic synthesis that aims to predict reactants for a given product molecule. Traditionally, chemists select a plausible bond disconnection and derive corresponding reactants, which is time-consuming and requires substantial expertise. While recent advancements in molecular large language models (LLMs) have made progress, many methods either predict reactants without strategic reasoning or conduct only a generic product analysis, rather than reason explicitly about bond-disconnection strategies that logically lead to the choice of specific reactants. To overcome these limitations, we propose RetroReasoner, a retrosynthetic reasoning model that leverages chemists' strategic thinking. RetroReasoner is trained using both supervised fine-tuning (SFT) and reinforcement learning (RL). For SFT, we introduce SyntheticRetro, a framework that generates structured disconnection rationales alongside reactant predictions. In the case of RL, we apply round-trip accuracy as a reward, where predicted reactants are passed through a forward synthesis model, and predictions are rewarded when the forward-predicted product matches the original input product. Experimental results show that RetroReasoner not only outperforms prior baselines but also generates a broader range of feasible reactant proposals, particularly in handling more challenging reaction instances.
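The round-trip reward has a simple structure: run the predicted reactants through a forward-synthesis model and reward the prediction when the re-predicted product matches the input. The sketch below stands in a toy dictionary for the forward model; the strings are not real chemistry.

```python
# Toy sketch of a round-trip accuracy reward for retrosynthesis RL.
# The forward model here is a stand-in lookup, not a real synthesis model.
def round_trip_reward(product, predicted_reactants, forward_model):
    return 1.0 if forward_model(predicted_reactants) == product else 0.0

# Illustrative "forward model" over canonicalized reactant tuples.
reactions = {("CCO", "O=C=O"): "CCOC(=O)O"}
forward = lambda reactants: reactions.get(tuple(sorted(reactants)), None)

print(round_trip_reward("CCOC(=O)O", ["O=C=O", "CCO"], forward))  # -> 1.0
print(round_trip_reward("CCOC(=O)O", ["CCO"], forward))           # -> 0.0
```

Because the check only asks whether the forward pass reproduces the product, it rewards any chemically consistent disconnection, which is what lets the model propose a broader range of feasible reactant sets rather than memorizing one reference answer.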
【13】Spend Less, Reason Better: Budget-Aware Value Tree Search for LLM Agents
Link: https://arxiv.org/abs/2603.12634
Authors: Yushu Li,Wenlong Deng,Jiajin Li,Xiaoxiao Li
Abstract: Test-time scaling has become a dominant paradigm for improving LLM agent reliability, yet current approaches treat compute as an abundant resource, allowing agents to exhaust token and tool budgets on redundant steps or dead-end trajectories. Existing budget-aware methods either require expensive fine-tuning or rely on coarse, trajectory-level heuristics that cannot intervene mid-execution. We propose the Budget-Aware Value Tree (BAVT), a training-free inference-time framework that models multi-hop reasoning as a dynamic search tree guided by step-level value estimation within a single LLM backbone. Another key innovation is a budget-conditioned node selection mechanism that uses the remaining resource ratio as a natural scaling exponent over node values, providing a principled, parameter-free transition from broad exploration to greedy exploitation as the budget depletes. To combat the well-known overconfidence of LLM self-evaluation, BAVT employs a residual value predictor that scores relative progress rather than absolute state quality, enabling reliable pruning of uninformative or redundant tool calls. We further provide a theoretical convergence guarantee, proving that BAVT reaches a terminal answer with probability at least $1-ε$ under an explicit finite budget bound. Extensive evaluations on four multi-hop QA benchmarks across two model families demonstrate that BAVT consistently outperforms parallel sampling baselines. Most notably, BAVT under strict low-budget constraints surpasses baseline performance at $4\times$ the resource allocation, establishing that intelligent budget management fundamentally outperforms brute-force compute scaling.
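One plausible reading of "remaining resource ratio as a natural scaling exponent over node values" is sketched below: node values are raised to a power that grows as the budget depletes, so the selection distribution drifts from near-uniform toward greedy. This is our assumption about the mechanism's shape, not the paper's exact formula, and all names are illustrative.

```python
# Hedged sketch: budget-conditioned softening/sharpening of node values.
# Assumed form: exponent = 1 / (remaining/total), so exploration early,
# greedy exploitation as the budget depletes (not the paper's exact rule).
def select_node(values, remaining, total):
    ratio = max(remaining / total, 1e-6)
    exponent = 1.0 / ratio
    scores = [v ** exponent for v in values]
    z = sum(scores)
    return [s / z for s in scores]

values = [0.5, 0.6, 0.9]
early = select_node(values, remaining=90, total=100)  # spread across nodes
late = select_node(values, remaining=5, total=100)    # concentrated on the best
print([round(p, 3) for p in early])
print([round(p, 3) for p in late])
```

With 90% of the budget left, the best node gets under half the probability mass; with 5% left, it gets essentially all of it, with no tunable temperature parameter involved.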
【14】Reinforcement Learning for Diffusion LLMs with Entropy-Guided Step Selection and Stepwise Advantages
Link: https://arxiv.org/abs/2603.12554
Authors: Vishnu Teja Kunde,Fatemeh Doudi,Mahdi Farahbakhsh,Dileep Kalathil,Krishna Narayanan,Jean-Francois Chamberland
Abstract: Reinforcement learning (RL) has been effective for post-training autoregressive (AR) language models, but extending these methods to diffusion language models (DLMs) is challenging due to intractable sequence-level likelihoods. Existing approaches therefore rely on surrogate likelihoods or heuristic approximations, which can introduce bias and obscure the sequential structure of denoising. We formulate diffusion-based sequence generation as a finite-horizon Markov decision process over the denoising trajectory and derive an exact, unbiased policy gradient that decomposes over denoising steps and is expressed in terms of intermediate advantages, without requiring explicit evaluation of the sequence likelihood. To obtain a practical and compute-efficient estimator, we (i) select denoising steps for policy updates via an entropy-guided approximation bound, and (ii) estimate intermediate advantages using a one-step denoising reward naturally provided by the diffusion model, avoiding costly multi-step rollouts. Experiments on coding and logical reasoning benchmarks demonstrate state-of-the-art results, with strong competitive performance on mathematical reasoning, outperforming existing RL post-training approaches for DLMs. Code is available at https://github.com/vishnutez/egspo-dllm-rl.
【15】As Language Models Scale, Low-order Linear Depth Dynamics Emerge
Link: https://arxiv.org/abs/2603.12541
Authors: Buddhika Nettasinghe,Geethu Joseph
Abstract: Large language models are often viewed as high-dimensional nonlinear systems and treated as black boxes. Here, we show that transformer depth dynamics admit accurate low-order linear surrogates within context. Across tasks including toxicity, irony, hate speech and sentiment, a 32-dimensional linear surrogate reproduces the layerwise sensitivity profile of GPT-2-large with near-perfect agreement, capturing how the final output shifts under additive injections at each layer. We then uncover a surprising scaling principle: for a fixed-order linear surrogate, agreement with the full model improves monotonically with model size across the GPT-2 family. This linear surrogate also enables principled multi-layer interventions that require less energy than standard heuristic schedules when applied to the full model. Together, our results reveal that as language models scale, low-order linear depth dynamics emerge within contexts, offering a systems-theoretic foundation for analyzing and controlling them.
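The core idea, treating layer index as a time axis and fitting a linear transition between successive hidden states, can be sketched in scalar form. This is a 1-D toy version of the paper's 32-dimensional surrogate, fit by least squares on synthetic data; in practice the states would be projected transformer activations.

```python
# Toy sketch: fit a scalar linear depth-dynamics surrogate h_{l+1} = a * h_l
# by least squares over a "layerwise" trajectory (synthetic data).
def fit_linear_surrogate(states):
    num = sum(states[l] * states[l + 1] for l in range(len(states) - 1))
    den = sum(states[l] ** 2 for l in range(len(states) - 1))
    return num / den

# Synthetic trajectory generated by a true transition of a = 0.8.
true_a = 0.8
states = [1.0]
for _ in range(20):
    states.append(true_a * states[-1])

a_hat = fit_linear_surrogate(states)
print(round(a_hat, 3))  # -> 0.8
```

The vector-valued analogue replaces the scalar `a` with a low-order matrix A estimated the same way, and the surrogate's layerwise sensitivity to an additive injection at layer l is then read off from powers of A.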
【16】When LLM Judge Scores Look Good but Best-of-N Decisions Fail
Link: https://arxiv.org/abs/2603.12520
Authors: Eddie Landesberg
Abstract: Large language models are often used as judges to score candidate responses, then validated with a single global metric such as correlation with reference labels. This can be misleading when the real deployment task is best-of-n selection within a prompt. In a 5,000-prompt best-of-4 benchmark from Chatbot Arena, a judge with moderate global correlation (r = 0.47) captures only 21.0% of the improvement that perfect selection would achieve over random choice. The gap arises because global agreement is driven largely by prompt-level baseline effects, while selection depends on within-prompt ranking: within-prompt correlation is only r_within = 0.27, and coarse pointwise scoring creates ties in 67% of pairwise comparisons. In a matched-pair best-of-2 audit, explicit pairwise judging recovers much of this lost signal, raising recovery from 21.1% to 61.2%. For judge-based selection, the relevant audit should report within-prompt signal, tie rates, and recovery/top-1 accuracy, not global agreement alone.
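The recovery metric the abstract reports can be computed per prompt as the fraction of the oracle-over-random improvement that judge-based selection captures. A toy sketch with assumed scores (the 21%/61% figures in the abstract come from real benchmark data, not from this example):

```python
# Toy sketch of best-of-n "recovery": how much of the oracle's improvement
# over random selection the judge's top pick captures (illustrative data).
def recovery(true_scores, judge_scores):
    n = len(true_scores)
    random_expected = sum(true_scores) / n
    oracle = max(true_scores)
    picked = true_scores[max(range(n), key=lambda i: judge_scores[i])]
    if oracle == random_expected:
        return 1.0
    return (picked - random_expected) / (oracle - random_expected)

true_scores = [2, 9, 5, 4]           # ground-truth quality of 4 candidates
judge_scores = [0.3, 0.6, 0.7, 0.1]  # judge prefers candidate 2, not the best

print(round(recovery(true_scores, judge_scores), 3))  # -> 0.0
```

Here the judge's pick is exactly average quality, so recovery is zero even though its scores are loosely related to the truth, illustrating how a judge can look acceptable globally while contributing nothing to within-prompt selection.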
【17】Naïve PAINE: Lightweight Text-to-Image Generation Improvement with Prompt Evaluation
Link: https://arxiv.org/abs/2603.12506
Authors: Joong Ho Kim, Nicholas Thai, Souhardya Saha Dip, Dong Lao, Keith G. Mills
Note: Code available at https://github.com/LSU-ATHENA/Naive-PAINE
Abstract: Text-to-Image (T2I) generation is primarily driven by Diffusion Models (DMs), which rely on random Gaussian noise. Thus, like playing the slots at a casino, a DM will produce different results given the same user-defined inputs. This imposes a gambler's burden: performing multiple generation cycles to obtain a satisfactory result. However, even though DMs use stochastic sampling to seed generation, the distribution of generated content quality depends heavily on the prompt and the generative ability of the DM with respect to it. To account for this, we propose Naïve PAINE, which improves the generative quality of Diffusion Models by leveraging T2I preference benchmarks. We directly predict the numerical quality of an image from the initial noise and the given prompt. Naïve PAINE then selects a handful of high-quality noise samples and forwards them to the DM for generation. Further, Naïve PAINE provides feedback on the DM's generative quality given the prompt and is lightweight enough to fit seamlessly into existing DM pipelines. Experimental results demonstrate that Naïve PAINE outperforms existing approaches on several prompt corpus benchmarks.
【18】TaxBreak: Unmasking the Hidden Costs of LLM Inference Through Overhead Decomposition
Link: https://arxiv.org/abs/2603.12465
Authors: Prabhu Vellaisamy, Shreesh Tripathi, Vignesh Natarajan, Surya Santhan Thenarasu, Shawn Blanton, John P. Shen
Note: Accepted at IEEE ISPASS 2026. Copyright assigned to IEEE
Abstract: Large Language Model (LLM) inference is widely used in interactive assistants and agentic systems. In latency-sensitive deployments, inference time can become dominated by host-side overheads. Existing approaches typically expose this cost only as an aggregate residual or a launch/queue metric, which is often insufficient to identify which execution layer should be optimized. This work presents TaxBreak, a trace-driven methodology for decomposing host-visible orchestration overhead into three components: framework translation time, CUDA library translation time, and kernel launch-path time. We validate TaxBreak on NVIDIA H100 and H200 systems and use it to derive our proposed Host-Device Balance Index (HDBI), a boundedness summary index that relates device-active execution to host-visible orchestration. Across representative dense and mixture-of-experts workloads in both prefill and decode, we show that aggregate latency, GPU inactivity, or boundedness ratios alone can obscure the dominant optimization target. TaxBreak instead distinguishes cases where optimization should reduce software-stack overhead from cases where the primary win comes from reducing device-side work. We further show that MoE models dispatch 8-11x more kernels per output token than dense models, and that for such host-bound workloads, CPU single-thread performance is a first-order parameter: a faster host CPU reduces orchestration overhead by 10-29% and improves end-to-end latency by up to 14%, even when paired with a slower-clocked GPU. These results position TaxBreak as a diagnostic tool for assessing whether optimization effort should target the software stack or the device-side workload execution.
【19】Generalist Large Language Models for Molecular Property Prediction: Distilling Knowledge from Specialist Models
Link: https://arxiv.org/abs/2603.12344
Authors: Khiem Le, Sreejata Dey, Marcos Martínez Galindo, Vanessa Lopez, Ting Hua, Nitesh V. Chawla, Hoang Thanh Lam
Abstract: Molecular Property Prediction (MPP) is a central task in drug discovery. While Large Language Models (LLMs) show promise as generalist models for MPP, their current performance remains below the threshold for practical adoption. We propose TreeKD, a novel knowledge distillation method that transfers complementary knowledge from tree-based specialist models into LLMs. Our approach trains specialist decision trees on functional group features, then verbalizes their learned predictive rules as natural language to enable rule-augmented in-context learning. This enables LLMs to leverage structural insights that are difficult to extract from SMILES strings alone. We further introduce rule-consistency, a test-time scaling technique inspired by bagging that ensembles predictions across diverse rules from a Random Forest. Experiments on 22 ADMET properties from the TDC benchmark demonstrate that TreeKD substantially improves LLM performance, narrowing the gap with SOTA specialist models and advancing toward practical generalist models for molecular property prediction.
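The "verbalize tree rules as natural language" step can be sketched with a toy rule renderer. The feature names, thresholds, and labels below are hypothetical; TreeKD itself extracts paths from trees trained on real functional-group features:

```python
# Toy illustration of turning decision-tree paths into prompt-ready sentences.
def verbalize_rule(path, prediction):
    """Turn one root-to-leaf path of (feature, op, threshold) into a sentence."""
    clauses = [
        f"the count of {feat} is {'at most' if op == '<=' else 'greater than'} {thr}"
        for feat, op, thr in path
    ]
    return "If " + " and ".join(clauses) + f", the molecule is likely {prediction}."

rules = [
    verbalize_rule([("hydroxyl groups", ">", 2), ("aromatic rings", "<=", 1)], "soluble"),
    verbalize_rule([("halogen atoms", ">", 3)], "toxic"),
]
# These sentences can then be prepended to the LLM prompt as in-context rules.
```

Ensembling answers across many such rules (one per tree in a forest) would then give a simple form of the rule-consistency idea.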
【20】Aligning Language Models from User Interactions
Link: https://arxiv.org/abs/2603.12273
Authors: Thomas Kleine Buening, Jonas Hübotter, Barna Pásztor, Idan Shenfeld, Giorgia Ramponi, Andreas Krause
Abstract: Multi-turn user interactions are among the most abundant data produced by language models, yet we lack effective methods to learn from them. While typically discarded, these interactions often contain useful information: follow-up user messages may indicate that a response was incorrect, failed to follow an instruction, or did not align with the user's preferences. Importantly, language models are already able to make use of this information in context. After observing a user's follow-up, the same model is often able to revise its behavior. We leverage this ability to propose a principled and scalable method for learning directly from user interactions through self-distillation. By conditioning the model on the user's follow-up message and comparing the resulting token distribution with the original policy, we obtain a target for updating the policy that captures how the model's behavior changes in hindsight. We then distill this hindsight distribution back into the current policy. Remarkably, we show that training on real-world user conversations from WildChat improves language models across standard alignment and instruction-following benchmarks, without regressing other capabilities. The same mechanism enables personalization, allowing models to continually adapt to individual users through interaction without explicit feedback. Our results demonstrate that raw user interactions that arise naturally during deployment enable alignment, personalization, and continual adaptation.
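A toy numeric sketch of the hindsight-distillation idea: conditioning on the follow-up yields a revised token distribution, which is then distilled back into the policy. All logits below are made up, and the single cross-entropy gradient step in logit space is one plausible way to instantiate the distillation, not necessarily the paper's exact objective:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

policy_logits = np.array([2.0, 1.0, 0.5, 0.0])     # next-token logits, original context
hindsight_logits = np.array([0.5, 2.5, 0.5, 0.0])  # after conditioning on the follow-up

p_policy = softmax(policy_logits)
p_target = softmax(hindsight_logits)               # hindsight distribution (target)

# (p_policy - p_target) is the gradient of H(p_target, softmax(z)) w.r.t. z,
# so this step moves the policy toward the hindsight distribution.
lr = 0.5
p_updated = softmax(policy_logits - lr * (p_policy - p_target))

kl_before = kl(p_target, p_policy)
kl_after = kl(p_target, p_updated)
```

A single step already reduces the KL divergence to the hindsight target; repeated over many real conversations, this is the flavor of update the method applies.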
【21】ActTail: Global Activation Sparsity in Large Language Models
Link: https://arxiv.org/abs/2603.12272
Authors: Wenwen Hou, Xinyuan Song, Shiwei Liu
Abstract: Activation sparsity is a promising approach for accelerating large language model (LLM) inference by reducing computation and memory movement. However, existing activation sparsity methods typically apply uniform sparsity across projections, ignoring the heterogeneous statistical properties of Transformer weights and thereby amplifying performance degradation. In this paper, we propose ActTail, a TopK magnitude-based activation sparsity method with global activation sparsity allocation grounded in Heavy-Tailed Self-Regularization (HT-SR) theory. Specifically, we capture this heterogeneity via the heavy-tail exponent computed from each projection's empirical spectral density (ESD), which is used as a quantitative indicator to assign projection-specific sparsity budgets. Importantly, we provide a theoretical analysis that establishes an explicit relationship between the activation sparsity ratio and the heavy-tail exponent under the HT-SR regime, offering principled guidance for sparsity allocation beyond heuristic design. Experiments on LLaMA and Mistral models show that our method improves both perplexity and downstream task performance at high sparsity compared to uniform allocation. At 80% sparsity, perplexity is reduced by 21.8% on LLaMA-2-7B, 40.1% on LLaMA-2-13B, and 9.4% on Mistral-7B.
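The core mechanic, TopK magnitude-based sparsification with non-uniform per-projection budgets, is simple to sketch. The budget values below are hypothetical; ActTail derives them from each projection's heavy-tail exponent:

```python
import numpy as np

# Sketch of TopK magnitude-based activation sparsity with per-projection budgets.
def topk_sparsify(x, keep_ratio):
    """Keep only the largest-magnitude entries of activation vector x."""
    k = max(1, int(round(keep_ratio * x.size)))
    thresh = np.sort(np.abs(x))[-k]          # k-th largest magnitude
    return np.where(np.abs(x) >= thresh, x, 0.0)

rng = np.random.default_rng(0)
acts = {"q_proj": rng.standard_normal(1024), "up_proj": rng.standard_normal(4096)}

# Non-uniform allocation: give a larger budget to the (hypothetically)
# heavier-tailed projection instead of one uniform ratio for all.
budgets = {"q_proj": 0.30, "up_proj": 0.15}
sparse = {name: topk_sparsify(a, budgets[name]) for name, a in acts.items()}
```

Kept entries are unchanged and the rest are zeroed, so downstream matrix multiplies can skip the zeroed coordinates.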
【22】Diagnosing Retrieval Bias Under Multiple In-Context Knowledge Updates in Large Language Models
Link: https://arxiv.org/abs/2603.12271
Authors: Boyu Qiao, Sean Guo, Xian Yang, Kun Li, Wei Zhou, Songlin Hu, Yunya Song
Abstract: LLMs are widely used in knowledge-intensive tasks where the same fact may be revised multiple times within context. Unlike prior work focusing on one-shot updates or single conflicts, multi-update scenarios contain multiple historically valid versions that compete at retrieval, yet remain underexplored. This challenge resembles the AB-AC interference paradigm in cognitive psychology: when the same cue A is successively associated with B and C, the old and new associations compete during retrieval, leading to bias. Inspired by this, we introduce a Dynamic Knowledge Instance (DKI) evaluation framework, modeling multi-updates of the same fact as a cue paired with a sequence of updated values, and assess models via endpoint probing of the earliest (initial) and latest (current) states. Across diverse LLMs, we observe that retrieval bias intensifies as updates increase: earliest-state accuracy stays high while latest-state accuracy drops substantially. Diagnostic analyses of attention, hidden-state similarity, and output logits further reveal that these signals become flatter and weakly discriminative on errors, providing little stable basis for identifying the latest update. Finally, cognitively inspired heuristic intervention strategies yield only modest gains and do not eliminate the bias. Our results reveal a persistent challenge in tracking and following knowledge updates in long contexts.
Graph-related (graph learning | graph neural networks | graph optimization, etc.) (2 papers)
【1】Graph In-Context Operator Networks for Generalizable Spatiotemporal Prediction
Link: https://arxiv.org/abs/2603.12725
Authors: Chenghan Wu, Zongmin Yu, Boai Sun, Liu Yang
Note: 11 figures, 2 tables
Abstract: In-context operator learning enables neural networks to infer solution operators from contextual examples without weight updates. While prior work has demonstrated the effectiveness of this paradigm in leveraging vast datasets, a systematic comparison against single-operator learning using identical training data has been absent. We address this gap through controlled experiments comparing in-context operator learning against classical operator learning (single-operator models trained without contextual examples) under the same training steps and dataset. To enable this investigation on real-world spatiotemporal systems, we propose GICON (Graph In-Context Operator Network), combining graph message passing for geometric generalization with example-aware positional encoding for cardinality generalization. Experiments on air quality prediction across two Chinese regions show that in-context operator learning outperforms classical operator learning on complex tasks, generalizing across spatial domains and scaling robustly from a few examples to 100 at inference.
【2】Lyapunov Stable Graph Neural Flow
Link: https://arxiv.org/abs/2603.12557
Authors: Haoyu Chu, Xiaotong Chen, Wei Zhou, Wenjun Cui, Kai Zhao, Shikui Wei, Qiyu Kang
Abstract: Graph Neural Networks (GNNs) are highly vulnerable to adversarial perturbations in both topology and features, making the learning of robust representations a critical challenge. In this work, we bridge GNNs with control theory to introduce a novel defense framework grounded in integer- and fractional-order Lyapunov stability. Unlike conventional strategies that rely on resource-heavy adversarial training or data purification, our approach fundamentally constrains the underlying feature-update dynamics of the GNN. We propose an adaptive, learnable Lyapunov function paired with a novel projection mechanism that maps the network's state into a stable space, thereby offering theoretically provable stability guarantees. Notably, this mechanism is orthogonal to existing defenses, allowing for seamless integration with techniques like adversarial training to achieve cumulative robustness. Extensive experiments demonstrate that our Lyapunov-stable graph neural flows substantially outperform base neural flows and state-of-the-art baselines across standard benchmarks and various adversarial attack scenarios.
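A deliberately simple stand-in for the stability projection is spectral rescaling: cap the spectral radius of a linear feature-update map below 1 so the flow x_{t+1} = W x_t is Lyapunov stable. The paper's learnable projection is far more general; this only illustrates the stability criterion on toy matrices:

```python
import numpy as np

def project_stable(W, rho_max=0.95):
    """Rescale W so its spectral radius is at most rho_max."""
    rho = max(abs(np.linalg.eigvals(W)))
    return W if rho <= rho_max else W * (rho_max / rho)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))     # almost surely unstable (spectral radius > 1)
W_stable = project_stable(W)

# Iterating the projected map keeps the state bounded (it decays toward 0).
x = rng.standard_normal(8)
for _ in range(200):
    x = W_stable @ x
final_norm = float(np.linalg.norm(x))
```

Because every eigenvalue of `W_stable` has magnitude at most 0.95, repeated application contracts any initial state, which is the property adversarial perturbations cannot break.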
GAN | Adversarial | Attacks | Generation (7 papers)
【1】SAW: Toward a Surgical Action World Model via Controllable and Scalable Video Generation
Link: https://arxiv.org/abs/2603.13024
Authors: Sampath Rapuri, Lalithkumar Seenivasan, Dominik Schneider, Roger Soberanis-Mukul, Yufan He, Hao Ding, Jiru Xu, Chenhao Yu, Chenyan Jing, Pengfei Guo, Daguang Xu, Mathias Unberath
Note: The manuscript is under review
Abstract: A surgical world model capable of generating realistic surgical action videos with precise control over tool-tissue interactions can address fundamental challenges in surgical AI and simulation -- from data scarcity and rare event synthesis to bridging the sim-to-real gap for surgical automation. However, current video generation methods, the very core of such surgical world models, require expensive annotations or complex structured intermediates as conditioning signals at inference, limiting their scalability. Other approaches exhibit limited temporal consistency across complex laparoscopic scenes and do not possess sufficient realism. We propose Surgical Action World (SAW) -- a step toward surgical action world modeling through video diffusion conditioned on four lightweight signals: language prompts encoding tool-action context, a reference surgical scene, a tissue affordance mask, and 2D tool-tip trajectories. We design a conditional video diffusion approach that reformulates video-to-video diffusion into trajectory-conditioned surgical action synthesis. The backbone diffusion model is fine-tuned on a custom-curated dataset of 12,044 laparoscopic clips with lightweight spatiotemporal conditioning signals, leveraging a depth consistency loss to enforce geometric plausibility without requiring depth at inference. SAW achieves state-of-the-art temporal consistency (CD-FVD: 199.19 vs. 546.82) and strong visual quality on held-out test data. Furthermore, we demonstrate its downstream utility for (a) surgical AI, where augmenting rare actions with SAW-generated videos improves action recognition (clipping F1-score: 20.93% to 43.14%; cutting: 0.00% to 8.33%) on real test data, and (b) surgical simulation, where rendering tool-tissue interaction videos from simulator-derived trajectories points toward a visually faithful simulation engine.
【2】Design-Specification Tiling for ICL-based CAD Code Generation
Link: https://arxiv.org/abs/2603.12712
Authors: Yali Du, San-Zhuo Xi, Hui Sun, Ming Li
Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in code generation, yet they underperform on domain-specific tasks such as Computer-Aided Design (CAD) code generation due to scarce training data. In-Context Learning (ICL) offers a training-free alternative through task-specific exemplars. However, existing selection strategies prioritize similarity or point-wise diversity, often producing redundant selections that fail to satisfy the compositional requirements of complex CAD design specifications. In this work, we propose knowledge sufficiency as a principled objective for exemplar selection that aims to maximally satisfy all requirements within design specifications. To realize this objective, we introduce Design-Specification Tiling (DST), which quantifies knowledge sufficiency through a surrogate tiling ratio by extracting multi-granular design components and measuring the proportion of query components covered by selected exemplars. We demonstrate that maximizing this objective constitutes submodular maximization and provide a polynomial-time greedy algorithm with a (1-1/e)-approximation guarantee. Extensive experiments demonstrate that DST substantially improves CAD code generation quality, consistently outperforming existing exemplar selection strategies in ICL.
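The greedy coverage selection behind DST can be sketched on toy data. Design components are modeled as plain strings here, whereas the paper extracts multi-granular components from real CAD specifications; the (1-1/e) guarantee holds because coverage is a monotone submodular function:

```python
def greedy_tiling(query_components, exemplars, budget):
    """Pick up to `budget` exemplars, greedily maximizing covered components."""
    covered, chosen = set(), []
    for _ in range(budget):
        best = max(exemplars,
                   key=lambda e: len((exemplars[e] & query_components) - covered))
        if not (exemplars[best] & query_components) - covered:
            break                          # no exemplar adds new coverage
        chosen.append(best)
        covered |= exemplars[best] & query_components
    return chosen, len(covered) / len(query_components)

# Hypothetical query components and exemplar pool.
query = {"sketch", "extrude", "fillet", "pattern", "shell"}
pool = {
    "ex1": {"sketch", "extrude"},
    "ex2": {"fillet", "pattern", "shell"},
    "ex3": {"sketch", "fillet"},
}
chosen, tiling_ratio = greedy_tiling(query, pool, budget=2)
```

Similarity-based selection might pick `ex1` and `ex3` (both overlap "sketch") and leave "pattern" and "shell" uncovered, while the greedy coverage objective tiles the whole specification.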
【3】STRAP-ViT: Segregated Tokens with Randomized Transformations for Defense against Adversarial Patches in ViTs
Link: https://arxiv.org/abs/2603.12688
Authors: Nandish Chattopadhyay, Anadi Goyal, Chandan Karfa, Anupam Chattopadhyay
Note: Accepted for publication at IEEE/ACM Design Automation Conference (DAC) 2026
Abstract: Adversarial patches are physically realizable localized noise, which are able to hijack Vision Transformer (ViT) self-attention, pulling focus toward a small, high-contrast region and corrupting the class token to force confident misclassifications. In this paper, we claim that the tokens which correspond to the areas of the image that contain the adversarial noise have different statistical properties compared to the tokens which do not overlap with the adversarial perturbations. We use this insight to propose a mechanism, called STRAP-ViT, which uses Jensen-Shannon Divergence as a metric for segregating tokens that behave as anomalies in the Detection Phase, and then applies randomized composite transformations on them during the Mitigation Phase to make the adversarial noise ineffective. The minimum number of tokens to transform is a hyper-parameter for the defense mechanism and is chosen such that at least 50% of the patch is covered by the transformed tokens. STRAP-ViT fits as a non-trainable plug-and-play block within ViT architectures, for inference purposes only, with a minimal computational cost and does not require any additional training cost/effort. STRAP-ViT has been tested on multiple pre-trained vision transformer architectures (ViT-base-16 and DinoV2) and datasets (ImageNet and CalTech-101), across multiple adversarial attacks (Adversarial Patch, LAVAN, GDPA and RP2), and found to provide excellent robust accuracies lying within a 2-3% range of the clean baselines, and to outperform the state-of-the-art.
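The detection phase can be illustrated with a toy version of Jensen-Shannon-based anomaly scoring: measure each token's divergence from the mean token distribution and flag the largest outlier. The "token distributions" below are synthetic Dirichlet samples rather than real ViT statistics, and the randomized transforms are omitted:

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    m = 0.5 * (p + q)
    return 0.5 * np.sum(p * np.log(p / m)) + 0.5 * np.sum(q * np.log(q / m))

rng = np.random.default_rng(0)
tokens = rng.dirichlet(np.ones(16) * 5.0, size=50)  # 50 well-behaved tokens
tokens[7] = rng.dirichlet(np.ones(16) * 0.1)        # one spiky, anomalous token

mean_dist = tokens.mean(axis=0)
scores = np.array([js_divergence(t, mean_dist) for t in tokens])
suspect = int(np.argmax(scores))                    # index of the flagged token
```

The anomalous token's concentrated mass makes its divergence from the smooth mean distribution stand out, mirroring how patch tokens separate statistically from clean ones.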
【4】Generating Expressive and Customizable Evals for Timeseries Data Analysis Agents with AgentFuel
Link: https://arxiv.org/abs/2603.12483
Authors: Aadyaa Maddi, Prakhar Naval, Deepti Mande, Shane Duan, Muckai Girish, Vyas Sekar
Abstract: Across many domains (e.g., IoT, observability, telecommunications, cybersecurity), there is an emerging adoption of conversational data analysis agents that enable users to "talk to your data" to extract insights. Such data analysis agents operate on timeseries data models; e.g., measurements from sensors or events monitoring user clicks and actions in product analytics. We evaluate 6 popular data analysis agents (both open-source and proprietary) on domain-specific data and query types, and find that they fail on stateful and incident-specific queries. We observe two key expressivity gaps in existing evals: domain-customized datasets and domain-specific query types. To enable practitioners in such domains to generate customized and expressive evals for such timeseries data agents, we present AgentFuel. AgentFuel helps domain experts quickly create customized evals to perform end-to-end functional tests. We show that AgentFuel's benchmarks expose key directions for improvement in existing data agent frameworks. We also present anecdotal evidence that using AgentFuel can improve agent performance (e.g., with GEPA). AgentFuel benchmarks are available at https://huggingface.co/datasets/RockfishData/TimeSeriesAgentEvals.
【5】SpectralGuard: Detecting Memory Collapse Attacks in State Space Models
Link: https://arxiv.org/abs/2603.12414
Authors: Davi Bonetto
Note: 24 pages, 10 figures. Code, dataset, and demo: https://github.com/DaviBonetto/spectralguard
Abstract: State Space Models (SSMs) such as Mamba achieve linear-time sequence processing through input-dependent recurrence, but this mechanism introduces a critical safety vulnerability. We show that the spectral radius rho(A-bar) of the discretized transition operator governs effective memory horizon: when an adversary drives rho toward zero through gradient-based Hidden State Poisoning, memory collapses from millions of tokens to mere dozens, silently destroying reasoning capacity without triggering output-level alarms. We prove an Evasion Existence Theorem showing that for any output-only defense, adversarial inputs exist that simultaneously induce spectral collapse and evade detection, then introduce SpectralGuard, a real-time monitor that tracks spectral stability across all model layers. SpectralGuard achieves F1=0.961 against non-adaptive attackers and retains F1=0.842 under the strongest adaptive setting, with sub-15ms per-token latency. Causal interventions and cross-architecture transfer to hybrid SSM-Attention systems confirm that spectral monitoring provides a principled, deployable safety layer for recurrent foundation models.
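The rho-to-memory-horizon relationship admits a back-of-envelope model: if state components decay like rho^t, a unit signal drops below a threshold eps after roughly log(eps)/log(rho) steps. This toy calculation is not SpectralGuard itself, but it reproduces the "millions of tokens to mere dozens" collapse:

```python
import numpy as np

def memory_horizon(rho, eps=1e-6):
    """Steps until a unit signal scaled by rho each step falls below eps."""
    if rho >= 1.0:
        return float("inf")      # non-contracting state: unbounded horizon
    return np.log(eps) / np.log(rho)

healthy = memory_horizon(0.999999)   # rho near 1: horizon in the millions
collapsed = memory_horizon(0.5)      # poisoned rho: horizon of a few dozen
```

A monitor only needs to track rho per layer to see this collapse long before it is visible in the outputs.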
【6】Synthetic Data Generation for Brain-Computer Interfaces: Overview, Benchmarking, and Future Directions
Link: https://arxiv.org/abs/2603.12296
Authors: Ziwei Wang, Zhentao He, Xingyi He, Hongbin Wang, Tianwang Jia, Jingwei Luo, Siyang Li, Xiaoqing Chen, Dongrui Wu
Note: 20 pages, 7 figures
Abstract: Deep learning has achieved transformative performance across diverse domains, largely driven by large-scale, high-quality training data. In contrast, the development of brain-computer interfaces (BCIs) is fundamentally constrained by limited, heterogeneous, and privacy-sensitive neural recordings. Generating synthetic yet physiologically plausible brain signals has therefore emerged as a compelling way to mitigate data scarcity and enhance model capacity. This survey provides a comprehensive review of brain signal generation for BCIs, covering methodological taxonomies, benchmark experiments, evaluation metrics, and key applications. We systematically categorize existing generative algorithms into four types: knowledge-based, feature-based, model-based, and translation-based approaches. Furthermore, we benchmark existing brain signal generation approaches across four representative BCI paradigms to provide an objective performance comparison. Finally, we discuss the potentials and challenges of current generation approaches and prospect future research on accurate, data-efficient, and privacy-aware BCI systems. The benchmark codebase is released at https://github.com/wzwvv/DG4BCI.
【7】VecMol: Vector-Field Representations for 3D Molecule Generation
Link: https://arxiv.org/abs/2603.12734
Authors: Yuchen Hua, Xingang Peng, Jianzhu Ma, Muhan Zhang
Abstract: Generative modeling of three-dimensional (3D) molecules is a fundamental yet challenging problem in drug discovery and materials science. Existing approaches typically represent molecules as 3D graphs and co-generate discrete atom types with continuous atomic coordinates, leading to intrinsic learning difficulties such as heterogeneous modality entanglement and geometry-chemistry coherence constraints. We propose VecMol, a paradigm-shifting framework that reimagines molecular representation by modeling 3D molecules as continuous vector fields over Euclidean space, where vectors point toward nearby atoms and implicitly encode molecular structure. The vector field is parameterized by a neural field and generated using a latent diffusion model, avoiding explicit graph generation and decoupling structure learning from discrete atom instantiation. Experiments on the QM9 and GEOM-Drugs benchmarks validate the feasibility of this novel approach, suggesting vector-field-based representations as a promising new direction for 3D molecular generation.
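A minimal sketch of the molecule-as-vector-field idea: at any query point, the field returns the vector to the nearest atom, so atom positions are encoded implicitly by the field's zeros. The coordinates below are toy values; VecMol parameterizes this field with a neural network and generates it with latent diffusion:

```python
import numpy as np

# Hypothetical 3-atom molecule (coordinates in angstroms, toy values).
atoms = np.array([[0.0, 0.0, 0.0],
                  [1.5, 0.0, 0.0],
                  [0.0, 1.5, 0.0]])

def vector_field(x):
    """Vector from query point x to its nearest atom."""
    diffs = atoms - x
    nearest = np.argmin(np.linalg.norm(diffs, axis=1))
    return diffs[nearest]

v = vector_field(np.array([1.2, 0.1, 0.0]))   # points toward the atom at (1.5, 0, 0)
```

Atoms can be recovered from such a field by following the vectors to their fixed points, which is what lets generation avoid emitting a discrete atom list directly.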
Semi-/Weakly-/Un-/Fully-Supervised | Uncertainty | Active Learning (9 papers)
【1】BoSS: A Best-of-Strategies Selector as an Oracle for Deep Active Learning
Link: https://arxiv.org/abs/2603.13109
Authors: Denis Huseljic, Paul Hahn, Marek Herde, Christoph Sandrock, Bernhard Sick
Abstract: Active learning (AL) aims to reduce annotation costs while maximizing model performance by iteratively selecting valuable instances. While foundation models have made it easier to identify these instances, existing selection strategies still lack robustness across different models, annotation budgets, and datasets. To highlight the potential weaknesses of existing AL strategies and provide a reference point for research, we explore oracle strategies, i.e., strategies that approximate the optimal selection by accessing ground-truth information unavailable in practical AL scenarios. Current oracle strategies, however, fail to scale effectively to large datasets and complex deep neural networks. To tackle these limitations, we introduce the Best-of-Strategies Selector (BoSS), a scalable oracle strategy designed for large-scale AL scenarios. BoSS constructs a set of candidate batches through an ensemble of selection strategies and then selects the batch yielding the highest performance gain. As an ensemble of selection strategies, BoSS can be easily extended with new state-of-the-art strategies as they emerge, ensuring it remains a reliable oracle strategy in the future. Our evaluation demonstrates that i) BoSS outperforms existing oracle strategies, ii) state-of-the-art AL strategies still fall noticeably short of oracle performance, especially in large-scale datasets with many classes, and iii) one possible solution to counteract the inconsistent performance of AL strategies might be to employ an ensemble-based approach for the selection.
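The BoSS loop reduces to: build one candidate batch per strategy, then keep the batch with the highest oracle gain. The strategies and gain oracle below are toy stand-ins; in real AL the gain is the ground-truth performance improvement after retraining on each candidate batch:

```python
def boss_select(pool, strategies, gain_oracle, batch_size):
    """Pick the candidate batch (one per strategy) with the highest oracle gain."""
    candidates = {name: strat(pool, batch_size) for name, strat in strategies.items()}
    best = max(candidates, key=lambda name: gain_oracle(candidates[name]))
    return best, candidates[best]

pool = list(range(100))
strategies = {
    "head": lambda p, k: p[:k],      # stand-in for, e.g., a diversity heuristic
    "tail": lambda p, k: p[-k:],     # stand-in for, e.g., an uncertainty heuristic
}
gain = lambda batch: sum(batch)      # hypothetical oracle: larger indices help more
name, batch = boss_select(pool, strategies, gain, batch_size=5)
```

Because the selector only needs each strategy as a black box, new state-of-the-art strategies extend the ensemble by adding one entry to `strategies`.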
【2】GeoChemAD: Benchmarking Unsupervised Geochemical Anomaly Detection for Mineral Exploration
标题:GeoChemAD:矿产勘探的无监督地球化学异常检测基准
链接:https://arxiv.org/abs/2603.13068
作者:Yihao Ding,Yiran Zhang,Chris Gonzalez,Eun-Jung Holden,Wei Liu
备注:Work in progress
摘要:地球化学异常检测在矿产勘探中发挥着关键作用,因为偏离区域地球化学基线可能指示矿化。现有的研究受到两个关键限制:(1)单一区域情景,限制了模型的普遍性;(2)专有数据集,这使得结果再现无法实现。在这项工作中,我们介绍了GeoChemAD,这是一个由政府主导的地质调查编制的开源基准数据集,涵盖多个地区、采样源和目标元素。该数据集包括八个子集,代表不同的空间尺度和采样条件。为了建立强大的基线,我们复制和基准测试了一系列无监督的异常检测方法,包括统计模型、生成式方法和基于transformer的方法。此外,我们提出了GeoChemFormer,一个基于transformer的框架,它利用自监督预训练来学习空间样本的目标元素感知地球化学表示。大量的实验表明,GeoChemFormer在所有八个子集中始终实现卓越和强大的性能,在异常检测精度和泛化能力方面优于现有的无监督方法。拟议的数据集和框架为可重复的研究和未来的发展提供了基础。
摘要:Geochemical anomaly detection plays a critical role in mineral exploration as deviations from regional geochemical baselines may indicate mineralization. Existing studies suffer from two key limitations: (1) single region scenarios which limit model generalizability; (2) proprietary datasets, which makes result reproduction unattainable. In this work, we introduce \textbf{GeoChemAD}, an open-source benchmark dataset compiled from government-led geological surveys, covering multiple regions, sampling sources, and target elements. The dataset comprises eight subsets representing diverse spatial scales and sampling conditions. To establish strong baselines, we reproduce and benchmark a range of unsupervised anomaly detection methods, including statistical models, generative and transformer-based approaches. Furthermore, we propose \textbf{GeoChemFormer}, a transformer-based framework that leverages self-supervised pretraining to learn target-element-aware geochemical representations for spatial samples. Extensive experiments demonstrate that GeoChemFormer consistently achieves superior and robust performance across all eight subsets, outperforming existing unsupervised methods in both anomaly detection accuracy and generalization capability. The proposed dataset and framework provide a foundation for reproducible research and future development in this direction.
【3】Hierarchical Reference Sets for Robust Unsupervised Detection of Scattered and Clustered Outliers
标题:基于分层参考集的分散和聚集离群点鲁棒无监督检测
链接:https://arxiv.org/abs/2603.12847
作者:Yiqun Zhang,Zexi Tan,Xiaopeng Luo,Yunlin Liu
备注:15 pages, 9 figures
摘要:大多数现实世界的物联网数据分析任务(如聚类和异常事件检测)是无监督的,并且极易受到离群值的影响。除了由传感器读数错误等因素引起的零星分散离群值外,物联网系统还经常出现聚集离群值:当多个设备或节点产生相似的异常测量时(例如由于局部干扰、新出现的安全威胁或区域性误报),便会形成微簇。这些聚集离群值由于局部密度相对较高,很容易被误认为是正常行为,从而妨碍对分散异常和上下文异常的检测。为解决这一问题,我们提出了一种新的离群值检测范式,利用图结构刻画自然近邻关系。通过结合从图中导出的局部和全局尺度的参考集,该方法支持多角度的异常评估。我们的方法能够在不受聚集异常干扰的情况下有效识别分散离群值,而图结构同时有助于反映和隔离聚集离群值组。大量实验(包括对比性能分析、消融研究、下游聚类任务验证和超参数敏感性评估)证明了所提方法的有效性。源代码可在https://github.com/gordonlok/DROD上获得。
摘要:Most real-world IoT data analysis tasks, such as clustering and anomaly event detection, are unsupervised and highly susceptible to the presence of outliers. In addition to sporadic scattered outliers caused by factors such as faulty sensor readings, IoT systems often exhibit clustered outliers. These occur when multiple devices or nodes produce similar anomalous measurements, for instance, owing to localized interference, emerging security threats, or regional false alarms, forming micro-clusters. These clustered outliers can be easily mistaken for normal behavior because of their relatively high local density, thereby obscuring the detection of both scattered and contextual anomalies. To address this, we propose a novel outlier detection paradigm that leverages the natural neighboring relationships using graph structures. This facilitates multi-perspective anomaly evaluation by incorporating reference sets at both local and global scales derived from the graph. Our approach enables the effective recognition of scattered outliers without interference from clustered anomalies, whereas the graph structure simultaneously helps reflect and isolate clustered outlier groups. Extensive experiments, including comparative performance analysis, ablation studies, validation on downstream clustering tasks, and evaluation of hyperparameter sensitivity, demonstrate the efficacy of the proposed method. The source code is available at https://github.com/gordonlok/DROD.
【4】Residual SODAP: Residual Self-Organizing Domain-Adaptive Prompting with Structural Knowledge Preservation for Continual Learning
标题:残差SODAP:具有结构知识保存的残差自组织域自适应提示,用于持续学习
链接:https://arxiv.org/abs/2603.12816
作者:Gyutae Oh,Jungwoo Bae,Jitae Shin
备注:29 page, 10 figures
摘要:持续学习(CL)受灾难性遗忘困扰,这在领域增量学习(DIL)中更为严重,因为任务标识符不可用,且存储过去的数据不可行。虽然基于提示的CL(PCL)在冻结骨干网络的情况下适应表示,但我们观察到,由于次优的提示选择以及域偏移下分类器级别的不稳定性,仅靠提示的改进往往是不够的。我们提出了残差SODAP,它联合执行基于提示的表示自适应和分类器级知识保存。我们的框架结合了带残差聚合的$α$-entmax稀疏提示选择、带伪特征重放的无数据蒸馏、基于提示使用的漂移检测以及不确定性感知的多损失平衡。在没有任务ID或额外数据存储的三个DIL基准测试中,残差SODAP达到了最先进的AvgACC/AvgF,分别为0.850/0.047(DR)、0.760/0.031(皮肤癌)和0.995/0.003(CORe50)。
摘要:Continual learning (CL) suffers from catastrophic forgetting, which is exacerbated in domain-incremental learning (DIL) where task identifiers are unavailable and storing past data is infeasible. While prompt-based CL (PCL) adapts representations with a frozen backbone, we observe that prompt-only improvements are often insufficient due to suboptimal prompt selection and classifier-level instability under domain shifts. We propose Residual SODAP, which jointly performs prompt-based representation adaptation and classifier-level knowledge preservation. Our framework combines $α$-entmax sparse prompt selection with residual aggregation, data-free distillation with pseudo-feature replay, prompt-usage-based drift detection, and uncertainty-aware multi-loss balancing. Across three DIL benchmarks without task IDs or extra data storage, Residual SODAP achieves state-of-the-art AvgACC/AvgF of 0.850/0.047 (DR), 0.760/0.031 (Skin Cancer), and 0.995/0.003 (CORe50).
【5】Deep Distance Measurement Method for Unsupervised Multivariate Time Series Similarity Retrieval
标题:无监督多元时间序列相似性检索的深距离测量方法
链接:https://arxiv.org/abs/2603.12544
作者:Susumu Naito,Kouta Nakata,Yasunori Taguchi
备注:Workshop of Artificial Intelligence for Time Series Analysis (AI4TS): Theory, Algorithms, and Applications at 2025 IEEE International Conference on Data Mining (ICDM), 2025
摘要:本文提出了深度距离度量方法(DDMM),以提高无监督多变量时间序列相似性检索的准确性。DDMM能够学习整个时间序列中状态内的微小差异,从而识别状态之间的微小差异,而这正是工业工厂用户所关心的。为实现这一点,DDMM使用一种学习算法,该算法基于对内的欧几里得距离,为从整个时间序列中任意采样的每对锚点和正样本分配权重,并学习由该权重加权的对内差异。该算法既允许学习状态内的微小差异,也允许从整个时间序列中采样样本对。我们的实证研究表明,DDMM在纸浆和造纸厂数据集上显著优于最先进的时间序列表示学习方法,证明了DDMM在工业工厂中的有效性。此外,通过与现有特征提取方法相结合的组合模型实验,我们表明精度可以进一步提高。
摘要:We propose the Deep Distance Measurement Method (DDMM) to improve retrieval accuracy in unsupervised multivariate time series similarity retrieval. DDMM enables learning of minute differences within states in the entire time series and thereby recognition of minute differences between states, which are of interest to users in industrial plants. To achieve this, DDMM uses a learning algorithm that assigns a weight to each pair of an anchor and a positive sample, arbitrarily sampled from the entire time series, based on the Euclidean distance within the pair and learns the differences within the pairs weighted by the weights. This algorithm allows both learning minute differences within states and sampling pairs from the entire time series. Our empirical studies showed that DDMM significantly outperformed state-of-the-art time series representation learning methods on the Pulp-and-paper mill dataset and demonstrated the effectiveness of DDMM in industrial plants. Furthermore, we showed that accuracy can be further improved by linking DDMM with existing feature extraction methods through experiments with the combined model.
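The core of the DDMM training signal above is a per-pair weight derived from the within-pair Euclidean distance. The sketch below illustrates that idea; the Gaussian-kernel weighting function is an assumption for illustration, not the paper's formula:

```python
import math

# Illustrative sketch of DDMM-style pair weighting: each (anchor, positive)
# pair sampled from the full series gets a weight computed from its within-
# pair Euclidean distance. The exact weighting function used here (a
# Gaussian kernel) is an assumption, not the authors' definition.

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pair_weights(pairs, bandwidth=1.0):
    """Weight pairs so that small within-pair distances (minute differences
    within a state) contribute more to the learned metric."""
    dists = [euclidean(a, p) for a, p in pairs]
    return [math.exp(-(d / bandwidth) ** 2) for d in dists]

pairs = [([0.0, 0.0], [0.1, 0.0]),   # nearly identical frames
         ([0.0, 0.0], [3.0, 4.0])]   # frames from very different states
w = pair_weights(pairs)
```

Under this weighting, the near-identical pair dominates the loss, which matches the abstract's goal of learning minute differences within states while still sampling pairs from the entire series.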
【6】Addressing Data Scarcity in 3D Trauma Detection through Self-Supervised and Semi-Supervised Learning with Vertex Relative Position Encoding
标题:通过具有顶点相对位置编码的自监督和半监督学习解决3D创伤检测中的数据稀缺问题
链接:https://arxiv.org/abs/2603.12514
作者:Shivam Chaudhary,Sheethal Bhat,Andreas Maier
备注:9 pages, 6 figures, 6 tables. The code is available at https://github.com/shivasmic/3d-trauma-detection-ssl
摘要:在腹部CT扫描中准确检测和定位创伤性损伤仍然是急诊放射学的关键挑战,主要是由于注释的医学数据严重缺乏。本文提出了一种结合自监督预训练和半监督检测的标记高效方法,用于3D医学图像分析。我们采用基于补丁的掩码图像建模(MIM)在1,206个无注释的CT体积上预训练3D U-Net编码器,学习鲁棒的解剖表示。预训练的编码器支持两个下游临床任务:使用VDETR和顶点相对位置编码进行3D损伤检测,以及多标签损伤分类。对于检测,具有2,000个未标记体积和一致性正则化的半监督学习仅使用144个标记训练样本就实现了56.57%的验证mAP@0.50和45.30%的测试mAP@0.50,比仅监督训练提高了115%。对于分类,扩展到2,244个标记样本,仅使用冻结编码器就可以在七个损伤类别中获得94.07%的测试准确率,证明了自监督特征可立即迁移。我们的研究结果验证了自监督预训练与半监督学习相结合,有效地解决了医学成像中的标签稀缺问题,实现了具有有限注释的强大3D对象检测。
摘要:Accurate detection and localization of traumatic injuries in abdominal CT scans remains a critical challenge in emergency radiology, primarily due to severe scarcity of annotated medical data. This paper presents a label-efficient approach combining self-supervised pre-training with semi-supervised detection for 3D medical image analysis. We employ patch-based Masked Image Modeling (MIM) to pre-train a 3D U-Net encoder on 1,206 CT volumes without annotations, learning robust anatomical representations. The pretrained encoder enables two downstream clinical tasks: 3D injury detection using VDETR with Vertex Relative Position Encoding, and multi-label injury classification. For detection, semi-supervised learning with 2,000 unlabeled volumes and consistency regularization achieves 56.57% validation mAP@0.50 and 45.30% test mAP@0.50 with only 144 labeled training samples, representing a 115% improvement over supervised-only training. For classification, expanding to 2,244 labeled samples yields 94.07% test accuracy across seven injury categories using only a frozen encoder, demonstrating immediately transferable self-supervised features. Our results validate that self-supervised pre-training combined with semi-supervised learning effectively addresses label scarcity in medical imaging, enabling robust 3D object detection with limited annotations.
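The patch-based MIM pretraining above boils down to corrupting a random subset of patches and asking the encoder to reconstruct them. A 1-D toy version of that input preparation (patch size, mask ratio, and dimensionality here are all illustrative, not the paper's settings) looks like:

```python
import random

# Toy sketch of patch-based masked image modeling (MIM) input preparation:
# split a volume (1-D here for brevity; 3-D in the paper) into patches and
# zero out a random subset. The reconstruction loss would then target only
# the masked patches.

def mask_patches(volume, patch_size=4, mask_ratio=0.5, seed=0):
    rng = random.Random(seed)
    patches = [volume[i:i + patch_size] for i in range(0, len(volume), patch_size)]
    n_mask = int(len(patches) * mask_ratio)
    masked_idx = set(rng.sample(range(len(patches)), n_mask))
    corrupted = [[0.0] * len(p) if i in masked_idx else list(p)
                 for i, p in enumerate(patches)]
    return corrupted, masked_idx  # encoder sees `corrupted`; loss targets masked_idx

volume = [float(v) for v in range(16)]
corrupted, masked_idx = mask_patches(volume)
```

The encoder never sees the masked content, so it must learn anatomy-like structure from context, which is what makes the pretrained representations transferable to detection and classification.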
【7】Adaptive Conditional Forest Sampling for Spectral Risk Optimisation under Decision-Dependent Uncertainty
标题:决策相关不确定性下谱风险优化的自适应条件森林抽样
链接:https://arxiv.org/abs/2603.12507
作者:Marcell T. Kurbucz
备注:15 pages, 3 figures, 8 tables
摘要:当不确定性分布依赖于决策时,最小化谱风险目标(定义为预期成本和条件风险值(CVaR)的凸组合)具有挑战性,使得代理建模和基于模拟的排序都对尾部估计误差敏感。我们提出了自适应条件森林采样(ACFS),一个四阶段的模拟优化框架,集成了用于决策条件分布近似的广义随机森林、CEM引导的全局探索、排序加权聚焦增强,以及在多起点梯度细化之前的代理到Oracle两阶段重排序。我们在两个结构不同的数据生成过程上评估ACFS:一个决策相关的Student-t copula和一个具有对数正态边缘分布的高斯copula,涵盖三种惩罚权重配置且每种设置重复100次。ACFS在每种配置中的第二个基准测试中实现了最低的中位数Oracle谱风险,相对GP-BO的中位数差距从6.0%到20.0%不等。在第一个基准测试中,ACFS和GP-BO在中位数目标方面在统计上无法区分,但ACFS在第一个基准测试中将跨重复离散度降低了约1.8至1.9倍,在第二个基准测试中降低了1.7至2.0倍,表明运行间可靠性得到了实质性改善。ACFS在几乎所有设置中也优于CEM-SO、SGD-CVaR和KDE-SO,而消融和敏感性分析支持所提设计的贡献和鲁棒性。
摘要:Minimising a spectral risk objective, defined as a convex combination of expected cost and Conditional Value-at-Risk (CVaR), is challenging when the uncertainty distribution is decision-dependent, making both surrogate modelling and simulation-based ranking sensitive to tail estimation error. We propose Adaptive Conditional Forest Sampling (ACFS), a four-phase simulation-optimisation framework that integrates Generalised Random Forests for decision-conditional distribution approximation, CEM-guided global exploration, rank-weighted focused augmentation, and surrogate-to-oracle two-stage reranking before multi-start gradient-based refinement. We evaluate ACFS on two structurally distinct data-generating processes: a decision-dependent Student-t copula and a Gaussian copula with log-normal marginals, across three penalty-weight configurations and 100 replications per setting. ACFS achieves the lowest median oracle spectral risk on the second benchmark in every configuration, with median gaps over GP-BO ranging from 6.0% to 20.0%. On the first benchmark, ACFS and GP-BO are statistically indistinguishable in median objective, but ACFS reduces cross-replication dispersion by approximately 1.8 to 1.9 times on the first benchmark and 1.7 to 2.0 times on the second, indicating materially improved run-to-run reliability. ACFS also outperforms CEM-SO, SGD-CVaR, and KDE-SO in nearly all settings, while ablation and sensitivity analyses support the contribution and robustness of the proposed design.
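The spectral risk objective above is a convex combination of the expected cost and CVaR. A minimal empirical estimator of that objective (the standard sorted-tail CVaR estimator; the weight and alpha values are illustrative) can be written as:

```python
# Sketch of the spectral risk objective from the abstract: a convex
# combination of expected cost and CVaR at level alpha, estimated from
# sampled costs. This is the standard empirical estimator, not the
# paper's exact implementation.

def cvar(costs, alpha=0.9):
    """Empirical CVaR: mean of the worst (1 - alpha) fraction of costs."""
    sorted_costs = sorted(costs)
    k = max(1, int(round((1 - alpha) * len(costs))))
    return sum(sorted_costs[-k:]) / k

def spectral_risk(costs, weight=0.5, alpha=0.9):
    mean_cost = sum(costs) / len(costs)
    return (1 - weight) * mean_cost + weight * cvar(costs, alpha)

costs = [1.0] * 9 + [11.0]          # one heavy-tail outcome in ten samples
risk = spectral_risk(costs, weight=0.5, alpha=0.9)
```

The example shows why tail estimation error matters: a single heavy-tail sample moves the spectral risk from near the mean (2.0) up to 6.5, so ranking decisions by this objective is only as reliable as the tail estimate.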
【8】Self-Supervised Speech Models Encode Phonetic Context via Position-dependent Orthogonal Subspaces
标题:基于位置相关正交子空间的自监督语音模型语音上下文编码
链接:https://arxiv.org/abs/2603.12642
作者:Kwanghee Choi,Eunjung Yeo,Cheol Jun Cho,David R. Mortensen,David Harwath
备注:Submitted to Interspeech 2026
摘要:基于transformer的自监督语音模型(S3M)通常被描述为情境化的,但这究竟意味着什么仍不清楚。在此,我们关注单个帧级S3M表示如何编码音子及其周围的上下文。先前的工作表明,S3M以组合方式表示音子;例如,浊音性、双唇性和鼻音性等音系特征向量叠加在[m]的S3M表示中。我们扩展了这一观点,提出来自相邻音子序列的音系信息也以组合方式编码在单个帧中,即对应于前一个、当前和下一个音子的向量叠加在单个帧级表示中。我们发现这种结构具有若干性质,包括相对位置之间的正交性,以及隐式语音边界的出现。总之,我们的研究结果推进了对上下文相关S3M表示的理解。
摘要:Transformer-based self-supervised speech models (S3Ms) are often described as contextualized, yet what this entails remains unclear. Here, we focus on how a single frame-level S3M representation can encode phones and their surrounding context. Prior work has shown that S3Ms represent phones compositionally; for example, phonological vectors such as voicing, bilabiality, and nasality vectors are superposed in the S3M representation of [m]. We extend this view by proposing that phonological information from a sequence of neighboring phones is also compositionally encoded in a single frame, such that vectors corresponding to previous, current, and next phones are superposed within a single frame-level representation. We show that this structure has several properties, including orthogonality between relative positions, and emergence of implicit phonetic boundaries. Together, our findings advance our understanding of context-dependent S3M representations.
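The superposition-plus-orthogonality claim above has a simple geometric reading: if position-specific phone vectors are orthogonal, each can still be read out of their sum. The toy vectors below are hand-made to illustrate that check, not real S3M features:

```python
import math

# Toy illustration of the orthogonality property reported in the abstract:
# vectors encoding the previous and current phone occupy orthogonal
# directions, so their superposition in one frame remains separable.
# The vectors are hand-crafted stand-ins, not model representations.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

prev_phone = [1.0, 0.0, 0.0]
curr_phone = [0.0, 1.0, 0.0]
frame = [p + c for p, c in zip(prev_phone, curr_phone)]  # superposed frame

ortho = cosine(prev_phone, curr_phone)      # position subspaces orthogonal
recovered = cosine(frame, curr_phone)       # current phone still detectable
```

With orthogonal position subspaces, a probe for the current phone still fires on the superposed frame (cosine 1/sqrt(2) here), which is the mechanism the paper's analysis probes for in real representations.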
【9】Accelerating materials discovery using foundation model based In-context active learning
标题:使用基于基础模型的上下文主动学习加速材料发现
链接:https://arxiv.org/abs/2603.12567
作者:Jeffrey Hu,Rongzhi Dong,Ying Feng,Ming Hu,Jianjun Hu
备注:18 pages
摘要:主动学习(AL)已经成为一种强大的范式,通过迭代地将实验转向最有前途的候选者、减少昂贵的合成与表征周期来加速材料发现。然而,目前的AL主要依赖于高斯过程(GP)和随机森林(RF)代理模型,两者具有互补的局限性:由于僵化的核假设,GP对复杂的成分-性质景观欠拟合,而RF在小数据情形下产生不可靠的不确定性估计,而这正是大多数材料数据集所处的情形(少于500个样本)。在此,我们提出基于基础模型的上下文主动学习(ICAL),用TabPFN代替传统代理模型;TabPFN是一种基于transformer的基础模型,在数百万个合成任务上进行预训练,以元学习表格数据的通用先验。TabPFN在单次前向传递中执行有原则的贝叶斯推理,无需针对特定数据集再训练,在GP和RF失效最严重的情形下提供校准良好的预测不确定性。在涵盖铜合金硬度和电导率、块体金属玻璃形成能力和晶格热导率的10个材料数据集上以GP和RF为基准,TabPFN在10个数据集中的8个上胜出,相对于GP在额外实验/评估上平均节省52%,相对于RF节省29.77%。交叉验证分析证实,TabPFN的优势源于更优的不确定性校准,在所有代理模型中实现了最低的负对数似然和稀疏化误差曲线下面积。我们的工作表明,预训练的基础模型可以作为加速基于主动学习的材料发现的高效代理模型。
摘要:Active learning (AL) has emerged as a powerful paradigm for accelerating materials discovery by iteratively steering experiments toward the most promising candidates, reducing costly synthesis-and-characterization cycles. However, current AL relies predominantly on Gaussian Process (GP) and Random Forest (RF) surrogates with complementary limitations: GP underfits complex composition-property landscapes due to rigid kernel assumptions, while RF produces unreliable uncertainty estimates in small-data regimes, precisely where most materials datasets reside (with < 500 samples). Here we propose foundation model based In-Context Active Learning (ICAL), replacing conventional surrogates with TabPFN, a transformer-based foundation model pre-trained on millions of synthetic tasks to meta-learn a universal prior over tabular data. TabPFN performs principled Bayesian inference in a single forward pass without dataset-specific retraining, delivering well-calibrated predictive uncertainty where GP and RF fail most severely. Benchmarked against GP and RF across 10 materials datasets spanning copper alloy hardness and electrical conductivity, bulk metallic glass-forming ability, and crystal lattice thermal conductivity, TabPFN wins on 8 out of 10 datasets, achieving a mean saving of 52% in extra experiments/evaluations relative to GP and 29.77% relative to RF. Cross-validation analysis confirms that TabPFN's advantage stems from superior uncertainty calibration, achieving the lowest Negative Log-Likelihood and Area Under the Sparsification Error curve among all surrogates. Our work demonstrates that a pre-trained foundation model can serve as a highly effective surrogate for accelerating active learning-based materials discovery.
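The active-learning loop that ICAL plugs its surrogate into follows the usual acquire-label-refit pattern: score the unlabeled pool by predictive uncertainty and query the most uncertain point. The scaffold below is generic illustration only; the trivial distance-based "uncertainty" stands in for a real surrogate such as TabPFN:

```python
# Schematic active-learning loop of the kind ICAL uses: each round, a
# surrogate (TabPFN in the paper; a trivial stand-in here) scores the
# unlabeled pool by predictive uncertainty and the most uncertain point
# is queried. Everything below is illustrative scaffolding.

def active_learning(pool, label_fn, uncertainty_fn, rounds=3):
    labeled = []
    unlabeled = list(pool)
    for _ in range(rounds):
        scores = [uncertainty_fn(x, labeled) for x in unlabeled]
        pick = max(range(len(unlabeled)), key=lambda i: scores[i])
        x = unlabeled.pop(pick)
        labeled.append((x, label_fn(x)))   # run the "experiment"
    return labeled

# Stand-in uncertainty: distance to the nearest already-labeled point.
def dist_to_labeled(x, labeled):
    return min((abs(x - lx) for lx, _ in labeled), default=float("inf"))

queried = active_learning(pool=[0.0, 0.1, 5.0, 5.1, 10.0],
                          label_fn=lambda x: x > 4,
                          uncertainty_fn=dist_to_labeled)
```

The point of ICAL is that `uncertainty_fn` comes from a single in-context forward pass rather than a refit GP or RF, so the loop's per-round cost does not grow with retraining.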
迁移|Zero/Few/One-Shot|自适应(8篇)
【1】Causal Cellular Context Transfer Learning (C3TL): An Efficient Architecture for Prediction of Unseen Perturbation Effects
标题:因果细胞上下文转移学习(C3 TL):预测不可见微扰效应的有效架构
链接:https://arxiv.org/abs/2603.13051
作者:Michael Scholkemper,Sach Mukherjee
备注:12 Pages, 3 figures, Keywords: perturbation prediction, context transfer, lightweight, machine learning
摘要:预测化学和遗传扰动对定量细胞状态的影响是计算生物学、分子医学和药物发现的核心挑战。最近的工作利用了大规模单细胞数据和大规模基础模型来解决这一任务。然而,这样的计算资源和大规模数据集在学术或临床环境中并不总是可及的,因此限制了实用性。在此,我们提出了一个轻量级的扰动效应预测框架,利用生物干预的结构化性质和特定的归纳偏置/不变性。我们的方法利用有关扰动效应的可用信息,允许推广到新的背景,并且只需要广泛可得的批量(bulk)分子数据。广泛的测试将特定情境下的扰动效应预测与真实的大规模干预实验进行比较,证明了在新情境下的准确预测。该方法与SOTA基础模型具有竞争力,但需要更简单的数据、小得多的模型规模和更少的时间。通过聚焦稳健的批量信号和高效的架构,我们表明,无需专有硬件或超大模型也可以准确预测扰动效应,从而为在生物医学中更广泛地利用因果学习方法开辟了道路。
摘要:Predicting the effects of chemical and genetic perturbations on quantitative cell states is a central challenge in computational biology, molecular medicine and drug discovery. Recent work has leveraged large-scale single-cell data and massive foundation models to address this task. However, such computational resources and extensive datasets are not always accessible in academic or clinical settings, hence limiting utility. Here we propose a lightweight framework for perturbation effect prediction that exploits the structured nature of biological interventions and specific inductive biases/invariances. Our approach leverages available information concerning perturbation effects to allow generalization to novel contexts and requires only widely-available bulk molecular data. Extensive testing, comparing predictions of context-specific perturbation effects against real, large-scale interventional experiments, demonstrates accurate prediction in new contexts. The proposed approach is competitive with SOTA foundation models but requires simpler data, much smaller model sizes and less time. Focusing on robust bulk signals and efficient architectures, we show that accurate prediction of perturbation effects is possible without proprietary hardware or very large models, hence opening up ways to leverage causal learning approaches in biomedicine generally.
【2】Federated Few-Shot Learning on Neuromorphic Hardware: An Empirical Study Across Physical Edge Nodes
标题:神经形态硬件上的联合Few-Shot学习:跨物理边缘节点的实证研究
链接:https://arxiv.org/abs/2603.13037
作者:Steven Motta,Gioele Nanni
备注:13 pages, 2 figures, 10 tables. Code: https://github.com/Stemo688/federated-neuromorphic-learning
摘要:神经形态硬件上的联邦学习仍然未被探索,因为片上尖峰定时依赖可塑性(STDP)产生二进制权重更新,而不是标准算法假设的浮点梯度。我们使用BrainChip Akida AKD1000处理器构建了一个两节点联邦系统,并在七个分析阶段运行了大约1,580次实验。在测试的四种权重交换策略中,神经元级连接(FedUnion)始终保持准确性,而元素级权重平均(FedAvg)则破坏了准确性(p = 0.002)。上游特征提取器的域自适应微调占了大部分精度增益,确认特征质量为主导因素。将特征维度从64缩放到256,产生了77.0%的最佳策略联邦准确率(n=30,p < 0.001)。两个独立的不对称性(更广泛的特征比个体学习更有助于联邦,而二进制化更不利于联邦)指向一个共享的原型互补机制:跨节点传输与神经元原型的独特性有关。
摘要:Federated learning on neuromorphic hardware remains unexplored because on-chip spike-timing-dependent plasticity (STDP) produces binary weight updates rather than the floating-point gradients assumed by standard algorithms. We build a two-node federated system with BrainChip Akida AKD1000 processors and run approximately 1,580 experimental trials across seven analysis phases. Of four weight-exchange strategies tested, neuron-level concatenation (FedUnion) consistently preserves accuracy while element-wise weight averaging (FedAvg) destroys it (p = 0.002). Domain-adaptive fine-tuning of the upstream feature extractor accounts for most of the accuracy gains, confirming feature quality as the dominant factor. Scaling feature dimensionality from 64 to 256 yields 77.0% best-strategy federated accuracy (n=30, p < 0.001). Two independent asymmetries (wider features help federation more than individual learning, while binarization hurts federation more) point to a shared prototype complementarity mechanism: cross-node transfer scales with the distinctiveness of neuron prototypes.
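The contrast the abstract draws between element-wise averaging (FedAvg) and neuron-level concatenation (FedUnion) is easy to see on binary STDP-style weights: averaging two disagreeing neuron prototypes washes both out, while concatenation keeps each node's prototypes intact. A toy illustration (shapes and values invented for clarity):

```python
# Toy contrast between the two weight-exchange strategies in the abstract:
# element-wise averaging (FedAvg) versus neuron-level concatenation
# (FedUnion). Weights are rows = neurons, columns = inputs; the binary
# values are illustrative stand-ins for STDP-learned prototypes.

def fed_avg(w_a, w_b):
    """Element-wise average of corresponding neurons (FedAvg)."""
    return [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(w_a, w_b)]

def fed_union(w_a, w_b):
    """Stack both nodes' neurons side by side (FedUnion-style concatenation)."""
    return w_a + w_b

node_a = [[1, 0, 1, 0]]   # one binary neuron prototype from node A
node_b = [[0, 1, 0, 1]]   # a complementary prototype from node B

averaged = fed_avg(node_a, node_b)   # both patterns collapse to 0.5s
union = fed_union(node_a, node_b)    # both prototypes preserved
```

The averaged weights carry no discriminative pattern at all, which matches the reported result that FedAvg destroys accuracy on binary on-chip weights while concatenation preserves prototype distinctiveness.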
【3】DirPA: Addressing Prior Shift in Imbalanced Few-shot Crop-type Classification
标题:DirPA:解决不平衡少样本作物类型分类中的先验偏移
链接:https://arxiv.org/abs/2603.12905
作者:Joana Reuss,Ekaterina Gikalo,Marco Körner
备注:20 pages, 9 Figures, 28 Tables
摘要:现实世界的农业监测往往受到严重的类别不平衡和高昂的标签获取成本的阻碍,导致数据严重匮乏。在Few-Shot学习(FSL,一种专为数据稀缺场景设计的框架)中,训练集通常是人为平衡的。然而,这与自然界中观察到的长尾分布脱节,导致分布偏移,破坏了模型泛化到现实世界农业任务的能力。我们此前提出了Dirichlet先验增强(DirPA; Reuss等人,2026a),以在模型训练期间主动减轻这种标签分布偏斜的影响。在这项工作中,我们扩大了原研究的地理范围。具体而言,我们在欧盟(EU)多个国家评估了这种扩展方法,超越本地化实验,以检验该方法在不同农业环境中的适应能力。我们的研究结果证明了DirPA在不同地理区域的有效性。我们表明,无论目标区域如何,DirPA不仅提高了系统鲁棒性并稳定了极端长尾分布下的训练,还通过主动模拟先验大幅提升了各类别的性能。
摘要:Real-world agricultural monitoring is often hampered by severe class imbalance and high label acquisition costs, resulting in significant data scarcity. In few-shot learning (FSL) -- a framework specifically designed for data-scarce settings -- , training sets are often artificially balanced. However, this creates a disconnect from the long-tailed distributions observed in nature, leading to a distribution shift that undermines the model's ability to generalize to real-world agricultural tasks. We previously introduced Dirichlet Prior Augmentation (DirPA; Reuss et al., 2026a) to proactively mitigate the effects of such label distribution skews during model training. In this work, we extend the original study's geographical scope. Specifically, we evaluate this extended approach across multiple countries in the European Union (EU), moving beyond localized experiments to test the method's resilience across diverse agricultural environments. Our results demonstrate the effectiveness of DirPA across different geographical regions. We show that DirPA not only improves system robustness and stabilizes training under extreme long-tailed distributions, regardless of the target region, but also substantially improves individual class-specific performance by proactively simulating priors.
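A Dirichlet prior over class proportions, as the method's name suggests, is the natural way to "proactively simulate priors": low concentration values yield skewed, long-tail-like label distributions. The sketch below samples one such prior with the standard gamma-variate construction; the concentration value is illustrative, not the paper's setting:

```python
import random

# Sketch of the core idea behind Dirichlet Prior Augmentation (DirPA):
# rather than training against one artificially balanced class prior,
# sample skewed class priors from a Dirichlet distribution to simulate
# long-tailed label distributions. A Dirichlet sample is obtained by
# normalizing independent gamma variates; concentration=0.3 (skew-inducing)
# is an illustrative choice.

def sample_class_prior(n_classes, concentration=0.3, seed=0):
    rng = random.Random(seed)
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(n_classes)]
    total = sum(draws)
    return [d / total for d in draws]

prior = sample_class_prior(n_classes=5)
```

Each training episode can then weight or resample its classes according to a freshly drawn `prior`, exposing the model to many plausible label-distribution shifts instead of a single balanced one.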
【4】Forecasting Epileptic Seizures from Contactless Camera via Cross-Species Transfer Learning
标题:通过跨物种迁移学习通过非接触式摄像机预测癫痫发作
链接:https://arxiv.org/abs/2603.12887
作者:Mingkai Zhai,Wei Wang,Zongsheng Li,Quanying Liu
摘要:癫痫发作的预测是癫痫研究中一个重要而又具有挑战性的问题。现有的方法主要依赖于神经信号,如脑电图(EEG),这需要专门的设备,并限制了在现实世界中的长期部署。相比之下,视频数据提供了一种非侵入性和可访问的替代方案,但现有的基于视频的研究主要集中在发病后癫痫发作检测,使癫痫发作预测在很大程度上未经探索。在这项工作中,我们制定了一个新的任务,基于视频的癫痫发作预测,其中短发作前的视频片段(3-10秒)被用来预测是否会在随后的5秒内发生癫痫发作。为了解决注释人类癫痫视频的稀缺性,我们提出了一个跨物种迁移学习框架,该框架利用大规模啮齿动物视频数据进行辅助预训练。这使得该模型能够捕获跨物种概括的与行为相关的行为动态。实验结果表明,我们的方法实现了超过70%的预测准确率下,严格的视频设置和优于现有的基线。这些发现强调了跨物种学习在建立非侵入性、可扩展的癫痫早期预警系统方面的潜力。
摘要:Epileptic seizure forecasting is a clinically important yet challenging problem in epilepsy research. Existing approaches predominantly rely on neural signals such as electroencephalography (EEG), which require specialized equipment and limit long-term deployment in real-world settings. In contrast, video data provide a non-invasive and accessible alternative, yet existing video-based studies mainly focus on post-onset seizure detection, leaving seizure forecasting largely unexplored. In this work, we formulate a novel task of video-based epileptic seizure forecasting, where short pre-ictal video segments (3-10 seconds) are used to predict whether a seizure will occur within the subsequent 5 seconds. To address the scarcity of annotated human epilepsy videos, we propose a cross-species transfer learning framework that leverages large-scale rodent video data for auxiliary pretraining. This enables the model to capture seizure-related behavioral dynamics that generalize across species. Experimental results demonstrate that our approach achieves over 70% prediction accuracy under a strictly video-only setting and outperforms existing baselines. These findings highlight the potential of cross-species learning for building non-invasive, scalable early-warning systems for epilepsy.
【5】Enhanced Drug-drug Interaction Prediction Using Adaptive Knowledge Integration
标题:使用自适应知识集成增强药物相互作用预测
链接:https://arxiv.org/abs/2603.12885
作者:Pengfei Liu,Jun Tao,Zhixiang Ren
摘要:药物相互作用事件(DDIE)预测对于预防不良反应和确保最佳治疗结果至关重要。然而,现有的方法往往面临数据集不平衡、相互作用机制复杂以及对未知药物组合泛化能力差等挑战。为了解决这些挑战,我们提出了一个知识增强框架,将先验药物知识自适应地注入大型语言模型(LLM)。该框架利用强化学习技术来促进自适应知识提取和合成,从而有效地优化策略空间,以提高LLM对DDIE预测的准确性。借助Few-Shot学习,与基线相比,我们取得了显著的改进。该方法为面向DDIE预测的科学知识学习建立了一个有效的框架。
摘要:Drug-drug interaction event (DDIE) prediction is crucial for preventing adverse reactions and ensuring optimal therapeutic outcomes. However, existing methods often face challenges with imbalanced datasets, complex interaction mechanisms, and poor generalization to unknown drug combinations. To address these challenges, we propose a knowledge augmentation framework that adaptively infuses prior drug knowledge into a large language model (LLM). This framework utilizes reinforcement learning techniques to facilitate adaptive knowledge extraction and synthesis, thereby efficiently optimizing the strategy space to enhance the accuracy of LLMs for DDIE predictions. As a result of few-shot learning, we achieved a notable improvement compared to the baseline. This approach establishes an effective framework for scientific knowledge learning for DDIE predictions.
【6】Adaptive Diffusion Posterior Sampling for Data and Model Fusion of Complex Nonlinear Dynamical Systems
标题:复杂非线性动力系统数据与模型融合的自适应扩散后验抽样
链接:https://arxiv.org/abs/2603.12635
作者:Dibyajyoti Chakraborty,Hojin Kim,Romit Maulik
摘要:混沌、高维非线性动力系统的高保真数值模拟在计算上是昂贵的,需要开发高效的代理模型。这类系统的大多数代理模型都是确定性的,例如采用神经算子时。然而,确定性模型往往无法捕捉混沌系统内在的分布不确定性。这项工作提出了一种利用生成式机器学习的代理建模公式,其中深度学习扩散模型用于在长时间范围内对湍流进行概率预测。我们引入了一个多步自回归扩散目标,与标准单步训练相比,它显著增强了长程滚动预测的稳定性。为了处理复杂的非结构化几何形状,我们利用多尺度图Transformer架构,结合扩散预条件化和体素网格池化。更重要的是,我们的建模框架提供了一个统一的平台,还可以通过不确定性估计或误差估计模块预测对传感器布置在时空上重要的位置。最后,使用扩散后验采样同化这些动态变化的传感器位置上的真实状态观测,无需重新训练代理模型。我们在二维均匀各向同性湍流和后向台阶绕流上展示了我们的方法,证明了其在高维混沌系统的预测、自适应传感器布置和数据同化方面的实用性。
摘要:High-fidelity numerical simulations of chaotic, high dimensional nonlinear dynamical systems are computationally expensive, necessitating the development of efficient surrogate models. Most surrogate models for such systems are deterministic, for example when neural operators are involved. However, deterministic models often fail to capture the intrinsic distributional uncertainty of chaotic systems. This work presents a surrogate modeling formulation that leverages generative machine learning, where a deep learning diffusion model is used to probabilistically forecast turbulent flows over long horizons. We introduce a multi-step autoregressive diffusion objective that significantly enhances long-rollout stability compared to standard single-step training. To handle complex, unstructured geometries, we utilize a multi-scale graph transformer architecture incorporating diffusion preconditioning and voxel-grid pooling. More importantly, our modeling framework provides a unified platform that also predicts spatiotemporally important locations for sensor placement, either via uncertainty estimates or through an error-estimation module. Finally, the observations of the ground truth state at these dynamically varying sensor locations are assimilated using diffusion posterior sampling requiring no retraining of the surrogate model. We present our methodology on two-dimensional homogeneous and isotropic turbulence and for a flow over a backwards-facing step, demonstrating its utility in forecasting, adaptive sensor placement, and data assimilation for high dimensional chaotic systems.
【7】NeuroLoRA: Context-Aware Neuromodulation for Parameter-Efficient Multi-Task Adaptation
标题:NeuroLoRA:上下文感知神经调节,实现参数高效的多任务适应
链接:https://arxiv.org/abs/2603.12378
作者:Yuxin Yang,Haoran Zhang,Mingxuan Li,Jiachen Xu,Ruoxi Shen,Zhenyu Wang,Tianhao Liu,Siqi Chen,Weilin Huang
备注:work in progress
摘要:参数高效微调(PEFT)技术,特别是低秩自适应(LoRA),已经成为使大型语言模型(LLM)适应下游任务的关键。虽然最近的FlyLoRA框架成功地利用受生物启发的稀疏随机投影来减轻参数干扰,但它依赖于一种静态的、基于幅度的路由机制,该机制与输入上下文无关。在本文中,我们提出了NeuroLoRA,一种受生物神经调制(基于上下文动态调节神经元兴奋性)启发的、基于混合专家(MoE)的新型LoRA框架。NeuroLoRA保留了冻结随机投影的计算效率,同时引入了一个轻量级、可学习的神经调制门,在专家选择之前根据上下文重新缩放投影空间。我们进一步提出了对比正交性损失,显式地强制专家子空间之间的分离,提高任务解耦和持续学习能力。在MMLU、GSM8K和ScienceQA上的大量实验表明,NeuroLoRA在单任务适应、多任务模型合并和顺序持续学习场景中始终优于FlyLoRA和其他强基线,同时保持相当的参数效率。
摘要:Parameter-Efficient Fine-Tuning (PEFT) techniques, particularly Low-Rank Adaptation (LoRA), have become essential for adapting Large Language Models (LLMs) to downstream tasks. While the recent FlyLoRA framework successfully leverages bio-inspired sparse random projections to mitigate parameter interference, it relies on a static, magnitude-based routing mechanism that is agnostic to input context. In this paper, we propose NeuroLoRA, a novel Mixture-of-Experts (MoE) based LoRA framework inspired by biological neuromodulation -- the dynamic regulation of neuronal excitability based on context. NeuroLoRA retains the computational efficiency of frozen random projections while introducing a lightweight, learnable neuromodulation gate that contextually rescales the projection space prior to expert selection. We further propose a Contrastive Orthogonality Loss to explicitly enforce separation between expert subspaces, enhancing both task decoupling and continual learning capacity. Extensive experiments on MMLU, GSM8K, and ScienceQA demonstrate that NeuroLoRA consistently outperforms FlyLoRA and other strong baselines across single-task adaptation, multi-task model merging, and sequential continual learning scenarios, while maintaining comparable parameter efficiency.
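The routing mechanism described above (frozen random projection, then a contextual gate that rescales the routing space before expert selection) can be sketched as follows. The gate form (a sigmoid of a learned per-dimension score times a context scalar) and all dimensions are illustrative assumptions, not the paper's architecture:

```python
import math
import random

# Schematic of the NeuroLoRA routing idea from the abstract: a frozen
# random projection maps the input into an expert-routing space, and a
# lightweight context-dependent gate rescales that space before the top
# expert is selected. Gate form and dimensions are illustrative.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def route(x, frozen_proj, gate_weights):
    # frozen random projection into the expert-routing space
    h = [sum(w * xi for w, xi in zip(row, x)) for row in frozen_proj]
    # neuromodulation: contextual rescaling of each routing dimension
    context = sum(x) / len(x)                       # crude context summary
    g = [sigmoid(gw * context) for gw in gate_weights]
    scores = [gi * hi for gi, hi in zip(g, h)]
    return max(range(len(scores)), key=lambda i: scores[i])  # chosen expert

rng = random.Random(0)
proj = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]  # 3 experts
expert = route([1.0, 0.5, -0.5, 0.25], proj, gate_weights=[1.0, -1.0, 0.5])
```

Only `gate_weights` would be trained; the projection stays frozen, which is how the design keeps FlyLoRA's efficiency while making routing context-sensitive.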
【8】DART: Input-Difficulty-AwaRe Adaptive Threshold for Early-Exit DNNs
标题:DART:早期退出DNN的输入-困难-AwarRe自适应阈值
链接:https://arxiv.org/abs/2603.12269
作者:Parth Patne,Mahdi Taheri,Christian Herglotz,Maksim Jenihhin,Milos Krstic,Michael Hübner
摘要:早退出深度神经网络通过在达到足够置信度时终止计算来实现自适应推理,从而降低资源受限环境中边缘AI加速器的成本。然而,现有的方法依赖于次优的退出策略,忽略输入难度,并独立地优化阈值。本文介绍了DART(输入难度感知自适应阈值),一个克服了这些限制的框架。DART引入了三项关键创新:(1)以最小计算开销量化输入复杂性的轻量级难度估计模块,(2)基于动态规划的联合退出策略优化算法,以及(3)自适应系数管理系统。在多种DNN基准(AlexNet、ResNet-18、VGG-16)上的实验表明,与静态网络相比,DART实现了高达3.3倍的加速、5.1倍的能耗降低和高达42%的平均功耗降低,同时保持了有竞争力的准确性。将DART扩展到Vision Transformers(LeViT)带来了功耗(5.0倍)和执行时间(3.6倍)方面的收益,但也带来了准确性损失(高达17%),这凸显了对transformer专用提前退出机制的需求。我们进一步引入了难度感知效率得分(DAES)这一新的多目标指标,在该指标下,DART比基线最多提高14.8,突显了在准确性、效率和鲁棒性之间的卓越权衡。
摘要:Early-exit deep neural networks enable adaptive inference by terminating computation when sufficient confidence is achieved, reducing cost for edge AI accelerators in resource-constrained settings. Existing methods, however, rely on suboptimal exit policies, ignore input difficulty, and optimize thresholds independently. This paper introduces DART (Input-Difficulty-Aware Adaptive Threshold), a framework that overcomes these limitations. DART introduces three key innovations: (1) a lightweight difficulty estimation module that quantifies input complexity with minimal computational overhead, (2) a joint exit policy optimization algorithm based on dynamic programming, and (3) an adaptive coefficient management system. Experiments on diverse DNN benchmarks (AlexNet, ResNet-18, VGG-16) demonstrate that DART achieves up to \textbf{3.3$\times$} speedup, \textbf{5.1$\times$} lower energy, and up to \textbf{42\%} lower average power compared to static networks, while preserving competitive accuracy. Extending DART to Vision Transformers (LeViT) yields power (5.0$\times$) and execution-time (3.6$\times$) gains but also accuracy loss (up to 17 percent), underscoring the need for transformer-specific early-exit mechanisms. We further introduce the Difficulty-Aware Efficiency Score (DAES), a novel multi-objective metric, under which DART achieves up to a 14.8 improvement over baselines, highlighting superior accuracy, efficiency, and robustness trade-offs.
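The difficulty-aware idea above amounts to adjusting each exit head's confidence threshold by an estimated input difficulty, so easy inputs leave early and hard inputs run deeper. A minimal inference sketch (the linear difficulty adjustment and all numbers are illustrative assumptions, not DART's policy):

```python
# Minimal early-exit inference sketch in the spirit of DART: a per-exit
# confidence threshold is tightened by an estimated input difficulty, so
# easy inputs exit early while hard inputs continue to deeper stages.
# The linear adjustment and coefficient are illustrative assumptions.

def early_exit(confidences, base_thresholds, difficulty, coeff=0.1):
    """Return (exit_index, confidence); the last stage always exits."""
    for i, (conf, base) in enumerate(zip(confidences, base_thresholds)):
        threshold = min(1.0, base + coeff * difficulty)  # harder => stricter
        if conf >= threshold:
            return i, conf
    return len(confidences) - 1, confidences[-1]

# Easy input: confident at the first head, so it exits immediately.
easy = early_exit([0.95, 0.99], [0.9, 0.8], difficulty=0.0)
# Hard input: the raised threshold pushes it past the first exit.
hard = early_exit([0.7, 0.99], [0.9, 0.8], difficulty=1.0)
```

DART's actual contribution is to optimize the base thresholds jointly via dynamic programming and to manage the coefficient adaptively, rather than hand-tuning each exit as in this toy.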
强化学习(5篇)
【1】PISmith: Reinforcement Learning-based Red Teaming for Prompt Injection Defenses
标题:PISmith:针对提示注入防御的基于强化学习的红队框架
链接:https://arxiv.org/abs/2603.13026
作者:Chenlong Yin,Runpeng Geng,Yanting Wang,Jinyuan Jia
备注:26 pages, 3 figures
摘要:提示注入对现实世界的LLM应用程序(特别是自主代理)构成严重的安全风险。虽然已经提出了许多防御措施,但它们对自适应攻击的鲁棒性仍未得到充分评估,可能会造成虚假的安全感。在这项工作中,我们提出了PISmith,一个基于强化学习(RL)的红队框架,通过在实际黑盒设置中训练攻击LLM来优化注入提示,系统地评估现有的提示注入防御;在该设置中,攻击者只能查询被防御的LLM并观察其输出。我们发现,直接应用标准GRPO来攻击强防御会导致次优性能,原因是极端的奖励稀疏性:大多数生成的注入提示被防御阻止,导致策略的熵在发现有效攻击策略之前崩溃,而罕见的成功无法被有效学习。对此,我们引入了自适应熵正则化和动态优势加权,以维持探索并放大从稀缺成功中的学习。对13个基准的广泛评估表明,最先进的提示注入防御仍然容易受到自适应攻击。我们还将PISmith与静态、基于搜索和基于RL的攻击类别中的7个基线进行了比较,结果表明PISmith始终实现最高的攻击成功率。此外,PISmith在InjecAgent和AgentDojo的代理设置中对开源和闭源LLM(例如GPT-4o-mini和GPT-5-nano)均实现了强大的性能。我们的代码可在https://github.com/albert-y1n/PISmith上获得。
摘要:Prompt injection poses serious security risks to real-world LLM applications, particularly autonomous agents. Although many defenses have been proposed, their robustness against adaptive attacks remains insufficiently evaluated, potentially creating a false sense of security. In this work, we propose PISmith, a reinforcement learning (RL)-based red-teaming framework that systematically assesses existing prompt-injection defenses by training an attack LLM to optimize injected prompts in a practical black-box setting, where the attacker can only query the defended LLM and observe its outputs. We find that directly applying standard GRPO to attack strong defenses leads to sub-optimal performance due to extreme reward sparsity -- most generated injected prompts are blocked by the defense, causing the policy's entropy to collapse before discovering effective attack strategies, while the rare successes cannot be learned effectively. In response, we introduce adaptive entropy regularization and dynamic advantage weighting to sustain exploration and amplify learning from scarce successes. Extensive evaluation on 13 benchmarks demonstrates that state-of-the-art prompt injection defenses remain vulnerable to adaptive attacks. We also compare PISmith with 7 baselines across static, search-based, and RL-based attack categories, showing that PISmith consistently achieves the highest attack success rates. Furthermore, PISmith achieves strong performance in agentic settings on InjecAgent and AgentDojo against both open-source and closed-source LLMs (e.g., GPT-4o-mini and GPT-5-nano). Our code is available at https://github.com/albert-y1n/PISmith.
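其中“动态优势加权”的核心思想(组内归一化优势,并放大稀缺成功样本的权重)可以用如下示意代码表达(假设性草图,非PISmith实现;success_boost为虚构参数):

```python
def group_advantages(rewards, success_boost=3.0):
    """GRPO-style group-relative advantages with dynamic weighting:
    normalize rewards within a rollout group, then amplify the rare
    successful rollouts so sparse successes are not washed out."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5 or 1.0
    adv = [(r - mean) / std for r in rewards]
    return [a * success_boost if r > 0 else a for a, r in zip(adv, rewards)]

# One success among four mostly-blocked attempts dominates the update direction.
print(group_advantages([0.0, 0.0, 0.0, 1.0]))
```

论文中该加权与自适应熵正则化配合使用,以在奖励极稀疏时维持探索;此处只示意优势加权一侧。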
【2】ARL-Tangram: Unleash the Resource Efficiency in Agentic Reinforcement Learning
标题:ARL-Tangram:释放智能体强化学习中的资源效率
链接:https://arxiv.org/abs/2603.13019
作者:Bangjun Xiao,Yihao Zhao,Xiangwei Deng,Shihua Yu,Yuxing Xiang,Huaqiu Liu,Qiying Wang,Liang Zhao,Hailin Zhang,Xuanzhe Liu,Xin Jin,Fuli Luo
摘要:智能体强化学习(RL)已经成为云集群中的一种变革性工作负载,使大型语言模型(LLM)能够通过与现实世界的交互来解决复杂问题。然而,与传统RL不同,智能体RL需要大量的外部云资源,例如,用于代码执行的CPU和用于奖励模型的GPU,它们存在于主训练集群之外。现有的智能体RL框架通常依赖于静态过度供应,即,资源往往与长期存在的轨迹绑定,或按任务隔离,这导致严重的资源效率低下。 我们提出了动作级编排,并将其纳入ARL-Tangram,一个统一的资源管理系统,实现细粒度的外部资源共享和弹性。ARL-Tangram利用统一的动作级形式化和弹性调度算法,在满足异构资源约束的同时最小化动作完成时间(ACT)。此外,针对具有异构特性和拓扑的资源定制了异构资源管理器,以高效支持动作级执行。在真实的智能体RL任务上的评估表明,ARL-Tangram将平均ACT改善最多4.3倍,将RL训练的每步耗时加快最多1.5倍,并节省最多71.2%的外部资源。该系统已部署用于支持MiMo系列模型的训练。
摘要:Agentic reinforcement learning (RL) has emerged as a transformative workload in cloud clusters, enabling large language models (LLMs) to solve complex problems through interactions with the real world. However, unlike traditional RL, agentic RL demands substantial external cloud resources, e.g., CPUs for code execution and GPUs for reward models, that exist outside the primary training cluster. Existing agentic RL frameworks typically rely on static over-provisioning, i.e., resources are often tied to long-lived trajectories or isolated by tasks, which leads to severe resource inefficiency. We propose action-level orchestration, and incorporate it into ARL-Tangram, a unified resource management system that enables fine-grained external resource sharing and elasticity. ARL-Tangram utilizes a unified action-level formulation and an elastic scheduling algorithm to minimize action completion time (ACT) while satisfying heterogeneous resource constraints. Further, heterogeneous resource managers are tailored to efficiently support the action-level execution on resources with heterogeneous characteristics and topologies. Evaluation on real-world agentic RL tasks demonstrates that ARL-Tangram improves average ACT by up to 4.3$\times$, speeds up the step duration of RL training by up to 1.5$\times$, and saves the external resources by up to 71.2$\%$. This system has been deployed to support the training of the MiMo series models.
【3】Swap-guided Preference Learning for Personalized Reinforcement Learning from Human Feedback
标题:用于基于人类反馈的个性化强化学习的交换引导偏好学习
链接:https://arxiv.org/abs/2603.12595
作者:Gihoon Kim,Euntai Kim
备注:ICLR 2026
摘要:基于人类反馈的强化学习(RLHF)是一种广泛使用的方法,用于将大规模AI系统与人类价值观对齐。然而,RLHF通常假设单一、普适的奖励,这忽略了多样化的偏好并限制了个性化。变分偏好学习(VPL)试图通过引入用户特定的潜变量来解决这个问题。尽管前景可期,我们发现VPL存在后验坍塌问题。虽然这种现象在VAE中是众所周知的,但以前在偏好学习框架中尚未被发现。在稀疏偏好数据和表达能力过强的解码器下,VPL可能导致潜变量被忽略,退化为单一奖励模型。为了克服这一局限性,我们提出了交换引导偏好学习(SPL)。其关键思想是构造虚构的交换标注者,并使用其偏好的镜像属性来引导编码器。SPL引入了三个组件:(1)交换引导的基础正则化,(2)偏好逆自回归流(P-IAF),以及(3)自适应潜变量条件化。实验表明,SPL缓解了坍塌,丰富了用户特定的潜变量,并提高了偏好预测性能。我们的代码和数据可在https://github.com/cobang0111/SPL上获得
摘要:Reinforcement Learning from Human Feedback (RLHF) is a widely used approach to align large-scale AI systems with human values. However, RLHF typically assumes a single, universal reward, which overlooks diverse preferences and limits personalization. Variational Preference Learning (VPL) seeks to address this by introducing user-specific latent variables. Despite its promise, we found that VPL suffers from posterior collapse. While this phenomenon is well known in VAEs, it has not previously been identified in preference learning frameworks. Under sparse preference data and with overly expressive decoders, VPL may cause latent variables to be ignored, reverting to a single-reward model. To overcome this limitation, we propose Swap-guided Preference Learning (SPL). The key idea is to construct fictitious swap annotators and use the mirroring property of their preferences to guide the encoder. SPL introduces three components: (1) swap-guided base regularization, (2) Preferential Inverse Autoregressive Flow (P-IAF), and (3) adaptive latent conditioning. Experiments show that SPL mitigates collapse, enriches user-specific latents, and improves preference prediction. Our code and data are available at https://github.com/cobang0111/SPL
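“交换标注者”的镜像属性可以用Bradley-Terry偏好模型做一个玩具演示(仅为示意,非SPL实现):

```python
import math

def pref_prob(r_a, r_b):
    """Bradley-Terry probability that item A is preferred over item B,
    given scalar rewards r_a and r_b."""
    return 1.0 / (1.0 + math.exp(-(r_a - r_b)))

def mirror_regularizer(p_user, p_swap):
    """A fictitious swap annotator labels every pair in reverse, so its
    predicted preference should be the complement of the user's:
    p_swap ~= 1 - p_user. Penalize deviation from this identity."""
    return (p_swap + p_user - 1.0) ** 2

p = pref_prob(1.0, 0.0)              # user prefers A
print(round(p, 4))                   # ~0.7311
print(mirror_regularizer(p, 1 - p))  # effectively 0 (perfect mirror)
```

SPL将此类镜像一致性作为编码器的引导信号,以防止潜变量被解码器忽略;上例只展示镜像恒等式本身。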
【4】CALF: Communication-Aware Learning Framework for Distributed Reinforcement Learning
标题:CALF:用于分布式强化学习的通信感知学习框架
链接:https://arxiv.org/abs/2603.12543
作者:Carlos Purves,Pietro Lio'
摘要:分布式强化学习策略在跨边缘设备和云服务器部署时面临网络延迟、抖动和数据包丢失。标准RL训练假设零延迟交互,在现实网络条件下导致严重的性能下降。我们引入了CALF(通信感知学习框架),它在仿真过程中基于真实的网络模型训练策略。系统的实验表明,与网络无关的基线相比,网络感知训练大大缩小了部署性能差距。跨异构硬件的分布式策略部署验证了在训练期间显式建模通信约束能够实现鲁棒的现实世界执行。这些发现将网络条件确立为类Wi-Fi分布式部署中模拟到现实迁移的一个主要维度,与物理和视觉域随机化形成互补。
摘要:Distributed reinforcement learning policies face network delays, jitter, and packet loss when deployed across edge devices and cloud servers. Standard RL training assumes zero-latency interaction, causing severe performance degradation under realistic network conditions. We introduce CALF (Communication-Aware Learning Framework), which trains policies under realistic network models during simulation. Systematic experiments demonstrate that network-aware training substantially reduces deployment performance gaps compared to network-agnostic baselines. Distributed policy deployments across heterogeneous hardware validate that explicitly modelling communication constraints during training enables robust real-world execution. These findings establish network conditions as a major axis of sim-to-real transfer for Wi-Fi-like distributed deployments, complementing physics and visual domain randomisation.
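“在训练中建模网络条件”的做法可以用一个环境包装器来示意:动作经过固定延迟的队列并可能被丢弃(假设性草图,非CALF实现;EchoEnv为虚构的玩具环境):

```python
import random
from collections import deque

class DelayedActionEnv:
    """Wrap an environment so actions arrive after `delay` steps, with a
    drop probability emulating packet loss on a Wi-Fi-like link."""
    def __init__(self, env, delay=2, drop_prob=0.0, default_action=0, seed=0):
        self.env = env
        self.drop_prob = drop_prob
        self.default_action = default_action
        self.rng = random.Random(seed)
        # Pre-fill so the first `delay` steps apply the default action.
        self.queue = deque([default_action] * delay)

    def step(self, action):
        if self.rng.random() < self.drop_prob:  # lost packet: action never arrives
            action = self.default_action
        self.queue.append(action)
        applied = self.queue.popleft()          # the action issued `delay` steps ago
        return self.env.step(applied)

class EchoEnv:  # toy env that just reports which action was applied
    def step(self, action):
        return action

env = DelayedActionEnv(EchoEnv(), delay=2)
print([env.step(a) for a in [5, 7, 9]])  # -> [0, 0, 5]
```

在此类包装环境中训练的策略必须学会补偿延迟与丢包,这正是摘要所述网络感知训练缩小部署差距的机制。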
【5】Thermodynamics of Reinforcement Learning Curricula
标题:强化学习课程的热力学
链接:https://arxiv.org/abs/2603.12324
作者:Jacob Adamczyk,Juan Sebastian Rojas,Rahul V. Kulkarni
备注:Accepted at SciForDL Workshop at ICLR 2026
摘要:统计力学和机器学习之间的联系已经多次被证明是富有成效的,为优化,泛化和表示学习提供了深入的见解。在这项工作中,我们遵循这一传统,利用非平衡态热力学的结果来形式化强化学习(RL)中的课程学习。特别是,我们提出了一个几何框架RL解释奖励参数作为任务流形上的坐标。我们表明,通过最大限度地减少多余的热力学工作,最佳课程对应于测地线在这个任务空间。作为这个框架的应用,我们提供了一个算法,“MEW”(最小超额工作),推导出最大熵RL中温度退火的原则性时间表。
摘要:Connections between statistical mechanics and machine learning have repeatedly proven fruitful, providing insight into optimization, generalization, and representation learning. In this work, we follow this tradition by leveraging results from non-equilibrium thermodynamics to formalize curriculum learning in reinforcement learning (RL). In particular, we propose a geometric framework for RL by interpreting reward parameters as coordinates on a task manifold. We show that, by minimizing the excess thermodynamic work, optimal curricula correspond to geodesics in this task space. As an application of this framework, we provide an algorithm, "MEW" (Minimum Excess Work), to derive a principled schedule for temperature annealing in maximum-entropy RL.
分层学习(1篇)
【1】Wear Classification of Abrasive Flap Wheels using a Hierarchical Deep Learning Approach
标题:使用分层深度学习方法对磨粒瓣轮进行磨损分类
链接:https://arxiv.org/abs/2603.12852
作者:Falko Kähler,Maxim Wille,Ole Schmedemann,Thorsten Schüppstuhl
备注:14 pages, 11 figures, 8 tables
摘要:由于其灵活性,磨料瓣轮通常用于精加工复杂的自由曲面。然而,这种灵活性会导致复杂的磨损模式,如凹/凸瓣轮廓或瓣撕裂,这会影响磨削结果。本文提出了一种新的、基于视觉的层次分类框架,以自动化瓣轮磨损状态监测。与整体式分类方法不同,我们将问题分解为三个逻辑级别:(1)状态检测(新与磨损),(2)磨损类型识别(矩形、凹面、凸面)和瓣撕裂检测,以及(3)严重程度评估(部分与完全变形)。我们构建了一个真实瓣轮图像的定制数据集,并使用了基于EfficientNetV2架构的迁移学习方法。结果表明该方法具有很高的鲁棒性,分类准确率从93.8%(瓣撕裂)到99.3%(凹面严重程度)不等。此外,梯度加权类激活映射(Grad-CAM)被用来验证模型学习到了物理上相关的特征并检查错误分类。所提出的分层方法为自动化瓣轮磨削中的自适应过程控制和磨损考虑提供了基础。
摘要:Abrasive flap wheels are common for finishing complex free-form surfaces due to their flexibility. However, this flexibility results in complex wear patterns such as concave/convex flap profiles or flap tears, which influence the grinding result. This paper proposes a novel, vision-based hierarchical classification framework to automate the wear condition monitoring of flap wheels. Unlike monolithic classification approaches, we decompose the problem into three logical levels: (1) state detection (new vs. worn), (2) wear type identification (rectangular, concave, convex) and flap tear detection, and (3) severity assessment (partial vs. complete deformation). A custom-built dataset of real flap wheel images was generated and a transfer learning approach with EfficientNetV2 architecture was used. The results demonstrate high robustness with classification accuracies ranging from 93.8% (flap tears) to 99.3% (concave severity). Furthermore, Gradient-weighted Class Activation Mapping (Grad-CAM) is utilized to validate that the models learn physically relevant features and examine false classifications. The proposed hierarchical method provides a basis for adaptive process control and wear consideration in automated flap wheel grinding.
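三级级联的调度逻辑可示意如下(假设性草图:各级模型以占位函数代替,且假定严重程度仅适用于凹/凸变形):

```python
def classify_wear(state_model, type_model, severity_model, image):
    """Three-level cascade mirroring the paper's decomposition: later
    stages run only when the earlier decision requires them. The stage
    models are placeholders; restricting severity to deformed (concave /
    convex) profiles is an illustrative assumption."""
    if state_model(image) == "new":
        return ("new",)
    wear_type = type_model(image)          # rectangular / concave / convex / tear
    if wear_type in ("concave", "convex"):
        return ("worn", wear_type, severity_model(image))
    return ("worn", wear_type)

result = classify_wear(lambda im: "worn", lambda im: "concave",
                       lambda im: "partial", image=None)
print(result)  # -> ('worn', 'concave', 'partial')
```

这种级联让每个子模型只需解决一个更简单的判别问题,并且后级仅在前级判定需要时才被调用。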
医学相关(2篇)
【1】Accelerating Stroke MRI with Diffusion Probabilistic Models through Large-Scale Pre-training and Target-Specific Fine-Tuning
标题:通过大规模预训练和针对目标应用的微调,用扩散概率模型加速卒中MRI
链接:https://arxiv.org/abs/2603.13007
作者:Yamin Arefeen,Sidharth Kumar,Steven Warach,Hamidreza Saber,Jonathan Tamir
摘要:目的:开发一种数据高效策略,用于使用扩散概率生成模型(DPM)加速MRI重建,在只有有限的完全采样数据样本可用时,在临床卒中MRI中实现更快的扫描时间。 研究方法:我们的简单训练策略受到基础模型范式的启发,首先在fastMRI中对大量不同的公开可用的大脑MRI数据集进行训练,然后使用精心选择的学习率和微调持续时间对目标应用的小数据集进行微调。该方法在受控的fastMRI实验和带有设盲临床阅片人研究的临床卒中MRI数据上进行了评估。 结果:在大约4000个具有非FLAIR对比度的受试者上预训练并仅在来自20个目标受试者的FLAIR数据上微调的DPM,在多个加速因子下实现了与使用更多目标域FLAIR数据训练的模型相当的重建性能。实验表明,适度微调并降低学习率可提高性能,而微调不足或过度会降低重建质量。当应用于临床卒中MRI时,一项涉及两名神经放射科医生的设盲阅片人研究表明,使用所提出的方法从2倍加速数据重建的图像在图像质量和结构描绘方面不劣于标准治疗。 结论:大规模预训练结合有针对性的微调,可在数据受限的加速临床卒中MRI中实现基于DPM的MRI重建。所提出的方法大大减少了对大型应用特定数据集的需求,同时保持临床上可接受的图像质量,支持在目标应用中使用受基础模型启发的扩散模型进行加速MRI。
摘要:Purpose: To develop a data-efficient strategy for accelerated MRI reconstruction with Diffusion Probabilistic Generative Models (DPMs) that enables faster scan times in clinical stroke MRI when only limited fully-sampled data samples are available. Methods: Our simple training strategy, inspired by the foundation model paradigm, first trains a DPM on a large, diverse collection of publicly available brain MRI data in fastMRI and then fine-tunes on a small dataset from the target application using carefully selected learning rates and fine-tuning durations. The approach is evaluated on controlled fastMRI experiments and on clinical stroke MRI data with a blinded clinical reader study. Results: DPMs pre-trained on approximately 4000 subjects with non-FLAIR contrasts and fine-tuned on FLAIR data from only 20 target subjects achieve reconstruction performance comparable to models trained with substantially more target-domain FLAIR data across multiple acceleration factors. Experiments reveal that moderate fine-tuning with a reduced learning rate yields improved performance, while insufficient or excessive fine-tuning degrades reconstruction quality. When applied to clinical stroke MRI, a blinded reader study involving two neuroradiologists indicates that images reconstructed using the proposed approach from $2 \times$ accelerated data are non-inferior to standard-of-care in terms of image quality and structural delineation. Conclusion: Large-scale pre-training combined with targeted fine-tuning enables DPM-based MRI reconstruction in data-constrained, accelerated clinical stroke MRI. The proposed approach substantially reduces the need for large application-specific datasets while maintaining clinically acceptable image quality, supporting the use of foundation-inspired diffusion models for accelerated MRI in targeted applications.
【2】Unmasking Biases and Reliability Concerns in Convolutional Neural Networks Analysis of Cancer Pathology Images
标题:揭开癌症病理图像卷积神经网络分析中的偏见和可靠性问题
链接:https://arxiv.org/abs/2603.12445
作者:Michael Okonoda,Eder Martinez,Abhilekha Dalal,Lior Shamir
备注:Electronics, published
摘要:卷积神经网络在从X光照片中识别不同类型的癌症方面表现出了很好的效果。然而,CNN的不透明性使得人们很难完全理解它们的运作方式,从而将它们的评估限制在经验评估上。在这里,我们研究了标准实践的合理性,通过这些标准实践,CNN被评估用于癌症病理学的目的。使用四种常见的CNN架构和不同类型的癌症,如黑色素瘤、癌、结直肠癌和肺癌,分析了13个高度使用的癌症基准数据集。我们将每个模型的准确性与由原始图像背景中不包含临床相关内容的裁剪片段组成的数据集进行了比较。因为渲染的数据集不包含临床信息,所以零假设是CNN在对这些数据集进行分类时应该提供仅仅基于机会的准确性。结果表明,CNN模型在使用裁剪片段时提供了高准确性,有时高达93%,即使它们缺乏生物医学信息。这些结果表明,一些CNN架构比其他架构对偏置更敏感。分析表明,机器学习评估的常见做法在应用于癌症病理学时可能会导致不可靠的结果。这些偏差很难识别,并且可能会误导研究人员,因为他们使用可用的基准数据集来测试CNN方法的有效性。
摘要:Convolutional Neural Networks have shown promising effectiveness in identifying different types of cancer from radiographs. However, the opaque nature of CNNs makes it difficult to fully understand the way they operate, limiting their assessment to empirical evaluation. Here we study the soundness of the standard practices by which CNNs are evaluated for the purpose of cancer pathology. Thirteen highly used cancer benchmark datasets were analyzed, using four common CNN architectures and different types of cancer, such as melanoma, carcinoma, colorectal cancer, and lung cancer. We compared the accuracy of each model with that of datasets made of cropped segments from the background of the original images that do not contain clinically relevant content. Because the rendered datasets contain no clinical information, the null hypothesis is that the CNNs should provide mere chance-based accuracy when classifying these datasets. The results show that the CNN models provided high accuracy when using the cropped segments, sometimes as high as 93%, even though they lacked biomedical information. These results show that some CNN architectures are more sensitive to bias than others. The analysis shows that the common practices of machine learning evaluation might lead to unreliable results when applied to cancer pathology. These biases are very difficult to identify, and might mislead researchers as they use available benchmark datasets to test the efficacy of CNN methods.
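论文的对照实验思路(用不含临床内容的背景裁剪块测试分类器,并与随机猜测基线比较)可示意如下(玩具示例,非原实验代码):

```python
def corner_crop(image, size):
    """Cut a size x size patch from the top-left corner, standing in for
    background regions with no clinically relevant content."""
    return [row[:size] for row in image[:size]]

def chance_accuracy(n_classes):
    """Null hypothesis: a classifier seeing no clinical signal should be
    right about 1/n_classes of the time; accuracy far above this on
    background crops indicates the model exploits dataset bias."""
    return 1.0 / n_classes

image = [[p + 10 * r for p in range(6)] for r in range(6)]  # toy 6x6 "scan"
patch = corner_crop(image, 2)
print(patch)                 # -> [[0, 1], [10, 11]]
print(chance_accuracy(4))    # -> 0.25
```

论文正是在这样的零假设下观察到高达93%的背景块分类准确率,从而揭示了基准数据集中的系统性偏差。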
蒸馏|知识提取(1篇)
【1】NanoVDR: Distilling a 2B Vision-Language Retriever into a 70M Text-Only Encoder for Visual Document Retrieval
标题:NanoVDR:将2B视觉语言检索器蒸馏为用于视觉文档检索的70M纯文本编码器
链接:https://arxiv.org/abs/2603.12824
作者:Zhuchenyang Liu,Yao Zhang,Yu Xiao
摘要:基于视觉语言模型(VLM)的检索器已将视觉文档检索(VDR)推进到令人印象深刻的质量。它们在文档索引和查询编码时需要同一个数十亿参数的编码器,即使对于纯文本查询也会产生高延迟和GPU依赖。我们观察到,这种设计是不必要的对称:文档在视觉上很复杂,需要强大的视觉理解能力,而查询只是简短的文本字符串。NanoVDR通过解耦两条编码路径来利用这种查询-文档不对称性:冻结的2B VLM教师离线索引文档,而小至69M参数的蒸馏纯文本学生在推理时编码查询。关键的设计选择是蒸馏目标。通过在三个主干和22个ViDoRe基准数据集上对六种目标的系统比较,我们发现查询文本上的逐点余弦对齐始终优于基于排序和对比的替代方案,同时只需要预缓存的教师查询嵌入,并且在训练过程中不需要处理文档。此外,我们将跨语言迁移确定为主要的性能瓶颈,并通过使用机器翻译的查询扩充训练数据来低成本地解决它。由此产生的NanoVDR-S-Multi(DistilBERT,69M)保留了95.1%的教师质量,并以少32倍的参数和低50倍的CPU查询延迟在v2和v3上优于DSE-Qwen2(2B),总训练成本低于13 GPU小时。
摘要:Vision-Language Model (VLM) based retrievers have advanced visual document retrieval (VDR) to impressive quality. They require the same multi-billion parameter encoder for both document indexing and query encoding, incurring high latency and GPU dependence even for plain-text queries. We observe that this design is unnecessarily symmetric: documents are visually complex and demand strong visual understanding, whereas queries are just short text strings. NanoVDR exploits this query--document asymmetry by decoupling the two encoding paths: a frozen 2B VLM teacher indexes documents offline, while a distilled text-only student as small as 69M parameters encodes queries at inference. The key design choice is the distillation objective. Through systematic comparison of six objectives across three backbones and 22 ViDoRe benchmark datasets, we find that pointwise cosine alignment on query text consistently outperforms ranking-based and contrastive alternatives, while requiring only pre-cached teacher query embeddings and no document processing during training. Furthermore, we identify cross-lingual transfer as the primary performance bottleneck, and resolve it cheaply by augmenting training data with machine-translated queries. The resulting NanoVDR-S-Multi (DistilBERT, 69M) retains 95.1\% of teacher quality and outperforms DSE-Qwen2 (2B) on v2 and v3 with 32$\times$ fewer parameters and 50$\times$ lower CPU query latency, at a total training cost under 13 GPU-hours.
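论文发现最有效的“逐点余弦对齐”蒸馏目标可示意如下(示意实现;仅需预缓存的教师嵌入,无需文档或排序列表):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def pointwise_cosine_loss(student_emb, cached_teacher_emb):
    """Distillation objective: align each student query embedding with the
    pre-cached teacher embedding of the same query text."""
    return 1.0 - cosine(student_emb, cached_teacher_emb)

print(pointwise_cosine_loss([1.0, 0.0], [2.0, 0.0]))  # parallel -> 0.0
print(pointwise_cosine_loss([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 1.0
```

由于教师查询嵌入可离线缓存,训练阶段完全不接触文档侧,这正是该目标成本低于排序/对比式蒸馏的原因。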
推荐(2篇)
【1】Anchored Alignment: Preventing Positional Collapse in Multimodal Recommender Systems
标题:锚定对齐:防止多模式推荐系统中的位置崩溃
链接:https://arxiv.org/abs/2603.12726
作者:Yonghun Jeong,David Yoon Suk Kang,Yeon-Chang Lee
备注:5 pages, 5 figures
摘要:多模态推荐系统(MMRS)利用图像、文本和交互信号来丰富项目表示。然而,最近的对齐基于MMRS,强制一个统一的嵌入空间往往模糊模态特定的结构和加剧ID的优势。因此,我们提出了AnchorRec,一个多模态推荐框架,在一个轻量级的投影域中执行间接的,基于锚的对齐。通过将对齐与表示学习解耦,AnchorRec保留了每个模态的原生结构,同时保持跨模态一致性并避免位置崩溃。在四个Amazon数据集上的实验表明,AnchorRec实现了具有竞争力的前N推荐准确性,而定性分析表明改进了多模态表达性和一致性。AnchorRec的代码库可以在https://github.com/hun9008/AnchorRec上找到。
摘要:Multimodal recommender systems (MMRS) leverage images, text, and interaction signals to enrich item representations. However, recent alignment based MMRSs that enforce a unified embedding space often blur modality specific structures and exacerbate ID dominance. Therefore, we propose AnchorRec, a multimodal recommendation framework that performs indirect, anchor based alignment in a lightweight projection domain. By decoupling alignment from representation learning, AnchorRec preserves each modality's native structure while maintaining cross modal consistency and avoiding positional collapse. Experiments on four Amazon datasets show that AnchorRec achieves competitive top N recommendation accuracy, while qualitative analyses demonstrate improved multimodal expressiveness and coherence. The codebase of AnchorRec is available at https://github.com/hun9008/AnchorRec.
【2】A Holistic Framework for Automated Configuration Recommendation for Cloud Service Monitoring
标题:云服务监控自动配置推荐的整体框架
链接:https://arxiv.org/abs/2603.12268
作者:Anson Bastos,Shreeya Venneti,Anjaly Parayil,Ayush Choure,Chetan Bansal,Rujia Wang
摘要:大规模云服务的可靠性对于用户满意度和业务连续性至关重要。尽管在可靠性工程方面进行了大量投资,但生产事故仍然不可避免,通常会导致客户影响和运营开销。在大型云计算公司中,跨区域部署多种服务,需要强大的健康监控系统。然而,目前的监视器配置过程是手动的,主要是被动的和临时的,导致覆盖范围和冗余警报的差距。在本文中,我们对Microsoft中的监视器创建进行了全面的研究,确定了现有流程中的关键组件。我们进一步设计了一个模块化的推荐框架,处理图结构的服务实体,建议最佳的监视器配置。通过对历史数据的广泛实验和对Microsoft生产服务建议的用户研究,我们证明了我们的方法在为监视器配置提供相关建议方面的有效性。
摘要:Reliability of large-scale cloud services is critical for user satisfaction and business continuity. Despite significant investments in reliability engineering, production incidents remain inevitable, often leading to customer impact and operational overhead. In large cloud companies, multiple services are deployed across regions necessitating robust health monitoring systems. However, the current monitor configuration process is manual, largely reactive and ad hoc, resulting in gaps in coverage and redundant alerts. In this paper, we present a comprehensive study of monitor creation in Microsoft, identifying key components in the existing process. We further design a modular recommendation framework that processes the graph structured service entities to suggest optimal monitor configurations. Through extensive experimentation on historical data and user study of recommendations for production services at Microsoft, we demonstrate the efficacy of our approach in providing relevant recommendations for monitor configurations.
聚类(1篇)
【1】Federated Hierarchical Clustering with Automatic Selection of Optimal Cluster Numbers
标题:自动选择最优聚类数的联邦层次聚类
链接:https://arxiv.org/abs/2603.12684
作者:Yue Zhang,Chuanlong Qiu,Xinfa Liao,Yiqun Zhang
备注:29 pages, 7 figures
摘要:联邦聚类(FC)是一种新兴且有前景的解决方案,用于以无监督方式从分布式、隐私保护的数据中探索数据分布模式。现有的FC方法隐式地依赖于客户端具有已知数量的均匀大小的聚类的假设。然而,聚类的真实数量通常是未知的,并且聚类大小在真实场景中自然是不平衡的。此外,联邦学习中的隐私保护传输约束不可避免地减少了可用信息,使得开发鲁棒和准确的FC极具挑战性。因此,我们提出了一种新的FC框架,命名为Fed-$k^*$-HC,它可以基于通过层次聚类探索的数据分布自动确定最优聚类数$k^*$。为了获得用于确定$k^*$的全局数据分布,我们让每个客户端生成微型子簇。然后将它们的原型上传到服务器进行层次合并。基于密度的合并设计允许探索不同大小和形状的簇,并且渐进合并过程可以根据原型之间的相邻关系自终止以确定$k^*$。在不同数据集上的大量实验证明了所提出的Fed-$k^*$-HC在准确探索适当聚类数方面的FC能力。
摘要:Federated Clustering (FC) is an emerging and promising solution in exploring data distribution patterns from distributed and privacy-protected data in an unsupervised manner. Existing FC methods implicitly rely on the assumption that clients are with a known number of uniformly sized clusters. However, the true number of clusters is typically unknown, and cluster sizes are naturally imbalanced in real scenarios. Furthermore, the privacy-preserving transmission constraints in federated learning inevitably reduce usable information, making the development of robust and accurate FC extremely challenging. Accordingly, we propose a novel FC framework named Fed-$k^*$-HC, which can automatically determine an optimal number of clusters $k^*$ based on the data distribution explored through hierarchical clustering. To obtain the global data distribution for $k^*$ determination, we let each client generate micro-subclusters. Their prototypes are then uploaded to the server for hierarchical merging. The density-based merging design allows exploring clusters of varying sizes and shapes, and the progressive merging process can self-terminate according to the neighboring relationships among the prototypes to determine $k^*$. Extensive experiments on diverse datasets demonstrate the FC capability of the proposed Fed-$k^*$-HC in accurately exploring a proper number of clusters.
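原型的渐进合并与自终止可用一个一维玩具示例说明(示意草图:论文采用基于密度的合并,此处以带间隔阈值的单链接合并代替):

```python
def merge_prototypes(points, threshold):
    """Single-linkage merging of client prototypes (1-D toy): repeatedly
    merge the closest pair of clusters and self-terminate once the smallest
    inter-cluster gap exceeds `threshold`; the survivors give k*."""
    clusters = [[p] for p in points]

    def gap(c1, c2):
        return min(abs(a - b) for a in c1 for b in c2)

    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: gap(clusters[ij[0]], clusters[ij[1]]))
        if gap(clusters[i], clusters[j]) > threshold:
            break  # neighboring structure exhausted: stop merging
        clusters[i].extend(clusters.pop(j))
    return clusters

protos = [0.0, 0.1, 0.2, 5.0, 5.1]
clusters = merge_prototypes(protos, threshold=0.5)
print(len(clusters))  # estimated k* -> 2
```

关键在于终止条件由原型间的相邻关系自行触发,而非预先给定$k$;论文的密度判据起的正是这一作用。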
超分辨率|去噪|去模糊|去雾(1篇)
【1】Fractals made Practical: Denoising Diffusion as Partitioned Iterated Function Systems
标题:分形变得实用:作为分区迭代函数系统的去噪扩散
链接:https://arxiv.org/abs/2603.13069
作者:Ann Dooms
摘要:当扩散模型将噪声转化为照片时,它实际上在做什么?我们证明了确定性DDIM反向链作为分区迭代函数系统(PIFS)运行,并且该框架可作为去噪扩散模型的时间表、架构和训练目标的统一设计语言。从PIFS结构中我们得到了三个可计算的几何量:每步收缩阈值$L^*_t$、对角扩展函数$f_t(\lambda)$和全局扩展阈值$\lambda^{**}$。这些量不需要模型评估,并完整刻画了去噪动态。它们从结构上解释了扩散模型的两阶段行为:高噪声下通过弥散的跨块注意力进行全局上下文组装,以及低噪声下按严格方差顺序逐块释放抑制进行精细细节合成。自注意力是PIFS收缩的自然原语。PIFS吸引子的Kaplan-Yorke维数通过Lyapunov谱上的离散Moran方程解析确定。通过研究PIFS的分形几何,我们推导出三个最优设计准则,并表明四个著名的经验设计选择(余弦调度偏移、依赖分辨率的logSNR偏移、Min-SNR损失加权和Align Your Steps采样)均可视为我们显式几何优化问题的近似解,将理论转化为实践。
摘要:What is a diffusion model actually doing when it turns noise into a photograph? We show that the deterministic DDIM reverse chain operates as a Partitioned Iterated Function System (PIFS) and that this framework serves as a unified design language for denoising diffusion model schedules, architectures, and training objectives. From the PIFS structure we derive three computable geometric quantities: a per-step contraction threshold $L^*_t$, a diagonal expansion function $f_t(\lambda)$ and a global expansion threshold $\lambda^{**}$. These quantities require no model evaluation and fully characterize the denoising dynamics. They structurally explain the two-regime behavior of diffusion models: global context assembly at high noise via diffuse cross-patch attention and fine-detail synthesis at low noise via patch-by-patch suppression release in strict variance order. Self-attention emerges as the natural primitive for PIFS contraction. The Kaplan-Yorke dimension of the PIFS attractor is determined analytically through a discrete Moran equation on the Lyapunov spectrum. Through the study of the fractal geometry of the PIFS, we derive three optimal design criteria and show that four prominent empirical design choices (the cosine schedule offset, resolution-dependent logSNR shift, Min-SNR loss weighting, and Align Your Steps sampling) each arise as approximate solutions to our explicit geometric optimization problems, turning theory into practice.
自动驾驶|车辆|车道检测等(1篇)
【1】Spatial PDE-aware Selective State-space with Nested Memory for Mobile Traffic Grid Forecasting
标题:用于移动交通网格预测的具有嵌套记忆的空间PDE感知选择性状态空间模型
链接:https://arxiv.org/abs/2603.12353
作者:Zineddine Bettouche,Khalid Ali,Andreas Fischer,Andreas Kassler
摘要:蜂窝网络中的流量预测是一个具有挑战性的时空预测问题,由于强烈的时间依赖性、跨小区的空间异构性,以及对大型网络部署的可扩展性的需求。传统的小区专用模型会产生高昂的训练和维护成本,而全局模型往往无法捕捉异构的空间动态。最近基于注意力或图神经网络的时空架构提高了准确性,但引入了高计算开销,限制了它们在大规模或实时设置中的适用性。我们研究时空网格预测,其中每个时间步是一个二维网格的流量值,并使用先前的网格块预测下一个网格块。我们提出了NeST-S6,一种具有空间PDE感知核心的卷积选择性状态空间模型(SSM),在嵌套学习范式中实现:卷积局部空间混合为空间PDE感知SSM核心提供输入,而当单步预测误差指示未建模动态时,嵌套学习长期记忆由学习到的优化器更新。在三种分辨率(20²、50²、100²)的移动流量网格(米兰数据集)上,NeST-S6在单步和6步自回归滚动预测中的误差均低于强大的Mamba系列基线。在漂移压力测试下,我们模型的嵌套记忆比无记忆消融降低了48-65%的MAE。与有竞争力的逐像素扫描模型相比,NeST-S6还将全网格重建速度提高了32倍,MAC减少了4.3倍,同时实现了61%的逐像素RMSE降低。
摘要:Traffic forecasting in cellular networks is a challenging spatiotemporal prediction problem due to strong temporal dependencies, spatial heterogeneity across cells, and the need for scalability to large network deployments. Traditional cell-specific models incur prohibitive training and maintenance costs, while global models often fail to capture heterogeneous spatial dynamics. Recent spatiotemporal architectures based on attention or graph neural networks improve accuracy but introduce high computational overhead, limiting their applicability in large-scale or real-time settings. We study spatiotemporal grid forecasting, where each time step is a 2D lattice of traffic values, and predict the next grid patch using previous patches. We propose NeST-S6, a convolutional selective state-space model (SSM) with a spatial PDE-aware core, implemented in a nested learning paradigm: convolutional local spatial mixing feeds a spatial PDE-aware SSM core, while a nested-learning long-term memory is updated by a learned optimizer when one-step prediction errors indicate unmodeled dynamics. On the mobile-traffic grid (Milan dataset) at three resolutions ($20^2$, $50^2$, $100^2$), NeST-S6 attains lower errors than a strong Mamba-family baseline in both single-step and 6-step autoregressive rollouts. Under drift stress tests, our model's nested memory lowers MAE by 48-65% over a no-memory ablation. NeST-S6 also speeds full-grid reconstruction by 32 times and reduces MACs by 4.3 times compared to competitive per-pixel scanning models, while achieving 61% lower per-pixel RMSE.
联邦学习|隐私保护|加密(1篇)
【1】SCOPE: Semantic Coreset with Orthogonal Projection Embeddings for Federated learning
标题:SCOPE:用于联邦学习的具有正交投影嵌入的语义核心集
链接:https://arxiv.org/abs/2603.12976
作者:Md Anwar Hossen,Nathan R. Tallent,Luanzheng Guo,Ali Jannesary
摘要:科学发现越来越需要在联邦数据集上学习,这些数据集由高分辨率仪器的数据流提供,具有极端的类不平衡。当前的ML方法要么需要不切实际的数据聚合,要么由于类不平衡而失败。现有的核心集选择方法依赖于局部启发式方法,使其无法感知全局数据分布,容易产生次优且缺乏代表性的剪枝。为了克服这些挑战,我们引入了SCOPE(使用正交投影嵌入进行联邦学习的语义核心集),这是一个用于联邦数据的核心集框架,可以过滤异常并自适应地剪除冗余数据以减轻长尾偏斜。通过分析潜在空间分布,我们使用衡量核心类特征可靠性的表示得分、量化正交残差新颖性的多样性得分,以及指示与竞争类相似性的边界接近度得分来对每个数据点进行评分。与以前的方法不同,SCOPE只与联邦服务器共享标量度量,以构建全局共识,确保通信效率。在全局共识的指导下,SCOPE动态过滤局部噪声并丢弃冗余样本以抵消全局特征偏斜。大量的实验表明,SCOPE产生有竞争力的全局精度和鲁棒的收敛性,同时实现了卓越的效率,将上行链路带宽减少128倍至512倍,实现7.72倍的挂钟加速,并降低了本地核心集选择的FLOP和VRAM占用。
摘要:Scientific discovery increasingly requires learning on federated datasets, fed by streams from high-resolution instruments, that have extreme class imbalance. Current ML approaches either require impractical data aggregation or fail due to class imbalance. Existing coreset selection methods rely on local heuristics, making them unaware of the global data landscape and prone to sub-optimal and non-representative pruning. To overcome these challenges, we introduce SCOPE (Semantic Coreset using Orthogonal Projection Embeddings for Federated learning), a coreset framework for federated data that filters anomalies and adaptively prunes redundant data to mitigate long-tail skew. By analyzing the latent space distribution, we score each data point using a representation score that measures the reliability of core class features, a diversity score that quantifies the novelty of orthogonal residuals, and a boundary proximity score that indicates similarity to competing classes. Unlike prior methods, SCOPE shares only scalar metrics with a federated server to construct a global consensus, ensuring communication efficiency. Guided by the global consensus, SCOPE dynamically filters local noise and discards redundant samples to counteract global feature skews. Extensive experiments demonstrate that SCOPE yields competitive global accuracy and robust convergence, all while achieving exceptional efficiency with a 128x to 512x reduction in uplink bandwidth, a 7.72x wall-clock acceleration and reduced FLOP and VRAM footprints for local coreset selection.
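其中表示得分与多样性得分(正交残差)的计算可示意如下(示意草图,非SCOPE实现;边界接近度得分从略):

```python
import math

def scope_scores(emb, class_centroid):
    """Two of SCOPE's three scalar scores, sketched for illustration:
    representation = projection onto the class-centroid direction
    (reliability of core class features), diversity = norm of the
    orthogonal residual (novelty). Only such scalars would be shared
    with the federated server, never the raw embeddings."""
    norm = math.sqrt(sum(c * c for c in class_centroid))
    unit = [c / norm for c in class_centroid]
    proj = sum(e * u for e, u in zip(emb, unit))
    residual = [e - proj * u for e, u in zip(emb, unit)]
    diversity = math.sqrt(sum(r * r for r in residual))
    return proj, diversity

print(scope_scores([3.0, 4.0], [1.0, 0.0]))  # -> (3.0, 4.0)
```

将嵌入分解为“沿类中心方向的分量 + 正交残差”,正是摘要所述正交投影的核心:前者度量典型性,后者度量新颖性。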
推理|分析|理解|解释(8篇)
【1】Breaking the Tuning Barrier: Zero-Hyperparameters Yield Multi-Corner Analysis Via Learned Priors
标题:打破调整障碍:通过习得先验进行零超参数多角分析
链接:https://arxiv.org/abs/2603.13092
作者:Wei W. Xing,Kaiqi Huang,Jiazhan Liu,Hong Qiu,Shan Shen
备注:Accepted by DAC2026. Initial Version
摘要:成品率多角分析需要在25个以上的工艺-电压-温度角上验证电路,从而产生$O(K \times N)$的组合仿真成本,其中$K$表示角数,每个角的样本数$N$超过$10^4$。现有的方法面临着一个基本的权衡:简单的模型实现了自动化,但在非线性电路上失败,而先进的AI模型能捕获复杂的行为,但每次设计迭代都需要数小时的超参数调整,形成了调整障碍。我们通过替换工程先验(即,模型规格),采用从数百万个回归任务上预训练的基础模型中学习到的先验,打破了这一障碍。该模型执行上下文学习,立即适应每个电路,无需调整或重新训练。它的注意力机制通过识别工作条件之间共享的电路物理,自动在各角之间迁移知识。结合自动特征选择器(1152维到48维),我们的方法以零调整匹配最先进的准确性(平均MRE低至0.11%),将总验证成本降低超过10倍。
摘要:Yield Multi-Corner Analysis validates circuits across 25+ Process-Voltage-Temperature corners, resulting in a combinatorial simulation cost of $O(K \times N)$ where $K$ denotes corners and $N$ exceeds $10^4$ samples per corner. Existing methods face a fundamental trade-off: simple models achieve automation but fail on nonlinear circuits, while advanced AI models capture complex behaviors but require hours of hyperparameter tuning per design iteration, forming the Tuning Barrier. We break this barrier by replacing engineered priors (i.e., model specifications) with learned priors from a foundation model pre-trained on millions of regression tasks. This model performs in-context learning, instantly adapting to each circuit without tuning or retraining. Its attention mechanism automatically transfers knowledge across corners by identifying shared circuit physics between operating conditions. Combined with an automated feature selector (1152D to 48D), our method matches state-of-the-art accuracy (mean MREs as low as 0.11\%) with zero tuning, reducing total validation cost by over $10\times$.
【2】L2GTX: From Local to Global Time Series Explanations
标题:L2GTX:从局部到全局的时间序列解释
链接:https://arxiv.org/abs/2603.13065
作者:Ephrem Tibebe Mekonnen,Luca Longo,Lucas Rizzo,Pierpaolo Dondio
备注:Accepted for publication at the 4th World Conference on Explainable Artificial Intelligence (xAI 2026), 18 pages, 6 figures
摘要:深度学习模型在时间序列分类中实现了高准确性,但理解其类级别的决策行为仍然具有挑战性。时间序列的解释必须尊重时间依赖性,并识别跨实例重复出现的模式。现有的方法面临三个限制:为图像和表格数据开发的与模型无关的XAI方法不容易扩展到时间序列,时间序列的全局解释合成仍然未被充分探索,并且大多数现有的全局方法是特定于模型的。我们提出了L2GTX,这是一个与模型无关的框架,它通过聚合来自一组代表性实例的局部解释来生成按类别的全局解释。L2GTX从LOMATCE产生的实例级解释中提取参数化时间事件基元(例如上升或下降趋势和局部极值)的聚类及其重要性分数。这些聚类在实例之间合并以减少冗余,并使用实例-聚类重要性矩阵来估计全局相关性。在用户定义的实例选择预算下,L2GTX选择代表性实例,最大限度地覆盖有影响力的聚类。然后,来自所选实例的事件被聚合为简洁的按类别全局解释。六个基准时间序列数据集上的实验表明,L2GTX产生紧凑且可解释的全局解释,同时保持稳定的全局忠实度(以平均局部代理保真度衡量)。
摘要:Deep learning models achieve high accuracy in time series classification, yet understanding their class-level decision behaviour remains challenging. Explanations for time series must respect temporal dependencies and identify patterns that recur across instances. Existing approaches face three limitations: model-agnostic XAI methods developed for images and tabular data do not readily extend to time series, global explanation synthesis for time series remains underexplored, and most existing global approaches are model-specific. We propose L2GTX, a model-agnostic framework that generates class-wise global explanations by aggregating local explanations from a representative set of instances. L2GTX extracts clusters of parameterised temporal event primitives, such as increasing or decreasing trends and local extrema, together with their importance scores from instance-level explanations produced by LOMATCE. These clusters are merged across instances to reduce redundancy, and an instance-cluster importance matrix is used to estimate global relevance. Under a user-defined instance selection budget, L2GTX selects representative instances that maximise coverage of influential clusters. Events from the selected instances are then aggregated into concise class-wise global explanations. Experiments on six benchmark time series datasets show that L2GTX produces compact and interpretable global explanations while maintaining stable global faithfulness measured as mean local surrogate fidelity.
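在预算约束下最大化聚类覆盖的实例选择可以用标准的贪心最大覆盖来示意(示意草图,非论文代码;importance矩阵为虚构数据):

```python
def select_instances(importance, budget):
    """Greedy max-coverage over an instance-cluster importance matrix:
    pick up to `budget` instances, each adding the most not-yet-covered
    cluster importance. `importance[i][c]` scores instance i on cluster c."""
    n_clusters = len(importance[0])
    covered = [0.0] * n_clusters
    chosen = []
    for _ in range(budget):
        def gain(i):
            return sum(max(importance[i][c] - covered[c], 0.0)
                       for c in range(n_clusters))
        candidates = [i for i in range(len(importance)) if i not in chosen]
        if not candidates:
            break
        best = max(candidates, key=gain)
        if gain(best) <= 0.0:
            break  # nothing new to cover
        chosen.append(best)
        covered = [max(covered[c], importance[best][c])
                   for c in range(n_clusters)]
    return chosen

matrix = [[1.0, 0.0],   # instance 0 covers cluster 0 only
          [0.0, 1.0],   # instance 1 covers cluster 1 only
          [0.9, 0.9]]   # instance 2 covers both, slightly less strongly
print(select_instances(matrix, budget=2))  # -> [2, 0]
```

贪心法每步选择边际覆盖增益最大的实例,是此类预算约束覆盖问题的常用近似;L2GTX的具体选择准则以论文为准。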
【3】A Multi-task Large Reasoning Model for Molecular Science
标题:一个面向分子科学的多任务大型推理模型
链接:https://arxiv.org/abs/2603.12808
作者:Pengfei Liu,Shuang Ge,Jun Tao,Zhixiang Ren
摘要:分子科学人工智能的进步需要从纯粹的数据驱动的预测到知识指导的计算推理的范式转变。现有的分子模型主要是专有的,缺乏一般的分子智能和概括性。这强调了计算方法的必要性,可以有效地将科学逻辑与深度学习架构相结合。在这里,我们介绍了一个多任务的大型推理模型,旨在通过结构化推理和反射来模拟分子科学家的认知过程。我们的方法结合了多专业模块,以提供多功能的分子专业知识和通过注入分子知识的强化学习增强的思想链(CoT)框架,从而实现结构化和反思性推理。对10个分子任务和47个指标的系统评估表明,我们的模型比基础架构平均提高了50.3%,优于20多个最先进的基线,包括超大参数基础模型,尽管使用的训练数据和计算资源显著减少。这验证了嵌入显式推理机制能够实现高效学习,允许较小规模的模型在有效性和可解释性方面超越大规模模型。通过关于中枢神经系统(CNS)候选药物设计的案例研究验证了该计算框架的实际效用,说明了其在智能分子设计的数据驱动和知识集成方法之间架起桥梁的能力。
摘要:Advancements in artificial intelligence for molecular science are necessitating a paradigm shift from purely data-driven predictions to knowledge-guided computational reasoning. Existing molecular models are predominantly proprietary, lacking general molecular intelligence and generalizability. This underscores the necessity for computational methods that can effectively integrate scientific logic with deep learning architectures. Here we introduce a multi-task large reasoning model designed to emulate the cognitive processes of molecular scientists through structured reasoning and reflection. Our approach incorporates multi-specialist modules to provide versatile molecular expertise and a chain-of-thought (CoT) framework enhanced by reinforcement learning infused with molecular knowledge, enabling structured and reflective reasoning. Systematic evaluations across 10 molecular tasks and 47 metrics demonstrate that our model achieves an average 50.3% improvement over the base architecture, outperforming over 20 state-of-the-art baselines, including ultra-large-parameter foundation models, despite using significantly fewer training data and computational resources. This validates that embedding explicit reasoning mechanisms enables high-efficiency learning, allowing smaller-scale models to surpass massive counterparts in both efficacy and interpretability. The practical utility of this computational framework was validated through a case study on the design of central nervous system (CNS) drug candidates, illustrating its capacity to bridge data-driven and knowledge-integrated approaches for intelligent molecular design.
【4】Vision Verification Enhanced Fusion of VLMs for Efficient Visual Reasoning
标题:视觉验证增强的VLM融合以实现高效的视觉推理
链接:https://arxiv.org/abs/2603.12669
作者:Selim Furkan Tekin,Yichang Xu,Gaowen Liu,Ramana Rao Kompella,Margaret L. Loper,Ling Liu
摘要:随着视觉语言模型(VLM)的数量和多样性的不断增加,许多工作探索基于语言的集成、协作以及跨多个VLM的路由技术,以提高多模型推理。相比之下,我们使用视觉和语言两种模态来解决多样化的模型选择问题。我们引入焦点错误多样性来捕获跨VLM的互补推理,并引入基于CKA的焦点多样性度量(CKA-focal)来测量其视觉嵌入中的不一致性。在从候选VLM池构建的集成表面上,我们应用遗传算法有效地修剪掉那些对融合性能没有贡献的组件VLM。我们确定每个任务的最佳组合,并融合模型池中各VLM的输出,表明异构模型可以动态捕获认知不确定性并减轻幻觉。我们的V3Fusion方法能够产生双焦点多样性融合预测,即使在没有多数共识或大多数VLM做出不正确预测的情况下,也能保持高性能的视觉语言推理。广泛的实验在四个流行的VLM基准(A-OKVQA、MMMU、MMMU-Pro和OCR-VQA)上验证了V3Fusion。结果表明,V3Fusion的准确率比表现最好的VLM在MMMU上高出8.09%,在MMMU-Pro上高出4.87%。对于生成任务,V3Fusion优于Intern-VL2-8b和Qwen2.5-VL-7b,这两个VLM在A-OKVQA和OCR-VQA上表现均名列前茅。我们的代码和数据集可以在 https://github.com/sftekin/v3fusion 上找到。
摘要:With the growing number and diversity of Vision-Language Models (VLMs), many works explore language-based ensemble, collaboration, and routing techniques across multiple VLMs to improve multi-model reasoning. In contrast, we address diverse model selection using both vision and language modalities. We introduce focal error diversity to capture complementary reasoning across VLMs and a CKA-based focal diversity metric (CKA-focal) to measure disagreement in their visual embeddings. On the ensemble surface constructed from a pool of candidate VLMs, we apply a Genetic Algorithm to effectively prune out those component VLMs that do not add value to the fusion performance. We identify the best combination for each task as well as fuse the outputs of each VLM in the model pool, and show that heterogeneous models can capture epistemic uncertainty dynamically and mitigate hallucinations. Our V3Fusion approach is capable of producing dual focal-diversity fused predictions with high performance for vision-language reasoning, even when there is no majority consensus or the majority of VLMs make incorrect predictions. Extensive experiments validate V3Fusion on four popular VLM benchmarks (A-OKVQA, MMMU, MMMU-Pro, and OCR-VQA). The results show that V3Fusion outperforms the best-performing VLM in accuracy by 8.09% on MMMU and by 4.87% on MMMU-Pro. For generative tasks, V3Fusion outperforms Intern-VL2-8b and Qwen2.5-VL-7b, the top-2 VLM performers, on both A-OKVQA and OCR-VQA. Our code and datasets are available at https://github.com/sftekin/v3fusion.
【5】TERMINATOR: Learning Optimal Exit Points for Early Stopping in Chain-of-Thought Reasoning
标题:TERMINATOR:学习思想链推理中提前停止的最佳退出点
链接:https://arxiv.org/abs/2603.12529
作者:Alliot Nagle,Jakhongir Saydaliev,Dhia Garbaya,Michael Gastpar,Ashok Vardhan Makkuva,Hyeji Kim
备注:35 pages, 31 figures
摘要:大型推理模型(LRM)通过思想链(CoT)推理在复杂的推理任务中实现了令人印象深刻的性能,这使它们能够在得出最终答案之前生成中间思考令牌。然而,LRM往往存在显著的过度思考问题,即使答案早已生成,仍会花费过多的计算时间。先前的工作已经确定存在一个最佳推理长度:在该点截断推理可以显著缩短CoT输出,而性能几乎不变。然而,确定实际数据集的最佳CoT长度并非易事,因为它完全取决于任务和模型。在本文中,我们正是针对这一点,设计了TERMINATOR,一种在推理时用于LRM的提前退出策略,以减轻过度思考。支撑TERMINATOR的核心思想是:LRM最终答案的首次出现通常是可预测的,我们利用这些首次答案位置构建了一个新的最佳推理长度数据集来训练TERMINATOR。借助这种方法,在MATH-500、AIME 2025、HumanEval和GPQA这四个具有挑战性的实际数据集上,TERMINATOR平均实现了14%-55%的CoT长度缩减,同时优于当前最先进的方法。
摘要:Large Reasoning Models (LRMs) achieve impressive performance on complex reasoning tasks via Chain-of-Thought (CoT) reasoning, which enables them to generate intermediate thinking tokens before arriving at the final answer. However, LRMs often suffer from significant overthinking, spending excessive compute time even after the answer is generated early on. Prior work has identified the existence of an optimal reasoning length such that truncating reasoning at this point significantly shortens CoT outputs with virtually no change in performance. However, determining optimal CoT lengths for practical datasets is highly non-trivial as they are fully task and model-dependent. In this paper, we precisely address this and design TERMINATOR, an early-exit strategy for LRMs at inference to mitigate overthinking. The central idea underpinning TERMINATOR is that the first arrival of an LRM's final answer is often predictable, and we leverage these first answer positions to create a novel dataset of optimal reasoning lengths to train TERMINATOR. Powered by this approach, TERMINATOR achieves significant reductions in CoT lengths of 14%-55% on average across four challenging practical datasets: MATH-500, AIME 2025, HumanEval, and GPQA, whilst outperforming current state-of-the-art methods.
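The "first arrival of the final answer" that TERMINATOR exploits to build its training labels can be illustrated with a toy token scan. This is only a heuristic stand-in, assuming string containment as the detection rule; the paper trains a model for the exit decision rather than matching strings:

```python
# Toy illustration of the early-exit labeling idea: find the first position
# where the known final answer already appears in the chain of thought, and
# treat everything after it as overthinking that can be truncated.

def first_answer_position(cot_tokens, final_answer):
    """Return the index right after the first occurrence of the final answer."""
    for i, tok in enumerate(cot_tokens):
        if final_answer in tok:
            return i + 1
    return len(cot_tokens)  # answer never surfaces early: keep the full trace

trace = ["Let's", "compute:", "2+2", "=", "4.",
         "Wait,", "checking", "again...", "yes,", "4."]
cut = first_answer_position(trace, "4")
print(trace[:cut])  # → ["Let's", 'compute:', '2+2', '=', '4.'] — half the tokens
```

The 14%-55% CoT reductions reported in the abstract come from learning such truncation points rather than from this literal matching rule.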
【6】Efficient Reasoning with Balanced Thinking
标题:平衡思维的高效推理
链接:https://arxiv.org/abs/2603.12372
作者:Yulin Li,Tengyao Tu,Li Ding,Junjie Wang,Huiling Zhen,Yixin Chen,Yong Li,Zhuotao Tian
备注:Accepted by ICLR 2026
摘要:大型推理模型(LRM)已经显示出卓越的推理能力,但它们往往会过度思考,在简单问题上花费冗余的计算步骤,或者思考不足,尽管具备相应能力,却未能探索足够的推理路径。这些问题导致效率低下和潜在的不准确性,限制了在资源有限环境中的实际部署。现有的缓解过度思考的方法,如抑制反思关键词或调整推理长度,可能会无意中引起思考不足,损害准确性。因此,我们提出了ReBalance,一个无需训练、通过平衡思维实现高效推理的框架。ReBalance利用置信度作为推理动态的连续指标,通过高置信度方差识别过度思考,并通过持续的过度自信识别思考不足。通过将小规模数据集上的隐藏状态聚合为推理模式原型,我们计算出一个导向向量来引导LRM的推理轨迹。动态控制函数根据实时置信度调节该向量的强度和方向,在过度思考期间修剪冗余,并在思考不足期间促进探索。在从0.5B到32B的四个模型上,以及数学推理、一般问题回答和编码任务的九个基准上进行的广泛实验表明,ReBalance在提高准确性的同时有效减少了输出冗余,为高效、稳健的LRM部署提供了一种通用的、免训练的即插即用策略。代码可在 https://github.com/yu-lin-li/ReBalance 获得。
摘要:Large Reasoning Models (LRMs) have shown remarkable reasoning capabilities, yet they often suffer from overthinking, expending redundant computational steps on simple problems, or underthinking, failing to explore sufficient reasoning paths despite inherent capabilities. These issues lead to inefficiencies and potential inaccuracies, limiting practical deployment in resource-constrained settings. Existing methods to mitigate overthinking, such as suppressing reflective keywords or adjusting reasoning length, may inadvertently induce underthinking, compromising accuracy. Therefore, we propose ReBalance, a training-free framework that achieves efficient reasoning with balanced thinking. ReBalance leverages confidence as a continuous indicator of reasoning dynamics, identifying overthinking through high confidence variance and underthinking via consistent overconfidence. By aggregating hidden states from a small-scale dataset into reasoning mode prototypes, we compute a steering vector to guide LRMs' reasoning trajectories. A dynamic control function modulates this vector's strength and direction based on real-time confidence, pruning redundancy during overthinking, and promoting exploration during underthinking. Extensive experiments conducted on four models ranging from 0.5B to 32B, and across nine benchmarks in math reasoning, general question answering, and coding tasks demonstrate that ReBalance effectively reduces output redundancy while improving accuracy, offering a general, training-free, and plug-and-play strategy for efficient and robust LRM deployment. Code is available at https://github.com/yu-lin-li/ReBalance .
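The dynamic control function described above can be sketched as a small decision rule over running confidence. The thresholds, gain values, and function names below are made-up illustrative choices, not ReBalance's actual parameters:

```python
# Minimal sketch of confidence-modulated steering: scale a fixed steering
# vector by a control signal derived from running confidence, pruning when
# confidence swings widely (overthinking) and exploring when it is
# uniformly high (underthinking).

def control_gain(confidences, var_hi=0.05, mean_hi=0.9):
    mean = sum(confidences) / len(confidences)
    var = sum((c - mean) ** 2 for c in confidences) / len(confidences)
    if var > var_hi:      # high variance -> overthinking -> steer toward pruning
        return +1.0
    if mean > mean_hi:    # consistent overconfidence -> underthinking -> explore
        return -1.0
    return 0.0            # balanced: leave the trajectory alone

def steer(hidden, vector, confidences):
    g = control_gain(confidences)
    return [h + g * v for h, v in zip(hidden, vector)]

print(control_gain([0.5, 0.95, 0.4, 0.99]))  # volatile confidence → 1.0 (prune)
print(control_gain([0.95, 0.96, 0.97]))      # overconfident → -1.0 (explore)
```

In the paper the steering vector itself comes from aggregated reasoning-mode prototypes; here it is just an arbitrary input.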
【7】HCP-DCNet: A Hierarchical Causal Primitive Dynamic Composition Network for Self-Improving Causal Understanding
标题:HCP-DCNet:用于自我改进因果理解的分层因果原语动态组合网络
链接:https://arxiv.org/abs/2603.12305
作者:Ming Lei,Shufan Wu,Christophe Baehr
备注:17 pages, 2 figures, submitted to a journal and under review
摘要:理解和推理因果关系的能力--包括干预、反事实和潜在机制--是强大人工智能的基石。虽然深度学习擅长模式识别,但它从根本上缺乏因果模型,使系统在分布变化下变得脆弱,无法回答"如果"问题。本文介绍了层次因果原语动态组合网络(HCP-DCNet),一个桥接连续物理动力学与离散符号因果推理的统一框架。不同于单体表示,HCP-DCNet将因果场景分解为可重用的、类型化的因果原语,并组织成四个抽象层:物理、功能、事件和规则。双通道路由网络动态地将这些原语组合成特定于任务的、完全可微的因果执行图(CEG)。至关重要的是,该系统采用因果干预驱动的元进化策略,通过受约束的马尔可夫决策过程实现自主的自我改进。我们建立了严格的理论保证,包括类型安全的组合、路由收敛以及因果动力学的通用逼近。在模拟的物理和社会环境中进行的大量实验表明,HCP-DCNet在因果发现、反事实推理和组合泛化方面显著优于最先进的基线。这项工作为构建具有类人因果抽象和持续自我完善能力的AI系统提供了一个有原则的、可扩展的、可解释的架构。
摘要:The ability to understand and reason about cause and effect -- encompassing interventions, counterfactuals, and underlying mechanisms -- is a cornerstone of robust artificial intelligence. While deep learning excels at pattern recognition, it fundamentally lacks a model of causality, making systems brittle under distribution shifts and unable to answer ``what-if'' questions. This paper introduces the \emph{Hierarchical Causal Primitive Dynamic Composition Network (HCP-DCNet)}, a unified framework that bridges continuous physical dynamics with discrete symbolic causal inference. Departing from monolithic representations, HCP-DCNet decomposes causal scenes into reusable, typed \emph{causal primitives} organized into four abstraction layers: physical, functional, event, and rule. A dual-channel routing network dynamically composes these primitives into task-specific, fully differentiable \emph{Causal Execution Graphs (CEGs)}. Crucially, the system employs a \emph{causal-intervention-driven meta-evolution} strategy, enabling autonomous self-improvement through a constrained Markov decision process. We establish rigorous theoretical guarantees, including type-safe composition, routing convergence, and universal approximation of causal dynamics. Extensive experiments across simulated physical and social environments demonstrate that HCP-DCNet significantly outperforms state-of-the-art baselines in causal discovery, counterfactual reasoning, and compositional generalization. This work provides a principled, scalable, and interpretable architecture for building AI systems with human-like causal abstraction and continual self-refinement capabilities.
【8】Explainable AI Using Inherently Interpretable Components for Wearable-based Health Monitoring
标题:可解释人工智能使用固有可解释组件进行基于可穿戴设备的健康监控
链接:https://arxiv.org/abs/2603.12880
作者:Maurice Kuschel,Solveig Vieluf,Claus Reinsberger,Tobias Loddenkemper,Tanuj Hasija
备注:Submitted to the IEEE Journal of Biomedical and Health Informatics
摘要:通过基于AI的模型,可穿戴设备在医疗和健康领域的使用为实时监控和可解释的事件检测提供了巨大的潜力。可解释人工智能(XAI)需要评估模型已经学习了什么,并为患者、医疗保健专业人员、模型开发人员和领域专家建立对模型输出的信任。由于数据的复杂性和时间依赖性,解释可穿戴设备记录的时间序列数据的AI决策尤其具有挑战性。使用可解释特性的可解释性往往会导致性能损失。我们提出了一种新的XAI方法,它结合了解释空间和基于概念的解释来解释AI对时间序列数据的预测。通过使用内在可解释组件(IIC),它将特定领域的可解释概念封装在自定义解释空间中,我们保留了在时间序列上训练的模型的性能,同时实现了基于提取特征的基于概念的解释的可解释性。此外,我们定义了一组特定于域的IIC,用于基于可穿戴设备的健康监测,并展示了它们在实际应用中的可用性,包括状态评估和癫痫发作检测。
摘要:The use of wearables in medicine and wellness, enabled by AI-based models, offers tremendous potential for real-time monitoring and interpretable event detection. Explainable AI (XAI) is required to assess what models have learned and build trust in model outputs, for patients, healthcare professionals, model developers, and domain experts alike. Explaining AI decisions made on time-series data recorded by wearables is especially challenging due to the data's complex nature and temporal dependencies. Too often, explainability using interpretable features leads to performance loss. We propose a novel XAI method that combines explanation spaces and concept-based explanations to explain AI predictions on time-series data. By using Inherently Interpretable Components (IICs), which encapsulate domain-specific, interpretable concepts within a custom explanation space, we preserve the performance of models trained on time series while achieving the interpretability of concept-based explanations based on extracted features. Furthermore, we define a domain-specific set of IICs for wearable-based health monitoring and demonstrate their usability in real applications, including state assessment and epileptic seizure detection.
检测相关(4篇)
【1】FraudFox: Adaptable Fraud Detection in the Real World
标题:FraudFox:现实世界中的适应性欺诈检测
链接:https://arxiv.org/abs/2603.13014
作者:Matthew Butler,Yi Fan,Christos Faloutsos
摘要:所提出的方法(FraudFox)为资源受限环境中的对抗性攻击提供了解决方案。我们关注如下问题:周一凌晨3点,试图购买500美元鞋子的"史密斯"有多可疑?在对抗性环境中,如何合并来自少数风险评估模块("预言机")的风险评分?更重要的是,给定历史数据(订单、价格及其后续结果)和业务目标/限制,哪些交易(如上面的"史密斯"交易)应该"放行",哪些应该交给人类调查员?业务限制可以是:"至多x次调查是可行的",或"由于欺诈最多损失y美元"。这是我们在这项工作中关注的两个研究问题。解决第一个问题("预言机加权")的一种方法是使用带动态重要性权重的扩展卡尔曼滤波器,自动且持续地更新每个"预言机"的权重。对于第二个问题,我们展示了如何推导最优决策面,以及如何计算帕累托最优集,以支持假设分析。一个重要的考虑因素是适应性:欺诈者会根据我们过去的决定改变他们的行为,因此我们需要相应地适应。由此产生的系统FraudFox是可扩展的,能适应不断变化的欺诈者行为,行之有效,并且已在亚马逊投入生产。FraudFox增强了欺诈预防子系统,并带来了显著的性能提升。
摘要:The proposed method (FraudFox) provides solutions to adversarial attacks in a resource-constrained environment. We focus on questions like the following: How suspicious is `Smith', trying to buy \$500 shoes, on Monday 3am? How do we merge the risk scores from a handful of risk-assessment modules (`oracles') in an adversarial environment? More importantly, given historical data (orders, prices, and what happened afterwards) and business goals/restrictions, which transactions, like the `Smith' transaction above, should we `pass', versus send to human investigators? The business restrictions could be: `at most $x$ investigations are feasible', or `at most \$$y$ lost due to fraud'. These are the two research problems we focus on in this work. One approach to the first problem (`oracle-weighting') is to use Extended Kalman Filters with dynamic importance weights to automatically and continuously update our weight for each `oracle'. For the second problem, we show how to derive an optimal decision surface, and how to compute the Pareto optimal set, to allow what-if questions. An important consideration is adaptation: fraudsters will change their behavior according to our past decisions; thus, we need to adapt accordingly. The resulting system, FraudFox, is scalable, adaptable to changing fraudster behavior, effective, and already in production at Amazon. FraudFox augments a fraud prevention sub-system and has led to significant performance gains.
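The oracle-weighting idea can be sketched with a simplified exponential-weights update in place of the paper's Extended Kalman Filter: oracles whose risk scores track observed outcomes gain weight, and the fused score is the weighted mean. The learning rate, score values, and function names are illustrative assumptions:

```python
import math

def update_weights(weights, scores, outcome, lr=1.0):
    """scores: each oracle's fraud probability; outcome: 1 = fraud, 0 = legit."""
    new = [w * math.exp(-lr * abs(s - outcome)) for w, s in zip(weights, scores)]
    total = sum(new)
    return [w / total for w in new]  # renormalize so weights sum to 1

def fused_score(weights, scores):
    """Weighted mean of the oracle risk scores."""
    return sum(w * s for w, s in zip(weights, scores))

w = [0.5, 0.5]
for _ in range(5):  # oracle 0 keeps flagging real fraud; oracle 1 keeps missing it
    w = update_weights(w, scores=[0.9, 0.2], outcome=1)
print(round(w[0], 3))                       # most weight has shifted to oracle 0
print(round(fused_score(w, [0.9, 0.2]), 3)) # fused score now tracks oracle 0
```

An EKF would additionally maintain a covariance over the weights, which this multiplicative sketch omits for brevity.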
【2】Rethinking VLMs for Image Forgery Detection and Localization
标题:重新思考图像伪造检测和定位的VLM
链接:https://arxiv.org/abs/2603.12930
作者:Shaofeng Guo,Jiequan Cui,Richang Hong
备注:8pages
摘要:随着人工智能生成内容(AIGC)的迅速崛起,图像操作变得越来越容易,这对图像伪造检测和定位(IFDL)提出了重大挑战。在本文中,我们研究如何充分利用视觉语言模型(VLM),以协助IFDL任务。特别是,我们观察到,从VLM先验几乎没有好处的检测和定位性能,甚至有负面影响,由于其固有的偏见语义可扩展性,而不是真实性。此外,位置掩码显式地编码伪造概念,其可以用作VLM的额外先验以简化其训练优化,从而增强检测和定位结果的可解释性。基于这些发现,我们提出了一个新的IFDL管道命名为IFDL-VLM。为了证明我们的方法的有效性,我们在9个流行的基准测试中进行了实验,并评估了模型在域内和跨数据集泛化设置下的性能。实验结果表明,我们始终实现新的国家的最先进的性能在检测,定位和可解释性。代码可在:https://github.com/sha0fengGuo/IFDL-VLM。
摘要:With the rapid rise of Artificial Intelligence Generated Content (AIGC), image manipulation has become increasingly accessible, posing significant challenges for image forgery detection and localization (IFDL). In this paper, we study how to fully leverage vision-language models (VLMs) to assist the IFDL task. In particular, we observe that priors from VLMs hardly benefit the detection and localization performance and even have negative effects due to their inherent biases toward semantic plausibility rather than authenticity. Additionally, the location masks explicitly encode the forgery concepts, which can serve as extra priors for VLMs to ease their training optimization, thus enhancing the interpretability of detection and localization results. Building on these findings, we propose a new IFDL pipeline named IFDL-VLM. To demonstrate the effectiveness of our method, we conduct experiments on 9 popular benchmarks and assess the model performance under both in-domain and cross-dataset generalization settings. The experimental results show that we consistently achieve new state-of-the-art performance in detection, localization, and interpretability. Code is available at: https://github.com/sha0fengGuo/IFDL-VLM.
【3】Surprised by Attention: Predictable Query Dynamics for Time Series Anomaly Detection
标题:令人惊讶的注意:时间序列异常检测的可预测查询动态
链接:https://arxiv.org/abs/2603.12916
作者:Kadir-Kaan Özer,René Ebeling,Markus Enzweiler
备注:Main: 17 Pages, 7 Figures, 3 Tables; Appendix: 3 Pages, 4 Tables
摘要:多变量时间序列异常通常表现为跨通道依赖性的变化,而不是简单的幅度偏移。例如,在自动驾驶中,转向命令可能本身是内部一致的,却与由此产生的横向加速度解耦。当灵活的序列模型在协调关系已改变的情况下仍能合理地重建信号时,基于残差的检测器可能会漏检此类异常。我们介绍AxonAD,一个无监督检测器,它将多头注意力查询的演化视为一个短期可预测的过程。一个梯度更新的重建路径与一个仅依赖历史的预测器相结合,后者从过去的上下文预测未来的查询向量。训练通过针对指数移动平均(EMA)目标编码器的掩蔽预测器-目标目标进行。在推理时,重构误差与尾部聚合的查询不匹配分数相结合,该分数测量最近时间步上预测查询与目标查询之间的余弦偏差。这种双重方法在保留幅度级检测的同时,提供了对结构性依赖偏移的敏感性。在带有区间标注的专有车载遥测数据以及采用无阈值和范围感知指标的TSB-AD多变量套件(17个数据集,180个序列)上,AxonAD相对强基线提高了排序质量和时间定位。消融实验确认,查询预测和组合评分是所观察增益的主要驱动因素。代码可在 https://github.com/iis-esslingen/AxonAD 获得。
摘要:Multivariate time series anomalies often manifest as shifts in cross-channel dependencies rather than simple amplitude excursions. In autonomous driving, for instance, a steering command might be internally consistent but decouple from the resulting lateral acceleration. Residual-based detectors can miss such anomalies when flexible sequence models still reconstruct signals plausibly despite altered coordination. We introduce AxonAD, an unsupervised detector that treats multi-head attention query evolution as a short horizon predictable process. A gradient-updated reconstruction pathway is coupled with a history-only predictor that forecasts future query vectors from past context. This is trained via a masked predictor-target objective against an exponential moving average (EMA) target encoder. At inference, reconstruction error is combined with a tail-aggregated query mismatch score, which measures cosine deviation between predicted and target queries on recent timesteps. This dual approach provides sensitivity to structural dependency shifts while retaining amplitude-level detection. On proprietary in-vehicle telemetry with interval annotations and on the TSB-AD multi-variate suite (17 datasets, 180 series) with threshold-free and range-aware metrics, AxonAD improves ranking quality and temporal localization over strong baselines. Ablations confirm that query prediction and combined scoring are the primary drivers of the observed gains. Code is available at the URL https://github.com/iis-esslingen/AxonAD.
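The scoring rule described above can be written down compactly: cosine deviation between predicted and target query vectors, averaged over the last few timesteps, added to a reconstruction-error term. Function names, the equal weighting, and the toy vectors are assumptions for illustration, not AxonAD's code:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def anomaly_score(pred_queries, target_queries, recon_error, tail=3):
    """Tail-aggregated query mismatch (1 - cosine) plus reconstruction error."""
    recent = list(zip(pred_queries, target_queries))[-tail:]
    mismatch = sum(1.0 - cosine(p, t) for p, t in recent) / len(recent)
    return recon_error + mismatch

aligned = [[1.0, 0.0]] * 4
flipped = [[0.0, 1.0]] * 4          # coordination shift: queries rotate away
print(anomaly_score(aligned, aligned, recon_error=0.1))  # → 0.1 (no mismatch)
print(anomaly_score(aligned, flipped, recon_error=0.1))  # → 1.1 (shift flagged)
```

The second case illustrates the abstract's point: reconstruction error alone (0.1) is low, but the query mismatch exposes the dependency shift.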
【4】Show, Don't Tell: Detecting Novel Objects by Watching Human Videos
标题:表演,不要说:通过观看人类视频来检测新奇物体
链接:https://arxiv.org/abs/2603.12751
作者:James Akl,Jose Nicolas Avendano Arbelaez,James Barabas,Jennifer L. Barry,Kalie Ching,Noam Eshed,Jiahui Fu,Michel Hidalgo,Andrew Hoelscher,Tushar Kusnur,Andrew Messing,Zachary Nagler,Brian Okorn,Mauro Passerino,Tim J. Perkins,Eric Rosen,Ankit Shah,Tanmay Shankar,Scott Shaw
摘要:机器人如何快速识别和识别人类演示过程中展示给它的新物体?现有的闭集对象检测器经常在这方面失败,因为对象是分布外的。而开集检测器(例如,VLM)有时会成功,但它们通常需要昂贵且繁琐的人在回路提示工程来唯一地识别新的对象实例。在本文中,我们提出了一个自我监督的系统,消除了繁琐的语言描述和昂贵的提示工程的需要,通过训练一个定制的对象检测器自动创建的数据集,由人类示范本身监督。在我们的方法中,“显示,不告诉”,我们在演示过程中向检测器显示感兴趣的特定对象,而不是通过复杂的语言描述告诉检测器这些对象。通过完全绕过语言,这种范例使我们能够快速训练定制的检测器,以适应在人工任务演示中观察到的相关对象。我们开发了一个集成的机器人系统部署我们的“显示,不告诉”范例的自动数据集创建和新的对象检测在现实世界的机器人。实验结果表明,我们的管道显着优于最先进的检测和识别方法的操作对象,从而提高机器人的任务完成。
摘要:How can a robot quickly identify and recognize new objects shown to it during a human demonstration? Existing closed-set object detectors frequently fail at this because the objects are out-of-distribution. While open-set detectors (e.g., VLMs) sometimes succeed, they often require expensive and tedious human-in-the-loop prompt engineering to uniquely recognize novel object instances. In this paper, we present a self-supervised system that eliminates the need for tedious language descriptions and expensive prompt engineering by training a bespoke object detector on an automatically created dataset, supervised by the human demonstration itself. In our approach, "Show, Don't Tell," we show the detector the specific objects of interest during the demonstration, rather than telling the detector about these objects via complex language descriptions. By bypassing language altogether, this paradigm enables us to quickly train bespoke detectors tailored to the relevant objects observed in human task demonstrations. We develop an integrated on-robot system to deploy our "Show, Don't Tell" paradigm of automatic dataset creation and novel object-detection on a real-world robot. Empirical results demonstrate that our pipeline significantly outperforms state-of-the-art detection and recognition methods for manipulated objects, leading to improved task completion for the robot.
分类|识别(2篇)
【1】SortScrews: A Dataset and Baseline for Real-time Screw Classification
标题:SortScrews:实时螺丝分类的数据集和基线
链接:https://arxiv.org/abs/2603.13027
作者:Tianhao Fu,Bingxuan Yang,Juncheng Guo,Shrena Sribalan,Yucheng Chen
摘要:自动识别螺钉类型对于工业自动化、机器人和库存管理非常重要。然而,公开可用的螺丝分类数据集是稀缺的,特别是在自动分拣系统中经常遇到的受控单对象场景。在这项工作中,我们引入$\textbf{SortScrews}$,一个数据集的casewise视觉分类的螺钉。该数据集包含560个RGB图像,分辨率为$512\times512$,涵盖六种螺钉类型和一个背景类。使用标准化采集设置捕获图像,包括四种捕获设置中照明和相机视角的轻微变化。 为了促进可重复的研究和数据集扩展,我们还提供了一个可重复使用的数据收集脚本,允许用户使用廉价的相机设置轻松地为自定义硬件组件构建类似的数据集。 我们使用在ImageNet上预训练的EfficientNet-B 0和ResNet-18分类器使用迁移学习建立基线结果。此外,我们还进行了深入的故障分析。尽管数据集大小有限,但这些轻量级模型实现了强大的分类准确性,表明即使使用相对较小的数据集,受控的采集条件也可以实现有效的学习。数据集、收集管道和基线训练代码可在https://github.com/ATATC/SortScrews上公开获得。
摘要:Automatic identification of screw types is important for industrial automation, robotics, and inventory management. However, publicly available datasets for screw classification are scarce, particularly for controlled single-object scenarios commonly encountered in automated sorting systems. In this work, we introduce $\textbf{SortScrews}$, a dataset for casewise visual classification of screws. The dataset contains 560 RGB images at $512\times512$ resolution covering six screw types and a background class. Images are captured using a standardized acquisition setup and include mild variations in lighting and camera perspective across four capture settings. To facilitate reproducible research and dataset expansion, we also provide a reusable data collection script that allows users to easily construct similar datasets for custom hardware components using inexpensive camera setups. We establish baseline results using transfer learning with EfficientNet-B0 and ResNet-18 classifiers pretrained on ImageNet. In addition, we conduct a well-explored failure analysis. Despite the limited dataset size, these lightweight models achieve strong classification accuracy, demonstrating that controlled acquisition conditions enable effective learning even with relatively small datasets. The dataset, collection pipeline, and baseline training code are publicly available at https://github.com/ATATC/SortScrews.
【2】A Fractional Fox H-Function Kernel for Support Vector Machines: Robust Classification via Weighted Transmutation Operators
标题:支持向量机的分数阶Fox H-函数核:通过加权变换算子实现鲁棒分类
链接:https://arxiv.org/abs/2603.12794
作者:Gustavo Dorrego
备注:7 pages, 4 figures
摘要:支持向量机(SVM)在很大程度上依赖于核函数的选择来将数据映射到高维特征空间。虽然高斯径向基函数(RBF)是行业标准,但其指数衰减使其非常容易受到结构噪声和离群值的影响,通常会导致复杂数据集的严重过拟合。在本文中,我们提出了一类源自广义时间-空间分数阶扩散-波动方程基本解的新型非平稳核。通过利用加权Sobolev空间上的结构保持变换方法,我们引入了Fox-Dorrego核,一个由Fox H函数控制的精确解析Mercer核。与标准核不同,我们的公式结合了一个老化的权重函数("失忆效应",Amnesia Effect)来惩罚远处的离群值,并结合了分数阶渐近幂律衰减,以实现鲁棒的重尾特征映射(类似于Lévy飞行)。在合成数据集和真实世界的高维雷达数据(Ionosphere数据集)上的数值实验表明,所提出的Fox-Dorrego核始终优于标准高斯RBF基线,将分类错误率降低了约50%,同时保持对离群值的结构鲁棒性。
摘要:Support Vector Machines (SVMs) rely heavily on the choice of the kernel function to map data into high-dimensional feature spaces. While the Gaussian Radial Basis Function (RBF) is the industry standard, its exponential decay makes it highly susceptible to structural noise and outliers, often leading to severe overfitting in complex datasets. In this paper, we propose a novel class of non-stationary kernels derived from the fundamental solution of the generalized time-space fractional diffusion-wave equation. By leveraging a structure-preserving transmutation method over Weighted Sobolev Spaces, we introduce the Fox-Dorrego Kernel, an exact analytical Mercer kernel governed by the Fox H-function. Unlike standard kernels, our formulation incorporates an aging weight function (the "Amnesia Effect") to penalize distant outliers and a fractional asymptotic power-law decay to allow for robust, heavy-tailed feature mapping (analogous to Lévy flights). Numerical experiments on both synthetic datasets and real-world high-dimensional radar data (Ionosphere) demonstrate that the proposed Fox-Dorrego kernel consistently outperforms the standard Gaussian RBF baseline, reducing the classification error rate by approximately 50\% while maintaining structural robustness against outliers.
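The exact Fox-Dorrego kernel involves the Fox H-function; the sketch below only mimics the two qualitative ingredients named in the abstract, a heavy power-law tail in place of the Gaussian's exponential decay and an aging weight that damps distant points. The exponents and the weight form are illustrative choices, not the paper's kernel:

```python
import math

def gaussian_rbf(r, gamma=1.0):
    """Standard RBF kernel value at distance r."""
    return math.exp(-gamma * r * r)

def heavy_tailed_kernel(r, alpha=1.5, tau=5.0):
    """Toy heavy-tailed kernel: aging weight times a power-law decay."""
    aging = math.exp(-r / tau)              # "amnesia" weight on far points
    tail = (1.0 + r * r) ** (-alpha / 2.0)  # polynomial, Levy-flight-like decay
    return aging * tail

# At moderate distances the power-law kernel decays polynomially rather than
# exponentially, so similarity does not collapse to zero the way RBF does.
r = 4.0
print(gaussian_rbf(r))          # ~1e-7, effectively zero
print(heavy_tailed_kernel(r))   # still non-negligible
```

Such a callable could be passed directly as a custom kernel to an SVM implementation that accepts kernel functions, e.g. scikit-learn's `SVC(kernel=...)` on pairwise distances.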
表征(2篇)
【1】Representation Learning for Spatiotemporal Physical Systems
标题:时空物理系统的表示学习
链接:https://arxiv.org/abs/2603.13227
作者:Helen Qu,Rudy Morel,Michael McCabe,Alberto Bietti,François Lanusse,Shirley Ho,Yann LeCun
备注:Published at ICLR 2026 Workshop on AI & PDE
摘要:时空物理系统的机器学习方法主要集中在下一帧预测上,其目标是为系统的时间演化学习一个准确的仿真器。然而,这些仿真器的训练在计算上是昂贵的,并且受到性能缺陷的影响,例如自回归推出期间的复合误差。在这项工作中,我们从不同的角度来看待预测下一帧的下游科学任务,例如估计系统的控制物理参数。这些任务的准确性提供了一个独特的量化一瞥这些模型的表示的物理相关性。我们评估了通用的自我监督方法在学习物理基础表征中的有效性,这些表征对下游科学任务很有用。令人惊讶的是,我们发现并非所有为物理建模设计的方法在这些任务上都优于通用的自监督学习方法,并且在潜在空间中学习的方法(例如,联合嵌入预测架构或JEPA)优于那些优化像素级预测目标的架构。代码可在https://github.com/helenqu/physical-representation-learning上获得。
摘要:Machine learning approaches to spatiotemporal physical systems have primarily focused on next-frame prediction, with the goal of learning an accurate emulator for the system's evolution in time. However, these emulators are computationally expensive to train and are subject to performance pitfalls, such as compounding errors during autoregressive rollout. In this work, we take a different perspective and look at scientific tasks further downstream of predicting the next frame, such as estimation of a system's governing physical parameters. Accuracy on these tasks offers a uniquely quantifiable glimpse into the physical relevance of the representations of these models. We evaluate the effectiveness of general-purpose self-supervised methods in learning physics-grounded representations that are useful for downstream scientific tasks. Surprisingly, we find that not all methods designed for physical modeling outperform generic self-supervised learning methods on these tasks, and methods that learn in the latent space (e.g., joint embedding predictive architectures, or JEPAs) outperform those optimizing pixel-level prediction objectives. Code is available at https://github.com/helenqu/physical-representation-learning.
【2】TerraFlow: Multimodal, Multitemporal Representation Learning for Earth Observation
标题:TerraFlow:用于地球观测的多模式、多时态表示学习
链接:https://arxiv.org/abs/2603.12762
作者:Nazar Puriy,Johannes Jakubik,Benedikt Blumenstiel,Konrad Schindler
摘要:我们提出了TerraFlow,一种新的方法,多模态,多时相学习地球观测。TerraFlow建立在时间训练目标的基础上,能够跨空间、时间和模态进行序列感知学习,同时对现实世界地球观测数据中常见的可变长度输入保持鲁棒性。我们的实验证明了TerraFlow在GEO-Bench-2基准的所有时间任务中优于最先进的地球观测基础模型。此外,我们还证明了TerraFlow能够为基于深度学习的自然灾害风险地图预测迈出第一步-这是一项其他最先进的基础模型经常崩溃的任务。TerraFlow在F1得分和Brier得分方面分别比最先进的基础模型高出50%和24%。
摘要:We propose TerraFlow, a novel approach to multimodal, multitemporal learning for Earth observation. TerraFlow builds on temporal training objectives that enable sequence-aware learning across space, time, and modality, while remaining robust to the variable-length inputs commonly encountered in real-world Earth observation data. Our experiments demonstrate superiority of TerraFlow over state-of-the-art foundation models for Earth observation across all temporal tasks of the GEO-Bench-2 benchmark. We additionally demonstrate that TerraFlow is able to make initial steps towards deep-learning based risk map prediction for natural disasters -- a task on which other state-of-the-art foundation models frequently collapse. TerraFlow outperforms state-of-the-art foundation models by up to 50% in F1 score and 24% in Brier score.
3D|3D重建等相关(1篇)
【1】3DTCR: A Physics-Based Generative Framework for Vortex-Following 3D Reconstruction to Improve Tropical Cyclone Intensity Forecasting
标题:3DTCR:基于物理的涡跟踪3D重建生成框架,以改进热带气旋强度预测
链接:https://arxiv.org/abs/2603.13049
作者:Jun Liu,Xiaohui Zhong,Kai Zheng,Jiarui Li,Yifei Li,Tao Zhou,Wenxu Qian,Shun Dai,Ruian Tie,Yangyang Zhao,Hao Li
摘要:热带气旋(TC)强度预报仍然具有挑战性,因为目前的数值和基于AI的天气模式无法令人满意地代表极端TC结构和强度。虽然强度时间序列预报取得了重大进展,但它输出的是强度序列,而不是控制TC演变的三维内核细尺度结构和物理机制。高分辨率的数值模拟可以捕捉到这些特征,但对于大规模的业务应用来说,计算成本仍然很高,效率也很低。在这里,我们提出了3DTCR,这是一个基于物理的生成框架,将物理约束与生成AI效率相结合,用于3D TC结构重建。在为期六年、3公里分辨率的移动域WRF数据集上训练,3DTCR使用条件流匹配(CFM)实现区域自适应的涡旋跟随重建,并通过潜在域自适应和两阶段迁移学习进行优化。该框架减轻了低分辨率目标和过度平滑预测带来的限制,在保持路径稳定性的同时,改善了TC内核结构和强度的表征。结果表明,3DTCR在5天以内的几乎所有预报时效上,TC强度预测均优于ECMWF高分辨率预报系统(ECMWF-HRES),并使最大WS10M的RMSE相对于其FuXi输入降低了36.5%。这些结果凸显了3DTCR作为一个基于物理的生成框架,能以较低的计算成本高效解析精细尺度结构,为改进TC强度预报提供了一条有前景的途径。
摘要:Tropical cyclone (TC) intensity forecasting remains challenging as current numerical and AI-based weather models fail to satisfactorily represent extreme TC structure and intensity. Although intensity time-series forecasting has achieved significant advances, it outputs intensity sequences rather than the three-dimensional inner-core fine-scale structure and physical mechanisms governing TC evolution. High-resolution numerical simulations can capture these features but remain computationally expensive and inefficient for large-scale operational applications. Here we present 3DTCR, a physics-based generative framework combining physical constraints with generative AI efficiency for 3D TC structure reconstruction. Trained on a six-year, 3-km-resolution moving-domain WRF dataset, 3DTCR enables region-adaptive vortex-following reconstruction using conditional Flow Matching (CFM), optimized via latent domain adaptation and two-stage transfer learning. The framework mitigates limitations imposed by low-resolution targets and over-smoothed forecasts, improving the representation of TC inner-core structure and intensity while maintaining track stability. Results demonstrate that 3DTCR outperforms the ECMWF high-resolution forecasting system (ECMWF-HRES) in TC intensity prediction at nearly all lead times up to 5 days and reduces the RMSE of maximum WS10M by 36.5% relative to its FuXi inputs. These findings highlight 3DTCR as a physics-based generative framework that efficiently resolves fine-scale structures at lower computational cost, which may offer a promising avenue for improving TC intensity forecasting.
优化|敛散性(10篇)
【1】PhysMoDPO: Physically-Plausible Humanoid Motion with Preference Optimization
标题:PhysMoDPO:具有偏好优化的物理上合理的人形运动
链接:https://arxiv.org/abs/2603.13228
作者:Yangsong Zhang,Anujith Muraleedharan,Rikhat Akizhanov,Abdul Ahad Butt,Gül Varol,Pascal Fua,Fabio Pizzati,Ivan Laptev
摘要:文本条件下的人体运动生成的最新进展在很大程度上是由大规模人体运动数据训练的扩散模型驱动的。在此基础上,最近的方法试图通过应用全身控制器(WBC)将扩散生成的运动转换为可执行的轨迹,来转移这种模型用于角色动画和真实的机器人控制。虽然WBC轨迹变得符合物理学,但它们可能会暴露出与原始运动的实质性偏差。为了解决这个问题,我们在这里提出了PhysMoDPO,一个直接偏好优化框架。与之前依赖于手工制作的物理感知算法(如滑脚惩罚)的工作不同,我们将WBC集成到我们的训练管道中并优化扩散模型,使WBC的输出符合物理和原始文本指令。为了训练PhysMoDPO,我们部署了基于物理和特定任务的奖励,并使用它们来为合成轨迹分配偏好。我们对文本到运动和空间控制任务的广泛实验表明,PhysMoDPO在模拟机器人的物理现实主义和任务相关指标方面都有持续的改进。此外,我们证明了PhysMoDPO的结果显着改善时,应用到zero-shot运动传输模拟和现实世界的部署G1类人机器人。
摘要:Recent progress in text-conditioned human motion generation has been largely driven by diffusion models trained on large-scale human motion data. Building on this progress, recent methods attempt to transfer such models for character animation and real robot control by applying a Whole-Body Controller (WBC) that converts diffusion-generated motions into executable trajectories. While WBC trajectories become compliant with physics, they may expose substantial deviations from original motion. To address this issue, we here propose PhysMoDPO, a Direct Preference Optimization framework. Unlike prior work that relies on hand-crafted physics-aware heuristics such as foot-sliding penalties, we integrate WBC into our training pipeline and optimize diffusion model such that the output of WBC becomes compliant both with physics and original text instructions. To train PhysMoDPO we deploy physics-based and task-specific rewards and use them to assign preference to synthesized trajectories. Our extensive experiments on text-to-motion and spatial control tasks demonstrate consistent improvements of PhysMoDPO in both physical realism and task-related metrics on simulated robots. Moreover, we demonstrate that PhysMoDPO results in significant improvements when applied to zero-shot motion transfer in simulation and for real-world deployment on a G1 humanoid robot.
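A minimal sketch of the preference-assignment step described above: sampled trajectories are paired, and the higher-reward one in each pair is labeled as preferred, yielding (winner, loser) pairs for DPO-style training. The consecutive-pairing scheme and the `reward` callable are illustrative assumptions, not the paper's exact procedure.

```python
def assign_preferences(trajectories, reward):
    """Pair up sampled trajectories and prefer the higher-reward one.

    Returns (winner, loser) pairs suitable for DPO-style training.
    Consecutive pairing is an illustrative assumption; in the paper's
    setting the reward would combine physics-based and task-specific
    terms computed on the WBC-tracked trajectory."""
    pairs = []
    for a, b in zip(trajectories[::2], trajectories[1::2]):
        pairs.append((a, b) if reward(a) >= reward(b) else (b, a))
    return pairs
```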
【2】OpenACMv2: An Accuracy-Constrained Co-Optimization Framework for Approximate DCiM
标题:OpenACMv2:一个近似DCiM的精度约束协同优化框架
链接:https://arxiv.org/abs/2603.13042
作者:Yiqi Zhou,Yue Yuan,Yikai Wang,Bohao Liu,Qinxin Mei,Zhuohua Liu,Shan Shen,Wei Xing,Daying Sun,Li Li,Guozhu Liu
备注:Accepted by DAC2026. Initial version
摘要:数字内存计算(DCiM)通过减少数据移动来加速神经网络。近似DCiM可以进一步提高功率性能面积(PPA),但需要在耦合架构和晶体管级选择之间进行精度约束的协同优化。在OpenYield的基础上,我们引入了精度约束协同优化(ACCO),并介绍了OpenACMv 2,这是一个开放式框架,通过两级优化实现了ACCO:(1)压缩器组合和SRAM宏参数的精度约束架构搜索,由PPA和误差的快速基于GNN的代理驱动;(2)使用Monte Carlo的标准单元和SRAM位单元的变化和PVT感知晶体管尺寸。通过将ACCO解耦为架构级探索和电路级调整,OpenACMv 2集成了经典的单目标和多目标优化器,以实现强大的PPA精度权衡和稳健的收敛。该工作流程与FreePDK 45和OpenROAD兼容,支持可重复评估和易于采用。实验表明,显着的PPA的改善控制精度预算下,使快速的“假设”探索近似DCiM。该框架可在https://github.com/ShenShan123/OpenACM上查阅。
摘要:Digital Compute-in-Memory (DCiM) accelerates neural networks by reducing data movement. Approximate DCiM can further improve power-performance-area (PPA), but demands accuracy-constrained co-optimization across coupled architecture and transistor-level choices. Building on OpenYield, we introduce Accuracy-Constrained Co-Optimization (ACCO) and present OpenACMv2, an open framework that operationalizes ACCO via two-level optimization: (1) accuracy-constrained architecture search of compressor combinations and SRAM macro parameters, driven by a fast GNN-based surrogate for PPA and error; and (2) variation- and PVT-aware transistor sizing for standard cells and SRAM bitcells using Monte Carlo. By decoupling ACCO into architecture-level exploration and circuit-level sizing, OpenACMv2 integrates classic single- and multi-objective optimizers to deliver strong PPA-accuracy tradeoffs and robust convergence. The workflow is compatible with FreePDK45 and OpenROAD, supporting reproducible evaluation and easy adoption. Experiments demonstrate significant PPA improvements under controlled accuracy budgets, enabling rapid "what-if" exploration for approximate DCiM. The framework is available on https://github.com/ShenShan123/OpenACM.
【3】Finite Difference Flow Optimization for RL Post-Training of Text-to-Image Models
标题:文本到图像模型RL后训练的有限差流优化
链接:https://arxiv.org/abs/2603.12893
作者:David McAllister,Miika Aittala,Tero Karras,Janne Hellsten,Angjoo Kanazawa,Timo Aila,Samuli Laine
备注:Code available at https://github.com/NVlabs/finite-difference-flow-optimization
摘要:强化学习(RL)已成为对基于扩散的图像合成模型进行后训练的标准技术,因为它能够从奖励信号中学习,以显式地改善图像质量和提示词对齐等理想方面。在本文中,我们提出了一种在线RL变体,通过对成对轨迹进行采样并将流速拉向更有利的图像方向来减少模型更新的方差。与将每个采样步骤视为单独的策略动作的现有方法不同,我们将整个采样过程视为单个动作。我们同时试验了高质量的视觉语言模型和现成的质量指标作为奖励,并使用广泛的指标来评估输出。与以前的方法相比,我们的方法收敛速度更快,输出质量和提示词对齐也更好。
摘要:Reinforcement learning (RL) has become a standard technique for post-training diffusion-based image synthesis models, as it enables learning from reward signals to explicitly improve desirable aspects such as image quality and prompt alignment. In this paper, we propose an online RL variant that reduces the variance in the model updates by sampling paired trajectories and pulling the flow velocity in the direction of the more favorable image. Unlike existing methods that treat each sampling step as a separate policy action, we consider the entire sampling process as a single action. We experiment with both high-quality vision language models and off-the-shelf quality metrics for rewards, and evaluate the outputs using a broad set of metrics. Our method converges faster and yields higher output quality and prompt alignment than previous approaches.
【4】Optimize Wider, Not Deeper: Consensus Aggregation for Policy Optimization
标题:更广泛而非更深入地优化:策略优化的共识聚合
链接:https://arxiv.org/abs/2603.12596
作者:Zelal Su,Mustafaoglu,Sungyoung Lee,Eshan Balachandar,Risto Miikkulainen,Keshav Pingali
摘要:邻近策略优化(PPO)使用多个历元的剪切SGD来近似信任域更新。每个历元可能进一步偏离自然梯度方向,从而产生路径相关噪声。为了理解这种漂移,我们利用Fisher信息几何将策略更新分解为信号(自然梯度投影)和浪费(消耗信任域预算而没有一阶代理改进的Fisher正交残差)。从经验上看,信号会饱和,但浪费会随着历元数增加,从而造成优化深度的困境。我们提出了策略优化的共识聚合(CAPO),它将计算从深度重定向到宽度:$K$个PPO副本在同一批次上进行优化,仅在小批量洗牌顺序上有所不同,然后聚合成共识。我们在两个空间中研究聚合:欧几里德参数空间,以及通过对数意见池得到的策略分布的自然参数空间。在自然参数空间中,共识可证明比平均专家实现更高的KL惩罚代理目标和更严格的信任域遵守;参数平均近似地继承这些保证。在连续控制任务中,在固定样本预算下,CAPO的性能优于PPO以及计算量匹配的更深基线,最高可达8.6倍。CAPO表明,策略优化可以通过优化更宽而不是更深来改进,且无需额外的环境交互。
摘要:Proximal policy optimization (PPO) approximates the trust region update using multiple epochs of clipped SGD. Each epoch may drift further from the natural gradient direction, creating path-dependent noise. To understand this drift, we can use Fisher information geometry to decompose policy updates into signal (the natural gradient projection) and waste (the Fisher-orthogonal residual that consumes trust region budget without first-order surrogate improvement). Empirically, signal saturates but waste grows with additional epochs, creating an optimization-depth dilemma. We propose Consensus Aggregation for Policy Optimization (CAPO), which redirects compute from depth to width: $K$ PPO replicates are optimized on the same batch, differing only in minibatch shuffling order, and then aggregated into a consensus. We study aggregation in two spaces: Euclidean parameter space, and the natural parameter space of the policy distribution via the logarithmic opinion pool. In natural parameter space, the consensus provably achieves higher KL-penalized surrogate and tighter trust region compliance than the mean expert; parameter averaging inherits these guarantees approximately. On continuous control tasks, CAPO outperforms PPO and compute-matched deeper baselines under fixed sample budgets by up to 8.6x. CAPO demonstrates that policy optimization can be improved by optimizing wider, rather than deeper, without additional environment interactions.
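For intuition, the logarithmic opinion pool mentioned above can be written down explicitly for one-dimensional Gaussian policies, where equal-weight pooling reduces to averaging natural parameters. This is a toy sketch under that scalar-Gaussian, equal-weight assumption, not the full CAPO procedure.

```python
import numpy as np

def log_opinion_pool(mus, sigmas):
    """Equal-weight logarithmic opinion pool of K scalar Gaussian policies.

    Averaging the natural parameters (mu/sigma^2, -1/(2 sigma^2)) of the
    experts yields another Gaussian: the consensus policy."""
    mus, sigmas = np.asarray(mus, float), np.asarray(sigmas, float)
    eta1 = np.mean(mus / sigmas**2)        # mean of mu / sigma^2
    eta2 = np.mean(-0.5 / sigmas**2)       # mean of -1 / (2 sigma^2)
    var = -0.5 / eta2
    return eta1 * var, np.sqrt(var)        # pooled mean and std
```

For identical experts the pool returns the expert itself; for experts that disagree only in the mean, it returns their average mean at the shared scale.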
【5】Byzantine-Robust Optimization under $(L_0, L_1)$-Smoothness
标题:$(L_0,L_1)$-光滑度下的拜占庭鲁棒优化
链接:https://arxiv.org/abs/2603.12512
作者:Arman Bolatov,Samuel Horváth,Martin Takáč,Eduard Gorbunov
备注:10 pages, 1 table, 4 figures, accepted to CPAL 2026
摘要:我们考虑在$(L_0,L_1)$-光滑性下、存在拜占庭攻击时的分布式优化。$(L_0,L_1)$-光滑性是标准$L$-光滑性的推广,能够刻画梯度Lipschitz常数依赖于状态的函数。我们提出了Byz-NSGDM,这是一种带动量的归一化随机梯度下降方法,在保持收敛保证的同时实现对拜占庭工作节点的鲁棒性。我们的算法将动量归一化与由最近邻混合(NNM)增强的拜占庭鲁棒聚合相结合,以应对$(L_0,L_1)$-光滑性和拜占庭对手带来的双重挑战。我们证明了Byz-NSGDM达到$O(K^{-1/4})$的收敛速度,误差下限(拜占庭偏差下限)与鲁棒性系数和梯度异质性成比例。在异构MNIST分类、合成$(L_0,L_1)$-光滑优化以及基于小型GPT模型的字符级语言建模上的实验验证了我们的方法对各种拜占庭攻击策略的有效性。消融研究进一步表明,Byz-NSGDM在广泛的动量和学习率选择下都是稳健的。
摘要:We consider distributed optimization under Byzantine attacks in the presence of $(L_0,L_1)$-smoothness, a generalization of standard $L$-smoothness that captures functions with state-dependent gradient Lipschitz constants. We propose Byz-NSGDM, a normalized stochastic gradient descent method with momentum that achieves robustness against Byzantine workers while maintaining convergence guarantees. Our algorithm combines momentum normalization with Byzantine-robust aggregation enhanced by Nearest Neighbor Mixing (NNM) to handle both the challenges posed by $(L_0,L_1)$-smoothness and Byzantine adversaries. We prove that Byz-NSGDM achieves a convergence rate of $O(K^{-1/4})$ up to a Byzantine bias floor proportional to the robustness coefficient and gradient heterogeneity. Experimental validation on heterogeneous MNIST classification, synthetic $(L_0,L_1)$-smooth optimization, and character-level language modeling with a small GPT model demonstrates the effectiveness of our approach against various Byzantine attack strategies. An ablation study further shows that Byz-NSGDM is robust across a wide range of momentum and learning rate choices.
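One step of the kind of update described above can be sketched as follows. A coordinate-wise median stands in for the paper's NNM-enhanced robust aggregator, and the normalized step makes the update size independent of the (possibly state-dependent) gradient scale that motivates the $(L_0,L_1)$-smooth setting. All hyperparameters here are placeholders.

```python
import numpy as np

def byz_nsgdm_step(params, worker_grads, momenta, beta=0.9, lr=0.01):
    """Normalized SGD-with-momentum step under robust aggregation.

    Each worker sends a gradient (some may be Byzantine); per-worker
    momentum buffers are updated, aggregated robustly, and the step is
    taken along the normalized aggregate direction."""
    new_momenta = [beta * m + (1.0 - beta) * g
                   for m, g in zip(momenta, worker_grads)]
    # Coordinate-wise median: a simple stand-in for NNM-enhanced aggregation.
    agg = np.median(np.stack(new_momenta), axis=0)
    direction = agg / (np.linalg.norm(agg) + 1e-12)
    return params - lr * direction, new_momenta
```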
【6】Less Data, Faster Convergence: Goal-Driven Data Optimization for Multimodal Instruction Tuning
标题:更少的数据,更快的收敛:目标驱动的数据优化用于多模式指令调优
链接:https://arxiv.org/abs/2603.12478
作者:Rujie Wu,Haozhe Zhao,Hai Ci,Yizhou Wang
摘要:多模式指令调整通常计算效率低下,因为训练预算分散在效用非常不均匀的大型图像-视频混合池中。我们提出了目标驱动的数据优化(GDO),该框架为每个候选样本计算六个样本描述符,并针对不同目标构造优化的1$\times$训练子集。在8个H20 GPU上使用固定的单历元Qwen3-VL-8B-Instruct训练和评估配方,GDO使用的训练样本远少于Uni-10x基线,同时收敛速度更快,精度更高。相对于固定的512k样本Uni-10x基线,GDO在MVBench上35.4k样本、VideoMME上26.6k样本、MLVU上27.3k样本和LVBench上34.7k样本后达到Uni-10x参考值,同时准确度分别提高了+1.38、+1.67、+3.08和+0.84个百分点。MVBench和MLVU的收益最大,而LVBench的改进幅度较小,这与其超长视频设置以及该基准与以短视频/图像为主的训练池之间的不匹配一致。在MinLoss、Diverse、Temp和Temp+中,更强的时间强调会稳步提高长视频理解行为。总的来说,GDO提供了一个目标驱动的数据优化框架,可以在固定的训练协议下用更少的训练样本实现更快的收敛。代码可在https://github.com/rujiewu/GDO上获得。
摘要:Multimodal instruction tuning is often compute-inefficient because training budgets are spread across large mixed image-video pools whose utility is highly uneven. We present Goal-Driven Data Optimization (GDO), a framework that computes six sample descriptors for each candidate and constructs optimized 1$\times$ training subsets for different goals. Under a fixed one-epoch Qwen3-VL-8B-Instruct training and evaluation recipe on 8 H20 GPUs, GDO uses far fewer training samples than the Uni-10x baseline while converging faster and achieving higher accuracy. Relative to the fixed 512k-sample Uni-10x baseline, GDO reaches the Uni-10x reference after 35.4k samples on MVBench, 26.6k on VideoMME, 27.3k on MLVU, and 34.7k on LVBench, while improving Accuracy by +1.38, +1.67, +3.08, and +0.84 percentage points, respectively. The gains are largest on MVBench and MLVU, while LVBench improves more modestly, consistent with its ultra-long-video setting and the mismatch between that benchmark and the short-video/image-dominant training pool. Across MinLoss, Diverse, Temp, and Temp+, stronger temporal emphasis yields steadily better long-video understanding behavior. Overall, GDO provides a goal-driven data optimization framework that enables faster convergence with fewer training samples under a fixed training protocol. Code is available at https://github.com/rujiewu/GDO.
【7】KernelFoundry: Hardware-aware evolutionary GPU kernel optimization
标题:KernelFoundry:硬件感知的进化式图形处理器内核优化
链接:https://arxiv.org/abs/2603.12440
作者:Nina Wiedemann,Quentin Leboutet,Michael Paulitsch,Diana Wofk,Benjamin Ummenhofer
摘要:优化GPU内核对大型语言模型(LLM)提出了比标准代码生成任务更大的挑战,因为它需要了解硬件架构、并行优化策略和性能分析输出。大多数现有的基于LLM的内核生成方法依赖于简单的提示和反馈循环,仅通过分析反馈间接地结合硬件感知。我们介绍了KernelFoundry,一个进化的框架,通过三个关键机制有效地探索GPU内核设计空间:(1)MAP-Elites质量多样性搜索与内核特定的行为维度,以维持不同的优化策略的探索;(2)元提示进化,其与内核共同进化提示以揭示特定于任务的优化策略,以及(3)基于模板的参数优化,以将内核调整到输入和硬件。我们在KernelBench,robust-kbench和自定义任务上评估了这个框架,生成SYCL内核作为跨平台GPU编程模型和CUDA内核,以与以前的工作进行比较。我们的方法始终优于基线方法,在SYCL的KernelBench上实现了2.3倍的平均加速比。此外,KernelFoundry是作为一个分布式框架实现的,可以远程访问各种硬件,实现快速基准测试,并具有灵活的用户输入层,支持为基准测试之外的各种真实用例生成内核。
摘要:Optimizing GPU kernels presents a significantly greater challenge for large language models (LLMs) than standard code generation tasks, as it requires understanding hardware architecture, parallel optimization strategies, and performance profiling outputs. Most existing LLM-based approaches to kernel generation rely on simple prompting and feedback loops, incorporating hardware awareness only indirectly through profiling feedback. We introduce KernelFoundry, an evolutionary framework that efficiently explores the GPU kernel design space through three key mechanisms: (1) MAP-Elites quality-diversity search with kernel-specific behavioral dimensions to sustain exploration across diverse optimization strategies; (2) meta-prompt evolution, which co-evolves prompts with kernels to uncover task-specific optimization strategies, and (3) template-based parameter optimization to tune kernels to inputs and hardware. We evaluate this framework on KernelBench, robust-kbench, and custom tasks, generating SYCL kernels as a cross-platform GPU programming model and CUDA kernels for comparison to prior work. Our approach consistently outperforms the baseline methods, achieving an average speedup of 2.3x on KernelBench for SYCL. Moreover, KernelFoundry is implemented as a distributed framework with remote access to diverse hardware, enabling rapid benchmarking and featuring a flexible user input layer that supports kernel generation for a wide range of real-world use cases beyond benchmarking.
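The MAP-Elites component above can be sketched generically: an archive keeps the best-scoring solution per behavior cell, and new candidates are mutated from randomly chosen elites. The toy fitness, behavior, and mutation functions used in practice would be replaced by kernel runtime and KernelFoundry's kernel-specific behavioral dimensions; this is a minimal sketch of the algorithm, not the framework itself.

```python
import random

def map_elites(fitness, behavior, init, mutate, iters=1000, seed=0):
    """Generic MAP-Elites loop: keep the best solution per behavior cell."""
    rng = random.Random(seed)
    archive = {}  # cell -> (fitness, solution)

    def offer(x):
        cell, f = behavior(x), fitness(x)
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, x)

    for x in init:               # seed the archive
        offer(x)
    for _ in range(iters):       # mutate random elites, re-offer children
        parent = rng.choice(list(archive.values()))[1]
        offer(mutate(parent, rng))
    return archive
```

Because elites are kept per cell rather than globally, the search maintains diverse strategies instead of collapsing onto a single optimum.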
【8】A Geometrically-Grounded Drive for MDL-Based Optimization in Deep Learning
标题:深度学习中基于MDL的优化的几何基础驱动器
链接:https://arxiv.org/abs/2603.12304
作者:Ming Lei,Shufan Wu,Christophe Baehr
备注:8 pages, 9 figures, submitted to a journal and under review
摘要:本文介绍了一种新的优化框架,该框架从根本上将最小描述长度(MDL)原则集成到深度神经网络的训练动态中。MDL不再局限于其作为模型选择标准的传统角色,而是被重新表述为优化过程自身中一个主动、自适应的驱动力。我们方法的核心是一个具有几何基础的认知流形,其演化由\textit{耦合Ricci流}支配,并由从第一性原理推导出的新\textit{MDL驱动}项加以丰富。这种驱动由任务损失梯度调制,在数据保真度和模型简化之间创建了一个无缝的和谐,在训练过程中主动压缩内部表示。我们建立了一个全面的理论基础,证明关键属性,包括描述长度的单调递减(定理~\ref{thm:convergence})、通过几何手术协议实现的有限次拓扑相变(定理~\ref{thm:surgery}、\ref{thm:ultimate_fate}),以及普适临界行为的涌现(定理~\ref{thm:universality})。此外,我们提供了一个实用且计算高效的算法,每次迭代复杂度为$O(N \log N)$(定理~\ref{thm:complexity}),同时保证数值稳定性(定理~\ref{thm:stability})以及凸性假设下的指数收敛(定理~\ref{thm:convergence_rate})。合成回归和分类任务的实证验证证实了理论预测,证明了算法在实现鲁棒泛化和自主模型简化方面的有效性。这项工作通过将几何深度学习与信息理论原理相统一,为实现更自主、更可推广和更可解释的AI系统提供了一条原则性的道路。
摘要:This paper introduces a novel optimization framework that fundamentally integrates the Minimum Description Length (MDL) principle into the training dynamics of deep neural networks. Moving beyond its conventional role as a model selection criterion, we reformulate MDL as an active, adaptive driving force within the optimization process itself. The core of our method is a geometrically-grounded cognitive manifold whose evolution is governed by a \textit{coupled Ricci flow}, enriched with a novel \textit{MDL Drive} term derived from first principles. This drive, modulated by the task-loss gradient, creates a seamless harmony between data fidelity and model simplification, actively compressing the internal representation during training. We establish a comprehensive theoretical foundation, proving key properties including the monotonic decrease of description length (Theorem~\ref{thm:convergence}), a finite number of topological phase transitions via a geometric surgery protocol (Theorems~\ref{thm:surgery}, \ref{thm:ultimate_fate}), and the emergence of universal critical behavior (Theorem~\ref{thm:universality}). Furthermore, we provide a practical, computationally efficient algorithm with $O(N \log N)$ per-iteration complexity (Theorem~\ref{thm:complexity}), alongside guarantees for numerical stability (Theorem~\ref{thm:stability}) and exponential convergence under convexity assumptions (Theorem~\ref{thm:convergence_rate}). Empirical validation on synthetic regression and classification tasks confirms the theoretical predictions, demonstrating the algorithm's efficacy in achieving robust generalization and autonomous model simplification. This work provides a principled path toward more autonomous, generalizable, and interpretable AI systems by unifying geometric deep learning with information-theoretic principles.
【9】Convergence Rate of a Functional Learning Method for Contextual Stochastic Optimization
标题:上下文随机优化函数学习方法的收敛速度
链接:https://arxiv.org/abs/2603.13048
作者:Noel Smith,Andrzej Ruszczynski
摘要:我们考虑一个涉及两个随机变量的随机优化问题:上下文变量$X$和因变量$Y$。目标是最小化作用于条件期望$\mathbb{E}[f(X,Y,\beta)\mid X]$的非线性损失泛函的期望值,其中$f$是非线性函数,$\beta$表示决策变量。我们专注于实际重要的设置,其中直接从条件分布$Y \mid X$采样是不可行的,只有一个i.i.d.观测对流$\{(X^k,Y^k)\}_{k=0,1,2,\ldots}$可用。在我们的方法中,条件期望在预先指定的参数函数类中进行近似。我们分析了一个同时学习和优化的算法,它联合估计条件期望并优化外部目标,并证明该方法实现了$\mathcal{O}\big(1/\sqrt{N}\big)$阶的收敛速度,其中$N$表示观察到的对的数量。
摘要:We consider a stochastic optimization problem involving two random variables: a context variable $X$ and a dependent variable $Y$. The objective is to minimize the expected value of a nonlinear loss functional applied to the conditional expectation $\mathbb{E}[f(X, Y, \beta) \mid X]$, where $f$ is a nonlinear function and $\beta$ represents the decision variables. We focus on the practically important setting in which direct sampling from the conditional distribution of $Y \mid X$ is infeasible, and only a stream of i.i.d.\ observation pairs $\{(X^k, Y^k)\}_{k=0,1,2,\ldots}$ is available. In our approach, the conditional expectation is approximated within a prespecified parametric function class. We analyze a simultaneous learning-and-optimization algorithm that jointly estimates the conditional expectation and optimizes the outer objective, and establish that the method achieves a convergence rate of order $\mathcal{O}\big(1/\sqrt{N}\big)$, where $N$ denotes the number of observed pairs.
【10】Optimal Experimental Design for Reliable Learning of History-Dependent Constitutive Laws
标题:可靠学习历史相关本构定律的最佳实验设计
链接:https://arxiv.org/abs/2603.12365
作者:Kaushik Bhattacharya,Lianghao Cao,Andrew Stuart
摘要:与历史相关的本构模型可以作为微观力学聚集效应的宏观封闭。它们的参数通常从实验数据中学习。在有限的实验预算下,很难得到表征本构关系所需的全方位响应。因此,数据可以被一系列参数选择同样好地解释,导致参数估计不确定或不可靠。为了解决这个问题,我们提出了一个贝叶斯最优实验设计框架来量化、解释和最大化实验设计的效用,以可靠地学习历史相关的本构模型。在这个框架中,设计效用被定义为参数不确定性的预期减少量,即预期信息增益。这使得使用模拟数据的in silico(计算机模拟)设计优化成为可能,并降低了可靠参数识别的物理实验成本。我们引入两个近似,使该框架适用于具有昂贵前向模型和高维数据的先进材料测试:(i)预期信息增益的高斯近似,和(ii)Fisher信息矩阵的代理近似。前者实现了高效的设计优化与解释,而后者通过摊销重复效用评估的成本,将该方法扩展到分批设计优化。粘弹性固体单轴试验的数值研究表明,优化的试样几何形状和加载路径所产生的图像和力数据,相对于随机设计显著提高了参数可识别性,特别是对于与记忆效应相关的参数。
摘要:History-dependent constitutive models serve as macroscopic closures for the aggregated effects of micromechanics. Their parameters are typically learned from experimental data. With a limited experimental budget, eliciting the full range of responses needed to characterize the constitutive relation can be difficult. As a result, the data can be well explained by a range of parameter choices, leading to parameter estimates that are uncertain or unreliable. To address this issue, we propose a Bayesian optimal experimental design framework to quantify, interpret, and maximize the utility of experimental designs for reliable learning of history-dependent constitutive models. In this framework, the design utility is defined as the expected reduction in parametric uncertainty or the expected information gain. This enables in silico design optimization using simulated data and reduces the cost of physical experiments for reliable parameter identification. We introduce two approximations that make this framework practical for advanced material testing with expensive forward models and high-dimensional data: (i) a Gaussian approximation of the expected information gain, and (ii) a surrogate approximation of the Fisher information matrix. The former enables efficient design optimization and interpretation, while the latter extends this approach to batched design optimization by amortizing the cost of repeated utility evaluations. Our numerical studies of uniaxial tests for viscoelastic solids show that optimized specimen geometries and loading paths yield image and force data that significantly improve parameter identifiability relative to random designs, especially for parameters associated with memory effects.
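For intuition on the Gaussian approximation of expected information gain (EIG), the special case of a linear forward model with Gaussian prior and noise has a closed form. The sketch below shows that standard textbook formula; it illustrates the quantity being approximated, not the paper's surrogate-based estimator.

```python
import numpy as np

def gaussian_eig(G, prior_cov, noise_var):
    """EIG for the linear-Gaussian model y = G @ theta + eps.

    With theta ~ N(0, prior_cov) and eps ~ N(0, noise_var * I), the
    expected information gain is 0.5 * logdet(I + G Sigma G^T / noise_var).
    Designs G that excite directions of high prior uncertainty, or
    experiments with lower noise, yield larger EIG."""
    m = G.shape[0]
    M = np.eye(m) + G @ prior_cov @ G.T / noise_var
    return 0.5 * np.linalg.slogdet(M)[1]
```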
预测|估计(6篇)
【1】Competition-Aware CPC Forecasting with Near-Market Coverage
标题:具有竞争意识的CPC预测具有近市场覆盖
链接:https://arxiv.org/abs/2603.13059
作者:Sebastian Frey,Edoardo Beccari,Maximilian Kranz,Nicolò Alberto Pellizzari,Ali Mete Karaman,Qiwei Han,Maximilian Kaiser
备注:16 pages, 2 figures, 4 tables
摘要:付费搜索中的每次点击成本(CPC)是由竞争格局产生的不稳定的拍卖结果,从任何单个广告客户的历史中只能部分观察到。使用来自集中汽车租赁市场(2021-2023年)的Google Ads拍卖日志,我们预测了1,811个关键字系列的每周CPC,并通过从关键字文本,CPC轨迹和地理市场结构中获得的补充信号来近似潜在竞争。我们构建(i)语义邻域和语义关键字图,这些语义邻域和语义关键字图来自于预先训练的基于transformer的关键字文本表示,(ii)通过CPC轨迹的动态时间规整(DTW)对齐的行为邻域,以及(iii)捕获本地化需求和市场异质性的地理意图协变量。我们广泛评估这些信号作为独立的协变量和时空图预测器的关系先验,将它们与强大的统计,神经和时间序列基础模型基线进行基准测试。在各种方法中,竞争意识增强提高了与业务相关的中长期的稳定性和错误状况,其中竞争机制的变化和波动性是最重要的。结果表明,广泛的市场结果覆盖,结合关键词派生的语义和地理先验,提供了一种可扩展的方式来近似潜在的竞争,提高CPC预测拍卖驱动的市场。
摘要:Cost-per-click (CPC) in paid search is a volatile auction outcome generated by a competitive landscape that is only partially observable from any single advertiser's history. Using Google Ads auction logs from a concentrated car-rental market (2021--2023), we forecast weekly CPC for 1,811 keyword series and approximate latent competition through complementary signals derived from keyword text, CPC trajectories, and geographic market structure. We construct (i) semantic neighborhoods and a semantic keyword graph from pretrained transformer-based representations of keyword text, (ii) behavioral neighborhoods via Dynamic Time Warping (DTW) alignment of CPC trajectories, and (iii) geographic-intent covariates capturing localized demand and marketplace heterogeneity. We extensively evaluate these signals both as stand-alone covariates and as relational priors in spatiotemporal graph forecasters, benchmarking them against strong statistical, neural, and time-series foundation-model baselines. Across methods, competition-aware augmentation improves stability and error profiles at business-relevant medium and longer horizons, where competitive regimes shift and volatility is most consequential. The results show that broad market-outcome coverage, combined with keyword-derived semantic and geographic priors, provides a scalable way to approximate latent competition and improve CPC forecasting in auction-driven markets.
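The behavioral neighborhoods above rest on DTW alignment of CPC trajectories. A textbook dynamic-programming DTW distance, without the windowing or normalization a production pipeline would likely add, can be sketched as:

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping distance between two 1-D series.

    D[i, j] = local cost + min over (insert, delete, match) moves;
    the bottom-right cell is the optimal alignment cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Unlike Euclidean distance, DTW treats a series and a time-stretched copy of it as identical, which is why it suits grouping CPC trajectories whose dynamics match but whose timing differs.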
【2】From AI Weather Prediction to Infrastructure Resilience: A Correction-Downscaling Framework for Tropical Cyclone Impacts
标题:从人工智能天气预测到基础设施复原力:热带气旋影响的修正-缩减框架
链接:https://arxiv.org/abs/2603.12828
作者:You Wu,Zhenguo Wang,Naiyu Wang
摘要:本文讨论了基础设施弹性中缺失的能力:将快速的全球人工智能天气预报转化为资产规模的可操作风险。我们介绍了基于人工智能的校正-降尺度框架(ACDF),它将粗糙的人工智能天气预测(AIWP)转换为热带气旋的500米无偏风场和输电塔/线路故障概率。ACDF将风暴尺度偏差校正与地形感知降尺度分离,防止错误传播,同时恢复控制结构载荷的亚公里可变性。在留一风暴(leave-one-storm-out)评估下对11个影响中国浙江的台风进行测试,ACDF相比Pangu-Weather将站尺度风速MAE降低了38.8%,精度可与观测同化的中尺度分析相匹配,且在单个GPU上每个12小时周期仅需运行25秒。在台风黑格比的案例中,ACDF再现了观测到的强风尾,隔离了沿海高风险走廊,并标记了失败的线路,在塔和线路尺度上展示了可操作的指导。ACDF提供了一个端到端的途径,从人工智能全球预测到关键基础设施的可操作的、基于影响的预警。
摘要:This paper addresses a missing capability in infrastructure resilience: turning fast, global AI weather forecasts into asset-scale, actionable risk. We introduce the AI-based Correction-Downscaling Framework (ACDF), which transforms coarse AI weather prediction (AIWP) into 500-m, unbiased wind fields and transmission tower/line failure probabilities for tropical cyclones. ACDF separates storm-scale bias correction from terrain-aware downscaling, preventing error propagation while restoring sub-kilometer variability that governs structural loading. Tested on 11 typhoons affecting Zhejiang, China under leave-one-storm-out evaluation, ACDF reduces station-scale wind-speed MAE by 38.8% versus Pangu-Weather, matches observation-assimilated mesoscale analyses, yet runs in 25 s per 12-h cycle on a single GPU. In the Typhoon Hagupit case, ACDF reproduced observed high-wind tails, isolated a coastal high-risk corridor, and flagged the line that failed, demonstrating actionable guidance at tower and line scales. ACDF provides an end-to-end pathway from AI global forecasts to operational, impact-based early warning for critical infrastructure.
【3】Overcoming the Modality Gap in Context-Aided Forecasting
标题:克服上下文辅助预测中的形态差距
链接:https://arxiv.org/abs/2603.12451
作者:Vincent Zhihao Zheng,Étienne Marcotte,Arjun Ashok,Andrew Robert Williams,Lijun Sun,Alexandre Drouin,Valentina Zantedeschi
摘要:上下文辅助预测(CAF)有望整合领域知识和前瞻性信息,使人工智能系统能够超越传统的统计方法。然而,最近的实证研究揭示了一个令人困惑的差距:多模态模型往往无法超越他们的单峰同行。我们假设这种表现不佳源于现有数据集的上下文质量差,因为验证具有挑战性。为了解决这些局限性,我们引入了一种半合成的数据增强方法,该方法可以生成描述时间动态的上下文,并可验证地补充数值历史。这种方法可以创建大规模的数据集,从而产生CAF-7 M,这是一个包含700万个上下文增强时间序列窗口的语料库,其中包括一个经过严格验证的测试集。我们证明,半合成预训练可以有效地转移到现实世界的评估,并显示出上下文利用的明确证据。我们的研究结果表明,数据集的质量,而不是架构的限制,一直是上下文辅助预测的主要瓶颈。
摘要:Context-aided forecasting (CAF) holds promise for integrating domain knowledge and forward-looking information, enabling AI systems to surpass traditional statistical methods. However, recent empirical studies reveal a puzzling gap: multimodal models often fail to outperform their unimodal counterparts. We hypothesize that this underperformance stems from poor context quality in existing datasets, as verification is challenging. To address these limitations, we introduce a semi-synthetic data augmentation method that generates contexts both descriptive of temporal dynamics and verifiably complementary to numerical histories. This approach enables massive-scale dataset creation, resulting in CAF-7M, a corpus of 7 million context-augmented time series windows, including a rigorously verified test set. We demonstrate that semi-synthetic pre-training transfers effectively to real-world evaluation, and show clear evidence of context utilization. Our results suggest that dataset quality, rather than architectural limitations, has been the primary bottleneck in context-aided forecasting.
【4】Multi-objective Genetic Programming with Multi-view Multi-level Feature for Enhanced Protein Secondary Structure Prediction
标题:具有多视图多水平特征的多目标遗传规划增强蛋白质二级结构预测
链接:https://arxiv.org/abs/2603.12293
作者:Yining Qian,Lijie Su,Meiling Xu,Xianpeng Wang
摘要:预测蛋白质二级结构对于理解蛋白质功能和推进药物发现至关重要。然而,复杂的序列-结构关系对精确建模提出了重大挑战。为了解决这些问题,我们提出了MOGP-MMF,一个多目标遗传编程框架,重新制定PSSP作为一个自动优化任务,专注于功能选择和融合。具体来说,MOGP-MMF引入了一种多视图多级表示策略,该策略集成了进化、语义和新引入的结构视图,以捕获全面的蛋白质折叠逻辑。利用丰富的算子集,该框架同时演化线性和非线性融合函数,有效地捕获高阶特征交互,同时降低融合复杂度。为了解决精度和复杂度之间的权衡问题,提出了一种改进的多目标GP算法,该算法引入了知识转移机制,利用先验进化经验引导种群向全局最优解进化。在七个基准数据集上进行的广泛实验表明,MOGP-MMF超越了最先进的方法,特别是在Q8准确性和结构完整性方面。此外,MOGP-MMF生成了一组多样的非支配解,为各种实际应用场景提供了灵活的模型选择方案。源代码可在GitHub上获取:https://github.com/qian-ann/MOGP-MMF/tree/main。
摘要:Predicting protein secondary structure is essential for understanding protein function and advancing drug discovery. However, the intricate sequence-structure relationship poses significant challenges for accurate modeling. To address these, we propose MOGP-MMF, a multi-objective genetic programming framework that reformulates PSSP as an automated optimization task focused on feature selection and fusion. Specifically, MOGP-MMF introduces a multi-view multi-level representation strategy that integrates evolutionary, semantic, and newly introduced structural views to capture the comprehensive protein folding logic. Leveraging an enriched operator set, the framework evolves both linear and nonlinear fusion functions, effectively capturing high-order feature interactions while reducing fusion complexity. To resolve the accuracy-complexity trade-off, an improved multi-objective GP algorithm is developed, incorporating a knowledge transfer mechanism that utilizes prior evolutionary experience to guide the population toward global optima. Extensive experiments across seven benchmark datasets demonstrate that MOGP-MMF surpasses state-of-the-art methods, particularly in Q8 accuracy and structural integrity. Furthermore, MOGP-MMF generates a diverse set of non-dominated solutions, offering flexible model selection schemes for various practical application scenarios. The source code is available on GitHub: https://github.com/qian-ann/MOGP-MMF/tree/main.
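The set of non-dominated solutions mentioned above can be extracted with a simple O(n^2) Pareto filter. Here accuracy is maximized and complexity minimized, an objective pair assumed for illustration of the accuracy-complexity trade-off the abstract describes.

```python
def non_dominated(points):
    """Return the Pareto front of (accuracy, complexity) pairs.

    A point is dominated if another point is at least as good on both
    objectives and strictly better on at least one."""
    front = []
    for i, (acc_i, cpx_i) in enumerate(points):
        dominated = any(
            acc_j >= acc_i and cpx_j <= cpx_i and (acc_j > acc_i or cpx_j < cpx_i)
            for j, (acc_j, cpx_j) in enumerate(points) if j != i
        )
        if not dominated:
            front.append((acc_i, cpx_i))
    return front
```

In a multi-objective GP, the archive of such non-dominated individuals is what gives practitioners a menu of models at different accuracy/complexity operating points.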
【5】From Garbage to Gold: A Data-Architectural Theory of Predictive Robustness
标题:从垃圾到黄金:预测稳健性的数据架构理论
链接:https://arxiv.org/abs/2603.12288
作者:Terrence J. Lee-St. John,Jordan L. Lawson,Bartlomiej Piechowski-Jozwiak
备注:120 pages, 12 figures, 3 tables. Simulation code and documentation available at: https://github.com/tjleestjohn/from-garbage-to-gold
摘要:表格机器学习提出了一个悖论:现代模型使用高维(high-D)、共线、易出错的数据实现最先进的性能,无视“垃圾输入,垃圾输出”的咒语。为了帮助解决这个问题,我们综合了信息论、潜在因素模型和心理测量学的原理,阐明了预测鲁棒性不仅仅来自数据的清洁度,还来自数据架构和模型容量之间的协同作用。将预测空间的“噪声”划分为“预测误差”和“结构不确定性”(随机生成映射的信息缺陷),我们证明了利用高维的易错预测集可以渐进地克服这两种类型的噪声,而清洗低维集从根本上受到结构不确定性的限制。我们证明了为什么“信息共线性”(来自共享潜在原因的依赖关系)增强了可靠性和收敛效率,并解释了为什么增加的维度减少了潜在推理负担,使得在有限样本下具备可行性。为了解决实际的限制,我们提出了“主动式以数据为中心的AI”(Proactive Data-Centric AI)来识别能高效实现鲁棒性的预测器。我们还推导出系统误差机制的边界,并说明为什么吸收“流氓”依赖关系的模型可以减轻违反假设的情况。将潜在架构与良性过拟合联系起来,我们向对结果误差和预测空间噪声的鲁棒性的统一视图迈出了第一步,同时也描绘了传统DCAI对标签清洗的关注何时仍然强大。通过将数据质量从项目级完美重新定义为组合级架构,我们为“本地工厂”提供了理论基础(从实时的、未经策划的企业“数据沼泽”中学习),支持从“模型转移”到“方法转移”的部署范式转变,以克服静态泛化性限制。
摘要:Tabular machine learning presents a paradox: modern models achieve state-of-the-art performance using high-dimensional (high-D), collinear, error-prone data, defying the "Garbage In, Garbage Out" mantra. To help resolve this, we synthesize principles from Information Theory, Latent Factor Models, and Psychometrics, clarifying that predictive robustness arises not solely from data cleanliness, but from the synergy between data architecture and model capacity. Partitioning predictor-space "noise" into "Predictor Error" and "Structural Uncertainty" (informational deficits from stochastic generative mappings), we prove that leveraging high-D sets of error-prone predictors asymptotically overcomes both types of noise, whereas cleaning a low-D set is fundamentally bounded by Structural Uncertainty. We demonstrate why "Informative Collinearity" (dependencies from shared latent causes) enhances reliability and convergence efficiency, and explain why increased dimensionality reduces the latent inference burden, enabling feasibility with finite samples. To address practical constraints, we propose "Proactive Data-Centric AI" to identify predictors that enable robustness efficiently. We also derive boundaries for Systematic Error Regimes and show why models that absorb "rogue" dependencies can mitigate assumption violations. Linking latent architecture to Benign Overfitting, we offer a first step towards a unified view of robustness to Outcome Error and predictor-space noise, while also delineating when traditional DCAI's focus on label cleaning remains powerful. By redefining data quality from item-level perfection to portfolio-level architecture, we provide a theoretical rationale for "Local Factories" -- learning from live, uncurated enterprise "data swamps" -- supporting a deployment paradigm shift from "Model Transfer" to "Methodology Transfer'' to overcome static generalizability limitations.
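The core claim above, that many collinear, error-prone proxies of a shared latent cause can approach clean-signal performance while a single noisy predictor stays bounded, can be checked with a toy simulation. The sample sizes and noise levels below are illustrative assumptions, not parameters from the paper.

```python
import numpy as np

def proxy_r2(n_proxies, noise_sd, n=20000, seed=0):
    """R^2 of the proxy average against an outcome driven by a latent factor.

    Each proxy column is the latent cause plus independent noise, so all
    proxies are collinear ('informative collinearity'); averaging them
    cancels the per-proxy error."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                                   # shared latent cause
    y = z                                                    # outcome (noiseless for clarity)
    X = z[:, None] + rng.normal(scale=noise_sd, size=(n, n_proxies))
    pred = X.mean(axis=1)                                    # aggregate the proxies
    return np.corrcoef(pred, y)[0, 1] ** 2
```

Theoretically R^2 here is about 1 / (1 + noise_sd^2 / n_proxies), so it tends to 1 as proxies are added even though each proxy is individually "garbage".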
【6】Predictive Analytics for Foot Ulcers Using Time-Series Temperature and Pressure Data
标题:使用时间序列温度和压力数据对足部溃疡进行预测分析
链接:https://arxiv.org/abs/2603.12278
作者:Md Tanvir Hasan Turja
备注:36 pages, 19 figures
摘要:糖尿病足溃疡(DFU)是糖尿病的严重并发症,通常导致显著的发病率。本文提出了一种预测分析框架,利用可穿戴足部传感器捕获的时间序列数据,特别是用于温度测量的NTC薄膜热电偶和用于足底负荷监测的FlexiForce压力传感器。数据收集自行走在仪器化路径上的健康受试者。应用无监督机器学习算法隔离森林和K最近邻(KNN)来检测可能表明早期溃疡风险的异常。通过严格的数据预处理和有针对性的特征工程,提取生理模式,以识别足部温度和压力的细微变化。结果表明,隔离森林对微异常敏感,而KNN能有效标记极端偏差,尽管假阳性率较高。温度和压力读数之间的强相关性支持组合传感器监测,以提高预测精度。这些发现为糖尿病足的实时健康监测提供了依据,旨在促进早期干预,降低DFU的发生率。
摘要:Diabetic foot ulcers (DFUs) are a severe complication of diabetes, often resulting in significant morbidity. This paper presents a predictive analytics framework utilizing time-series data captured by wearable foot sensors -- specifically NTC thin-film thermocouples for temperature measurement and FlexiForce pressure sensors for plantar load monitoring. Data was collected from healthy subjects walking on an instrumented pathway. Unsupervised machine learning algorithms, Isolation Forest and K-Nearest Neighbors (KNN), were applied to detect anomalies that may indicate early ulcer risk. Through rigorous data preprocessing and targeted feature engineering, physiologic patterns were extracted to identify subtle changes in foot temperature and pressure. Results demonstrate Isolation Forest is sensitive to micro-anomalies, while KNN is effective in flagging extreme deviations, albeit at a higher false-positive rate. Strong correlations between temperature and pressure readings support combined sensor monitoring for improved predictive accuracy. These findings provide a basis for real-time diabetic foot health surveillance, aiming to facilitate earlier intervention and reduce DFU incidence.
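A minimal numpy sketch of the KNN-distance anomaly scoring described above (scikit-learn's IsolationForest would be the off-the-shelf counterpart for the other detector). The choice of k and any flagging threshold are assumptions for illustration.

```python
import numpy as np

def knn_anomaly_scores(X, k=3):
    """Score each sample by its mean distance to its k nearest neighbors.

    Larger scores flag extreme deviations, matching the KNN detector's
    behavior described in the abstract. X has shape (n_samples, n_features),
    e.g. per-window (temperature, pressure) features."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    np.fill_diagonal(D, np.inf)                                 # exclude self-distance
    knn = np.sort(D, axis=1)[:, :k]                             # k nearest per sample
    return knn.mean(axis=1)
```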
其他神经网络|深度学习|模型|建模(20篇)
【1】Towards Faithful Multimodal Concept Bottleneck Models
标题:迈向忠实的多模态概念瓶颈模型
链接:https://arxiv.org/abs/2603.13163
作者:Pierre Moreau,Emeline Pineau Ferrand,Yann Choho,Benjamin Wong,Annabelle Blangero,Milan Bhan
摘要:概念瓶颈模型(CBMs)是一种可解释的模型,它通过一层人类可解释的概念来进行预测。虽然在视觉和最近的NLP中得到了广泛的研究,但在多模态环境中,CBM在很大程度上仍未被探索。为了使它们的解释是忠实的,建立信任措施必须满足两个条件:概念必须被正确地检测,概念表示必须只对它们的预期语义进行编码,而不会将无关的任务相关或概念间信息走私到最终预测中,这种现象称为泄漏。现有的方法将概念检测和泄漏缓解视为单独的问题,并且通常以牺牲预测准确性为代价来改进一个。在这项工作中,我们介绍了f-CBM,一个忠实的多模态CBM框架,建立在视觉语言的骨干上,通过两个互补的策略共同针对这两个方面:一个可区分的泄漏损失,以减轻泄漏,和一个Kolmogorov-Arnold网络预测头,提供足够的表现力,以提高概念检测。实验表明,f-CBM实现了任务准确性,概念检测和泄漏减少之间的最佳权衡,同时无缝地应用于图像和文本或纯文本数据集,使其在各种模态之间通用。
摘要:Concept Bottleneck Models (CBMs) are interpretable models that route predictions through a layer of human-interpretable concepts. While widely studied in vision and, more recently, in NLP, CBMs remain largely unexplored in multimodal settings. For their explanations to be faithful, CBMs must satisfy two conditions: concepts must be properly detected, and concept representations must encode only their intended semantics, without smuggling extraneous task-relevant or inter-concept information into final predictions, a phenomenon known as leakage. Existing approaches treat concept detection and leakage mitigation as separate problems, and typically improve one at the expense of predictive accuracy. In this work, we introduce f-CBM, a faithful multimodal CBM framework built on a vision-language backbone that jointly targets both aspects through two complementary strategies: a differentiable leakage loss to mitigate leakage, and a Kolmogorov-Arnold Network prediction head that provides sufficient expressiveness to improve concept detection. Experiments demonstrate that f-CBM achieves the best trade-off between task accuracy, concept detection, and leakage reduction, while applying seamlessly to both image and text or text-only datasets, making it versatile across modalities.
【2】Exact Federated Continual Unlearning for Ridge Heads on Frozen Foundation Models
标题:冻结基础模型上岭回归头的精确联邦持续遗忘
链接:https://arxiv.org/abs/2603.12977
作者:Yijun Quan,Wentai Wu,Giovanni Montana
摘要:基础模型通常被部署为具有小的可训练头的冻结特征提取器,以适应联邦设置中的私有用户生成的数据。“被遗忘的权利”要求根据需求从训练模型中消除特定样本或用户的影响。现有的联邦遗忘方法针对一般的深度模型,并依赖于近似重建或选择性再训练,使得精确性昂贵或难以实现。我们在一个实际相关但尚未被充分探索的情形下研究这个问题:冻结的基础模型配合岭回归头。确切的最优值仅通过两个可加的充分统计量依赖于数据,我们将其转化为一个通信协议,该协议通过固定大小的消息支持任意的添加和删除请求流。服务器维护一个头部,在精确算术下,每个请求之后它都与集中式重新训练逐点一致。我们提供了确定性的再训练等价保证、顺序和分区不变性、两个服务器端变种,以及零KL散度的贝叶斯证书。在四个基准测试上的实验证实了这些保证:两种变体都将集中式岭回归再训练匹配到$10^{-9}$相对Frobenius误差内,并以
摘要:Foundation models are commonly deployed as frozen feature extractors with a small trainable head to adapt to private, user-generated data in federated settings. The ``right to be forgotten'' requires removing the influence of specific samples or users from the trained model on demand. Existing federated unlearning methods target general deep models and rely on approximate reconstruction or selective retraining, making exactness costly or elusive. We study this problem in a practically relevant but under-explored regime: a frozen foundation model with a ridge-regression head. The exact optimum depends on the data only through two additive sufficient statistics, which we turn into a communication protocol supporting an arbitrary stream of \emph{add} and \emph{delete} requests via fixed-size messages. The server maintains a head that is, in exact arithmetic, \emph{pointwise identical} to centralized retraining after every request. We provide deterministic retrain-equivalence guarantees, order and partition invariance, two server-side variants, and a Bayesian certificate of zero KL divergence. Experiments on four benchmarks confirm the guarantees: both variants match centralized ridge retraining to within $10^{-9}$ relative Frobenius error and complete each request at orders-of-
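The sufficient-statistics idea can be sketched in a few lines. This is a minimal single-machine illustration of why add/delete requests are exact for a ridge head, not the paper's federated protocol; the class and variable names are ours.

```python
import numpy as np

class RidgeUnlearner:
    """Exact ridge head over frozen features via additive sufficient statistics.

    The ridge optimum depends on the data only through A = sum(x x^T) and
    b = sum(x * y), so add/delete requests become O(d^2) updates, and the
    head always equals centralized retraining (in exact arithmetic).
    """
    def __init__(self, dim, lam=1.0):
        self.A = np.zeros((dim, dim))
        self.b = np.zeros(dim)
        self.lam = lam

    def add(self, x, y):
        self.A += np.outer(x, x)
        self.b += y * x

    def delete(self, x, y):       # a "right to be forgotten" request
        self.A -= np.outer(x, x)
        self.b -= y * x

    def head(self):
        d = self.A.shape[0]
        return np.linalg.solve(self.A + self.lam * np.eye(d), self.b)

rng = np.random.default_rng(1)
X, y = rng.normal(size=(100, 8)), rng.normal(size=100)

m = RidgeUnlearner(dim=8, lam=0.5)
for xi, yi in zip(X, y):
    m.add(xi, yi)
m.delete(X[3], y[3])  # forget sample 3

# Retrain from scratch on the remaining 99 samples for comparison.
keep = np.ones(100, dtype=bool); keep[3] = False
Xr, yr = X[keep], y[keep]
w_retrain = np.linalg.solve(Xr.T @ Xr + 0.5 * np.eye(8), Xr.T @ yr)
print(np.allclose(m.head(), w_retrain))  # True (up to floating point)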
【3】Upper Bounds for Local Learning Coefficients of Three-Layer Neural Networks
标题:三层神经网络局部学习系数的上界
链接:https://arxiv.org/abs/2603.12785
作者:Yuki Kurumadani
摘要:已知三层神经网络形成奇异学习模型,并且它们的贝叶斯渐近行为由学习系数或实对数典型阈值控制。虽然这个数量已经澄清了规则模型和一些特殊的奇异模型,广泛适用的方法来评估它在神经网络仍然有限。 最近,半正则模型的局部学习系数的公式被提出,得到了一个上界的学习系数。然而,该公式仅适用于实现参数集中的非奇异点,不能用于奇异点。特别是,对于三层神经网络,所得到的上限已被证明与某些情况下已知的学习系数值有很大不同。 本文给出了三层神经网络奇异点处局部学习系数的一个上界公式。该公式可以解释为预算约束和供求约束下的计数规则,并适用于一般的解析激活函数。特别是,它涵盖了swish函数和多项式函数,将以前的结果扩展到更广泛的一类激活函数。 我们进一步表明,当输入维数为1时,这里得到的上界与已知的学习系数一致,从而部分解决了上述差异。我们的研究结果还提供了一个系统的角度来看,三层神经网络的权重参数如何影响学习系数。
摘要:Three-layer neural networks are known to form singular learning models, and their Bayesian asymptotic behavior is governed by the learning coefficient, or real log canonical threshold. Although this quantity has been clarified for regular models and for some special singular models, broadly applicable methods for evaluating it in neural networks remain limited. Recently, a formula for the local learning coefficient of semiregular models was proposed, yielding an upper bound on the learning coefficient. However, this formula applies only to nonsingular points in the set of realization parameters and cannot be used at singular points. In particular, for three-layer neural networks, the resulting upper bound has been shown to differ substantially from learning coefficient values already known in some cases. In this paper, we derive an upper-bound formula for the local learning coefficient at singular points in three-layer neural networks. This formula can be interpreted as a counting rule under budget constraints and demand-supply constraints, and is applicable to general analytic activation functions. In particular, it covers the swish function and polynomial functions, extending previous results to a wider class of activation functions. We further show that, when the input dimension is one, the upper bound obtained here coincides with the already known learning coefficient, thereby partially resolving the discrepancy above. Our result also provides a systematic perspective on how the weight parameters of three-layer neural networks affect the learning coefficient.
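For readers outside singular learning theory: the learning coefficient (RLCT) $\lambda$ and its multiplicity $m$ govern the Bayes free energy through Watanabe's asymptotic expansion (standard notation from singular learning theory, not taken from this paper):

```latex
F_n = n S_n + \lambda \log n - (m-1)\log\log n + O_p(1)
```

where $F_n$ is the Bayes free energy of $n$ samples and $S_n$ the empirical entropy. For regular models $\lambda = d/2$ with $d$ the parameter dimension, while singular models satisfy $\lambda \le d/2$, which is why upper bounds on $\lambda$ directly sharpen the predicted Bayesian generalization behavior.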
【4】UNIStainNet: Foundation-Model-Guided Virtual Staining of H&E to IHC
标题:UNIStainNet:基础模型引导的H&E到IHC的虚拟染色
链接:https://arxiv.org/abs/2603.12716
作者:Jillur Rahman Saurav,Thuong Le Hoai Pham,Pritam Mukherjee,Paul Yi,Brent A. Orr,Jacob M. Luber
摘要:Virtual immunohistochemistry (IHC) staining from hematoxylin and eosin (H&E) images can accelerate diagnostics by providing preliminary molecular insight directly from routine sections, reducing the need for repeat sectioning when tissue is limited. Existing methods improve realism through contrastive objectives, prototype matching, or domain alignment, yet the generator itself receives no direct guidance from pathology foundation models. We present UNIStainNet, a SPADE-UNet conditioned on dense spatial tokens from a frozen pathology foundation model (UNI), providing tissue-level semantic guidance for stain translation. A misalignment-aware loss suite preserves stain quantification accuracy, and learned stain embeddings enable a single model to serve multiple IHC markers simultaneously. On MIST, UNIStainNet achieves state-of-the-art distributional metrics on all four stains (HER2, Ki67, ER, PR) from a single unified model, where prior methods typically train separate per-stain models. On BCI, it also achieves the best distributional metrics. A tissue-type stratified failure analysis reveals that remaining errors are systematic, concentrating in non-tumor tissue. Code is available at https://github.com/facevoid/UNIStainNet.
【5】Human-AI Collaborative Autonomous Experimentation With Proxy Modeling for Comparative Observation
标题:采用代理建模进行比较观察的人机协作自主实验
链接:https://arxiv.org/abs/2603.12618
作者:Arpan Biswas,Hiroshi Funakubo,Yongtao Liu
备注:14 pages, 7 figures
摘要:Optimization of tasks such as material characterization, synthesis, and functional properties for desired applications, over multi-dimensional control parameters, requires a rapid strategic search through active learning such as Bayesian optimization (BO). However, such high-dimensional experimental physical descriptors are complex and noisy, so deriving a low-dimensional scalar metric or objective function from them can be erroneous. Moreover, in traditional purely data-driven autonomous exploration, such objective functions often ignore the subtle variations and key features of the physical descriptors and can therefore fail to discover unknown phenomena of the material systems. To address this, we present proxy-modelled Bayesian optimization (px-BO) via on-the-fly teaming between human and AI agents. Over the BO loop, instead of defining a mathematical objective function directly from the experimental data, we introduce an on-the-fly voting system in which each new experimental outcome is compared with existing experiments and human agents choose the preferred samples. These human-guided comparisons are then transformed into a proxy-based objective function by fitting a Bradley-Terry (BT) model. To minimize human interaction, this iteratively trained proxy model also acts as an AI agent that casts surrogate human votes in future iterations. Finally, these surrogate votes are periodically validated by human agents, and the corrections are learned by the proxy model on the fly. We demonstrate the performance of the proposed px-BO framework on simulated data and on BEPS data generated from a PTO sample. We find that our approach gives domain experts better control and an improved search over traditional data-driven exploration, signifying the importance of human-AI teaming in accelerated and meaningful exploration of material space.
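The Bradley-Terry fitting step can be sketched as follows: a toy illustration with hypothetical pairwise votes, not the paper's implementation.

```python
import numpy as np

def fit_bradley_terry(n_items, comparisons, lr=0.5, steps=500):
    """Fit per-sample utilities u from pairwise votes (winner, loser).

    The Bradley-Terry model sets P(i beats j) = sigmoid(u_i - u_j); the
    fitted u can then serve as the scalar proxy objective for BO, as
    described in the abstract.
    """
    u = np.zeros(n_items)
    for _ in range(steps):
        g = np.zeros(n_items)
        for w, l in comparisons:
            p = 1.0 / (1.0 + np.exp(-(u[w] - u[l])))  # P(winner beats loser)
            g[w] += 1.0 - p        # gradient of the log-likelihood
            g[l] -= 1.0 - p
        u += lr * g / len(comparisons)
        u -= u.mean()              # fix the gauge: utilities are relative
    return u

# Hypothetical human votes: sample 2 beats 1 and 0; sample 1 beats 0.
votes = [(2, 1), (2, 0), (1, 0), (2, 1)]
u = fit_bradley_terry(3, votes)
print(np.argsort(u))  # ranking from worst to best: [0 1 2]
```

The mean-subtraction step reflects that BT utilities are identifiable only up to an additive constant, which is all a comparative objective needs.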
【6】When Drafts Evolve: Speculative Decoding Meets Online Learning
标题:当草稿演变时:推测解码与在线学习相遇
链接:https://arxiv.org/abs/2603.12617
作者:Yu-Yang Qian,Hao-Cong Wu,Yichao Fu,Hao Zhang,Peng Zhao
摘要:Speculative decoding has emerged as a widely adopted paradigm for accelerating large language model inference, where a lightweight draft model rapidly generates candidate tokens that are then verified in parallel by a larger target model. However, due to limited model capacity, drafts often struggle to approximate the target distribution, resulting in shorter acceptance lengths and diminished speedup. A key yet under-explored observation is that speculative decoding inherently provides verification feedback that quantifies the deviation between the draft and target models at no additional cost. This process naturally forms an iterative "draft commits-feedback provides-draft adapts" evolving loop, which precisely matches the online learning paradigm. Motivated by this connection, we propose OnlineSpec, a unified framework that systematically leverages interactive feedback to continuously evolve draft models. Grounded in dynamic regret minimization, we establish a formal link between online learning performance and speculative system's acceleration rate, and develop novel algorithms via modern online learning techniques, including optimistic online learning that adaptively reuses historical gradients as predictive update hints, and online ensemble learning that dynamically maintains multiple draft models. Our algorithms are equipped with theoretical justifications and improved acceleration rates, achieving up to 24% speedup over seven benchmarks and three foundation models.
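For reference, the verification step that yields the free feedback described above follows the standard speculative-sampling acceptance rule. This is a toy sketch with made-up distributions; the function and variable names are ours, not from the paper.

```python
import numpy as np

def verify_draft(p_target, q_draft, draft_tokens, rng):
    """Standard speculative-sampling verification for one draft block.

    Each draft token x is accepted with probability min(1, p(x)/q(x));
    the count of accepted tokens is exactly the per-step feedback signal
    an online learner can use to adapt the draft model.
    """
    accepted = 0
    for t, (p, q) in zip(draft_tokens, zip(p_target, q_draft)):
        if rng.random() < min(1.0, p[t] / q[t]):
            accepted += 1
        else:
            break  # on rejection, the remaining draft tokens are discarded
    return accepted

rng = np.random.default_rng(0)
# Toy 4-token vocabulary; rows are per-position distributions (hypothetical).
p = np.array([[0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]])
q = np.array([[0.4, 0.2, 0.2, 0.2], [0.7, 0.1, 0.1, 0.1]])
draft = [0, 0]  # tokens proposed by the draft model
n_acc = verify_draft(p, q, draft, rng)
print(n_acc)  # number of accepted draft tokens for this block
```

When the draft distribution matches the target exactly, every token is accepted, which is the limit the online adaptation in the abstract pushes toward.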
【7】Maximizing Incremental Information Entropy for Contrastive Learning
标题:最大化对比学习的增量信息熵
链接:https://arxiv.org/abs/2603.12594
作者:Jiansong Zhang,Zhuoqin Yang,Xu Wu,Xiaoling Luo,Peizhong Liu,Linlin Shen
备注:ICLR 2026 (The Fourteenth International Conference on Learning Representations) https://openreview.net/forum?id=XL7ValpExh
摘要:Contrastive learning has achieved remarkable success in self-supervised representation learning, often guided by information-theoretic objectives such as mutual information maximization. Motivated by the limitations of static augmentations and rigid invariance constraints, we propose IE-CL (Incremental-Entropy Contrastive Learning), a framework that explicitly optimizes the entropy gain between augmented views while preserving semantic consistency. Our theoretical framework reframes the challenge by identifying the encoder as an information bottleneck and proposes a joint optimization of two components: a learnable transformation for entropy generation and an encoder regularizer for its preservation. Experiments on CIFAR-10/100, STL-10, and ImageNet demonstrate that IE-CL consistently improves performance under small-batch settings. Moreover, our core modules can be seamlessly integrated into existing frameworks. This work bridges theoretical principles and practice, offering a new perspective in contrastive learning.
【8】CA-HFP: Curvature-Aware Heterogeneous Federated Pruning with Model Reconstruction
标题:CA-HFP:具有模型重建的曲率感知异构联邦修剪
链接:https://arxiv.org/abs/2603.12591
作者:Gang Hu,Yinglei Teng,Pengfei Wu,Shijun Ma
摘要:Federated learning on heterogeneous edge devices requires personalized compression while preserving aggregation compatibility and stable convergence. We present Curvature-Aware Heterogeneous Federated Pruning (CA-HFP), a practical framework that enables each client perform structured, device-specific pruning guided by a curvature-informed significance score, and subsequently maps its compact submodel back into a common global parameter space via a lightweight reconstruction. We derive a convergence bound for federated optimization with multiple local SGD steps that explicitly accounts for local computation, data heterogeneity, and pruning-induced perturbations; from which a principled loss-based pruning criterion is derived. Extensive experiments on FMNIST, CIFAR-10, and CIFAR-100 using VGG and ResNet architectures under varying degrees of data heterogeneity demonstrate that CA-HFP preserves model accuracy while significantly reducing per-client computation and communication costs, outperforming standard federated training and existing pruning-based baselines.
【9】Scaling Laws and Pathologies of Single-Layer PINNs: Network Width and PDE Nonlinearity
标题:单层PINN的缩放定律和病理学:网络宽度和PDE非线性
链接:https://arxiv.org/abs/2603.12556
作者:Faris Chaudhry
备注:Accepted at the Machine Learning and Physical Sciences Workshop (NeurIPS 2025)
摘要:We establish empirical scaling laws for Single-Layer Physics-Informed Neural Networks on canonical nonlinear PDEs. We identify a dual optimization failure: (i) a baseline pathology, where the solution error fails to decrease with network width, even at fixed nonlinearity, falling short of theoretical approximation bounds, and (ii) a compounding pathology, where this failure is exacerbated by nonlinearity. We provide quantitative evidence that a simple separable power law is insufficient, and that the scaling behavior is governed by a more complex, non-separable relationship. This failure is consistent with the concept of spectral bias, where networks struggle to learn the high-frequency solution components that intensify with nonlinearity. We show that optimization, not approximation capacity, is the primary bottleneck, and propose a methodology to empirically measure these complex scaling effects.
【10】Embedded Quantum Machine Learning in Embedded Systems: Feasibility, Hybrid Architectures, and Quantum Co-Processors
标题:嵌入式系统中的嵌入式量子机器学习:可行性、混合架构和量子协处理器
链接:https://arxiv.org/abs/2603.12540
作者:Somdip Dey,Syed Muhammad Raza
备注:6 pages, 1 figure, 5th International Conference Computing, Mathematics & Engineering Technologies (iCoMET 2026)
摘要:Embedded quantum machine learning (EQML) seeks to bring quantum machine learning (QML) capabilities to resource-constrained edge platforms such as IoT nodes, wearables, drones, and cyber-physical controllers. In 2026, EQML is technically feasible only in limited and highly experimental forms: (i) hybrid workflows where an embedded device performs sensing and classical processing while offloading a narrowly scoped quantum subroutine to a remote quantum processing unit (QPU) or nearby quantum appliance, and (ii) early-stage "embedded QPU" concepts in which a compact quantum co-processor is integrated with classical control hardware. A practical bridge is quantum-inspired machine learning and optimisation on classical embedded processors and FPGAs. This paper analyses feasibility from a circuits-and-systems perspective aligned with the academic community, formalises two implementation pathways, identifies the dominant barriers (latency, data encoding overhead, NISQ noise, tooling mismatch, and energy), and maps them to concrete engineering directions in interface design, control electronics, power management, verification, and security. We also argue that responsible deployment requires adversarial evaluation and governance practices that are increasingly necessary for edge AI systems.
【11】Learning Pore-scale Multiphase Flow from 4D Velocimetry
标题:从4D测速学学习孔尺度多相流
链接:https://arxiv.org/abs/2603.12516
作者:Chunyang Wang,Linqi Zhu,Yuxuan Gu,Robert van der Merwe,Xin Ju,Catherine Spurin,Samuel Krevor,Rex Ying,Tobias Pfaff,Martin J. Blunt,Tom Bultreys,Gege Wen
摘要:Multiphase flow in porous media underpins subsurface energy and environmental technologies, including geological CO$_2$ storage and underground hydrogen storage, yet pore-scale dynamics in realistic three-dimensional materials remain difficult to characterize and predict. Here we introduce a multimodal learning framework that infers multiphase pore-scale flow directly from time-resolved four-dimensional (4D) micro-velocimetry measurements. The model couples a graph network simulator for Lagrangian tracer-particle motion with a 3D U-Net for voxelized interface evolution. The imaged pore geometry serves as a boundary constraint to the flow velocity and the multiphase interface predictions, which are coupled and updated iteratively at each time step. Trained autoregressively on experimental sequences in capillary-dominated conditions ($Ca\approx10^{-6}$), the learned surrogate captures transient, nonlocal flow perturbations and abrupt interface rearrangements (Haines jumps) over rollouts spanning seconds of physical time, while reducing hour-to-day--scale direct numerical simulations to seconds of inference. By providing rapid, experimentally informed predictions, the framework opens a route to ''digital experiments'' to replicate pore-scale physics observed in multiphase flow experiments, offering an efficient tool for exploring injection conditions and pore-geometry effects relevant to subsurface carbon and hydrogen storage.
【12】Modal Logical Neural Networks for Financial AI
标题:金融人工智能的模式逻辑神经网络
链接:https://arxiv.org/abs/2603.12487
作者:Antonin Sulc
备注:4 pages, 1 figure, Accepted at ICLR 2026 FinAI
摘要:The financial industry faces a critical dichotomy in AI adoption: deep learning often delivers strong empirical performance, while symbolic logic offers interpretability and rule adherence expected in regulated settings. We use Modal Logical Neural Networks (MLNNs) as a bridge between these worlds, integrating Kripke semantics into neural architectures to enable differentiable reasoning about necessity, possibility, time, and knowledge. We illustrate MLNNs as a differentiable ``Logic Layer'' for finance by mapping core components, Necessity Neurons ($\Box$) and Learnable Accessibility ($A_θ$), to regulatory guardrails, market stress testing, and collusion detection. Four case studies show how MLNN-style constraints can promote compliance in trading agents, help recover latent trust networks for market surveillance, encourage robustness under stress scenarios, and distinguish statistical belief from verified knowledge to help mitigate robo-advisory hallucinations.
【13】Revisiting Model Stitching In the Foundation Model Era
标题:重温基础模型时代的模型缝合
链接:https://arxiv.org/abs/2603.12433
作者:Zheda Mai,Ke Zhang,Fu-En Wang,Zixiao Ken Wang,Albert Y. C. Chen,Lu Xia,Min Sun,Wei-Lun Chao,Cheng-Hao Kuo
备注:Accepted by CVPR 2023
摘要:Model stitching, connecting early layers of one model (source) to later layers of another (target) via a light stitch layer, has served as a probe of representational compatibility. Prior work finds that models trained on the same dataset remain stitchable (negligible accuracy drop) despite different initializations or objectives. We revisit stitching for Vision Foundation Models (VFMs) that vary in objectives, data, and modality mix (e.g., CLIP, DINOv2, SigLIP 2) and ask: Are heterogeneous VFMs stitchable? We introduce a systematic protocol spanning the stitch points, stitch layer families, training losses, and downstream tasks. Three findings emerge. (1) Stitch layer training matters: conventional approaches that match the intermediate features at the stitch point or optimize the task loss end-to-end struggle to retain accuracy, especially at shallow stitch points. (2) With a simple feature-matching loss at the target model's penultimate layer, heterogeneous VFMs become reliably stitchable across vision tasks. (3) For deep stitch points, the stitched model can surpass either constituent model at only a small inference overhead (for the stitch layer). Building on these findings, we further propose the VFM Stitch Tree (VST), which shares early layers across VFMs while retaining their later layers, yielding a controllable accuracy-latency trade-off for multimodal LLMs that often leverage multiple VFMs. Taken together, our study elevates stitching from a diagnostic probe to a practical recipe for integrating complementary VFM strengths and pinpointing where their representations align or diverge.
【14】Sinkhorn-Drifting Generative Models
标题:辛克霍恩漂移生成模型
链接:https://arxiv.org/abs/2603.12366
作者:Ping He,Om Khangaonkar,Hamed Pirsiavash,Yikun Bai,Soheil Kolouri
摘要:We establish a theoretical link between the recently proposed "drifting" generative dynamics and gradient flows induced by the Sinkhorn divergence. In a particle discretization, the drift field admits a cross-minus-self decomposition: an attractive term toward the target distribution and a repulsive/self-correction term toward the current model, both expressed via one-sided normalized Gibbs kernels. We show that Sinkhorn divergence yields an analogous cross-minus-self structure, but with each term defined by entropic optimal-transport couplings obtained through two-sided Sinkhorn scaling (i.e., enforcing both marginals). This provides a precise sense in which drifting acts as a surrogate for a Sinkhorn-divergence gradient flow, interpolating between one-sided normalization and full two-sided Sinkhorn scaling. Crucially, this connection resolves an identifiability gap in prior drifting formulations: leveraging the definiteness of the Sinkhorn divergence, we show that zero drift (equilibrium of the dynamics) implies that the model and target measures match. Experiments show that Sinkhorn drifting reduces sensitivity to kernel temperature and improves one-step generative quality, trading off additional training time for a more stable optimization, without altering the inference procedure used by drift methods. These theoretical gains translate to strong low-temperature improvements in practice: on FFHQ-ALAE at the lowest temperature setting we evaluate, Sinkhorn drifting reduces mean FID from 187.7 to 37.1 and mean latent EMD from 453.3 to 144.4, while on MNIST it preserves full class coverage across the temperature sweep. Project page: https://mint-vu.github.io/SinkhornDrifting/
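The two-sided Sinkhorn scaling referenced above can be sketched as follows: a generic entropic-OT illustration (uniform marginals, toy point clouds), not the paper's training code.

```python
import numpy as np

def sinkhorn_coupling(C, eps=1.0, iters=2000):
    """Two-sided Sinkhorn scaling to an entropic OT coupling.

    Starting from the Gibbs kernel K = exp(-C/eps), alternately rescale
    rows and columns until both marginals match. One-sided normalization,
    as in the drifting dynamics the abstract contrasts with, would stop
    after the row step.
    """
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)  # uniform marginals
    K = np.exp(-C / eps)
    u, v = np.ones(n), np.ones(m)
    for _ in range(iters):
        u = a / (K @ v)      # row scaling
        v = b / (K.T @ u)    # column scaling
    return u[:, None] * K * v[None, :]  # coupling P = diag(u) K diag(v)

rng = np.random.default_rng(0)
x, y = 0.5 * rng.normal(size=(5, 2)), 0.5 * rng.normal(size=(6, 2))
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared distances
P = sinkhorn_coupling(C)
# Column marginals are exact after the final column step; rows converge.
print(float(abs(P.sum(1) - 1 / 5).max()))  # max row-marginal error
```

Ending the loop on the column step makes one marginal exact by construction; the other converges with the iterations, which is the two-sided enforcement the abstract distinguishes from one-sided Gibbs normalization.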
【15】Alternating Gradient Flow Utility: A Unified Metric for Structural Pruning and Dynamic Routing in Deep Networks
标题:交替梯度流效用:深度网络中结构修剪和动态路由的统一指标
链接:https://arxiv.org/abs/2603.12354
作者:Tianhao Qian,Zhuoxuan Li,Jinde Cao,Xinli Shi,Hanjie Liu,Leszek Rutkowski
备注:11 pages, 6 figures, 9 tables
摘要:Efficient deep learning traditionally relies on static heuristics like weight magnitude or activation awareness (e.g., Wanda, RIA). While successful in unstructured settings, we observe a critical limitation when applying these metrics to the structural pruning of deep vision networks. These contemporary metrics suffer from a magnitude bias, failing to preserve critical functional pathways. To overcome this, we propose a decoupled kinetic paradigm inspired by Alternating Gradient Flow (AGF), utilizing an absolute feature-space Taylor expansion to accurately capture the network's structural "kinetic utility". First, we uncover a topological phase transition at extreme sparsity, where AGF successfully preserves baseline functionality and exhibits topological implicit regularization, avoiding the collapse seen in models trained from scratch. Second, transitioning to architectures without strict structural priors, we reveal a phenomenon of Sparsity Bottleneck in Vision Transformers (ViTs). Through a gradient-magnitude decoupling analysis, we discover that dynamic signals suffer from signal compression in converged models, rendering them suboptimal for real-time routing. Finally, driven by these empirical constraints, we design a hybrid routing framework that decouples AGF-guided offline structural search from online execution via zero-cost physical priors. We validate our paradigm on large-scale benchmarks: under a 75% compression stress test on ImageNet-1K, AGF effectively avoids the structural collapse where traditional metrics aggressively fall below random sampling. Furthermore, when systematically deployed for dynamic inference on ImageNet-100, our hybrid approach achieves Pareto-optimal efficiency. It reduces the usage of the heavy expert by approximately 50% (achieving an estimated overall cost of 0.92$\times$) without sacrificing the full-model accuracy.
【16】No More DeLuLu: Physics-Inspired Kernel Networks for Geometrically-Grounded Neural Computation
标题:不再有DeLuLu:用于几何基础神经计算的物理启发核网络
链接:https://arxiv.org/abs/2603.12276
作者:Taha Bouhsine
备注:for more info check www.azetta.ai
摘要:We introduce the yat-product, a kernel operator combining quadratic alignment with inverse-square proximity. We prove it is a Mercer kernel, analytic, Lipschitz on bounded domains, and self-regularizing, admitting a unique RKHS embedding. Neural Matter Networks (NMNs) use yat-product as the sole non-linearity, replacing conventional linear-activation-normalization blocks with a single geometrically-grounded operation. This architectural simplification preserves universal approximation while shifting normalization into the kernel itself via the denominator, rather than relying on separate normalization layers. Empirically, NMN-based classifiers match linear baselines on MNIST while exhibiting bounded prototype evolution and superposition robustness. In language modeling, Aether-GPT2 achieves lower validation loss than GPT-2 with a comparable parameter budget while using yat-based attention and MLP blocks. Our framework unifies kernel learning, gradient stability, and information geometry, establishing NMNs as a principled alternative to conventional neural architectures.
【17】Association-Aware GNN for Precoder Learning in Cell-Free Systems
标题:用于无蜂窝系统中预编码器学习的关联感知GNN
链接:https://arxiv.org/abs/2603.13035
作者:Mingyu Deng,Shengqian Han
摘要:Deep learning has been widely recognized as a promising approach for optimizing multi-user multi-antenna precoders in traditional cellular systems. However, a critical distinction between cell-free and cellular systems lies in the flexibility of user equipment (UE)-access point (AP) associations. Consequently, the optimal precoder depends not only on channel state information but also on the dynamic UE-AP association status. In this paper, we propose an association-aware graph neural network (AAGNN) that explicitly incorporates association status into the precoding design. We leverage the permutation equivariance properties of the cell-free precoding policy to reduce the training complexity of AAGNN and employ an attention mechanism to enhance its generalization performance. Simulation results demonstrate that the proposed AAGNN outperforms baseline learning methods in both learning performance and generalization capabilities while maintaining low training and inference complexity.
【18】A theory of learning data statistics in diffusion models, from easy to hard
标题:扩散模型中学习数据统计的理论,从简单到困难
链接:https://arxiv.org/abs/2603.12901
作者:Lorenzo Bardone,Claudia Merger,Sebastian Goldt
摘要:While diffusion models have emerged as a powerful class of generative models, their learning dynamics remain poorly understood. We address this issue first by empirically showing that standard diffusion models trained on natural images exhibit a distributional simplicity bias, learning simple, pair-wise input statistics before specializing to higher-order correlations. We reproduce this behaviour in simple denoisers trained on a minimal data model, the mixed cumulant model, where we precisely control both pair-wise and higher-order correlations of the inputs. We identify a scalar invariant of the model that governs the sample complexity of learning pair-wise and higher-order correlations that we call the diffusion information exponent, in analogy to related invariants in different learning paradigms. Using this invariant, we prove that the denoiser learns simple, pair-wise statistics of the inputs at linear sample complexity, while more complex higher-order statistics, such as the fourth cumulant, require at least cubic sample complexity. We also prove that the sample complexity of learning the fourth cumulant is linear if pair-wise and higher-order statistics share a correlated latent structure. Our work describes a key mechanism for how diffusion models can learn distributions of increasing complexity.
【19】EB-RANSAC: Random Sample Consensus based on Energy-Based Model
标题:EB-RANSAC:基于能量模型的随机样本共识
链接:https://arxiv.org/abs/2603.12525
作者:Muneki Yasuda,Nao Watanabe,Kaiji Sekimoto
摘要:Random sample consensus (RANSAC), which is based on repeated sampling from a given dataset, is one of the most popular robust estimation methods. In this study, we propose energy-based RANSAC (EB-RANSAC), an energy-based model (EBM) for robust estimation that follows a scheme similar to RANSAC's. Like RANSAC, EB-RANSAC is applicable to a wide range of estimation problems. Unlike RANSAC, however, it requires no cumbersome sampling procedure and has only one hyperparameter. The effectiveness of EB-RANSAC is demonstrated numerically in two applications: linear regression and maximum likelihood estimation.
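For contrast, the classic repetitive-sampling RANSAC baseline that EB-RANSAC replaces can be sketched on a toy line-fitting problem (this example is ours, not from the paper):

```python
import numpy as np

def ransac_line(x, y, n_trials=200, tol=0.1, rng=None):
    """Classic RANSAC for a 1-D line fit y = a*x + b.

    Repeatedly fit a line to a random minimal sample (2 points) and keep
    the hypothesis with the largest inlier consensus set. This is the
    repetitive-sampling procedure the abstract's EB-RANSAC avoids.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    best_inliers, best_params = -1, (0.0, 0.0)
    for _ in range(n_trials):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue  # degenerate minimal sample
        a = (y[j] - y[i]) / (x[j] - x[i])  # slope from the minimal sample
        b = y[i] - a * x[i]
        inliers = np.abs(y - (a * x + b)) < tol
        if inliers.sum() > best_inliers:
            best_inliers, best_params = inliers.sum(), (a, b)
    return best_params

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 60)
y = 2.0 * x + 1.0 + rng.normal(0, 0.02, size=60)  # true line: a=2, b=1
y[:10] += rng.uniform(1, 3, size=10)              # gross outliers
a, b = ransac_line(x, y, rng=rng)
print(round(a, 2), round(b, 2))  # close to the true (2.0, 1.0)
```

Note the two knobs (trial count and inlier tolerance) plus the sampling loop itself; the abstract's pitch is that the energy-based formulation removes the loop and keeps a single hyperparameter.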
【20】Pruning-induced phases in fully-connected neural networks: the eumentia, the dementia, and the amentia
标题:全连接神经网络中修剪诱导的阶段:正常、痴呆和精神错乱
链接:https://arxiv.org/abs/2603.12316
作者:Haining Pan,Nakul Aggarwal,J. H. Pixley
备注:14 pages, 15 figures
摘要:Modern neural networks are heavily overparameterized, and pruning, which removes redundant neurons or connections, has emerged as a key approach to compressing them without sacrificing performance. However, while practical pruning methods are well developed, whether pruning induces sharp phase transitions in the neural networks and, if so, to what universality class they belong, remain open questions. To address this, we study fully-connected neural networks trained on MNIST, independently varying the dropout (i.e., removing neurons) rate at both the training and evaluation stages to map the phase diagram. We identify three distinct phases: eumentia (the network learns), dementia (the network has forgotten), and amentia (the network cannot learn), sharply distinguished by the power-law scaling of the cross-entropy loss with the training dataset size. {In the eumentia phase, the algebraic decay of the loss, as documented in the machine learning literature as neural scaling laws, is from the perspective of statistical mechanics the hallmark of quasi-long-range order.} We demonstrate that the transition between the eumentia and dementia phases is accompanied by scale invariance, with a diverging length scale that exhibits hallmarks of a Berezinskii-Kosterlitz-Thouless-like transition; the phase structure is robust across different network widths and depths. Our results establish that dropout-induced pruning provides a concrete setting in which neural network behavior can be understood through the lens of statistical mechanics.
其他(34篇)
【1】Learnability and Privacy Vulnerability are Entangled in a Few Critical Weights
标题:可学习性与隐私脆弱性纠缠于少数关键权重之中
链接:https://arxiv.org/abs/2603.13186
作者:Xingli Fang,Jung-Eun Kim
备注:ICLR 2026
摘要:Prior approaches for membership privacy preservation usually update or retrain all weights in neural networks, which is costly and can lead to unnecessary utility loss or even more serious misalignment in predictions between training data and non-training data. In this work, we observed three insights: i) privacy vulnerability exists in a very small fraction of weights; ii) however, most of those weights also critically impact utility performance; iii) the importance of weights stems from their locations rather than their values. According to these insights, to preserve privacy, we score critical weights, and instead of discarding those neurons, we rewind only the weights for fine-tuning. Through extensive experiments, we show that this mechanism exhibits superior resilience against Membership Inference Attacks in most cases while maintaining utility.
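The rewind step can be sketched in a few lines of NumPy. The importance score used here (trained-weight magnitude) is a placeholder, not the paper's privacy-criticality criterion; only the mechanism — rewind the top-scoring fraction to their initial values rather than zeroing them — follows the description above:

```python
import numpy as np

def rewind_critical(w_init, w_trained, scores, frac=0.01):
    """Rewind the top-`frac` highest-scoring weights to their initial
    values, leaving all other trained weights untouched."""
    k = max(1, int(frac * w_trained.size))
    idx = np.argpartition(scores.ravel(), -k)[-k:]   # indices of top-k scores
    out = w_trained.copy().ravel()
    out[idx] = w_init.ravel()[idx]                   # rewind, don't discard
    return out.reshape(w_trained.shape)

rng = np.random.default_rng(0)
w0 = rng.standard_normal((100, 100))                 # "initial" weights
wt = w0 + 0.1 * rng.standard_normal((100, 100))      # "trained" weights
scores = np.abs(wt)                                  # placeholder criticality score
wr = rewind_critical(w0, wt, scores, frac=0.01)
```

After rewinding, only the 1% selected entries differ from the trained weights, which is what keeps the utility cost of the intervention small.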
【2】MXNorm: Reusing MXFP block scales for efficient tensor normalisation
标题:MXNorm:重用MXFP块尺度以实现高效的张量归一化
链接:https://arxiv.org/abs/2603.13180
作者:Callum McLean,Luke Y. Prince,Alexandre Payot,Paul Balança,Carlo Luschi
备注:Preprint, Under Review. 15 pages, 12 figures
摘要:Matrix multiplication performance has long been the major bottleneck to scaling deep learning workloads, which has stimulated the design of new accelerators that use increasingly low-precision number formats. However, improvements in matrix multiplication performance have far outstripped improvements in performance on reductions and elementwise computations, which are still being performed in higher precision. In this work, we propose MXNorm, a drop-in replacement for RMSNorm that estimates the RMS using only the block scales calculated as part of the MXFP8 cast and enables a 32x decrease in the size of reduction needed for normalization. We validate our approximation method on pre-training of Llama 3 models of 125M, 1B and 8B parameters, finding minimal loss of training accuracy compared to a baseline using RMSNorm with MXFP8 matmuls. We also show practical kernel speedups using only torch.compile of up to 2.4x for MXNorm over RMSNorm, corresponding to a 1.3% speedup in Llama 3 8B transformer layers in MXFP8 and a 2.6% speedup in NVFP4.
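The per-block absmax scales that an MXFP8-style cast already produces (block size 32) can be sketched as below. The paper's actual RMS estimator and its calibration are its own contribution; this sketch only shows the 32x-smaller reduction and the elementary bounds the block scales place on the true RMS:

```python
import numpy as np

BLOCK = 32  # MXFP8 shares one scale per 32 contiguous elements

def block_scales(x):
    """Per-block absmax scales, as an MXFP8-style cast would compute them."""
    return np.abs(x.reshape(-1, BLOCK)).max(axis=1)

rng = np.random.default_rng(0)
x = 3.0 * rng.standard_normal(4096)

scales = block_scales(x)              # 4096/32 = 128 values: a 32x smaller reduction
rms_true = np.sqrt(np.mean(x ** 2))
# elementary sandwich: each block's sum of squares lies in [s_b^2, 32 * s_b^2]
rms_upper = np.sqrt(np.mean(scales ** 2))
rms_lower = rms_upper / np.sqrt(BLOCK)
```

Any normalization statistic computed from `scales` alone touches 32x less data than the full reduction RMSNorm performs, which is the source of the kernel speedups reported above.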
【3】ZO-SAM: Zero-Order Sharpness-Aware Minimization for Efficient Sparse Training
标题:ZO-SAM:用于高效稀疏训练的零阶锐度感知最小化
链接:https://arxiv.org/abs/2603.13115
作者:Jie Ji,Gen Li,Kaiyuan Deng,Fatemeh Afghah,Xiaolong Ma
摘要:Deep learning models, despite their impressive achievements, suffer from high computational costs and memory requirements, limiting their usability in resource-constrained environments. Sparse neural networks significantly alleviate these constraints by dramatically reducing parameter count and computational overhead. However, existing sparse training methods often experience chaotic and noisy gradient signals, severely hindering convergence and generalization performance, particularly at high sparsity levels. To tackle this critical challenge, we propose Zero-Order Sharpness-Aware Minimization (ZO-SAM), a novel optimization framework that strategically integrates zero-order optimization within the SAM approach. Unlike traditional SAM, ZO-SAM requires only a single backpropagation step during perturbation, selectively utilizing zero-order gradient estimations. This innovative approach reduces the backpropagation computational cost by half compared to conventional SAM, significantly lowering gradient variance and effectively eliminating associated computational overhead. By harnessing SAM's capacity for identifying flat minima, ZO-SAM stabilizes the training process and accelerates convergence. These efficiency gains are particularly important in sparse training scenarios, where computational cost is the primary bottleneck that limits the practicality of SAM. Moreover, models trained with ZO-SAM exhibit improved robustness under distribution shift, further broadening its practicality in real-world deployments.
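The zero-order ingredient can be illustrated with a standard two-point Gaussian-smoothing gradient estimator — a generic building block of the kind ZO-SAM selectively employs for the perturbation step, not the paper's exact estimator:

```python
import numpy as np

def zo_grad(f, w, mu=1e-3, n_samples=16, rng=None):
    """Two-point zero-order gradient estimate:
    g ~= E_u[(f(w + mu*u) - f(w - mu*u)) / (2*mu) * u],  u ~ N(0, I).
    Needs only function evaluations, no backpropagation."""
    rng = rng or np.random.default_rng(0)
    g = np.zeros_like(w)
    for _ in range(n_samples):
        u = rng.standard_normal(w.shape)
        g += (f(w + mu * u) - f(w - mu * u)) / (2 * mu) * u
    return g / n_samples

# sanity check on a quadratic, whose true gradient is 2*w
f = lambda w: np.sum(w ** 2)
w = np.array([1.0, -2.0, 0.5])
g = zo_grad(f, w, n_samples=8192)
```

Replacing the first of SAM's two backward passes with such an estimate is what halves the backpropagation cost quoted in the abstract; the variance of the estimate is controlled by `n_samples`.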
【4】Influence Malleability in Linearized Attention: Dual Implications of Non-Convergent NTK Dynamics
标题:线性化注意力的影响可塑性:非收敛NTK动力学的双重含义
链接:https://arxiv.org/abs/2603.13085
作者:Jose Marie Antonio Miñoza,Paulo Mario P. Medina,Sebastian C. Ibañez
摘要:Understanding the theoretical foundations of attention mechanisms remains challenging due to their complex, non-linear dynamics. This work reveals a fundamental trade-off in the learning dynamics of linearized attention. Using a linearized attention mechanism with exact correspondence to a data-dependent Gram-induced kernel, both empirical and theoretical analysis through the Neural Tangent Kernel (NTK) framework shows that linearized attention does not converge to its infinite-width NTK limit, even at large widths. A spectral amplification result establishes this formally: the attention transformation cubes the Gram matrix's condition number, requiring width $m = Ω(κ^6)$ for convergence, a threshold that exceeds any practical width for natural image datasets. This non-convergence is characterized through influence malleability, the capacity to dynamically alter reliance on training examples. Attention exhibits 6--9$\times$ higher malleability than ReLU networks, with dual implications: its data-dependent kernel can reduce approximation error by aligning with task structure, but this same sensitivity increases susceptibility to adversarial manipulation of training data. These findings suggest that attention's power and vulnerability share a common origin in its departure from the kernel regime.
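The spectral amplification behind the $m = Ω(κ^6)$ width requirement rests on an elementary fact: for a symmetric positive-definite Gram matrix, cubing the matrix cubes its condition number. A quick numerical check on a random Gram matrix (the ridge term is only there to keep the condition number finite):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 8))
G = X @ X.T / 8                 # Gram matrix of the inputs (PSD, rank <= 8)
G += 1e-2 * np.eye(20)          # regularize so the condition number is finite

kappa = np.linalg.cond(G)
kappa_cubed = np.linalg.cond(np.linalg.matrix_power(G, 3))
```

Since convergence to the NTK limit requires width scaling with the square of this cubed condition number, even a modest $κ$ pushes the required width far beyond practical values, which is the paper's non-convergence argument in miniature.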
【5】Deconstructing the Failure of Ideal Noise Correction: A Three-Pillar Diagnosis
标题:解构理想噪音修正的失败:三支柱诊断
链接:https://arxiv.org/abs/2603.12997
作者:Chen Feng,Zhuo Zhi,Zhao Huang,Jiawei Ge,Ling Xiao,Nicu Sebe,Georgios Tzimiropoulos,Ioannis Patras
备注:Accepted to CVPR2026
摘要:Statistically consistent methods based on the noise transition matrix ($T$) offer a theoretically grounded solution to Learning with Noisy Labels (LNL), with guarantees of convergence to the optimal clean-data classifier. In practice, however, these methods are often outperformed by empirical approaches such as sample selection, and this gap is usually attributed to the difficulty of accurately estimating $T$. The common assumption is that, given a perfect $T$, noise-correction methods would recover their theoretical advantage. In this work, we put this longstanding hypothesis to a decisive test. We conduct experiments under idealized conditions, providing correction methods with a perfect, oracle transition matrix. Even under these ideal conditions, we observe that these methods still suffer from performance collapse during training. This compellingly demonstrates that the failure is not fundamentally a $T$-estimation problem, but stems from a more deeply rooted flaw. To explain this behaviour, we provide a unified analysis that links three levels: macroscopic convergence states, microscopic optimisation dynamics, and information-theoretic limits on what can be learned from noisy labels. Together, these results give a formal account of why ideal noise correction fails and offer concrete guidance for designing more reliable methods for learning with noisy labels.
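The consistent baseline being stress-tested here is forward correction: compose the model's clean-class probabilities with the (in this paper, oracle) transition matrix $T$ before taking the cross-entropy against noisy labels. A minimal sketch with symmetric label noise:

```python
import numpy as np

def forward_corrected_loss(clean_probs, noisy_labels, T):
    """Cross-entropy of the noisy-label distribution obtained by composing
    clean-class probabilities with T, where T[i, j] = P(noisy j | true i)."""
    noisy_probs = clean_probs @ T
    return -np.mean(np.log(noisy_probs[np.arange(len(noisy_labels)), noisy_labels]))

# oracle T for 20% symmetric noise over 3 classes
eps, C = 0.2, 3
T = (1 - eps) * np.eye(C) + eps / (C - 1) * (1 - np.eye(C))

probs = np.array([[0.8, 0.1, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
loss = forward_corrected_loss(probs, labels, T)
```

The paper's point is precisely that training against this objective collapses even when `T` is exact, so the failure cannot be blamed on `T`-estimation error.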
【6】Retrieval-Enhanced Real Estate Appraisal
标题:检索增强型房地产评估
链接:https://arxiv.org/abs/2603.12986
作者:Simon Popelier,Matthieu X. B. Sarazin,Maximilien Bohm,Mathieu Gierski,Hanna Mergui,Matthieu Ospici,Adrien Bernhardt
备注:Accepted at NFMCP 2024 workshop (New Frontiers in Mining Complex Patterns), held in conjunction with ECML 2024
摘要:The Sales Comparison Approach (SCA) is one of the most popular when it comes to real estate appraisal. Used as a reference in real estate expertise and as one of the major types of Automatic Valuation Models (AVM), it recently gained popularity within machine learning methods. The performance of models able to use data represented as sets and graphs made it possible to adapt this methodology efficiently, yielding substantial results. SCA relies on taking past transactions (comparables) as references, selected according to their similarity with the target property's sale. In this study, we focus on the selection of these comparables for real estate appraisal. We demonstrate that the selection of comparables used in many state-of-the-art algorithms can be significantly improved by learning a selection policy instead of imposing it. Our method relies on a hybrid vector-geographical retrieval module capable of adapting to different datasets and optimized jointly with an estimation module. We further show that the use of carefully selected comparables makes it possible to build models that require fewer comparables and fewer parameters with performance close to state-of-the-art models. All our evaluations are made on five datasets which span areas in the United States, Brazil, and France.
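A fixed-weight caricature of the hybrid vector-geographical retrieval module might look as follows. The paper learns this selection policy jointly with the estimation module; the mixing weight `alpha` and the toy features here are purely illustrative:

```python
import numpy as np

def retrieve_comparables(query_feat, query_loc, feats, locs, k=3, alpha=0.5):
    """Rank past transactions by a convex mix of feature distance and
    geographic distance, then return the k nearest as 'comparables'."""
    d_feat = np.linalg.norm(feats - query_feat, axis=1)
    d_geo = np.linalg.norm(locs - query_loc, axis=1)
    # normalize each distance to [0, 1] before mixing
    d_feat = d_feat / (d_feat.max() + 1e-12)
    d_geo = d_geo / (d_geo.max() + 1e-12)
    score = alpha * d_feat + (1 - alpha) * d_geo
    return np.argsort(score)[:k]

feats = np.array([[3, 2.0], [3, 2.1], [5, 4.0], [1, 1.0]])   # e.g. rooms, bathrooms
locs = np.array([[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [0.2, 0.0]])
idx = retrieve_comparables(np.array([3, 2.0]), np.array([0.0, 0.0]), feats, locs, k=2)
```

Making `alpha` (and the feature embedding) learnable rather than imposed is the paper's central improvement over hand-crafted comparable selection.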
【7】Surrogates for Physics-based and Data-driven Modelling of Parametric Systems: Review and New Perspectives
标题:参数系统基于物理与数据驱动建模的代理模型:综述与新视角
链接:https://arxiv.org/abs/2603.12870
作者:Matteo Giacomini,Pedro Díez
摘要:Surrogate models provide compact relations between user-defined input parameters and output quantities of interest, enabling the efficient evaluation of complex parametric systems in many-query settings. Such capabilities are essential in a wide range of applications, including optimisation, control, data assimilation, uncertainty quantification, and emerging digital twin technologies in various fields such as manufacturing, personalised healthcare, smart cities, and sustainability. This article reviews established methodologies for constructing surrogate models exploiting either knowledge of the governing laws and the dynamical structure of the system (physics-based) or experimental observations (data-driven), as well as hybrid approaches combining these two paradigms. By revisiting the design of a surrogate model as a functional approximation problem, existing methodologies are reviewed in terms of the choice of (i) a reduced basis and (ii) a suitable approximation criterion. The paper reviews methodologies pertaining to the field of Scientific Machine Learning, and it aims at synthesising established knowledge, recent advances, and new perspectives on: dimensionality reduction, physics-based, and data-driven surrogate modelling based on proper orthogonal decomposition, proper generalised decomposition, and artificial neural networks; multi-fidelity methods to exploit information from sources with different fidelities; adaptive sampling, enrichment, and data augmentation techniques to enhance the quality of surrogate models.
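Among the reduced-basis constructions reviewed, proper orthogonal decomposition is the most compact to state: stack snapshots of the system as columns, take an SVD, and keep the fewest left singular vectors capturing a target fraction of the snapshot energy. A minimal sketch on a synthetic two-mode field:

```python
import numpy as np

def pod_basis(snapshots, energy=0.999):
    """Proper orthogonal decomposition: return the fewest left singular
    vectors capturing the requested fraction of snapshot 'energy'."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = int(np.searchsorted(cum, energy)) + 1
    return U[:, :r]

# snapshots of a 1D field that is, by construction, a 2-mode superposition
x = np.linspace(0, 1, 200)
params = np.linspace(0.5, 2.0, 30)
S = np.column_stack([np.sin(np.pi * x) + p * np.sin(2 * np.pi * x) for p in params])
V = pod_basis(S)
```

The surrogate then operates on the reduced coordinates `V.T @ u` instead of the full field, which is the dimensionality-reduction step the review builds on.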
【8】On Linear Separability of the MNIST Handwritten Digits Dataset
标题:关于MNIST手写数字数据集的线性可分性
链接:https://arxiv.org/abs/2603.12850
作者:Ákos Hajnal
备注:8 pages, 1 figure
摘要:The MNIST dataset containing thousands of handwritten digit images is still a fundamental benchmark for evaluating various pattern-recognition and image-classification models. Linear separability is a key concept in many statistical and machine-learning techniques. Despite the long history of the MNIST dataset and its relative simplicity in size and resolution, the question of whether the dataset is linearly separable has never been fully answered -- scientific and informal sources share conflicting claims. This paper aims to provide a comprehensive empirical investigation to address this question, distinguishing pairwise and one-vs-rest separation of the training, the test and the combined sets, respectively. It reviews the theoretical approaches to assessing linear separability, alongside state-of-the-art methods and tools, then systematically examines all relevant assemblies, and reports the findings.
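One classical certificate relevant to this question: the perceptron algorithm reaches a zero-error separator if and only if the two classes are linearly separable with a margin. A toy check on two separable point clouds — in the actual study, MNIST's 784-dimensional images would take the place of these 2-D points:

```python
import numpy as np

def perceptron_separable(X, y, max_epochs=1000):
    """Return (separable?, w). Labels y are in {-1, +1}; a bias term is
    absorbed by appending a constant-1 feature. The perceptron converges
    (a full error-free epoch) iff the data are linearly separable."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(Xb, y):
            if yi * (xi @ w) <= 0:
                w += yi * xi
                errors += 1
        if errors == 0:
            return True, w
    return False, w

# two point clouds with a comfortable margin between them
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 2)) + np.array([4.0, 4.0])
B = rng.standard_normal((50, 2)) - np.array([4.0, 4.0])
X = np.vstack([A, B])
y = np.hstack([np.ones(50), -np.ones(50)])
sep, w = perceptron_separable(X, y)
```

The caveat, which the paper's empirical study must handle, is that non-separability is only certified by an exhaustive method (e.g. linear programming infeasibility), since the perceptron merely fails to terminate.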
【9】SLICE: Semantic Latent Injection via Compartmentalized Embedding for Image Watermarking
标题:SLICE:用于图像水印的分区嵌入语义潜在注入
链接:https://arxiv.org/abs/2603.12749
作者:Zheng Gao,Yifan Yang,Xiaoyu Li,Xiaoyan Feng,Haoran Fan,Yang Song,Jiaojiao Jiang
摘要:Watermarking the initial noise of diffusion models has emerged as a promising approach for image provenance, but content-independent noise patterns can be forged via inversion and regeneration attacks. Recent semantic-aware watermarking methods improve robustness by conditioning verification on image semantics. However, their reliance on a single global semantic binding makes them vulnerable to localized but globally coherent semantic edits. To address this limitation and provide a trustworthy semantic-aware watermark, we propose $\underline{\textbf{S}}$emantic $\underline{\textbf{L}}$atent $\underline{\textbf{I}}$njection via $\underline{\textbf{C}}$ompartmentalized $\underline{\textbf{E}}$mbedding ($\textbf{SLICE}$). Our framework decouples image semantics into four semantic factors (subject, environment, action, and detail) and precisely anchors them to distinct regions in the initial Gaussian noise. This fine-grained semantic binding enables advanced watermark verification where semantic tampering is detectable and localizable. We theoretically justify why SLICE enables robust and reliable tamper localization and provides statistical guarantees on false-accept rates. Experimental results demonstrate that SLICE significantly outperforms existing baselines against advanced semantic-guided regeneration attacks, substantially reducing attack success while preserving image quality and semantic fidelity. Overall, SLICE offers a practical, training-free provenance solution that is both fine-grained in diagnosis and robust to realistic adversarial manipulations.
【10】Altered Thoughts, Altered Actions: Probing Chain-of-Thought Vulnerabilities in VLA Robotic Manipulation
标题:改变的想法,改变的行动:探索VLA机器人操纵中的思想链漏洞
链接:https://arxiv.org/abs/2603.12717
作者:Tuan Duong Trinh,Naveed Akhtar,Basim Azam
摘要:Recent Vision-Language-Action (VLA) models increasingly adopt chain-of-thought (CoT) reasoning, generating a natural-language plan before decoding motor commands. This internal text channel between the reasoning module and the action decoder has received no adversarial scrutiny. We ask: which properties of this intermediate plan does the action decoder actually rely on, and can targeted corruption of the reasoning trace alone -- with all inputs left intact -- degrade a robot's physical task performance? We design a taxonomy of seven text corruptions organized into three attacker tiers (blind noise, mechanical-semantic, and LLM-adaptive) and apply them to a state-of-the-art reasoning VLA across 40 LIBERO tabletop manipulation tasks. Our results reveal a striking asymmetry: substituting object names in the reasoning trace reduces overall success rate by 8.3~percentage points (pp) -- reaching $-$19.3~pp on goal-conditioned tasks and $-$45~pp on individual tasks -- whereas sentence reordering, spatial-direction reversal, token noise, and even a 70B-parameter LLM crafting plausible-but-wrong plans all have negligible impact (within $\pm$4~pp). This asymmetry indicates that the action decoder depends on entity-reference integrity rather than reasoning quality or sequential structure. Notably, a sophisticated LLM-based attacker underperforms simple mechanical object-name substitution, because preserving plausibility inadvertently retains the entity-grounding structure the decoder needs. A cross-architecture control using a non-reasoning VLA confirms the vulnerability is exclusive to reasoning-augmented models, while instruction-level attacks degrade both architectures -- establishing that the internal reasoning trace is a distinct and stealthy threat vector invisible to input-validation defenses.
【11】Disentangled Latent Dynamics Manifold Fusion for Solving Parameterized PDEs
标题:求解参数化偏微分方程的解纠缠潜在动力学流形融合
链接:https://arxiv.org/abs/2603.12676
作者:Zhangyong Liang,Ji Zhang
摘要:Generalizing neural surrogate models across different PDE parameters remains difficult because changes in PDE coefficients often make learning harder and optimization less stable. The problem becomes even more severe when the model must also predict beyond the training time range. Existing methods usually cannot handle parameter generalization and temporal extrapolation at the same time. Standard parameterized models treat time as just another input and therefore fail to capture intrinsic dynamics, while recent continuous-time latent methods often rely on expensive test-time auto-decoding for each instance, which is inefficient and can disrupt continuity across the parameterized solution space. To address this, we propose Disentangled Latent Dynamics Manifold Fusion (DLDMF), a physics-informed framework that explicitly separates space, time, and parameters. Instead of unstable auto-decoding, DLDMF maps PDE parameters directly to a continuous latent embedding through a feed-forward network. This embedding initializes and conditions a latent state whose evolution is governed by a parameter-conditioned Neural ODE. We further introduce a dynamic manifold fusion mechanism that uses a shared decoder to combine spatial coordinates, parameter embeddings, and time-evolving latent states to reconstruct the corresponding spatiotemporal solution. By modeling prediction as latent dynamic evolution rather than static coordinate fitting, DLDMF reduces interference between parameter variation and temporal evolution while preserving a smooth and coherent solution manifold. As a result, it performs well on unseen parameter settings and in long-term temporal extrapolation. Experiments on several benchmark problems show that DLDMF consistently outperforms state-of-the-art baselines in accuracy, parameter generalization, and extrapolation robustness.
【12】Sobolev--Ricci Curvature
标题:Sobolev--Ricci曲率
链接:https://arxiv.org/abs/2603.12652
作者:Kyoichi Iwasaki,Tam Le,Hideitsu Hino
备注:42 pages, 13 figures
摘要:Ricci curvature is a fundamental concept in differential geometry for encoding local geometric structure, and its graph-based analogues have recently gained prominence as practical tools for reweighting, pruning, and reshaping network geometry. We propose Sobolev-Ricci Curvature (SRC), a graph Ricci curvature canonically induced by Sobolev transport geometry, which admits efficient evaluation via a tree-metric Sobolev structure on neighborhood measures. We establish two consistency behaviors that anchor SRC to classical transport curvature: (i) on trees endowed with the length measure, SRC recovers Ollivier-Ricci curvature (ORC) in the canonical W1 setting, and (ii) SRC vanishes in the Dirac limit, matching the flat case of measure-theoretic Ricci curvature. We demonstrate SRC as a reusable curvature primitive in two representative pipelines. We define Sobolev-Ricci Flow by replacing ORC with SRC in a Ricci-flow-style reweighting rule, and we use SRC for curvature-guided edge pruning aimed at preserving manifold structure. Overall, SRC provides a transport-based foundation for scalable curvature-driven graph transformation and manifold-oriented pruning.
【13】LightMoE: Reducing Mixture-of-Experts Redundancy through Expert Replacing
标题:LightMoE:通过专家替换减少专家混合冗余
链接:https://arxiv.org/abs/2603.12645
作者:Jiawei Hao,Zhiwei Hao,Jianyuan Guo,Li Shen,Yong Luo,Han Hu,Dan Zeng
摘要:Mixture-of-Experts (MoE) based Large Language Models (LLMs) have demonstrated impressive performance and computational efficiency. However, their deployment is often constrained by substantial memory demands, primarily due to the need to load numerous expert modules. While existing expert compression techniques like pruning or merging attempt to mitigate this, they often suffer from irreversible knowledge loss or high training overhead. In this paper, we propose a novel expert compression paradigm termed expert replacing, which replaces redundant experts with parameter-efficient modules and recovers their capabilities with low training costs. We find that even a straightforward baseline of this paradigm yields promising performance. Building on this foundation, we introduce LightMoE, a framework that enhances the paradigm by introducing adaptive expert selection, hierarchical expert construction, and an annealed recovery strategy. Experimental results show that LightMoE matches the performance of LoRA fine-tuning at a 30% compression ratio. Even under a more aggressive 50% compression rate, it outperforms existing methods and achieves average performance improvements of 5.6% across five diverse tasks. These findings demonstrate that LightMoE strikes a superior balance among memory efficiency, training efficiency, and model performance.
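One natural (hypothetical) instantiation of the expert-replacing paradigm is to swap a redundant expert's weight matrix for truncated-SVD low-rank factors and then recover quality by fine-tuning; the paper's actual parameter-efficient module and its adaptive selection differ in the details:

```python
import numpy as np

def replace_expert(W, rank):
    """Replace a d_out x d_in expert weight with low-rank factors B @ A,
    initialized from the truncated SVD (the best rank-r approximation)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    B = U[:, :rank] * s[:rank]   # d_out x r
    A = Vt[:rank, :]             # r  x d_in
    return B, A

rng = np.random.default_rng(0)
# a synthetic expert weight with low intrinsic rank (64)
W = rng.standard_normal((256, 64)) @ rng.standard_normal((64, 256)) / 8
B, A = replace_expert(W, rank=64)

params_before = W.size
params_after = B.size + A.size
```

When the expert is genuinely redundant (low intrinsic rank), the replacement is nearly lossless while halving or more its memory footprint, and the residual gap is what the annealed recovery stage trains away.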
【14】FastDSAC: Unlocking the Potential of Maximum Entropy RL in High-Dimensional Humanoid Control
标题:FastDSAC:释放高维人形机器人控制中最大熵RL的潜力
链接:https://arxiv.org/abs/2603.12612
作者:Jun Xue,Junze Wang,Xinming Zhang,Shanze Wang,Yanjun Chen,Wei Zhang
摘要:Scaling Maximum Entropy Reinforcement Learning (RL) to high-dimensional humanoid control remains a formidable challenge, as the ``curse of dimensionality'' induces severe exploration inefficiency and training instability in expansive action spaces. Consequently, recent high-throughput paradigms have largely converged on deterministic policy gradients combined with massive parallel simulation. We challenge this compromise with FastDSAC, a framework that effectively unlocks the potential of maximum entropy stochastic policies for complex continuous control. We introduce Dimension-wise Entropy Modulation (DEM) to dynamically redistribute the exploration budget and enforce diversity, alongside a continuous distributional critic tailored to ensure value fidelity and mitigate high-dimensional value overestimation. Extensive evaluations on HumanoidBench and other continuous control tasks demonstrate that rigorously designed stochastic policies can consistently match or outperform deterministic baselines, achieving notable gains of 180\% and 400\% on the challenging \textit{Basketball} and \textit{Balance Hard} tasks.
【15】Feynman: Knowledge-Infused Diagramming Agent for Scalable Visual Designs
标题:费曼:可扩展视觉设计的知识注入图表代理
链接:https://arxiv.org/abs/2603.12597
作者:Zixin Wen,Yifu Cai,Kyle Lee,Sam Estep,Josh Sunshine,Aarti Singh,Yuejie Chi,Wode Ni
备注:A previous version was submitted to ICLR 2025
摘要:Visual design is an essential application of state-of-the-art multi-modal AI systems. Improving these systems requires high-quality vision-language data at scale. Despite the abundance of internet image and text data, knowledge-rich and well-aligned image-text pairs are rare. In this paper, we present a scalable diagram generation pipeline built with our agent, Feynman. To create diagrams, Feynman first enumerates domain-specific knowledge components (''ideas'') and performs code planning based on the ideas. Given the plan, Feynman translates ideas into simple declarative programs and iterates to receive feedback and visually refine diagrams. Finally, the declarative programs are rendered by the Penrose diagramming system. The optimization-based rendering of Penrose preserves the visual semantics while injecting fresh randomness into the layout, thereby producing diagrams with visual consistency and diversity. As a result, Feynman can author diagrams along with grounded captions at very little cost and time. Using Feynman, we synthesized a dataset with more than 100k well-aligned diagram-caption pairs. We also curate a visual-language benchmark, Diagramma, from freshly generated data. Diagramma can be used for evaluating the visual reasoning capabilities of vision-language models. We plan to release the dataset, benchmark, and the full agent pipeline as an open-source project.
【16】Deferred is Better: A Framework for Multi-Granularity Deferred Interaction of Heterogeneous Features
标题:延迟更好:一个异构特征多粒度延迟交互框架
链接:https://arxiv.org/abs/2603.12586
作者:Yi Xu,Moyu Zhang,Chaofan Fan,Jinxin Hu,Yu Zhang,Xiaoyi Zeng
摘要:Click-through rate (CTR) prediction models estimates the probability of a user-item click by modeling interactions across a vast feature space. A fundamental yet often overlooked challenge is the inherent heterogeneity of these features: their sparsity and information content vary dramatically. For instance, categorical features like item IDs are extremely sparse, whereas numerical features like item price are relatively dense. Prevailing CTR models have largely ignored this heterogeneity, employing a uniform feature interaction strategy that inputs all features into the interaction layers simultaneously. This approach is suboptimal, as the premature introduction of low-information features can inject significant noise and mask the signals from information-rich features, which leads to model collapse and hinders the learning of robust representations. To address the above challenge, we propose a Multi-Granularity Information-Aware Deferred Interaction Network (MGDIN), which adaptively defers the introduction of features into the feature interaction process. MGDIN's core mechanism operates in two stages: First, it employs a multi-granularity feature grouping strategy to partition the raw features into distinct groups with more homogeneous information density in different granularities, thereby mitigating the effects of extreme individual feature sparsity and enabling the model to capture feature interactions from diverse perspectives. Second, a delayed interaction mechanism is implemented through a hierarchical masking strategy, which governs when and how each group participates by masking low-information groups in the early layers and progressively unmasking them as the network deepens. This deferred introduction allows the model to establish a robust understanding based on high-information features before gradually incorporating sparser information from other groups...
【17】A Spectral Revisit of the Distributional Bellman Operator under the Cramér Metric
标题:Cramér度量下分布Bellman算子的谱重访
链接:https://arxiv.org/abs/2603.12576
作者:Keru Wang,Yixin Deng,Yao Lyu,Stephen Redmond,Shengbo Eben Li
摘要:Distributional reinforcement learning (DRL) studies the evolution of full return distributions under Bellman updates rather than focusing on expected values. A classical result is that the distributional Bellman operator is contractive under the Cramér metric, which corresponds to an $L^2$ geometry on differences of cumulative distribution functions (CDFs). While this contraction ensures stability of policy evaluation, existing analyses remain largely metric, focusing on contraction properties without elucidating the structural action of the Bellman update on distributions. In this work, we analyse distributional Bellman dynamics directly at the level of CDFs, treating the Cramér geometry as the intrinsic analytical setting. At this level, the Bellman update acts affinely on CDFs and linearly on differences between CDFs, and its contraction property yields a uniform bound on this linear action. Building on this intrinsic formulation, we construct a family of regularised spectral Hilbert representations that realise the CDF-level geometry by exact conjugation, without modifying the underlying Bellman dynamics. The regularisation affects only the geometry and vanishes in the zero-regularisation limit, recovering the native Cramér metric. This framework clarifies the operator structure underlying distributional Bellman updates and provides a foundation for further functional and operator-theoretic analyses in DRL.
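The contraction this analysis builds on can be checked numerically: under the affine map $z \mapsto r + γz$ that a deterministic Bellman backup applies to return distributions, the Cramér ($L^2$-on-CDFs) distance shrinks by exactly $\sqrt{γ}$. A grid-based sketch with empirical CDFs:

```python
import numpy as np

def cramer(cdf_f, cdf_g, grid):
    """l2 distance between two CDFs sampled on a uniform grid."""
    dx = grid[1] - grid[0]
    return np.sqrt(np.sum((cdf_f - cdf_g) ** 2) * dx)

def ecdf(samples, grid):
    """Empirical CDF evaluated at every grid point."""
    return np.searchsorted(np.sort(samples), grid, side="right") / len(samples)

rng = np.random.default_rng(0)
X = rng.standard_normal(5000)          # two return distributions...
Y = rng.standard_normal(5000) + 0.5    # ...that the backup acts on
r, gamma = 1.0, 0.9

grid = np.linspace(-8, 8, 16001)
d_before = cramer(ecdf(X, grid), ecdf(Y, grid), grid)
d_after = cramer(ecdf(r + gamma * X, grid), ecdf(r + gamma * Y, grid), grid)
ratio = d_after / d_before             # should approach sqrt(gamma)
```

The $\sqrt{γ}$ factor, rather than $γ$ itself, is the uniform bound on the linear CDF-level action referred to in the abstract.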
【18】Asymptotic and Finite-Time Guarantees for Langevin-Based Temperature Annealing in InfoNCE
标题:InfoNCE中基于Langevin的温度退火的渐近与有限时间保证
链接:https://arxiv.org/abs/2603.12552
作者:Faris Chaudhry
备注:Accepted at the Optimization for Machine Learning Workshop (NeurIPS 2025)
摘要:The InfoNCE loss in contrastive learning depends critically on a temperature parameter, yet its dynamics under fixed versus annealed schedules remain poorly understood. We provide a theoretical analysis by modeling embedding evolution under Langevin dynamics on a compact Riemannian manifold. Under mild smoothness and energy-barrier assumptions, we show that classical simulated annealing guarantees extend to this setting: slow logarithmic inverse-temperature schedules ensure convergence in probability to a set of globally optimal representations, while faster schedules risk becoming trapped in suboptimal minima. Our results establish a link between contrastive learning and simulated annealing, providing a principled basis for understanding and tuning temperature schedules.
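The guarantee hinges on a slow logarithmic inverse-temperature schedule, $β(t) = \log(t+2)/c$. A 1-D Euclidean sketch on a double-well energy conveys the flavor of the annealed Langevin dynamics (the paper's setting is a compact Riemannian manifold with the InfoNCE loss as energy, not this toy potential):

```python
import numpy as np

def energy(x):               # double well with global minima at x = +/- 1
    return (x ** 2 - 1) ** 2

def grad(x):
    return 4 * x * (x ** 2 - 1)

rng = np.random.default_rng(0)
x, eta, c = 3.0, 1e-3, 4.0   # start far from both wells
E0 = energy(x)
for t in range(20000):
    beta = np.log(t + 2) / c                       # slow logarithmic schedule
    noise = np.sqrt(2 * eta / beta) * rng.standard_normal()
    x = x - eta * grad(x) + noise                  # Langevin step at temperature 1/beta
final_energy = energy(x)
```

Schedules faster than logarithmic cool the noise term too quickly, which is exactly the trapping-in-suboptimal-minima failure mode the result warns against.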
【19】A Reduction Algorithm for Markovian Contextual Linear Bandits
标题:马尔可夫上下文线性老虎机的一种约简算法
链接:https://arxiv.org/abs/2603.12530
作者:Kaan Buyukkalayci,Osama Hanna,Christina Fragouli
摘要:Recent work shows that when contexts are drawn i.i.d., linear contextual bandits can be reduced to single-context linear bandits. This ``contexts are cheap" perspective is highly advantageous, as it allows for sharper finite-time analyses and leverages mature techniques from the linear bandit literature, such as those for misspecification and adversarial corruption. Motivated by applications with temporally correlated availability, we extend this perspective to Markovian contextual linear bandits, where the action set evolves via an exogenous Markov chain. Our main contribution is a reduction that applies under uniform geometric ergodicity. We construct a stationary surrogate action set to solve the problem using a standard linear bandit oracle, employing a delayed-update scheme to control the bias induced by the nonstationary conditional context distributions. We further provide a phased algorithm for unknown transition distributions that learns the surrogate mapping online. In both settings, we obtain a high-probability worst-case regret bound matching that of the underlying linear bandit oracle, with only lower-order dependence on the mixing time.
【20】Curriculum Sampling: A Two-Phase Curriculum for Efficient Training of Flow Matching
标题:课程采样:高效训练流匹配的两阶段课程
链接:https://arxiv.org/abs/2603.12517
作者:Pengwei Sun
摘要:Timestep sampling $p(t)$ is a central design choice in Flow Matching models, yet common practice increasingly favors static middle-biased distributions (e.g., Logit-Normal). We show that this choice induces a speed--quality trade-off: middle-biased sampling accelerates early convergence but yields worse asymptotic fidelity than Uniform sampling. By analyzing per-timestep training losses, we identify a U-shaped difficulty profile with persistent errors near the boundary regimes, implying that under-sampling the endpoints leaves fine details unresolved. Guided by this insight, we propose \textbf{Curriculum Sampling}, a two-phase schedule that begins with middle-biased sampling for rapid structure learning and then switches to Uniform sampling for boundary refinement. On CIFAR-10, Curriculum Sampling improves the best FID from $3.85$ (Uniform) to $3.22$ while reaching peak performance at $100$k rather than $150$k training steps. Our results highlight that timestep sampling should be treated as an evolving curriculum rather than a fixed hyperparameter.
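The proposed schedule is easy to state in code: draw $t$ from a middle-biased Logit-Normal early in training, then switch to Uniform for boundary refinement. The switch point below is illustrative, not the paper's tuned value:

```python
import numpy as np

def sample_timesteps(n, step, switch_step, rng):
    """Two-phase curriculum: middle-biased Logit-Normal(0, 1) sampling
    early on, Uniform sampling after `switch_step`."""
    if step < switch_step:
        return 1 / (1 + np.exp(-rng.standard_normal(n)))   # sigmoid of N(0,1)
    return rng.uniform(0.0, 1.0, n)

rng = np.random.default_rng(0)
t_early = sample_timesteps(100_000, step=10_000, switch_step=100_000, rng=rng)
t_late = sample_timesteps(100_000, step=120_000, switch_step=100_000, rng=rng)

# phase 1 concentrates mass in the middle of [0, 1]; phase 2 does not
mid_frac_early = np.mean((t_early > 0.25) & (t_early < 0.75))
mid_frac_late = np.mean((t_late > 0.25) & (t_late < 0.75))
```

The extra mass phase 2 puts near $t \approx 0$ and $t \approx 1$ is what resolves the boundary errors of the U-shaped difficulty profile.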
【21】Probing Length Generalization in Mamba via Image Reconstruction
标题:通过图像重建探索Mamba中的长度概括
链接:https://arxiv.org/abs/2603.12499
作者:Jan Rathjens,Robin Schiewer,Laurenz Wiskott,Anand Subramoney
摘要:Mamba has attracted widespread interest as a general-purpose sequence model due to its low computational complexity and competitive performance relative to transformers. However, its performance can degrade when inference sequence lengths exceed those seen during training. We study this phenomenon using a controlled vision task in which Mamba reconstructs images from sequences of image patches. By analyzing reconstructions at different stages of sequence processing, we reveal that Mamba qualitatively adapts its behavior to the distribution of sequence lengths encountered during training, resulting in strategies that fail to generalize beyond this range. To support our analysis, we introduce a length-adaptive variant of Mamba that improves performance across training sequence lengths. Our results provide an intuitive perspective on length generalization in Mamba and suggest directions for improving the architecture.
【22】Bases of Steerable Kernels for Equivariant CNNs: From 2D Rotations to the Lorentz Group
标题:等变CNN的可控核基:从二维旋转到Lorentz群
链接:https://arxiv.org/abs/2603.12459
作者:Alan Garbarz
备注:28 pages. Comments are welcome
摘要:We present an alternative way of solving the steerable kernel constraint that appears in the design of steerable equivariant convolutional neural networks. We find explicit real and complex bases which are ready to use, for different symmetry groups and for feature maps of arbitrary tensor type. A major advantage of this method is that it bypasses the need to numerically or analytically compute Clebsch-Gordan coefficients and works directly with the representations of the input and output feature maps. The strategy is to find a basis of kernels that respect a simpler invariance condition at some point $x_0$, and then \textit{steer} it with the defining equation of steerability to move to an arbitrary point $x=g\cdot x_0$. This idea has been mentioned in the literature before, but not developed in depth or with much generality. Here we describe how it works with minimal technical tools, making it accessible to a general audience.
【23】Bridging the Gap Between Security Metrics and Key Risk Indicators: An Empirical Framework for Vulnerability Prioritization
标题:弥合安全指标与关键风险指标之间的差距:漏洞优先级排序的实证框架
链接:https://arxiv.org/abs/2603.12450
作者:Emad Sherif,Iryna Yevseyeva,Vitor Basto-Fernandes,Allan Cook
摘要:Organisations overwhelmingly prioritize vulnerability remediation using Common Vulnerability Scoring System (CVSS) severity scores, yet CVSS classifiers achieve an Area Under the Precision-Recall Curve (AUPRC) of 0.011 on real-world exploitation data, near random chance. We propose a composite Key Risk Indicator (KRI) grounded in expected-loss decomposition, integrating dimensions of threat, impact, and exposure. We evaluated the KRI framework against the Known Exploited Vulnerabilities (KEV) catalog using a comprehensive dataset of 280,694 Common Vulnerabilities and Exposures (CVEs). KRI achieves a Receiver Operating Characteristic Area Under the Curve (ROC-AUC) of 0.927 and an AUPRC of 0.223, versus 0.747 and 0.011 for CVSS (a 24% relative gain in ROC-AUC and a roughly 20-fold gain in AUPRC). Ablation analysis shows the Exploit Prediction Scoring System (EPSS) alone achieves AUPRC 0.365, higher than the full KRI (0.223), confirming that EPSS and KRI serve distinct objectives: EPSS maximizes raw exploit detection, while KRI re-orders by impact and exposure, capturing 92.3 percent of impact-weighted remediation value at k=500 versus 82.6 percent for EPSS, and surfacing 1.75 times more Critical-severity exploited CVEs. KRI's net benefit exceeds that of EPSS whenever the severity premium exceeds 2. While EPSS serves as a robust baseline for exploit detection, the KRI framework is the superior choice for organizations seeking to align remediation efforts with tangible risk reduction.
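The threat-impact-exposure decomposition can be sketched as a simple composite ranker. The multiplicative form, the field names, and the normalizations below are illustrative assumptions; the paper defines the exact KRI:

```python
def kri(epss, impact, exposure):
    """Illustrative composite Key Risk Indicator in the spirit of an
    expected-loss decomposition: threat likelihood (e.g., an EPSS
    probability) times impact (e.g., a normalized CVSS impact subscore)
    times exposure (e.g., fraction of assets reachable). The
    multiplicative form is an assumption, not the paper's definition."""
    return epss * impact * exposure

def prioritize(cves, k):
    """Rank a list of CVE dicts by descending KRI and return the top-k
    candidates for remediation."""
    return sorted(
        cves,
        key=lambda c: kri(c["epss"], c["impact"], c["exposure"]),
        reverse=True,
    )[:k]
```

A high-EPSS but low-impact, low-exposure finding can then rank below a moderately likely but high-impact one, which is the re-ordering behavior the abstract attributes to KRI.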
【24】Beyond Motion Imitation: Is Human Motion Data Alone Sufficient to Explain Gait Control and Biomechanics?
标题:超越运动模仿:仅靠人类运动数据就足以解释步态控制和生物力学吗?
链接:https://arxiv.org/abs/2603.12408
作者:Xinyi Liu,Jangwhan Ahn,Edgar Lobaton,Jennie Si,He Huang
备注:8 pages, 7 figures
摘要:With the growing interest in motion imitation learning (IL) for human biomechanics and wearable robotics, this study investigates how additional foot-ground interaction measures, used as reward terms, affect human gait kinematics and kinetics estimation within a reinforcement learning-based IL framework. Results indicate that accurate reproduction of forward kinematics alone does not ensure biomechanically plausible joint kinetics. Adding foot-ground contacts and contact forces to the IL reward terms enables the prediction of joint moments in forward walking simulation, which are significantly closer to those computed by inverse dynamics. This finding highlights a fundamental limitation of motion-only IL approaches, which may prioritize kinematics matching over physical consistency. Incorporating kinetic constraints, particularly ground reaction force and center of pressure information, significantly enhances the realism of internal and external kinetics. These findings suggest that, when imitation learning is applied to human-related research domains such as biomechanics and wearable robot co-design, kinetics-based reward shaping is necessary to achieve physically consistent gait representations.
【25】Budget-Sensitive Discovery Scoring: A Formally Verified Framework for Evaluating AI-Guided Scientific Selection
标题:预算敏感的发现评分:评估人工智能引导的科学选择的形式化验证框架
链接:https://arxiv.org/abs/2603.12349
作者:Abhinaba Basu,Pavan Chakraborty
摘要:Scientific discovery increasingly relies on AI systems to select candidates for expensive experimental validation, yet no principled, budget-aware evaluation framework exists for comparing selection strategies -- a gap intensified by large language models (LLMs), which generate plausible scientific proposals without reliable downstream evaluation. We introduce the Budget-Sensitive Discovery Score (BSDS), a formally verified metric -- 20 theorems machine-checked by the Lean 4 proof assistant -- that jointly penalizes false discoveries (lambda-weighted FDR) and excessive abstention (gamma-weighted coverage gap) at each budget level. Its budget-averaged form, the Discovery Quality Score (DQS), provides a single summary statistic that no proposer can inflate by performing well at a cherry-picked budget. As a case study, we apply BSDS/DQS to: do LLMs add marginal value to an existing ML pipeline for drug discovery candidate selection? We evaluate 39 proposers -- 11 mechanistic variants, 14 zero-shot LLM configurations, and 14 few-shot LLM configurations -- using SMILES representations on MoleculeNet HIV (41,127 compounds, 3.5% active, 1,000 bootstrap replicates) under both random and scaffold splits. Three findings emerge. First, the simple RF-based Greedy-ML proposer achieves the best DQS (-0.046), outperforming all MLP variants and LLM configurations. Second, no LLM surpasses the Greedy-ML baseline under zero-shot or few-shot evaluation on HIV or Tox21, establishing that LLMs provide no marginal value over an existing trained classifier. Third, the proposer hierarchy generalizes across five MoleculeNet benchmarks spanning 0.18%-46.2% prevalence, a non-drug AV safety domain, and a 9x7 grid of penalty parameters (tau >= 0.636, mean tau = 0.863). The framework applies to any setting where candidates are selected under budget constraints and asymmetric error costs.
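The penalty structure described in the abstract (lambda-weighted FDR plus gamma-weighted coverage gap, averaged over budgets) can be sketched as follows. The exact functional form and default weights are assumptions for illustration; the verified definitions live in the paper's Lean 4 development:

```python
def bsds(selected, truths, budget, lam=1.0, gam=1.0):
    """Budget-Sensitive Discovery Score at one budget level (sketch).

    One plausible reading of the abstract: penalize the false-discovery
    rate of the selected set (lambda-weighted) plus the coverage gap,
    i.e. the unused fraction of the budget (gamma-weighted). Higher
    (closer to 0) is better. Assumed form, not the formally verified one.
    """
    n_sel = len(selected)
    fdr = 0.0 if n_sel == 0 else sum(1 for c in selected if not truths[c]) / n_sel
    coverage_gap = max(0.0, (budget - n_sel) / budget)  # abstention penalty
    return -(lam * fdr + gam * coverage_gap)

def dqs(proposer, truths, budgets, **kw):
    """Discovery Quality Score: BSDS averaged over budget levels, so a
    proposer cannot inflate its score at one cherry-picked budget."""
    return sum(bsds(proposer(b), truths, b, **kw) for b in budgets) / len(budgets)
```

Averaging over the full budget grid is what gives DQS its anti-gaming property: doing well at a single budget cannot offset failures elsewhere.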
【26】Maximum Entropy Exploration Without the Rollouts
标题:无需Rollout的最大熵探索
链接:https://arxiv.org/abs/2603.12325
作者:Jacob Adamczyk,Adam Kamoski,Rahul V. Kulkarni
摘要:Efficient exploration remains a central challenge in reinforcement learning, serving as a useful pretraining objective for data collection, particularly when an external reward function is unavailable. A principled formulation of the exploration problem is to find policies that maximize the entropy of their induced steady-state visitation distribution, thereby encouraging uniform long-run coverage of the state space. Many existing exploration approaches require estimating state visitation frequencies through repeated on-policy rollouts, which can be computationally expensive. In this work, we instead consider an intrinsic average-reward formulation in which the reward is derived from the visitation distribution itself, so that the optimal policy maximizes steady-state entropy. An entropy-regularized version of this objective admits a spectral characterization: the relevant stationary distributions can be computed from the dominant eigenvectors of a problem-dependent transition matrix. This insight leads to a novel algorithm for solving the maximum entropy exploration problem, EVE (EigenVector-based Exploration), which avoids explicit rollouts and distribution estimation, instead computing the solution through iterative updates, similar to a value-based approach. To address the original unregularized objective, we employ a posterior-policy iteration (PPI) approach, which monotonically improves the entropy and converges in value. We prove convergence of EVE under standard assumptions and demonstrate empirically that it efficiently produces policies with high steady-state entropy, achieving competitive exploration performance relative to rollout-based baselines in deterministic grid-world environments.
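The spectral object EVE builds on, a stationary distribution read off the dominant eigenvector of a transition matrix, can be illustrated on a plain Markov chain. This is a sketch of the underlying linear-algebra step only, not the EVE algorithm itself, whose transition matrix is policy- and regularization-dependent:

```python
import numpy as np

def stationary_entropy(P):
    """Stationary distribution and its entropy for a row-stochastic
    transition matrix P, obtained from the dominant left eigenvector
    (eigenvalue 1 for an ergodic chain). Illustrates how steady-state
    entropy can be computed spectrally, without simulating rollouts."""
    vals, vecs = np.linalg.eig(P.T)
    i = np.argmax(vals.real)            # dominant eigenvalue
    pi = np.abs(vecs[:, i].real)
    pi /= pi.sum()                      # normalize to a distribution
    entropy = -np.sum(pi * np.log(pi + 1e-12))
    return pi, entropy
```

For a uniform chain over n states this recovers the maximum-entropy value log(n); an exploration method in this spirit would adjust the policy (and hence P) to push this quantity upward.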
【27】VQQA: An Agentic Approach for Video Evaluation and Quality Improvement
标题:VQQA:一种用于视频评估和质量改进的智能体方法
链接:https://arxiv.org/abs/2603.12310
作者:Yiwen Song,Tomas Pfister,Yale Song
摘要:Despite rapid advancements in video generation models, aligning their outputs with complex user intent remains challenging. Existing test-time optimization methods are typically either computationally expensive or require white-box access to model internals. To address this, we present VQQA (Video Quality Question Answering), a unified, multi-agent framework generalizable across diverse input modalities and video generation tasks. By dynamically generating visual questions and using the resulting Vision-Language Model (VLM) critiques as semantic gradients, VQQA replaces traditional, passive evaluation metrics with human-interpretable, actionable feedback. This enables a highly efficient, closed-loop prompt optimization process via a black-box natural language interface. Extensive experiments demonstrate that VQQA effectively isolates and resolves visual artifacts, substantially improving generation quality in just a few refinement steps. Applicable to both text-to-video (T2V) and image-to-video (I2V) tasks, our method achieves absolute improvements of +11.57% on T2V-CompBench and +8.43% on VBench2 over vanilla generation, significantly outperforming state-of-the-art stochastic search and prompt optimization techniques.
【28】Global Evolutionary Steering: Refining Activation Steering Control via Cross-Layer Consistency
标题:全局进化引导:通过跨层一致性完善激活引导控制
链接:https://arxiv.org/abs/2603.12298
作者:Xinyan Jiang,Wenjing Yu,Di Wang,Lijie Hu
摘要:Activation engineering enables precise control over Large Language Models (LLMs) without the computational cost of fine-tuning. However, existing methods deriving vectors from static activation differences are susceptible to high-dimensional noise and layer-wise semantic drift, often capturing spurious correlations rather than the target intent. To address this, we propose Global Evolutionary Refined Steering (GER-steer), a training-free framework that is grounded in the geometric stability of the network's representation evolution. GER-steer exploits this global signal to rectify raw steering vectors, effectively decoupling robust semantic intent from orthogonal artifacts. Extensive evaluations confirm that GER-steer consistently outperforms baselines, delivering superior efficacy and generalization without layer-specific tuning and establishing a universal solution for reliable model alignment.
【29】Weakly Time-Coupled Approximation of Markov Decision Processes
标题:马尔科夫决策过程的弱时间耦合逼近
链接:https://arxiv.org/abs/2603.12636
作者:Negar Soheili,Selvaprabu Nadarajah,Bo Yang
摘要:Finite-horizon Markov decision processes (MDPs) with high-dimensional exogenous uncertainty and endogenous states arise in operations and finance, including the valuation and exercise of Bermudan and real options, but face a scalability barrier as computational complexity grows with the horizon. A common approximation represents the value function using basis functions, but methods for fitting weights treat cross-stage optimization differently. Least squares Monte Carlo (LSM) fits weights via backward recursion and regression, avoiding joint optimization but accumulating error over the horizon. Approximate linear programming (ALP) and pathwise optimization (PO) jointly fit weights to produce upper bounds, but temporal coupling causes computational complexity to grow with the horizon. We show this coupling is an artifact of the approximation architecture, and develop a weakly time-coupled approximation (WTCA) where cross-stage dependence is independent of horizon. For any fixed basis function set, the WTCA upper bound is tighter than that of ALP and looser than that of PO, and converges to the optimal policy value as the basis family expands. We extend parallel deterministic block coordinate descent to the stochastic MDP setting, exploiting weak temporal coupling. Applied to WTCA, weak coupling yields computational complexity independent of the horizon. Within an equal time budget, solving WTCA accommodates more exogenous samples or basis functions than PO, yielding tighter bounds despite PO being tighter for fixed samples and basis functions. On Bermudan option and ethanol production instances, WTCA produces tighter upper bounds than PO and LSM in every instance tested, with near-optimal policies at longer horizons.
【30】Batched Kernelized Bandits: Refinements and Extensions
标题:批量核化赌博机:改进与扩展
链接:https://arxiv.org/abs/2603.12627
作者:Chenkai Ma,Keqin Chen,Jonathan Scarlett
摘要:In this paper, we consider the problem of black-box optimization with noisy feedback revealed in batches, where the unknown function to optimize has a bounded norm in some Reproducing Kernel Hilbert Space (RKHS). We refer to this as the Batched Kernelized Bandits problem, and refine and extend existing results on regret bounds. For algorithmic upper bounds, (Li and Scarlett, 2022) shows that $B=O(\log\log T)$ batches suffice to attain near-optimal regret, where $T$ is the time horizon and $B$ is the number of batches. We further refine this by (i) finding the optimal number of batches including constant factors (to within $1+o(1)$), and (ii) removing a factor of $B$ in the regret bound. For algorithm-independent lower bounds, noticing that existing results only apply when the batch sizes are fixed in advance, we present novel lower bounds when the batch sizes are chosen adaptively, and show that adaptive batches have essentially the same minimax regret scaling as fixed batches. Furthermore, we consider a robust setting where the goal is to choose points for which the function value remains high even after an adversarial perturbation. We present the robust-BPE algorithm, and show that a suitably-defined cumulative regret notion incurs the same bound as the non-robust setting, and derive a simple regret bound significantly below that of previous work.
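For intuition on why $O(\log\log T)$ batches can suffice, a classic grid from the batched-bandit literature places batch endpoints at $t_i = \lceil T^{1-2^{-i}} \rceil$. The sketch below shows that construction; it is background intuition only, not the paper's refined schedule, whose optimal batch count and constant factors differ:

```python
import math

def batch_endpoints(T):
    """End times of a classic batched-bandit grid, t_i = ceil(T^{1-2^{-i}}),
    using B = O(log log T) batches over horizon T, with the final batch
    running to the horizon. A standard construction shown for intuition;
    the paper refines the optimal batch count including constant factors."""
    B = max(1, math.ceil(math.log2(max(2.0, math.log(T)))))
    ends = [math.ceil(T ** (1 - 2.0 ** -(i + 1))) for i in range(B - 1)]
    ends.append(T)  # last batch covers the remaining rounds
    return ends
```

For $T = 10^6$ this yields just four batches, ending near $T^{1/2}$, $T^{3/4}$, $T^{7/8}$, and $T$, so each batch roughly squares the effective horizon explored so far.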
【31】Variational Garrote for Sparse Inverse Problems
标题:稀疏反问题的变分Garrote方法
链接:https://arxiv.org/abs/2603.12562
作者:Kanghun Lee,Hyungjoon Soh,Junghyo Jo
备注:10 pages, 4 figures
摘要:Sparse regularization plays a central role in solving inverse problems arising from incomplete or corrupted measurements. Different regularizers correspond to different prior assumptions about the structure of the unknown signal, and reconstruction performance depends on how well these priors match the intrinsic sparsity of the data. This work investigates the effect of sparsity priors in inverse problems by comparing conventional L1 regularization with the Variational Garrote (VG), a probabilistic method that approximates L0 sparsity through variational binary gating variables. A unified experimental framework is constructed across multiple reconstruction tasks including signal resampling, signal denoising, and sparse-view computed tomography. To enable consistent comparison across models with different parameterizations, regularization strength is swept across wide ranges and reconstruction behavior is analyzed through train-generalization error curves. Experiments reveal characteristic bias-variance tradeoff patterns across tasks and demonstrate that VG frequently achieves lower minimum generalization error and improved stability in strongly underdetermined regimes where accurate support recovery is critical. These results suggest that sparsity priors closer to spike-and-slab structure can provide advantages when the underlying coefficient distribution is strongly sparse. The study highlights the importance of prior-data alignment in sparse inverse problems and provides empirical insights into the behavior of variational L0-type methods across different information bottlenecks.
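The "variational binary gating" idea, weights factored into a coefficient times a relaxed binary gate with an L0-like penalty on open gates, can be sketched with a plain gradient-descent relaxation. This toy parameterization, its defaults, and the optimizer are illustrative assumptions; the paper's Variational Garrote uses variational/EM updates rather than this sketch:

```python
import numpy as np

def vg_fit(X, y, lam=0.05, lr=0.05, steps=3000):
    """Toy sketch of Garrote-style gating for sparse regression.

    Weights factor as w = m * v, where m = sigmoid(gamma) is a continuous
    relaxation of binary (spike-and-slab) gate variables, and lam
    penalizes the expected number of open gates, an L0-like prior in
    contrast to L1's magnitude shrinkage. Illustrative relaxation only.
    """
    n, d = X.shape
    v = np.zeros(d)
    gamma = np.full(d, -2.0)  # gates start mostly closed
    for _ in range(steps):
        m = 1.0 / (1.0 + np.exp(-gamma))
        grad_w = X.T @ (X @ (m * v) - y) / n   # least-squares gradient in w
        v -= lr * grad_w * m                   # chain rule: dw/dv = m
        gamma -= lr * (grad_w * v + lam) * m * (1.0 - m)
    m = 1.0 / (1.0 + np.exp(-gamma))
    return m * v
```

Because the penalty acts on gate activation rather than coefficient magnitude, retained coefficients are not shrunk toward zero the way L1 solutions are, which is the qualitative advantage the abstract attributes to spike-and-slab-like priors.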
【32】FloeNet: A mass-conserving global sea ice emulator that generalizes across climates
标题:FloeNet:可跨气候泛化的质量守恒全球海冰模拟器
链接:https://arxiv.org/abs/2603.12449
作者:William Gregory,Mitchell Bushuk,James Duncan,Elynn Wu,Adam Subel,Spencer K. Clark,Bill Hurlin,Oliver Watt-Meyer,Alistair Adcroft,Chris Bretherton,Laure Zanna
备注:4 Figures, 18 supplementary figures
摘要:We introduce FloeNet, a machine-learning emulator trained on the Geophysical Fluid Dynamics Laboratory global sea ice model, SIS2. FloeNet is a mass-conserving model, emulating 6-hour mass and area budget tendencies related to sea ice and snow-on-sea-ice growth, melt, and advection. We train FloeNet using simulated data from a reanalysis-forced ice-ocean simulation and test its ability to generalize to pre-industrial control and 1% CO2 climates. FloeNet outperforms a non-conservative model at reproducing sea ice and snow-on-sea-ice mean state, trends, and inter-annual variability, with volume anomaly correlations above 0.96 in the Antarctic and 0.76 in the Arctic, across all forcings. FloeNet also produces the correct thermodynamic vs dynamic response to forcing, enabling physical interpretability of emulator output. Finally, we show that FloeNet outputs high-fidelity coupling-related variables, including ice-surface skin temperature, ice-to-ocean salt flux, and melting energy fluxes. We hypothesize that FloeNet will improve polar climate processes within existing atmosphere and ocean emulators.
【33】The Privacy-Utility Trade-Off of Location Tracking in Ad Personalization
标题:广告个性化中位置跟踪的隐私与效用权衡
链接:https://arxiv.org/abs/2603.12374
作者:Mohammad Mosaffa,Omid Rafieian
备注:57 pages, 11 figures. Digital advertising, causal inference, and machine learning
摘要:Firms collect vast amounts of behavioral and geographical data on individuals. While behavioral data captures an individual's digital footprint, geographical data reflects their physical footprint. Given the significant privacy risks associated with combining these data sources, it is crucial to understand their respective value and whether they act as complements or substitutes in achieving firms' business objectives. In this paper, we combine economic theory, machine learning, and causal inference to quantify the value of geographical data, the extent to which behavioral data can substitute for it, and the mechanisms through which it benefits firms. Using data from a leading in-app advertising platform in a large Asian country, we document that geographical data is most valuable in the early cold-start stage, when behavioral histories are limited. In this stage, geographical data complements behavioral data, improving targeting performance by almost 20%. As users accumulate richer behavioral histories, however, the role of geographical data shifts: it becomes largely substitutable, as behavioral data alone captures the relevant heterogeneity. These results highlight a central privacy-utility trade-off in ad personalization and inform managerial decisions about when location tracking creates value.
【34】Probabilistic Joint and Individual Variation Explained (ProJIVE) for Data Integration
标题:数据集成的概率联合和个体变异解释(ProJIVE)
链接:https://arxiv.org/abs/2603.12351
作者:Raphiel J. Murden,Ganzhong Tian,Deqiang Qiu,Benjamin B. Risk
摘要:Collecting multiple types of data on the same set of subjects is common in modern scientific applications, including genomics, metabolomics, and neuroimaging. Joint and Individual Variation Explained (JIVE) seeks a low-rank approximation of the joint variation between two or more sets of features captured on common subjects and isolates this variation from that unique to each set of features. We develop an expectation-maximization (EM) algorithm to estimate a probabilistic model for the JIVE framework. The model extends probabilistic principal components analysis to multiple data sets. Our maximum likelihood approach simultaneously estimates joint and individual components, which can lead to greater accuracy compared to other methods. We apply ProJIVE to measures of brain morphometry and cognition in Alzheimer's disease. ProJIVE learns biologically meaningful sources of variation, and the joint morphometry and cognition subject scores are strongly related to more expensive existing biomarkers. Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. Code to reproduce the analysis is available on our GitHub page.