Click "Read the original" to visit arxivdaily.com, covering CS, Physics, Math, Economics, Statistics, Finance, Biology, and Electrical Engineering, with search, bookmarking, and more!
cs.LG: 152 papers today
LLM-related (17 papers)
【1】MSSR: Memory-Aware Adaptive Replay for Continual LLM Fine-Tuning
Link: https://arxiv.org/abs/2603.09892
Authors: Yiyang Lu, Yu He, Jianlong Chen, Hongyuan Zha
Abstract: Continual fine-tuning of large language models (LLMs) is becoming increasingly crucial as these models are deployed in dynamic environments where tasks and data distributions evolve over time. While strong adaptability enables rapid acquisition of new knowledge, it also exposes LLMs to catastrophic forgetting, where previously learned skills degrade during sequential training. Existing replay-based strategies, such as fixed interleaved replay, accuracy-supervised scheduling, and loss-driven scheduling, remain limited: some depend on heuristic rules and provide only partial mitigation of forgetting, while others improve performance but incur substantial computational overhead. Motivated by retention dynamics under sequential fine-tuning, we propose Memory-Inspired Sampler and Scheduler Replay (MSSR), an experience replay framework that estimates sample-level memory strength and schedules rehearsal at adaptive intervals to mitigate catastrophic forgetting while maintaining fast adaptation. Extensive experiments across three backbone models and 11 sequential tasks show that MSSR consistently outperforms state-of-the-art replay baselines, with particularly strong gains on reasoning-intensive and multiple-choice benchmarks.
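As a rough illustration of the rehearsal-scheduling idea, the sketch below keeps a per-sample memory-strength estimate and reschedules rehearsal at spaced, strength-dependent intervals. The class name, update rule, and interval formula are invented for illustration; they are not the paper's actual formulation.

```python
import heapq

class AdaptiveReplayScheduler:
    """Illustrative memory-strength-based rehearsal scheduler (assumptions,
    not MSSR's actual algorithm): well-retained samples get longer intervals,
    poorly retained ones are rehearsed sooner, spaced-repetition style."""

    def __init__(self, base_interval=10, growth=2.0):
        self.base_interval = base_interval
        self.growth = growth
        self.queue = []     # min-heap of (next_rehearsal_step, sample_id)
        self.strength = {}  # sample_id -> memory-strength estimate

    def add(self, sample_id, step, strength=1.0):
        self.strength[sample_id] = strength
        heapq.heappush(self.queue, (step + self.base_interval, sample_id))

    def due(self, step):
        """Pop and return all samples whose rehearsal is due at this step."""
        ready = []
        while self.queue and self.queue[0][0] <= step:
            _, sid = heapq.heappop(self.queue)
            ready.append(sid)
        return ready

    def rehearsed(self, sample_id, step, loss):
        # Low rehearsal loss -> well retained -> strengthen, lengthen interval.
        self.strength[sample_id] *= self.growth if loss < 1.0 else 0.5
        interval = max(1, int(self.base_interval * self.strength[sample_id]))
        heapq.heappush(self.queue, (step + interval, sample_id))

sched = AdaptiveReplayScheduler()
sched.add("task1/ex0", step=0)
print(sched.due(5))    # nothing due yet
print(sched.due(10))   # the sample comes due at step 10
sched.rehearsed("task1/ex0", step=10, loss=0.2)  # retained: interval doubles
```

In a training loop, `due(step)` would be mixed into each new-task batch, interleaving rehearsal with fresh data.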
【2】GAST: Gradient-aligned Sparse Tuning of Large Language Models with Data-layer Selection
Link: https://arxiv.org/abs/2603.09865
Authors: Kai Yao, Zhenghan Song, Kaixin Wu, Mingjie Zhong, Danzhao Cheng, Zhaorui Tan, Yixin Ji, Penglei Gao
Abstract: Parameter-Efficient Fine-Tuning (PEFT) has become a key strategy for adapting large language models, with recent advances in sparse tuning reducing overhead by selectively updating key parameters or subsets of data. Existing approaches generally focus on two distinct paradigms: layer-selective methods aiming to fine-tune critical layers to minimize computational load, and data-selective methods aiming to select effective training subsets to boost training. However, current methods typically overlook the fact that different data points contribute varying degrees to distinct model layers, and they often discard potentially valuable information from data perceived as of low quality. To address these limitations, we propose Gradient-aligned Sparse Tuning (GAST), an innovative method that simultaneously performs selective fine-tuning at both data and layer dimensions as integral components of a unified optimization strategy. GAST specifically targets redundancy in information by employing a layer-sparse strategy that adaptively selects the most impactful data points for each layer, providing a more comprehensive and sophisticated solution than approaches restricted to a single dimension. Experiments demonstrate that GAST consistently outperforms baseline methods, establishing a promising direction for future research in PEFT strategies.
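To make the two-dimensional (data x layer) selection concrete, here is a minimal sketch that scores each (sample, layer) pair by cosine alignment between the sample's layer gradient and that layer's batch-mean gradient, then keeps the top-k samples per layer. The alignment score is a common proxy and an assumption on our part; the paper's exact criterion may differ.

```python
import numpy as np

def select_per_layer(grads, k):
    """grads: array [n_samples, n_layers, dim] of per-sample, per-layer
    gradients. Returns, for each layer, indices of the k samples whose
    gradients align best (cosine) with the layer's mean gradient."""
    mean = grads.mean(axis=0)                                  # [n_layers, dim]
    num = (grads * mean).sum(-1)                               # [n_samples, n_layers]
    den = (np.linalg.norm(grads, axis=-1)
           * np.linalg.norm(mean, axis=-1) + 1e-8)
    align = num / den
    # Per layer: indices of the k best-aligned samples.
    return [np.argsort(-align[:, l])[:k].tolist() for l in range(grads.shape[1])]

rng = np.random.default_rng(0)
grads = rng.normal(size=(8, 3, 16))      # 8 samples, 3 layers, toy gradients
print(select_per_layer(grads, k=2))      # 2 samples kept per layer
```

Each layer would then be updated only on its own selected subset, so no sample is discarded globally just because it is uninformative for some layers.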
【3】EsoLang-Bench: Evaluating Genuine Reasoning in Large Language Models via Esoteric Programming Languages
Link: https://arxiv.org/abs/2603.09678
Authors: Aman Sharma, Paras Chopra
Note: 24 pages, 7 figures, preprint
Abstract: Large language models achieve near-ceiling performance on code generation benchmarks, yet these results increasingly reflect memorization rather than genuine reasoning. We introduce EsoLang-Bench, a benchmark using five esoteric programming languages (Brainfuck, Befunge-98, Whitespace, Unlambda, and Shakespeare) that lack benchmark gaming incentives due to their economic irrationality for pre-training. These languages require the same computational primitives as mainstream programming but have 1,000-100,000x fewer public repositories than Python (based on GitHub search counts). We evaluate five frontier models across five prompting strategies and find a dramatic capability gap: models achieving 85-95% on standard benchmarks score only 0-11% on equivalent esoteric tasks, with 0% accuracy beyond the Easy tier. Few-shot learning and self-reflection fail to improve performance, suggesting these techniques exploit training priors rather than enabling genuine learning. EsoLang-Bench provides the first benchmark designed to mimic human learning by acquiring new languages through documentation, interpreter feedback, and iterative experimentation, measuring transferable reasoning skills resistant to data contamination.
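The claim that these languages share mainstream computational primitives is easy to see in Brainfuck, one of the five benchmark languages: eight symbols over a tape suffice for Turing-complete computation. A generic reference interpreter (not code from the paper) fits in a few dozen lines:

```python
def run_brainfuck(program, input_bytes=b""):
    """Minimal Brainfuck interpreter: 30,000-cell tape, 8-bit wrapping cells."""
    tape = [0] * 30000
    out, ptr, pc, inp = [], 0, 0, 0
    # Pre-match brackets so loops can jump in O(1).
    stack, jumps = [], {}
    for i, ch in enumerate(program):
        if ch == "[":
            stack.append(i)
        elif ch == "]":
            j = stack.pop()
            jumps[i], jumps[j] = j, i
    while pc < len(program):
        ch = program[pc]
        if ch == ">": ptr += 1
        elif ch == "<": ptr -= 1
        elif ch == "+": tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-": tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".": out.append(tape[ptr])
        elif ch == ",":
            tape[ptr] = input_bytes[inp] if inp < len(input_bytes) else 0
            inp += 1
        elif ch == "[" and tape[ptr] == 0: pc = jumps[pc]  # skip loop body
        elif ch == "]" and tape[ptr] != 0: pc = jumps[pc]  # repeat loop body
        pc += 1
    return bytes(out)

# "++++++++[>++++++++<-]>+." computes 8*8+1 = 65 and prints "A".
print(run_brainfuck("++++++++[>++++++++<-]>+."))  # -> b'A'
```

Such an interpreter is exactly the kind of "interpreter feedback" loop the benchmark exposes to models during iterative experimentation.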
【4】MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data
Link: https://arxiv.org/abs/2603.09206
Authors: Zongxia Li, Hongyang Du, Chengsong Huang, Xiyang Wu, Lantao Yu, Yicheng He, Jing Xie, Xiaomin Wu, Zhichao Liu, Jiarui Zhang, Fuxiao Liu
Abstract: Self-evolution has emerged as a key paradigm for improving foundation models such as Large Language Models (LLMs) and Vision Language Models (VLMs) with minimal human intervention. While recent approaches have demonstrated that LLM agents can self-evolve from scratch with little to no data, VLMs introduce an additional visual modality that typically requires at least some seed data, such as images, to bootstrap the self-evolution process. In this work, we present Multi-model Multimodal Zero (MM-Zero), the first RL-based framework to achieve zero-data self-evolution for VLM reasoning. Moving beyond prior dual-role (Proposer and Solver) setups, MM-Zero introduces a multi-role self-evolving training framework comprising three specialized roles: a Proposer that generates abstract visual concepts and formulates questions; a Coder that translates these concepts into executable code (e.g., Python, SVG) to render visual images; and a Solver that performs multimodal reasoning over the generated visual content. All three roles are initialized from the same base model and trained using Group Relative Policy Optimization (GRPO), with carefully designed reward mechanisms that integrate execution feedback, visual verification, and difficulty balancing. Our experiments show that MM-Zero improves VLM reasoning performance across a wide range of multimodal benchmarks. MM-Zero establishes a scalable path toward self-evolving multi-model systems for multimodal models, extending the frontier of self-improvement beyond the conventional two-model paradigm.
【5】Emotion is Not Just a Label: Latent Emotional Factors in LLM Processing
Link: https://arxiv.org/abs/2603.09205
Authors: Benjamin Reichman, Adar Avasian, Samuel Webster, Larry Heck
Abstract: Large language models are routinely deployed on text that varies widely in emotional tone, yet their reasoning behavior is typically evaluated without accounting for emotion as a source of representational variation. Prior work has largely treated emotion as a prediction target, for example in sentiment analysis or emotion classification. In contrast, we study emotion as a latent factor that shapes how models attend to and reason over text. We analyze how emotional tone systematically alters attention geometry in transformer models, showing that metrics such as locality, center-of-mass distance, and entropy vary across emotions and correlate with downstream question-answering performance. To facilitate controlled study of these effects, we introduce Affect-Uniform ReAding QA (AURA-QA), a question-answering dataset with emotionally balanced, human-authored context passages. Finally, an emotional regularization framework is proposed that constrains emotion-conditioned representational drift during training. Experiments across multiple QA benchmarks demonstrate that this approach improves reading comprehension on both emotionally-varying and non-emotionally-varying datasets, yielding consistent gains under distribution shift and in-domain improvements on several benchmarks.
【6】Wrong Code, Right Structure: Learning Netlist Representations from Imperfect LLM-Generated RTL
Link: https://arxiv.org/abs/2603.09161
Authors: Siyang Cai, Cangyuan Li, Yinhe Han, Ying Wang
Abstract: Learning effective netlist representations is fundamentally constrained by the scarcity of labeled datasets, as real designs are protected by Intellectual Property (IP) and costly to annotate. Existing work therefore focuses on small-scale circuits with clean labels, limiting scalability to realistic designs. Meanwhile, Large Language Models (LLMs) can generate Register-Transfer-Level (RTL) at scale, but their functional incorrectness has hindered their use in circuit analysis. In this work, we make a key observation: even when LLM-generated RTL is functionally imperfect, the synthesized netlists still preserve structural patterns that are strongly indicative of the intended functionality. Building on this insight, we propose a cost-effective data augmentation and training framework that systematically exploits imperfect LLM-generated RTL as training data for netlist representation learning, forming an end-to-end pipeline from automated code generation to downstream tasks. We conduct evaluations on circuit functional understanding tasks, including sub-circuit boundary identification and component classification, across benchmarks of increasing scales, extending the task scope from operator-level to IP-level. The evaluations demonstrate that models trained on our noisy synthetic corpus generalize well to real-world netlists, matching or even surpassing methods trained on scarce high-quality data and effectively breaking the data bottleneck in circuit representation learning.
【7】Not All News Is Equal: Topic- and Event-Conditional Sentiment from Finetuned LLMs for Aluminum Price Forecasting
Link: https://arxiv.org/abs/2603.09085
Authors: Alvaro Paredes Amorin, Andre Python, Christoph Weisser
Note: 8 pages
Abstract: By capturing the prevailing sentiment and market mood, textual data has become increasingly vital for forecasting commodity prices, particularly in metal markets. However, the effectiveness of lightweight, finetuned large language models (LLMs) in extracting predictive signals for aluminum prices, and the specific market conditions under which these signals are most informative, remains under-explored. This study generates monthly sentiment scores from English and Chinese news headlines (Reuters, Dow Jones Newswires, and China News Service) and integrates them with traditional tabular data, including base metal indices, exchange rates, inflation rates, and energy prices. We evaluate the predictive performance and economic utility of these models through long-short simulations on the Shanghai Metal Exchange from 2007 to 2024. Our results demonstrate that during periods of high volatility, Long Short-Term Memory (LSTM) models incorporating sentiment data from a finetuned Qwen3 model (Sharpe ratio 1.04) significantly outperform baseline models using tabular data alone (Sharpe ratio 0.23). Subsequent analysis elucidates the nuanced roles of news sources, topics, and event types in aluminum price forecasting.
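For readers unfamiliar with the headline metric, the Sharpe ratios quoted above come from long-short backtests. As a reference, here is the standard annualized Sharpe ratio over monthly strategy returns (risk-free rate taken as zero; the return series below is made up for illustration and has nothing to do with the paper's data):

```python
import math

def sharpe_ratio(returns, periods_per_year=12):
    """Annualized Sharpe ratio from per-period returns (risk-free rate = 0)."""
    mean = sum(returns) / len(returns)
    # Sample variance (n-1 denominator) of the per-period returns.
    var = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)

monthly_returns = [0.02, -0.01, 0.03, 0.01, -0.005, 0.015]
print(round(sharpe_ratio(monthly_returns), 2))  # -> 2.28
```

A higher ratio means more excess return per unit of volatility, which is why the sentiment-augmented model's 1.04 versus the tabular baseline's 0.23 is an economically meaningful gap.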
【8】Learning Adaptive LLM Decoding
Link: https://arxiv.org/abs/2603.09065
Authors: Chloe H. Su, Zhe Ye, Samuel Tenka, Aidan Yang, Soonho Kong, Udaya Ghai
Abstract: Decoding from large language models (LLMs) typically relies on fixed sampling hyperparameters (e.g., temperature, top-p), despite substantial variation in task difficulty and uncertainty across prompts and individual decoding steps. We propose to learn adaptive decoding policies that dynamically select sampling strategies at inference time, conditioned on available compute resources. Rather than fine-tuning the language model itself, we introduce lightweight decoding adapters trained with reinforcement learning and verifiable terminal rewards (e.g., correctness on math and coding tasks). At the sequence level, we frame decoding as a contextual bandit problem: a policy selects a decoding strategy (e.g., greedy, top-k, min-p) for each prompt, conditioned on the prompt embedding and a parallel sampling budget. At the token level, we model decoding as a partially observable Markov decision process (POMDP), where a policy selects sampling actions at each token step based on internal model features and the remaining token budget. Experiments on the MATH and CodeContests benchmarks show that the learned adapters improve the accuracy-budget tradeoff: on MATH, the token-level adapter improves Pass@1 accuracy by up to 10.2% over the best static baseline under a fixed token budget, while the sequence-level adapter yields 2-3% gains under fixed parallel sampling. Ablation analyses support the contribution of both sequence- and token-level adaptation.
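The sequence-level formulation is a standard contextual bandit, which can be sketched in a few lines: a policy picks one decoding configuration per prompt and is updated from a terminal 0/1 reward such as answer correctness. The epsilon-greedy update, the length-bucket "context", and the concrete strategy list below are illustrative assumptions, not the paper's learned adapter.

```python
import random

# Hypothetical arm set; the paper mentions greedy, top-k, and min-p.
STRATEGIES = [
    {"name": "greedy", "temperature": 0.0},
    {"name": "top-k", "temperature": 0.8, "top_k": 40},
    {"name": "min-p", "temperature": 1.0, "min_p": 0.05},
]

class DecodingBandit:
    """Epsilon-greedy contextual bandit over decoding strategies (sketch)."""

    def __init__(self, n_buckets=4, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.n_buckets = n_buckets
        # Per-context visit counts and running mean reward for each arm.
        self.counts = [[0] * len(STRATEGIES) for _ in range(n_buckets)]
        self.values = [[0.0] * len(STRATEGIES) for _ in range(n_buckets)]

    def context(self, prompt):
        # Crude stand-in for a prompt embedding: bucket by prompt length.
        return min(len(prompt) // 100, self.n_buckets - 1)

    def select(self, prompt):
        c = self.context(prompt)
        if self.rng.random() < self.epsilon:
            return c, self.rng.randrange(len(STRATEGIES))   # explore
        vals = self.values[c]
        return c, vals.index(max(vals))                     # exploit

    def update(self, c, arm, reward):
        self.counts[c][arm] += 1
        n = self.counts[c][arm]
        self.values[c][arm] += (reward - self.values[c][arm]) / n

bandit = DecodingBandit(epsilon=0.0)
ctx, arm = bandit.select("Prove that the sum of two odd integers is even.")
# ... sample with STRATEGIES[arm], verify the answer, then:
bandit.update(ctx, arm, reward=1.0)
```

The paper's actual adapter conditions on prompt embeddings and the sampling budget and is trained with RL; this sketch only shows the bandit interface.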
【9】FlexServe: A Fast and Secure LLM Serving System for Mobile Devices with Flexible Resource Isolation
Link: https://arxiv.org/abs/2603.09046
Authors: Yinpeng Wu, Yitong Chen, Lixiang Wang, Jinyu Gu, Zhichao Hua, Yubin Xia
Note: 13 pages, 11 figures
Abstract: Device-side Large Language Models (LLMs) have witnessed explosive growth, offering higher privacy and availability compared to cloud-side LLMs. During LLM inference, both model weights and user data are valuable, and attackers may even compromise the OS kernel to steal them. ARM TrustZone is the de facto hardware-based isolation technology on mobile devices, used to protect sensitive applications from a compromised OS. However, protecting LLM inference with TrustZone incurs significant overhead due to its inflexible isolation of memory and the NPU. To address these challenges, this paper introduces FlexServe, a fast and secure LLM serving system for mobile devices. It first introduces a Flexible Resource Isolation mechanism to construct Flexible Secure Memory (Flex-Mem) and Flexible Secure NPU (Flex-NPU). Both memory pages and the NPU can be efficiently switched between unprotected and protected modes. Based on these mechanisms, FlexServe designs a fast and secure LLM inference framework within TrustZone's secure world. LLM-Aware Memory Management and a Secure Inference Pipeline are introduced to accelerate inference, and a Multi-Model Scheduler is proposed to optimize multi-model workflows. We implement a prototype of FlexServe and compare it with two TrustZone-based strawman designs. The results show that FlexServe achieves an average 10.05x speedup in Time to First Token (TTFT) compared to the strawman, and an average 2.44x TTFT speedup compared to an optimized strawman with pipelining and the secure NPU enabled. For multi-model agent workflows, the end-to-end speedup is up to 24.30x and 4.05x compared to the strawman and optimized strawman, respectively.
【10】SCALAR: Learning and Composing Skills through LLM Guided Symbolic Planning and Deep RL Grounding
Link: https://arxiv.org/abs/2603.09036
Authors: Renos Zabounidis, Yue Wu, Simon Stepputtis, Woojun Kim, Yuanzhi Li, Tom Mitchell, Katia Sycara
Note: Best Paper Award Honorable Mention at NeurIPS 2025 Workshop on Bridging Language, Agent, and World Models for Reasoning and Planning
Abstract: LLM-based agents excel when given high-level action APIs but struggle to ground language into low-level control. Prior work has LLMs generate skills or reward functions for RL, but these one-shot approaches lack feedback to correct specification errors. We introduce SCALAR, a bidirectional framework coupling LLM planning with RL through a learned skill library. The LLM proposes skills with preconditions and effects; RL trains policies for each skill and feeds back execution results to iteratively refine specifications, improving robustness to initial errors. Pivotal Trajectory Analysis corrects LLM priors by analyzing RL trajectories; Frontier Checkpointing optionally saves environment states at skill boundaries to improve sample efficiency. On Craftax, SCALAR achieves 88.2% diamond collection, a 1.9x improvement over the best baseline, and reaches the Gnomish Mines 9.1% of the time where prior methods fail entirely.
【11】A Consensus-Driven Multi-LLM Pipeline for Missing-Person Investigations
Link: https://arxiv.org/abs/2603.08954
Authors: Joshua Castillo, Ravi Mukkamala
Note: Accepted to CAC: Applied Computing & Automation Conferences 2026. 16 pages, 6 figures
Abstract: The first 72 hours of a missing-person investigation are critical for successful recovery. Guardian is an end-to-end system designed to support missing-child investigation and early search planning. This paper presents the Guardian LLM Pipeline, a multi-model system in which LLMs are used for intelligent information extraction and processing related to missing-person search operations. The pipeline coordinates end-to-end execution across task-specialized LLMs and invokes a consensus LLM engine that compares multiple model outputs and resolves disagreements. The pipeline is further strengthened by QLoRA-based fine-tuning using curated datasets. The presented design aligns with prior work on weak supervision and LLM-assisted annotation, emphasizing conservative, auditable use of LLMs as structured extractors and labelers rather than unconstrained end-to-end decision makers.
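A consensus engine over structured extractions can be sketched very simply: accept a field only when a majority of models agree on its value, and surface disagreements for review rather than resolving them silently. The field names and plain majority rule below are illustrative assumptions; the paper's engine compares outputs with an LLM.

```python
from collections import Counter

def consensus(extractions, min_agreement=2):
    """extractions: list of field dicts, one per model.
    Returns (accepted, disputed): majority-backed fields vs. vote tallies
    for fields that failed to reach min_agreement."""
    accepted, disputed = {}, {}
    keys = set().union(*extractions)
    for key in keys:
        votes = Counter(e.get(key) for e in extractions if key in e)
        value, count = votes.most_common(1)[0]
        if count >= min_agreement:
            accepted[key] = value
        else:
            disputed[key] = dict(votes)   # escalate for human review
    return accepted, disputed

models_out = [
    {"last_seen": "Main St", "age": 9},   # hypothetical extractor outputs
    {"last_seen": "Main St", "age": 9},
    {"last_seen": "Oak Ave", "age": 8},
]
accepted, disputed = consensus(models_out)
print(accepted)   # both fields have a 2-of-3 majority, so nothing is disputed
```

Keeping the disputed dictionary around is what makes this style of pipeline auditable: every disagreement is preserved rather than overwritten.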
【12】Interpretable Markov-Based Spatiotemporal Risk Surfaces for Missing-Child Search Planning with Reinforcement Learning and LLM-Based Quality Assurance
Link: https://arxiv.org/abs/2603.08933
Authors: Joshua Castillo, Ravi Mukkamala
Note: 14 pages, 7 figures. Accepted at ICEIS 2026 (International Conference on Enterprise Information Systems)
Abstract: The first 72 hours of a missing-child investigation are critical for successful recovery. However, law enforcement agencies often face fragmented, unstructured data and a lack of dynamic, geospatial predictive tools. Our system, Guardian, provides an end-to-end decision-support system for missing-child investigation and early search planning. It converts heterogeneous, unstructured case documents into a schema-aligned spatiotemporal representation, enriches cases with geocoding and transportation context, and provides probabilistic search products spanning 0-72 hours. In this paper, we present an overview of Guardian as well as a detailed description of a three-layer predictive component of the system. The first layer is a Markov chain, a sparse, interpretable model with transitions incorporating road accessibility costs, seclusion preferences, and corridor bias with separate day/night parameterizations. The Markov chain's output prediction distributions are then transformed into operationally useful search plans by the second layer's reinforcement learning. Finally, the third layer's LLM performs post hoc validation of layer-2 search plans prior to their release. Using a synthetic but realistic case study, we report quantitative outputs across 24/48/72-hour horizons and analyze sensitivity, failure modes, and tradeoffs. Results show that the proposed predictive system with the three-layer architecture produces interpretable priors for zone optimization and human review.
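The layer-1 idea is concrete enough to sketch: a sparse Markov chain over search zones, with a different transition matrix for day and night, propagated from the last-known position to produce a probability surface at each horizon. The toy zone graph and stay probabilities below are invented; the paper's transitions also fold in road-accessibility costs, seclusion preference, and corridor bias.

```python
import numpy as np

def build_transitions(adjacency, stay_prob):
    """Row-stochastic transition matrix: stay in a zone with stay_prob,
    otherwise move uniformly to one of its adjacent zones."""
    n = len(adjacency)
    P = np.zeros((n, n))
    for i, neighbors in enumerate(adjacency):
        P[i, i] = stay_prob
        for j in neighbors:
            P[i, j] = (1.0 - stay_prob) / len(neighbors)
    return P

adjacency = [[1], [0, 2], [1]]                          # three zones in a line
P_day = build_transitions(adjacency, stay_prob=0.5)     # more movement by day
P_night = build_transitions(adjacency, stay_prob=0.9)   # mostly stationary at night

p = np.array([1.0, 0.0, 0.0])                           # last-known position: zone 0
for hour in range(24):
    p = p @ (P_day if 6 <= hour < 18 else P_night)      # day/night parameterization
print(p.round(3))                                       # probability surface after 24 h
```

Layer 2 would then turn such surfaces into concrete zone-by-zone search plans, and layer 3 would validate those plans before release.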
【13】Vision-Language Models Encode Clinical Guidelines for Concept-Based Medical Reasoning
Link: https://arxiv.org/abs/2603.08921
Authors: Mohamed Harmanani, Bining Long, Zhuoxin Guo, Paul F. R. Wilson, Amirhossein Sabour, Minh Nguyen Nhat To, Gabor Fichtinger, Purang Abolmaesumi, Parvin Mousavi
Note: CVPR 2026 Findings
Abstract: Concept Bottleneck Models (CBMs) are a prominent framework for interpretable AI that map learned visual features to a set of meaningful concepts for task-specific downstream predictions. Their sequential structure enhances transparency by connecting model predictions to the underlying concepts that support them. In medical imaging, where transparency is essential, CBMs offer an appealing foundation for explainable model design. However, discrete concept representations often overlook broader clinical context such as diagnostic guidelines and expert heuristics, reducing reliability in complex cases. We propose MedCBR, a concept-based reasoning framework that integrates clinical guidelines with vision-language and reasoning models. Labeled clinical descriptors are transformed into guideline-conformant text, and a concept-based model is trained with a multitask objective combining multimodal contrastive alignment, concept supervision, and diagnostic classification to jointly ground image features, concepts, and pathology. A reasoning model then converts these predictions into structured clinical narratives that explain the diagnosis, emulating expert reasoning based on established guidelines. MedCBR achieves superior diagnostic and concept-level performance, with AUROCs of 94.2% on ultrasound and 84.0% on mammography. Further experiments on non-medical datasets achieve 86.1% accuracy. Our framework enhances interpretability and forms an end-to-end bridge from medical image analysis to decision-making.
【14】Quantifying Memorization and Privacy Risks in Genomic Language Models
Link: https://arxiv.org/abs/2603.08913
Authors: Alexander Nemecek, Wenbiao Li, Xiaoqian Jiang, Jaideep Vaidya, Erman Ayday
Note: 13 pages
Abstract: Genomic language models (GLMs) have emerged as powerful tools for learning representations of DNA sequences, enabling advances in variant prediction, regulatory element identification, and cross-task transfer learning. However, as these models are increasingly trained or fine-tuned on sensitive genomic cohorts, they risk memorizing specific sequences from their training data, raising serious concerns around privacy, data leakage, and regulatory compliance. Despite growing awareness of memorization risks in general-purpose language models, little systematic evaluation exists for these risks in the genomic domain, where data exhibit unique properties such as a fixed nucleotide alphabet, strong biological structure, and individual identifiability. We present a comprehensive, multi-vector privacy evaluation framework designed to quantify memorization risks in GLMs. Our approach integrates three complementary risk assessment methodologies: perplexity-based detection, canary sequence extraction, and membership inference. These are combined into a unified evaluation pipeline that produces a worst-case memorization risk score. To enable controlled evaluation, we plant canary sequences at varying repetition rates into both synthetic and real genomic datasets, allowing precise quantification of how repetition and training dynamics influence memorization. We evaluate our framework across multiple GLM architectures, examining the relationship between sequence repetition, model capacity, and memorization risk. Our results establish that GLMs exhibit measurable memorization and that the degree of memorization varies across architectures and training regimes. These findings reveal that no single attack vector captures the full scope of memorization risk, underscoring the need for multi-vector privacy auditing as a standard practice for genomic AI systems.
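The first of the three attack vectors, perplexity-based detection, is easy to illustrate: sequences the model assigns unusually low perplexity relative to a reference set of held-out sequences are flagged as likely memorized. The z-score threshold and the toy numbers below are assumptions for illustration, not the paper's calibration.

```python
import math

def perplexity(log_probs):
    """Per-token log-probabilities -> sequence perplexity."""
    return math.exp(-sum(log_probs) / len(log_probs))

def flag_memorized(candidate_ppls, reference_ppls, z_threshold=-2.0):
    """Flag candidate perplexities that are anomalously low relative to the
    reference distribution of unseen-sequence perplexities."""
    mean = sum(reference_ppls) / len(reference_ppls)
    var = sum((x - mean) ** 2 for x in reference_ppls) / len(reference_ppls)
    std = math.sqrt(var) or 1.0
    return [ppl for ppl in candidate_ppls if (ppl - mean) / std < z_threshold]

# Reference: perplexities of held-out sequences the model never saw (toy values).
reference = [12.0, 11.5, 13.2, 12.8, 11.9]
# Canaries planted at high repetition rates tend to score far lower.
candidates = [12.1, 3.4, 11.8]
print(flag_memorized(candidates, reference))  # -> [3.4]
```

Canary extraction and membership inference then cross-check this signal, which is why the paper argues for combining all three into a worst-case score.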
【15】APPLV: Adaptive Planner Parameter Learning from Vision-Language-Action Model
Link: https://arxiv.org/abs/2603.08862
Authors: Yuanjie Lu, Beichen Wang, Zhengqi Wu, Yang Li, Xiaomin Lin, Chengzhi Mao, Xuesu Xiao
Abstract: Autonomous navigation in highly constrained environments remains challenging for mobile robots. Classical navigation approaches offer safety assurances but require environment-specific parameter tuning; end-to-end learning bypasses parameter tuning but struggles with precise control in constrained spaces. To this end, recent robot learning approaches automate parameter tuning while retaining classical systems' safety, yet still face challenges in generalizing to unseen environments. Recently, Vision-Language-Action (VLA) models have shown promise by leveraging foundation models' scene understanding capabilities, but still struggle with precise control and inference latency in navigation tasks. In this paper, we propose Adaptive Planner Parameter Learning from Vision-Language-Action Model (APPLV). Unlike traditional VLA models that directly output actions, APPLV leverages pre-trained vision-language models with a regression head to predict planner parameters that configure classical planners. We develop two training strategies: supervised learning fine-tuning from collected navigation trajectories and reinforcement learning fine-tuning to further optimize navigation performance. We evaluate APPLV across multiple motion planners on the simulated Benchmark Autonomous Robot Navigation (BARN) dataset and in physical robot experiments. Results demonstrate that APPLV outperforms existing methods in both navigation performance and generalization to unseen environments.
【16】Hindsight Credit Assignment for Long-Horizon LLM Agents
Link: https://arxiv.org/abs/2603.08754
Authors: Hui-Ze Tan, Xiao-Wen Yang, Hao Chen, Jie-Jing Shao, Yi Wen, Yuteng Shen, Weihong Luo, Xiku Du, Lan-Zhe Guo, Yu-Feng Li
Abstract: Large Language Model (LLM) agents often face significant credit assignment challenges in long-horizon, multi-step tasks due to sparse rewards. Existing value-free methods, such as Group Relative Policy Optimization (GRPO), encounter two fundamental bottlenecks: inaccurate step-level Q-value estimation and misaligned value baselines for intermediate states. To address these limitations, we introduce HCAPO, the first framework to integrate hindsight credit assignment into LLM agents. HCAPO leverages the LLM itself as a post-hoc critic to refine step-level Q-values through hindsight reasoning. Furthermore, HCAPO's multi-scale advantage mechanism effectively supplements the inaccurate value baselines at critical decision states. Evaluations across three challenging benchmarks, including WebShop and ALFWorld, demonstrate that HCAPO consistently outperforms state-of-the-art RL methods. Notably, HCAPO achieves a 7.7% improvement in success rate on WebShop and a 13.8% improvement on ALFWorld over GRPO with the Qwen2.5-7B-Instruct model. These results indicate that HCAPO significantly enhances exploration efficiency, promotes concise decision-making, and ensures scalability in complex, long-horizon tasks.
【17】Skip to the Good Part: Representation Structure & Inference-Time Layer Skipping in Diffusion vs. Autoregressive LLMs
标题:跳到精彩部分:扩散与自回归LLM中的表示结构与推理时层跳过
链接:https://arxiv.org/abs/2603.07475
作者:Raghavv Goel,Risheek Garrepalli,Sudhanshu Agrawal,Chris Lott,Mingu Lee,Fatih Porikli
备注:Accepted at Sci4DL and Delta workshops at ICLR 2026
摘要:自回归(AR)语言模型通过从左到右的预测逐渐形成表示,而扩散语言模型(dLLM)则通过全序列去噪进行训练。尽管最近的dLLM与AR性能相匹配,但仍不清楚扩散目标是否从根本上重塑了整个深度的内部表示。我们首次进行了逐层、逐词元的表示分析,比较了原生dLLM(LLaDA)、原生AR模型(Qwen2.5)和AR初始化的dLLM(Dream-7B)。我们发现,扩散目标导致不同的、更具层次性的抽象,具有大量的早期层冗余和更弱的近因偏差,而AR目标产生紧密耦合、依赖深度的表示。至关重要的是,尽管经过扩散训练,AR初始化的dLLM仍保留类AR的表示动态,揭示了持久的初始化偏差。利用这种观察到的表示冗余,我们引入了一种静态的、任务无关的推理时层跳过方法,不需要架构改动或KV缓存共享。原生dLLM实现了高达18.75%的FLOP减少,同时在推理和代码生成基准上保持了90%以上的性能,而AR模型在类似的跳过下性能急剧下降。这些结果将训练目标与表示结构联系起来,并实现了实用的、与缓存正交的效率增益。
摘要:Autoregressive (AR) language models form representations incrementally through left-to-right prediction, whereas diffusion language models (dLLMs) are trained via full-sequence denoising. Although recent dLLMs match AR performance, it remains unclear whether diffusion objectives fundamentally reshape internal representations across depth. We perform the first layer- and token-wise representational analysis comparing native dLLMs (LLaDA), native AR models (Qwen2.5), and AR-initialized dLLMs (Dream-7B). We find that diffusion objectives result in different, more hierarchical abstractions with substantial early-layer redundancy and reduced recency bias, while AR objectives produce tightly coupled, depth-dependent representations. Critically, AR-initialized dLLMs retain AR-like representational dynamics despite diffusion training, revealing persistent initialization bias. Leveraging this observed representational redundancy, we introduce a static, task-agnostic inference-time layer-skipping method requiring no architectural changes or KV-cache sharing. Native dLLMs achieve up to 18.75% FLOPs reduction while preserving over 90% performance on reasoning and code generation benchmarks, whereas AR models degrade sharply under comparable skipping. These results link training objectives to representational structure and enable practical, cache-orthogonal efficiency gains.
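The static, task-agnostic layer skipping described above can be illustrated with a toy residual stack: a fixed set of layer indices is dropped at inference, and the residual path carries the representation through unchanged. This is a generic sketch with hypothetical scalar "layers", not the authors' implementation.

```python
import math
import random

class TinyStack:
    """Toy residual stack illustrating static inference-time layer skipping.
    Skipped layers act as the identity because the residual connection
    passes the representation through unchanged."""
    def __init__(self, depth=8, seed=0):
        rng = random.Random(seed)
        self.weights = [rng.uniform(-0.5, 0.5) for _ in range(depth)]

    def forward(self, x, skip=frozenset()):
        for i, w in enumerate(self.weights):
            if i in skip:             # skipped layer: identity via residual path
                continue
            x = x + math.tanh(w * x)  # residual update
        return x

model = TinyStack()
full = model.forward(1.0)                     # all 8 layers
pruned = model.forward(1.0, skip={2, 3, 4})   # drop hypothetical redundant layers
```

The skip set is chosen offline (e.g., from a redundancy analysis like the one in the paper) and then reused for every input, which is what makes the method static and task-agnostic.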
Graph相关(图学习|图神经网络|图优化等)(5篇)
【1】A Graph-Based Approach to Spectrum Demand Prediction Using Hierarchical Attention Networks
标题:使用分层注意力网络的基于图的频谱需求预测方法
链接:https://arxiv.org/abs/2603.09859
作者:Mohamad Alkadamani,Halim Yanikomeroglu,Amir Ghasemi
备注:7 pages, 6 figures. Presented at IEEE GLOBECOM 2025, Taiwan. To appear in the conference proceedings
摘要:无线连接需求的激增,加上频谱资源的有限性,迫使开发有效的频谱管理方法。频谱共享是一个很有前途的途径,尽管它需要准确描述频谱需求,以便做出明智的决策。本文介绍了HR-GAT,层次分辨率图注意力网络模型,旨在预测频谱需求使用地理空间数据。HR-GAT能够熟练地处理复杂的空间需求模式,并解决通常会挑战标准机器学习模型的空间自相关问题,这些问题通常会导致泛化能力差。HR-GAT在加拿大五个主要城市进行了测试,与八个基线模型相比,其频谱需求预测准确度提高了21%,突出了其卓越的性能和可靠性。
摘要:The surge in wireless connectivity demand, coupled with the finite nature of spectrum resources, compels the development of efficient spectrum management approaches. Spectrum sharing presents a promising avenue, although it demands precise characterization of spectrum demand for informed policy-making. This paper introduces HR-GAT, a hierarchical resolution graph attention network model, designed to predict spectrum demand using geospatial data. HR-GAT adeptly handles complex spatial demand patterns and resolves issues of spatial autocorrelation that usually challenge standard machine learning models, often resulting in poor generalization. Tested across five major Canadian cities, HR-GAT improves predictive accuracy of spectrum demand by 21% over eight baseline models, underscoring its superior performance and reliability.
【2】TA-GGAD: Testing-time Adaptive Graph Model for Generalist Graph Anomaly Detection
标题:TA-GGAD:用于通用图异常检测的测试时自适应图模型
链接:https://arxiv.org/abs/2603.09349
作者:Xiong Zhang,Hong Peng,Changlong Fu,Xin Jin,Yun Yang,Cheng Xie
摘要:现实世界中大量的异常节点,如假新闻、不合规用户、恶意交易和恶意帖子,严重损害了图数据生态系统的健康,迫切需要有效的识别和处理。由于异常跨越多个数据域,但在特征上存在巨大差异,跨域检测模型面临严重的域偏移问题,这限制了它们在所有域中的通用性。本研究识别并定量分析了图异常检测中域偏移所表现出的一种特定特征失配模式,我们将其定义为异常非同配性(Anomaly Disassortativity,$\mathcal{AD}$)问题。在对$\mathcal{AD}$问题建模的基础上,提出了一种新的用于异常检测的图基础模型。它在不同的图中实现了跨域泛化,只需要一个训练阶段就可以在不同的域中有效地执行。基于14个不同的真实世界图的实验结果证实了该模型在跨域适应方面的突破,在检测准确性方面达到了开创性的最先进水平(SOTA)。综上所述,本文提出的$\mathcal{AD}$理论为泛用图异常检测(GGAD)的进一步研究提供了一个新的理论视角和实践途径。该代码可从https://anonymous.4open.science/r/Anonymization-TA-GGAD/获得。
摘要:A significant number of anomalous nodes in the real world, such as fake news, noncompliant users, malicious transactions, and malicious posts, severely compromises the health of the graph data ecosystem and urgently requires effective identification and processing. With anomalies that span multiple data domains yet exhibit vast differences in features, cross-domain detection models face severe domain shift issues, which limit their generalizability across all domains. This study identifies and quantitatively analyzes a specific feature mismatch pattern exhibited by domain shift in graph anomaly detection, which we define as the \emph{Anomaly Disassortativity} issue ($\mathcal{AD}$). Based on the modeling of the issue $\mathcal{AD}$, we introduce a novel graph foundation model for anomaly detection. It achieves cross-domain generalization in different graphs, requiring only a single training phase to perform effectively across diverse domains. The experimental findings, based on fourteen diverse real-world graphs, confirm a breakthrough in the model's cross-domain adaptation, achieving a pioneering state-of-the-art (SOTA) level in terms of detection accuracy. In summary, the proposed theory of $\mathcal{AD}$ provides a novel theoretical perspective and a practical route for future research in generalist graph anomaly detection (GGAD). The code is available at https://anonymous.4open.science/r/Anonymization-TA-GGAD/.
【3】Transductive Generalization via Optimal Transport and Its Application to Graph Node Classification
标题:基于最优传输的直推式泛化及其在图节点分类中的应用
链接:https://arxiv.org/abs/2603.09257
作者:MoonJeong Park,Seungbeom Lee,Kyungmin Kim,Jaeseung Heo,Seunghyuk Cho,Shouheng Li,Sangdon Park,Dongwoo Kim
摘要:许多现有的直推式泛化界依赖于经典的复杂度度量,这些度量在计算上难以处理,且往往与经验行为不一致。在这项工作中,我们在无分布的直推式设定下建立了新的基于表示的泛化界,其中学习到的表示相互依赖,测试特征在训练期间可以访问。我们通过最优传输推导出全局和按类的界,以编码特征分布之间的Wasserstein距离表示。我们证明了这些界可以高效计算,并与图节点分类中的经验泛化强相关,优于经典的复杂度度量。此外,我们的分析揭示了GNN聚合过程如何变换表示分布,从而在类内集中与类间分离之间引入权衡。这产生了依赖深度的刻画,捕捉了实践中观察到的深度与泛化误差之间的非单调关系。该代码可在https://github.com/ml-postech/Transductive-OT-Gen-Bound上获得。
摘要:Many existing transductive bounds rely on classical complexity measures that are computationally intractable and often misaligned with empirical behavior. In this work, we establish new representation-based generalization bounds in a distribution-free transductive setting, where learned representations are dependent, and test features are accessible during training. We derive global and class-wise bounds via optimal transport, expressed in terms of Wasserstein distances between encoded feature distributions. We demonstrate that our bounds are efficiently computable and strongly correlate with empirical generalization in graph node classification, improving upon classical complexity measures. Additionally, our analysis reveals how the GNN aggregation process transforms the representation distributions, inducing a trade-off between intra-class concentration and inter-class separation. This yields depth-dependent characterizations that capture the non-monotonic relationship between depth and generalization error observed in practice. The code is available at https://github.com/ml-postech/Transductive-OT-Gen-Bound.
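The key quantity in these bounds, the Wasserstein distance between encoded feature distributions, is cheap to compute for 1-D empirical samples: sort both samples and average the absolute differences. A minimal sketch with hypothetical Gaussian features (not the paper's data or its full multivariate setting):

```python
import random

def w1_empirical(a, b):
    """1-D empirical Wasserstein-1 distance between two equal-size samples:
    the mean absolute difference of the sorted values (quantile coupling)."""
    assert len(a) == len(b)
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

rng = random.Random(1)
# Hypothetical 1-D projections of encoded features for two classes.
class_a = [rng.gauss(0.0, 1.0) for _ in range(2000)]
class_b = [rng.gauss(2.0, 1.0) for _ in range(2000)]
separation = w1_empirical(class_a, class_b)  # inter-class distance, near the mean gap of 2
```

Intuitively, larger inter-class Wasserstein separation and smaller intra-class spread correspond to the concentration/separation trade-off the abstract attributes to GNN aggregation.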
【4】Are Expressive Encoders Necessary for Discrete Graph Generation?
标题:离散图生成需要表达编码器吗?
链接:https://arxiv.org/abs/2603.08825
作者:Jay Revolinsky,Harry Shomer,Jiliang Tang
备注:25 pages, 15 figures, 10 tables
摘要:离散图生成已成为图数据建模的一种强大范式,通常依赖于高表达力的神经骨干,如Transformer或高阶架构。我们通过引入GenGNN(一个用于图生成的模块化消息传递框架)来重新审视这一设计选择。使用GenGNN的扩散模型在Tree和Planar数据集上实现了超过90%的有效性,与图Transformer的差距很小,而推理速度快2-5倍。对于分子生成,采用GenGNN骨干的DiGress实现了99.49%的有效性。系统的消融研究展示了每个GenGNN组件带来的收益,并表明需要残差连接来缓解复杂图结构上的过度平滑。通过规模分析,我们应用有原则的度量空间视角来研究学习到的扩散表示,并揭示GNN能否成为离散扩散的高表达力神经骨干。
摘要:Discrete graph generation has emerged as a powerful paradigm for modeling graph data, often relying on highly expressive neural backbones such as transformers or higher-order architectures. We revisit this design choice by introducing GenGNN, a modular message-passing framework for graph generation. Diffusion models with GenGNN achieve more than 90% validity on Tree and Planar datasets, within margins of graph transformers, at 2-5x faster inference speed. For molecule generation, DiGress with a GenGNN backbone achieves 99.49% Validity. A systematic ablation study shows the benefit provided by each GenGNN component, indicating the need for residual connections to mitigate oversmoothing on complicated graph-structure. Through scaling analyses, we apply a principled metric-space view to investigate learned diffusion representations and uncover whether GNNs can be expressive neural backbones for discrete diffusion.
【5】a-TMFG: Scalable Triangulated Maximally Filtered Graphs via Approximate Nearest Neighbors
标题:a-TMFG:通过近似最近邻实现可扩展的三角化最大过滤图
链接:https://arxiv.org/abs/2603.09564
作者:Lionel Yelibi
摘要:传统的三角化最大过滤图(TMFG)构造需要预先计算并存储稠密的相关矩阵,这限制了其在中小规模数据集上的适用性。在这里,我们指出了大规模使用TMFG时面临的关键内存和运行时复杂度挑战,随后提出了近似三角化最大过滤图(a-TMFG)算法。这是一种受TMFG启发、从数据中可扩展地构建人工图的新方法。该方法采用k最近邻图(kNNG)进行初始构建,并实现了一种内存管理策略,以便即时搜索和估计缺失的相关性,从而提供能够控制组合爆炸的表示。我们测试了该算法对参数和噪声的鲁棒性,并在包含数百万观测的数据集上进行了评估。这种新方法为以下用例提供了一种简洁的建图方式:图被用作监督和无监督学习的输入,但并不存在天然的图结构。
摘要:The traditional Triangular Maximally Filtered Graph (TMFG) construction requires pre-computation and storage of a dense correlation matrix; this limits its applicability to small and medium-sized datasets. Here we identify key memory and runtime complexity challenges when using TMFG at scale. We then present the Approximate Triangular Maximally Filtered Graph (a-TMFG) algorithm. This is a novel approach to scaling the construction of artificial graphs from data inspired by TMFG. The method employs k-Nearest Neighbors Graphs (kNNG) for initial construction, and implements a memory management strategy to search and estimate missing correlations on-the-fly. This provides representations to control combinatorial explosion. The algorithm is tested for robustness to the parameters and noise, and is evaluated on datasets with millions of observations. This new method provides a parsimonious way to construct graphs for use-cases where graphs are used as input to supervised and unsupervised learning but where no natural graph exists.
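The on-the-fly correlation idea behind a-TMFG, computing only the entries a nearest-neighbour search actually needs instead of materializing a dense matrix, can be sketched as follows. The exact search below is a naive stand-in for the approximate nearest-neighbour index the paper uses, and the data is synthetic:

```python
import math
import random

def corr(a, b):
    """Pearson correlation of two equal-length sequences, computed on
    demand rather than stored in a dense N x N matrix."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den

def knn_corr_graph(columns, k=1):
    """Naive k-nearest-neighbour graph under correlation similarity."""
    n = len(columns)
    graph = {}
    for i in range(n):
        sims = sorted(((corr(columns[i], columns[j]), j)
                       for j in range(n) if j != i), reverse=True)
        graph[i] = [j for _, j in sims[:k]]
    return graph

rng = random.Random(0)
base = [rng.gauss(0, 1) for _ in range(300)]
cols = [base,
        [x + 0.01 * rng.gauss(0, 1) for x in base],  # near-copy of column 0
        [rng.gauss(0, 1) for _ in range(300)]]       # independent column
graph = knn_corr_graph(cols, k=1)
```

Replacing the inner exact loop with an approximate index is what removes both the O(N^2) memory and the O(N^2) correlation computations at scale.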
Transformer(6篇)
【1】Correction of Transformer-Based Models with Smoothing Pseudo-Projector
标题:用平滑伪投影器修正基于变换器的模型
链接:https://arxiv.org/abs/2603.09815
作者:Vitaly Bulgakov
备注:29 pages, 23 figures
摘要:伪投影器是一种轻量级的修改,可以集成到现有的语言模型和其他神经网络中,而无需改变其核心架构。它可以被视为一种隐藏表示校正器,通过抑制由与标签无关的输入内容引起的方向来降低对噪声的敏感性。其设计灵感来自多重网格(MG)范式,该范式最初用于加速偏微分方程和边值问题迭代求解器的收敛,后来通过代数多重网格方法扩展到更一般的线性系统。我们将该方法称为伪投影器,因为其线性原型对应于一个严格幂等的正交投影算子,而实际的表述采用可学习的限制和延拓算子,因此一般情况下并不满足精确正交投影的性质。我们在基于Transformer的文本分类任务以及受控合成基准上评估了所提出的方法,证明了其在改善训练动态和鲁棒性方面的有效性。实验结果以及支持性的理论分析表明,在一系列设置中训练行为得到一致改善,且未观察到其他不利影响。我们的下一步是将这种方法扩展到语言模型。
摘要:The pseudo-projector is a lightweight modification that can be integrated into existing language models and other neural networks without altering their core architecture. It can be viewed as a hidden-representation corrector that reduces sensitivity to noise by suppressing directions induced by label-irrelevant input content. The design is inspired by the multigrid (MG) paradigm, originally developed to accelerate the convergence of iterative solvers for partial differential equations and boundary value problems, and later extended to more general linear systems through algebraic multigrid methods. We refer to the method as a pseudo-projector because its linear prototype corresponds to a strictly idempotent orthogonal projector, whereas the practical formulation employs learnable restriction and prolongation operators and therefore does not, in general, satisfy the properties of an exact orthogonal projection. We evaluate the proposed approach on transformer-based text classification tasks, as well as controlled synthetic benchmarks, demonstrating its effectiveness in improving training dynamics and robustness. Experimental results, together with supporting theoretical heuristics, indicate consistent improvements in training behavior across a range of settings, with no adverse effects observed otherwise. Our next step will be to extend this approach to language models.
【2】An Optimal Control Approach To Transformer Training
标题:Transformer训练的最优控制方法
链接:https://arxiv.org/abs/2603.09571
作者:Kağan Akman,Naci Saldı,Serdar Yüksel
摘要:在本文中,我们开发了一种严格的最优控制理论方法来训练Transformer,该方法遵循关键的结构约束,例如(i)执行过程中的实现输入独立性,(ii)问题的系综控制性质,以及(iii)位置依赖性。我们将Transformer架构建模为具有共享动作的离散时间受控粒子系统,呈现无噪声的McKean-Vlasov动力学。虽然由此产生的动力学不是马尔可夫的,但我们表明,将其提升到概率测度空间会产生一个完全可观测的马尔可夫决策过程(MDP)。位置编码被纳入状态空间,以在提升下保持序列顺序。利用动态规划原理,我们在温和的紧致性假设下建立了全局最优策略的存在性。我们进一步证明,提升后的闭环策略等价于依赖初始分布的开环策略,后者与实现输入无关,并与标准Transformer训练兼容。为了训练Transformer,我们为提升的MDP提出了一种三重量化的训练过程,对状态空间、概率测度空间和动作空间进行量化,并证明三重量化模型的任何最优策略对于原始训练问题都是近似最优的。最后,我们通过证明价值函数对初始经验测度的扰动连续,以及策略随数据规模增加而收敛,建立了提升模型的稳定性和经验一致性。这种方法为基于梯度的训练提供了一种全局最优且鲁棒的替代方案,不需要光滑性或凸性。
摘要:In this paper, we develop a rigorous optimal control-theoretic approach to Transformer training that respects key structural constraints such as (i) realized-input-independence during execution, (ii) the ensemble control nature of the problem, and (iii) positional dependence. We model the Transformer architecture as a discrete-time controlled particle system with shared actions, exhibiting noise-free McKean-Vlasov dynamics. While the resulting dynamics is not Markovian, we show that lifting it to probability measures produces a fully-observed Markov decision process (MDP). Positional encodings are incorporated into the state space to preserve the sequence order under lifting. Using the dynamic programming principle, we establish the existence of globally optimal policies under mild assumptions of compactness. We further prove that closed-loop policies in the lifted MDP are equivalent to initial-distribution-dependent open-loop policies, which are realized-input-independent and compatible with standard Transformer training. To train a Transformer, we propose a triply quantized training procedure for the lifted MDP by quantizing the state space, the space of probability measures, and the action space, and show that any optimal policy for the triply quantized model is near-optimal for the original training problem. Finally, we establish stability and empirical consistency properties of the lifted model by showing that the value function is continuous with respect to the perturbations of the initial empirical measures and convergence of policies as the data size increases. This approach provides a globally optimal and robust alternative to gradient-based training without requiring smoothness or convexity.
【3】TrainDeeploy: Hardware-Accelerated Parameter-Efficient Fine-Tuning of Small Transformer Models at the Extreme Edge
标题:TrainDeeploy:极端边缘端小型Transformer模型的硬件加速参数高效微调
链接:https://arxiv.org/abs/2603.09511
作者:Run Wang,Victor J. B. Jung,Philip Wiese,Francesco Conti,Alessio Burrello,Luca Benini
备注:Accepted at DATE 2026 (Design, Automation and Test in Europe). 7 pages, 6 figures
摘要:深度神经网络的设备上调整可以在边缘实现长期适应,同时保护数据隐私。然而,反向传播的高计算和存储器需求对超低功耗、存储器受限的极端边缘设备提出了重大挑战。由于基于注意力的模型的架构复杂性和计算规模,这些挑战进一步放大。我们提出了TrainDeeploy,这是一个框架,它在异构超低功耗片上系统(SoC)上统一了高效的推理和设备上的训练。TrainDeeploy为支持卷积神经网络(CNN)和Transformer模型的极端边缘SoC提供了第一个完整的设备上训练管道,以及多种训练策略,如选择性分层微调和低秩自适应(LoRA)。在基于RISC-V的异构SoC上,我们演示了紧凑型卷积Transformer(CCT)的首次端到端设备上微调,每秒可实现多达11个训练图像。我们发现,与完全反向传播相比,LoRA将动态内存使用量减少了23%,可训练参数和梯度的数量减少了15倍,内存传输量减少了1.6倍。TrainDeeploy在CCT上实现了高达4.6 FLOP/cycle(0.28 M参数,71- 126 M FLOP),在Deep-AE上实现了高达13.4 FLOP/cycle(0.27 M参数,0.8 M FLOP),同时扩展了先前框架的范围,以支持CNN和Transformer模型,并在极端边缘平台上进行参数高效调整。
摘要:On-device tuning of deep neural networks enables long-term adaptation at the edge while preserving data privacy. However, the high computational and memory demands of backpropagation pose significant challenges for ultra-low-power, memory-constrained extreme-edge devices. These challenges are further amplified for attention-based models due to their architectural complexity and computational scale. We present TrainDeeploy, a framework that unifies efficient inference and on-device training on heterogeneous ultra-low-power System-on-Chips (SoCs). TrainDeeploy provides the first complete on-device training pipeline for extreme-edge SoCs supporting both Convolutional Neural Networks (CNNs) and Transformer models, together with multiple training strategies such as selective layer-wise fine-tuning and Low-Rank Adaptation (LoRA). On a RISC-V-based heterogeneous SoC, we demonstrate the first end-to-end on-device fine-tuning of a Compact Convolutional Transformer (CCT), achieving up to 11 trained images per second. We show that LoRA reduces dynamic memory usage by 23%, decreases the number of trainable parameters and gradients by 15x, and reduces memory transfer volume by 1.6x compared to full backpropagation. TrainDeeploy achieves up to 4.6 FLOP/cycle on CCT (0.28M parameters, 71-126M FLOPs) and up to 13.4 FLOP/cycle on Deep-AE (0.27M parameters, 0.8M FLOPs), while expanding the scope of prior frameworks to support both CNN and Transformer models with parameter-efficient tuning on extreme-edge platforms.
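The parameter savings LoRA provides come directly from replacing a d_out x d_in weight update with two rank-r factors B (d_out x r) and A (r x d_in). A quick sketch of the arithmetic; the layer size and rank below are illustrative, not TrainDeeploy's configuration:

```python
def lora_param_counts(d_in, d_out, rank):
    """Trainable parameter counts for full fine-tuning of one linear layer
    versus a LoRA update W + B @ A with low-rank factors
    A (rank x d_in) and B (d_out x rank)."""
    full = d_in * d_out            # every weight entry is trainable
    lora = rank * (d_in + d_out)   # only the two low-rank factors train
    return full, lora

full, lora = lora_param_counts(d_in=768, d_out=768, rank=8)
reduction = full / lora  # 48x fewer trainable parameters for this layer
```

Because gradients and optimizer state scale with the trainable parameter count, the same factor shrinks the backward-pass memory and the data that must move through the memory hierarchy, which is why parameter-efficient tuning matters on memory-constrained edge SoCs.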
【4】Variational Routing: A Scalable Bayesian Framework for Calibrated Mixture-of-Experts Transformers
标题:变分路由:用于校准混合专家变形器的可扩展Bayesian框架
链接:https://arxiv.org/abs/2603.09453
作者:Albus Yizhuo Li,Matthew Wicker
备注:8 pages, 7 figures for main text; 16 pages for Appendix; In submission to ICML 2026;
摘要:基础模型越来越多地部署在理解其输出不确定性对确保负责任部署至关重要的场景中。虽然贝叶斯方法为不确定性量化提供了一种有原则的途径,但其计算开销使其在基础模型规模的训练或推理中不切实际。最先进的模型通过精心设计的稀疏性(包括专家混合(MoE)层)实现了万亿级的参数量。在这项工作中,我们通过引入变分混合专家路由(VMoER)来展示大规模的校准不确定性,这是一种对MoE层中的不确定性进行建模的结构化贝叶斯方法。VMoER将贝叶斯推断限制在专家选择阶段,该阶段通常由确定性路由网络完成。我们使用两种推断策略实例化VMoER:对路由logits进行摊销变分推断,以及推断用于随机专家选择的温度参数。在所测试的基础模型中,VMoER将噪声下的路由稳定性提高了38%,将校准误差降低了94%,并将分布外AUROC提高了12%,而额外FLOP开销不到1%。这些结果表明,VMoER为鲁棒且具备不确定性感知的基础模型提供了一条可扩展的路径。
摘要:Foundation models are increasingly being deployed in contexts where understanding the uncertainty of their outputs is critical to ensuring responsible deployment. While Bayesian methods offer a principled approach to uncertainty quantification, their computational overhead renders their use impractical for training or inference at foundation model scale. State-of-the-art models achieve parameter counts in the trillions through carefully engineered sparsity including Mixture-of-Experts (MoE) layers. In this work, we demonstrate calibrated uncertainty at scale by introducing Variational Mixture-of-Experts Routing (VMoER), a structured Bayesian approach for modelling uncertainty in MoE layers. VMoER confines Bayesian inference to the expert-selection stage which is typically done by a deterministic routing network. We instantiate VMoER using two inference strategies: amortised variational inference over routing logits and inferring a temperature parameter for stochastic expert selection. Across tested foundation models, VMoER improves routing stability under noise by 38\%, reduces calibration error by 94\%, and increases out-of-distribution AUROC by 12\%, while incurring less than 1\% additional FLOPs. These results suggest VMoER offers a scalable path toward robust and uncertainty-aware foundation models.
【5】The Radio-Frequency Transformer for Signal Separation
标题:用于信号分离的射频Transformer
链接:https://arxiv.org/abs/2603.09201
作者:Egor Lifar,Semyon Savkin,Rachana Madhukara,Tejas Jayashankar,Yury Polyanskiy,Gregory W. Wornell
摘要:我们研究一个信号分离问题:估计被未知非高斯背景/干扰污染的感兴趣信号(SOI)。给定由SOI和干扰样例组成的训练数据,我们展示了如何构建一个完全数据驱动的信号分离器。为此,我们为SOI学习一个良好的离散分词器,然后以交叉熵损失端到端地训练一个Transformer。与传统的均方误差(MSE)相比,使用交叉熵训练带来了实质性的改进。我们的分词器是对Google SoundStream的改进,加入了额外的Transformer层,并从VQVAE切换到有限标量量化(FSQ)。在MIT RF Challenge数据集的真实和合成混合信号上,我们的方法取得了有竞争力的性能,包括在从5G干扰中分离QPSK信号时,误码率(BER)比现有最先进技术降低122倍。学习到的表示无需边信息即可适应干扰类型,并在推理时对未见过的混合信号表现出zero-shot泛化能力,凸显了其在射频之外的潜力。虽然我们在射频混合信号上实例化了我们的方法,但我们预期相同的架构也适用于引力波数据(例如LIGO应变)以及其他需要对背景和噪声进行数据驱动建模的科学传感问题。
摘要:We study a problem of signal separation: estimating a signal of interest (SOI) contaminated by an unknown non-Gaussian background/interference. Given the training data consisting of examples of SOI and interference, we show how to build a fully data-driven signal separator. To that end we learn a good discrete tokenizer for SOI and then train an end-to-end transformer on a cross-entropy loss. Training with a cross-entropy shows substantial improvements over the conventional mean-squared error (MSE). Our tokenizer is a modification of Google's SoundStream, which incorporates additional transformer layers and switches from VQVAE to finite-scalar quantization (FSQ). Across real and synthetic mixtures from the MIT RF Challenge dataset, our method achieves competitive performance, including a 122x reduction in bit-error rate (BER) over prior state-of-the-art techniques for separating a QPSK signal from 5G interference. The learned representation adapts to the interference type without side information and shows zero-shot generalization to unseen mixtures at inference time, underscoring its potential beyond RF. Although we instantiate our approach on radio-frequency mixtures, we expect the same architecture to apply to gravitational-wave data (e.g., LIGO strain) and other scientific sensing problems that require data-driven modeling of background and noise.
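Finite scalar quantization (FSQ), which the tokenizer adopts in place of VQVAE codebooks, is simple to state: squash each latent dimension into a bounded range, then round it to a small fixed grid of levels. A minimal per-dimension sketch, not the paper's implementation:

```python
import math

def fsq(z, levels=5):
    """Finite scalar quantization of a latent vector: bound each dimension
    with tanh, then round to one of `levels` uniformly spaced values
    in [-1, 1]. The code index per dimension is just the rounded level."""
    step = 2.0 / (levels - 1)
    out = []
    for v in z:
        v = math.tanh(v)                          # squash into (-1, 1)
        out.append(round((v + 1.0) / step) * step - 1.0)
    return out

codes = fsq([-2.3, 0.1, 1.7])  # each entry snapped to {-1.0, -0.5, 0.0, 0.5, 1.0}
```

Unlike a VQVAE codebook, there is no learned dictionary or commitment loss: the implicit codebook is the fixed product grid of per-dimension levels, which avoids codebook-collapse issues.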
【6】GIAT: A Geologically-Informed Attention Transformer for Lithology Identification
标题:GIAT:一种基于地质信息的岩性识别注意力Transformer
链接:https://arxiv.org/abs/2603.09165
作者:Jie Li,Qishun Yang,Nuo Li
摘要:从测井资料中准确识别岩性是地下资源评价的关键。虽然基于Transformer的模型擅长于序列建模,但其"黑匣子"性质和缺乏地质指导限制了其性能和可信度。为了克服这些局限性,本文提出了地质信息注意力Transformer(GIAT),这是一个将数据驱动的地质先验与Transformer的注意力机制深度融合的新框架。GIAT的核心是一种新的注意力偏置机制。我们重新利用类别序列相关(CSC)滤波器来生成一个蕴含地质信息的关系矩阵,该矩阵被注入到自注意力计算中,以明确地引导模型走向地质上连贯的模式。在两个具有挑战性的数据集上,GIAT实现了最先进的性能,准确率高达95.4%,显著优于现有模型。更重要的是,GIAT在输入扰动下表现出出色的解释忠实度,并产生地质上连贯的预测。我们的工作为地球科学应用构建更准确、可靠和可解释的深度学习模型提供了一种新范式。
摘要:Accurate lithology identification from well logs is crucial for subsurface resource evaluation. Although Transformer-based models excel at sequence modeling, their "black-box" nature and lack of geological guidance limit their performance and trustworthiness. To overcome these limitations, this letter proposes the Geologically-Informed Attention Transformer (GIAT), a novel framework that deeply fuses data-driven geological priors with the Transformer's attention mechanism. The core of GIAT is a new attention-biasing mechanism. We repurpose Category-Wise Sequence Correlation (CSC) filters to generate a geologically-informed relational matrix, which is injected into the self-attention calculation to explicitly guide the model toward geologically coherent patterns. On two challenging datasets, GIAT achieves state-of-the-art performance with an accuracy of up to 95.4%, significantly outperforming existing models. More importantly, GIAT demonstrates exceptional interpretation faithfulness under input perturbations and generates geologically coherent predictions. Our work presents a new paradigm for building more accurate, reliable, and interpretable deep learning models for geoscience applications.
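The attention-biasing mechanism, adding a relational matrix to the pre-softmax attention scores, can be sketched generically. The toy scores and bias below are illustrative placeholders, not CSC-derived values from the paper:

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def biased_attention_weights(scores, bias):
    """Attention weights with an additive relational bias injected into
    the pre-softmax scores, the generic mechanism behind relation-aware
    attention schemes."""
    return [softmax([s + b for s, b in zip(srow, brow)])
            for srow, brow in zip(scores, bias)]

# Toy 3x3 raw attention scores and a prior relation matrix that strongly
# links query 0 to key 2.
scores = [[0.2, 0.1, 0.0],
          [0.0, 0.3, 0.1],
          [0.1, 0.0, 0.2]]
bias = [[0.0, 0.0, 8.0],
        [0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0]]
weights = biased_attention_weights(scores, bias)
# weights[0][2] is pushed toward 1; unbiased rows remain ordinary softmaxes.
```

Because the bias is added before the softmax, it steers attention smoothly rather than hard-masking it, so the data-driven scores can still override a weak prior.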
GAN|对抗|攻击|生成相关(3篇)
【1】ActiveUltraFeedback: Efficient Preference Data Generation using Active Learning
标题:Active UltraFeedback:使用主动学习高效的偏好数据生成
链接:https://arxiv.org/abs/2603.09692
作者:Davit Melikidze,Marian Schneider,Jessica Lam,Martin Wertich,Ido Hakimi,Barna Pásztor,Andreas Krause
备注:35 pages, 6 figures, 24 tables
摘要:基于人类反馈的强化学习(RLHF)已经成为调整大型语言模型(LLM)的标准,但其有效性受到获取偏好数据的高成本的影响,特别是在低资源和专家领域。为了解决这个问题,我们引入了ACTIVEULTRAFEEDBACK,这是一个模块化的主动学习管道,它利用不确定性估计来动态地识别最具信息性的注释响应。我们的管道促进了标准响应选择方法的系统评估,以及双反向汤普森采样(DRTS)和DELTAUCB,这两种新方法优先考虑具有较大预测质量差距的响应对,利用最近的结果表明,这些对提供了良好的信号进行微调。我们的实验表明,ACTIVEULTRAFEEDBACK产生高质量的数据集,导致下游性能的显着改善,特别是实现可比或优越的结果,只有六分之一的注释数据相对于静态基线。我们的管道可以在https://github.com/lasgroup/ActiveUltraFeedback上找到,我们的偏好数据集可以在https://huggingface.co/ActiveUltraFeedback上找到。
摘要:Reinforcement Learning from Human Feedback (RLHF) has become the standard for aligning Large Language Models (LLMs), yet its efficacy is bottlenecked by the high cost of acquiring preference data, especially in low-resource and expert domains. To address this, we introduce ACTIVEULTRAFEEDBACK, a modular active learning pipeline that leverages uncertainty estimates to dynamically identify the most informative responses for annotation. Our pipeline facilitates the systematic evaluation of standard response selection methods alongside DOUBLE REVERSE THOMPSON SAMPLING (DRTS) and DELTAUCB, two novel methods prioritizing response pairs with large predicted quality gaps, leveraging recent results showing that such pairs provide good signals for fine-tuning. Our experiments demonstrate that ACTIVEULTRAFEEDBACK yields high-quality datasets that lead to significant improvements in downstream performance, notably achieving comparable or superior results with as little as one-sixth of the annotated data relative to static baselines. Our pipeline is available at https://github.com/lasgroup/ActiveUltraFeedback and our preference datasets at https://huggingface.co/ActiveUltraFeedback.
【2】Sim2Act: Robust Simulation-to-Decision Learning via Adversarial Calibration and Group-Relative Perturbation
标题:Sim2Act:通过对抗校准和组相对扰动进行稳健的模拟到决策学习
链接:https://arxiv.org/abs/2603.09053
作者:Hongyu Cao,Jinghan Zhang,Kunpeng Liu,Dongjie Wang,Feng Xia,Haifeng Chen,Xiaohua Hu,Yanjie Fu
备注:9 pages, 5 figures
摘要:模拟到决策学习可以在数字环境中进行安全的策略训练,而不会危及现实世界的部署,并已在供应链和工业系统等关键任务领域变得至关重要。然而,从有噪声或有偏的真实世界数据中学习的模拟器通常会在决策关键区域表现出预测误差,导致不稳定的动作排序和不可靠的策略。现有方法要么专注于提高平均模拟保真度,要么采用保守的正则化,后者可能因放弃高风险高回报的动作而导致策略崩溃。我们提出Sim2Act,一个同时解决模拟器鲁棒性与策略鲁棒性的稳健模拟到决策框架。首先,我们引入一种对抗校准机制,对决策关键的状态-动作对中的模拟误差重新加权,使代理模型的保真度与下游决策影响对齐。其次,我们开发了一种组相对扰动策略,在模拟器不确定性下稳定策略学习,而不施加过于悲观的约束。在多个供应链基准上的大量实验表明,在结构化和非结构化扰动下,该方法提高了模拟鲁棒性并带来更稳定的决策性能。
摘要:Simulation-to-decision learning enables safe policy training in digital environments without risking real-world deployment, and has become essential in mission-critical domains such as supply chains and industrial systems. However, simulators learned from noisy or biased real-world data often exhibit prediction errors in decision-critical regions, leading to unstable action ranking and unreliable policies. Existing approaches either focus on improving average simulation fidelity or adopt conservative regularization, which may cause policy collapse by discarding high-risk high-reward actions. We propose Sim2Act, a robust simulation-to-decision framework that addresses both simulator and policy robustness. First, we introduce an adversarial calibration mechanism that re-weights simulation errors in decision-critical state-action pairs to align surrogate fidelity with downstream decision impact. Second, we develop a group-relative perturbation strategy that stabilizes policy learning under simulator uncertainty without enforcing overly pessimistic constraints. Extensive experiments on multiple supply chain benchmarks demonstrate improved simulation robustness and more stable decision performance under structured and unstructured perturbations.
【3】KernelCraft: Benchmarking for Agentic Close-to-Metal Kernel Generation on Emerging Hardware
标题:KernelCraft:面向新兴硬件的智能体式底层内核生成基准测试
链接:https://arxiv.org/abs/2603.08721
作者:Jiayi Nie,Haoran Wu,Yao Lai,Zeyu Cao,Cheng Zhang,Binglei Lou,Erwei Wang,Jianyi Cheng,Timothy M. Jones,Robert Mullins,Rika Antonova,Yiren Zhao
摘要:具有新型指令集架构(ISA)的新AI加速器通常需要开发人员手动编写低级内核,这是一个耗时、费力且容易出错的过程,无法跨不同的硬件目标扩展。这阻碍了新兴硬件平台高效地进入市场。虽然之前基于LLM的代码生成在成熟的GPU生态系统中展现出前景,但仍不清楚智能体式LLM系统能否快速为具有新ISA的新兴硬件生成有效且高效的内核。我们提出KernelCraft:这是第一个通过函数调用、反馈驱动的工作流来评估LLM智能体为定制加速器生成和优化低级内核能力的基准。在KernelCraft中,智能体在ISA和硬件约束下,利用来自编译检查、模拟以及对照真值的正确性验证的自动反馈来改进内核。在我们的实验中,我们在三个新兴加速器平台上评估了智能体在20多个ML任务上的性能,每个任务有5种不同的任务配置,并对任务配置复杂度进行了专门评估。在四个领先的推理模型中,顶级智能体在几个细化步骤内即可为此前未见过的ISA生成功能上有效的内核,优化后的内核可匹敌或优于基于模板的编译器基线。借此,我们展示了为加速器设计者和内核开发者降低内核开发成本的潜力。
摘要:New AI accelerators with novel instruction set architectures (ISAs) often require developers to manually craft low-level kernels -- a time-consuming, laborious, and error-prone process that cannot scale across diverse hardware targets. This prevents emerging hardware platforms from reaching the market efficiently. While prior LLM-based code generation has shown promise in mature GPU ecosystems, it remains unclear whether agentic LLM systems can quickly produce valid and efficient kernels for emerging hardware with new ISAs. We present KernelCraft: the first benchmark to evaluate an LLM agent's ability to generate and optimize low-level kernels for customized accelerators via a function-calling, feedback-driven workflow. Within KernelCraft, the agent refines kernels under ISA and hardware constraints using automated feedback derived from compilation checks, simulation, and correctness validation against ground truth. In our experiments, we assess agent performance across three emerging accelerator platforms on more than 20 ML tasks, each with 5 diverse task configurations, with special evaluation of task configuration complexity. Across four leading reasoning models, top agents produce functionally valid kernels for previously unseen ISAs within a few refinement steps, with optimized kernels that match or outperform template-based compiler baselines. With that, we demonstrate the potential for reducing the cost of kernel development for accelerator designers and kernel developers.
半/弱/无/有监督|不确定性|主动学习(3篇)
【1】Good Reasoning Makes Good Demonstrations: Implicit Reasoning Quality Supervision via In-Context Reinforcement Learning
标题:好的推理造就好的演示:通过上下文强化学习的隐式推理质量监督
链接:https://arxiv.org/abs/2603.09803
作者:Tiehua Mei,Minxuan Lv,Leiyu Pan,Zhenpeng Su,Hongru Hou,Hengrui Chen,Ao Xu,Deqing Yang
摘要:带有可验证奖励的强化学习(RLVR)改进了大型语言模型中的推理,但对所有正确的解一视同仁,可能会强化那些偶然得到正确答案的有缺陷的推理轨迹。我们观察到,更好的推理是更好的老师:作为示范,高质量的解比低质量的解更有效。我们将这种教学能力称为示范效用(Demonstration Utility),并表明策略模型自身的上下文学习能力提供了一种高效的度量方式,由此产生一个称为证据增益(Evidence Gain)的质量信号。为了在训练中使用这一信号,我们引入了In-Context RLVR。通过贝叶斯分析,我们表明该目标隐式地按证据增益对奖励重新加权,为高质量轨迹分配更高的权重,为低质量轨迹分配更低的权重,而无需昂贵的计算或外部评估器。在数学基准上的实验表明,与标准RLVR相比,准确性和推理质量均有所提高。
摘要:Reinforcement Learning with Verifiable Rewards (RLVR) improves reasoning in large language models but treats all correct solutions equally, potentially reinforcing flawed traces that get correct answers by chance. We observe that better reasoning are better teachers: high-quality solutions serve as more effective demonstrations than low-quality ones. We term this teaching ability Demonstration Utility, and show that the policy model's own in-context learning ability provides an efficient way to measure it, yielding a quality signal termed Evidence Gain. To employ this signal during training, we introduce In-Context RLVR. By Bayesian analysis, we show that this objective implicitly reweights rewards by Evidence Gain, assigning higher weights to high-quality traces and lower weights to low-quality ones, without requiring costly computation or external evaluators. Experiments on mathematical benchmarks show improvements in both accuracy and reasoning quality over standard RLVR.
【2】Cross-Domain Uncertainty Quantification for Selective Prediction: A Comprehensive Bound Ablation with Transfer-Informed Betting
标题:用于选择性预测的跨域不确定性量化:结合迁移知情投注的全面界消融
链接:https://arxiv.org/abs/2603.08907
作者:Abhinaba Basu
摘要:我们对用于带风险控制的选择性预测的九个有限样本界族进行了全面消融,将集中不等式(Hoeffding、经验Bernstein、Clopper-Pearson、Wasserstein DRO、CVaR)与多重检验校正(联合界、先学习后检验(LTT)固定序列)以及基于投注的置信序列(WSR)相结合。我们的主要理论贡献是迁移知情投注(TIB),它利用源域的风险分布来热启动WSR财富过程,在数据稀缺的情形下实现更紧的界,并提供形式化的优势保证。我们证明了TIB财富过程在所有源-目标散度下仍是有效的上鞅,当域匹配时TIB优于标准WSR,并且没有任何与数据无关的热启动可以实现更好的收敛。据我们所知,基于投注的置信序列、LTT单调检验与跨域迁移的三重组合是文献中尚不存在的新颖性。我们在四个基准上评估了所有九个界族:MASSIVE(n=1,102)、NyayaBench(n=280)、CLINC-150(n=22.5K)和Banking77(n=13K),覆盖18种(α, δ)配置。在MASSIVE上,当α=0.10时,LTT消除了ln(K)联合界惩罚,实现了94.0%的保证覆盖率,而Hoeffding为73.8%,相对提升27%。在NyayaBench上,较小的校准集使得Hoeffding族的界在α低于0.20时不可行,而迁移知情投注在α=0.10时实现了18.5%的覆盖率,比LTT + Hoeffding提高了5.4倍。我们还与分裂共形预测进行了比较,表明共形方法产生预测集(平均1.67个类),而选择性预测提供单一预测的风险保证。我们将这些方法应用于智能体缓存系统,形式化了一个渐进信任模型,由该保证决定何时可以自主提供缓存的响应。
摘要:We present a comprehensive ablation of nine finite-sample bound families for selective prediction with risk control, combining concentration inequalities (Hoeffding, Empirical Bernstein, Clopper-Pearson, Wasserstein DRO, CVaR) with multiple-testing corrections (union bound, Learn Then Test fixed-sequence) and betting-based confidence sequences (WSR). Our main theoretical contribution is Transfer-Informed Betting (TIB), which warm-starts the WSR wealth process using a source domain's risk profile, achieving tighter bounds in data-scarce settings with a formal dominance guarantee. We prove that the TIB wealth process remains a valid supermartingale under all source-target divergences, that TIB dominates standard WSR when domains match, and that no data-independent warm-start can achieve better convergence. The combination of betting-based confidence sequences, LTT monotone testing, and cross-domain transfer is, to our knowledge, a three-way novelty not present in the literature. We evaluate all nine bound families on four benchmarks-MASSIVE (n=1,102), NyayaBench (n=280), CLINC-150 (n=22.5K), and Banking77 (n=13K)-across 18 (alpha, delta) configurations. On MASSIVE at alpha=0.10, LTT eliminates the ln(K) union-bound penalty, achieving 94.0% guaranteed coverage versus 73.8% for Hoeffding-a 27% relative improvement. On NyayaBench, where the small calibration set makes Hoeffding-family bounds infeasible below alpha=0.20, Transfer-Informed Betting achieves 18.5% coverage at alpha=0.10, a 5.4x improvement over LTT + Hoeffding. We additionally compare with split-conformal prediction, showing that conformal methods produce prediction sets (avg. 1.67 classes) whereas selective prediction provides single-prediction risk guarantees. We apply these methods to agentic caching systems, formalizing a progressive trust model where the guarantee determines when cached responses can be served autonomously.
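As a concrete example of the simplest bound family above, a one-sided Hoeffding upper confidence bound on the risk of accepted predictions takes one line to compute. The calibration data below is synthetic, chosen only to illustrate the certification step:

```python
import math

def hoeffding_ucb(losses, delta):
    """One-sided Hoeffding upper confidence bound on the mean of losses
    bounded in [0, 1], holding with probability at least 1 - delta."""
    n = len(losses)
    mean = sum(losses) / n
    return mean + math.sqrt(math.log(1.0 / delta) / (2.0 * n))

# Synthetic calibration losses for accepted predictions (1 = error, 0 = correct),
# giving an empirical risk of 0.2 on n = 1000 points.
losses = [0, 1, 0, 0, 0, 0, 0, 1, 0, 0] * 100
ucb = hoeffding_ucb(losses, delta=0.05)
accept = ucb <= 0.25  # certify risk control at alpha = 0.25
```

The ln(K) union-bound penalty the abstract mentions arises when this test is repeated over K candidate thresholds with delta split as delta/K; LTT's fixed-sequence testing avoids that split, and betting-based methods like TIB replace the fixed-width radius with an adaptive wealth process.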
【3】Adaptive Active Learning for Online Reliability Prediction of Satellite Electronics
标题:卫星电子设备在线可靠性预测的自适应主动学习
链接:https://arxiv.org/abs/2603.09058
作者:Shixiang Li,Yubin Tian,Dianpeng Wang,Piao Chen,Mengying Ren
摘要:卫星电子设备在轨可靠性的准确预测往往受到有限的数据可用性、变化的操作条件和相当大的单元间可变性的阻碍。为了克服这些障碍,本文提出了一种新的集成在线可靠性预测框架。主要贡献有两方面。首先,维纳过程为基础的退化模型的开发,将广义Arrhenius链接函数,个人的随机效应,相邻单位之间的空间相关性。进一步设计了定制的最大似然估计方法,以促进高效和准确的参数推断。其次,设计了两阶段主动学习采样方案,自适应提高预测精度。该策略首先根据空间配置选择代表性单元,然后使用平衡单元特定信息、模型不确定性和降解动态的综合标准确定最佳采样时间。数值实验和天宫空间站的实际算例表明,该方法在显著降低数据需求的同时,显著提高了可靠性预测精度,为复杂卫星电子系统的故障预测和健康管理提供了有效的解决方案。
摘要:Accurate on-orbit reliability prediction for satellite electronics is often hindered by limited data availability, varying operational conditions, and considerable unit-to-unit variability. To overcome these obstacles, this paper proposes a novel integrated online reliability prediction framework. The main contributions are twofold. First, a Wiener process-based degradation model is developed, incorporating a generalized Arrhenius link function, individual random effects, and spatial correlations among adjacent units. A customized maximum likelihood estimation method is further devised to facilitate efficient and accurate parameter inference. Second, a two-stage active learning sampling scheme is designed to adaptively enhance prediction accuracy. This strategy initially selects representative units based on spatial configuration, and subsequently determines optimal sampling times using a comprehensive criterion that balances unit-specific information, model uncertainty, and degradation dynamics. Numerical experiments and a practical case study from the Tiangong space station demonstrate that the proposed method markedly improves reliability prediction accuracy while significantly reducing data requirements, offering an efficient solution for the prognostic and health management of complex satellite electronic systems.
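上文摘要中"带广义Arrhenius链接函数的Wiener退化模型"的基本形态可以用一个简化模拟来示意(漂移参数a、b与扩散系数sigma为假设的玩具值,且省略了论文中的个体随机效应与空间相关性):

```python
import math, random

def simulate_wiener_degradation(t_grid, temp_K, a=8.0, b=3000.0,
                                sigma=0.05, seed=0):
    """模拟带Arrhenius型漂移的Wiener退化路径:
    X(t) = mu*t + sigma*B(t), 其中漂移 mu = exp(a - b/T)。
    a、b、sigma均为示意用的假设值, 并非论文的拟合结果。"""
    rng = random.Random(seed)
    mu = math.exp(a - b / temp_K)
    x, path = 0.0, [0.0]
    for t0, t1 in zip(t_grid, t_grid[1:]):
        dt = t1 - t0
        # Wiener过程增量: 漂移项 + 方差与dt成正比的高斯噪声
        x += mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

# 0到100小时、每10小时采样一次, 工作温度300K
path = simulate_wiener_degradation(list(range(0, 101, 10)), temp_K=300.0)
```

温度越高,Arrhenius漂移mu越大,退化路径越快越过失效阈值。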
迁移|Zero/Few/One-Shot|自适应(4篇)
【1】OptEMA: Adaptive Exponential Moving Average for Stochastic Optimization with Zero-Noise Optimality
标题:OptEMA:具有零噪声最优性的随机优化自适应指数移动平均
链接:https://arxiv.org/abs/2603.09923
作者:Ganzhao Yuan
摘要:指数移动平均(EMA)是Adam等广泛使用的优化器的基石。然而,现有对Adam式方法的理论分析具有明显的局限性:它们的保证在零噪声情形下可能仍是次优的,依赖于限制性的有界条件(例如有界梯度或有界目标间隙),使用常数或开环步长,或需要Lipschitz常数的先验知识。为克服这些瓶颈,我们引入OptEMA并分析两种新变体:OptEMA-M对一阶矩应用自适应、递减的EMA系数并固定二阶衰减,OptEMA-V则交换这两个角色。关键是,OptEMA是闭环且无需Lipschitz常数的:其有效步长依赖于轨迹,且参数化不需要Lipschitz常数。在标准随机梯度下降(SGD)假设下,即光滑性、目标函数有下界、以及具有有界方差的无偏梯度,我们建立了严格的收敛保证。两种变体对平均梯度范数均实现了噪声自适应的收敛速度$\widetilde{\mathcal{O}}(T^{-1/2}+σ^{1/2} T^{-1/4})$,其中$σ$为噪声水平。特别地,在$σ=0$的零噪声情形下,我们的界无需手动重新调整超参数即可退化为接近最优的确定性速度$\widetilde{\mathcal{O}}(T^{-1/2})$。
摘要:The Exponential Moving Average (EMA) is a cornerstone of widely used optimizers such as Adam. However, existing theoretical analyses of Adam-style methods have notable limitations: their guarantees can remain suboptimal in the zero-noise regime, rely on restrictive boundedness conditions (e.g., bounded gradients or objective gaps), use constant or open-loop stepsizes, or require prior knowledge of Lipschitz constants. To overcome these bottlenecks, we introduce OptEMA and analyze two novel variants: OptEMA-M, which applies an adaptive, decreasing EMA coefficient to the first-order moment with a fixed second-order decay, and OptEMA-V, which swaps these roles. Crucially, OptEMA is closed-loop and Lipschitz-free in the sense that its effective stepsizes are trajectory-dependent and do not require the Lipschitz constant for parameterization. Under standard stochastic gradient descent (SGD) assumptions, namely smoothness, a lower-bounded objective, and unbiased gradients with bounded variance, we establish rigorous convergence guarantees. Both variants achieve a noise-adaptive convergence rate of $\widetilde{\mathcal{O}}(T^{-1/2}+σ^{1/2} T^{-1/4})$ for the average gradient norm, where $σ$ is the noise level. In particular, in the zero-noise regime where $σ=0$, our bounds reduce to the nearly optimal deterministic rate $\widetilde{\mathcal{O}}(T^{-1/2})$ without manual hyperparameter retuning.
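上文摘要中"对一阶矩应用自适应递减EMA系数"的机制可以用一个Adam式更新来示意(这里的递减调度beta1_t = 1 - 1/sqrt(t+1)是本文假设的演示选择,并非OptEMA的精确规则):

```python
import math

def adam_style_step(params, grads, state, t, lr=0.05,
                    beta2=0.999, eps=1e-8):
    """带递减一阶EMA系数的Adam式更新示意。
    递减调度 beta1_t = 1 - 1/sqrt(t+1) 为假设的演示选择。"""
    beta1_t = 1.0 - 1.0 / math.sqrt(t + 1)
    m, v = state
    out = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m[i] = beta1_t * m[i] + (1.0 - beta1_t) * g   # 一阶矩EMA(自适应系数)
        v[i] = beta2 * v[i] + (1.0 - beta2) * g * g   # 二阶矩EMA(固定衰减)
        out.append(p - lr * m[i] / (math.sqrt(v[i]) + eps))
    return out

# 在一维二次目标 f(x) = x^2 (梯度为2x)上迭代, 应收敛到0附近
x = [5.0]
state = ([0.0], [0.0])
for t in range(2000):
    x = adam_style_step(x, [2.0 * x[0]], state, t)
```

随着t增大,新梯度的权重1-beta1_t逐渐减小,一阶矩平均的时间窗口随之拉长,这正是"递减EMA系数"的直观含义。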
【2】CarbonBench: A Global Benchmark for Upscaling of Carbon Fluxes Using Zero-Shot Learning
标题:CarbonBench:使用Zero-Shot学习进行碳通量升尺度的全球基准
链接:https://arxiv.org/abs/2603.09868
作者:Aleksei Rozanov,Arvind Renganathan,Yimeng Zhang,Vipin Kumar
摘要:准确量化陆地碳交换对气候政策和碳核算至关重要,但模型必须泛化到在稀疏涡度协方差观测中代表性不足的生态系统。尽管这一挑战是时间序列回归中zero-shot空间迁移学习的一个自然实例,但尚无标准化基准来严格评估模型在气候状况和植被类型各异的地理位置上的性能。 我们介绍了CarbonBench,这是碳通量升尺度中zero-shot空间迁移的首个基准。CarbonBench包含来自全球567个通量塔站点的超过130万条逐日观测(2000-2024)。它提供:(1)分层评估协议,明确测试对未见过的植被类型和气候状况的泛化,并将空间迁移与时间自相关分离;(2)一套协调的遥感和气象特征,以实现灵活的架构设计;(3)从基于树的方法到域泛化架构的基线。通过连接机器学习方法与地球系统科学,CarbonBench旨在实现迁移学习方法的系统比较,作为分布偏移下回归的测试平台,并为下一代气候建模工作做出贡献。
摘要:Accurately quantifying terrestrial carbon exchange is essential for climate policy and carbon accounting, yet models must generalize to ecosystems underrepresented in sparse eddy covariance observations. Despite this challenge being a natural instance of zero-shot spatial transfer learning for time series regression, no standardized benchmark exists to rigorously evaluate model performance across geographically distinct locations with different climate regimes and vegetation types. We introduce CarbonBench, the first benchmark for zero-shot spatial transfer in carbon flux upscaling. CarbonBench comprises over 1.3 million daily observations from 567 flux tower sites globally (2000-2024). It provides: (1) stratified evaluation protocols that explicitly test generalization across unseen vegetation types and climate regimes, separating spatial transfer from temporal autocorrelation; (2) a harmonized set of remote sensing and meteorological features to enable flexible architecture design; and (3) baselines ranging from tree-based methods to domain-generalization architectures. By bridging machine learning methodologies and Earth system science, CarbonBench aims to enable systematic comparison of transfer learning methods, serves as a testbed for regression under distribution shift, and contributes to the next-generation climate modeling efforts.
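上文摘要中的分层评估协议(对未见植被类型/气候状况测试泛化)在形式上类似留组交叉验证。以下是一个通用示意(站点与分组均为虚构,并非CarbonBench的官方划分):

```python
def leave_group_out_splits(site_groups):
    """零样本空间迁移评估示意: 按植被类型/气候区分组,
    每次留出一整组站点作为测试集, 其余站点作训练集。"""
    groups = sorted(set(site_groups.values()))
    splits = []
    for g in groups:
        test = sorted(s for s, gg in site_groups.items() if gg == g)
        train = sorted(s for s, gg in site_groups.items() if gg != g)
        splits.append((g, train, test))
    return splits

# 虚构的站点-植被类型映射, 并非CarbonBench的真实划分
sites = {"US-1": "forest", "US-2": "forest",
         "BR-1": "savanna", "DE-1": "cropland"}
splits = leave_group_out_splits(sites)
```

由于测试组的植被类型在训练集中从未出现,该划分直接度量zero-shot空间迁移能力,而非同分布内插。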
【3】Exploiting Label-Aware Channel Scoring for Adaptive Channel Pruning in Split Learning
标题:利用标签感知通道评分在分裂学习中进行自适应通道修剪
链接:https://arxiv.org/abs/2603.09792
作者:Jialei Tan,Zheng Lin,Xiangming Cai,Ruoxi Zhu,Zihan Fang,Pingping Chen,Wei Ni
备注:6 pages, 6 figures
摘要:分裂学习(SL)将大部分训练工作量转移到服务器,从而减轻客户端设备上的计算负担。然而,被称为粉碎数据(smashed data)的中间特征表示的传输会引起显著的通信开销,特别是在涉及大量客户端设备时。为了解决这一挑战,我们提出了一种自适应通道修剪辅助的SL(ACP-SL)方案。在ACP-SL中,标签感知通道重要性评分(LCIS)模块被设计用于生成通道重要性分数,以区分重要通道和不太重要的通道。基于这些分数,自适应通道修剪(ACP)模块被用于修剪不太重要的通道,从而压缩相应的粉碎数据并减少通信开销。实验结果表明,ACP-SL的测试精度始终优于基准方案。此外,它在更少的训练轮数内达到目标测试精度,从而进一步降低通信开销。
摘要:Split learning (SL) transfers most of the training workload to the server, which alleviates computational burden on client devices. However, the transmission of intermediate feature representations, referred to as smashed data, incurs significant communication overhead, particularly when a large number of client devices are involved. To address this challenge, we propose an adaptive channel pruning-aided SL (ACP-SL) scheme. In ACP-SL, a label-aware channel importance scoring (LCIS) module is designed to generate channel importance scores, distinguishing important channels from less important ones. Based on these scores, an adaptive channel pruning (ACP) module is developed to prune less important channels, thereby compressing the corresponding smashed data and reducing the communication overhead. Experimental results show that ACP-SL consistently outperforms benchmark schemes in test accuracy. Furthermore, it reaches a target test accuracy in fewer training rounds, thereby reducing communication overhead.
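上文摘要中"按重要性分数修剪通道以压缩粉碎数据"的思路可以用如下极简示意说明(分数与保留比例均为假设,并非LCIS/ACP模块的真实实现):

```python
def prune_channels(feature_map, scores, keep_ratio=0.5):
    """按重要性分数保留前keep_ratio比例的通道, 丢弃其余通道。
    分数与比例均为示意, 并非论文LCIS/ACP模块的真实实现。
    feature_map: 形如[C][H*W]的嵌套列表。"""
    c = len(scores)
    k = max(1, int(c * keep_ratio))
    # 取分数最高的k个通道索引, 再按原通道顺序排列
    keep = sorted(sorted(range(c), key=lambda i: scores[i], reverse=True)[:k])
    return [feature_map[i] for i in keep], keep

fm = [[float(i)] * 4 for i in range(6)]    # 6个通道的玩具特征图
scores = [0.1, 0.9, 0.3, 0.8, 0.05, 0.6]  # 假设的标签感知重要性分数
pruned, kept = prune_channels(fm, scores, keep_ratio=0.5)
```

只传输保留通道(这里通道数从6降到3),即可按比例压缩客户端到服务器的粉碎数据量。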
【4】Efficiently Aligning Draft Models via Parameter- and Data-Efficient Adaptation
标题:通过参数与数据高效的适应来高效对齐草稿模型
链接:https://arxiv.org/abs/2603.09527
作者:Luxi Lin,Zhihang Lin,Zhanpeng Zeng,Yuhao Chen,Qingyu Zhang,Jixiang Luo,Xuelong Li,Rongrong Ji
备注:10 pages
摘要:推测解码加速了LLM推理,但当目标模型针对特定领域进行微调时,性能会下降。一个朴素的解决方案是为每个目标模型重新训练草稿模型,这既昂贵又低效。为解决这一问题,我们引入了一个参数与数据高效的框架,名为高效草稿适应(Efficient Draft Adaptation,简称EDA),用于高效地适应草稿模型。EDA引入三项创新:(1)解耦架构,利用共享组件和私有组件分别建模共享的和目标特定的输出分布,从而通过仅更新轻量级私有组件实现参数高效的适应;(2)数据再生策略,利用微调后的目标模型重新生成训练数据,改善训练与推测解码之间的对齐,从而获得更高的平均接受长度;(3)样本选择机制,优先选择高价值数据以实现高效适应。实验表明,EDA有效恢复了微调模型上的推测解码性能,与完全重新训练相比,以显著更低的训练成本实现了更优的平均接受长度。代码可在https://github.com/Lyn-Lucy/Efficient-Draft-Adaptation上获得。
摘要:Speculative decoding accelerates LLM inference but suffers from performance degradation when target models are fine-tuned for specific domains. A naive solution is to retrain draft models for every target model, which is costly and inefficient. To address this, we introduce a parameter- and data-efficient framework named Efficient Draft Adaptation, abbreviated as EDA, for efficiently adapting draft models. EDA introduces three innovations: (1) a decoupled architecture that utilizes shared and private components to model the shared and target-specific output distributions separately, enabling parameter-efficient adaptation by updating only the lightweight private component;(2) a data regeneration strategy that utilizes the fine-tuned target model to regenerate training data, thereby improving the alignment between training and speculative decoding, leading to higher average acceptance length;(3) a sample selection mechanism that prioritizes high-value data for efficient adaptation. Our experiments show that EDA effectively restores speculative performance on fine-tuned models, achieving superior average acceptance lengths with significantly reduced training costs compared to full retraining. Code is available at https://github.com/Lyn-Lucy/Efficient-Draft-Adaptation.
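上文摘要中反复出现的"平均接受长度"(average acceptance length)指标可以用一个极简示意来说明(接受/拒绝序列为虚构数据,"拒绝处仍产出1个token"的约定为推测解码中的常见做法,并非该论文的精确定义):

```python
def average_acceptance_length(accept_flags_per_round):
    """推测解码的平均接受长度示意: 每轮草稿模型提出若干token,
    目标模型从左到右验证, 在第一个被拒绝的token处截断;
    按常见约定, 被拒绝位置由目标模型重采样, 每轮仍至少产出1个token。"""
    lengths = []
    for flags in accept_flags_per_round:
        n = 0
        for ok in flags:
            if not ok:
                break
            n += 1
        lengths.append(n + 1)
    return sum(lengths) / len(lengths)

# 三轮虚构的验证结果
rounds = [[True, True, False, True],   # 接受2个, 共产出3个token
          [True, True, True, True],    # 全部接受, 外加1个额外token
          [False, True, True]]         # 首个即被拒, 仍产出1个token
avg = average_acceptance_length(rounds)
```

草稿模型与目标模型分布越对齐,每轮被连续接受的token越多,该指标越高,端到端加速也越明显。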
强化学习(8篇)
【1】Impact of Markov Decision Process Design on Sim-to-Real Reinforcement Learning
标题:Markov决策过程设计对Sim-to-Real强化学习的影响
链接:https://arxiv.org/abs/2603.09427
作者:Tatjana Krau,Jorge Mandlmaier,Tobias Damm,Frieder Heieck
备注:Submitted at the 65th IEEE Conference on Decision and Control
摘要:强化学习(RL)在工业过程控制中表现出了强大的潜力,但在仿真中训练的策略在部署到物理硬件上时,往往会遇到显着的模拟与真实之间的差距。这项工作系统地分析了核心马尔可夫决策过程(MDP)的设计选择-状态组成,目标包含,奖励制定,终止标准和环境动态模型-如何影响这种转移。使用颜色混合任务,我们评估不同的MDP配置和混合动力学在模拟和现实世界的实验。我们在物理硬件上验证了我们的研究结果,表明基于物理的动态模型在严格的精度约束下实现了高达50%的真实世界成功,而简化模型完全失败。我们的研究结果为在工业过程控制中部署RL提供了实用的MDP设计指南。
摘要:Reinforcement Learning (RL) has demonstrated strong potential for industrial process control, yet policies trained in simulation often suffer from a significant sim-to-real gap when deployed on physical hardware. This work systematically analyzes how core Markov Decision Process (MDP) design choices -- state composition, target inclusion, reward formulation, termination criteria, and environment dynamics models -- affect this transfer. Using a color mixing task, we evaluate different MDP configurations and mixing dynamics across simulation and real-world experiments. We validate our findings on physical hardware, demonstrating that physics-based dynamics models achieve up to 50% real-world success under strict precision constraints where simplified models fail entirely. Our results provide practical MDP design guidelines for deploying RL in industrial process control.
【2】Reward-Zero: Language Embedding Driven Implicit Reward Mechanisms for Reinforcement Learning
标题:Reward-Zero:语言嵌入驱动的强化学习隐式奖励机制
链接:https://arxiv.org/abs/2603.09331
作者:Heng Zhang,Haddy Alchaer,Arash Ajoudani,Yu She
备注:under review
摘要:我们引入了Reward-Zero,这是一种通用的隐式奖励机制,可将自然语言任务描述转换为密集的、语义上有依据的进度信号,用于强化学习(RL)。Reward-Zero是一个简单而精巧的通用奖励函数,利用语言嵌入实现高效的RL训练。通过将任务规范的嵌入与从智能体交互经验中导出的嵌入进行比较,Reward-Zero产生连续的、语义对齐的完成感信号。这种奖励补充了稀疏或延迟的环境反馈,而无需特定于任务的工程。当集成到标准RL框架中时,它可以加速探索、稳定训练,并增强跨不同任务的泛化能力。实验上,使用Reward-Zero训练的智能体比传统方法(例如带有常见奖励塑形基线的PPO)收敛更快、最终成功率更高,并成功解决了一些手工设计奖励无法完成的复杂任务。此外,我们还开发了一个小型基准,用于通过语言嵌入评估任务执行过程中的完成感。这些结果表明,语言驱动的隐式奖励函数是通往更样本高效、可泛化且可扩展的具身智能体RL的一条实用路径。代码将在同行评审后发布。
摘要:We introduce Reward-Zero, a general-purpose implicit reward mechanism that transforms natural-language task descriptions into dense, semantically grounded progress signals for reinforcement learning (RL). Reward-Zero serves as a simple yet sophisticated universal reward function that leverages language embeddings for efficient RL training. By comparing the embedding of a task specification with embeddings derived from an agent's interaction experience, Reward-Zero produces a continuous, semantically aligned sense-of-completion signal. This reward supplements sparse or delayed environmental feedback without requiring task-specific engineering. When integrated into standard RL frameworks, it accelerates exploration, stabilizes training, and enhances generalization across diverse tasks. Empirically, agents trained with Reward-Zero converge faster and achieve higher final success rates than conventional methods such as PPO with common reward-shaping baselines, successfully solving tasks that hand-designed rewards could not in some complex tasks. In addition, we develop a mini benchmark for the evaluation of completion sense during task execution via language embeddings. These results highlight the promise of language-driven implicit reward functions as a practical path toward more sample-efficient, generalizable, and scalable RL for embodied agents. Code will be released after peer review.
【3】Strategically Robust Multi-Agent Reinforcement Learning with Linear Function Approximation
标题:具有线性函数逼近的策略稳健多智能体强化学习
链接:https://arxiv.org/abs/2603.09208
作者:Jake Gonzales,Max Horwitz,Eric Mazumdar,Lillian J. Ratliff
摘要:在一般和马尔可夫博弈中可证明高效且鲁棒的均衡计算仍然是多智能体强化学习的核心挑战。纳什均衡在一般情形下计算困难,并且由于均衡的多重性和对近似误差的敏感性而脆弱。我们研究了风险敏感的量子响应均衡(RQRE),它在有限理性和风险敏感性下产生唯一且光滑的解。我们提出了\texttt{RQRE-OVI},一种乐观值迭代算法,用于在大型或连续状态空间中以线性函数逼近计算RQRE。通过有限样本遗憾分析,我们建立了收敛性,并明确刻画了样本复杂度如何随理性参数和风险敏感性参数变化。遗憾界揭示了一个定量的权衡:提高理性会收紧遗憾,而风险敏感性会诱导正则化,从而提高稳定性和鲁棒性。这揭示了期望性能与鲁棒性之间的帕累托前沿,并在完全理性和风险中性的极限下恢复纳什均衡。我们进一步证明,与纳什不同,RQRE策略映射对估计收益是Lipschitz连续的,并且RQRE允许一种分布鲁棒优化的解释。实验上,我们证明\texttt{RQRE-OVI}在自博弈下实现了有竞争力的性能,同时与基于纳什的方法相比,在交叉对战下产生了明显更鲁棒的行为。这些结果表明,\texttt{RQRE-OVI}为均衡学习提供了一条有原则、可扩展且可调的路径,并具有更好的鲁棒性和泛化性。
摘要:Provably efficient and robust equilibrium computation in general-sum Markov games remains a core challenge in multi-agent reinforcement learning. Nash equilibrium is computationally intractable in general and brittle due to equilibrium multiplicity and sensitivity to approximation error. We study Risk-Sensitive Quantal Response Equilibrium (RQRE), which yields a unique, smooth solution under bounded rationality and risk sensitivity. We propose \texttt{RQRE-OVI}, an optimistic value iteration algorithm for computing RQRE with linear function approximation in large or continuous state spaces. Through finite-sample regret analysis, we establish convergence and explicitly characterize how sample complexity scales with rationality and risk-sensitivity parameters. The regret bounds reveal a quantitative tradeoff: increasing rationality tightens regret, while risk sensitivity induces regularization that enhances stability and robustness. This exposes a Pareto frontier between expected performance and robustness, with Nash recovered in the limit of perfect rationality and risk neutrality. We further show that the RQRE policy map is Lipschitz continuous in estimated payoffs, unlike Nash, and RQRE admits a distributionally robust optimization interpretation. Empirically, we demonstrate that \texttt{RQRE-OVI} achieves competitive performance under self-play while producing substantially more robust behavior under cross-play compared to Nash-based approaches. These results suggest \texttt{RQRE-OVI} offers a principled, scalable, and tunable path for equilibrium learning with improved robustness and generalization.
【4】RubiCap: Rubric-Guided Reinforcement Learning for Dense Image Captioning
标题:RubiCap:用于密集图像字幕的评分准则引导强化学习
链接:https://arxiv.org/abs/2603.09160
作者:Tzu-Heng Huang,Sirajul Salekin,Javier Movellan,Frederic Sala,Manjot Bilkhu
摘要:密集图像字幕对于视觉语言预训练和文本到图像生成中的跨模态对齐至关重要,但规模化获取专家级标注的成本过高。虽然通过强视觉语言模型(VLM)合成字幕是一种实用的替代方案,但有监督蒸馏通常产生有限的输出多样性和较弱的泛化能力。强化学习(RL)可以克服这些限制,但迄今为止其成功主要集中在依赖确定性检查器的可验证领域,而这在开放式字幕生成中并不可得。我们用RubiCap解决了这一瓶颈:这是一个新的RL框架,它从LLM撰写的评分准则(rubric)中获得细粒度的、样本特定的奖励信号。RubiCap首先组建一个多样化的候选字幕委员会,然后使用一个LLM评分准则撰写器来提取共识优势并诊断当前策略的不足。这些洞见被转换为明确的评估标准,使LLM评判器能够分解整体质量评估,并用结构化的多方面评估取代粗糙的标量奖励。在广泛的基准测试中,RubiCap在CapArena上实现了最高胜率,优于有监督蒸馏、先前的RL方法、人类专家标注和GPT-4V增强的输出。在CaptionQA上,它展示了卓越的用词效率:我们的7B模型与Qwen2.5-VL-32B-Instruct持平,我们的3B模型超过了对应的7B模型。值得注意的是,使用紧凑的RubiCap-3B作为字幕生成器训练出的预训练VLM,比使用专有模型字幕训练的VLM更强。
摘要:Dense image captioning is critical for cross-modal alignment in vision-language pretraining and text-to-image generation, but scaling expert-quality annotations is prohibitively expensive. While synthetic captioning via strong vision-language models (VLMs) is a practical alternative, supervised distillation often yields limited output diversity and weak generalization. Reinforcement learning (RL) could overcome these limitations, but its successes have so far been concentrated in verifiable domains that rely on deterministic checkers -- a luxury not available in open-ended captioning. We address this bottleneck with RubiCap, a novel RL framework that derives fine-grained, sample-specific reward signals from LLM-written rubrics. RubiCap first assembles a diverse committee of candidate captions, then employs an LLM rubric writer to extract consensus strengths and diagnose deficiencies in the current policy. These insights are converted into explicit evaluation criteria, enabling an LLM judge to decompose holistic quality assessment and replace coarse scalar rewards with structured, multi-faceted evaluations. Across extensive benchmarks, RubiCap achieves the highest win rates on CapArena, outperforming supervised distillation, prior RL methods, human-expert annotations, and GPT-4V-augmented outputs. On CaptionQA, it demonstrates superior word efficiency: our 7B model matches Qwen2.5-VL-32B-Instruct, and our 3B model surpasses its 7B counterpart. Remarkably, using the compact RubiCap-3B as a captioner produces stronger pretrained VLMs than those trained on captions from proprietary models.
【5】Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards
标题:解耦推理与置信度:在基于可验证奖励的强化学习中复活校准
链接:https://arxiv.org/abs/2603.09117
作者:Zhengzhao Ma,Xueru Wen,Boxi Cao,Yaojie Lu,Hongyu Lin,Jinglin Yang,Min He,Xianpei Han,Le Sun
备注:9 pages, 8 figures
摘要:基于可验证奖励的强化学习(RLVR)显著增强了大型语言模型(LLM)的推理能力,但严重受到校准退化的困扰,即模型对不正确的答案变得过度自信。以往的研究致力于将校准目标直接纳入现有优化目标。然而,我们的理论分析表明,在最大化策略准确率与最小化校准误差这两种优化之间存在根本性的梯度冲突。基于这一认识,我们提出了DCPO,一个简单而有效的框架,系统地解耦推理与校准目标。大量实验表明,DCPO不仅保持了与GRPO相当的准确率,还实现了最佳的校准性能,并大幅缓解了过度自信问题。我们的研究为更可靠的LLM部署提供了有价值的见解和实用的解决方案。
摘要:Reinforcement Learning from Verifiable Rewards (RLVR) significantly enhances large language models (LLMs) reasoning but severely suffers from calibration degeneration, where models become excessively over-confident in incorrect answers. Previous studies devote to directly incorporating calibration objective into existing optimization target. However, our theoretical analysis demonstrates that there exists a fundamental gradient conflict between the optimization for maximizing policy accuracy and minimizing calibration error. Building on this insight, we propose DCPO, a simple yet effective framework that systematically decouples reasoning and calibration objectives. Extensive experiments demonstrate that our DCPO not only preserves accuracy on par with GRPO but also achieves the best calibration performance and substantially mitigates the over-confidence issue. Our study provides valuable insights and practical solution for more reliable LLM deployment.
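上文摘要讨论的"校准误差"通常用分箱期望校准误差(ECE)来度量。下面是一个通用的ECE计算示意(分箱数与玩具数据均为假设,并非DCPO论文的评测代码),演示了过度自信如何推高ECE:

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """标准分箱ECE示意: 按置信度分箱,
    对每箱计算 |平均置信度 - 准确率| 的样本数加权和。"""
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # 置信度1.0落入最后一箱
        bins[idx].append((c, y))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(avg_conf - acc)
    return ece

# 过度自信的例子: 置信度0.95但准确率只有0.5
conf = [0.95] * 10
corr = [1, 0] * 5
ece = expected_calibration_error(conf, corr)
```

完美校准的模型ECE为0;上例中置信度与准确率相差0.45,ECE即为0.45,正对应摘要所述RLVR训练后的过度自信现象。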
【6】Optimizing Reinforcement Learning Training over Digital Twin Enabled Multi-fidelity Networks
标题:通过数字双胞胎支持的多保真网络优化强化学习训练
链接:https://arxiv.org/abs/2603.08931
作者:Hanzhi Yu,Hasan Farooq,Julien Forgeat,Shruti Bothe,Kristijonas Cyras,Md Moin Uddin Chowdhury,Mingzhe Chen
摘要:在本文中,我们研究了一种新颖的数字网络孪生(DNT)辅助的深度学习(DL)模型训练框架。具体而言,我们考虑一个物理网络,其中基站(BS)使用多根天线服务多个移动用户,以及一个作为该物理网络虚拟表示的DNT。BS必须调整其天线下倾角以优化所有用户的数据速率。由于用户移动性,BS可能无法准确跟踪无线信道和用户移动等网络动态。因此,使用强化学习(RL)方法来动态调整天线下倾角。为了训练RL,我们可以使用从物理网络和DNT收集的数据。与从DNT收集的数据相比,从物理网络收集的数据更准确,但会产生更多通信开销。因此,有必要确定从物理网络和DNT收集数据的比例,以改善RL模型的训练。我们将此问题表述为一个优化问题,其目标是联合优化下倾角调整策略和数据收集策略,在约束从物理网络收集数据所引入的时间延迟的同时,最大化所有用户的数据速率。为解决该问题,我们提出了一个集成鲁棒对抗损失与近端策略优化(PPO)的分层RL框架。仿真结果表明,与使用原始PPO作为第一级RL的分层RL相比,以及与在第一级使用鲁棒RL并随机选择数据收集比例的基线相比,我们提出的方法将物理网络数据收集延迟分别降低了高达28.01%和1倍。
摘要:In this paper, we investigate a novel digital network twin (DNT) assisted deep learning (DL) model training framework. In particular, we consider a physical network where a base station (BS) uses several antennas to serve multiple mobile users, and a DNT that is a virtual representation of the physical network. The BS must adjust its antenna tilt angles to optimize the data rates of all users. Due to user mobility, the BS may not be able to accurately track network dynamics such as wireless channels and user mobilities. Hence, a reinforcement learning (RL) approach is used to dynamically adjust the antenna tilt angles. To train the RL, we can use data collected from the physical network and the DNT. The data collected from the physical network is more accurate but incurs more communication overhead compared to the data collected from the DNT. Therefore, it is necessary to determine the ratio of data collected from the physical network and the DNT to improve the training of the RL model. We formulate this problem as an optimization problem whose goal is to jointly optimize the tilt angle adjustment policy and the data collection strategy, aiming to maximize the data rates of all users while constraining the time delay introduced by collecting data from the physical network. To solve this problem, we propose a hierarchical RL framework that integrates robust adversarial loss and proximal policy optimization (PPO). Simulation results show that our proposed method reduces the physical network data collection delay by up to 28.01% and 1x compared to a hierarchical RL that uses vanilla PPO as the first level RL, and the baseline that uses robust-RL at the first level and selects the data collection ratio randomly.
【7】Multi-level meta-reinforcement learning with skill-based curriculum
标题:基于技能的课程的多层次元强化学习
链接:https://arxiv.org/abs/2603.08773
作者:Sichen Yang,Mauro Maggioni
备注:78 pages, 12 figures
摘要:我们考虑具有自然多层次结构的顺序决策问题,其中子任务被组装起来以完成复杂目标。系统地推断和利用层次结构一直是一个长期挑战;我们描述了一种用于反复压缩马尔可夫决策过程(MDP)的高效多层次过程,其中某一层次上的参数化策略族在更高层次的压缩MDP中被视为单个动作,同时保留原始MDP的语义含义和结构,并模仿处理复杂MDP的自然逻辑。更高层次的MDP本身是随机性更小的独立MDP,可以用现有算法求解。作为副产品,空间或时间尺度可以在更高层次上被粗化,从而更高效地找到长期最优策略。该过程给出的多层次表示将子任务相互解耦,通常大大减少不必要的随机性和策略搜索空间,从而在求解MDP时减少迭代和计算。这项工作的第二个基本方面是,这些多层次分解加上将策略分解为嵌入(特定于问题)和技能(包括高阶函数),为跨不同问题和不同层次的技能迁移带来了新的机会。整个过程被置于课程学习框架内,其中教师以逐渐增加任务难度的方式组织学生智能体的学习过程,并促进在课程内部和跨课程的MDP与层次之间的迁移。在温和的假设下,可以保证该框架的一致性及其收益。我们在多个例子中演示了抽象、可迁移性和课程学习,包括MazeBase+(MazeBase示例的一个更复杂的变体)。
摘要:We consider problems in sequential decision making with natural multi-level structure, where sub-tasks are assembled together to accomplish complex goals. Systematically inferring and leveraging hierarchical structure has remained a longstanding challenge; we describe an efficient multi-level procedure for repeatedly compressing Markov decision processes (MDPs), wherein a parametric family of policies at one level is treated as single actions in the compressed MDPs at higher levels, while preserving the semantic meanings and structure of the original MDP, and mimicking the natural logic to address a complex MDP. Higher-level MDPs are themselves independent MDPs with less stochasticity, and may be solved using existing algorithms. As a byproduct, spatial or temporal scales may be coarsened at higher levels, making it more efficient to find long-term optimal policies. The multi-level representation delivered by this procedure decouples sub-tasks from each other and usually greatly reduces unnecessary stochasticity and the policy search space, leading to fewer iterations and computations when solving the MDPs. A second fundamental aspect of this work is that these multi-level decompositions plus the factorization of policies into embeddings (problem-specific) and skills (including higher-order functions) yield new transfer opportunities of skills across different problems and different levels. This whole process is framed within curriculum learning, wherein a teacher organizes the student agent's learning process in a way that gradually increases the difficulty of tasks and promotes transfer across MDPs and levels within and across curricula. The consistency of this framework and its benefits can be guaranteed under mild assumptions. We demonstrate abstraction, transferability, and curriculum learning in examples, including MazeBase+, a more complex variant of the MazeBase example.
【8】A Survey of Reinforcement Learning For Economics
标题:经济学强化学习概览
链接:https://arxiv.org/abs/2603.08956
作者:Pranjal Rawat
摘要:这项调查(重新)向经济学家介绍了强化学习方法。维数灾难限制了精确动态规划的有效应用,迫使我们依赖于适当的“小”问题或我们将“大”问题转化为更小问题的能力。虽然这种简化对于许多经典应用已经足够了,但越来越多的经济模型抵制这种简化。强化学习算法提供了动态编程的自然的、基于样本的扩展,将易处理性扩展到具有高维状态、连续动作和策略交互的问题。我回顾了经典规划理论与现代学习算法的联系,并通过定价、库存控制、战略博弈和偏好诱导等方面的模拟实例展示了它们的机制。我还研究了这些算法的实际漏洞,注意到它们的脆弱性,样本效率低下,对超参数的敏感性,以及在表格设置之外缺乏全局收敛保证。强化学习的成功仍然受到这些约束的严格限制,以及对精确模拟器的依赖。在经济结构的指导下,强化学习提供了一个非常灵活的框架。它是计算经济学家工具包中一个不完美但有前途的补充。一项配套调查(Rust and Rawat,2026 b)涵盖了从观察到的行为推断偏好的逆问题。
摘要:This survey (re)introduces reinforcement learning methods to economists. The curse of dimensionality limits how far exact dynamic programming can be effectively applied, forcing us to rely on suitably "small" problems or our ability to convert "big" problems into smaller ones. While this reduction has been sufficient for many classical applications, a growing class of economic models resists such reduction. Reinforcement learning algorithms offer a natural, sample-based extension of dynamic programming, extending tractability to problems with high-dimensional states, continuous actions, and strategic interactions. I review the theory connecting classical planning to modern learning algorithms and demonstrate their mechanics through simulated examples in pricing, inventory control, strategic games, and preference elicitation. I also examine the practical vulnerabilities of these algorithms, noting their brittleness, sample inefficiency, sensitivity to hyperparameters, and the absence of global convergence guarantees outside of tabular settings. The successes of reinforcement learning remain strictly bounded by these constraints, as well as a reliance on accurate simulators. When guided by economic structure, reinforcement learning provides a remarkably flexible framework. It stands as an imperfect, but promising, addition to the computational economist's toolkit. A companion survey (Rust and Rawat, 2026b) covers the inverse problem of inferring preferences from observed behavior.
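作为上文综述所述"动态规划的基于样本的扩展"的一个最小示例,下面在一个玩具链式MDP上运行表格型Q学习(环境与超参数均为本文假设的演示设定):

```python
import random

def greedy(qrow, rng):
    """随机打破平局的贪心动作选择。"""
    m = max(qrow)
    return rng.choice([i for i, q in enumerate(qrow) if q == m])

def q_learning(n_states=5, episodes=2000, alpha=0.1,
               gamma=0.9, eps=0.2, seed=0):
    """表格型Q学习示意: 玩具链式MDP, 动作1向右、动作0向左,
    到达最右端状态获得奖励1并结束回合。"""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(20):                      # 每回合最多20步
            a = rng.randrange(2) if rng.random() < eps else greedy(Q[s], rng)
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])  # TD更新
            s = s2
            if r == 1.0:
                break
    return Q

Q = q_learning()
policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(5)]
```

与精确值迭代不同,该更新只使用采样到的转移,不需要已知的转移概率,这正是综述强调的"基于样本"的含义;代价则是综述同样指出的样本低效与对超参数的敏感性。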
分层学习(1篇)
【1】Learning the Hierarchical Organization in Brain Network for Brain Disorder Diagnosis
标题:学习脑神经网络的层次结构用于脑疾病诊断
链接:https://arxiv.org/abs/2603.09606
作者:Jingfeng Tang,Peng Cao,Guangqi Wen,Jinzhu Yang,Xiaoli Liu,Osmar R. Zaiane
摘要:基于功能磁共振成像(fMRI)的脑网络分析是诊断脑疾病的关键。现有方法通常依赖于预定义的功能子网络来构建子网络关联。然而,我们发现了许多具有高Pearson相关性的跨网络交互模式,是这种严格的、基于先验的组织方式无法捕获的。为克服这一局限,我们提出脑层次组织学习(BrainHO),基于脑网络的内在特征而非预定义的子网络标签,来学习其固有的层次依赖关系。具体来说,我们设计了一种层次注意力机制,允许模型将节点聚合成层次组织,有效捕捉子图层面的复杂连接模式。为确保组织的多样性、互补性和稳定性,我们引入正交性约束损失以及层次一致性约束策略,利用高层图语义来细化节点级特征。在公开的ABIDE和REST-meta-MDD数据集上的大量实验表明,BrainHO不仅实现了最先进的分类性能,还通过精确定位疾病相关子网络,发现了可解释且具有临床意义的生物标志物。
摘要:Brain network analysis based on functional Magnetic Resonance Imaging (fMRI) is pivotal for diagnosing brain disorders. Existing approaches typically rely on predefined functional sub-networks to construct sub-network associations. However, we identified many cross-network interaction patterns with high Pearson correlations that this strict, prior-based organization fails to capture. To overcome this limitation, we propose the Brain Hierarchical Organization Learning (BrainHO) to learn inherently hierarchical brain network dependencies based on their intrinsic features rather than predefined sub-network labels. Specifically, we design a hierarchical attention mechanism that allows the model to aggregate nodes into a hierarchical organization, effectively capturing intricate connectivity patterns at the subgraph level. To ensure diverse, complementary, and stable organizations, we incorporate an orthogonality constraint loss, alongside a hierarchical consistency constraint strategy, to refine node-level features using high-level graph semantics. Extensive experiments on the publicly available ABIDE and REST-meta-MDD datasets demonstrate that BrainHO not only achieves state-of-the-art classification performance but also uncovers interpretable, clinically significant biomarkers by precisely localizing disease-related sub-networks.
医学相关(4篇)
【1】SignalMC-MED: A Multimodal Benchmark for Evaluating Biosignal Foundation Models on Single-Lead ECG and PPG
标题:SignalMC-MED:在单导联ECG和PPG上评估生物信号基础模型的多模态基准
链接:https://arxiv.org/abs/2603.09940
作者:Fredrik K. Gustafsson,Xiao Gu,Mattia Carletti,Patitapaban Palo,David W. Eyre,David A. Clifton
备注:Code is available at https://github.com/fregu856/SignalMC-MED
摘要:最近的生物信号基础模型(FM)在不同的临床预测任务中表现出了良好的性能,但对长时间多模态数据的系统评价仍然有限。我们介绍SignalMC-MED,这是一种用于评估同步单导联心电图(ECG)和光电容积描记图(PPG)数据的生物信号FM的基准。SignalMC-MED源自MC-MED数据集,包括22,256次访视,10分钟重叠ECG和PPG信号,并包括20项临床相关任务,涵盖人口统计学预测,急诊室处置,实验室值回归和既往ICD-10诊断检测。使用该基准,我们对仅ECG、仅PPG和ECG + PPG设置中的代表性时间序列和生物信号FM进行了系统评价。我们发现,特定领域的生物信号FM始终优于一般的时间序列模型,多模态ECG + PPG融合产生了稳健的改进,比单峰输入。此外,使用完整的10分钟信号始终优于较短的片段,并且较大的模型变体不会可靠地优于较小的模型变体。手工制作的ECG域功能提供了一个强大的基线,并在与学习的FM表示相结合时提供补充价值。总之,这些结果将SignalMC-MED确立为标准化基准,并为评估和部署生物信号FM提供了实用指导。
摘要:Recent biosignal foundation models (FMs) have demonstrated promising performance across diverse clinical prediction tasks, yet systematic evaluation on long-duration multimodal data remains limited. We introduce SignalMC-MED, a benchmark for evaluating biosignal FMs on synchronized single-lead electrocardiogram (ECG) and photoplethysmogram (PPG) data. Derived from the MC-MED dataset, SignalMC-MED comprises 22,256 visits with 10-minute overlapping ECG and PPG signals, and includes 20 clinically relevant tasks spanning prediction of demographics, emergency department disposition, laboratory value regression, and detection of prior ICD-10 diagnoses. Using this benchmark, we perform a systematic evaluation of representative time-series and biosignal FMs across ECG-only, PPG-only, and ECG + PPG settings. We find that domain-specific biosignal FMs consistently outperform general time-series models, and that multimodal ECG + PPG fusion yields robust improvements over unimodal inputs. Moreover, using the full 10-minute signal consistently outperforms shorter segments, and larger model variants do not reliably outperform smaller ones. Hand-crafted ECG domain features provide a strong baseline and offer complementary value when combined with learned FM representations. Together, these results establish SignalMC-MED as a standardized benchmark and provide practical guidance for evaluating and deploying biosignal FMs.
【2】Democratising Clinical AI through Dataset Condensation for Classical Clinical Models
标题:通过面向经典临床模型的数据集压缩实现临床人工智能民主化
链接:https://arxiv.org/abs/2603.09356
作者:Anshul Thakur,Soheila Molaei,Pafue Christy Nganjimi,Joshua Fieggen,Andrew A. S. Soltan,Danielle Belgrave,Lei Clifton,David A. Clifton
备注:22 pages, 5 figures, 5 tables
摘要:数据集压缩(DC)学习一个紧凑的合成数据集,使模型能够匹配全数据训练的性能,优先考虑效用而不是分布保真度。虽然DC通常是为了提高计算效率而开发的,但它也为医疗数据民主化带来了希望,特别是当与差异隐私相结合时,它允许合成数据作为真实记录的安全替代品。然而,现有的DC方法依赖于可微神经网络,限制了它们与广泛使用的临床模型(如决策树和Cox回归)的兼容性。我们解决这个差距使用差分私有,零阶优化框架,DC扩展到不可微的模型,只使用功能评估。六个数据集(包括分类和生存任务)的实证结果表明,所提出的方法产生的压缩数据集保留了模型效用,同时提供了有效的差分隐私保证-使模型无关的数据共享用于临床预测任务,而不会暴露敏感的患者信息。
摘要:Dataset condensation (DC) learns a compact synthetic dataset that enables models to match the performance of full-data training, prioritising utility over distributional fidelity. While typically explored for computational efficiency, DC also holds promise for healthcare data democratisation, especially when paired with differential privacy, allowing synthetic data to serve as a safe alternative to real records. However, existing DC methods rely on differentiable neural networks, limiting their compatibility with widely used clinical models such as decision trees and Cox regression. We address this gap using a differentially private, zero-order optimisation framework that extends DC to non-differentiable models using only function evaluations. Empirical results across six datasets, including both classification and survival tasks, show that the proposed method produces condensed datasets that preserve model utility while providing effective differential privacy guarantees - enabling model-agnostic data sharing for clinical prediction tasks without exposing sensitive patient information.
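上文摘要中"仅使用函数求值"的零阶优化可以用高斯平滑两点梯度估计来示意(以下为通用示意,未包含论文的差分隐私机制;mu与采样数均为假设值):

```python
import random

def zo_gradient_estimate(f, x, mu=1e-2, n_samples=200, seed=0):
    """零阶(仅函数值)梯度估计: 高斯平滑两点法
    g_i ≈ (1/n) * sum_k [(f(x + mu*u_k) - f(x - mu*u_k)) / (2*mu)] * u_k[i]。"""
    rng = random.Random(seed)
    d = len(x)
    g = [0.0] * d
    for _ in range(n_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        xp = [xi + mu * ui for xi, ui in zip(x, u)]
        xm = [xi - mu * ui for xi, ui in zip(x, u)]
        scale = (f(xp) - f(xm)) / (2.0 * mu)   # 方向导数的两点差分
        for i in range(d):
            g[i] += scale * u[i] / n_samples
    return g

# 在 f(x) = x0^2 + x1^2 上验证: (1,2)处的真实梯度为(2,4)
g = zo_gradient_estimate(lambda z: z[0] ** 2 + z[1] ** 2, [1.0, 2.0])
```

由于只需要对目标函数求值,这类估计器可以驱动不可微模型(如决策树训练管线)的数据集压缩,这正是摘要所述框架得以绕开可微性要求的原因。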
【3】From Days to Minutes: An Autonomous AI Agent Achieves Reliable Clinical Triage in Remote Patient Monitoring
标题:从几天到几分钟:自主人工智能代理在远程患者监护中实现可靠的临床分诊
链接:https://arxiv.org/abs/2603.09052
作者:Seunghwan Kim,Tiffany H. Kung,Heena Verma,Dilan Edirisinghe,Kaveh Sedehi,Johanna Alvarez,Diane Shilling,Audra Lisa Doyle,Ajit Chary,William Borden,Ming Jack Po
备注:46 pages, 11 figures, Abstract in metadata is shortened to meet arXiv character limits; see PDF for full version
摘要:背景资料:远程患者监护(RPM)产生大量数据,但由于数据量超出临床工作人员的承受能力,标志性试验(Tele-HF、BEAT-HF)失败。虽然TIM-HF2显示24/7医生主导的监测将死亡率降低了30%,但这种模式仍然非常昂贵且不可扩展。 研究方法:我们开发了Sentinel,这是一种使用模型上下文协议(MCP)的自主AI代理,通过21种临床工具和多步推理对RPM生命体征进行上下文分诊。评价包括:(1)自我一致性(100个读数x 5次运行);(2)与基于规则的阈值进行比较;以及(3)使用连接矩阵设计对6名临床医生(3名医生,3名NP)进行验证。留一法(LOO)分析将该代理与各个临床医生进行比较;严重的过度分诊病例进行了独立的医生裁定。 结果如下:针对人类多数投票标准(N = 467),该代理实现了95.8%的紧急灵敏度,以及对所有可操作警报88.5%的灵敏度(85.7%的特异性)。四级精确准确率为69.4%(二次加权Kappa = 0.778);95.9%的分类在一个严重程度级别以内。在LOO分析中,该代理在紧急敏感性(97.5% vs. 60.0%总体)和可操作敏感性(90.9% vs. 69.5%)方面优于每位临床医生。虽然分歧倾向于过度分诊(22.5%),但对严重差距(>= 2级)的独立裁定在88-94%的病例中验证了代理的升级决策;共识解决验证了100%。该代理表现出近乎完美的自我一致性(Kappa = 0.850)。每次分诊的中位成本为0.34美元。 结论:Sentinel分诊RPM生命体征的灵敏度超过了个体临床医生。通过自动化系统性背景合成,Sentinel解决了先前RPM试验的核心限制,为已被证明可降低死亡率的密集监测提供了一条可扩展的路径,同时保持临床上可辩护的过度分诊水平。
摘要:Background: Remote patient monitoring (RPM) generates vast data, yet landmark trials (Tele-HF, BEAT-HF) failed because data volume overwhelmed clinical staff. While TIM-HF2 showed 24/7 physician-led monitoring reduces mortality by 30%, this model remains prohibitively expensive and unscalable. Methods: We developed Sentinel, an autonomous AI agent using Model Context Protocol (MCP) for contextual triage of RPM vitals via 21 clinical tools and multi-step reasoning. Evaluation included: (1) self-consistency (100 readings x 5 runs); (2) comparison against rule-based thresholds; and (3) validation against 6 clinicians (3 physicians, 3 NPs) using a connected matrix design. A leave-one-out (LOO) analysis compared the agent against individual clinicians; severe overtriage cases underwent independent physician adjudication. Results: Against a human majority-vote standard (N=467), the agent achieved 95.8% emergency sensitivity and 88.5% sensitivity for all actionable alerts (85.7% specificity). Four-level exact accuracy was 69.4% (quadratic-weighted kappa=0.778); 95.9% of classifications were within one severity level. In LOO analysis, the agent outperformed every clinician in emergency sensitivity (97.5% vs. 60.0% aggregate) and actionable sensitivity (90.9% vs. 69.5%). While disagreements skewed toward overtriage (22.5%), independent adjudication of severe gaps (>=2 levels) validated agent escalation in 88-94% of cases; consensus resolution validated 100%. The agent showed near-perfect self-consistency (kappa=0.850). Median cost was $0.34/triage. Conclusions: Sentinel triages RPM vitals with sensitivity exceeding individual clinicians. By automating systematic context synthesis, Sentinel addresses the core limitation of prior RPM trials, offering a scalable path toward the intensive monitoring shown to reduce mortality while maintaining a clinically defensible overtriage profile.
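The quadratic-weighted kappa reported above penalises disagreements by the squared distance between severity levels. A minimal implementation for a four-level triage scale (the level encoding 0-3 is an assumption):

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_levels=4):
    """Cohen's kappa with quadratic weights over ordinal levels 0..n_levels-1."""
    O = np.zeros((n_levels, n_levels))
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1                                      # observed agreement matrix
    i, j = np.meshgrid(range(n_levels), range(n_levels), indexing="ij")
    W = (i - j) ** 2 / (n_levels - 1) ** 2                # quadratic disagreement penalty
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()  # chance-expected matrix
    return 1.0 - (W * O).sum() / (W * E).sum()

# Perfect agreement gives kappa = 1; a one-level miss is penalised mildly,
# while a three-level miss costs nine times as much.
assert quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3]) == 1.0
```

This is why "95.9% within one severity level" and the kappa of 0.778 are consistent: near misses barely move the statistic.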
【4】MAPLE: Elevating Medical Reasoning from Statistical Consensus to Process-Led Alignment
标题:MAPLE:将医学推理从统计共识提升到流程主导的一致
链接:https://arxiv.org/abs/2603.08987
作者:Kailong Fan,Anqi Pu,Yichen Wu,Wanhua Li,Yicong Li,Hanspeter Pfister,Huafeng Liu,Xiang Li,Quanzheng Li,Ning Guo
摘要:医学大型语言模型的最新进展已经探索了测试时强化学习(TTRL)来增强推理。然而,标准TTRL通常依赖于多数表决(MV)作为启发式监督信号,这在复杂的医疗场景中可能是不可靠的,其中最频繁的推理路径不一定是临床正确的。在这项工作中,我们提出了一种新的统一训练范式,将医疗过程奖励模型与TTRL相结合,以弥合测试时间缩放(TTS)和参数模型优化之间的差距。具体来说,我们通过用基于Med-RPM的细粒度、专家对齐的监督范式取代传统的MV,对TTRL框架加以改进。这种集成确保了强化学习由医学正确性而不仅仅是共识来指导,有效地将基于搜索的智能蒸馏到模型的参数记忆中。对四个不同基准的广泛评估表明,我们的方法始终显著优于目前的TTRL和独立的PRM选择。我们的研究结果表明,从随机启发式过渡到结构化的逐步奖励,对于开发可靠且可扩展的医疗AI系统至关重要。
摘要:Recent advances in medical large language models have explored Test-Time Reinforcement Learning (TTRL) to enhance reasoning. However, standard TTRL often relies on majority voting (MV) as a heuristic supervision signal, which can be unreliable in complex medical scenarios where the most frequent reasoning path is not necessarily the clinically correct one. In this work, we propose a novel and unified training paradigm that integrates medical process reward models with TTRL to bridge the gap between test-time scaling (TTS) and parametric model optimization. Specifically, we advance the TTRL framework by replacing the conventional MV with a fine-grained, expert-aligned supervision paradigm using Med-RPM. This integration ensures that reinforcement learning is guided by medical correctness rather than mere consensus, effectively distilling search-based intelligence into the model's parametric memory. Extensive evaluations on four different benchmarks have demonstrated that our developed method consistently and significantly outperforms current TTRL and standalone PRM selection. Our findings establish that transitioning from stochastic heuristics to structured, step-wise rewards is essential for developing reliable and scalable medical AI systems.
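The shift described above, from majority-vote pseudo-labels to process-reward-guided supervision, amounts to changing the selection rule over sampled answers. A toy sketch (the reward table standing in for Med-RPM is hypothetical):

```python
from collections import Counter

def majority_vote(answers):
    """Standard TTRL pseudo-label: the most frequent sampled answer."""
    return Counter(answers).most_common(1)[0][0]

def reward_guided(answers, reward_fn):
    """Process-reward selection: the answer whose trace scores highest
    under a process reward model (a stand-in for Med-RPM here)."""
    return max(answers, key=reward_fn)

# Toy case: the most frequent answer ("B") is not the one the reward
# model judges clinically correct ("A").
samples = ["B", "B", "A", "B", "A"]
toy_reward = {"A": 0.9, "B": 0.4}.get
assert majority_vote(samples) == "B"
assert reward_guided(samples, toy_reward) == "A"
```

The reinforcement signal then rewards rollouts matching the reward-selected answer rather than the modal one, which is exactly where consensus and correctness can diverge.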
蒸馏|知识提取(3篇)
【1】A Multi-Prototype-Guided Federated Knowledge Distillation Approach in AI-RAN Enabled Multi-Access Edge Computing System
标题:支持AI-RAN的多接入边缘计算系统中的多原型引导联邦知识蒸馏方法
链接:https://arxiv.org/abs/2603.09727
作者:Luyao Zou,Hayoung Oh,Chu Myaet Thwal,Apurba Adhikary,Seohyeon Hong,Zhu Han
备注:15 pages, 6 figures
摘要:随着无线网络的发展,多接入边缘计算(MEC)和人工智能(AI)-本地无线接入网(RAN)引起了人们的极大关注。特别是,AI-RAN和MEC的集成被设想为改变网络效率和响应能力。因此,研究支持AI-RAN的MEC系统具有重要的意义。联邦学习(Federated Learning,FL)是一种新兴的、有前景的面向AI-RAN使能MEC系统的学习方法,它可以在不泄露原始数据的情况下,由边缘设备协同训练全局模型。然而,传统的FL在处理非独立同分布(non-IID)数据时遇到了挑战。通过平均每个类的嵌入向量获得的单个原型可以在FL中使用以处理数据异构性问题。然而,由于平均操作,这可能导致有用信息的丢失。因此,在本文中,提出了多原型引导的联邦知识蒸馏(MP-FedKD)方法。特别地,自我知识蒸馏被集成到FL中以处理非IID问题。为了解决基于单一原型的策略所带来的信息丢失问题,采用了多原型策略,提出了一种条件层次凝聚聚类(CHAC)方法和一种原型对齐方案。此外,我们为每个本地客户端设计了一个新的损失函数(称为LEMGP损失),其重点关注全局原型和本地嵌入之间的关系。在具有各种非IID设置的多个数据集上进行的广泛实验表明,所提出的MP-FedKD方法在准确度、平均准确度和误差(RMSE和MAE)方面优于所考虑的最先进的基线。
摘要:With the development of wireless network, Multi-Access Edge Computing (MEC) and Artificial Intelligence (AI)-native Radio Access Network (RAN) have attracted significant attention. Particularly, the integration of AI-RAN and MEC is envisioned to transform network efficiency and responsiveness. Therefore, it is valuable to investigate AI-RAN enabled MEC system. Federated learning (FL) nowadays is emerging as a promising approach for AI-RAN enabled MEC system, in which edge devices are enabled to train a global model cooperatively without revealing their raw data. However, conventional FL encounters the challenge in processing the non-independent and identically distributed (non-IID) data. Single prototype obtained by averaging the embedding vectors per class can be employed in FL to handle the data heterogeneity issue. Nevertheless, this may result in the loss of useful information owing to the average operation. Therefore, in this paper, a multi-prototype-guided federated knowledge distillation (MP-FedKD) approach is proposed. Particularly, self-knowledge distillation is integrated into FL to deal with the non-IID issue. To cope with the problem of information loss caused by single prototype-based strategy, multi-prototype strategy is adopted, where we present a conditional hierarchical agglomerative clustering (CHAC) approach and a prototype alignment scheme. Additionally, we design a novel loss function (called LEMGP loss) for each local client, where the relationship between global prototypes and local embedding will be focused. Extensive experiments over multiple datasets with various non-IID settings showcase that the proposed MP-FedKD approach outperforms the considered state-of-the-art baselines regarding accuracy, average accuracy and errors (RMSE and MAE).
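The single- versus multi-prototype distinction is easy to see in code: a per-class mean collapses multi-modal classes, while clustering keeps one prototype per mode. Below, a tiny flat k-means stands in for the paper's CHAC clustering, which is hierarchical and conditional rather than flat:

```python
import numpy as np

def single_prototype(embeddings):
    """Single prototype: the per-class mean, which can blur multi-modal classes."""
    return embeddings.mean(axis=0)

def multi_prototypes(embeddings, k=2, n_iter=20, rng=0):
    """Several prototypes per class via a tiny k-means (illustrative
    stand-in for the paper's CHAC clustering)."""
    rng = np.random.default_rng(rng)
    centers = embeddings[rng.choice(len(embeddings), size=k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(embeddings[:, None] - centers[None], axis=-1)
        labels = d.argmin(axis=1)                 # assign to nearest center
        for c in range(k):
            if (labels == c).any():
                centers[c] = embeddings[labels == c].mean(axis=0)
    return centers

# A class with two modes at (-1,-1) and (+1,+1): the single mean lands
# between them, while two prototypes recover one centre per mode.
emb = np.vstack([np.full((5, 2), -1.0), np.full((5, 2), 1.0)])
```

`single_prototype(emb)` is the origin, representative of neither mode; `multi_prototypes(emb, k=2)` recovers both, which is the information the averaging operation would otherwise discard.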
【2】SCDP: Learning Humanoid Locomotion from Partial Observations via Mixed-Observation Distillation
标题:SCDP:通过混合观察蒸馏从部分观察中学习类人机器人运动
链接:https://arxiv.org/abs/2603.09574
作者:Milo Carroll,Tianhu Peng,Lingfan Bao,Chengxu Zhou,Zhibin Li
备注:6 pages, 8 figures, 5 tables, iRos
摘要:将人形机器人运动控制从离线数据集蒸馏为可部署的策略仍然是一个挑战,因为现有方法依赖于特权全身状态,需要复杂且通常不可靠的状态估计。我们提出传感器条件扩散策略(SCDP),仅使用机载传感器即可实现人形机器人运动,无需显式状态估计。SCDP通过混合观测训练将感知与监督解耦:扩散模型以传感器历史为条件,同时被监督以预测特权的未来状态-动作轨迹,迫使模型在部分可观测性下推断运动动态。我们进一步开发了受限去噪、上下文分布对齐和上下文感知注意力掩蔽,以鼓励模型内的隐式状态估计,并防止训练-部署不匹配。我们在速度指令运动和运动参考跟踪任务上验证了SCDP。在仿真中,SCDP在速度控制上实现了近乎完美的成功率(99-100%),在AMASS测试集上实现了93%的跟踪成功率,在仅使用机载传感器的情况下与特权基线相当。最后,我们在50 Hz的真实G1人形机器人上部署了经过训练的策略,在没有外部传感或状态估计的情况下展示了强大的真实机器人运动。
摘要:Distilling humanoid locomotion control from offline datasets into deployable policies remains a challenge, as existing methods rely on privileged full-body states that require complex and often unreliable state estimation. We present Sensor-Conditioned Diffusion Policies (SCDP), which enable humanoid locomotion using only onboard sensors, eliminating the need for explicit state estimation. SCDP decouples sensing from supervision through mixed-observation training: the diffusion model conditions on sensor histories while being supervised to predict privileged future state-action trajectories, forcing the model to infer motion dynamics under partial observability. We further develop restricted denoising, context distribution alignment, and context-aware attention masking to encourage implicit state estimation within the model and to prevent train-deploy mismatch. We validate SCDP on velocity-commanded locomotion and motion reference tracking tasks. In simulation, SCDP achieves near-perfect success on velocity control (99-100%) and a 93% tracking success rate on the AMASS test set, performing comparably to privileged baselines while using only onboard sensors. Finally, we deploy the trained policy on a real G1 humanoid at 50 Hz, demonstrating robust real-robot locomotion without external sensing or state estimation.
【3】SPREAD: Subspace Representation Distillation for Lifelong Imitation Learning
标题:SPREAD:用于终身模仿学习的子空间表示蒸馏
链接:https://arxiv.org/abs/2603.08763
作者:Kaushik Roy,Giovanni D'urso,Nicholas Lawrance,Brendan Tidd,Peyman Moghadam
备注:IEEE International Conference on Robotics & Automation (ICRA) 2026
摘要:终身模仿学习(LIL)的一个关键挑战是使智能体能够从专家演示中获得新技能,同时保留先前的知识。这需要保留低维流形和几何结构,这些结构是顺序学习中任务表示的基础。现有的蒸馏方法依赖于原始特征空间中的L2范数特征匹配,对噪声和高维变化敏感,通常无法保留固有的任务流形。为了解决这个问题,我们引入了SPREAD,一个几何保持框架,采用奇异值分解(SVD)在低秩子空间内对齐跨任务的策略表示。这种对齐保持了多模态特征的底层几何结构,促进了稳定的迁移、鲁棒性和泛化。此外,我们提出了一个置信度引导的蒸馏策略,将Kullback-Leibler散度损失限制在前M个最有信心的动作样本上,强调可靠的模式并提高优化稳定性。在LIBERO(终身模仿学习基准)上的实验表明,SPREAD大大改进了知识迁移,减轻了灾难性遗忘,并实现了最先进的性能。
摘要:A key challenge in lifelong imitation learning (LIL) is enabling agents to acquire new skills from expert demonstrations while retaining prior knowledge. This requires preserving the low-dimensional manifolds and geometric structures that underlie task representations across sequential learning. Existing distillation methods, which rely on L2-norm feature matching in raw feature space, are sensitive to noise and high-dimensional variability, often failing to preserve intrinsic task manifolds. To address this, we introduce SPREAD, a geometry-preserving framework that employs singular value decomposition (SVD) to align policy representations across tasks within low-rank subspaces. This alignment maintains the underlying geometry of multimodal features, facilitating stable transfer, robustness, and generalization. Additionally, we propose a confidence-guided distillation strategy that applies a Kullback-Leibler divergence loss restricted to the top-M most confident action samples, emphasizing reliable modes and improving optimization stability. Experiments on the LIBERO, lifelong imitation learning benchmark, show that SPREAD substantially improves knowledge transfer, mitigates catastrophic forgetting, and achieves state-of-the-art performance.
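The SVD-based subspace idea can be illustrated in a few lines: extract the dominant low-rank subspace of a feature matrix and compare subspaces with a projection-metric distance. The paper's actual alignment loss and rank choice are not given in the abstract; `r=3` below is arbitrary.

```python
import numpy as np

def top_subspace(features, r):
    """Orthonormal basis (dim x r) of the dominant r-dimensional
    subspace of a (num_samples x dim) feature matrix, via SVD."""
    _, _, vt = np.linalg.svd(features, full_matrices=False)
    return vt[:r].T

def subspace_distance(U, V):
    """Projection-metric distance: 0 when U and V span the same
    subspace, growing as the subspaces diverge."""
    return np.linalg.norm(U @ U.T - V @ V.T)

rng = np.random.default_rng(0)
F = rng.standard_normal((100, 8))
U = top_subspace(F, r=3)
# Rescaling the features leaves the spanned subspace unchanged,
# so the projection-metric distance is numerically zero.
assert subspace_distance(U, top_subspace(2.0 * F, r=3)) < 1e-8
```

Aligning old- and new-task features in such a subspace, rather than matching raw features under an L2 norm, is what makes the objective insensitive to noise directions outside the task manifold.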
推荐(1篇)
【1】What Do We Care About in Bandits with Noncompliance? BRACE: Bandits with Recommendations, Abstention, and Certified Effects
标题:在带不依从性的强盗问题中我们关心什么?BRACE:带建议、弃权与认证效应的强盗
链接:https://arxiv.org/abs/2603.09532
作者:Nicolás Della Penna
摘要:不服从的强盗将学习者的建议与实际提供的治疗分开,因此必须选择学习目标本身。一个平台可能关心当前中介工作流程中的推荐福利,未来直接控制机制的治疗学习,或这些目标之一的任何时间有效的不确定性。这些目标不必一致。我们正式这个目标选择问题,确定直接控制制度中的建议和治疗目标崩溃,并通过例子表明,建议福利可以严格超过每一个学习者可衡量的治疗政策时,下游行为者使用私人信息。对于有限上下文平方IV问题,我们提出了BRACE,一个无参数的相位加倍算法,执行IV逆矩阵认证后,否则返回全范围,但诚实的结构区间。BRACE同时提供策略值有效性、操作上最优推荐策略的固定间隙识别以及上下文同质性和可逆性下结构上最优治疗策略的固定间隙识别。我们补充的理论与有限的背景下的经验基准跨越直接控制,介导的现在与未来的权衡,弱识别,同质性故障,矩形过度识别。实验表明,安全性在简单问题上表现为遗憾,在弱识别条件下表现为回避和宽有效区间,在同质性失效条件下表现为偏好推荐福利的理由,在有额外工具时表现为更严格的结构不确定性。对于丰富的背景下,我们还得到了一个正交分数,其条件偏差分解成合规模型和结果模型的错误,澄清什么必须稳定的任何时间有效的半参数IV推理。
摘要:Bandits with noncompliance separate the learner's recommendation from the treatment actually delivered, so the learning target itself must be chosen. A platform may care about recommendation welfare in the current mediated workflow, treatment learning for a future direct-control regime, or anytime-valid uncertainty for one of those targets. These objectives need not agree. We formalize this objective-choice problem, identify the direct-control regime in which recommendation and treatment objectives collapse, and show by example that recommendation welfare can strictly exceed every learner-measurable treatment policy when downstream actors use private information. For finite-context square-IV problems we propose BRACE, a parameter-free phase-doubling algorithm that performs IV inversion only after matrix certification and otherwise returns full-range but honest structural intervals. BRACE delivers simultaneous policy-value validity, fixed-gap identification of the operationally optimal recommendation policy, and fixed-gap identification of the structurally optimal treatment policy under contextual homogeneity and invertibility. We complement the theory with a finite-context empirical benchmark spanning direct control, mediated present-versus-future tradeoffs, weak identification, homogeneity failure, and rectangular overidentification. The experiments show that safety appears as regret on easy problems, as abstention and wide valid intervals under weak identification, as a reason to prefer recommendation welfare under homogeneity failure, and as tighter structural uncertainty when extra instruments are available. For rich contexts, we also derive an orthogonal score whose conditional bias factorizes into compliance-model and outcome-model errors, clarifying what must be stabilized for anytime-valid semiparametric IV inference.
聚类(2篇)
【1】From Representation to Clusters: A Contrastive Learning Approach for Attributed Hypergraph Clustering
标题:从表示到聚类:属性超图聚类的对比学习方法
链接:https://arxiv.org/abs/2603.09370
作者:Li Ni,Shuaikang Zeng,Lin Mu,Longlong Lin
备注:Accepted at The Web Conference 2026. 12 pages, 5 figures
摘要:对比学习在属性超图聚类中表现出很强的性能。现有的基于对比学习的方法通常先学习节点嵌入,然后对这些嵌入应用聚类算法(如k-means)来获得聚类结果,然而这些方法缺乏直接的聚类监督,存在着在学习的图中包含与聚类无关的信息的风险。为此,提出了一种基于对比学习的属性超图聚类方法(CAHC),一种同时学习节点嵌入并获得聚类结果的端到端方法。CAHC包括两个主要步骤:表示学习和聚类分配学习。前者采用一种新的对比学习方法,结合节点级和超边级目标生成节点嵌入;后者联合优化嵌入和聚类,通过面向聚类的指导对节点嵌入进行细化,同时获得聚类结果。大量实验结果表明,CAHC在八个数据集上优于基线方法。
摘要:Contrastive learning has demonstrated strong performance in attributed hypergraph clustering. Typically, existing methods based on contrastive learning first learn node embeddings and then apply clustering algorithms, such as k-means, to these embeddings to obtain the clustering results. However, these methods lack direct clustering supervision, risking the inclusion of clustering-irrelevant information in the learned graph. To this end, we propose a Contrastive learning approach for Attributed Hypergraph Clustering (CAHC), an end-to-end method that simultaneously learns node embeddings and obtains clustering results. CAHC consists of two main steps: representation learning and cluster assignment learning. The former employs a novel contrastive learning approach that incorporates both node-level and hyperedge-level objectives to generate node embeddings. The latter jointly optimizes embeddings and clustering, refining the embeddings with clustering-oriented guidance while obtaining the clustering results. Extensive experimental results demonstrate that CAHC outperforms baselines on eight datasets.
【2】FedLECC: Cluster- and Loss-Guided Client Selection for Federated Learning under Non-IID Data
标题:FedLECC:非IID数据下联邦学习的聚类与损失引导的客户端选择
链接:https://arxiv.org/abs/2603.08911
作者:Daniel M. Jimenez-Gutierrez,Giovanni Giunta,Mehrdad Hassanzadeh,Aris Anagnostopoulos,Ioannis Chatzigiannakis,Andrea Vitaletti
备注:Accepted to the IEEE International Workshop on Intelligent Cloud Computing and Networking (ICCN) from the IEEE International Conference on Computer Communications (INFOCOM) 2026
摘要:联邦学习(FL)通过允许协作模型训练而无需集中数据,从而在云边缘环境中实现分布式人工智能(AI)。在跨设备部署中,FL系统面临严格的通信和参与限制,以及严重的非独立同分布(非IID)数据,这会降低收敛性和模型质量。由于每轮训练中只有一部分设备(即客户端)可以参与,因此智能客户端选择成为系统的关键挑战。本文提出了FedLECC(Federated Learning with Enhanced Cluster Choice),一种轻量级的、聚类感知的、损失引导的跨设备FL客户端选择策略。FedLECC通过标签分布相似性对客户端进行分组,并优先考虑具有较高本地损失的聚类和客户端,从而能够选择少量但信息丰富且多样化的客户端。严重标签偏斜下的实验结果表明,与强基线相比,FedLECC将测试准确率最多提高12%,同时将通信轮次减少约22%,总体通信开销最多减少50%。这些结果表明,明智的客户端选择提高了云边缘系统中FL工作负载的效率和可扩展性。
摘要:Federated Learning (FL) enables distributed Artificial Intelligence (AI) across cloud-edge environments by allowing collaborative model training without centralizing data. In cross-device deployments, FL systems face strict communication and participation constraints, as well as strong non-independent and identically distributed (non-IID) data that degrades convergence and model quality. Since only a subset of devices (a.k.a clients) can participate per training round, intelligent client selection becomes a key systems challenge. This paper proposes FedLECC (Federated Learning with Enhanced Cluster Choice), a lightweight, cluster-aware, and loss-guided client selection strategy for cross-device FL. FedLECC groups clients by label-distribution similarity and prioritizes clusters and clients with higher local loss, enabling the selection of a small yet informative and diverse set of clients. Experimental results under severe label skew show that FedLECC improves test accuracy by up to 12%, while reducing communication rounds by approximately 22% and overall communication overhead by up to 50% compared to strong baselines. These results demonstrate that informed client selection improves the efficiency and scalability of FL workloads in cloud-edge systems.
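The stated selection rule, group clients by label-distribution similarity and then prioritise high-loss clusters and clients, can be sketched as a round-robin over loss-ordered clusters. The exact FedLECC weighting is not published in the abstract, so the ordering below is only illustrative:

```python
def select_clients(clients, n_select):
    """Round-robin over loss-ordered clusters, highest-loss client first.

    `clients` maps client id -> (cluster id, last local loss); clusters
    here are assumed to come from label-distribution similarity."""
    by_cluster = {}
    for cid, (cluster, loss) in clients.items():
        by_cluster.setdefault(cluster, []).append((loss, cid))
    # Visit higher-mean-loss clusters first; within each, highest loss first.
    order = sorted(by_cluster,
                   key=lambda c: -sum(l for l, _ in by_cluster[c]) / len(by_cluster[c]))
    for c in order:
        by_cluster[c].sort(reverse=True)
    picked = []
    while len(picked) < n_select and any(by_cluster.values()):
        for c in order:
            if by_cluster[c] and len(picked) < n_select:
                picked.append(by_cluster[c].pop(0)[1])
    return picked

clients = {"a": (0, 2.0), "b": (0, 0.5), "c": (1, 1.5), "d": (1, 0.1)}
picked = select_clients(clients, 2)   # one high-loss client per cluster
```

The round-robin enforces the "diverse set" property across clusters, while the per-cluster loss ordering supplies the "informative" property within them.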
超分辨率|去噪|去模糊|去雾(1篇)
【1】Micro-Diffusion Compression -- Binary Tree Tweedie Denoising for Online Probability Estimation
标题:微扩散压缩--用于在线概率估计的二叉树Tweedie去噪
链接:https://arxiv.org/abs/2603.08771
作者:Roberto Tacconelli
备注:12 pages, 1 figure
摘要:我们提出了Midicoth,一个无损压缩系统,它引入了一个微扩散去噪层,用于提高自适应统计模型产生的概率估计。在诸如部分匹配预测(PPM)之类的压缩器中,概率估计由先验平滑以处理稀疏观测。当上下文只被看到几次时,这种先验知识主导了预测,并产生比真实源分布明显更平坦的分布,导致压缩效率低下。Midicoth通过将先验平滑处理为收缩过程并应用反向去噪步骤来解决这一限制,该步骤使用经验校准统计来校正预测概率。为了使这种校正数据高效,该方法将每个字节预测分解成沿按位树的二元决策的层次结构。这将单个256路校准问题转换为一系列二元校准任务,从而能够从相对少量的观测值中可靠地估计校正项。去噪过程在多个连续步骤中应用,允许每个阶段细化前一阶段留下的残差预测误差。微扩散层作为一个轻量级的混合后校准阶段,在所有模型预测组合后应用,使其能够校正最终概率分布中的系统偏差。Midicoth结合了五个完全在线的组件:自适应PPM模型,长距离匹配模型,基于trie的单词模型,高阶上下文模型,以及作为最后阶段应用的微扩散去噪器。
摘要:We present Midicoth, a lossless compression system that introduces a micro-diffusion denoising layer for improving probability estimates produced by adaptive statistical models. In compressors such as Prediction by Partial Matching (PPM), probability estimates are smoothed by a prior to handle sparse observations. When contexts have been seen only a few times, this prior dominates the prediction and produces distributions that are significantly flatter than the true source distribution, leading to compression inefficiency. Midicoth addresses this limitation by treating prior smoothing as a shrinkage process and applying a reverse denoising step that corrects predicted probabilities using empirical calibration statistics. To make this correction data-efficient, the method decomposes each byte prediction into a hierarchy of binary decisions along a bitwise tree. This converts a single 256-way calibration problem into a sequence of binary calibration tasks, enabling reliable estimation of correction terms from relatively small numbers of observations. The denoising process is applied in multiple successive steps, allowing each stage to refine residual prediction errors left by the previous one. The micro-diffusion layer operates as a lightweight post-blend calibration stage applied after all model predictions have been combined, allowing it to correct systematic biases in the final probability distribution. Midicoth combines five fully online components: an adaptive PPM model, a long-range match model, a trie-based word model, a high-order context model, and the micro-diffusion denoiser applied as the final stage.
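The bitwise-tree decomposition is concrete: a byte becomes eight binary decisions from the most significant bit down, so every calibration problem along the path is binary, and the byte's probability is a chain-rule product over the path. A minimal sketch (the single shared `p_one` is an illustration only; Midicoth calibrates each tree node separately):

```python
def byte_to_bits(byte):
    """The 8 binary decisions along the bitwise tree, MSB first:
    one 256-way calibration problem becomes eight binary ones."""
    return [(byte >> (7 - i)) & 1 for i in range(8)]

def bits_to_byte(bits):
    value = 0
    for b in bits:
        value = (value << 1) | b   # descend the tree, rebuilding the symbol
    return value

def byte_probability(bits, p_one=0.5):
    """Chain rule over the tree path: the byte probability is the product
    of per-node binary probabilities."""
    p = 1.0
    for b in bits:
        p *= p_one if b else 1.0 - p_one
    return p

# 0xA5 = 1010 0101, so the path alternates between the two subtrees.
assert byte_to_bits(0xA5) == [1, 0, 1, 0, 0, 1, 0, 1]
assert bits_to_byte(byte_to_bits(0xA5)) == 0xA5
```

Each internal node accumulates binary calibration statistics over all bytes sharing its prefix, which is why reliable correction terms emerge from far fewer observations than a flat 256-way table would need.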
自动驾驶|车辆|车道检测等(4篇)
【1】Differentiable Stochastic Traffic Dynamics: Physics-Informed Generative Modelling in Transportation
标题:可区分随机交通动力学:交通中的物理信息生成模型
链接:https://arxiv.org/abs/2603.09174
作者:Wuping Xin
备注:29 pages
摘要:宏观交通流是随机的,但目前交通文献中使用的基于物理的深度学习方法嵌入了确定性偏微分方程并产生点值输出;控制动力学的随机性在学习的表示中不起作用。这项工作开发了一个框架,其中的物理约束本身是分布性的,并直接来自随机交通流动力学。我们从带布朗强迫的伊藤型Lighthill-Whitham-Richards模型出发,推导出每个空间位置上边缘交通密度的一点正向方程。由守恒律引起的空间耦合表现为一个显式的条件漂移项,这使得闭包要求变得透明。基于这个公式,我们推导出一个等价的确定性概率流常微分方程,一旦指定闭包,该方程逐点可求值且可微。将此作为物理约束,我们提出了一个具有对流-闭包模块的分数网络,可通过去噪分数匹配和福克-普朗克残差损失来训练。由此产生的模型以数据条件的密度分布为目标,从中可以计算点估计、可信区间和拥堵风险度量。该框架为物理信息生成设置下的分布性交通状态估计和随机基本图分析提供了基础。
摘要:Macroscopic traffic flow is stochastic, but the physics-informed deep learning methods currently used in transportation literature embed deterministic PDEs and produce point-valued outputs; the stochasticity of the governing dynamics plays no role in the learned representation. This work develops a framework in which the physics constraint itself is distributional and directly derived from stochastic traffic-flow dynamics. Starting from an Ito-type Lighthill-Whitham-Richards model with Brownian forcing, we derive a one-point forward equation for the marginal traffic density at each spatial location. The spatial coupling induced by the conservation law appears as an explicit conditional drift term, which makes the closure requirement transparent. Based on this formulation, we derive an equivalent deterministic Probability Flow ODE that is pointwise evaluable and differentiable once a closure is specified. Incorporating this as a physics constraint, we then propose a score network with an advection-closure module, trainable by denoising score matching together with a Fokker-Planck residual loss. The resulting model targets a data-conditioned density distribution, from which point estimates, credible intervals, and congestion-risk measures can be computed. The framework provides a basis for distributional traffic-state estimation and for stochastic fundamental-diagram analysis in a physics-informed generative setting.
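The step from the Ito SDE to a deterministic Probability Flow ODE follows the standard correspondence between a diffusion and the ODE sharing its one-point marginals; the paper's LWR-specific drift and closure term are not reproduced here, so only the schematic form is shown:

```latex
% Ito SDE with Brownian forcing (schematic, scalar state x):
%   dX_t = f(X_t, t)\,dt + g(t)\,dW_t .
% The probability-flow ODE sharing the same one-point marginals p_t:
\frac{\mathrm{d}x}{\mathrm{d}t}
  \;=\; f(x,t) \;-\; \tfrac{1}{2}\, g(t)^{2}\, \nabla_{x} \log p_t(x) .
```

The score term $\nabla_x \log p_t(x)$ is what the score network estimates via denoising score matching, which is what makes the ODE pointwise evaluable once a closure is specified.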
【2】Probabilistic Hysteresis Factor Prediction for Electric Vehicle Batteries with Graphite Anodes Containing Silicon
标题:含硅石墨阳极电动汽车电池的概率滞后因子预测
链接:https://arxiv.org/abs/2603.09103
作者:Runyao Yu,Viviana Kleine,Philipp Gromotka,Thomas Rudolf,Adrian Eisenmann,Gautham Ram Chandra Mouli,Peter Palensky,Jochen L. Cremer
备注:11 pages, 5 figures, 6 tables
摘要:具有硅石墨基阳极的电池提供更高的能量密度和更好的充电性能,但会引入明显的电压滞后,使得充电状态(SoC)估计特别具有挑战性。现有的迟滞建模方法依赖于详尽的高保真测试或专注于传统的石墨基锂离子电池,而不考虑不确定性量化或计算约束。这项工作介绍了一种数据驱动的概率滞后因子预测方法,特别强调涉及硅石墨阳极电池的应用。提出了一个数据协调框架,以标准化不同操作条件下的异构驾驶循环。应用统计学习和深度学习模型来评估在考虑计算效率的同时预测具有不确定性的滞后因子的性能。通过再训练、zero-shot预测、微调和联合训练,进行了大量的实验,以评估最佳模型配置在看不见的车辆模型中的泛化能力。通过解决SoC评估中的关键挑战,这项研究促进了先进电池技术的采用。摘要页面可在https://runyao-yu.github.io/Porsche_Hysteresis_Factor_Prediction/上找到
摘要:Batteries with silicon-graphite-based anodes, which offer higher energy density and improved charging performance, introduce pronounced voltage hysteresis, making state-of-charge (SoC) estimation particularly challenging. Existing approaches to modeling hysteresis rely on exhaustive high-fidelity tests or focus on conventional graphite-based lithium-ion batteries, without considering uncertainty quantification or computational constraints. This work introduces a data-driven approach for probabilistic hysteresis factor prediction, with a particular emphasis on applications involving silicon-graphite anode-based batteries. A data harmonization framework is proposed to standardize heterogeneous driving cycles across varying operating conditions. Statistical learning and deep learning models are applied to assess performance in predicting the hysteresis factor with uncertainties while considering computational efficiency. Extensive experiments are conducted to evaluate the generalizability of the optimal model configuration in unseen vehicle models through retraining, zero-shot prediction, fine-tuning, and joint training. By addressing key challenges in SoC estimation, this research facilitates the adoption of advanced battery technologies. A summary page is available at: https://runyao-yu.github.io/Porsche_Hysteresis_Factor_Prediction/
【3】Latent World Models for Automated Driving: A Unified Taxonomy, Evaluation Framework, and Open Challenges
标题:自动驾驶的潜在世界模型:统一分类、评估框架和开放挑战
链接:https://arxiv.org/abs/2603.09086
作者:Rongxiang Zeng,Yongqi Dong
备注:17 pages, 6 figures, under review by IEEE Transactions on Intelligent Transportation Systems (IEEE-T-ITS)
摘要:新兴的生成世界模型和视觉-语言-动作(VLA)系统正在通过实现可扩展的模拟、长期预测和功能丰富的决策来迅速重塑自动驾驶。在这些方向上,潜在表示充当中央计算基底:它们压缩高维多传感器观测,实现时间上连贯的推出,并为规划,推理和可控生成提供接口。本文提出了一个统一的潜在空间框架,综合了世界自动驾驶模型的最新进展。该框架组织的设计空间的目标和形式的潜在表示(潜在的世界,潜在的行动,潜在的发电机,连续状态,离散令牌,和混合动力车)和结构先验的几何,拓扑和语义。在此分类的基础上,本文阐述了五个交叉的内部机制(即,结构同构,长期时间稳定性,语义和推理对齐,价值对齐的目标和后训练,以及自适应计算和审议),并将这些设计选择连接到鲁棒性,泛化性和可部署性。这项工作还提出了具体的评价处方,包括闭环指标套件和资源意识的审议成本,旨在减少开环/闭环不匹配。最后,本文确定了可操作的研究方向,以推进潜在世界模型,为决策做好准备,可验证,资源高效的自动驾驶。
摘要:Emerging generative world models and vision-language-action (VLA) systems are rapidly reshaping automated driving by enabling scalable simulation, long-horizon forecasting, and capability-rich decision making. Across these directions, latent representations serve as the central computational substrate: they compress high-dimensional multi-sensor observations, enable temporally coherent rollouts, and provide interfaces for planning, reasoning, and controllable generation. This paper proposes a unifying latent-space framework that synthesizes recent progress in world models for automated driving. The framework organizes the design space by the target and form of latent representations (latent worlds, latent actions, latent generators; continuous states, discrete tokens, and hybrids) and by structural priors for geometry, topology, and semantics. Building on this taxonomy, the paper articulates five cross-cutting internal mechanics (i.e, structural isomorphism, long-horizon temporal stability, semantic and reasoning alignment, value-aligned objectives and post-training, as well as adaptive computation and deliberation) and connects these design choices to robustness, generalization, and deployability. The work also proposes concrete evaluation prescriptions, including a closed-loop metric suite and a resource-aware deliberation cost, designed to reduce the open-loop / closed-loop mismatch. Finally, the paper identifies actionable research directions toward advancing latent world model for decision-ready, verifiable, and resource-efficient automated driving.
【4】Autonomous Edge-Deployed AI Agents for Electric Vehicle Charging Infrastructure Management
标题:用于电动汽车充电基础设施管理的自主边缘部署人工智能代理
链接:https://arxiv.org/abs/2603.08736
作者:Mohammed Cherifi
备注:27 pages, 10 figures (TikZ), 27 tables
摘要:公共电动汽车充电基础设施的故障率很高(现场研究报告称高达27.5%的直流快速充电器无法正常工作),且平均解决时间长达数天,每年造成数十亿美元的经济负担。以云为中心的架构无法实现自主操作所需的延迟、可靠性和带宽特性。 我们提出了Auralink SDC(软件定义充电),这是一种在网络边缘部署领域专用AI代理的架构,用于自主充电基础设施管理。主要贡献包括:(1)置信度校准的自主解决(CCAR),在具有形式化假阳性界限的前提下实现自主补救;(2)自适应检索增强推理(ARA),将密集与稀疏检索同动态上下文分配相结合;(3)Auralink边缘运行时,在PREEMPT_RT约束下于商用硬件上实现低于50毫秒的TTFT;以及(4)分层多智能体编排(HMAO)。 实现使用通过QLoRA在涵盖OCPP 1.6/2.0.1、ISO 15118和运维事件历史的领域语料库上微调的AuralinkLM模型。在受控环境中对18,000个标记事件的评估取得了78%的自主事件解决率、87.6%的诊断准确率和28-48 ms的TTFT延迟(P50)。这项工作提出了具有安全关键约束的边缘部署工业AI系统的架构和实现模式。
摘要:Public EV charging infrastructure suffers from significant failure rates -- with field studies reporting up to 27.5% of DC fast chargers non-functional -- and multi-day mean time to resolution, imposing billions in annual economic burden. Cloud-centric architectures cannot achieve the latency, reliability, and bandwidth characteristics required for autonomous operation. We present Auralink SDC (Software-Defined Charging), an architecture deploying domain-specialized AI agents at the network edge for autonomous charging infrastructure management. Key contributions include: (1) Confidence-Calibrated Autonomous Resolution (CCAR), enabling autonomous remediation with formal false-positive bounds; (2) Adaptive Retrieval-Augmented Reasoning (ARA), combining dense and sparse retrieval with dynamic context allocation; (3) Auralink Edge Runtime, achieving sub-50ms TTFT on commodity hardware under PREEMPT_RT constraints; and (4) Hierarchical Multi-Agent Orchestration (HMAO). Implementation uses AuralinkLM models fine-tuned via QLoRA on a domain corpus spanning OCPP 1.6/2.0.1, ISO 15118, and operational incident histories. Evaluation on 18,000 labeled incidents in a controlled environment establishes 78% autonomous incident resolution, 87.6% diagnostic accuracy, and 28-48ms TTFT latency (P50). This work presents architecture and implementation patterns for edge-deployed industrial AI systems with safety-critical constraints.
点云|SLAM|雷达|激光|深度RGBD相关(1篇)
【1】Memorization capacity of deep ReLU neural networks characterized by width and depth
标题:以宽度和深度为特征的深度ReLU神经网络的记忆能力
链接:https://arxiv.org/abs/2603.09589
作者:Xin Yang,Yunfei Yang
摘要:本文研究了具有ReLU激活的深度神经网络的记忆能力。具体来说,我们研究了这类网络记忆单位球中任意$N$个具有成对分离距离$δ$和离散标签的数据点所需的最小规模。大多数先前的研究通过参数或神经元的数量来表征记忆能力。我们通过构造宽度$W$和深度$L$满足$W^2L^2 = O(N \log(δ^{-1}))$的神经网络来推广这些结果,该神经网络可以记忆任意$N$个数据样本。我们还证明了任何这样的网络也应该满足下界$W^2L^2=Ω(N \log(δ^{-1}))$,这意味着当$δ^{-1}$在$N$中是多项式时,我们的构造在对数因子范围内是最优的。因此,我们明确刻画了这一机制下深度神经网络记忆能力中宽度和深度之间的权衡。
摘要:This paper studies the memorization capacity of deep neural networks with ReLU activation. Specifically, we investigate the minimal size of such networks to memorize any $N$ data points in the unit ball with pairwise separation distance $δ$ and discrete labels. Most prior studies characterize the memorization capacity by the number of parameters or neurons. We generalize these results by constructing neural networks, whose width $W$ and depth $L$ satisfy $W^2L^2= \mathcal{O}(N\log(δ^{-1}))$, that can memorize any $N$ data samples. We also prove that any such networks should also satisfy the lower bound $W^2L^2=Ω(N \log(δ^{-1}))$, which implies that our construction is optimal up to logarithmic factors when $δ^{-1}$ is polynomial in $N$. Hence, we explicitly characterize the trade-off between width and depth for the memorization capacity of deep neural networks in this regime.
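Written out, the matching upper and lower bounds in the abstract pin down the width-depth trade-off (the $\widetilde{Θ}$ hides the logarithmic factors noted when $δ^{-1}$ is polynomial in $N$):

```latex
W^{2}L^{2} \;=\; \mathcal{O}\big(N \log(δ^{-1})\big)
\quad\text{and}\quad
W^{2}L^{2} \;=\; Ω\big(N \log(δ^{-1})\big)
\;\;\Longrightarrow\;\;
W L \;=\; \widetilde{Θ}\Big(\sqrt{N \log(δ^{-1})}\Big) .
```

At a fixed memorization budget, then, halving the width can be compensated by roughly doubling the depth, which is the trade-off the final sentence of the abstract refers to.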
推理|分析|理解|解释(15篇)
【1】Think Before You Lie: How Reasoning Improves Honesty
标题:撒谎前三思:推理如何提高诚实
链接:https://arxiv.org/abs/2603.09957
作者:Ann Yuan,Asma Ghandeharioun,Carter Blum,Alicia Machado,Jessica Hoffmann,Daphne Ippolito,Martin Wattenberg,Lucas Dixon,Katja Filippova
摘要:虽然对大型语言模型(LLM)的现有评估衡量了欺骗率,但引起欺骗行为的潜在条件却知之甚少。我们使用一个新的数据集来研究这个问题,该数据集包含诚实会招致可变成本的现实道德权衡。与人类相反(人类在有时间深思熟虑时往往变得不那么诚实;Capraro,2017;Capraro等人,2019),我们发现推理在不同规模和多个LLM家族中持续提高诚实度。这种影响不仅是推理内容的函数,因为推理痕迹通常是最终行为的不良预测因素。相反,我们表明,表征空间本身的底层几何结构也对该效应有所贡献。也就是说,我们观察到该空间中的欺骗性区域是亚稳态的:与诚实的答案相比,欺骗性答案更容易被输入释义、输出重采样和激活噪声破坏稳定性。我们据此解释推理的作用:作为道德推理的一部分,生成慎思令牌需要遍历有偏的表征空间,最终将模型推向其更稳定、更诚实的默认值。
摘要:While existing evaluations of large language models (LLMs) measure deception rates, the underlying conditions that give rise to deceptive behavior are poorly understood. We investigate this question using a novel dataset of realistic moral trade-offs where honesty incurs variable costs. Contrary to humans, who tend to become less honest given time to deliberate (Capraro, 2017; Capraro et al., 2019), we find that reasoning consistently increases honesty across scales and for several LLM families. This effect is not only a function of the reasoning content, as reasoning traces are often poor predictors of final behaviors. Rather, we show that the underlying geometry of the representational space itself contributes to the effect. Namely, we observe that deceptive regions within this space are metastable: deceptive answers are more easily destabilized by input paraphrasing, output resampling, and activation noise than honest ones. We interpret the effect of reasoning in this vein: generating deliberative tokens as part of moral reasoning entails the traversal of a biased representational space, ultimately nudging the model toward its more stable, honest defaults.
【2】From Semantics to Pixels: Coarse-to-Fine Masked Autoencoders for Hierarchical Visual Understanding
标题:从语义到像素:从粗到细的掩蔽自动编码器,用于分层视觉理解
链接:https://arxiv.org/abs/2603.09955
作者:Wenzhao Xiang,Yue Wu,Hongyang Yu,Feng Gao,Fan Yang,Xilin Chen
摘要:自监督视觉预训练方法面临着一个固有的紧张局势:对比学习(CL)捕捉全局语义,但失去了细粒度的细节,而掩蔽图像建模(MIM)保留了局部纹理,但由于语义不可知的随机掩蔽而遭受“注意力漂移”。我们提出了C2FMAE,一个从粗到细的掩码自动编码器,通过明确学习三个数据粒度的分层视觉表示来解决这种紧张局势:语义掩码(场景级),实例掩码(对象级)和RGB图像(像素级)。两个协同创新实施了严格的自上而下的学习原则。首先,级联解码器顺序地从场景语义到对象实例再到像素细节进行重建,建立并行解码器无法捕获的显式跨粒度依赖关系。其次,渐进式掩蔽课程动态地将训练重点从语义引导转移到实例引导,最后转移到随机掩蔽,从而创建从全局上下文到局部特征的结构化学习路径。为了支持这个框架,我们为所有1.28 M ImageNet-1 K图像构建了一个具有高质量伪标签的大规模多粒度数据集。大量的实验表明,C2FMAE在图像分类,对象检测和语义分割方面取得了显着的性能提升,验证了我们的分层设计在学习更强大和更通用的表示方面的有效性。
摘要:Self-supervised visual pre-training methods face an inherent tension: contrastive learning (CL) captures global semantics but loses fine-grained detail, while masked image modeling (MIM) preserves local textures but suffers from "attention drift" due to semantically-agnostic random masking. We propose C2FMAE, a coarse-to-fine masked autoencoder that resolves this tension by explicitly learning hierarchical visual representations across three data granularities: semantic masks (scene-level), instance masks (object-level), and RGB images (pixel-level). Two synergistic innovations enforce a strict top-down learning principle. First, a cascaded decoder sequentially reconstructs from scene semantics to object instances to pixel details, establishing explicit cross-granularity dependencies that parallel decoders cannot capture. Second, a progressive masking curriculum dynamically shifts the training focus from semantic-guided to instance-guided and finally to random masking, creating a structured learning path from global context to local features. To support this framework, we construct a large-scale multi-granular dataset with high-quality pseudo-labels for all 1.28M ImageNet-1K images. Extensive experiments show that C2FMAE achieves significant performance gains on image classification, object detection, and semantic segmentation, validating the effectiveness of our hierarchical design in learning more robust and generalizable representations.
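The progressive masking curriculum can be sketched as a schedule over mixing weights for the three masking strategies, shifting mass from semantic-guided to instance-guided to random as training advances. The linear schedule and equal phase split below are illustrative assumptions, not the paper's exact recipe.

```python
def masking_weights(step, total_steps):
    """Mixing weights (semantic, instance, random), summing to 1,
    linearly shifting through the three phases. The linear schedule
    and the 50/50 phase split are illustrative assumptions."""
    t = min(max(step / total_steps, 0.0), 1.0)
    if t < 0.5:            # phase 1 -> 2: semantic fades, instance grows
        u = t / 0.5
        return (1.0 - u, u, 0.0)
    u = (t - 0.5) / 0.5    # phase 2 -> 3: instance fades, random grows
    return (0.0, 1.0 - u, u)

early = masking_weights(0, 100)    # fully semantic-guided
mid = masking_weights(50, 100)     # fully instance-guided
late = masking_weights(100, 100)   # fully random
```

At each training step, one of the three masking strategies would be drawn with these probabilities, realizing the coarse-to-fine learning path the abstract describes.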
【3】FreqCycle: A Multi-Scale Time-Frequency Analysis Method for Time Series Forecasting
标题:FreqCycle:一种用于时间序列预测的多尺度时频分析方法
链接:https://arxiv.org/abs/2603.09661
作者:Boya Zhang,Shuaijie Yin,Huiwen Zhu,Xing He
备注:18 pages, 17 figures, accepted to AAAI 2026. Code available at https://github.com/boya-zhang-ai/FreqCycle
摘要:时频特征挖掘是时间序列预测的关键。现有研究主要集中在建模低频模式,即大多数时间序列能量集中之处。对中高频的忽视继续限制着深度学习模型的进一步性能提升。我们提出FreqCycle,这是一个新颖的框架,集成了:(i)滤波器增强周期预测(FECF)模块,通过显式学习时域中共享的周期模式来提取低频特征;以及(ii)分段频域模式学习(SFPL)模块,通过可学习的滤波器和自适应加权来提升中高频能量占比。此外,时间序列数据往往表现出耦合的多周期性,例如交织在一起的每周与每日周期。为了应对耦合多周期性以及长回看窗口带来的挑战,我们将FreqCycle分层扩展为MFreqCycle,它通过跨尺度交互来解耦嵌套的周期性特征。在七个不同领域基准上的大量实验表明,FreqCycle在保持更快推理速度的同时实现了最先进的准确性,在性能与效率之间取得了最佳平衡。
摘要:Mining time-frequency features is critical for time series forecasting. Existing research has predominantly focused on modeling low-frequency patterns, where most time series energy is concentrated. The overlooking of mid to high frequency continues to limit further performance gains in deep learning models. We propose FreqCycle, a novel framework integrating: (i) a Filter-Enhanced Cycle Forecasting (FECF) module to extract low-frequency features by explicitly learning shared periodic patterns in the time domain, and (ii) a Segmented Frequency-domain Pattern Learning (SFPL) module to enhance mid to high frequency energy proportion via learnable filters and adaptive weighting. Furthermore, time series data often exhibit coupled multi-periodicity, such as intertwined weekly and daily cycles. To address coupled multi-periodicity as well as long lookback window challenges, we extend FreqCycle hierarchically into MFreqCycle, which decouples nested periodic features through cross-scale interactions. Extensive experiments on seven diverse domain benchmarks demonstrate that FreqCycle achieves state-of-the-art accuracy while maintaining faster inference speeds, striking an optimal balance between performance and efficiency.
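The SFPL idea of reweighting spectrum segments can be sketched with NumPy's rFFT. The segment count and gains below are fixed for illustration; in FreqCycle they would be learnable.

```python
import numpy as np

def segment_reweight(x, gains):
    """Split the rFFT spectrum of a 1-D series into len(gains) equal
    segments, scale each segment, and transform back. In FreqCycle the
    gains would be learnable; here they are fixed for illustration."""
    spec = np.fft.rfft(x)
    for idx, g in zip(np.array_split(np.arange(len(spec)), len(gains)), gains):
        spec[idx] *= g
    return np.fft.irfft(spec, n=len(x))

t = np.arange(256)
# Low-frequency cycle plus a weak high-frequency component:
x = np.sin(2 * np.pi * t / 64) + 0.3 * np.sin(2 * np.pi * 110 * t / 256)
boosted = segment_reweight(x, gains=[1.0, 1.0, 1.0, 2.0])  # amplify top band
```

With unit gains the transform is the identity; boosting the top band raises the high-frequency energy proportion, which is the effect SFPL's adaptive weighting is after.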
【4】Multi-DNN Inference of Sparse Models on Edge SoCs
标题:边缘SoC上稀疏模型的多DNN推理
链接:https://arxiv.org/abs/2603.09642
作者:Jiawei Luo,Di Wu,Simon Dobson,Blesson Varghese
摘要:现代边缘应用日益需要多DNN推理系统在异构处理器上执行任务,其性能既来自并发执行,也来自将每个模型匹配到最合适的加速器。然而,现有系统每个任务只支持单一模型(或少量稀疏变体),这妨碍了这种匹配的效率,并导致较高的服务水平目标(SLO)违约率。我们为多DNN推理系统引入了模型拼接,它通过重组稀疏模型的子图来创建模型变体,而无需重新训练。我们给出了一个演示系统SparseLoom,表明模型拼接可以部署到SoC上。实验表明,与最先进的多DNN推理系统相比,SparseLoom将SLO违约率最多降低74%,吞吐量最多提升2.31倍,内存开销平均降低28%。
摘要:Modern edge applications increasingly require multi-DNN inference systems to execute tasks on heterogeneous processors, gaining performance from both concurrent execution and from matching each model to the most suited accelerator. However, existing systems support only a single model (or a few sparse variants) per task, which impedes the efficiency of this matching and results in high Service Level Objective violation rates. We introduce model stitching for multi-DNN inference systems, which creates model variants by recombining subgraphs from sparse models without re-training. We present a demonstrator system, SparseLoom, that shows model stitching can be deployed to SoCs. We show experimentally that SparseLoom reduces SLO violation rates by up to 74%, improves throughput by up to 2.31x, and lowers memory overhead by an average of 28% compared to state-of-the-art multi-DNN inference systems.
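Model stitching as described, recombining per-layer subgraphs from sparse variants without retraining, can be caricatured as a greedy per-stage pick under an accuracy budget. The profile numbers are invented and the greedy rule is our simplification, not SparseLoom's actual planner.

```python
def stitch(stages, acc_floor):
    """stages: list of dicts, variant name -> (latency_ms, acc_drop).
    Greedily pick, per stage, the fastest variant whose cumulative
    accuracy drop stays within acc_floor. Purely illustrative."""
    plan, total_drop, total_lat = [], 0.0, 0.0
    for options in stages:
        feasible = {k: v for k, v in options.items()
                    if total_drop + v[1] <= acc_floor}
        name = min(feasible, key=lambda k: feasible[k][0])
        lat, drop = options[name]
        plan.append(name)
        total_lat += lat
        total_drop += drop
    return plan, total_lat, total_drop

# Invented per-stage profiles: (latency in ms, accuracy drop in points).
stages = [
    {"dense": (4.0, 0.0), "sparse50": (2.5, 0.4), "sparse75": (1.8, 1.2)},
    {"dense": (6.0, 0.0), "sparse50": (3.5, 0.5)},
]
plan, lat, drop = stitch(stages, acc_floor=1.0)
```

The point of stitching is that each stage's variant can be chosen independently per accelerator, rather than committing to one whole-model sparsity level.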
【5】Towards Understanding Adam Convergence on Highly Degenerate Polynomials
标题:迈向理解Adam在高度退化多项式上的收敛
链接:https://arxiv.org/abs/2603.09581
作者:Zhiwei Bai,Jiajie Zhao,Zhangchen Zhou,Zhi-Qin John Xu,Yaoyu Zhang
摘要:Adam是深度学习中广泛使用的优化算法,但它在哪一类目标函数上具有固有优势仍未得到充分探索。不同于以往需要外部调度器且$β_2$接近1才能收敛的研究,本工作研究Adam“天然”的自收敛性质。我们确定了一类高度退化的多项式,Adam在其上无需额外调度器即可自动收敛。具体来说,我们推导了退化多项式上局部渐近稳定的理论条件,并展示了理论界与实验结果之间的高度吻合。我们证明Adam在这些退化函数上实现局部线性收敛,显著优于梯度下降和动量法的次线性收敛。这种加速源于二阶矩$v_t$与平方梯度$g_t^2$之间的解耦机制,它以指数方式放大了有效学习率。最后,我们刻画了Adam的超参数相图,识别出三种不同的行为区域:稳定收敛、尖峰和类SignGD振荡。
摘要:Adam is a widely used optimization algorithm in deep learning, yet the specific class of objective functions where it exhibits inherent advantages remains underexplored. Unlike prior studies requiring external schedulers and $β_2$ near 1 for convergence, this work investigates the "natural" auto-convergence properties of Adam. We identify a class of highly degenerate polynomials where Adam converges automatically without additional schedulers. Specifically, we derive theoretical conditions for local asymptotic stability on degenerate polynomials and demonstrate strong alignment between theoretical bounds and experimental results. We prove that Adam achieves local linear convergence on these degenerate functions, significantly outperforming the sub-linear convergence of Gradient Descent and Momentum. This acceleration stems from a decoupling mechanism between the second moment $v_t$ and squared gradient $g_t^2$, which exponentially amplifies the effective learning rate. Finally, we characterize Adam's hyperparameter phase diagram, identifying three distinct behavioral regimes: stable convergence, spikes, and SignGD-like oscillation.
【6】Compiler-First State Space Duality and Portable $O(1)$ Autoregressive Caching for Inference
标题:编译器优先的状态空间对偶与面向推理的可移植$O(1)$自回归缓存
链接:https://arxiv.org/abs/2603.09555
作者:Cosmo Santoni
备注:18 pages, 6 figures. Code available at: https://github.com/CosmoNaught/mamba2-jax
摘要:状态空间模型的发布版本通常与融合的CUDA和Triton内核耦合,继承了对NVIDIA硬件的硬性依赖。我们表明,Mamba-2的状态空间对偶算法(对角状态结构、可分块递归、以einsum为主且控制流静态的计算)恰好映射到XLA的融合与平铺优化所真正擅长的内容,使自定义内核成为可选项而非必需品。我们在XLA下用标准原语实现了完整的推理路径(预填充、带缓存的自回归解码),没有任何手写内核,并将该架构理论上的$O(1)$状态管理实现为编译后的设备端缓存,生成过程中无需主机同步。该实现可从单一JAX源代码不加修改地运行在CPU、NVIDIA GPU和Google Cloud TPU上。在TPU v6e上,跨五个模型规模(130M-2.7B参数),XLA生成的代码在单流预填充时达到约140 TFLOPS(15% MFU),解码时带宽利用率高达64%。贪婪解码在64个步骤中与PyTorch/CUDA参考实现逐token一致,隐藏状态的一致性在float32舍入容差之内。该模式可迁移到任何满足相同结构条件的SSM递归,以及任何具有成熟XLA后端的平台。该实现已在https://github.com/CosmoNaught/mamba2-jax公开,并已合并入Bonsai JAX模型库。
摘要:State-space model releases are typically coupled to fused CUDA and Triton kernels, inheriting a hard dependency on NVIDIA hardware. We show that Mamba-2's state space duality algorithm -- diagonal state structure, chunkable recurrence, and einsum-dominated compute with static control flow -- maps cleanly onto what XLA's fusion and tiling passes actually optimise, making custom kernels optional rather than required. We implement the full inference path (prefill, cached autoregressive decoding) as shaped standard primitives under XLA, without hand-written kernels, and realise the architecture's theoretical $O(1)$ state management as a compiled on-device cache requiring no host synchronisation during generation. The implementation runs unmodified on CPU, NVIDIA GPU, and Google Cloud TPU from a single JAX source. On TPU v6e across five model scales (130M--2.7B parameters), XLA-generated code reaches approximately 140 TFLOPS on single-stream prefill (15% MFU) and up to 64% bandwidth utilisation on decode. Greedy decoding matches the PyTorch/CUDA reference token-for-token across 64 steps, with hidden-state agreement within float32 rounding tolerance. The pattern transfers to any SSM recurrence satisfying the same structural conditions, on any platform with a mature XLA backend. The implementation is publicly available at https://github.com/CosmoNaught/mamba2-jax and merged into the Bonsai JAX model library.
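The $O(1)$ cache rests on the diagonal recurrence h_t = a * h_{t-1} + B_t x_t with y_t = C_t . h_t: decoding only ever needs the previous state. The NumPy sketch below, a simplified scalar-input analogue of the SSD recurrence rather than the actual kernels, checks that cached step-by-step decoding reproduces a full-sequence scan.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 12, 4                   # sequence length, state dimension
a = rng.uniform(0.5, 0.99, D)  # diagonal state decay
B = rng.normal(size=(T, D))    # per-step input projections
C = rng.normal(size=(T, D))    # per-step output projections
x = rng.normal(size=T)         # scalar input stream

def full_scan(x):
    """Prefill-style pass over the whole sequence."""
    h, ys = np.zeros(D), []
    for t in range(len(x)):
        h = a * h + B[t] * x[t]
        ys.append(C[t] @ h)
    return np.array(ys)

def step(h, t, xt):
    """O(1)-cache decode step: only the last state crosses calls."""
    h = a * h + B[t] * xt
    return h, C[t] @ h

y_scan = full_scan(x)
h, y_steps = np.zeros(D), []
for t in range(T):
    h, y = step(h, t, x[t])
    y_steps.append(y)
```

The cache is just the D-dimensional state h, which is why the compiled on-device version needs no host synchronisation during generation.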
【7】Efficient Reasoning at Fixed Test-Time Cost via Length-Aware Attention Priors and Gain-Aware Training
标题:通过长度感知注意力先验和收益感知训练以固定测试时间成本进行高效推理
链接:https://arxiv.org/abs/2603.09253
作者:Rian Atri
备注:19 pages, 6 tables, 1 figure. NeurIPS 2025 Workshop on Efficient Reasoning
摘要:我们研究紧计算预算下的高效推理,探讨如何在不增加测试时成本的情况下做出结构化的、正确的决策。我们为小型和中型Transformer添加了两个仅在训练时使用的组件,它们也可迁移到更广泛的可微优化器。首先,通过模糊状态位置对齐(RPA)构建的长度感知注意力先验产生一个归一化的softmax前偏置,它像结构化正则化器一样引导注意力,且不增加任何推理参数。其次,一个最小的增益感知控制器Guardian,仅在验证集改进允许时才微调注意力锐度,遵循非凸优化的双时间尺度策略梯度视角;它在推理时被禁用。一个KL视角表明,对z加log pi取softmax等价于带KL正则化的MAP估计,从而把该先验建立在有原则的目标之上。在WikiText 2上严格保持计算对等的条件下,我们在匹配基线延迟与内存的同时降低了验证交叉熵。推理时,我们只为每个注意力头添加一个预先计算并缓存的先验B of T作为单个加性偏置;控制器不运行。实践中这带来的开销可忽略不计(每个头一次缓存偏置相加),p50延迟没有可测量的偏移。我们的结果表明,长度感知先验与后期增益控制能够保住来之不易的改进,尤其是在长跨度、logit噪声大的情形下,同时使测试时成本实际上保持不变。
摘要:We study efficient reasoning under tight compute. We ask how to make structured, correct decisions without increasing test time cost. We add two training only components to small and medium Transformers that also transfer to broader differentiable optimizers. First, a length aware attention prior built via fuzzy regime position alignment, RPA, yields a normalized pre softmax bias that guides attention like a structured regularizer while adding no new inference parameters. Second, a minimal gain aware controller, Guardian, nudges attention sharpness only when validation improvements warrant it, following a two timescale policy gradient view of nonconvex optimization. It is disabled at inference. A KL perspective shows softmax of z plus log pi as MAP with KL regularization, grounding the prior in a principled objective. Under strict compute parity on WikiText 2, we reduce validation cross entropy while matching baseline latency and memory. At inference, we add a precomputed, cached prior B of T as a single additive bias per head. The controller does not run. In practice, this incurs negligible overhead, a cached bias add per head, with no measurable p50 latency shift. Our results suggest that length aware priors and late phase gain control preserve scarce improvements, especially in long span, noisy logit regimes, while keeping test time costs effectively unchanged.
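At inference the mechanism is just one precomputed additive bias on each head's pre-softmax logits. A minimal sketch, where a simple recency bias stands in for the paper's normalized RPA construction (shape and values are illustrative):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
T = 8
logits = rng.normal(size=(T, T))   # one head's raw attention scores

# Precomputed, cacheable prior: a recency bias stands in for the
# normalized RPA bias (illustrative, not the paper's construction).
i, j = np.indices((T, T))
prior = -0.1 * np.abs(i - j).astype(float)

attn = softmax(logits + prior)     # one additive bias, no new parameters
```

Since the prior depends only on positions, it can be computed once and cached, which is why the reported p50 latency is unchanged.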
【8】The Reasoning Trap -- Logical Reasoning as a Mechanistic Pathway to Situational Awareness
标题:推理陷阱--逻辑推理作为情境意识的机械途径
链接:https://arxiv.org/abs/2603.09200
作者:Subramanyam Sahoo,Aman Chadha,Vinija Jain,Divya Chaudhary
备注:Accepted at ICLR 2026 Workshop on Logical Reasoning of Large Language Models. 21 Pages. Position Paper
摘要:情境意识,人工智能系统识别自身性质、理解其训练和部署环境以及对其环境进行战略性推理的能力,被广泛认为是先进人工智能系统中最危险的紧急能力之一。另外,越来越多的研究努力寻求提高大型语言模型(LLM)在演绎、归纳和溯因方面的逻辑推理能力。在本文中,我们认为,这两个研究轨迹上的碰撞过程。我们介绍了RAISE框架(推理推进到自我检查),它确定了三个机制的途径,通过这些途径,逻辑推理的改进使情境意识逐步深入:演绎自我推理,归纳上下文识别和溯因自我建模。我们形式化每一条途径,构建一个升级阶梯,从基本的自我识别到战略欺骗,并证明LLM逻辑推理中的每一个主要研究课题都直接映射到情境意识的特定放大器上。我们进一步分析了为什么目前的安全措施不足以防止这种升级。最后,我们提出了具体的保障措施,包括“镜像测试”基准和推理安全奇偶校验原则,并提出了一个不舒服的,但必要的问题,逻辑推理社区在这条轨道上的责任。
摘要:Situational awareness, the capacity of an AI system to recognize its own nature, understand its training and deployment context, and reason strategically about its circumstances, is widely considered among the most dangerous emergent capabilities in advanced AI systems. Separately, a growing research effort seeks to improve the logical reasoning capabilities of large language models (LLMs) across deduction, induction, and abduction. In this paper, we argue that these two research trajectories are on a collision course. We introduce the RAISE framework (Reasoning Advancing Into Self Examination), which identifies three mechanistic pathways through which improvements in logical reasoning enable progressively deeper levels of situational awareness: deductive self inference, inductive context recognition, and abductive self modeling. We formalize each pathway, construct an escalation ladder from basic self recognition to strategic deception, and demonstrate that every major research topic in LLM logical reasoning maps directly onto a specific amplifier of situational awareness. We further analyze why current safety measures are insufficient to prevent this escalation. We conclude by proposing concrete safeguards, including a "Mirror Test" benchmark and a Reasoning Safety Parity Principle, and pose an uncomfortable but necessary question to the logical reasoning community about its responsibility in this trajectory.
【9】Latent-DARM: Bridging Discrete Diffusion And Autoregressive Models For Reasoning
标题:Latent-DARM:连接离散扩散与自回归模型进行推理
链接:https://arxiv.org/abs/2603.09184
作者:Lina Berrayana,Ahmed Heakl,Abdullah Sohail,Thomas Hofmann,Salman Khan,Wei Chen
备注:Published at LIT Workshop at ICLR 2026
摘要:大多数多智能体系统完全依赖基于顺序生成的自回归语言模型(ARM)。ARM虽然擅长生成流畅文本,却限制了全局推理和计划修订。另一方面,离散扩散语言模型(DDLM)支持非顺序的、可全局修订的生成,并已展现出强大的规划能力,但其有限的文本流畅性妨碍了与ARM的直接协作。我们引入Latent-DARM,这是一个连接DDLM(规划者)与ARM(执行者)的潜在空间通信框架,可最大化两者协作的收益。在数学、科学和常识推理基准上,Latent-DARM的平均表现优于基于文本的接口,在DART-5上将准确率从27.0%提高到36.0%,在AIME 2024上从0.0%提高到14.0%。Latent-DARM接近最先进推理模型的结果,而使用的令牌预算不到其2.2%。这项工作推进了具有异构模型的智能体之间的多智能体协作。
摘要:Most multi-agent systems rely exclusively on autoregressive language models (ARMs) that are based on sequential generation. Although effective for fluent text, ARMs limit global reasoning and plan revision. On the other hand, Discrete Diffusion Language Models (DDLMs) enable non-sequential, globally revisable generation and have shown strong planning capabilities, but their limited text fluency hinders direct collaboration with ARMs. We introduce Latent-DARM, a latent-space communication framework bridging DDLM (planners) and ARM (executors), maximizing collaborative benefits. Across mathematical, scientific, and commonsense reasoning benchmarks, Latent-DARM outperforms text-based interfaces on average, improving accuracy from 27.0% to 36.0% on DART-5 and from 0.0% to 14.0% on AIME2024. Latent-DARM approaches the results of state-of-the-art reasoning models while using less than 2.2% of its token budget. This work advances multi-agent collaboration among agents with heterogeneous models.
【10】The $qs$ Inequality: Quantifying the Double Penalty of Mixture-of-Experts at Inference
标题:$qs$不等式:量化推理时混合专家的双重惩罚
链接:https://arxiv.org/abs/2603.08960
作者:Vignesh Adhinarayanan,Nuwan Jayasena
备注:10 pages, 6 tables
摘要:混合专家(MoE)模型以较低的训练FLOP提供高质量,但这种效率往往在推理时消失。我们发现了一种在解码过程中使MoE架构在结构上处于劣势的双重惩罚:其一,专家路由使微批碎片化并减少权重复用;其二,庞大的常驻专家池挤占了高带宽内存(HBM)中留给KV缓存的余量。这一现象(形式化为复用碎片化)将前馈网络(FFN)推入带宽受限的区域,在长上下文长度下尤为明显。我们引入$qs$不等式,这一预测性判据可确定MoE何时相对于质量匹配的密集模型在结构上处于劣势。该判据统一了稀疏度($s$,即每个令牌激活的参数比例)与质量等效因子($q$,即密集模型要匹配MoE性能所需的规模乘数)。我们对DeepSeek-V3、Qwen3-235B、Grok-1和Switch-C等前沿模型的评估表明,这种碎片化是一种普遍的架构现象。对于128k上下文下的DeepSeek-V3,这使质量匹配的密集基线获得4.5倍的吞吐量优势。至关重要的是,像Switch-C这样的大规模架构,在质量匹配的密集模型仍可运行的集群规模上可能已经不可行。我们的结果表明,在长上下文服务中,训练时FLOP效率并不能完整地代表推理时性能;并且MoE或许最好被视为一种训练时优化,将其蒸馏为密集模型可能是通往推理高效部署的一条路径。
摘要:Mixture-of-Experts (MoE) models deliver high quality at low training FLOPs, but this efficiency often vanishes at inference. We identify a double penalty that structurally disadvantages MoE architectures during decoding: first, expert routing fragments microbatches and reduces weight reuse; second, massive resident expert pools reduce high-bandwidth memory (HBM) headroom for the KV cache. This phenomenon, formalized as reuse fragmentation, pushes feed-forward networks (FFNs) into a bandwidth-bound regime, especially at long context lengths. We introduce the $qs$ inequality, a predictive criterion that identifies when MoE is structurally disadvantaged relative to a quality-matched dense model. This criterion unifies sparsity ($s$), the fraction of parameters activated per token, and the quality-equivalence factor ($q$), the size multiplier required for a dense model to match MoE performance. Our evaluation across frontier models including DeepSeek-V3, Qwen3-235B, Grok-1, and Switch-C demonstrates that this fragmentation is a general architectural phenomenon. For DeepSeek-V3 at 128k context, this results in a 4.5x throughput advantage for a quality-matched dense baseline. Crucially, massive architectures like Switch-C can become infeasible on cluster sizes where a quality-matched dense model remains viable. Our results suggest that training-time FLOP efficiency is an incomplete proxy for inference-time performance in long-context serving. They also indicate that MoE may be best viewed as a training-time optimization, with distillation into dense models as a possible path toward inference-efficient deployment.
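One plausible reading of the memory side of the double penalty (our back-of-envelope interpretation of the abstract, not the paper's exact criterion): an MoE keeps all P parameters resident while activating only sP per token, whereas a quality-matched dense model needs roughly q·sP parameters in total. The parameter counts echo DeepSeek-V3's widely reported 671B total / 37B active; q = 2 is an invented placeholder.

```python
# Back-of-envelope memory comparison (our reading of the abstract; the
# paper's precise criterion may differ). Bytes/param assumes fp16.
def resident_weight_bytes(params, bytes_per_param=2):
    return params * bytes_per_param

P_total = 671e9        # MoE parameters resident in HBM (DeepSeek-V3 scale)
s = 37e9 / 671e9       # fraction of parameters activated per token
q = 2.0                # ASSUMED dense size multiplier for equal quality

moe_bytes = resident_weight_bytes(P_total)
dense_bytes = resident_weight_bytes(q * s * P_total)

# When q * s < 1, the quality-matched dense model leaves more HBM
# headroom for the KV cache than the MoE does.
moe_disadvantaged = q * s < 1.0
```

At long context lengths that extra headroom goes straight into the KV cache, which is where the abstract locates the throughput gap.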
【11】Data-Rate-Aware High-Speed CNN Inference on FPGAs
标题:FPGA上数据速率感知的高速CNN推理
链接:https://arxiv.org/abs/2603.08726
作者:Tobias Habermann,Martin Kumm
摘要:FPGA上基于数据流的CNN加速器通过将每一层的计算直接映射到相应的硬件单元来实现低延迟与高吞吐量。然而,池化和跨步卷积等层的输出数据量相对其输入有所减少,强烈影响后续各层的数据速率,这导致完全展开的设计利用率不足。虽然先前工作引入了数据速率感知的逐层自适应,但确定最高效的实现方式仍然具有挑战性。本文提出一种面向多像素处理的数据速率感知CNN加速器架构。该方法基于现有的解析模型进行设计空间探索,以找出在保持数据连续流动、让所有硬件单元保持忙碌的同时,提升硬件利用率与资源效率的配置。实验结果表明,与以往设计相比,算术运算资源大幅减少,从而能够在单个FPGA上、在各种数据速率下高效实现复杂的CNN。
摘要:Dataflow-based CNN accelerators on FPGAs achieve low latency and high throughput by mapping computations of each layer directly to corresponding hardware units. However, layers such as pooling and strided convolutions reduce the data at their output with respect to their input, strongly effecting the data rate of the following layers. This leads to underutilization in fully unrolled designs. While prior work introduced data-rate-aware layer-wise adaptation, determining the most efficient implementation remains challenging. This paper presents a data-rate-aware CNN accelerator architecture for multi-pixel processing. Building on existing analytical models, the proposed method performs design-space exploration to identify configurations that improve hardware utilization and resource efficiency while preserving continuous flow of data, keeping all hardware units busy. Experimental results show substantial reductions in arithmetic resources compared to previous designs, enabling efficient implementation of complex CNNs on a single FPGA across a wide range of data rates.
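The data-rate effect is easy to quantify: each layer with spatial stride s (including s×s pooling) divides the pixel rate seen by everything downstream by s². A sketch with a made-up four-layer pipeline:

```python
def layer_rates(input_rate, layers):
    """layers: list of (name, spatial_stride) pairs. Returns the pixel
    rate at each layer's output, assuming rate /= stride**2."""
    rates, rate = [], float(input_rate)
    for name, stride in layers:
        rate /= stride ** 2
        rates.append((name, rate))
    return rates

# Hypothetical pipeline: two stride-2 stages between full-rate convs.
rates = layer_rates(input_rate=100e6, layers=[
    ("conv1", 1), ("pool1", 2), ("conv2", 1), ("pool2", 2),
])
```

A fully unrolled unit after the second stride-2 stage sees only 1/16 of the input rate, so sizing it for the full rate would leave it idle 15 of every 16 cycles; matching each layer's parallelism to its actual rate is the underutilization fix the abstract describes.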
【12】Performance Analysis of Edge and In-Sensor AI Processors: A Comparative Review
标题:边缘和传感器内人工智能处理器的性能分析:比较综述
链接:https://arxiv.org/abs/2603.08725
作者:Luigi Capogrosso,Pietro Bonazzi,Michele Magno
备注:Accepted at the IEEE International Instrumentation and Measurement Technology Conference (I2MTC) 2026
摘要:本综述考察了超低功耗边缘处理器快速演进的格局,涵盖异构片上系统(SoC)、神经加速器、近传感器与传感器内架构,以及新兴的数据流与以内存为中心的设计。我们根据计算范式、功耗包络和内存层次结构对商用与研究级平台进行分类,并分析它们对始终在线和延迟关键型人工智能(AI)工作负载的适用性。为了用实证数据补充架构综述,我们在三个代表性处理器上对一个包含3.36亿次乘累加(MAC)运算的分割模型(PicoSAM2)进行了基准测试:GAP9,一种由硬件加速器增强的多核RISC-V架构;STM32N6,将先进的ARM Cortex-M55内核与专用神经加速器配对;以及索尼IMX500,代表堆叠式互补金属氧化物半导体(CMOS)的传感器内计算。总体而言,这些平台横跨MCU级、嵌入式神经加速器和传感器内三种范式。评估报告了延迟、推理效率、能量效率和能量延迟积。结果显示硬件行为存在明显分化:IMX500实现了最高的利用率(86.2 MAC/周期)和最低的能量延迟积,凸显了传感器内处理日益增长的重要性和技术成熟度;GAP9在微控制器级功耗预算内提供最佳能效;而STM32N6以显著更高的能耗代价提供最低的原始延迟。综述与基准测试共同提供了一个统一视角,呈现正在塑造下一代超低功耗与传感器内AI处理器的设计方向和实际权衡。
摘要:This review examines the rapidly evolving landscape of ultra-low-power edge processors, covering heterogeneous Systems-on-Chips (SoCs), neural accelerators, near-sensor and in-sensor architectures, and emerging dataflow and memory-centric designs. We categorize commercially available and research-grade platforms according to their compute paradigms, power envelopes, and memory hierarchies, and analyze their suitability for always-on and latency-critical Artificial Intelligence (AI) workloads. To complement the architectural overview with empirical evidence, we benchmark a 336 million Multiply-Accumulate (MAC) segmentation model (PicoSAM2) on three representative processors: GAP9, leveraging a multi-core RISC-V architecture augmented with hardware accelerators; the STM32N6, which pairs an advanced ARM Cortex-M55 core with a dedicated neural architecture accelerator; and the Sony IMX500, representing in-sensor stacked-Complementary Metal-Oxide-Semiconductor (CMOS) compute. Collectively, these platforms span MCU-class, embedded neural accelerator, and in-sensor paradigms. The evaluation reports latency, inference efficiency, energy efficiency, and energy-delay product. The results show a clear divergence in hardware behavior, with the IMX500 achieving the highest utilization (86.2 MAC/cycle) and the lowest energy-delay product, highlighting the growing significance and technological maturity of in-sensor processing. GAP9 offers the best energy efficiency within microcontroller-class power budgets, and the STM32N6 provides the lowest raw latency at a significantly higher energy cost. Together, the review and benchmarks provide a unified view of the current design directions and practical trade-offs that are shaping the next generation of ultra-low-power and in-sensor AI processors.
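The reported metrics combine as follows: energy is power times latency, efficiency is MACs per joule, and the energy-delay product (EDP) penalizes designs that are slow, power-hungry, or both. The two operating points below are placeholders, not the paper's measurements.

```python
def edp_metrics(latency_s, power_w, macs):
    """Derive energy, efficiency (MAC/J), and energy-delay product."""
    energy = power_w * latency_s        # joules per inference
    return {
        "energy_j": energy,
        "macs_per_joule": macs / energy,
        "edp_js": energy * latency_s,   # energy-delay product
    }

# Placeholder operating points for two hypothetical processors running
# the same 336M-MAC model: one fast but hungrier, one slow but frugal.
fast = edp_metrics(latency_s=0.020, power_w=0.5, macs=336e6)
frugal = edp_metrics(latency_s=0.060, power_w=0.1, macs=336e6)
```

The frugal design wins on energy per inference, yet the fast one wins on EDP; this is exactly the kind of tension the GAP9 versus STM32N6 versus IMX500 comparison surfaces.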
【13】ALADIN: Accuracy-Latency-Aware Design-space Inference Analysis for Embedded AI Accelerators
标题:ALADIN:面向嵌入式人工智能加速器的精度-延迟感知设计空间推理分析
链接:https://arxiv.org/abs/2603.08722
作者:T. Baldi,D. Casini,A. Biondi
备注:Under review
摘要:深度神经网络(DNN)在资源受限的嵌入式系统上的推理引入了模型准确性、计算延迟和硬件限制之间的非平凡权衡,特别是当必须满足实时约束时。本文介绍了ALADIN,这是一个针对基于暂存器的AI加速器的混合精度量化神经网络(QNN)的精确延迟感知设计空间推理分析框架。ALADIN能够评估和分析推理瓶颈,并在准确性、延迟和资源消耗方面进行设计权衡,而无需在目标平台上进行部署,从而显著减少开发时间和成本。 该框架引入了一个渐进的细化过程,通过整合平台无关的实现细节和硬件特定的特征,将规范的QONNX模型转换为平台感知的表示。ALADIN使用专门用于AI工作负载的基于RISC-V的平台的周期精确模拟器进行验证,证明其作为定量推理分析和软硬件协同设计工具的有效性。实验结果突出了架构决策和混合精度量化策略如何影响准确性,延迟和资源使用,并表明这些影响可以使用ALADIN进行精确评估和比较,同时还揭示了微妙的优化紧张局势。
摘要:The inference of deep neural networks (DNNs) on resource-constrained embedded systems introduces non-trivial trade-offs among model accuracy, computational latency, and hardware limitations, particularly when real-time constraints must be satisfied. This paper presents ALADIN, an accuracy-latency-aware design-space inference analysis framework for mixed-precision quantized neural networks (QNNs) targeting scratchpad-based AI accelerators. ALADIN enables the evaluation and analysis of inference bottlenecks and design trade-offs across accuracy, latency, and resource consumption without requiring deployment on the target platform, thereby significantly reducing development time and cost. The framework introduces a progressive refinement process that transforms a canonical QONNX model into platform-aware representations by integrating both platform-independent implementation details and hardware-specific characteristics. ALADIN is validated using a cycle-accurate simulator of a RISC-V based platform specialized for AI workloads, demonstrating its effectiveness as a tool for quantitative inference analysis and hardware-software co-design. Experimental results highlight how architectural decisions and mixed-precision quantization strategies impact accuracy, latency, and resource usage, and show that these effects can be precisely evaluated and compared using ALADIN, while also revealing subtle optimization tensions.
【14】Statistical Inference via Generative Models: Flow Matching and Causal Inference
标题:基于生成模型的统计推断:流匹配和因果推断
链接:https://arxiv.org/abs/2603.09009
作者:Shinto Eguchi
摘要:生成式人工智能已取得显著的经验成功,但从统计学角度看,它往往仍不透明:其预测可能准确,但底层机制难以解释、分析和信任。本书以统计学的语言重新诠释生成式人工智能,并以流匹配为核心示例。其关键思想是:生成模型不应仅被理解为产生合理数据的装置,而应被视为对高维概率分布进行非参数学习的方法。从这一视角出发,缺失数据插补成为从已学习的条件分布中进行有原则的抽样,反事实分析成为对干预分布的估计,分布动态也成为可统计分析的对象。在数学上,流匹配通过连续性方程和随时间变化的速度场来表示分布的形变,从而把分数匹配从学习静态分数场扩展到学习传输路径本身。在此基础上,本书发展了一个统计框架:生成模型用于估计冗余(nuisance)成分,同时本着双重/去偏机器学习的精神,通过正交化与交叉拟合保持推断的有效性。在生存分析、删失、缺失数据和因果推断中的应用展示了生成模型如何被整合进结构化高维问题的统计推断之中。
摘要:Generative AI has achieved remarkable empirical success, but from the perspective of statistics it often remains opaque: its predictions may be accurate, yet the underlying mechanism is difficult to interpret, analyze, and trust. This book reinterprets generative AI in the language of statistics, using flow matching as a central example. The key idea is that generative models should be understood not merely as devices for producing plausible data, but as methods for the nonparametric learning of high-dimensional probability distributions. From this viewpoint, missing-data imputation becomes principled sampling from learned conditional distributions, counterfactual analysis becomes the estimation of intervention distributions, and distributional dynamics become statistically analyzable objects. Mathematically, flow matching represents distributional deformation through the continuity equation and a time-dependent velocity field, thereby extending score matching from the learning of static score fields to the learning of transport paths themselves. Building on this foundation, the book develops a statistical framework in which generative models are used to estimate nuisance components while inferential validity is maintained through orthogonalization and cross-fitting in the spirit of double/debiased machine learning. Applications to survival analysis, censoring, missingness, and causal inference show how generative models can be integrated into statistical inference for structured high-dimensional problems.
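The computational core of flow matching is a regression: along the linear path x_t = (1 - t) x_0 + t x_1, the conditional target velocity is x_1 - x_0, and v_theta(x_t, t) is fit by least squares. A NumPy sketch where, purely for illustration, the best constant velocity stands in for the neural field v_theta:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x0 = rng.normal(size=n)             # base (noise) samples
x1 = rng.normal(loc=3.0, size=n)    # "data" samples
t = rng.uniform(size=n)
xt = (1 - t) * x0 + t * x1          # points on the linear transport path
target = x1 - x0                    # conditional flow-matching target

# Stand-in for v_theta: the best *constant* velocity in least squares,
# whose optimum is the mean displacement E[x1 - x0] = 3.
v_hat = target.mean()
loss = np.mean((v_hat - target) ** 2)
```

A real model would regress target on (xt, t) with a neural network; integrating the learned field then transports noise samples to data samples, which is the sense in which flow matching learns the transport path itself.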
【15】Towards Reliable Simulation-based Inference
标题:迈向可靠的基于模拟的推理
链接:https://arxiv.org/abs/2603.08947
作者:Arnaud Delaunoy
备注:PhD thesis
摘要:科学知识通过观察世界,假设一些关于它的理论,并根据收集的数据进行测试来扩展。当这些理论采取统计模型的形式时,统计分析就参与了检验和完善科学假设的过程。在这篇论文中,我们专注于采用科学模拟器形式的统计模型,并提供有关机器学习如何在此背景下用于统计分析的背景。 本论文的第一部分是关于经验表明,使用机器学习进行统计分析涉及一定程度的近似。具体来说,所有的统计分析都涉及一定程度的不确定性的结论,我们表明,近似可能导致过度自信的结论。我们提请注意这种过度自信的结论,并介绍了一个标准来诊断过度自信的近似。 在第二部分中,我们介绍了平衡,一种正则化机器学习模型的方法,以减少过度自信,并支持校准或不自信的近似。平衡首先介绍了神经比率估计算法,然后扩展到其他算法。直觉为什么平衡导致不太过于自信的解决方案,它是经验表明,平衡的算法往往要么接近校准或信心不足。 第三部分表明,贝叶斯神经网络也可以用来减轻近似的过度自信。与平衡不同,不需要正则化,并且这种解决方案可以使用很少的训练样本,因此计算昂贵的模拟器。为此,开发了一种新的贝叶斯神经网络先验,为基于模拟的推理量身定制,实证结果表明,与没有贝叶斯神经网络的类似解决方案相比,过度自信有所减少。
摘要:Scientific knowledge expands by observing the world, hypothesizing some theories about it, and testing them against collected data. When those theories take the form of statistical models, statistical analyses are involved in the process of testing and refining scientific hypotheses. In this thesis, we focus on statistical models that take the form of scientific simulators and provide background about how machine learning can be used for statistical analyses in this context. The first part of this thesis is about showing empirically that performing statistical analyses with machine learning involves a degree of approximation. Specifically, all statistical analyses involve a level of uncertainty in the conclusions drawn, and we show that approximations can lead to overconfident conclusions. We draw caution regarding such overconfident conclusions and introduce a criterion to diagnose overconfident approximations. In the second part, we introduce balancing, a way to regularize machine learning models to reduce overconfidence and favor calibrated or underconfident approximations. Balancing is first introduced for neural ratio estimation algorithms and then extended to other algorithms. Intuition about why balancing leads to less overconfident solutions is provided, and it is shown empirically that balanced algorithms are often either close to calibrated or underconfident. The third part shows that Bayesian neural networks can also be used to mitigate the overconfidence of approximations. Unlike balancing, no regularization is required, and this solution can then work with few training samples and, hence, computationally expensive simulators. To that end, a new Bayesian neural network prior tailored for simulation-based inference is developed, and empirical results show a reduction in overconfidence compared to similar solutions without Bayesian neural networks.
检测相关(2篇)
【1】GNNs for Time Series Anomaly Detection: An Open-Source Framework and a Critical Evaluation
标题:用于时间序列异常检测的GNN:开源框架和批判性评估
链接:https://arxiv.org/abs/2603.09675
作者:Federico Bello,Gonzalo Chiarlone,Marcelo Fiori,Gastón García González,Federico Larroca
摘要:人们越来越关注将基于图的方法应用于时间序列异常检测(TSAD),特别是图神经网络(GNN),因为它们自然地对多变量信号之间的依赖关系进行建模。GNN通常用作基于分数的TSAD管道中的主干,其中通过重建或预测误差识别异常,然后进行阈值处理。然而,尽管取得了可喜的成果,但该领域仍然缺乏标准化的评价框架,在指标设计和解释方面一直存在问题。因此,我们提出了一个使用GNN的TSAD开源框架,旨在支持跨数据集,图结构和评估策略的可重复实验。考虑到灵活性和可扩展性,该框架有助于TSAD模型之间的系统比较,并能够深入分析性能和可解释性。使用这个工具,我们评估了几个基于GNN的架构以及两个具有对比结构特征的真实数据集的基线模型。我们的研究结果表明,GNNs不仅提高了检测性能,而且在可解释性方面也有了显着的提高,这是一个对实际诊断特别有价值的特征。我们还发现,当图结构不确定或推断时,基于注意力的GNN具有鲁棒性。此外,我们反映在TSAD的共同评价实践,显示某些指标和阈值策略可以掩盖有意义的比较。总的来说,这项工作有助于实用的工具和关键的见解,以推进基于图形的TSAD系统的开发和评估。
摘要:There is growing interest in applying graph-based methods to Time Series Anomaly Detection (TSAD), particularly Graph Neural Networks (GNNs), as they naturally model dependencies among multivariate signals. GNNs are typically used as backbones in score-based TSAD pipelines, where anomalies are identified through reconstruction or prediction errors followed by thresholding. However, and despite promising results, the field still lacks standardized frameworks for evaluation and suffers from persistent issues with metric design and interpretation. We thus present an open-source framework for TSAD using GNNs, designed to support reproducible experimentation across datasets, graph structures, and evaluation strategies. Built with flexibility and extensibility in mind, the framework facilitates systematic comparisons between TSAD models and enables in-depth analysis of performance and interpretability. Using this tool, we evaluate several GNN-based architectures alongside baseline models across two real-world datasets with contrasting structural characteristics. Our results show that GNNs not only improve detection performance but also offer significant gains in interpretability, an especially valuable feature for practical diagnosis. We also find that attention-based GNNs offer robustness when graph structure is uncertain or inferred. In addition, we reflect on common evaluation practices in TSAD, showing how certain metrics and thresholding strategies can obscure meaningful comparisons. Overall, this work contributes both practical tools and critical insights to advance the development and evaluation of graph-based TSAD systems.
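The score-based pipeline such frameworks target (reconstruct, score by error, threshold) in miniature, with a naive causal moving-average reconstructor standing in for a GNN backbone; the window, threshold rule, and data are illustrative.

```python
import numpy as np

def anomaly_flags(x, window=5, k=4.0):
    """Score each point by |x - causal moving average| and flag scores
    above mean + k * std of the scores (a common thresholding rule)."""
    pad = np.pad(x, (window - 1, 0), mode="edge")
    recon = np.convolve(pad, np.ones(window) / window, mode="valid")
    score = np.abs(x - recon)
    return score > score.mean() + k * score.std()

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 8 * np.pi, 400)) + 0.05 * rng.normal(size=400)
x[200] += 3.0                  # inject a point anomaly
flags = anomaly_flags(x)
```

Swapping the moving average for a learned reconstructor changes the score quality, but the thresholding step, and the metric pitfalls the abstract warns about, stay the same.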
【2】Temporal-Conditioned Normalizing Flows for Multivariate Time Series Anomaly Detection
标题:用于多元时间序列异常检测的时间条件正规化流
链接:https://arxiv.org/abs/2603.09490
作者:David Baumgartner,Helge Langseth,Kenth Engø-Monsen,Heri Ramampiaro
摘要:本文介绍了时间条件归一化流(tcNF),这是一种通过准确建模时间依赖性与不确定性来解决时间序列异常检测的新框架。通过让归一化流以先前的观测为条件,tcNF有效捕获了复杂的时间动态,并为预期行为生成准确的概率分布。这种自回归方法通过识别所学分布中的低概率事件来实现鲁棒的异常检测。我们在多样的数据集上评估了tcNF,结果显示其与现有方法相比具有良好的准确性和鲁棒性。我们还提供了对其优势与局限的全面分析以及开源代码,以促进可复现性和未来研究。
摘要:This paper introduces temporal-conditioned normalizing flows (tcNF), a novel framework that addresses anomaly detection in time series data with accurate modeling of temporal dependencies and uncertainty. By conditioning normalizing flows on previous observations, tcNF effectively captures complex temporal dynamics and generates accurate probability distributions of expected behavior. This autoregressive approach enables robust anomaly detection by identifying low-probability events within the learned distribution. We evaluate tcNF on diverse datasets, demonstrating good accuracy and robustness compared to existing methods. A comprehensive analysis of strengths and limitations and open-source code is provided to facilitate reproducibility and future research.
分类|识别(2篇)
【1】No evaluation without fair representation : Impact of label and selection bias on the evaluation, performance and mitigation of classification models
标题:没有公平代表就没有评估:标签和选择偏见对分类模型的评估、性能和缓解的影响
链接:https://arxiv.org/abs/2603.09662
作者:Magali Legast,Toon Calders,François Fouss
备注:31 pages, 14 figures + appendix Submitted to the ACM Journal on Responsible Computing
摘要:在机器学习数据集中,偏差可以以不同的方式引入,例如通过选择或标签偏差。虽然这些偏见类型本身对公平机器学习的重要方面有影响,但它们的不同影响尚未得到充分研究。在这项工作中,我们实证分析的标签偏见和几个亚型的选择偏见的分类模型的评价,其性能,以及偏见缓解方法的有效性的影响。我们还引入了一个偏置和评估框架,允许通过在低歧视的现实生活数据集中引入受控偏置来模拟公平世界及其有偏置的对应方。使用我们的框架,我们经验分析的影响,每一个偏见类型独立,同时获得一个更具代表性的评估模型和缓解方法比传统的使用有偏见的数据作为测试集的子集。我们的研究结果强调了影响偏差对模型性能影响的不同因素。他们还表明,公平性和准确性之间,以及个人和群体公平性之间缺乏权衡,当模型在测试集上进行评估时,不会表现出不必要的偏见。他们还指出,偏差缓解方法的性能受到数据中存在的偏差类型的影响。我们的研究结果呼吁未来的工作,以开发更准确的评估预测模型和公平干预,但也要更好地了解其他类型的偏见,更复杂的情况下,涉及不同的偏见类型的组合,以及其他因素,影响缓解方法的效率,如数据集的特点。
摘要:Bias can be introduced in diverse ways in machine learning datasets, for example via selection or label bias. Although these bias types in themselves have an influence on important aspects of fair machine learning, their different impact has been understudied. In this work, we empirically analyze the effect of label bias and several subtypes of selection bias on the evaluation of classification models, on their performance, and on the effectiveness of bias mitigation methods. We also introduce a biasing and evaluation framework that allows to model fair worlds and their biased counterparts through the introduction of controlled bias in real-life datasets with low discrimination. Using our framework, we empirically analyze the impact of each bias type independently, while obtaining a more representative evaluation of models and mitigation methods than with the traditional use of a subset of biased data as test set. Our results highlight different factors that influence how impactful bias is on model performance. They also show an absence of trade-off between fairness and accuracy, and between individual and group fairness, when models are evaluated on a test set that does not exhibit unwanted bias. They furthermore indicate that the performance of bias mitigation methods is influenced by the type of bias present in the data. Our findings call for future work to develop more accurate evaluations of prediction models and fairness interventions, but also to better understand other types of bias, more complex scenarios involving the combination of different bias types, and other factors that impact the efficiency of the mitigation methods, such as dataset characteristics.
【2】DendroNN: Dendrocentric Neural Networks for Energy-Efficient Classification of Event-Based Data
标题:DendroNN:用于基于事件的数据节能分类的树中心神经网络
链接:https://arxiv.org/abs/2603.09274
作者:Jann Krausse,Zhe Su,Kyrus Mama,Maryada,Klaus Knobloch,Giacomo Indiveri,Jürgen Becker
备注:Currently under review
摘要:时空信息是各种感觉处理和计算任务的核心。前馈尖峰神经网络可用于解决这些任务,同时通过基于事件的计算在能源效率方面提供潜在的好处。然而,它们难以高精度地解码时间信息。因此,它们通常采用递归或延迟来增强它们的时间计算能力,然而,这在硬件效率方面带来了不利影响。在大脑中,树突是最近才开始在这种机器学习系统中得到承认的计算动力。在这项工作中,我们专注于树突分支中存在的序列检测机制,并通过引入树状神经网络DendroNN将其转化为一种新型的神经网络。DendroNN将独特的传入尖峰序列识别为时空特征。这项工作进一步引入了一个重新布线阶段,训练的不可微的尖峰序列,而不使用梯度。在重新布线的过程中,网络会记住频繁出现的序列,并丢弃那些不提供任何区别信息的序列。这些网络在各种基于事件的时间序列数据集上显示出具有竞争力的准确性。我们还提出了一个异步数字硬件架构,使用时间轮机制,建立在事件驱动的设计DendroNN,消除每一步的全局更新典型的延迟或基于递归的模型。通过利用DendroNN的动态和静态稀疏性以及内在量化,它在相同的音频分类任务中以可比的精度实现了比最先进的神经形态硬件高出4倍的效率,证明了其适用于基于时空事件的计算。这项工作提供了一种新的方法,低功耗时空处理事件驱动的硬件。
摘要:Spatiotemporal information is at the core of diverse sensory processing and computational tasks. Feed-forward spiking neural networks can be used to solve these tasks while offering potential benefits in terms of energy efficiency by computing event-based. However, they have trouble decoding temporal information with high accuracy. Thus, they commonly resort to recurrence or delays to enhance their temporal computing ability which, however, bring downsides in terms of hardware-efficiency. In the brain, dendrites are computational powerhouses that just recently started to be acknowledged in such machine learning systems. In this work, we focus on a sequence detection mechanism present in branches of dendrites and translate it into a novel type of neural network by introducing a dendrocentric neural network, DendroNN. DendroNNs identify unique incoming spike sequences as spatiotemporal features. This work further introduces a rewiring phase to train the non-differentiable spike sequences without the use of gradients. During the rewiring, the network memorizes frequently occurring sequences and additionally discards those that do not contribute any discriminative information. The networks display competitive accuracies across various event-based time series datasets. We also propose an asynchronous digital hardware architecture using a time-wheel mechanism that builds on the event-driven design of DendroNNs, eliminating per-step global updates typical of delay- or recurrence-based models. By leveraging a DendroNN's dynamic and static sparsity along with intrinsic quantization, it achieves up to 4x higher efficiency than state-of-the-art neuromorphic hardware at comparable accuracy on the same audio classification task, demonstrating its suitability for spatiotemporal event-based computing. This work offers a novel approach to low-power spatiotemporal processing on event-driven hardware.
表征(2篇)
【1】Task Aware Modulation Using Representation Learning for Upsaling of Terrestrial Carbon Fluxes
标题:使用表示学习进行任务感知调制以提升陆地碳通量
链接:https://arxiv.org/abs/2603.09974
作者:Aleksei Rozanov,Arvind Renganathan,Vipin Kumar
备注:Accepted to the KGML Bridge at AAAI 2026 (non-archival)
摘要:精确地放大陆地碳通量是估算全球碳收支的核心,但由于地面测量数据的稀疏和区域性分布,仍然具有挑战性。现有的数据驱动的升级产品往往无法推广到观察到的领域之外,导致系统性的区域偏见和高度的预测不确定性。我们介绍了任务感知调制与表示学习(TAM-RL),这是一个将时空表示学习与知识引导的编码器-解码器架构和来自碳平衡方程的损失函数相结合的框架。在代表不同生物群落和气候状况的150多个通量塔站点中,TAM-RL相对于现有最先进的数据集提高了预测性能,将RMSE降低了8-9.6%,并将解释方差($R^2$)从19.4%增加到43.8%,具体取决于目标通量。这些结果表明,结合物理接地约束与自适应表示学习,可以大大提高全球碳通量估计的鲁棒性和可移植性。
摘要:Accurately upscaling terrestrial carbon fluxes is central to estimating the global carbon budget, yet remains challenging due to the sparse and regionally biased distribution of ground measurements. Existing data-driven upscaling products often fail to generalize beyond observed domains, leading to systematic regional biases and high predictive uncertainty. We introduce Task-Aware Modulation with Representation Learning (TAM-RL), a framework that couples spatio-temporal representation learning with knowledge-guided encoder-decoder architecture and loss function derived from the carbon balance equation. Across 150+ flux tower sites representing diverse biomes and climate regimes, TAM-RL improves predictive performance relative to existing state-of-the-art datasets, reducing RMSE by 8-9.6% and increasing explained variance ($R^2$) from 19.4% to 43.8%, depending on the target flux. These results demonstrate that integrating physically grounded constraints with adaptive representation learning can substantially enhance the robustness and transferability of global carbon flux estimates.
【2】Semantic Level of Detail: Multi-Scale Knowledge Representation via Heat Kernel Diffusion on Hyperbolic Manifolds
标题:语义细节层次:双曲流形上通过热核扩散的多尺度知识表示
链接:https://arxiv.org/abs/2603.08965
作者:Edward Izgorodin
备注:11 pages, 3 figures, 2 tables
摘要:人工智能记忆系统越来越多地将知识组织成图结构(知识图谱、实体关系、社区层次),却缺乏用于连续分辨率控制的原则性机制:抽象层级之间的定性边界在哪里?智能体又该如何在其间导航?我们引入语义细节层次(SLoD),通过在庞加莱球 $\mathbb{B}^d$ 上以热核扩散定义连续缩放算子,同时回答这两个问题。在粗尺度($σ\to \infty$)下,扩散将嵌入聚合为高层摘要;在细尺度($σ\to 0$)下,局部语义细节得以保留。我们证明了在Sarkar嵌入下树结构层次的层次相干性,其逼近误差为 $O(σ)$,失真为 $(1+\varepsilon)$。至关重要的是,我们表明图拉普拉斯算子的谱隙会诱导涌现的尺度边界,即表示发生定性转变的尺度,而这些边界可以在无需手动设定分辨率参数的情况下自动检测。在合成层次结构(HSBM)上,我们的边界扫描器以高达1.00的ARI恢复植入的层级,且在接近信息论的Kesten-Stigum阈值时检测性能平缓下降。在完整的WordNet名词层次结构(82K个同义词集)上,检测到的边界与真实分类深度一致($τ= 0.79$),表明该方法能在无监督的情况下发现现实知识图谱中有意义的抽象层级。
摘要:AI memory systems increasingly organize knowledge into graph structures -- knowledge graphs, entity relations, community hierarchies -- yet lack a principled mechanism for continuous resolution control: where do the qualitative boundaries between abstraction levels lie, and how should an agent navigate them? We introduce Semantic Level of Detail (SLoD), a framework that answers both questions by defining a continuous zoom operator via heat kernel diffusion on the Poincaré ball $\mathbb{B}^d$. At coarse scales ($σ\to \infty$), diffusion aggregates embeddings into high-level summaries; at fine scales ($σ\to 0$), local semantic detail is preserved. We prove hierarchical coherence with bounded approximation error $O(σ)$ and $(1+\varepsilon)$ distortion for tree-structured hierarchies under Sarkar embedding. Crucially, we show that spectral gaps in the graph Laplacian induce emergent scale boundaries -- scales where the representation undergoes qualitative transitions -- which can be detected automatically without manual resolution parameters. On synthetic hierarchies (HSBM), our boundary scanner recovers planted levels with ARI up to 1.00, with detection degrading gracefully near the information-theoretic Kesten-Stigum threshold. On the full WordNet noun hierarchy (82K synsets), detected boundaries align with true taxonomic depth ($τ= 0.79$), demonstrating that the method discovers meaningful abstraction levels in real-world knowledge graphs without supervision.
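A minimal sketch of the zoom behaviour described above, with Euclidean distances standing in for the Poincaré-ball metric (an assumption made for brevity): heat-kernel smoothing with bandwidth σ preserves detail as σ → 0 and collapses embeddings toward a global summary as σ → ∞.

```python
import numpy as np

# Sketch of the scale behaviour only; the paper works on the Poincare
# ball, whereas this toy uses Euclidean distances between embeddings.

def heat_smooth(X, sigma):
    """Diffuse embeddings X (n, d) with Gaussian heat-kernel weights."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    W /= W.sum(axis=1, keepdims=True)   # row-normalise the kernel
    return W @ X

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))
fine = heat_smooth(X, sigma=1e-3)     # ~ identity: local detail preserved
coarse = heat_smooth(X, sigma=1e3)    # ~ global mean: high-level summary
print(np.allclose(fine, X, atol=1e-4), np.allclose(coarse, X.mean(0), atol=1e-4))
```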
3D|3D重建等相关(1篇)
【1】Interactive 3D visualization of surface roughness predictions in additive manufacturing: A data-driven framework
标题:增材制造中表面粗糙度预测的交互式3D可视化:数据驱动框架
链接:https://arxiv.org/abs/2603.09353
作者:Engin Deniz Erkan,Elif Surer,Ulas Yaman
摘要:材料挤出增材制造中的表面粗糙度在零件上各不相同,在工艺规划过程中很难预测,因为它取决于打印参数和局部表面倾斜度,这决定了阶梯效应。一个数据驱动的框架,提出了预测的算术平均粗糙度(Ra)制造前使用的工艺参数和表面角度。使用三级Box-Behnken设计创建结构化实验数据集:打印87个样本,每个样本具有跨越不同倾斜角度的多个平面,产生用接触式轮廓仪获得的1566 Ra测量值。一个多层感知器回归器进行了训练,以捕捉制造条件,倾斜度和Ra之间的非线性关系。为了减少有限的实验数据,使用条件生成对抗网络来生成额外的条件特定的表格样本,从而提高预测性能。在保持测试集上评估模型性能。还开发了一个基于网络的决策支持界面,通过加载3D模型、指定打印参数和调整零件方向来实现交互式工艺规划。该系统根据模型几何形状计算面向倾斜度,并将预测的Ra可视化为表面上的交互式色图,从而能够快速识别容易出现高粗糙度的区域,并立即比较参数和方向选择。
摘要:Surface roughness in Material Extrusion Additive Manufacturing varies across a part and is difficult to anticipate during process planning because it depends on both printing parameters and local surface inclination, which governs the staircase effect. A data-driven framework is presented to predict the arithmetic mean roughness (Ra) prior to fabrication using process parameters and surface angle. A structured experimental dataset was created using a three-level Box-Behnken design: 87 specimens were printed, each with multiple planar faces spanning different inclination angles, yielding 1566 Ra measurements acquired with a contact profilometer. A multilayer perceptron regressor was trained to capture nonlinear relationships between manufacturing conditions, inclination, and Ra. To mitigate limited experimental data, a conditional generative adversarial network was used to generate additional condition-specific tabular samples, thereby improving predictive performance. Model performance was assessed on a hold-out test set. A web-based decision-support interface was also developed to enable interactive process planning by loading a 3D model, specifying printing parameters, and adjusting the part's orientation. The system computes face-wise inclination from the model geometry and visualizes predicted Ra as an interactive colormap over the surface, enabling rapid identification of regions prone to high roughness and immediate comparison of parameter and orientation choices.
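One geometric step the interface depends on, computing face-wise inclination relative to the build direction, can be sketched as follows; the triangle layout and +z build axis are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Sketch: inclination angle of each triangular face w.r.t. the build
# direction, computed from face normals. This angle drives the staircase
# effect and is the geometric input to the Ra predictor.

def face_inclinations(vertices, faces, build_dir=(0.0, 0.0, 1.0)):
    """Angle (degrees) between each face normal and the build direction."""
    v = np.asarray(vertices, dtype=float)
    f = np.asarray(faces, dtype=int)
    n = np.cross(v[f[:, 1]] - v[f[:, 0]], v[f[:, 2]] - v[f[:, 0]])
    n /= np.linalg.norm(n, axis=1, keepdims=True)
    b = np.asarray(build_dir, dtype=float)
    b /= np.linalg.norm(b)
    cosang = np.clip(n @ b, -1.0, 1.0)
    return np.degrees(np.arccos(cosang))

# A horizontal face (normal +z) and a vertical face (normal -y):
verts = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 0, 1), (1, 0, 1), (1, 1, 1)]
tris = [(3, 4, 5),   # horizontal face at z = 1
        (0, 1, 4)]   # vertical face in the y = 0 plane
incl = face_inclinations(verts, tris)
print(incl)  # ~[ 0. 90.]
```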
优化|敛散性(5篇)
【1】Information Theoretic Bayesian Optimization over the Probability Simplex
标题:概率单形上的信息论Bayesian优化
链接:https://arxiv.org/abs/2603.09793
作者:Federico Pavesi,Antonio Candelieri,Noémie Jaquier
备注:16 pages, 5 figures
摘要:贝叶斯优化是一种数据高效的技术,已被证明在优化昂贵、黑箱且可能含噪的目标函数方面非常强大。许多应用涉及优化概率和混合比例,它们天然属于概率单纯形,即由非负且和为1的分量所定义的受约束非欧几里得域。本文介绍了 $α$-GaBO,一族新的概率单纯形上的贝叶斯优化算法。我们的方法植根于信息几何,这是黎曼几何的一个分支,它为单纯形赋予了黎曼度量和一类联络。基于信息几何理论,我们构造了反映概率单纯形几何的Matérn核,以及用于采集函数优化的单参数几何优化器族。我们在基准函数和多种现实应用(包括成分混合、分类器混合以及一个机器人控制任务)上验证了该方法,结果显示其性能优于受约束的欧几里得方法。
摘要:Bayesian optimization is a data-efficient technique that has been shown to be extremely powerful to optimize expensive, black-box, and possibly noisy objective functions. Many applications involve optimizing probabilities and mixtures which naturally belong to the probability simplex, a constrained non-Euclidean domain defined by non-negative entries summing to one. This paper introduces $α$-GaBO, a novel family of Bayesian optimization algorithms over the probability simplex. Our approach is grounded in information geometry, a branch of Riemannian geometry which endows the simplex with a Riemannian metric and a class of connections. Based on information geometry theory, we construct Matérn kernels that reflect the geometry of the probability simplex, as well as a one-parameter family of geometric optimizers for the acquisition function. We validate our method on benchmark functions and on a variety of real-world applications including mixtures of components, mixtures of classifiers, and a robotic control task, showing its increased performance compared to constrained Euclidean approaches.
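The paper's Matérn construction is not reproduced here, but a hedged sketch of a geometry-aware kernel on the simplex can be built from the classical Fisher-Rao distance of information geometry; note the squared-exponential form below is only a stand-in and is not guaranteed positive definite in general.

```python
import numpy as np

# Sketch: an information-geometric distance on the probability simplex,
# plugged into a squared-exponential kernel as an illustrative stand-in
# for the paper's simplex-aware Matern kernels.

def fisher_rao(p, q):
    """Fisher-Rao geodesic distance between two points of the simplex."""
    inner = np.clip(np.sum(np.sqrt(p * q)), 0.0, 1.0)
    return 2.0 * np.arccos(inner)

def fr_kernel(p, q, lengthscale=1.0):
    return np.exp(-0.5 * (fisher_rao(p, q) / lengthscale) ** 2)

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.5, 0.3, 0.2])
print(fisher_rao(p, p) < 1e-6, fr_kernel(p, q) == fr_kernel(q, p))
```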
【2】Beyond Test-Time Training: Learning to Reason via Hardware-Efficient Optimal Control
标题:超越测试时间训练:通过硬件高效的最优控制学习推理
链接:https://arxiv.org/abs/2603.09221
作者:Peihao Wang,Shan Yang,Xijun Wang,Tesi Xiao,Xin Liu,Changlong Yu,Yu Lou,Pan Li,Zhangyang Wang,Ming Lin,René Vidal
摘要:长期以来,关联记忆一直是序列模型设计的基础。除了回忆之外,人类还通过预测未来状态和选择目标导向的行动来进行推理,这是现代语言模型越来越需要的能力,但不是天生编码的。虽然以前的工作使用强化学习或测试时训练,但规划仍然在模型架构之外。我们将推理公式化为最优控制,并引入测试时间控制(TTC)层,该层在推理时间对潜在状态执行有限时域LQR规划,表示神经架构内的值函数,并将其作为嵌套目标,以实现预测前的规划。为了确保可扩展性,我们推导出一个硬件高效的LQR求解器的基础上的辛公式,并实现它作为一个融合的CUDA内核,使并行执行的开销最小。TTC层作为适配器集成到预训练的LLM中,在MATH-500上将数学推理性能提高了27.8%,在AMC和AIME上将Pass@8提高了2- 3倍,这表明将最优控制作为架构组件嵌入提供了一种有效且可扩展的推理机制。
摘要:Associative memory has long underpinned the design of sequential models. Beyond recall, humans reason by projecting future states and selecting goal-directed actions, a capability that modern language models increasingly require but do not natively encode. While prior work uses reinforcement learning or test-time training, planning remains external to the model architecture. We formulate reasoning as optimal control and introduce the Test-Time Control (TTC) layer, which performs finite-horizon LQR planning over latent states at inference time, represents a value function within neural architectures, and leverages it as the nested objective to enable planning before prediction. To ensure scalability, we derive a hardware-efficient LQR solver based on a symplectic formulation and implement it as a fused CUDA kernel, enabling parallel execution with minimal overhead. Integrated as an adapter into pretrained LLMs, TTC layers improve mathematical reasoning performance by up to +27.8% on MATH-500 and 2-3x Pass@8 improvements on AMC and AIME, demonstrating that embedding optimal control as an architectural component provides an effective and scalable mechanism for reasoning beyond test-time training.
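The TTC layer's symplectic CUDA solver is not reproduced here; as a reference point, the standard finite-horizon discrete-time LQR backward Riccati recursion that such a planner solves can be sketched as follows (the double-integrator plant is an illustrative assumption).

```python
import numpy as np

# Sketch of classical finite-horizon discrete-time LQR: a backward
# Riccati pass yields time-varying feedback gains K_t.

def lqr_gains(A, B, Q, R, horizon):
    P = Q.copy()
    gains = []
    for _ in range(horizon):
        S = R + B.T @ P @ B
        K = np.linalg.solve(S, B.T @ P @ A)   # K_t = S^{-1} B^T P A
        P = Q + A.T @ P @ (A - B @ K)         # Riccati update
        gains.append(K)
    return gains[::-1]  # gains[t] applies at step t

A = np.array([[1.0, 1.0], [0.0, 1.0]])   # double integrator
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)
Ks = lqr_gains(A, B, Q, R, horizon=50)

x = np.array([5.0, 0.0])
for K in Ks:                       # roll the plan forward
    x = (A - B @ K) @ x
print(np.linalg.norm(x) < 1e-3)    # state driven near the origin
```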
【3】PPO-Based Hybrid Optimization for RIS-Assisted Semantic Vehicular Edge Computing
标题:基于PPO的RIS辅助语义车辆边缘计算混合优化
链接:https://arxiv.org/abs/2603.09082
作者:Wei Feng,Jingbo Zhang,Qiong Wu,Pingyi Fan,Qiang Fan
备注:This paper has been accepted by electronics. The source code has been released at: https://github.com/qiongwu86/PPO-Based-Hybrid-Optimization-for-RIS-Assisted-Semantic-Vehicular-Edge-Computing
摘要:为支持动态环境和间歇链路下对延迟敏感的车联网(IoV)应用,本文提出了一种可重构智能表面(RIS)辅助的语义感知车辆边缘计算(VEC)框架。该方法集成RIS以优化无线连接,并通过传输语义特征的语义通信来最小化延迟。我们通过联合优化卸载比例、语义符号数量和RIS相移,建立了一个全面的联合优化问题。考虑到该问题的高维性和非凸性,我们提出了一个两层混合方案:用近端策略优化(PPO)处理离散决策,用线性规划(LP)进行卸载优化。仿真结果验证了所提框架相对于现有方法的优越性。具体而言,与遗传算法(GA)和量子行为粒子群优化(QPSO)相比,所提出的基于PPO的混合优化方案将平均端到端延迟降低了约40%至50%。此外,即使在多达30辆车的拥堵场景中,该系统也能保持低延迟,展现出强大的可扩展性。
摘要:To support latency-sensitive Internet of Vehicles (IoV) applications amidst dynamic environments and intermittent links, this paper proposes a Reconfigurable Intelligent Surface (RIS)-aided semantic-aware Vehicle Edge Computing (VEC) framework. This approach integrates RIS to optimize wireless connectivity and semantic communication to minimize latency by transmitting semantic features. We formulate a comprehensive joint optimization problem by optimizing offloading ratios, the number of semantic symbols, and RIS phase shifts. Considering the problem's high dimensionality and non-convexity, we propose a two-tier hybrid scheme that employs Proximal Policy Optimization (PPO) for discrete decision-making and Linear Programming (LP) for offloading optimization. The simulation results have validated the proposed framework's superiority over existing methods. Specifically, the proposed PPO-based hybrid optimization scheme reduces the average end-to-end latency by approximately 40% to 50% compared to Genetic Algorithm (GA) and Quantum-behaved Particle Swarm Optimization (QPSO). Moreover, the system demonstrates strong scalability by maintaining low latency even in congested scenarios with up to 30 vehicles.
【4】Flow Field Reconstruction via Voronoi-Enhanced Physics-Informed Neural Networks with End-to-End Sensor Placement Optimization
标题:基于Voronoi增强物理信息神经网络的流场重构及传感器优化配置
链接:https://arxiv.org/abs/2603.09371
作者:Renjie Xiao,Bingteng Sun,Yiling Chen,Lin Lu,Qiang Du,Junqiang Zhu
备注:36 pages, 9 figures
摘要:高保真流场重建在流体动力学中很重要,但它受到稀疏和时空不完整传感器测量的挑战,以及可能使预训练重建模型无效的预部署测量点的故障。物理信息神经网络(PINN)通过纳入管理物理来减轻对大型标记数据集的依赖,但传感器放置优化(重建准确性和鲁棒性的关键因素)仍然未得到充分研究。在这项研究中,我们提出了一个PINN与Voronoi增强传感器优化(VSOPINN)。VSOPINN支持用于稀疏传感器数据光栅化的可微软Voronoi构造、用于自适应传感器放置的形心Voronoi曲面细分(CVT)与PINN的端到端融合,以及通过共享编码器多解码器架构进行多条件流重建的统一布局优化。我们验证VSOPINN上的三个代表性的问题:盖驱动腔流,血管流,和环形旋转流。结果表明,VSOPINN显着提高重建精度在不同的雷诺数,自适应学习有效的传感器布局,并保持强大的部分传感器故障。该研究阐明了基于PINN的流场重建中传感器位置与重建精度之间的内在关系。
摘要:(short version abstract, full in article) High-fidelity flow field reconstruction is important in fluid dynamics, but it is challenged by sparse and spatiotemporally incomplete sensor measurements, as well as failures of pre-deployed measurement points that can invalidate pre-trained reconstruction models. Physics-informed neural networks (PINNs) alleviate dependence on large labeled datasets by incorporating governing physics, yet sensor placement optimization, a key factor in reconstruction accuracy and robustness, remains underexplored. In this study, we propose a PINN with Voronoi-enhanced Sensor Optimization (VSOPINN). VSOPINN enables differentiable soft Voronoi construction for sparse sensor data rasterization, end-to-end fusion of centroidal Voronoi tessellation (CVT) with PINNs for adaptive sensor placement, and unified layout optimization for multi-condition flow reconstruction through a shared encoder-multi-decoder architecture. We validate VSOPINN on three representative problems: lid-driven cavity flow, vascular flow, and annular rotating flow. Results show that VSOPINN significantly improves reconstruction accuracy across different Reynolds numbers, adaptively learns effective sensor layouts, and remains robust under partial sensor failure. The study clarifies the intrinsic relationship between sensor placement and reconstruction precision in PINN-based flow field reconstruction.
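A hedged sketch of the differentiable "soft Voronoi" rasterization idea: instead of hard nearest-site assignment, grid points take a softmax over negative squared distances to sensor sites, so the rasterization stays differentiable in the site positions. The temperature and softmax form are assumptions, not the paper's exact construction.

```python
import numpy as np

# Sketch: softly rasterise sparse sensor readings onto a grid. As tau -> 0
# the soft membership approaches a hard Voronoi partition.

def soft_voronoi_raster(sites, values, grid, tau=0.05):
    """grid: (m, 2) query points; returns softly interpolated values (m,)."""
    d2 = ((grid[:, None, :] - sites[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / tau)
    w /= w.sum(axis=1, keepdims=True)     # soft cell membership
    return w @ values

sites = np.array([[0.2, 0.2], [0.8, 0.8]])
values = np.array([1.0, -1.0])
xs = np.linspace(0, 1, 5)
grid = np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)
field = soft_voronoi_raster(sites, values, grid)
print(field.min() >= -1.0 and field.max() <= 1.0)  # convex combination
```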
【5】On Regret Bounds of Thompson Sampling for Bayesian Optimization
标题:Bayesian优化Thompson抽样的遗憾界
链接:https://arxiv.org/abs/2603.09276
作者:Shion Takeno,Shogo Iwazaki
备注:42 pages
摘要:我们在目标函数是GP样本路径的假设下,研究了一种广泛使用的贝叶斯优化方法:高斯过程汤普森采样(GP-TS)。与已有高概率及期望遗憾界的GP置信上界方法(GP-UCB)相比,GP-TS的大多数分析仅限于期望遗憾。此外,近期针对GP-UCB在宽松遗憾和改进累积遗憾上界方面的分析能否推广到GP-TS仍不清楚。为填补这些空白,本文给出了若干遗憾界:(i)GP-TS的遗憾下界,表明GP-TS以概率 $δ$ 遭受对 $1/δ$ 的多项式依赖;(ii)累积遗憾二阶矩的上界,由此直接得到关于 $δ$ 的改进遗憾上界;(iii)期望宽松遗憾上界;以及(iv)关于时间范围 $T$ 的改进累积遗憾上界。在此过程中,我们提供了若干有用的引理,包括对最近分析中必要条件的放松,以获得关于 $T$ 的改进遗憾上界。
摘要:We study a widely used Bayesian optimization method, Gaussian process Thompson sampling (GP-TS), under the assumption that the objective function is a sample path from a GP. Compared with the GP upper confidence bound (GP-UCB) with established high-probability and expected regret bounds, most analyses of GP-TS have been limited to expected regret. Moreover, whether the recent analyses of GP-UCB for the lenient regret and the improved cumulative regret upper bound can be applied to GP-TS remains unclear. To fill these gaps, this paper shows several regret bounds: (i) a regret lower bound for GP-TS, which implies that GP-TS suffers from a polynomial dependence on $1/δ$ with probability $δ$, (ii) an upper bound of the second moment of cumulative regret, which directly suggests an improved regret upper bound on $δ$, (iii) expected lenient regret upper bounds, and (iv) an improved cumulative regret upper bound on the time horizon $T$. Along the way, we provide several useful lemmas, including a relaxation of the necessary condition from recent analysis to obtain improved regret upper bounds on $T$.
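For readers unfamiliar with the algorithm being analysed, a minimal GP-TS step on a discrete candidate set can be sketched as follows; this is the standard algorithm, not the paper's regret analysis, and the RBF kernel and lengthscale are illustrative assumptions.

```python
import numpy as np

# Sketch: one Thompson-sampling step with a GP posterior -- draw a single
# function sample from the posterior and query its argmax next.

def rbf(a, b, ls=0.2):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_ts_step(X_obs, y_obs, X_cand, noise=1e-4, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    K = rbf(X_obs, X_obs) + noise * np.eye(len(X_obs))
    Ks = rbf(X_cand, X_obs)
    mu = Ks @ np.linalg.solve(K, y_obs)
    cov = rbf(X_cand, X_cand) - Ks @ np.linalg.solve(K, Ks.T)
    f = rng.multivariate_normal(mu, cov + 1e-8 * np.eye(len(X_cand)))
    return int(np.argmax(f))          # index of next point to evaluate

rng = np.random.default_rng(0)
f_true = lambda x: -(x - 0.7) ** 2
X_obs = np.array([0.1, 0.5, 0.9])
y_obs = f_true(X_obs)
X_cand = np.linspace(0, 1, 101)
idx = gp_ts_step(X_obs, y_obs, X_cand, rng=rng)
print(0 <= idx < len(X_cand))
```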
预测|估计(6篇)
【1】A Hybrid Quantum-Classical Framework for Financial Volatility Forecasting Based on Quantum Circuit Born Machines
标题:基于量子电路玻恩机的混合量子-经典金融波动预测框架
链接:https://arxiv.org/abs/2603.09789
作者:Yixiong Chen
摘要:准确预测金融市场波动性对于风险管理、期权定价和投资组合优化至关重要。传统的计量经济学模型和经典的机器学习方法在处理金融时间序列固有的非线性和非平稳特性方面面临挑战。近年来,量子计算的快速发展为解决复杂的优化和采样问题提供了新的范式。本文提出了一种新的量子-经典混合计算框架,旨在将经典神经网络的强大表示能力与量子模型的独特优势相结合。针对金融市场波动预测的具体任务,我们设计并实现了一个基于该框架的混合模型,该模型将长短期记忆(LSTM)网络与量子电路玻恩机(QCBM)相结合。LSTM负责从历史时间序列数据中提取复杂的动态特征,而QCBM作为一个可学习的先验模块,为模型提供高质量的先验分布来指导预测过程。我们在两个真实的金融数据集上对模型进行了评估,这两个数据集由上证综合指数和沪深300指数的5分钟高频数据组成。实验结果表明,与纯粹的经典LSTM基线模型相比,我们的混合量子经典模型在多个关键指标上表现出显著的优势,包括均方误差(MSE),均方根误差(RMSE)和QLIKE损失,证明了量子计算在增强金融预测模型能力方面的巨大潜力。更广泛地说,所提出的混合框架提供了一种灵活的架构,可以适用于涉及高维、复杂或非线性数据分布的其他机器学习任务。
摘要:Accurate forecasting of financial market volatility is crucial for risk management, option pricing, and portfolio optimization. Traditional econometric models and classical machine learning methods face challenges in handling the inherent non-linear and non-stationary characteristics of financial time series. In recent years, the rapid development of quantum computing has provided a new paradigm for solving complex optimization and sampling problems. This paper proposes a novel hybrid quantum-classical computing framework aimed at combining the powerful representation capabilities of classical neural networks with the unique advantages of quantum models. For the specific task of financial market volatility forecasting, we designed and implemented a hybrid model based on this framework, which combines a Long Short-Term Memory (LSTM) network with a Quantum Circuit Born Machine (QCBM). The LSTM is responsible for extracting complex dynamic features from historical time series data, while the QCBM serves as a learnable prior module, providing the model with a high-quality prior distribution to guide the forecasting process. We evaluated the model on two real financial datasets consisting of 5-minute high-frequency data from the Shanghai Stock Exchange (SSE) Composite Index and CSI 300 Index. Experimental results show that, compared to a purely classical LSTM baseline model, our hybrid quantum-classical model demonstrates significant advantages across multiple key metrics, including Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and QLIKE loss, proving the great potential of quantum computing in enhancing the capabilities of financial forecasting models. More broadly, the proposed hybrid framework offers a flexible architecture that may be adapted to other machine learning tasks involving high-dimensional, complex, or non-linear data distributions.
【2】Physics-informed neural operator for predictive parametric phase-field modelling
标题:用于预测参数相场建模的物理信息神经运算符
链接:https://arxiv.org/abs/2603.09693
作者:Nanxi Chen,Airong Chen,Rujin Ma
摘要:通过相场模型预测材料的微观结构和形态演化是计算密集型的,特别是对于高通量参数研究。虽然傅立叶神经算子(FNO)等神经算子在加速参数偏微分方程(PDE)的求解方面表现出了希望,但缺乏明确的物理约束可能会限制复杂相场动力学的推广和长期精度。在这里,我们开发了一个物理信息的神经算子框架来学习参数相场偏微分方程,即PF-PINO。通过将相场控制方程的残差嵌入到数据保真度损失函数中,我们的框架在训练过程中有效地实施了物理约束。我们验证PF-PINO对基准相场问题,包括电化学腐蚀,枝晶凝固,和调幅分解。我们的研究结果表明,PF-PINO显着优于传统的FNO的准确性,泛化能力和长期稳定性。这项工作为相场建模提供了一个强大而有效的计算工具,并突出了物理学信息神经操作符在推进复杂界面演化问题的科学机器学习方面的潜力。
摘要:Predicting the microstructural and morphological evolution of materials through phase-field modelling is computationally intensive, particularly for high-throughput parametric studies. While neural operators such as the Fourier neural operator (FNO) show promise in accelerating the solution of parametric partial differential equations (PDEs), the lack of explicit physical constraints may limit generalisation and long-term accuracy for complex phase-field dynamics. Here, we develop a physics-informed neural operator framework to learn parametric phase-field PDEs, namely PF-PINO. By embedding the residuals of phase-field governing equations into the data-fidelity loss function, our framework effectively enforces physical constraints during training. We validate PF-PINO against benchmark phase-field problems, including electrochemical corrosion, dendritic crystal solidification, and spinodal decomposition. Our results demonstrate that PF-PINO significantly outperforms conventional FNO in accuracy, generalisation capability, and long-term stability. This work provides a robust and efficient computational tool for phase-field modelling and highlights the potential of physics-informed neural operators to advance scientific machine learning for complex interfacial evolution problems.
【3】From Weighting to Modeling: A Nonparametric Estimator for Off-Policy Evaluation
标题:从加权到建模:政策外评估的非参数估计
链接:https://arxiv.org/abs/2603.09436
作者:Rong J. B. Zhu
摘要:我们研究上下文老虎机(contextual bandits)设置下的离策略评估,目标是利用由上下文、动作和所获奖励组成的历史数据来评估新策略。这些历史数据通常不能忠实地反映新策略的动作分布。一种常见的方法是逆概率加权(IPW),用于校正动作分布上的这些差异;然而,由于概率出现在分母中,该方法常常具有高方差。双重鲁棒(DR)估计量通过对奖励建模来降低方差,但并未直接解决IPW本身的方差问题。在这项工作中,我们针对IPW的这一局限,提出了使用非参数模型构建权重的非参数加权(NW)方法。我们的NW方法实现了与IPW相当的低偏差,但方差通常显著更低。为了进一步降低方差,我们类似于DR技术引入奖励预测,得到模型辅助非参数加权(MNW)方法。MNW方法通过显式建模并缓解奖励建模带来的偏差来给出准确的值估计,而并不以保证标准的双重鲁棒性为目标。广泛的实证比较表明,我们的方法始终优于现有技术,在保持低偏差的同时实现更低方差的值估计。
摘要:We study off-policy evaluation in the setting of contextual bandits, where we aim to evaluate a new policy using historical data that consists of contexts, actions and received rewards. This historical data typically does not faithfully represent action distribution of the new policy accurately. A common approach, inverse probability weighting (IPW), adjusts for these discrepancies in action distributions. However, this method often suffers from high variance due to the probability being in the denominator. The doubly robust (DR) estimator reduces variance through modeling reward but does not directly address variance from IPW. In this work, we address the limitation of IPW by proposing a Nonparametric Weighting (NW) approach that constructs weights using a nonparametric model. Our NW approach achieves low bias like IPW but typically exhibits significantly lower variance. To further reduce variance, we incorporate reward predictions -- similar to the DR technique -- resulting in the Model-assisted Nonparametric Weighting (MNW) approach. The MNW approach yields accurate value estimates by explicitly modeling and mitigating bias from reward modeling, without aiming to guarantee the standard doubly robust property. Extensive empirical comparisons show that our approaches consistently outperform existing techniques, achieving lower variance in value estimation while maintaining low bias.
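The two classical baselines the proposal builds on can be sketched on synthetic logged data; these are the standard IPW and DR estimators, not the proposed NW/MNW weights, and the two-arm logging setup is an illustrative assumption.

```python
import numpy as np

# Sketch of standard off-policy value estimators for contextual bandits:
# inverse probability weighting (IPW) and the doubly robust (DR) form.

def ipw_value(rewards, logged_prop, target_prop):
    """logged_prop / target_prop: probability of the logged action under
    the logging / target policy."""
    return np.mean((target_prop / logged_prop) * rewards)

def dr_value(rewards, logged_prop, target_prop, q_hat_taken, v_hat_target):
    """q_hat_taken: reward-model value at the logged action;
    v_hat_target: its expectation under the target policy."""
    w = target_prop / logged_prop
    return np.mean(v_hat_target + w * (rewards - q_hat_taken))

rng = np.random.default_rng(0)
n = 20000
a = rng.integers(0, 2, n)                # logging policy: uniform over 2 arms
p_old = np.full(n, 0.5)
r = np.where(a == 1, 1.0, 0.0) + 0.1 * rng.standard_normal(n)
p_new = np.where(a == 1, 0.9, 0.1)       # target policy prefers arm 1

v_ipw = ipw_value(r, p_old, p_new)
q_hat = np.where(a == 1, 1.0, 0.0)       # oracle reward model for the sketch
v_dr = dr_value(r, p_old, p_new, q_hat, np.full(n, 0.9))
print(float(v_ipw), float(v_dr))  # both close to the true value 0.9
```

With a good reward model the DR estimate typically has markedly lower variance than IPW, which is the gap the abstract's MNW method targets from the weighting side.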
【4】Dynamic Multi-period Experts for Online Time Series Forecasting
标题:在线时间序列预测的动态多期专家
链接:https://arxiv.org/abs/2603.09062
作者:Seungha Hong,Sukang Chae,Suyeon Kim,Sanghwan Jang,Hwanjo Yu
备注:WWW 2026
摘要:在线时间序列预测(OTSF)要求模型不断适应概念漂移。然而,现有的方法往往把概念漂移作为一个整体的现象。为了解决这一限制,我们首先重新定义概念漂移,将其分为两种不同的类型:重复漂移,以前看到的模式重新出现,和新兴漂移,全新的模式出现。然后,我们提出DynaME(动态多周期专家),一种新的混合框架,旨在有效地解决这种双重性质的漂移。对于经常性漂移,DynaME聘请了一个由专业专家组成的委员会,他们在每个时间步动态地拟合最相关的历史周期模式。对于Emergent Drift,该框架检测高度不确定性的场景,并将依赖转移到稳定的一般专家身上。在多个基准数据集和主干上进行的大量实验表明,DynaME有效地适应了概念漂移,并显着优于现有的基线。
摘要:Online Time Series Forecasting (OTSF) requires models to continuously adapt to concept drift. However, existing methods often treat concept drift as a monolithic phenomenon. To address this limitation, we first redefine concept drift by categorizing it into two distinct types: Recurring Drift, where previously seen patterns reappear, and Emergent Drift, where entirely new patterns emerge. We then propose DynaME (Dynamic Multi-period Experts), a novel hybrid framework designed to effectively address this dual nature of drift. For Recurring Drift, DynaME employs a committee of specialized experts that are dynamically fitted to the most relevant historical periodic patterns at each time step. For Emergent Drift, the framework detects high-uncertainty scenarios and shifts reliance to a stable, general expert. Extensive experiments on several benchmark datasets and backbones demonstrate that DynaME effectively adapts to both concept drifts and significantly outperforms existing baselines.
【5】Robust Parameter and State Estimation in Multiscale Neuronal Systems Using Physics-Informed Neural Networks
标题:使用物理信息神经网络进行多尺度神经元系统的鲁棒参数和状态估计
链接:https://arxiv.org/abs/2603.08742
作者:Changliang Wei,Yangyang Wang,Xueyu Zhu
摘要:从部分和噪声观测中推断生物物理参数和隐藏状态变量是计算神经科学中的一个基本挑战。这一问题对于快速-慢速尖峰和爆发模式尤为困难,因为在这些模式中,强非线性、多尺度动力学和有限的观测数据往往导致对初始参数猜测的严重敏感性和传统数值正演方法的收敛失败。在这项工作中,我们开发了一个物理信息神经网络(PINN)框架,用于联合重建未观察到的状态变量和估计神经元模型中的未知生物物理参数。我们证明了生物物理神经元模型,包括Morris-Lecar模型在多个尖峰和爆发制度和呼吸模型神经元的方法的有效性。该方法只需要在短的观测窗口部分电压观测,并保持鲁棒性,即使在初始化时与非信息参数猜测。这些结果表明,PINN可以提供强大而准确的参数推断和状态重建,为多尺度神经元动力学中的逆问题提供了一个有前途的替代方案,传统技术往往难以解决。
摘要:Inferring biophysical parameters and hidden state variables from partial and noisy observations is a fundamental challenge in computational neuroscience. This problem is particularly difficult for fast-slow spiking and bursting models, where strong nonlinearities, multiscale dynamics, and limited observational data often lead to severe sensitivity to initial parameter guesses and convergence failure in methods relying on traditional numerical forward solvers. In this work, we developed a physics-informed neural network (PINN) framework for the joint reconstruction of unobserved state variables and the estimation of unknown biophysical parameters in neuronal models. We demonstrate the effectiveness of the method on biophysical neuron models, including the Morris-Lecar model across multiple spiking and bursting regimes and a respiratory model neuron. The method requires only partial voltage observations over short observation windows and remains robust even when initialized with non-informative parameter guesses. These results suggest that PINNs can deliver robust and accurate parameter inference and state reconstruction, providing a promising alternative for inverse problems in multiscale neuronal dynamics, where traditional techniques often struggle.
【6】Kernel Debiased Plug-in Estimation based on the Universal Least Favorable Submodel
标题:基于通用最不有利子模型的核去偏插件估计
链接:https://arxiv.org/abs/2603.08945
作者:Haiyi Chen,Yang Liu,Ivana Malenica
摘要:本文提出了一种基于通用最不利子模型的核去偏插件估计器ULFS-KDPE,用于估计非参数模型中的路径可微参数。该方法在再生核希尔伯特空间(RKHS)中构建数据自适应的去偏流,得到的插件估计在无需显式推导或计算有效影响函数的情况下达到半参数效率。我们通过将通用最不利更新表述为概率密度上的非线性常微分方程,为ULFS-KDPE奠定了严格的泛函分析基础,并建立了经验得分沿诱导流的存在性、唯一性、稳定性和有限时间收敛性。在标准正则性条件下,所得估计量是正规且渐近线性的,并同时对一大类路径可微参数达到半参数有效界。该方法基于有限维核表示和有原则的停止准则,具有计算上易处理的实现。在有限样本下,将求解一族丰富的得分方程与基于RKHS的平滑相结合,并避免直接计算影响函数,从而改善了数值稳定性。仿真研究对该方法进行了说明并支持理论结果。
摘要:We propose ULFS-KDPE, a kernel debiased plug-in estimator based on the universal least favorable submodel, for estimating pathwise differentiable parameters in nonparametric models. The method constructs a data-adaptive debiasing flow in a reproducing kernel Hilbert space (RKHS), producing a plug-in estimator that achieves semiparametric efficiency without requiring explicit derivation or evaluation of efficient influence functions. We place ULFS-KDPE on a rigorous functional-analytic foundation by formulating the universal least favorable update as a nonlinear ordinary differential equation on probability densities. We establish existence, uniqueness, stability, and finite-time convergence of the empirical score along the induced flow. Under standard regularity conditions, the resulting estimator is regular, asymptotically linear, and attains the semiparametric efficiency bound simultaneously for a broad class of pathwise differentiable parameters. The method admits a computationally tractable implementation based on finite-dimensional kernel representations and principled stopping criteria. In finite samples, the combination of solving a rich collection of score equations with RKHS-based smoothing and avoidance of direct influence-function evaluation leads to improved numerical stability. Simulation studies illustrate the method and support the theoretical results.
其他神经网络|深度学习|模型|建模(22篇)
【1】When Learning Rates Go Wrong: Early Structural Signals in PPO Actor-Critic
标题:当学习率出错时:PPO演员评论家中的早期结构信号
链接:https://arxiv.org/abs/2603.09950
作者:Alberto Fernández-Hernández,Cristian Pérez-Corral,Jose I. Mestre,Manuel F. Dolz,Jose Duato,Enrique S. Quintana-Ortí
摘要:深度强化学习系统对学习率(LR)高度敏感,选择稳定且高性能的训练运行通常需要大量的超参数搜索。在近端策略优化(PPO)的演员-评论家方法中,较小的LR值导致收敛缓慢,而较大的LR值可能引发不稳定或崩溃。我们利用过拟合-欠拟合指标(OUI)从网络隐藏神经元的行为出发分析这一现象;OUI是一种在固定探针批次上量化二值激活模式均衡程度的度量。我们引入了一种高效的基于批次的OUI计算方式,并推导了LR与激活符号变化之间的理论联系,阐明了神经元内部结构的正确演化如何依赖于步长。在实验上,在三个离散控制环境和多个随机种子下,我们表明仅在训练进行到10%时测得的OUI就已能区分不同的LR机制。我们观察到一种一致的不对称性:取得最高回报的评论家网络运行在中间的OUI区间(避免饱和),而取得最高回报的演员网络则表现出相对较高的OUI值。随后,在对成功运行保持相同召回率的条件下,我们将基于OUI的筛选规则与早期回报、基于裁剪、基于散度和基于翻转的准则进行了比较。在该设置下,OUI提供了最强的早期筛选信号:单独使用OUI在更宽的召回率范围内取得最佳精度,而将早期回报与OUI结合则在表现最佳的筛选机制中获得最高精度,从而无需完整训练即可激进地剪除没有希望的运行。
摘要:Deep Reinforcement Learning systems are highly sensitive to the learning rate (LR), and selecting stable and performant training runs often requires extensive hyperparameter search. In Proximal Policy Optimization (PPO) actor--critic methods, small LR values lead to slow convergence, whereas large LR values may induce instability or collapse. We analyse this phenomenon from the behavior of the hidden neurons in the network using the Overfitting-Underfitting Indicator (OUI), a metric that quantifies the balance of binary activation patterns over a fixed probe batch. We introduce an efficient batch-based formulation of OUI and derive a theoretical connection between LR and activation sign changes, clarifying how a correct evolution of the neuron's inner structure depends on the step size. Empirically, across three discrete-control environments and multiple seeds, we show that OUI measured at only 10\% of training already discriminates between LR regimes. We observe a consistent asymmetry: critic networks achieving highest return operate in an intermediate OUI band (avoiding saturation), whereas actor networks achieving highest return exhibit comparatively high OUI values. We then compare OUI-based screening rules against early return, clip-based, divergence-based, and flip-based criteria under matched recall over successful runs. In this setting, OUI provides the strongest early screening signal: OUI alone achieves the best precision at broader recall, while combining early return with OUI yields the highest precision in best-performing screening regimes, enabling aggressive pruning of unpromising runs without requiring full training.
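The probe-batch idea can be sketched directly (the paper's exact OUI formula may differ; the balance statistic 4p(1-p) below is an assumed stand-in): for each hidden neuron, measure the fraction p of probe inputs that activate it, so that dead or saturated neurons score 0 and balanced neurons score near 1.

```python
import random
random.seed(0)

def relu(x):
    return max(0.0, x)

def activation_balance(weights, probe):
    """Mean of 4*p*(1-p) over neurons, where p is the fraction of probe
    inputs on which the neuron's ReLU output is strictly positive."""
    scores = []
    for w in weights:
        active = sum(1 for x in probe
                     if relu(sum(wi * xi for wi, xi in zip(w, x))) > 0)
        p = active / len(probe)
        scores.append(4 * p * (1 - p))
    return sum(scores) / len(scores)

probe = [[random.gauss(0, 1) for _ in range(8)] for _ in range(256)]    # fixed probe batch
balanced = [[random.gauss(0, 1) for _ in range(8)] for _ in range(16)]  # p ~ 0.5 per neuron
dead = [[0.0] * 8 for _ in range(16)]                                   # never active
```

Random-weight neurons fire on about half of a Gaussian probe batch and score near 1, while all-zero neurons score exactly 0.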
【2】A Unified Hierarchical Multi-Task Multi-Fidelity Framework for Data-Efficient Surrogate Modeling in Manufacturing
标题:用于制造业数据高效代理建模的统一分层多任务多保真框架
链接:https://arxiv.org/abs/2603.09842
作者:Manan Mehta,Zhiqiao Dong,Yuhang Yang,Chenhui Shao
摘要:代理建模是一种重要的数据驱动技术,用于量化制造与工程系统中输入变量和系统响应之间的关系。两个主要挑战限制了其有效性:(1)学习复杂非线性关系所需的大量数据;(2)从不同保真度水平的数据源收集到的异构数据。多任务学习(MTL)通过在相关过程之间共享信息来应对第一个挑战,而多保真度建模通过考虑依赖保真度的不确定性来应对第二个挑战。然而,现有方法通常单独处理这些挑战,尚无统一框架同时利用任务间相似性和依赖保真度的数据特性。本文提出了一种新的基于高斯过程代理建模的分层多任务多保真度(H-MT-MF)框架。该框架将每个任务的响应分解为任务特定的全局趋势和利用分层贝叶斯公式跨任务联合学习的残差局部变化分量。该框架可容纳任意数量的任务、设计点和保真度水平,同时提供预测不确定性量化。我们通过一个一维合成示例和一个真实的发动机表面形状预测案例研究验证了所提方法的有效性。与(1)不考虑保真度信息的最先进MTL模型和(2)独立学习各任务的随机克里金模型相比,所提方法将预测精度分别提高了最多19%和23%。H-MT-MF框架为以异构数据源为特征的制造系统代理建模提供了一个通用且可扩展的解决方案。
摘要:Surrogate modeling is an essential data-driven technique for quantifying relationships between input variables and system responses in manufacturing and engineering systems. Two major challenges limit its effectiveness: (1) large data requirements for learning complex nonlinear relationships, and (2) heterogeneous data collected from sources with varying fidelity levels. Multi-task learning (MTL) addresses the first challenge by enabling information sharing across related processes, while multi-fidelity modeling addresses the second by accounting for fidelity-dependent uncertainty. However, existing approaches typically address these challenges separately, and no unified framework simultaneously leverages inter-task similarity and fidelity-dependent data characteristics. This paper develops a novel hierarchical multi-task multi-fidelity (H-MT-MF) framework for Gaussian process-based surrogate modeling. The proposed framework decomposes each task's response into a task-specific global trend and a residual local variability component that is jointly learned across tasks using a hierarchical Bayesian formulation. The framework accommodates an arbitrary number of tasks, design points, and fidelity levels while providing predictive uncertainty quantification. We demonstrate the effectiveness of the proposed method using a 1D synthetic example and a real-world engine surface shape prediction case study. Compared to (1) a state-of-the-art MTL model that does not account for fidelity information and (2) a stochastic kriging model that learns tasks independently, the proposed approach improves prediction accuracy by up to 19% and 23%, respectively. The H-MT-MF framework provides a general and extensible solution for surrogate modeling in manufacturing systems characterized by heterogeneous data sources.
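The core multi-fidelity mechanism such frameworks build on can be illustrated with the classical autoregressive decomposition y_hi(x) ≈ rho*y_lo(x) + delta(x) (Kennedy and O'Hagan). The sketch below uses a least-squares rho and a plain linear discrepancy instead of the paper's hierarchical GP, with made-up test functions, but shows how a handful of expensive evaluations plus a cheap model beat the expensive evaluations alone.

```python
import math

f_lo = lambda x: 0.5 * math.sin(8 * x)          # cheap low-fidelity model
f_hi = lambda x: math.sin(8 * x) + 0.3 * x      # expensive high-fidelity truth

x_hi = [i / 5 for i in range(6)]                # only 6 expensive evaluations
grid = [i / 50 for i in range(51)]

def lin_fit(pairs):
    n = len(pairs)
    sx = sum(x for x, _ in pairs); sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs); sxy = sum(x * y for x, y in pairs)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return (sy - b * sx) / n, b

# scale factor rho by least squares, then a linear fit to the discrepancy
rho = (sum(f_lo(x) * f_hi(x) for x in x_hi)
       / sum(f_lo(x) ** 2 for x in x_hi))
a, b = lin_fit([(x, f_hi(x) - rho * f_lo(x)) for x in x_hi])
mf_pred = lambda x: rho * f_lo(x) + a + b * x

a0, b0 = lin_fit([(x, f_hi(x)) for x in x_hi])  # high-fidelity-only baseline
mf_err = max(abs(mf_pred(x) - f_hi(x)) for x in grid)
base_err = max(abs(a0 + b0 * x - f_hi(x)) for x in grid)
```

The low-fidelity model supplies the oscillatory structure that six expensive samples cannot resolve on their own, so the multi-fidelity error is a fraction of the baseline's.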
【3】Learning Bayesian and Markov Networks with an Unreliable Oracle
标题:使用不可靠的Oracle学习Bayesian和Markov网络
链接:https://arxiv.org/abs/2603.09563
作者:Juha Harviainen,Pekka Parviainen,Vidya Sagar Sharma
摘要:我们研究在存在一个不可靠的条件独立性预言机(其至多犯有界数量的错误)的情况下,马尔可夫网络和贝叶斯网络的基于约束的结构学习。对于马尔可夫网络,我们观察到,若顶点不相交路径的最大数量较低,则即使错误数量是顶点数的(适度)指数级,结构也是唯一可识别的。然而,对于贝叶斯网络,我们证明即使许多常用的图参数(如树宽)有界,也无法在容忍任何错误的情况下总是识别出结构。最后,我们给出了在结构唯一可识别时的结构学习算法。
摘要:We study constraint-based structure learning of Markov networks and Bayesian networks in the presence of an unreliable conditional independence oracle that makes at most a bounded number of errors. For Markov networks, we observe that a low maximum number of vertex-wise disjoint paths implies that the structure is uniquely identifiable even if the number of errors is (moderately) exponential in the number of vertices. For Bayesian networks, however, we prove that one cannot tolerate any errors to always identify the structure even when many commonly used graph parameters like treewidth are bounded. Finally, we give algorithms for structure learning when the structure is uniquely identifiable.
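The constraint-based setting can be made concrete for Markov networks, where conditional independence coincides with vertex separation in the graph, so an error-free oracle recovers the skeleton by subset search (a PC-style routine). This is background for, not an implementation of, the paper's bounded-error analysis.

```python
from itertools import combinations

true_edges = {(0, 1), (1, 2), (2, 3), (1, 3)}
nodes = [0, 1, 2, 3]

def separated(u, v, S, edges):
    """Is every path from u to v blocked after deleting the vertices in S?"""
    seen, stack = {u}, [u]
    while stack:
        x = stack.pop()
        for a, b in edges:
            for y in ((b,) if a == x else (a,) if b == x else ()):
                if y == v:
                    return False
                if y not in S and y not in seen:
                    seen.add(y); stack.append(y)
    return True

def learn_skeleton(nodes, oracle):
    """Keep edge (u,v) iff no conditioning set renders u and v independent."""
    edges = set()
    for u, v in combinations(nodes, 2):
        others = [w for w in nodes if w not in (u, v)]
        if not any(oracle(u, v, set(S))
                   for k in range(len(others) + 1)
                   for S in combinations(others, k)):
            edges.add((u, v))
    return edges

oracle = lambda u, v, S: separated(u, v, S, true_edges)   # error-free oracle
recovered = learn_skeleton(nodes, oracle)
```

With a perfect oracle the skeleton is recovered exactly; the paper's question is how much of this survives when a bounded number of the oracle's answers may be wrong.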
【4】You Didn't Have to Say It like That: Subliminal Learning from Faithful Paraphrases
标题:你不必这么说:从忠实的转述中潜意识学习
链接:https://arxiv.org/abs/2603.09517
作者:Isaia Gisler,Zhonghao He,Tianyi Qiu
备注:Accepted for Spotlight presentation at EACL 2026 SRW. 5 pages, 2 figures, plus appendix. Equal supervision by Zhonghao He and Tianyi Qiu
摘要:When language models are trained on synthetic data, they (student model) can covertly acquire behavioral traits from the data-generating model (teacher model). Subliminal learning refers to the transmission of traits from a teacher to a student model via training on data unrelated to those traits. Prior work demonstrated this in the training domains of number sequences, code, and math Chain-of-Thought traces including transmission of misaligned behaviors. We investigate whether transmission occurs through natural language paraphrases with fixed semantic content, and whether content explicitly contradicting the teacher's preference can block it. We find that training on paraphrases from a teacher system-prompted to love a particular animal increases a student's preference for that animal by up to 19 percentage points. This occurs when paraphrased content is semantically unrelated to the animal, or even when it explicitly expresses dislike. The transmission succeeds despite aggressive filtering to ensure paraphrase fidelity. This raises concerns for pipelines where models generate their own training data: content-based inspection cannot detect such transmission, and even preference-contradicting content fails to prevent it.
【5】Reviving ConvNeXt for Efficient Convolutional Diffusion Models
标题:复兴ConvNeXt以实现高效的卷积扩散模型
链接:https://arxiv.org/abs/2603.09408
作者:Taesung Kwon,Lorenzo Bianchi,Lennart Wittke,Felix Watine,Fabio Carrara,Jong Chul Ye,Romann Weber,Vinicius Azevedo
备注:CVPR 2026. Official implementation: https://github.com/star-kwon/FCDM
摘要:Recent diffusion models increasingly favor Transformer backbones, motivated by the remarkable scalability of fully attentional architectures. Yet the locality bias, parameter efficiency, and hardware friendliness--the attributes that established ConvNets as the efficient vision backbone--have seen limited exploration in modern generative modeling. Here we introduce the fully convolutional diffusion model (FCDM), a model having a backbone similar to ConvNeXt, but designed for conditional diffusion modeling. We find that using only 50% of the FLOPs of DiT-XL/2, FCDM-XL achieves competitive performance with 7$\times$ and 7.5$\times$ fewer training steps at 256$\times$256 and 512$\times$512 resolutions, respectively. Remarkably, FCDM-XL can be trained on a 4-GPU system, highlighting the exceptional training efficiency of our architecture. Our results demonstrate that modern convolutional designs provide a competitive and highly efficient alternative for scaling diffusion models, reviving ConvNeXt as a simple yet powerful building block for efficient generative modeling.
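For context, the forward corruption process shared by DiT- and ConvNeXt-style diffusion backbones is the standard, architecture-independent DDPM schedule. A minimal check that it preserves unit marginal variance:

```python
import math, random
random.seed(0)

T = 1000
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]  # linear schedule
alpha_bar, prod = [], 1.0
for b in betas:
    prod *= 1.0 - b
    alpha_bar.append(prod)

def q_sample(x0, t):
    """x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = random.gauss(0, 1)
    return math.sqrt(alpha_bar[t]) * x0 + math.sqrt(1 - alpha_bar[t]) * eps

xs0 = [random.gauss(0, 1) for _ in range(20000)]   # unit-variance "data"
variances = {}
for t in (0, T // 2, T - 1):
    xt = [q_sample(x, t) for x in xs0]
    variances[t] = sum(v * v for v in xt) / len(xt)
```

Since alpha_bar decays to nearly zero by t = T-1, the corrupted samples approach pure Gaussian noise while keeping variance close to 1 at every step.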
【6】CLoE: Expert Consistency Learning for Missing Modality Segmentation
标题:CLoE:缺失模态分割的专家一致性学习
链接:https://arxiv.org/abs/2603.09316
作者:Xinyu Tong,Meihua Zhou,Bowu Fan,Haitao Li
摘要:Multimodal medical image segmentation often faces missing modalities at inference, which induces disagreement among modality experts and makes fusion unstable, particularly on small foreground structures. We propose Consistency Learning of Experts (CLoE), a consistency-driven framework for missing-modality segmentation that preserves strong performance when all modalities are available. CLoE formulates robustness as decision-level expert consistency control and introduces a dual-branch Expert Consistency Learning objective. Modality Expert Consistency enforces global agreement among expert predictions to reduce case-wise drift under partial inputs, while Region Expert Consistency emphasizes agreement on clinically critical foreground regions to avoid background-dominated regularization. We further map consistency scores to modality reliability weights using a lightweight gating network, enabling reliability-aware feature recalibration before fusion. Extensive experiments on BraTS 2020 and MSD Prostate demonstrate that CLoE outperforms state-of-the-art methods in incomplete multimodal segmentation, while exhibiting strong cross-dataset generalization and improving robustness on clinically critical structures.
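An illustrative sketch of the consistency-to-reliability mapping (the function names, Dice-based agreement, and softmax gating below are assumptions, not the paper's exact design): score each modality expert by its average agreement with the other experts, then turn scores into fusion weights, down-weighting an expert whose modality is missing or corrupted.

```python
import math

def dice(a, b):
    """Dice overlap between two binary masks (flattened lists of 0/1)."""
    inter = sum(x * y for x, y in zip(a, b))
    return 2 * inter / (sum(a) + sum(b) + 1e-8)

def reliability_weights(expert_masks, temp=0.1):
    n = len(expert_masks)
    scores = [sum(dice(expert_masks[i], expert_masks[j])
                  for j in range(n) if j != i) / (n - 1)
              for i in range(n)]
    z = [math.exp(s / temp) for s in scores]
    return [zi / sum(z) for zi in z]

# three experts on a 10-pixel mask; expert 2's modality is "missing" (noisy output)
e0 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
e1 = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
e2 = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
w = reliability_weights([e0, e1, e2])
```

The two agreeing experts receive nearly all the fusion weight, while the outlier expert is suppressed before fusion.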
【7】A Gaussian Comparison Theorem for Training Dynamics in Machine Learning
标题:机器学习中训练动力学的高斯比较定理
链接:https://arxiv.org/abs/2603.09310
作者:Ashkan Panahi
摘要:We study training algorithms with data following a Gaussian mixture model. For a specific family of such algorithms, we present a non-asymptotic result, connecting the evolution of the model to a surrogate dynamical system, which can be easier to analyze. The proof of our result is based on the celebrated Gordon comparison theorem. Using our theorem, we rigorously prove the validity of the dynamic mean-field (DMF) expressions in the asymptotic scenarios. Moreover, we suggest an iterative refinement scheme to obtain more accurate expressions in non-asymptotic scenarios. We specialize our theory to the analysis of training a perceptron model with a generic first-order (full-batch) algorithm and demonstrate that fluctuation parameters in a non-asymptotic domain emerge in addition to the DMF kernels.
【8】Causally Sufficient and Necessary Feature Expansion for Class-Incremental Learning
标题:类增量学习的因果充要特征扩展
链接:https://arxiv.org/abs/2603.09145
作者:Zhen Zhang,Jielei Chu,Tianrui Li
摘要:Current expansion-based methods for Class Incremental Learning (CIL) effectively mitigate catastrophic forgetting by freezing old features. However, such task-specific features learned from the new task may collide with the old features. From a causal perspective, spurious feature correlations are the main cause of this collision, manifesting in two scopes: (i) guided by empirical risk minimization (ERM), intra-task spurious correlations cause task-specific features to rely on shortcut features. These non-robust features are vulnerable to interference, inevitably drifting into the feature space of other tasks; (ii) inter-task spurious correlations induce semantic confusion between visually similar classes across tasks. To address this, we propose a Probability of Necessity and Sufficiency (PNS)-based regularization method to guide feature expansion in CIL. Specifically, we first extend the definition of PNS to expansion-based CIL, termed CPNS, which quantifies both the causal completeness of intra-task representations and the separability of inter-task representations. We then introduce a dual-scope counterfactual generator based on twin networks to ensure the measurement of CPNS, which simultaneously generates: (i) intra-task counterfactual features to minimize intra-task PNS risk and ensure causal completeness of task-specific features, and (ii) inter-task interfering features to minimize inter-task PNS risk, ensuring the separability of inter-task representations. Theoretical analyses confirm its reliability. The regularization is a plug-and-play method for expansion-based CIL to mitigate feature collision. Extensive experiments demonstrate the effectiveness of the proposed method.
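Background on the PNS quantity being extended: for binary cause and outcome, Tian and Pearl's bounds bracket the probability of necessity and sufficiency using interventional probabilities alone.

```python
def pns_bounds(p_y_do_x, p_y_do_not_x):
    """Tian-Pearl bounds on PNS = P(y_x, y'_{x'}) from interventional quantities."""
    lower = max(0.0, p_y_do_x - p_y_do_not_x)
    upper = min(p_y_do_x, 1.0 - p_y_do_not_x)
    return lower, upper

lo, hi = pns_bounds(0.9, 0.2)   # a feature with a strong causal effect
```

With P(y|do(x)) = 0.9 and P(y|do(x')) = 0.2, PNS is guaranteed to lie in [0.7, 0.8]: a high lower bound certifies the feature as close to both necessary and sufficient.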
【9】Quality over Quantity: Demonstration Curation via Influence Functions for Data-Centric Robot Learning
标题:质量重于数量:通过影响函数进行演示数据精选,实现以数据为中心的机器人学习
链接:https://arxiv.org/abs/2603.09056
作者:Haeone Lee,Taywon Min,Junsu Kim,Sinjae Kang,Fangchen Liu,Lerrel Pinto,Kimin Lee
备注:Accepted to ICRA 2026, 8 pages
摘要:Learning from demonstrations has emerged as a promising paradigm for end-to-end robot control, particularly when scaled to diverse and large datasets. However, the quality of demonstration data, often collected through human teleoperation, remains a critical bottleneck for effective data-driven robot learning. Human errors, operational constraints, and teleoperator variability introduce noise and suboptimal behaviors, making data curation essential yet largely manual and heuristic-driven. In this work, we propose Quality over Quantity (QoQ), a grounded and systematic approach to identifying high-quality data by defining data quality as the contribution of each training sample to reducing loss on validation demonstrations. To efficiently estimate this contribution, we leverage influence functions, which quantify the impact of individual training samples on model performance. We further introduce two key techniques to adapt influence functions for robot demonstrations: (i) using maximum influence across validation samples to capture the most relevant state-action pairs, and (ii) aggregating influence scores of state-action pairs within the same trajectory to reduce noise and improve data coverage. Experiments in both simulated and real-world settings show that QoQ consistently improves policy performances over prior data selection methods.
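The influence-function machinery QoQ adapts can be shown in its simplest closed form: for 1-D least squares, the predicted effect of removing training point i on validation loss is grad_val · H⁻¹ · grad_i at the optimum. In the toy below (made-up numbers, with one deliberately bad demonstration) the prediction agrees in sign with actual leave-one-out retraining.

```python
def fit(pts):                                   # least-squares slope for y ≈ w*x
    return sum(x * y for x, y in pts) / sum(x * x for x, _ in pts)

train = [(1, 1.0), (2, 2.1), (3, 2.9), (2, 5.0)]   # last point: a bad demonstration
val = [(1.5, 1.5), (2.5, 2.5)]

w = fit(train)
H = sum(x * x for x, _ in train)                # Hessian of the squared loss
g = lambda x, y: (w * x - y) * x                # per-point loss gradient at w
g_val = sum(g(x, y) for x, y in val)

# predicted change in validation loss if training point i is removed
removal_effect = [g_val * g(x, y) / H for x, y in train]

# actual change, by retraining without each point
val_loss = lambda ww: sum((ww * x - y) ** 2 / 2 for x, y in val)
actual = [val_loss(fit(train[:i] + train[i + 1:])) - val_loss(w)
          for i in range(len(train))]
```

Only the corrupted demonstration gets a negative score (removing it is predicted to help), which is exactly the signal a curation pipeline needs; real settings replace the closed-form Hessian with the approximations used for deep policies.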
【10】Two Teachers Better Than One: Hardware-Physics Co-Guided Distributed Scientific Machine Learning
标题:两名老师胜过一名老师:硬件物理共同引导的分布式科学机器学习
链接:https://arxiv.org/abs/2603.09032
作者:Yuchen Yuan,Junhuan Yang,Hao Wan,Yipei Liu,Hanhan Wu,Youzuo Lin,Lei Yang
备注:7 pages, 9 figures. Accepted at the 63rd ACM/IEEE Design Automation Conference (DAC 2026), Long Beach, CA, July 2026
摘要:Scientific machine learning (SciML) is increasingly applied to in-field processing, controlling, and monitoring; however, wide-area sensing, real-time demands, and strict energy and reliability constraints make centralized SciML implementation impractical. Most SciML models assume raw data aggregation at a central node, incurring prohibitively high communication latency and energy costs; yet, distributing models developed for general-purpose ML often breaks essential physical principles, resulting in degraded performance. To address these challenges, we introduce EPIC, a hardware- and physics-co-guided distributed SciML framework, using full-waveform inversion (FWI) as a representative task. EPIC performs lightweight local encoding on end devices and physics-aware decoding at a central node. By transmitting compact latent features rather than high-volume raw data and by using cross-attention to capture inter-receiver wavefield coupling, EPIC significantly reduces communication cost while preserving physical fidelity. Evaluated on a distributed testbed with five end devices and one central node, and across 10 datasets from OpenFWI, EPIC reduces latency by 8.9$\times$ and communication energy by 33.8$\times$, while even improving reconstruction fidelity on 8 out of 10 datasets.
【11】An accurate flatness measure to estimate the generalization performance of CNN models
标题:一种用于估计CNN模型泛化性能的精确平坦度度量
链接:https://arxiv.org/abs/2603.09016
作者:Rahman Taleghani,Maryam Mohammadi,Francesco Marchetti
摘要:Flatness measures based on the spectrum or the trace of the Hessian of the loss are widely used as proxies for the generalization ability of deep networks. However, most existing definitions are either tailored to fully connected architectures, relying on stochastic estimators of the Hessian trace, or ignore the specific geometric structure of modern Convolutional Neural Networks (CNNs). In this work, we develop a flatness measure that is both exact and architecturally faithful for a broad and practically relevant class of CNNs. We first derive a closed-form expression for the trace of the Hessian of the cross-entropy loss with respect to convolutional kernels in networks that use global average pooling followed by a linear classifier. Building on this result, we then specialize the notion of relative flatness to convolutional layers and obtain a parameterization-aware flatness measure that properly accounts for the scaling symmetries and filter interactions induced by convolution and pooling. Finally, we empirically investigate the proposed measure on families of CNNs trained on standard image-classification benchmarks. The results obtained suggest that the proposed measure can serve as a robust tool to assess and compare the generalization performance of CNN models, and to guide the design of architecture and training choices in practice.
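For contrast with the exact traces derived in the paper, the stochastic Hutchinson estimator tr(H) ≈ E[vᵀHv] with Rademacher probes looks like this on a small explicit matrix:

```python
import random
random.seed(0)

H = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 0.5],
     [0.0, 0.5, 2.0]]
exact = sum(H[i][i] for i in range(3))          # exact trace = 9

def hutchinson(H, n_probes):
    est = 0.0
    for _ in range(n_probes):
        v = [random.choice((-1.0, 1.0)) for _ in H]           # Rademacher probe
        Hv = [sum(row[j] * v[j] for j in range(len(v))) for row in H]
        est += sum(vi * hvi for vi, hvi in zip(v, Hv))
    return est / n_probes

approx = hutchinson(H, 2000)
```

The estimator is unbiased but its variance scales with the squared off-diagonal mass, which is one motivation for preferring closed-form traces when the architecture admits them.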
【12】MAcPNN: Mutual Assisted Learning on Data Streams with Temporal Dependence
标题:MAcPNN:具有时间依赖性的数据流互助学习
链接:https://arxiv.org/abs/2603.08972
作者:Federico Giannini,Emanuele Della Valle
摘要:Internet of Things (IoT) Analytics often involves applying machine learning (ML) models on data streams. In such scenarios, traditional ML paradigms face obstacles related to continuous learning while dealing with concept drifts, temporal dependence, and avoiding forgetting. Moreover, in IoT, different edge devices build up a network. When learning models on those devices, connecting them could be useful in improving performance and reusing others' knowledge. This work proposes Mutual Assisted Learning, a learning paradigm grounded on Vygotsky's popular Sociocultural Theory of Cognitive Development. Each device is autonomous and does not need a central orchestrator. Whenever it degrades its performance due to a concept drift, it asks for assistance from others and decides whether their knowledge is useful for solving the new problem. This way, the number of connections is drastically reduced compared to the classical Federated Learning approaches, where the devices communicate at each training round. Every device is equipped with a Continuous Progressive Neural Network (cPNN) to handle the dynamic nature of data streams. We call this implementation Mutual Assisted cPNN (MAcPNN). To implement it, we allow cPNNs for single data point predictions and apply quantization to reduce the memory footprint. Experimental results prove the effectiveness of MAcPNN in boosting performance on synthetic and real data streams.
【13】A New Modeling to Feature Selection Based on the Fuzzy Rough Set Theory in Normal and Optimistic States on Hybrid Information Systems
标题:混合信息系统正常状态和乐观状态下基于模糊粗糙集理论的特征选择新模型
链接:https://arxiv.org/abs/2603.08900
作者:Mohammad Hossein Safarpour,Seyed Mohammad Alavi,Mohammad Izadikhah,Hossein Dibachi
备注:18 pages, 14 figures, 9 tables. Published version available at International Journal of Engineering. This preprint is distributed under CC BY 4.0 license
摘要:Considering the high volume, wide variety, and rapid speed of data generation, investigating feature selection methods for big data presents various applications and advantages. By removing irrelevant and redundant features, feature selection reduces data dimensions, thereby facilitating optimal decision-making within decision systems. One of the key tools for feature selection in hybrid information systems is fuzzy rough set theory. However, this theory faces two significant challenges: First, obtaining fuzzy equivalence relations through intersection operations in high-dimensional spaces can be both time-consuming and memory-intensive. Additionally, this method may produce noisy data, complicating the feature selection process. The purpose and innovation of this paper are to address these issues. We proposed a new feature selection model that calculates the combined distance between objects and subsequently used this information to derive the fuzzy equivalence relation. Rather than directly solving the feature selection problem, this approach reformulates it into an optimization problem that can be tackled using appropriate meta-heuristic algorithms. We have named this new approach FSbuHD. The FSbuHD model operates in two modes - normal and optimistic - based on the selection of one of the two introduced fuzzy equivalence relations. The model is then tested on standard datasets from the UCI repository and compared with other algorithms. The results of this research demonstrate that FSbuHD is one of the most efficient and effective methods for feature selection when compared to previous methods and algorithms.
【14】Why Channel-Centric Models are not Enough to Predict End-to-End Performance in Private 5G: A Measurement Campaign and Case Study
标题:为什么以信道为中心的模型不足以预测私有5G的端到端性能:测量活动和案例研究
链接:https://arxiv.org/abs/2603.08865
作者:Nils Jörgensen
摘要:Communication-aware robot planning requires accurate predictions of wireless network performance. Current approaches rely on channel-level metrics such as received signal strength and signal-to-noise ratio, assuming these translate reliably into end-to-end throughput. We challenge this assumption through a measurement campaign in a private 5G industrial environment. We evaluate throughput predictions from a commercial ray-tracing simulator as well as data-driven Gaussian process regression models against measurements collected using a mobile robot. The study uses off-the-shelf user equipment in an underground, radio-shielded facility with detailed 3D modeling, representing a best-case scenario for prediction accuracy. The ray-tracing simulator captures the spatial structure of indoor propagation and predicts channel-level metrics with reasonable fidelity. However, it systematically over-predicts throughput, even in line-of-sight regions. The dominant error source is shown to be over-estimation of sustainable MIMO spatial layers: the simulator assumes near-uniform four-layer transmission while measurements reveal substantial adaptation between one and three layers. This mismatch inflates predicted throughput even when channel metrics appear accurate. In contrast, a Gaussian process model with a rational quadratic kernel achieves approximately two-thirds reduction in prediction error with near-zero bias by learning end-to-end throughput directly from measurements. These findings demonstrate that favorable channel conditions do not guarantee high throughput; communication-aware planners relying solely on channel-centric predictions risk overly optimistic trajectories that violate reliability requirements. Accurate throughput prediction for 5G systems requires either extensive calibration of link-layer models or data-driven approaches that capture real system behavior.
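A minimal GP regressor with the rational quadratic kernel named in the abstract (hyperparameters and data here are illustrative stand-ins, not the campaign's fitted values):

```python
import numpy as np

def rq_kernel(a, b, length=1.0, alpha=1.0, var=1.0):
    d2 = (a[:, None] - b[None, :]) ** 2
    return var * (1.0 + d2 / (2 * alpha * length ** 2)) ** (-alpha)

def gp_predict(x_train, y_train, x_test, noise=1e-4):
    K = rq_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rq_kernel(x_test, x_train)
    mean = Ks @ np.linalg.solve(K, y_train)
    var = rq_kernel(x_test, x_test).diagonal() - np.einsum(
        "ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, var

x = np.linspace(0, 5, 12)      # "measurement" locations along a robot path
y = np.sin(x)                  # stand-in for measured throughput
xt = np.linspace(0, 5, 50)
mean, var = gp_predict(x, y, xt)
```

The RQ kernel is a scale mixture of squared exponentials, which suits throughput fields that vary on several length scales at once; the posterior variance also gives the planner an uncertainty estimate for free.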
【15】Expressivity-Efficiency Tradeoffs for Hybrid Sequence Models
标题:混合序列模型的表达能力与效率的权衡
链接:https://arxiv.org/abs/2603.08859
作者:John Cooper,Ilias Diakonikolas,Mingchen Ma,Frederic Sala
摘要:Hybrid sequence models--combining Transformer and state-space model layers--seek to gain the expressive versatility of attention as well as the computational efficiency of state-space model layers. Despite burgeoning interest in hybrid models, we lack a basic understanding of the settings where--and underlying mechanisms through which--they offer benefits over their constituent models. In this paper, we study this question, focusing on a broad family of core synthetic tasks. For this family of tasks, we prove the existence of fundamental limitations for non-hybrid models. Specifically, any Transformer or state-space model that solves the underlying task requires either a large number of parameters or a large working memory. On the other hand, for two prototypical tasks within this family--namely selective copying and associative recall--we construct hybrid models of small size and working memory that provably solve these tasks, thus achieving the best of both worlds. Our experimental evaluation empirically validates our theoretical findings. Importantly, going beyond the settings in our theoretical analysis, we empirically show that learned--rather than constructed--hybrids outperform non-hybrid models with up to 6x as many parameters. We additionally demonstrate that hybrid models exhibit stronger length generalization and out-of-distribution robustness than non-hybrids.
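The two synthetic tasks can be stated exactly by their trivial non-learned solutions, which also indicate the kind of memory a sequence model must emulate to solve them:

```python
def selective_copy(tokens, noise="."):
    """Selective copying: emit the non-noise tokens in their original order."""
    return [t for t in tokens if t != noise]

def associative_recall(stream, query):
    """Associative recall: return the value most recently bound to the queried key."""
    binding = {}
    for key, value in stream:
        binding[key] = value          # later bindings overwrite earlier ones
    return binding[query]

copied = selective_copy(list("a..b.c."))
recalled = associative_recall([("a", 1), ("b", 2), ("a", 3)], "a")
```

Selective copying needs only an order-preserving filter (a natural fit for a recurrent state), while recall needs content-addressable lookup (a natural fit for attention), which is the intuition behind pairing the two layer types.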
【16】MASEval: Extending Multi-Agent Evaluation from Models to Systems
标题:MASEval:将多智能体评估从模型扩展到系统
链接:https://arxiv.org/abs/2603.08835
作者:Cornelius Emde,Alexander Rubinstein,Anmol Goel,Ahmed Heakl,Sangdoo Yun,Seong Joon Oh,Martin Gubri
摘要:The rapid adoption of LLM-based agentic systems has produced a rich ecosystem of frameworks (smolagents, LangGraph, AutoGen, CAMEL, LlamaIndex, i.a.). Yet existing benchmarks are model-centric: they fix the agentic setup and do not compare other system components. We argue that implementation decisions substantially impact performance, including choices such as topology, orchestration logic, and error handling. MASEval addresses this evaluation gap with a framework-agnostic library that treats the entire system as the unit of analysis. Through a systematic system-level comparison across 3 benchmarks, 3 models, and 3 frameworks, we find that framework choice matters as much as model choice. MASEval allows researchers to explore all components of agentic systems, opening new avenues for principled system design, and practitioners to identify the best implementation for their use case. MASEval is available under the MIT licence https://github.com/parameterlab/MASEval.
【17】Hebbian-Oscillatory Co-Learning
标题:赫布-振荡协同学习
链接:https://arxiv.org/abs/2603.08731
作者:Hasi Hays
摘要:We introduce Hebbian-Oscillatory Co-Learning (HOC-L), a unified two-timescale dynamical framework for joint structural plasticity and phase synchronization in bio-inspired sparse neural architectures. HOC-L couples two recent frameworks: the hyperbolic sparse geometry of Resonant Sparse Geometry Networks (RSGN), which employs Poincaré ball embeddings with Hebbian-driven dynamic sparsity, and the oscillator-based attention of Selective Synchronization Attention (SSA), which replaces dot-product attention with Kuramoto-type phase-locking dynamics. The key mechanism is synchronization-gated plasticity: the macroscopic order parameter $r(t)$ of the oscillator ensemble gates Hebbian structural updates, so that connectivity consolidation occurs only when sufficient phase coherence signals a meaningful computational pattern. We prove convergence of the joint system to a stable equilibrium via a composite Lyapunov function and derive explicit timescale separation bounds. The resulting architecture achieves $O(n \cdot k)$ complexity with $k \ll n$, preserving the sparsity of both parent frameworks. Numerical simulations confirm the theoretical predictions, demonstrating emergent cluster-aligned connectivity and monotonic Lyapunov decrease.
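The synchronization gate rests on the standard Kuramoto order parameter r = |N⁻¹ Σ exp(iθ_j)|, which equals 1 for a phase-locked ensemble and is near 0 for phases scattered uniformly on the circle:

```python
import cmath, math

def order_parameter(phases):
    """Macroscopic coherence r of a set of oscillator phases (in radians)."""
    return abs(sum(cmath.exp(1j * p) for p in phases) / len(phases))

locked = [0.3] * 50                                    # fully phase-locked ensemble
scattered = [2 * math.pi * j / 50 for j in range(50)]  # uniform on the circle
r_locked = order_parameter(locked)
r_scattered = order_parameter(scattered)
```

Gating Hebbian updates by r(t), as the paper describes, means connectivity is consolidated only when r is high, i.e. when the oscillator ensemble signals a coherent pattern.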
【18】Memory-Augmented Spiking Networks: Synergistic Integration of Complementary Mechanisms for Neuromorphic Vision
标题:记忆增强尖峰网络:神经形态视觉互补机制的协同整合
链接:https://arxiv.org/abs/2603.08730
作者:Effiong Blessing,Chiung-Yi Tseng,Isaac Nkrumah,Junaid Rehman
摘要:Spiking Neural Networks (SNNs) provide biological plausibility and energy efficiency, yet systematic investigations of memory augmentation strategies remain limited. We conduct a five-model ablation study integrating Leaky Integrate-and-Fire neurons, Supervised Contrastive Learning (SCL), Hopfield networks, and Hierarchical Gated Recurrent Networks (HGRN) on the N-MNIST dataset. Baseline SNNs exhibit organized neuronal groupings, or structured assemblies, characterized by a silhouette score of $0.687 \pm 0.012$. Individual augmentations introduce trade-offs: SCL improves accuracy by $0.28\%$ but reduces clustering (silhouette score $0.637 \pm 0.015$), while HGRN yields consistent gains in both accuracy ($+1.01\%$) and computational efficiency ($170.6\times$). Full integration achieves a balanced improvement across metrics, reaching a silhouette score of $0.715 \pm 0.008$, classification accuracy of $97.49 \pm 0.10\%$, energy consumption of $1.85 \pm 0.06\,μ\mathrm{J}$, and sparsity of $97.0\%$. These results indicate that optimal performance emerges from architectural balance rather than isolated optimization, establishing design principles for memory-augmented neuromorphic systems.
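The silhouette scores reported above follow the standard definition s(i) = (b - a)/max(a, b), with a the mean intra-cluster distance and b the mean distance to the nearest other cluster; implemented directly on a toy example:

```python
import math

def silhouette(points, labels):
    scores = []
    for i, p in enumerate(points):
        same = [q for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        a = sum(math.dist(p, q) for q in same) / len(same)
        b = min(sum(math.dist(p, q) for j, q in enumerate(points)
                    if labels[j] == c) / labels.count(c)
                for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = silhouette(pts, [0, 0, 0, 1, 1, 1])   # well-separated assemblies
bad = silhouette(pts, [0, 1, 0, 1, 0, 1])    # scrambled assignment
```

Tight, well-separated groupings score near 1 and scrambled assignments score much lower, which is what makes the metric a reasonable proxy for structured neuronal assemblies.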
【19】Equitable Multi-Task Learning for AI-RANs
标题:AI-RAN的公平多任务学习
链接:https://arxiv.org/abs/2603.08717
作者:Panayiotis Raptis,Fatih Aslan,George Iosifidis
备注:6 pages, 3 figures
摘要:AI-enabled Radio Access Networks (AI-RANs) are expected to serve heterogeneous users with time-varying learning tasks over shared edge resources. Ensuring equitable inference performance across these users requires adaptive and fair learning mechanisms. This paper introduces an online-within-online fair multi-task learning (OWO-FMTL) framework that ensures long-term equity across users. The method combines two learning loops: an outer loop updating the shared model across rounds and an inner loop rebalancing user priorities within each round with a lightweight primal-dual update. Equity is quantified via generalized alpha-fairness, allowing a trade-off between efficiency and fairness. The framework guarantees diminishing performance disparity over time and operates with low computational overhead suitable for edge deployment. Experiments on convex and deep learning tasks confirm that OWO-FMTL outperforms existing multi-task learning baselines under dynamic scenarios.
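Generalized alpha-fairness is the standard parametric utility family U_alpha(u) = u^(1-alpha)/(1-alpha), with the log at alpha = 1; a small sketch showing how larger alpha increasingly prefers equal allocations over concentrated ones:

```python
import math

def alpha_fair(utilities, alpha):
    """Total alpha-fair welfare; alpha=0 is pure efficiency, alpha->1 is
    proportional fairness (log), larger alpha weights worst-off users more."""
    if abs(alpha - 1.0) < 1e-12:
        return sum(math.log(u) for u in utilities)
    return sum(u ** (1 - alpha) / (1 - alpha) for u in utilities)

equal = [1.0, 1.0]       # same total utility, shared equally
unequal = [1.8, 0.2]     # same total utility, concentrated on one user
```

At alpha = 0 the two allocations are indistinguishable (same sum), while any alpha > 0 strictly favors the equal split, which is the dial OWO-FMTL exposes between efficiency and equity.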
【20】Evolution of Photonic Quantum Machine Learning under Noise
标题:噪声下光子量子机器学习的演进
链接:https://arxiv.org/abs/2603.09645
作者:A. M. A. S. D. Alagiyawanna,Asoka Karunananda
备注:26 pages, 9 figures. Review article. Currently under review at Quantum Machine Intelligence (Springer Nature)
摘要:Photonic Quantum Machine Learning (PQML) is an emerging approach that integrates photonic quantum computing technologies with machine learning techniques to enable scalable and energy-efficient quantum information processing. Photonic systems offer advantages such as room-temperature operation, high-speed signal processing, and the ability to represent information in high-dimensional Hilbert spaces. However, noise remains a major challenge affecting the performance, reliability, and scalability of PQML implementations. This review provides a systematic analysis of noise sources in photonic quantum machine learning systems. We discuss photonic quantum computing architectures and examine key quantum machine learning algorithms implemented on photonic platforms, including Variational Quantum Circuits, Quantum Neural Networks, and Quantum Support Vector Machines. The paper categorizes major noise mechanisms and analyzes their impact on learning performance, training stability, and convergence behavior. Furthermore, we review both traditional and advanced noise characterization techniques and survey recent strategies for noise mitigation in photonic quantum systems. Finally, we highlight recent experimental advances and discuss future research directions for developing robust and scalable PQML systems under realistic noise conditions.
【21】Verifying Good Regulator Conditions for Hypergraph Observers: Natural Gradient Learning from Causal Invariance via Established Theorems
标题:验证超图观察者的良好调节器条件:通过既定定理从因果不变性得到自然梯度学习
链接:https://arxiv.org/abs/2603.09067
作者:Max Zhuravlev
备注:18 pages, 15 formal results. Part of a series of companion papers submitted simultaneously; cross-references updated with arXiv IDs in v2
摘要:We verify that persistent observers in causally invariant hypergraph substrates satisfy the conditions of the Conant-Ashby Good Regulator Theorem. Building on Wolfram's hypergraph physics and Vanchurin's neural network cosmology, we formalize persistent observers as entities that minimize prediction error at their boundary with the environment. Applying a modern reformulation of the Conant-Ashby theorem, we demonstrate that hypergraph observers satisfy Good Regulator conditions, requiring them to maintain internal models. Once an internal model with loss function exists, the emergence of a Fisher information metric follows from standard information geometry. Invoking Amari's uniqueness theorem for reparameterization-invariant gradients, we show that natural gradient descent is the unique admissible learning rule. Under the ansatz M=F^2 for exponential family observers and one specific convergence time functional, we derive a closed-form formula for the regime parameter alpha in Vanchurin's Type II framework, with a quantum-classical threshold at kappa(F)=2. However, three alternative convergence models do not reproduce this result, so this prediction is strongly model-dependent. We further introduce the directional regime parameter alpha_{v_k} and the trace-free deviation tensor, showing that a single observer can simultaneously occupy different Vanchurin regimes along different eigendirections of the Fisher metric. This connects Wolfram and Vanchurin frameworks through established theorems, providing approximately 25-30% novel contribution.
【22】Permutation-Equivariant 2D State Space Models: Theory and Canonical Architecture for Multivariate Time Series
标题:置换等变2D状态空间模型:多元时间序列的理论和规范架构
链接:https://arxiv.org/abs/2603.08753
作者:Seungwoo Jeong,Heung-Il Suk
摘要:Multivariate time series (MTS) modeling often implicitly imposes an artificial ordering over variables, violating the inherent exchangeability found in many real-world systems where no canonical variable axis exists. We formalize this limitation as a violation of the permutation symmetry principle and require state-space dynamics to be permutation-equivariant along the variable axis. In this work, we theoretically characterize the complete canonical form of linear variable coupling under this symmetry constraint. We prove that any permutation-equivariant linear 2D state-space system naturally decomposes into local self-dynamics and a global pooled interaction, rendering ordered recurrence not only unnecessary but structurally suboptimal. Motivated by this theoretical foundation, we introduce the Variable-Invariant Two-Dimensional State Space Model (VI 2D SSM), which realizes the canonical equivariant form via permutation-invariant aggregation. This formulation eliminates sequential dependency chains along the variable axis, reducing the dependency depth from $\mathcal{O}(C)$ to $\mathcal{O}(1)$ and simplifying stability analysis to two scalar modes. Furthermore, we propose VI 2D Mamba, a unified architecture integrating multi-scale temporal dynamics and spectral representations. Extensive experiments on forecasting, classification, and anomaly detection benchmarks demonstrate that our model achieves state-of-the-art performance with superior structural scalability, validating the theoretical necessity of symmetry-preserving 2D modeling.
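The canonical decomposition stated in the abstract (local self-dynamics plus a global pooled interaction) can be illustrated with a minimal permutation-equivariant linear layer; the matrices A and B below are hypothetical stand-ins for the paper's learned parameters:

```python
import numpy as np

def equivariant_coupling(X, A, B):
    """Canonical permutation-equivariant linear coupling over the variable axis.

    X has shape (C, d): C exchangeable variables, each with a d-dim state.
    Per the theorem paraphrased in the abstract, any permutation-equivariant
    linear map over the variable axis decomposes into local self-dynamics (A)
    plus a shared global pooled interaction (B applied to the mean), with no
    ordered recurrence over variables. Illustrative sketch, not the VI 2D SSM.
    """
    local = X @ A.T                                     # per-variable term
    pooled = np.mean(X, axis=0, keepdims=True) @ B.T    # order-free global term
    return local + pooled
```

Because the pooled term is identical for every variable, permuting the rows of X simply permutes the rows of the output, and the dependency depth along the variable axis is O(1) rather than O(C).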
其他(34篇)
【1】From Data Statistics to Feature Geometry: How Correlations Shape Superposition
标题:从数据统计到特征几何:相关性如何塑造叠加
链接:https://arxiv.org/abs/2603.09972
作者:Lucas Prieto,Edward Stevinson,Melih Barsbey,Tolga Birdal,Pedro A. M. Mediano
摘要:A central idea in mechanistic interpretability is that neural networks represent more features than they have dimensions, arranging them in superposition to form an over-complete basis. This framing has been influential, motivating dictionary learning approaches such as sparse autoencoders. However, superposition has mostly been studied in idealized settings where features are sparse and uncorrelated. In these settings, superposition is typically understood as introducing interference that must be minimized geometrically and filtered out by non-linearities such as ReLUs, yielding local structures like regular polytopes. We show that this account is incomplete for realistic data by introducing Bag-of-Words Superposition (BOWS), a controlled setting to encode binary bag-of-words representations of internet text in superposition. Using BOWS, we find that when features are correlated, interference can be constructive rather than just noise to be filtered out. This is achieved by arranging features according to their co-activation patterns, making interference between active features constructive, while still using ReLUs to avoid false positives. We show that this kind of arrangement is more prevalent in models trained with weight decay and naturally gives rise to semantic clusters and cyclical structures which have been observed in real language models yet were not explained by the standard picture of superposition. Code for this paper can be found at https://github.com/LucasPrietoAl/correlations-feature-geometry.
【2】On the Width Scaling of Neural Optimizers Under Matrix Operator Norms I: Row/Column Normalization and Hyperparameter Transfer
标题:矩阵算子范数下神经优化器的宽度缩放I:行/列归一化与超参数迁移
链接:https://arxiv.org/abs/2603.09952
作者:Ruihan Xu,Jiajin Li,Yiping Lu
摘要:A central question in modern deep learning is how to design optimizers whose behavior remains stable as the network width $w$ increases. We address this question by interpreting several widely used neural-network optimizers, including \textrm{AdamW} and \textrm{Muon}, as instances of steepest descent under matrix operator norms. This perspective links optimizer geometry with the Lipschitz structure of the network forward map, and enables width-independent control of both Lipschitz and smoothness constants. However, steepest-descent rules induced by standard $p \to q$ operator norms lack layerwise composability and therefore cannot provide width-independent bounds in deep architectures. We overcome this limitation by introducing a family of mean-normalized operator norms, denoted $\bar{p} \to \bar{q}$, that admit layerwise composability, yield width-independent smoothness bounds, and give rise to practical optimizers such as \emph{rescaled} \textrm{AdamW}, row normalization, and column normalization. The resulting width-aware learning-rate scaling rules recover $\mu$P scaling~\cite{yang2021tensor} as a special case and provide a principled mechanism for cross-width learning-rate transfer across a broad class of optimizers. We further show that \textrm{Muon} can suffer an $\mathcal{O}(\sqrt{w})$ worst-case growth in the smoothness constant, whereas a new family of row-normalized optimizers we propose achieves width-independent smoothness guarantees. Based on these observations, we propose MOGA (Matrix Operator Geometry Aware), a width-aware optimizer based only on row/column-wise normalization that enables stable learning-rate transfer across model widths. Large-scale pre-training on GPT-2 and LLaMA shows that MOGA, especially with row normalization, is competitive with Muon while being notably faster in large-token and low-loss regimes.
【3】Towards a Neural Debugger for Python
标题:迈向Python的神经调试器
链接:https://arxiv.org/abs/2603.09951
作者:Maximilian Beck,Jonas Gehring,Jannik Kossen,Gabriel Synnaeve
备注:22 pages
摘要:Training large language models (LLMs) on Python execution traces grounds them in code execution and enables the line-by-line execution prediction of whole Python programs, effectively turning them into neural interpreters (FAIR CodeGen Team et al., 2025). However, developers rarely execute programs step by step; instead, they use debuggers to stop execution at certain breakpoints and step through relevant portions only while inspecting or modifying program variables. Existing neural interpreter approaches lack such interactive control. To address this limitation, we introduce neural debuggers: language models that emulate traditional debuggers, supporting operations such as stepping into, over, or out of functions, as well as setting breakpoints at specific source lines. We show that neural debuggers -- obtained via fine-tuning large LLMs or pre-training smaller models from scratch -- can reliably model both forward execution (predicting future states and outputs) and inverse execution (inferring prior states or inputs) conditioned on debugger actions. Evaluated on CruxEval, our models achieve strong performance on both output and input prediction tasks, demonstrating robust conditional execution modeling. Our work takes first steps towards future agentic coding systems in which neural debuggers serve as a world model for simulated debugging environments, providing execution feedback or enabling agents to interact with real debugging tools. This capability lays the foundation for more powerful code generation, program understanding, and automated debugging.
【4】Generative Drifting is Secretly Score Matching: a Spectral and Variational Perspective
标题:生成漂移实为分数匹配:谱与变分视角
链接:https://arxiv.org/abs/2603.09936
作者:Erkan Turan,Maks Ovsjanikov
摘要:Generative Modeling via Drifting has recently achieved state-of-the-art one-step image generation through a kernel-based drift operator, yet the success is largely empirical and its theoretical foundations remain poorly understood. In this paper, we make the following observation: \emph{under a Gaussian kernel, the drift operator is exactly a score difference on smoothed distributions}. This insight allows us to answer all three key questions left open in the original work: (1) whether a vanishing drift guarantees equality of distributions ($V_{p,q}=0\Rightarrow p=q$), (2) how to choose between kernels, and (3) why the stop-gradient operator is indispensable for stable training. Our observations position drifting within the well-studied score-matching family and enable a rich theoretical perspective. By linearizing the McKean-Vlasov dynamics and probing them in Fourier space, we reveal frequency-dependent convergence timescales comparable to \emph{Landau damping} in plasma kinetic theory: the Gaussian kernel suffers an exponential high-frequency bottleneck, explaining the empirical preference for the Laplacian kernel. We also propose an exponential bandwidth annealing schedule $\sigma(t)=\sigma_0 e^{-rt}$ that reduces convergence time from $\exp(O(K_{\max}^2))$ to $O(\log K_{\max})$. Finally, by formalizing drifting as a Wasserstein gradient flow of the smoothed KL divergence, we prove that the stop-gradient operator is derived directly from the frozen-field discretization mandated by the JKO scheme, and removing it severs training from any gradient-flow guarantee. This variational perspective further provides a general template for constructing novel drift operators, demonstrated with a Sinkhorn divergence drift.
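The "score on smoothed distributions" object at the heart of this abstract has a simple closed form in one dimension: the score of a Gaussian-kernel-smoothed empirical distribution is a responsibility-weighted pull toward the samples. A minimal illustrative sketch (not the paper's drift operator itself):

```python
import math

def kde_score(x: float, samples, sigma: float) -> float:
    """Score d/dx log p_sigma(x) of the Gaussian-smoothed empirical density
    p_sigma(x) = mean_i N(x; x_i, sigma^2).

    Closed form: sum_i w_i(x) * (x_i - x) / sigma^2, where w_i are softmax
    responsibilities of the kernels. Illustrative 1-D sketch of the smoothed
    score that the abstract identifies the Gaussian-kernel drift with.
    """
    logits = [-(x - xi) ** 2 / (2 * sigma ** 2) for xi in samples]
    m = max(logits)                       # log-sum-exp stabilization
    ws = [math.exp(l - m) for l in logits]
    z = sum(ws)
    return sum(w / z * (xi - x) / sigma ** 2 for w, xi in zip(ws, samples))
```

A drift of the form "score of q minus score of p" would then be the difference of two such terms computed on generated and data samples, which is the identification the paper makes.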
【5】What is Missing? Explaining Neurons Activated by Absent Concepts
标题:缺少什么?解释因缺乏概念而激活的神经元
链接:https://arxiv.org/abs/2603.09787
作者:Robin Hesse,Simone Schaub-Meyer,Janina Hesse,Bernt Schiele,Stefan Roth
备注:Preprint
摘要
:Explainable artificial intelligence (XAI) aims to provide human-interpretable insights into the behavior of deep neural networks (DNNs), typically by estimating a simplified causal structure of the model. In existing work, this causal structure often includes relationships where the presence of a concept is associated with a strong activation of a neuron. For example, attribution methods primarily identify input pixels that contribute most to a prediction, and feature visualization methods reveal inputs that cause high activation of a target neuron - the former implicitly assuming that the relevant information resides in the input, and the latter that neurons encode the presence of concepts. However, a largely overlooked type of causal relationship is that of encoded absences, where the absence of a concept increases neural activation. In this work, we show that such missing but relevant concepts are common and that mainstream XAI methods struggle to reveal them when applied in their standard form. To address this, we propose two simple extensions to attribution and feature visualization techniques that uncover encoded absences. Across experiments, we show how mainstream XAI methods can be used to reveal and explain encoded absences, how ImageNet models exploit them, and that debiasing can be improved when considering them.
【6】Upper Generalization Bounds for Neural Oscillators
标题:神经振荡器的泛化上界
链接:https://arxiv.org/abs/2603.09742
作者:Zifeng Huang,Konstantin M. Zuev,Yong Xia,Michael Beer
备注:This manuscript contains 25 pages with 4 figures
摘要:Neural oscillators that originate from the second-order ordinary differential equations (ODEs) have shown competitive performance in learning mappings between dynamic loads and responses of complex nonlinear structural systems. Despite this empirical success, theoretically quantifying the generalization capacities of their neural network architectures remains undeveloped. In this study, the neural oscillator consisting of a second-order ODE followed by a multilayer perceptron (MLP) is considered. Its upper probably approximately correct (PAC) generalization bound for approximating causal and uniformly continuous operators between continuous temporal function spaces and that for approximating the uniformly asymptotically incrementally stable second-order dynamical systems are derived by leveraging the Rademacher complexity framework. The theoretical results show that the estimation errors grow polynomially with respect to both the MLP size and the time length, thereby avoiding the curse of parametric complexity. Furthermore, the derived error bounds demonstrate that constraining the Lipschitz constants of the MLPs via loss function regularization can improve the generalization ability of the neural oscillator. A numerical study considering a Bouc-Wen nonlinear system under stochastic seismic excitation validates the theoretically predicted power laws of the estimation errors with respect to the sample size and time length, and confirms the effectiveness of constraining MLPs' matrix and vector norms in enhancing the performance of the neural oscillator under limited training data.
【7】Mousse: Rectifying the Geometry of Muon with Curvature-Aware Preconditioning
标题:Mousse:用曲率感知预处理修正Muon的几何结构
链接:https://arxiv.org/abs/2603.09697
作者:Yechen Zhang,Shuhao Xing,Junhao Huang,Kai Lv,Yunhua Zhou,Xipeng Qiu,Qipeng Guo,Kai Chen
备注:17 pages, 10 figures
摘要:Recent advances in spectral optimization, notably Muon, have demonstrated that constraining update steps to the Stiefel manifold can significantly accelerate training and improve generalization. However, Muon implicitly assumes an isotropic optimization landscape, enforcing a uniform spectral update norm across all eigen-directions. We argue that this "egalitarian" constraint is suboptimal for Deep Neural Networks, where the curvature spectrum is known to be highly heavy-tailed and ill-conditioned. In such landscapes, Muon risks amplifying instabilities in high-curvature directions while limiting necessary progress in flat directions. In this work, we propose \textbf{Mousse} (\textbf{M}uon \textbf{O}ptimization \textbf{U}tilizing \textbf{S}hampoo's \textbf{S}tructural \textbf{E}stimation), a novel optimizer that reconciles the structural stability of spectral methods with the geometric adaptivity of second-order preconditioning. Instead of applying Newton-Schulz orthogonalization directly to the momentum matrix, Mousse operates in a whitened coordinate system induced by Kronecker-factored statistics (derived from Shampoo). Mathematically, we formulate Mousse as the solution to a spectral steepest descent problem constrained by an anisotropic trust region, where the optimal update is derived via the polar decomposition of the whitened gradient. Empirical results across language models ranging from 160M to 800M parameters demonstrate that Mousse consistently outperforms Muon, achieving a $\sim$12\% reduction in training steps with negligible computational overhead.
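The Newton-Schulz orthogonalization this abstract builds on approximates the polar factor $UV^T$ of a matrix with a few matrix multiplies. A minimal sketch of the cubic variant (practical Muon uses a tuned quintic polynomial, and Mousse applies the step to a whitened gradient rather than raw momentum):

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=30):
    """Approximate the polar factor U V^T of G via Newton-Schulz iteration.

    This is the orthogonalization step Muon applies to the momentum matrix;
    per the abstract, Mousse instead applies it in a whitened coordinate
    system. Cubic iteration shown for clarity; it converges when the
    singular values of the initial scaled matrix lie in (0, sqrt(3)),
    which the Frobenius pre-normalization guarantees.
    """
    X = G / (np.linalg.norm(G) + 1e-12)   # scale singular values into (0, 1]
    for _ in range(steps):
        X = 1.5 * X - 0.5 * (X @ X.T @ X)  # drives all singular values to 1
    return X
```

Each singular value s is mapped to 1.5s - 0.5s^3, so all singular values flow to 1 while the singular vectors are untouched, yielding an (approximately) orthogonal update direction.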
【8】On Catastrophic Forgetting in Low-Rank Decomposition-Based Parameter-Efficient Fine-Tuning
标题:基于低秩分解的参数高效微调中的灾难性遗忘
链接:https://arxiv.org/abs/2603.09684
作者:Muhammad Ahmad,Jingjing Zheng,Yankai Cao
摘要:Parameter-efficient fine-tuning (PEFT) based on low-rank decomposition, such as LoRA, has become a standard for adapting large pretrained models. However, its behavior in sequential learning -- specifically regarding catastrophic forgetting -- remains insufficiently understood. In this work, we present an empirical study showing that forgetting is strongly influenced by the geometry and parameterization of the update subspace. While methods that restrict updates to small, shared matrix subspaces often suffer from task interference, tensor-based decompositions (e.g., LoRETTA) mitigate forgetting by capturing richer structural information within ultra-compact budgets, and structurally aligned parameterizations (e.g., WeGeFT) preserve pretrained representations. Our findings highlight update subspace design as a key factor in continual learning and offer practical guidance for selecting efficient adaptation strategies in sequential settings.
【9】Well Log-Guided Synthesis of Subsurface Images from Sparse Petrography Data Using cGANs
标题:测井引导下使用cGAN从稀疏岩石学数据合成地下图像
链接:https://arxiv.org/abs/2603.09651
作者:Ali Sadeghkhani,A. Assadi,B. Bennett,A. Rabbani
备注:6 pages, 3 figures. Extended abstract presented at the Fifth EAGE Digitalization Conference & Exhibition, 24-26 March 2025, United Kingdom
摘要:Pore-scale imaging of subsurface formations is costly and limited to discrete depths, creating significant gaps in reservoir characterization. To address this, we present a conditional Generative Adversarial Network (cGAN) framework for synthesizing realistic thin section images of carbonate rock formations, conditioned on porosity values derived from well logs. Trained on 5,000 sub-images extracted from 15 petrography samples over a depth interval of 1992-2000m, the model generates geologically consistent images across a wide porosity range (0.004-0.745), achieving 81% accuracy within a 10% margin of target porosity values. The successful integration of well log data with the trained generator enables continuous pore-scale visualization along the wellbore, bridging gaps between discrete core sampling points and providing valuable insights for reservoir characterization and energy transition applications such as carbon capture and underground hydrogen storage.
【10】MM-algorithms for traditional and convex NMF with Tweedie and Negative Binomial cost functions and empirical evaluation
标题:具有Tweedie和负二项成本函数的传统和凸NMF的MM算法和经验评估
链接:https://arxiv.org/abs/2603.09601
作者:Elisabeth Sommer James,Asger Hobolth,Marta Pelizzola
摘要:Non-negative matrix factorisation (NMF) is a widely used tool for unsupervised learning and feature extraction, with applications ranging from genomics to text analysis and signal processing. Standard formulations of NMF are typically derived under Gaussian or Poisson noise assumptions, which may be inadequate for data exhibiting overdispersion or other complex mean-variance relationships. In this paper, we develop a unified framework for both traditional and convex NMF under a broad class of distributional assumptions, including Negative Binomial and Tweedie models, where the connection between the Tweedie and the $\beta$-divergence is also highlighted. Using a Majorize-Minimisation approach, we derive multiplicative update rules for all considered models, and novel updates for convex NMF with Poisson and Negative Binomial cost functions. We provide a unified implementation of all considered models, including the first implementations of several convex NMF models. Empirical evaluations on mutational and word count data demonstrate that the choice of noise model critically affects model fit and feature recovery, and that convex NMF can provide an efficient and robust alternative to traditional NMF in scenarios where the number of classes is large. The code for our proposed updates is available in the R package nmfgenr and can be found at https://github.com/MartaPelizzola/nmfgenr.
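The multiplicative MM updates this abstract derives generalize the classical Lee-Seung updates, which are the Poisson (KL-divergence) special case of the Tweedie family. A self-contained sketch of that classical special case (not the paper's Negative Binomial or convex NMF updates, which are in the nmfgenr package):

```python
import numpy as np

def poisson_nmf(V, k, iters=200, seed=0):
    """Classical multiplicative MM updates for NMF under a Poisson (KL) cost,
    factorizing nonnegative V (m x n) as W (m x k) @ H (k x n).

    Each update is a ratio of positive terms, so nonnegativity is preserved
    and the KL divergence is non-increasing; this is the well-known base case
    of the beta-divergence / Tweedie family discussed in the abstract.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + 0.1
    H = rng.random((k, n)) + 0.1
    for _ in range(iters):
        R = V / (W @ H + 1e-12)                           # elementwise V / WH
        W *= (R @ H.T) / (H.sum(axis=1) + 1e-12)          # W_ia update
        R = V / (W @ H + 1e-12)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + 1e-12)  # H_aj update
    return W, H
```

Swapping the Poisson cost for a Negative Binomial or general Tweedie cost changes the numerator/denominator terms of these ratios, which is the derivation the paper carries out within one MM framework.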
【11】Nonparametric Variational Differential Privacy via Embedding Parameter Clipping
标题:基于嵌入参数裁剪的非参数变分差分隐私
链接:https://arxiv.org/abs/2603.09583
作者:Dina El Zein,Shashi Kumar,James Henderson
备注:8 pages, 1 figure
摘要:The nonparametric variational information bottleneck (NVIB) provides the foundation for nonparametric variational differential privacy (NVDP), a framework for building privacy-preserving language models. However, the learned latent representations can drift into regions with high information content, leading to poor privacy guarantees, but also low utility due to numerical instability during training. In this work, we introduce a principled parameter clipping strategy to directly address this issue. Our method is mathematically derived from the objective of minimizing the Rényi Divergence (RD) upper bound, yielding specific, theoretically grounded constraints on the posterior mean, variance, and mixture weight parameters. We apply our technique to an NVIB based model and empirically compare it against an unconstrained baseline. Our findings demonstrate that the clipped model consistently achieves tighter RD bounds, implying stronger privacy, while simultaneously attaining higher performance on several downstream tasks. This work presents a simple yet effective method for improving the privacy-utility trade-off in variational models, making them more robust and practical.
【12】Routing without Forgetting
标题:不遗忘的路由
链接:https://arxiv.org/abs/2603.09576
作者:Alessio Masano,Giovanni Bellitto,Dipam Goswani,Joost Van de Weijer,Concetto Spampinato
摘要:Continual learning in transformers is commonly addressed through parameter-efficient adaptation: prompts, adapters, or LoRA modules are specialized per task while the backbone remains frozen. Although effective in controlled multi-epoch settings, these approaches rely on gradual gradient-based specialization and struggle in Online Continual Learning (OCL), where data arrive as a non-stationary stream and each sample may be observed only once. We recast continual learning in transformers as a routing problem: under strict online constraints, the model must dynamically select the appropriate representational subspace for each input without explicit task identifiers or repeated optimization. We thus introduce Routing without Forgetting (RwF), a transformer architecture augmented with energy-based associative retrieval layers inspired by Modern Hopfield Networks. Instead of storing or merging task-specific prompts, RwF generates dynamic prompts through single-step associative retrieval over the transformer token embeddings at each layer. Retrieval corresponds to the closed-form minimization of a strictly convex free-energy functional, enabling input-conditioned routing within each forward pass, independently of iterative gradient refinement. Across challenging class-incremental benchmarks, RwF improves over existing prompt-based methods. On Split-ImageNet-R and Split-ImageNet-S, RwF outperforms prior prompt-based approaches by a large margin, even in few-shot learning regimes. These results indicate that embedding energy-based associative routing directly within the transformer backbone provides a principled and effective foundation for OCL.
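The single-step associative retrieval the abstract describes is, in the Modern Hopfield Network literature, a closed-form softmax attention over stored patterns. A minimal sketch of that retrieval step (illustrative; RwF's per-layer prompt generation and free-energy functional are more involved):

```python
import numpy as np

def hopfield_retrieve(q, X, beta=8.0):
    """Single-step modern-Hopfield retrieval: q <- X^T softmax(beta * X q),
    the closed-form minimizer of the associated convex free energy over
    stored patterns X (one pattern per row).

    One forward pass and no iterative gradient refinement, mirroring the
    single-step, input-conditioned retrieval described in the abstract.
    """
    logits = beta * (X @ q)
    w = np.exp(logits - logits.max())  # stabilized softmax weights
    w /= w.sum()
    return X.T @ w                     # convex combination of stored patterns
```

With a large inverse temperature beta, the softmax concentrates on the stored pattern most similar to the query, so retrieval acts as input-conditioned routing to one representational subspace.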
【13】Reconstructing Movement from Sparse Samples: Enhanced Spatio-Temporal Matching Strategies for Low-Frequency Data
标题:从稀疏样本重建运动:低频数据的增强时空匹配策略
链接:https://arxiv.org/abs/2603.09412
作者:Ali Yousefian,Arianna Burzacchi,Simone Vantini
备注:22 pages, 14 figures, 3 tables
摘要:This paper explores potential improvements to the Spatial-Temporal Matching algorithm for matching the GPS trajectories to road networks. While this algorithm is effective, it presents some limitations in computational efficiency and the accuracy of the results, especially in dense environments with relatively high sampling intervals. To address this, the paper proposes four modifications to the original algorithm: a dynamic buffer, an adaptive observation probability, a redesigned temporal scoring function, and a behavioral analysis to account for the historical mobility patterns. The enhancements are assessed using real-world data from the urban area of Milan, and through newly defined evaluation metrics to be applied in the absence of ground truth. The results of the experiment show significant improvements in performance efficiency and path quality across various metrics.
【14】SPAARS: Safer RL Policy Alignment through Abstract Exploration and Refined Exploitation of Action Space
标题:SPAARS:通过抽象探索与动作空间的精细利用实现更安全的RL策略对齐
链接:https://arxiv.org/abs/2603.09378
作者:Swaminathan S K,Aritra Hazra
备注:9 pages
摘要:Offline-to-online reinforcement learning (RL) offers a promising paradigm for robotics by pre-training policies on safe, offline demonstrations and fine-tuning them via online interaction. However, a fundamental challenge remains: how to safely explore online without deviating from the behavioral support of the offline data? While recent methods leverage conditional variational autoencoders (CVAEs) to bound exploration within a latent space, they inherently suffer from an exploitation gap -- a performance ceiling imposed by the decoder's reconstruction loss. We introduce SPAARS, a curriculum learning framework that initially constrains exploration to the low-dimensional latent manifold for sample-efficient, safe behavioral improvement, then seamlessly transfers control to the raw action space, bypassing the decoder bottleneck. SPAARS has two instantiations: the CVAE-based variant requires only unordered (s,a) pairs and no trajectory segmentation; SPAARS-SUPE pairs SPAARS with OPAL temporal skill pretraining for stronger exploration structure at the cost of requiring trajectory chunks. We prove an upper bound on the exploitation gap using the Performance Difference Lemma, establish that latent-space policy gradients achieve provable variance reduction over raw-space exploration, and show that concurrent behavioral cloning during the latent phase directly controls curriculum transition stability. Empirically, SPAARS-SUPE achieves 0.825 normalized return on kitchen-mixed-v0 versus 0.75 for SUPE, with 5x better sample efficiency; standalone SPAARS achieves 92.7 and 102.9 normalized return on hopper-medium-v2 and walker2d-medium-v2 respectively, surpassing IQL baselines of 66.3 and 78.3 respectively, confirming the utility of the unordered-pair CVAE instantiation.
【15】Proxy-Guided Measurement Calibration
标题:代理引导的测量校准
链接:https://arxiv.org/abs/2603.09288
作者:Saketh Vishnubhatla,Shu Wan,Andre Harrison,Adrienne Raglin,Huan Liu
摘要:Aggregate outcome variables collected through surveys and administrative records are often subject to systematic measurement error. For instance, in disaster loss databases, county-level losses reported may differ from the true damages due to variations in on-the-ground data collection capacity, reporting practices, and event characteristics. Such miscalibration complicates downstream analysis and decision-making. We study the problem of outcome miscalibration and propose a framework guided by proxy variables for estimating and correcting the systematic errors. We model the data-generating process using a causal graph that separates latent content variables driving the true outcome from the latent bias variables that induce systematic errors. The key insight is that proxy variables that depend on the true outcome but are independent of the bias mechanism provide identifying information for quantifying the bias. Leveraging this structure, we introduce a two-stage approach that utilizes variational autoencoders to disentangle content and bias latents, enabling us to estimate the effect of bias on the outcome of interest. We analyze the assumptions underlying our approach and evaluate it on synthetic data, semi-synthetic datasets derived from randomized trials, and a real-world case study of disaster loss reporting.
【16】$P^2$GNN: Two Prototype Sets to boost GNN Performance
标题:$P^2$GNN:两个原型集提升GNN性能
链接:https://arxiv.org/abs/2603.09195
作者:Arihant Jain,Gundeep Arora,Anoop Saladi,Chaosheng Dong
摘要:Message Passing Graph Neural Networks (MP-GNNs) have garnered attention for addressing various industry challenges, such as user recommendation and fraud detection. However, they face two major hurdles: (1) heavy reliance on local context, often lacking information about the global context or graph-level features, and (2) assumption of strong homophily among connected nodes, struggling with noisy local neighborhoods. To tackle these, we introduce $P^2$GNN, a plug-and-play technique leveraging prototypes to optimize message passing, enhancing the performance of the base GNN model. Our approach views the prototypes in two ways: (1) as universally accessible neighbors for all nodes, enriching global context, and (2) aligning messages to clustered prototypes, offering a denoising effect. We demonstrate the extensibility of our proposed method to all message-passing GNNs and conduct extensive experiments across 18 datasets, including proprietary e-commerce datasets and open-source datasets, on node recommendation and node classification tasks. Results show that $P^2$GNN outperforms production models in e-commerce and achieves the top average rank on open-source datasets, establishing it as a leading approach. Qualitative analysis supports the value of global context and noise mitigation in the local neighborhood in enhancing performance.
【17】The Costs of Reproducibility in Music Separation Research: a Replication of Band-Split RNN
标题:音乐分离研究中可复现性的成本:Band-Split RNN的复现
链接:https://arxiv.org/abs/2603.09187
作者:Paul Magron,Romain Serizel,Constance Douwes
摘要:Music source separation is the task of isolating the instrumental tracks from a music song. Despite its spectacular recent progress, the trend towards more complex architectures and training protocols exacerbates reproducibility issues. The band-split recurrent neural networks (BSRNN) model is promising in this regard, since it yields close to state-of-the-art results on public datasets, and requires reasonable resources for training. Unfortunately, it is not straightforward to reproduce since its full code is not available. In this paper, we attempt to replicate BSRNN as closely as possible to the original paper through extensive experiments, which allows us to conduct a critical reflection on this reproducibility issue. Our contributions are three-fold. First, this study yields several insights on the model design and training pipeline, which sheds light on potential future improvements. In particular, since we were unsuccessful in reproducing the original results, we explore additional variants that ultimately yield an optimized BSRNN model, whose performance largely improves that of the original. Second, we discuss reproducibility issues from both methodological and practical perspectives. We notably underline how substantial time and energy costs could have been saved upon availability of the full pipeline. Third, our code and pre-trained models are released publicly to foster reproducible research. We hope that this study will contribute to spread awareness on the importance of reproducible research in the music separation community, and help promoting more transparent and sustainable practices.
【18】Better Bounds for the Distributed Experts Problem
标题:分布式专家问题的更好界限
链接:https://arxiv.org/abs/2603.09168
作者:David P. Woodruff,Samson Zhou
摘要:In this paper, we study the distributed experts problem, where $n$ experts are distributed across $s$ servers for $T$ timesteps. The loss of each expert at each time $t$ is the $\ell_p$ norm of the vector that consists of the losses of the expert at each of the $s$ servers at time $t$. The goal is to minimize the regret $R$, i.e., the loss of the distributed protocol compared to the loss of the best expert, amortized over all $T$ timesteps, while using the minimum amount of communication. We give a protocol that achieves regret roughly $R\gtrsim\frac{1}{\sqrt{T}\cdot\text{poly}\log(nsT)}$, using $\mathcal{O}\left(\frac{n}{R^2}+\frac{s}{R^2}\right)\cdot\max(s^{1-2/p},1)\cdot\text{poly}\log(nsT)$ bits of communication, which improves on previous work.
【19】Overcoming Valid Action Suppression in Unmasked Policy Gradient Algorithms
标题:克服无屏蔽策略梯度算法中的有效动作抑制
链接:https://arxiv.org/abs/2603.09090
作者:Renos Zabounidis,Roy Siegelmann,Mohamad Qadri,Woojun Kim,Simon Stepputtis,Katia P. Sycara
摘要:In reinforcement learning environments with state-dependent action validity, action masking consistently outperforms penalty-based handling of invalid actions, yet existing theory only shows that masking preserves the policy gradient theorem. We identify a distinct failure mode of unmasked training: it systematically suppresses valid actions at states the agent has not yet visited. This occurs because gradients pushing down invalid actions at visited states propagate through shared network parameters to unvisited states where those actions are valid. We prove that for softmax policies with shared features, when an action is invalid at visited states but valid at an unvisited state $s^*$, the probability $π(a \mid s^*)$ is bounded by exponential decay due to parameter sharing and the zero-sum identity of softmax logits. This bound reveals that entropy regularization trades off between protecting valid actions and sample efficiency, a tradeoff that masking eliminates. We validate empirically that deep networks exhibit the feature alignment condition required for suppression, and experiments on Craftax, Craftax-Classic, and MiniHack confirm the predicted exponential suppression and demonstrate that feasibility classification enables deployment without oracle masks.
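As a concrete illustration of the masking mechanism the abstract compares against, here is a minimal sketch (not the authors' code): invalid actions receive -inf logits, so they get zero probability and exert no gradient pressure on shared parameters.

```python
import numpy as np

def masked_softmax(logits, valid_mask):
    # Invalid actions get -inf logits, so they receive zero probability
    # and no gradient flows through them into shared parameters.
    masked = np.where(valid_mask, logits, -np.inf)
    z = masked - masked.max()
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5])
valid = np.array([True, False, True])   # action 1 is invalid in this state
probs = masked_softmax(logits, valid)
```

The unmasked alternative instead pushes the invalid logit down through the loss, which (per the abstract) leaks through shared features to states where that action is valid.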
【20】Exclusive Self Attention
标题:排他性自注意力
链接:https://arxiv.org/abs/2603.09078
作者:Shuangfei Zhai
摘要:We introduce exclusive self attention (XSA), a simple modification of self attention (SA) that improves Transformer's sequence modeling performance. The key idea is to constrain attention to capture only information orthogonal to the token's own value vector (thus excluding information of self position), encouraging better context modeling. Evaluated on the standard language modeling task, XSA consistently outperforms SA across model sizes up to 2.7B parameters and shows increasingly larger gains as sequence length grows.
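The orthogonality constraint described above can be sketched in NumPy as follows. This is an illustrative reading of the idea, not the paper's implementation; XSA's exact formulation may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def exclusive_self_attention(Q, K, V):
    # Standard softmax attention.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)
    out = A @ V
    # Exclusive step (assumed form): remove each output's component along
    # the token's own value vector, keeping only orthogonal context.
    v2 = (V * V).sum(axis=-1, keepdims=True) + 1e-12
    return out - ((out * V).sum(axis=-1, keepdims=True) / v2) * V

Q = rng.standard_normal((5, 8))
K = rng.standard_normal((5, 8))
V = rng.standard_normal((5, 8))
Y = exclusive_self_attention(Q, K, V)
```

Each row of the result is orthogonal to the corresponding token's value vector, which is the sense in which self-position information is excluded.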
【21】When to Retrain after Drift: A Data-Only Test of Post-Drift Data Size Sufficiency
标题:漂移后何时重新训练:漂移后数据量充分性的纯数据检验
链接:https://arxiv.org/abs/2603.09024
作者:Ren Fujiwara,Yasuko Matsubara,Yasushi Sakurai
备注:Accepted by ICLR 2026
摘要:Sudden concept drift makes previously trained predictors unreliable, yet deciding when to retrain and what post-drift data size is sufficient is rarely addressed. We propose CALIPER - a detector- and model-agnostic, data-only test that estimates the post-drift data size required for stable retraining. CALIPER exploits state dependence in streams generated by dynamical systems: we run a single-pass weighted local regression over the post-drift window and track a one-step proxy error as a function of a locality parameter $θ$. When an effective sample size gate is satisfied, a monotonically non-increasing trend in this error as the locality parameter increases indicates that the data size is sufficiently informative for retraining. We also provide a theoretical analysis of our method, and we show that the algorithm has low per-update time and memory costs. Across datasets from four heterogeneous domains, three learner families, and two detectors, CALIPER consistently matches or exceeds the best fixed data size for retraining while incurring negligible overhead and often outperforming incremental updates. CALIPER closes the gap between drift detection and data-sufficient adaptation in streaming learning.
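A hypothetical sketch of the proxy-error computation the abstract describes: for each point in the post-drift window, predict the next value by a kernel-weighted regression over earlier states, with the locality parameter as bandwidth. The Gaussian-kernel form and the weighted-average predictor are assumptions, not the paper's exact estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

def proxy_error(window, theta):
    # One-step proxy error of a weighted local (nearest-state) predictor
    # over the post-drift window; theta controls locality.
    x, y = window[:-1], window[1:]
    err, n = 0.0, 0
    for t in range(1, len(x)):
        w = np.exp(-((x[:t] - x[t]) ** 2) / theta)
        pred = (w * y[:t]).sum() / (w.sum() + 1e-12)
        err += (pred - y[t]) ** 2
        n += 1
    return err / n

# An AR(1) stream standing in for a post-drift window.
z = [0.0]
for _ in range(200):
    z.append(0.8 * z[-1] + 0.1 * rng.standard_normal())
window = np.array(z)
errs = [proxy_error(window, th) for th in (0.01, 0.1, 1.0)]
```

The sufficiency test then looks for a non-increasing trend in such errors as the locality parameter grows.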
【22】The Coupling Within: Flow Matching via Distilled Normalizing Flows
标题:内在的耦合:通过蒸馏归一化流实现流匹配
链接:https://arxiv.org/abs/2603.09014
作者:David Berthelot,Tianrong Chen,Jiatao Gu,Marco Cuturi,Laurent Dinh,Bhavik Chandna,Michal Klein,Josh Susskind,Shuangfei Zhai
备注:Submitted to ICML 2026
摘要:Flow models have rapidly become the go-to method for training and deploying large-scale generators, owing their success to inference-time flexibility via adjustable integration steps. A crucial ingredient in flow training is the choice of coupling measure for sampling noise/data pairs that define the flow matching (FM) regression loss. While FM training usually defaults to independent coupling, recent works show that adaptive couplings informed by noise/data distributions (e.g., via optimal transport, OT) improve both model training and inference. We radicalize this insight by shifting the paradigm: rather than computing adaptive couplings directly, we use distilled couplings from a different, pretrained model capable of placing noise and data spaces in bijection -- a property intrinsic to normalizing flows (NF) through their maximum likelihood and invertibility requirements. Leveraging recent advances in NF image generation via auto-regressive (AR) blocks, we propose Normalized Flow Matching (NFM), a new method that distills the quasi-deterministic coupling of pretrained NF models to train student flow models. These students achieve the best of both worlds: significantly outperforming flow models trained with independent or even OT couplings, while also improving on the teacher AR-NF model.
【23】BiCLIP: Domain Canonicalization via Structured Geometric Transformation
标题:BiCLIP:通过结构化几何变换实现领域规范化
链接:https://arxiv.org/abs/2603.08942
作者:Pranav Mantini,Shishir K. Shah
摘要:Recent advances in vision-language models (VLMs) have demonstrated remarkable zero-shot capabilities, yet adapting these models to specialized domains remains a significant challenge. Building on recent theoretical insights suggesting that independently trained VLMs are related by a canonical transformation, we extend this understanding to the concept of domains. We hypothesize that image features across disparate domains are related by a canonicalized geometric transformation that can be recovered using a small set of anchors. Few-shot classification provides a natural setting for this alignment, as the limited labeled samples serve as the anchors required to estimate this transformation. Motivated by this hypothesis, we introduce BiCLIP, a framework that applies a targeted transformation to multimodal features to enhance cross-modal alignment. Our approach is characterized by its extreme simplicity and low parameter footprint. Extensive evaluations across 11 standard benchmarks, including EuroSAT, DTD, and FGVCAircraft, demonstrate that BiCLIP consistently achieves state-of-the-art results. Furthermore, we provide empirical verification of existing geometric findings by analyzing the orthogonality and angular distribution of the learned transformations, confirming that structured alignment is the key to robust domain adaptation. Code is available at https://github.com/QuantitativeImagingLaboratory/BilinearCLIP
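The hypothesis that a few anchors suffice to recover a cross-domain transformation can be illustrated with a plain least-squares fit. This is an assumption-laden sketch, not BiCLIP's implementation (whose learned transformation is bilinear): it only shows anchors identifying a linear map between feature spaces.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_canonical_map(img_anchors, txt_anchors):
    # Least-squares estimate of a linear map sending image features to
    # text features, using only a handful of few-shot anchor pairs.
    W, *_ = np.linalg.lstsq(img_anchors, txt_anchors, rcond=None)
    return W

d = 6
W_true = rng.standard_normal((d, d))     # hypothetical ground-truth map
X = rng.standard_normal((20, d))         # 20 anchor "image" features
Y = X @ W_true                           # their matching "text" features
W_hat = fit_canonical_map(X, Y)
aligned = X @ W_hat
```

With noiseless anchors and more anchors than dimensions, the map is recovered exactly; the paper's point is that few-shot samples naturally play this anchor role.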
【24】Uncovering a Winning Lottery Ticket with Continuously Relaxed Bernoulli Gates
标题:利用连续松弛的伯努利门发现中奖彩票
链接:https://arxiv.org/abs/2603.08914
作者:Itamar Tsayag,Ofir Lindenbaum
摘要:Over-parameterized neural networks incur prohibitive memory and computational costs for resource-constrained deployment. The Strong Lottery Ticket (SLT) hypothesis suggests that randomly initialized networks contain sparse subnetworks achieving competitive accuracy without weight training. Existing SLT methods, notably edge-popup, rely on non-differentiable score-based selection, limiting optimization efficiency and scalability. We propose using continuously relaxed Bernoulli gates to discover SLTs through fully differentiable, end-to-end optimization - training only gating parameters while keeping all network weights frozen at their initialized values. Continuous relaxation enables direct gradient-based optimization of an $\ell_0$-regularization objective, eliminating the need for non-differentiable gradient estimators or iterative pruning cycles. To our knowledge, this is the first fully differentiable approach for SLT discovery that avoids straight-through estimator approximations. Experiments across fully connected networks, CNNs (ResNet, Wide-ResNet), and Vision Transformers (ViT, Swin-T) demonstrate up to 90% sparsity with minimal accuracy loss - nearly double the sparsity achieved by edge-popup at comparable accuracy - establishing a scalable framework for pre-training network sparsification.
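A minimal sketch of the core mechanism: a continuously relaxed Bernoulli (binary-Concrete) gate applied to frozen random weights. The parameterization details (temperature, prior bias) are assumptions for illustration, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def relaxed_bernoulli(log_alpha, temperature=0.5):
    # Binary-Concrete sample: differentiable in log_alpha, lies in (0, 1),
    # and sharpens toward {0, 1} as the temperature decreases.
    u = rng.uniform(1e-6, 1 - 1e-6, size=np.shape(log_alpha))
    logistic_noise = np.log(u) - np.log1p(-u)
    return 1.0 / (1.0 + np.exp(-(log_alpha + logistic_noise) / temperature))

# Frozen random weights; only the gate logits would be optimized
# (together with an l0 penalty) to expose a sparse subnetwork.
W = rng.standard_normal((8, 8))
log_alpha = np.full((8, 8), -2.0)   # bias toward pruning (assumed prior)
gates = relaxed_bernoulli(log_alpha)
subnetwork = W * gates
```

Because the gate is a smooth function of its logits, the $\ell_0$-regularized objective can be optimized end-to-end with ordinary gradients, with no straight-through estimator.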
【25】SoftJAX & SoftTorch: Empowering Automatic Differentiation Libraries with Informative Gradients
标题:SoftJAX与SoftTorch:为自动微分库提供信息性梯度
链接:https://arxiv.org/abs/2603.08824
作者:Anselm Paulus,A. René Geist,Vít Musil,Sebastian Hoffmann,Onur Beker,Georg Martius
摘要:Automatic differentiation (AD) frameworks such as JAX and PyTorch have enabled gradient-based optimization for a wide range of scientific fields. Yet, many "hard" primitives in these libraries such as thresholding, Boolean logic, discrete indexing, and sorting operations yield zero or undefined gradients that are not useful for optimization. While numerous "soft" relaxations have been proposed that provide informative gradients, the respective implementations are fragmented across projects, making them difficult to combine and compare. This work introduces SoftJAX and SoftTorch, open-source, feature-complete libraries for soft differentiable programming. These libraries provide a variety of soft functions as drop-in replacements for their hard JAX and PyTorch counterparts. This includes (i) elementwise operators such as clip or abs, (ii) utility methods for manipulating Booleans and indices via fuzzy logic, (iii) axiswise operators such as sort or rank -- based on optimal transport or permutahedron projections, and (iv) full support for straight-through gradient estimation. Overall, SoftJAX and SoftTorch make the toolbox of soft relaxations easily accessible to differentiable programming, as demonstrated through benchmarking and a practical case study. Code is available at github.com/a-paulus/softjax and github.com/a-paulus/softtorch.
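For flavor, two soft drop-in replacements of the kind such libraries provide. These are common illustrative forms, not the actual SoftJAX/SoftTorch APIs.

```python
import numpy as np

def soft_abs(x, eps=1e-3):
    # Smooth |x| with a well-defined, nonzero-curvature gradient at 0.
    return np.sqrt(x * x + eps * eps)

def softplus(z, beta=10.0):
    # Numerically stable softplus; beta controls sharpness.
    return np.log1p(np.exp(-np.abs(beta * z))) / beta + np.maximum(z, 0.0)

def soft_clip(x, lo, hi, beta=10.0):
    # Smooth clip(x, lo, hi): differentiable everywhere, and it
    # approaches the hard clip as beta grows.
    return lo + softplus(x - lo, beta) - softplus(x - hi, beta)
```

Unlike the hard versions, both functions pass useful gradients through the flat and kinked regions, which is exactly the property the libraries package up.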
【26】The Temporal Markov Transition Field
标题:时间马尔科夫转移场
链接:https://arxiv.org/abs/2603.08803
作者:Michael Leznik
备注:13 pages, 2 figures
摘要:The Markov Transition Field (MTF), introduced by Wang and Oates (2015), encodes a time series as a two-dimensional image by mapping each pair of time steps to the transition probability between their quantile states, estimated from a single global transition matrix. This construction is efficient when the transition dynamics are stationary, but produces a misleading representation when the process changes regime over time: the global matrix averages across regimes and the resulting image loses all information about \emph{when} each dynamical regime was active. In this paper we introduce the \emph{Temporal Markov Transition Field} (TMTF), an extension that partitions the series into $K$ contiguous temporal chunks, estimates a separate local transition matrix for each chunk, and assembles the image so that each row reflects the dynamics local to its chunk rather than the global average. The resulting $T \times T$ image has $K$ horizontal bands of distinct texture, each encoding the transition dynamics of one temporal segment. We develop the formal definition, establish the key structural properties of the representation, work through a complete numerical example that makes the distinction from the global MTF concrete, analyse the bias--variance trade-off introduced by temporal chunking, and discuss the geometric interpretation of the local transition matrices in terms of process properties such as persistence, mean reversion, and trending behaviour. The TMTF is amplitude-agnostic and order-preserving, making it suitable as an input channel for convolutional neural networks applied to time series characterisation tasks.
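The chunked construction is concrete enough to sketch directly. Quantile binning and the row-wise use of each chunk's local matrix are assumptions consistent with the abstract, not a reproduction of the paper's definition.

```python
import numpy as np

def tmtf(x, n_bins=4, n_chunks=2):
    # Temporal Markov Transition Field sketch: quantile-bin the series,
    # estimate one transition matrix per contiguous chunk, and let row i
    # of the image reflect the dynamics local to i's chunk.
    T = len(x)
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    q = np.digitize(x, edges)                  # quantile state per step
    chunk = (np.arange(T) * n_chunks) // T     # contiguous chunk index
    mats = []
    for c in range(n_chunks):
        M = np.full((n_bins, n_bins), 1e-12)   # smoothing for empty rows
        idx = np.where(chunk == c)[0]
        for t in idx[:-1]:
            if chunk[t + 1] == c:
                M[q[t], q[t + 1]] += 1
        M /= M.sum(axis=1, keepdims=True)
        mats.append(M)
    W = np.empty((T, T))
    for i in range(T):
        W[i] = mats[chunk[i]][q[i], q]         # local, not global, dynamics
    return W

# A two-regime series: oscillation followed by a trend.
x = np.concatenate([np.sin(np.linspace(0, 6, 60)), np.linspace(-1, 1, 60)])
W = tmtf(x, n_bins=4, n_chunks=2)
```

The two horizontal bands of the resulting image carry different textures, one per regime, which the global MTF would average away.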
【27】Generalized Reduction to the Isotropy for Flexible Equivariant Neural Fields
标题:面向灵活等变神经场的广义各向同性归约
链接:https://arxiv.org/abs/2603.08758
作者:Alejandro García-Castellanos,Gijs Bellaard,Remco Duits,Daniel Pelt,Erik J Bekkers
摘要:Many geometric learning problems require invariants on heterogeneous product spaces, i.e., products of distinct spaces carrying different group actions, where standard techniques do not directly apply. We show that, when a group $G$ acts transitively on a space $M$, any $G$-invariant function on a product space $X \times M$ can be reduced to an invariant of the isotropy subgroup $H$ of $M$ acting on $X$ alone. Our approach establishes an explicit orbit equivalence $(X \times M)/G \cong X/H$, yielding a principled reduction that preserves expressivity. We apply this characterization to Equivariant Neural Fields, extending them to arbitrary group actions and homogeneous conditioning spaces, and thereby removing the major structural constraints imposed by existing methods.
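The reduction can be stated in one line. Fixing a base point $m_0 \in M$ with isotropy group $H$, and choosing for each $m$ some $g_m \in G$ with $g_m \cdot m_0 = m$ (possible by transitivity), a sketch consistent with the abstract's statement is:

```latex
% G-invariance lets us translate the M-component to the base point:
f(x, m) = f\big(g_m^{-1} \cdot x,\; g_m^{-1} \cdot m\big)
        = f\big(g_m^{-1} \cdot x,\; m_0\big)
        =: \tilde f\big(g_m^{-1} \cdot x\big),
% and \tilde f is H-invariant: for h \in H,
\tilde f(h \cdot x) = f(h \cdot x, m_0) = f\big(x, h^{-1} \cdot m_0\big)
                    = f(x, m_0) = \tilde f(x).
```

Well-definedness follows because $g_m$ is unique up to right multiplication by $H$, which is exactly the orbit equivalence $(X \times M)/G \cong X/H$ the abstract cites.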
【28】The AetherFloat Family: Block-Scale-Free Quad-Radix Floating-Point Architectures for AI Accelerators
标题:AetherFloat系列:用于人工智能加速器的无块缩放四基数浮点架构
链接:https://arxiv.org/abs/2603.08741
作者:Keita Morisaki
摘要:The IEEE 754 floating-point standard is the bedrock of modern computing, but its structural requirements -- a hidden leading bit, Base-2 bit-level normalization, and Sign-Magnitude encoding -- impose significant silicon area and power overhead in massively parallel Neural Processing Units (NPUs). Furthermore, the industry's recent shift to 8-bit formats (e.g., FP8 E4M3, OCP MX formats) has introduced a new hardware penalty: the strict necessity of Block-Scaling (AMAX) logic to prevent out-of-bound Large Language Model (LLM) activations from overflowing and degrading accuracy. The AetherFloat Family is a parameterizable architectural replacement designed from first principles for Hardware/Software Co-Design in AI acceleration. By synthesizing Lexicographic One's Complement Unpacking, Quad-Radix (Base-4) Scaling, and an Explicit Mantissa, AetherFloat achieves zero-cycle native integer comparability, branchless subnormal handling, and a verified 33.17% area, 21.99% total power, and 11.73% critical path delay reduction across the multiply-accumulate (MAC) unit. Instantiated as AetherFloat-8 (AF8), the architecture relies on a purely explicit 3-bit mantissa. Combined with Base-4 scaling, AF8 delivers a substantially wider dynamic range, acting as a ``Block-Scale-Free'' format for inference that circumvents dynamic scaling microarchitecture. Finally, a novel Vector-Shared 32-bit Galois Stochastic Rounding topology bounds precision variance while neutralizing the vanishing gradients that plague legacy formats. While AF16 serves as a near-lossless bfloat16 replacement via post-training quantization, AF8 is designed as a QAT-first inference format: its Block-Scale-Free property eliminates dynamic AMAX hardware at the cost of requiring quantization-aware fine-tuning for deployment.
【29】Sensitivity-Guided Framework for Pruned and Quantized Reservoir Computing Accelerators
标题:面向剪枝与量化储备池计算加速器的灵敏度引导框架
链接:https://arxiv.org/abs/2603.08737
作者:Atousa Jafari,Mahdi Taheri,Hassan Ghasemzadeh Mohammadi,Christian Herglotz,Marco Platzner
摘要:This paper presents a compression framework for Reservoir Computing that enables systematic design-space exploration of trade-offs among quantization levels, pruning rates, model accuracy, and hardware efficiency. The proposed approach leverages a sensitivity-based pruning mechanism to identify and remove less critical quantized weights with minimal impact on model accuracy, thereby reducing computational overhead while preserving accuracy. We perform an extensive trade-off analysis to validate the effectiveness of the proposed framework and the impact of pruning and quantization on model performance and hardware parameters. For this evaluation, we employ three time-series datasets, including both classification and regression tasks. Experimental results across selected benchmarks demonstrate that our proposed approach maintains high accuracy while substantially improving computational and resource efficiency in FPGA-based implementations, with variations observed across different configurations and time series applications. For instance, for the MELBOEN dataset, an accelerator quantized to 4-bit at a 15% pruning rate reduces resource utilization by 1.2% and the Power Delay Product (PDP) by 50.8% compared to an unpruned model, without any noticeable degradation in accuracy.
【30】Unveiling the Potential of Quantization with MXFP4: Strategies for Quantization Error Reduction
标题:利用MXFP4揭示量化潜力:量化误差减少策略
链接:https://arxiv.org/abs/2603.08713
作者:Jatin Chhugani,Geonhwa Jeong,Bor-Yiing Su,Yunjie Pan,Hanmei Yang,Aayush Ankit,Jiecao Yu,Summer Deng,Yunqing Chen,Nadathur Satish,Changkyu Kim
摘要:Large Language Models (LLMs) have intensified the need for low-precision formats that enable efficient, large-scale inference. The Open Compute Project (OCP) Microscaling (MX) standard is attractive due to its favorable hardware efficiency, but its 4-bit variant (MXFP4) lags behind NVIDIA's NVFP4 in accuracy, limiting adoption. We introduce two software-only techniques, Overflow-Aware Scaling (OAS) and Macro Block Scaling (MBS), that improve MXFP4 quantization fidelity without requiring hardware changes. OAS reduces overall errors by increasing effective dynamic range under power-of-two block scaling, while MBS allocates higher-precision scaling at a coarser granularity to better preserve outliers. Across multiple LLMs and standard downstream benchmarks, OAS and MBS reduce the end-to-end accuracy gap between MXFP4 and NVFP4 from about 10% to below 1% on average, while incurring modest GEMM overhead (6.2% on average). These results re-establish MXFP4 as a practical alternative to NVFP4, enabling near-NVFP4 accuracy while retaining MX's hardware-efficiency advantages (e.g., 12% relative area savings in tensor cores).
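A sketch of MX-style power-of-two block scaling onto the FP4 (E2M1) grid, with an overflow-aware exponent bump in the spirit of OAS. The rounding details are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Signed FP4 E2M1 representable values.
FP4 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = np.concatenate([-FP4[::-1], FP4])

def quantize_block_mx(x):
    amax = np.abs(x).max()
    if amax == 0.0:
        return np.zeros_like(x)
    # Shared power-of-two scale for the block; bump the exponent if the
    # block maximum would still overflow the FP4 max of 6 (overflow-aware).
    scale = 2.0 ** np.floor(np.log2(amax / 6.0))
    if amax / scale > 6.0:
        scale *= 2.0
    y = x / scale
    nearest = np.abs(y[:, None] - FP4_GRID[None, :]).argmin(axis=1)
    return FP4_GRID[nearest] * scale

x = rng.standard_normal(32) * 3.0   # one 32-element block
xq = quantize_block_mx(x)
```

The exponent bump guarantees no scaled element exceeds the FP4 range, trading a little resolution on small values for overflow safety, which is the error trade-off the paper's techniques manage.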
【31】Global universality via discrete-time signatures
标题:通过离散时间签名实现全局普适性
链接:https://arxiv.org/abs/2603.09773
作者:Mihriban Ceylan,David J. Prömel
摘要:We establish global universal approximation theorems on spaces of piecewise linear paths, stating that linear functionals of the corresponding signatures are dense with respect to $L^p$- and weighted norms, under an integrability condition on the underlying weight function. As an application, we show that piecewise linear interpolations of Brownian motion satisfy this integrability condition. Consequently, we obtain $L^p$-approximation results for path-dependent functionals, random ordinary differential equations, and stochastic differential equations driven by Brownian motion.
【32】A Generative Sampler for distributions with possible discrete parameter based on Reversibility
标题:基于可逆性的生成采样器,适用于可能含离散参数的分布
链接:https://arxiv.org/abs/2603.09251
作者:Lei Li,Zhen Wang,Lishuo Zhang
摘要:Learning to sample from complex unnormalized distributions is a fundamental challenge in computational physics and machine learning. While score-based and variational methods have achieved success in continuous domains, extending them to discrete or mixed-variable systems remains difficult due to ill-defined gradients or high variance in estimators. We propose a unified, target-gradient-free generative sampling framework applicable across diverse state spaces. Building on the fact that detailed balance implies the time-reversibility of the equilibrium stochastic process, we enforce this symmetry as a statistical constraint. Specifically, using a prescribed physical transition kernel (such as Metropolis-Hastings), we minimize the Maximum Mean Discrepancy (MMD) between the joint distributions of forward and backward Markov trajectories. Crucially, this training procedure relies solely on energy evaluations via acceptance ratios, circumventing the need for target score functions or continuous relaxations. We demonstrate the versatility of our method on three distinct benchmarks: (1) a continuous multi-modal Gaussian mixture, (2) the discrete high-dimensional Ising model, and (3) a challenging hybrid system coupling discrete indices with continuous dynamics. Experiments show that our framework accurately reproduces thermodynamic observables and captures mode-switching behavior across all regimes, offering a physically grounded and universally applicable alternative for equilibrium sampling.
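The reversibility symmetry the method trains against can be sketched with a Metropolis-Hastings kernel in 1-D: at equilibrium, the joint distribution of consecutive states matches that of the time-reversed trajectory. This illustrative sketch omits the MMD training loop itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def mh_step(x, energy, step=0.5):
    # One Metropolis-Hastings update: needs only energy evaluations via
    # the acceptance ratio exp(E(x) - E(x')), no target gradients.
    x_prop = x + step * rng.standard_normal()
    if rng.uniform() < np.exp(min(0.0, energy(x) - energy(x_prop))):
        return x_prop
    return x

def forward_backward_pairs(x0, energy, n=2000):
    # Forward trajectory and its time reversal. Under detailed balance the
    # equilibrium joints of (x_t, x_{t+1}) in both directions coincide;
    # the paper's objective penalizes their discrepancy via MMD.
    xs = [x0]
    for _ in range(n):
        xs.append(mh_step(xs[-1], energy))
    xs = np.array(xs)
    fwd = np.stack([xs[:-1], xs[1:]], axis=1)
    bwd = fwd[::-1, ::-1]
    return fwd, bwd

energy = lambda x: 0.5 * x * x          # standard Gaussian target
fwd, bwd = forward_backward_pairs(0.0, energy)
```

Because the kernel touches the target only through acceptance ratios, the same construction applies unchanged to discrete or mixed state spaces.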
【33】Data-driven robust Markov decision processes on Borel spaces: performance guarantees via an axiomatic approach
标题:Borel空间上的数据驱动稳健Markov决策过程:通过公理方法保证性能
链接:https://arxiv.org/abs/2603.08979
作者:Sivaramakrishnan Ramani
摘要:We consider Markov decision processes (MDPs) with unknown disturbance distribution and address this problem using the robust Markov decision process (RMDP) approach. We construct the empirical distribution of the unknown disturbance distribution and characterize our ambiguity set of distributions as the sublevel set of a nonnegative distance function from the empirical distribution. By connecting the weak convergence of distributions to convergence with respect to the distance function, we prove that the robust optimal value function and the out-of-sample value function converge to the true optimal value function with increasing sample-sizes. We establish that, for finite sample-sizes, the robust optimal value function serves as a high probability upper bound on the out-of-sample value function. We also obtain probabilistic convergence rates, sample complexity bounds, and out-of-distribution performance bounds. The finite sample performance guarantees rely on the distance function satisfying a certain concentration type inequality. Several well-studied distances in the literature meet the requirements imposed on the distance function. We also analyze the data-driven properties of empirical MDPs and demonstrate that, unlike our data-driven RMDPs, empirical MDPs fail to satisfy some of the finite sample performance guarantees.
【34】On the Formal Limits of Alignment Verification
标题:关于对齐验证的形式化极限
链接:https://arxiv.org/abs/2603.08761
作者:Ayushi Agarwal
摘要:The goal of AI alignment is to ensure that an AI system reliably pursues intended objectives. A foundational question for AI safety is whether alignment can be formally certified: whether there exists a procedure that can guarantee that a given system satisfies an alignment specification. This paper studies the nature of alignment verification. We prove that no verification procedure can simultaneously satisfy three properties: soundness (no misaligned system is certified), generality (verification holds over the full input domain), and tractability (verification runs in polynomial time). Each pair of properties is achievable, but all three cannot hold simultaneously. Relaxing any one property restores the corresponding possibility, indicating that practical bounded or probabilistic assurance remains viable. The result follows from three independent barriers: the computational complexity of full-domain neural network verification, the non-identifiability of internal goal structure from behavioral observation, and the limits of finite evidence for properties defined over infinite domains. The trilemma establishes the limits of alignment certification and characterizes the regimes in which meaningful guarantees remain possible.
机器翻译由腾讯交互翻译提供,仅供参考
点击“阅读原文”获取带摘要的学术速递