
Machine Learning Academic Digest [3.9]

arXiv Daily Academic Digest

Visit arxivdaily.com for daily digests covering CS | Physics | Math | Economics | Statistics | Finance | Biology | Electrical Engineering, with search, bookmarking, and more.


cs.LG: 127 papers in total today


Large Models (14 papers)

【1】BEVLM: Distilling Semantic Knowledge from LLMs into Bird's-Eye View Representations
Link: https://arxiv.org/abs/2603.06576

Authors: Thomas Monninger, Shaoyuan Xie, Qi Alfred Chen, Sihao Ding
Note: 4 figures, 6 tables in the main paper, 32 pages in total
Abstract: The integration of Large Language Models (LLMs) into autonomous driving has attracted growing interest for their strong reasoning and semantic understanding abilities, which are essential for handling complex decision-making and long-tail scenarios. However, existing methods typically feed LLMs with tokens from multi-view and multi-frame images independently, leading to redundant computation and limited spatial consistency. This separation in visual processing hinders accurate 3D spatial reasoning and fails to maintain geometric coherence across views. On the other hand, Bird's-Eye View (BEV) representations learned from geometrically annotated tasks (e.g., object detection) provide spatial structure but lack the semantic richness of foundation vision encoders. To bridge this gap, we propose BEVLM, a framework that connects a spatially consistent and semantically distilled BEV representation with LLMs. Through extensive experiments, we show that BEVLM enables LLMs to reason more effectively in cross-view driving scenes, improving accuracy by 46%, by leveraging BEV features as unified inputs. Furthermore, by distilling semantic knowledge from LLMs into BEV representations, BEVLM significantly improves closed-loop end-to-end driving performance by 29% in safety-critical scenarios.


【2】COLD-Steer: Steering Large Language Models via In-Context One-step Learning Dynamics
Link: https://arxiv.org/abs/2603.06495

Authors: Kartik Sharma, Rakshit S. Trivedi
Note: ICLR 2026. Code available at https://github.com/Ksartik/cold-steer
Abstract: Activation steering methods enable inference-time control of large language model (LLM) behavior without retraining, but current approaches face a fundamental trade-off: sample-efficient methods suboptimally capture steering signals from labeled examples, while methods that better extract these signals require hundreds to thousands of examples. We introduce COLD-Steer, a training-free framework that steers LLM activations by approximating the representational changes that would result from gradient descent on in-context examples. Our key insight is that the effect of fine-tuning on a small set of examples can be efficiently approximated at inference time without actual parameter updates. We formalize this through two complementary approaches: (i) a unit kernel approximation method that updates the activations directly using gradients with respect to them, normalized across examples, and (ii) a finite-difference approximation requiring only two forward passes regardless of example count. Experiments across a variety of steering tasks and benchmarks demonstrate that COLD-Steer achieves up to 95% steering effectiveness while using 50 times fewer samples compared to the best baseline. COLD-Steer facilitates accommodating diverse perspectives without extensive demonstration data, which we validate through our experiments on pluralistic alignment tasks. Our framework opens new possibilities for adaptive, context-aware model control that can flexibly address varying loss-driven human preferences through principled approximation of learning dynamics rather than specialized training procedures.
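
A minimal PyTorch sketch of the general activation-steering mechanism this line of work operates in: nudging a chosen layer's residual stream along a direction derived from example prompts. COLD-Steer's actual one-step learning-dynamics update and its two-forward-pass finite-difference variant are not reproduced here; the model, layer index, and prompt sets are hypothetical placeholders.
```python
# Generic activation steering via forward hooks (illustrative, not the paper's method).
import torch

@torch.no_grad()
def mean_hidden(model, tokenizer, texts, layer_idx):
    """Average last-token hidden state at one layer over a set of example prompts."""
    vecs = []
    for t in texts:
        ids = tokenizer(t, return_tensors="pt").input_ids
        out = model(ids, output_hidden_states=True)
        vecs.append(out.hidden_states[layer_idx][0, -1])
    return torch.stack(vecs).mean(dim=0)

def make_steering_hook(direction, alpha=4.0):
    """Forward hook that adds a scaled steering direction to the layer output."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * direction
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return hook

# Usage sketch (assumes a HuggingFace-style causal LM; names are placeholders):
# v = mean_hidden(model, tok, positive_examples, layer_idx=14) \
#   - mean_hidden(model, tok, negative_examples, layer_idx=14)
# handle = model.model.layers[14].register_forward_hook(make_steering_hook(v))
# ... generate ...; handle.remove()
```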


【3】MoEless: Efficient MoE LLM Serving via Serverless Computing
Link: https://arxiv.org/abs/2603.06350

Authors: Hanfei Yu, Bei Ouyang, Shwai He, Ang Li, Hao Wang
Abstract: Large Language Models (LLMs) have become a cornerstone of AI, driving progress across diverse domains such as content creation, search and recommendation systems, and AI-assisted workflows. To alleviate extreme training costs and advance model scales, Mixture-of-Experts (MoE) has become a popular backbone for modern LLMs, which are commonly served in distributed deployment using expert parallelism (EP). However, MoE's sparse activation mechanism leads to severe expert load imbalance, where a few experts become overloaded while others remain idle, resulting in expert stragglers that inflate inference latency and serving cost. Existing expert load balancing solutions assume static resource configurations on serverful infrastructures, limiting expert scalability and elasticity, and resulting in either costly real-time expert swapping or degraded generation quality. We present MoEless, the first serverless MoE serving framework that mitigates expert load imbalance and accelerates inference via serverless experts. MoEless employs lightweight, layer-aware predictors to accurately estimate incoming expert load distributions and proactively identify stragglers. We design optimized expert scaling and placement strategies to maximize function locality, improve GPU utilization, and balance loads across experts and GPUs. MoEless is prototyped on top of Megatron-LM and deployed on an eight-GPU testbed. Experiments with open-source MoE models and real-world workloads show that MoEless reduces inference latency by 43% and inference cost by 84% compared to state-of-the-art solutions.


【4】From Entropy to Calibrated Uncertainty: Training Language Models to Reason About Uncertainty
Link: https://arxiv.org/abs/2603.06317

Authors: Azza Jenane, Nassim Walha, Lukas Kuhn, Florian Buettner
Note: 4 pages, submitted to AISTATS Workshop
Abstract: Large Language Models (LLMs) that can express interpretable and calibrated uncertainty are crucial in high-stakes domains. While methods to compute uncertainty post-hoc exist, they are often sampling-based and therefore computationally expensive or lack calibration. We propose a three-stage pipeline to post-train LLMs to efficiently infer calibrated uncertainty estimates for their responses. First, we compute fine-grained entropy-based uncertainty scores on the training data, capturing the distributional variability of model outputs in embedding space. Second, these scores are calibrated via Platt scaling, producing reliable and human-interpretable uncertainty signals. Finally, the target LLM is post-trained via reinforcement learning to align its policy with these calibrated signals through a verifiable reward function. Unlike post-hoc uncertainty estimation methods, our approach provides interpretable and computationally efficient uncertainty estimates at test time. Experiments show that models trained with our pipeline achieve better calibration than baselines and generalize to unseen tasks without further processing, suggesting that they learn a robust uncertainty reasoning behavior.
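
A minimal sketch of the Platt-scaling step (stage two of the pipeline): fit a one-dimensional logistic regression that maps raw entropy-based uncertainty scores to calibrated probabilities. The scores and correctness labels below are synthetic placeholders for the paper's training data.
```python
# Platt scaling: calibrate scalar uncertainty scores with 1-D logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
entropy_scores = rng.normal(1.0, 0.5, size=1000)   # raw entropy-based scores (synthetic)
# Synthetic "response was wrong" labels, correlated with the score:
p_wrong = 1.0 / (1.0 + np.exp(-(entropy_scores - 1.0)))
is_wrong = (rng.random(1000) < p_wrong).astype(int)

platt = LogisticRegression()
platt.fit(entropy_scores.reshape(-1, 1), is_wrong)

def calibrated_uncertainty(score):
    """P(response is wrong | entropy score) after Platt scaling."""
    return platt.predict_proba(np.array([[score]]))[0, 1]

print(calibrated_uncertainty(0.2), calibrated_uncertainty(2.0))
```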


【5】Partial Policy Gradients for RL in LLMs
Link: https://arxiv.org/abs/2603.06138

Authors: Puneet Mathur, Branislav Kveton, Subhojyoti Mukherjee, Viet Dac Lai
Abstract: Reinforcement learning is a framework for learning to act sequentially in an unknown environment. We propose a natural approach for modeling policy structure in policy gradients. The key idea is to optimize for a subset of future rewards: smaller subsets represent simpler policies, which can be learned more reliably because their empirical gradient estimates are more accurate. Our approach allows for modeling and comparison of different policy classes, including full planning, greedy, K-step lookahead, and segment policies. We evaluate the policies empirically on multiple persona-alignment conversational problems. Different policies excel in different problems, reflecting their different characteristics and highlighting the importance of our studied policy class.
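
A toy sketch of the stated idea of weighting the policy gradient by a subset of future rewards: a K-step-lookahead return, where K = 1 corresponds to a greedy policy and K = T to full planning. This is an illustrative REINFORCE-style computation, not the authors' code.
```python
# Partial (K-step-lookahead) returns for a policy-gradient weight.
import numpy as np

def partial_returns(rewards, K, gamma=1.0):
    """G_t^K = sum_{i=0}^{K-1} gamma^i * r_{t+i}: a K-step-lookahead return."""
    T = len(rewards)
    return np.array([
        sum(gamma**i * rewards[t + i] for i in range(min(K, T - t)))
        for t in range(T)
    ])

rewards = [0.0, 0.0, 1.0, 0.0, 2.0]
print(partial_returns(rewards, K=1))              # greedy: immediate reward only
print(partial_returns(rewards, K=len(rewards)))   # full planning
# The policy gradient then weights grad log pi(a_t | s_t) by G_t^K.
```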


【6】Diffusion Language Models Are Natively Length-Aware
Link: https://arxiv.org/abs/2603.06123

Authors: Vittorio Rossi, Giacomo Cirò, Davide Beltrame, Luca Gandolfi, Paul Röttger, Dirk Hovy
Abstract: Unlike autoregressive language models, which terminate variable-length generation upon predicting an End-of-Sequence (EoS) token, Diffusion Language Models (DLMs) operate over a fixed maximum-length context window for a predetermined number of denoising steps. However, this process is independent of the required response length, resulting in computational waste for the majority of short responses common in reasoning and chat tasks. To address this problem, we conjecture that the latent prompt representation contains sufficient information to estimate the required output length. We provide empirical evidence for this phenomenon and propose a zero-shot mechanism to dynamically crop the context window before generation begins, leading to fewer diffusion steps and substantial computational savings. We evaluate our approach on four benchmarks with diverse tasks -- GSM8K (reasoning), HumanEval (code generation), IfEval (instruction following), and LongFormQA (question answering) -- revealing massive efficiency gains at minimal performance impact. We report significant reductions in FLOPs across all tasks, with no statistically significant performance degradation, and significant performance improvements in 2 out of 4 tasks.


【7】Who We Are, Where We Are: Mental Health at the Intersection of Person, Situation, and Large Language Models
Link: https://arxiv.org/abs/2603.05953

Authors: Nikita Soni, August Håkan Nilsson, Syeda Mahwish, Vasudha Varadarajan, H. Andrew Schwartz, Ryan L. Boyd
Abstract: Mental health is not a fixed trait but a dynamic process shaped by the interplay between individual dispositions and situational contexts. Building on interactionist and constructionist psychological theories, we develop interpretable models to predict well-being and identify adaptive and maladaptive self-states in longitudinal social media data. Our approach integrates person-level psychological traits (e.g., resilience, cognitive distortions, implicit motives) with language-inferred situational features derived from the Situational 8 DIAMONDS framework. We compare these theory-grounded features to embeddings from a psychometrically-informed language model that captures temporal and individual-specific patterns. Results show that our principled, theory-driven features provide competitive performance while offering greater interpretability. Qualitative analyses further highlight the psychological coherence of features most predictive of well-being. These findings underscore the value of integrating computational modeling with psychological theory to assess dynamic mental states in contextually sensitive and human-understandable ways.


【8】Reference-guided Policy Optimization for Molecular Optimization via LLM Reasoning
Link: https://arxiv.org/abs/2603.05900

Authors: Xuan Li, Zhanke Zhou, Zongze Li, Jiangchao Yao, Yu Rong, Lu Zhang, Bo Han
Abstract: Large language models (LLMs) benefit substantially from supervised fine-tuning (SFT) and reinforcement learning with verifiable rewards (RLVR) in reasoning tasks. However, these recipes perform poorly in instruction-based molecular optimization, where each data point typically provides only a single optimized reference molecule and no step-by-step optimization trajectory. We reveal that answer-only SFT on the reference molecules collapses reasoning, and RLVR provides sparse feedback under similarity constraints due to the model's lack of effective exploration, which slows learning and limits optimization. To encourage the exploration of new molecules while balancing the exploitation of the reference molecules, we introduce Reference-guided Policy Optimization (RePO), an optimization approach that learns from reference molecules without requiring trajectory data. At each update, RePO samples candidate molecules with their intermediate reasoning trajectories from the model and trains the model using verifiable rewards that measure property satisfaction under similarity constraints in an RL manner. Meanwhile, it applies reference guidance by keeping the policy's intermediate reasoning trajectory as context and training only the answer in a supervised manner. Together, the RL term promotes exploration, while the guidance term mitigates reward sparsity and stabilizes training by grounding outputs to references when many valid molecular edits exist. Across molecular optimization benchmarks, RePO consistently outperforms SFT and RLVR baselines (e.g., GRPO), achieving improvements on the optimization metric (Success Rate $\times$ Similarity), improving balance across competing objectives, and generalizing better to unseen instruction styles. Our code is publicly available at https://github.com/tmlr-group/RePO.
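
A hedged sketch of a verifiable reward of the kind the abstract describes: a candidate molecule earns reward only if it improves the target property while staying within a similarity budget of the reference, with the reward scaled by similarity (mirroring the Success Rate $\times$ Similarity metric). `property_score` and `similarity` are hypothetical stand-ins (in practice, e.g., an RDKit property calculator and Tanimoto fingerprint similarity); RePO's exact reward shaping may differ.
```python
# Verifiable reward under a similarity constraint (illustrative shaping only).
def verifiable_reward(candidate, reference, property_score, similarity,
                      target_delta=0.2, sim_threshold=0.4):
    sim = similarity(candidate, reference)
    if sim < sim_threshold:                        # similarity constraint violated
        return 0.0
    improvement = property_score(candidate) - property_score(reference)
    return float(improvement >= target_delta) * sim  # "success * similarity"

# Toy usage with string "molecules" and dummy scorers (purely illustrative):
prop = lambda m: 0.1 * len(m)
sim = lambda a, b: len(set(a) & set(b)) / len(set(a) | set(b))
print(verifiable_reward("CCOCCN", "CCO", prop, sim))  # ~0.67: success within budget
```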


【9】ROSE: Reordered SparseGPT for More Accurate One-Shot Large Language Models Pruning
Link: https://arxiv.org/abs/2603.05878

Authors: Mingluo Su, Huan Wang
Note: CPAL 2026 oral
Abstract: Pruning is widely recognized as an effective method for reducing the parameters of large language models (LLMs), potentially leading to more efficient deployment and inference. One classic and prominent path of LLM one-shot pruning is to leverage second-order gradients (i.e., Hessian), represented by the pioneering work SparseGPT. However, the predefined left-to-right pruning order in SparseGPT leads to suboptimal performance when the weights exhibit columnar patterns. This paper studies the effect of pruning order under the SparseGPT framework. The analyses lead us to propose ROSE, a reordered SparseGPT method that prioritizes weights with larger potential pruning errors to be pruned earlier. ROSE first performs pre-pruning to identify candidate weights for removal, and estimates both column and block pruning loss. Subsequently, two-level reordering is performed: columns within each block are reordered in descending order of column loss, while blocks are reordered based on block loss. We introduce the relative range of block loss as a metric to identify columnar layers, enabling adaptive reordering across the entire model. Substantial empirical results on prevalent LLMs (LLaMA2-7B/13B/70B, LLaMA3-8B, Mistral-7B) demonstrate that ROSE surpasses the original SparseGPT and other counterpart pruning methods. Our code is available at https://github.com/mingluo-su/ROSE.
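
A numpy sketch of the two-level reordering described in the abstract: columns within each block are sorted in descending order of estimated column pruning loss, and blocks are then ordered by total block loss, so higher-error weights are pruned earlier. The Hessian-based loss estimation from SparseGPT is stubbed with random values; tie-breaking and exact ordering conventions are assumptions.
```python
# Two-level column reordering by estimated pruning loss (illustrative).
import numpy as np

def rose_column_order(col_loss, block_size):
    n = len(col_loss)
    blocks = [np.arange(b, min(b + block_size, n)) for b in range(0, n, block_size)]
    # Level 1: within each block, columns in descending column loss.
    blocks = [blk[np.argsort(-col_loss[blk])] for blk in blocks]
    # Level 2: blocks in descending total block loss.
    block_loss = np.array([col_loss[blk].sum() for blk in blocks])
    order = np.concatenate([blocks[i] for i in np.argsort(-block_loss)])
    return order  # permutation applied to weight columns before sequential pruning

col_loss = np.random.default_rng(0).random(16)    # stub for Hessian-based estimates
print(rose_column_order(col_loss, block_size=4))
```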


【10】ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning
Link: https://arxiv.org/abs/2603.05863

Authors: Juyong Jiang, Jiasi Shen, Sunghun Kim, Kang Min Yoo, Jeonghoon Kim, Sungju Kim
Abstract: While Large Language Models (LLMs) have revolutionized code generation, standard "System 1" approaches, generating solutions in a single forward pass, often hit a performance ceiling when faced with complex algorithmic tasks. Existing iterative refinement strategies attempt to bridge this gap at inference time, yet they predominantly rely on external oracles, execution feedback, or computationally expensive prompt-response cycles. In this work, we propose ReflexiCoder, a novel reinforcement learning (RL) framework that internalizes the structured reasoning trajectory, encompassing initial generation, bug and optimization aware reflection, and self-correction, directly into the model's weights. Unlike prior methods, ReflexiCoder shifts the paradigm from external-dependent refinement to intrinsic, fully autonomous self-reflection and self-correction capabilities at inference time. We utilize an RL-zero training paradigm with granular reward functions to optimize the entire reflection-correction trajectory, teaching the model how to debug without reliance on ground-truth feedback or execution engines at inference time. Extensive experiments across seven benchmarks demonstrate that our ReflexiCoder-8B establishes a new state-of-the-art (SOTA) among leading open-source models in the 1.5B-14B range, achieving 94.51% (87.20%) on HumanEval (Plus), 81.80% (78.57%) on MBPP (Plus), 35.00% on BigCodeBench, 52.21% on LiveCodeBench, and 37.34% on CodeForces in a single-attempt setting, rivaling or surpassing proprietary models like GPT-5.1. Notably, our framework is significantly more token-efficient than base models, reducing inference-time compute overhead by approximately 40% through disciplined, high-speed reasoning and reflection patterns. Source code is available at https://github.com/juyongjiang/ReflexiCoder.


【11】Knowing without Acting: The Disentangled Geometry of Safety Mechanisms in Large Language Models
Link: https://arxiv.org/abs/2603.05773

Authors: Jinman Wu, Yi Xie, Shen Lin, Shiqian Zhao, Xiaofeng Chen
Abstract: Safety alignment is often conceptualized as a monolithic process wherein harmfulness detection automatically triggers refusal. However, the persistence of jailbreak attacks suggests a fundamental mechanistic decoupling. We propose the \textbf{\underline{D}}isentangled \textbf{\underline{S}}afety \textbf{\underline{H}}ypothesis \textbf{(DSH)}, positing that safety computation operates on two distinct subspaces: a \textit{Recognition Axis} ($\mathbf{v}_H$, ``Knowing'') and an \textit{Execution Axis} ($\mathbf{v}_R$, ``Acting''). Our geometric analysis reveals a universal ``Reflex-to-Dissociation'' evolution, where these signals transition from antagonistic entanglement in early layers to structural independence in deep layers. To validate this, we introduce \textit{Double-Difference Extraction} and \textit{Adaptive Causal Steering}. Using our curated \textsc{AmbiguityBench}, we demonstrate a causal double dissociation, effectively creating a state of ``Knowing without Acting.'' Crucially, we leverage this disentanglement to propose the \textbf{Refusal Erasure Attack (REA)}, which achieves State-of-the-Art attack success rates by surgically lobotomizing the refusal mechanism. Furthermore, we uncover a critical architectural divergence, contrasting the \textit{Explicit Semantic Control} of Llama3.1 with the \textit{Latent Distributed Control} of Qwen2.5. The code and dataset are available at https://anonymous.4open.science/r/DSH.
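
A small numpy sketch of the geometric primitives this line of work builds on: extracting a direction as a difference of class means over hidden states, and ablating it by orthogonal projection (the generic mechanism behind a refusal-erasure-style intervention). The paper's Double-Difference Extraction and Adaptive Causal Steering are more elaborate; this is the simpler mean-difference baseline, shown for illustration only, with synthetic activations.
```python
# Mean-difference direction extraction and projection ablation (illustrative).
import numpy as np

def mean_difference_direction(acts_pos, acts_neg):
    """acts_*: [n, d] hidden states for two classes; returns a unit vector."""
    v = acts_pos.mean(axis=0) - acts_neg.mean(axis=0)
    return v / np.linalg.norm(v)

def ablate_direction(hidden, v):
    """Remove the component of each hidden state along v (orthogonal projection)."""
    return hidden - np.outer(hidden @ v, v)

rng = np.random.default_rng(0)
acts_refuse = rng.normal(0.5, 1.0, (64, 128))    # synthetic "refusal" activations
acts_comply = rng.normal(-0.5, 1.0, (64, 128))   # synthetic "compliance" activations
v_R = mean_difference_direction(acts_refuse, acts_comply)
h = rng.normal(size=(4, 128))
print(np.abs(ablate_direction(h, v_R) @ v_R).max())  # ~0: direction erased
```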


【12】Parallelization Strategies for Dense LLM Deployment: Navigating Through Application-Specific Tradeoffs and Bottlenecks
Link: https://arxiv.org/abs/2603.05692

Authors: Burak Topcu, Musa Oguzhan Cim, Poovaiah Palangappa, Meena Arunachalam, Mahmut Taylan Kandemir
Note: 17 pages, 8 figures, 3 tables
Abstract: Breakthroughs in the generative AI domain have fueled an explosion of large language model (LLM)-powered applications, whose workloads fundamentally consist of sequences of inferences through transformer architectures. Within this rapidly expanding ecosystem, dense LLMs--those that activate all model parameters for each token generation--form the foundation for advanced expert-based variants. Dense models continue to dominate because of their strong generalization ability, scalability, ease of fine-tuning, and versatility across diverse tasks. In LLM inference systems, performance is mainly characterized by latency, response time, and throughput (i.e., tokens generated per unit of time). Latency and throughput are inherently coupled: optimizing for one often comes at the expense of the other. Moreover, batching strategies and parallelism configurations, which are essential when dense model parameters exceed device memory capacity, can significantly affect both latency and overall system throughput. This paper (i) investigates the workloads of two representative dense LLMs--Llama-3.1-70B and Llama-3.1-405B, focusing in particular on intra-node parallelization schemes, (ii) analyzes how input characteristics, batching, and parallelism strategies influence latency flexibility and the latency-throughput tradeoff, and (iii) identifies key performance bottlenecks that inform design choices for meeting service-level agreements (SLAs) and sustaining inference quality. Our empirical evaluations reveal that Tensor Parallelism (TP) improves the latency objectives while Pipeline Parallelism (PP) is better-suited for throughput-oriented applications. We highlight that their hybrid usage by controlling the TP and PP degrees provides control over the latency-throughput interplay.


【13】Attention Meets Reachability: Structural Equivalence and Efficiency in Grammar-Constrained LLM Decoding
Link: https://arxiv.org/abs/2603.05540

Authors: Faruk Alpay, Bilge Senturk
Note: 20 pages
Abstract: We study grammar-constrained decoding (GCD) as a coupling between an autoregressive next-token distribution and a reachability oracle over a pushdown system compiled from a context-free grammar (CFG). We prove an oracle invariance theorem: language-equivalent grammars induce identical admissible next-token sets for every prefix, hence identical logit masks, yet can yield provably different compiled state spaces and online ambiguity costs. We give exact control-state blowup counts for the canonical $a^n b^n$ language under redundant nonterminal delegation, and introduce a left-to-right structural ambiguity cost (SAC) measuring incremental packed-parse-forest growth per token. For two equivalent grammars over all finite strings, SAC is $O(1)$ per token under right-recursion but $Θ(t^2)$ per token and $Θ(n^3)$ cumulatively under concatenation. We establish engine-independent lower bounds: any sound, retrieval-efficient, parse-preserving online masking engine must incur $Ω(t^2)$ work per token on a specific constant-size CFG family, unconditionally within this model. We define decoding-cost equivalence classes of grammars and prove existence of minimal-SAC representatives within bounded rewrite families. Finally, we characterize the true conditional sampler via a Doob $h$-transform and derive sharp one-step KL and total-variation distortion bounds for hard-masked decoding in terms of survival-probability spread among admissible next tokens. We integrate these results with Transformer and Mixture-of-Experts architectures, derive latency envelopes in terms of vocabulary size, active state sets, and beam width, and connect SAC to instrumentation-based predictive performance models and automated grammar optimization.
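
A minimal sketch of the GCD masking step the paper analyzes: at each decoding position an oracle returns the admissible next tokens for the current prefix, and all other logits are set to -inf before sampling. The pushdown/CFG machinery is stubbed with a hand-written oracle for the paper's canonical $a^n b^n$ example; a real engine would maintain incremental parser state.
```python
# Grammar-constrained logit masking with a toy a^n b^n oracle (illustrative).
import torch

def mask_logits(logits, admissible_ids):
    masked = torch.full_like(logits, float("-inf"))
    masked[list(admissible_ids)] = logits[list(admissible_ids)]
    return masked

def anbn_admissible(prefix, a=0, b=1, eos=2):
    """Toy oracle for {a^n b^n}: which tokens may legally come next?"""
    na, nb = prefix.count(a), prefix.count(b)
    allowed = set()
    if nb == 0:
        allowed.add(a)       # may still open more a's before the first b
    if na > nb:
        allowed.add(b)       # unmatched a's must eventually be closed
    if na == nb and na > 0:
        allowed.add(eos)     # a balanced string may terminate
    return allowed

logits = torch.randn(3)
print(mask_logits(logits, anbn_admissible([0, 0, 1])))  # only 'b' admissible
```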


【14】Information-Theoretic Privacy Control for Sequential Multi-Agent LLM Systems
Link: https://arxiv.org/abs/2603.05520

Authors: Sadia Asif, Mohammad Mohammadi Amiri
Abstract: Sequential multi-agent large language model (LLM) systems are increasingly deployed in sensitive domains such as healthcare, finance, and enterprise decision-making, where multiple specialized agents collaboratively process a single user request. Although individual agents may satisfy local privacy constraints, sensitive information can still be inferred through sequential composition and intermediate representations. In this work, we study \emph{compositional privacy leakage} in sequential LLM agent pipelines. We formalize leakage using mutual information and derive a theoretical bound that characterizes how locally introduced leakage can amplify across agents under sequential execution. Motivated by this analysis, we propose a privacy-regularized training framework that directly constrains information flow between agent outputs and agent-local sensitive variables. We evaluate our approach across sequential agent pipelines of varying depth on three benchmark datasets, demonstrating stable optimization dynamics and consistent, interpretable privacy-utility trade-offs. Our results show that privacy in agentic LLM systems cannot be guaranteed by local constraints alone and must instead be treated as a system-level property during both training and deployment.


Graph (Graph Learning | Graph Neural Networks | Graph Optimization, etc.) (4 papers)

【1】Ensemble Graph Neural Networks for Probabilistic Sea Surface Temperature Forecasting via Input Perturbations
Link: https://arxiv.org/abs/2603.06153

Authors: Alejandro J. González-Santana, Giovanny A. Cuervo-Londoño, Javier Sánchez
Note: 20 pages, 14 figures, 6 tables
Abstract: Accurate regional ocean forecasting requires models that are both computationally efficient and capable of representing predictive uncertainty. This work investigates ensemble learning strategies for sea surface temperature (SST) forecasting using Graph Neural Networks (GNNs), with a focus on how input perturbation design affects forecast skill and uncertainty representation. We adapt a GNN architecture to the Canary Islands region in the North Atlantic and implement a homogeneous ensemble approach inspired by bagging, where diversity is introduced during inference by perturbing initial ocean states rather than retraining multiple models. Several noise-based ensemble generation strategies are evaluated, including Gaussian noise, Perlin noise, and fractal Perlin noise, with systematic variation of noise intensity and spatial structure. Ensemble forecasts are assessed over a 15-day horizon using deterministic metrics (RMSE and bias) and probabilistic metrics, including the Continuous Ranked Probability Score (CRPS) and the Spread-skill ratio. Results show that, while deterministic skill remains comparable to the single-model forecast, the type and structure of input perturbations strongly influence uncertainty representation, particularly at longer lead times. Ensembles generated with spatially coherent perturbations, such as low-resolution Perlin noise, achieve better calibration and lower CRPS than purely random Gaussian perturbations. These findings highlight the critical role of noise structure and scale in ensemble GNN design and demonstrate that carefully constructed input perturbations can yield well-calibrated probabilistic forecasts without additional training cost, supporting the feasibility of ensemble GNNs for operational regional ocean prediction.
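
A sketch contrasting the two perturbation families the paper compares: i.i.d. Gaussian noise versus spatially coherent noise. True Perlin noise is approximated here by bilinearly upsampling a low-resolution Gaussian field (an assumption made for brevity); amplitudes, grid sizes, and the ensemble size are illustrative, and the SST field is a placeholder.
```python
# Perturbed-initial-state ensembles: i.i.d. vs. spatially coherent noise.
import numpy as np
from scipy.ndimage import zoom

def gaussian_perturbation(shape, sigma=0.1, rng=None):
    rng = rng or np.random.default_rng()
    return rng.normal(0.0, sigma, shape)

def coherent_perturbation(shape, sigma=0.1, coarse=8, rng=None):
    """Low-resolution noise upsampled to full size -> spatially smooth field."""
    rng = rng or np.random.default_rng()
    low = rng.normal(0.0, sigma, (coarse, coarse))
    return zoom(low, (shape[0] / coarse, shape[1] / coarse), order=1)

def make_ensemble(sst0, n_members=10, perturb=coherent_perturbation):
    """Perturb the initial SST field once per member; no retraining needed."""
    return [sst0 + perturb(sst0.shape) for _ in range(n_members)]

sst0 = np.zeros((64, 64))           # placeholder initial SST state
members = make_ensemble(sst0)
print(len(members), round(members[0].std(), 3))
```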


【2】Predictive Coding Graphs are a Superset of Feedforward Neural Networks
Link: https://arxiv.org/abs/2603.06142

Authors: Björn van Zwol
Note: 11 pages, 3 figures. Accepted at the NeuroAI Workshop @ NeurIPS 2024. OpenReview: https://openreview.net/forum?id=J36z3R0sNq
Abstract: Predictive coding graphs (PCGs) are a recently introduced generalization of predictive coding networks, a neuroscience-inspired probabilistic latent variable model. Here, we prove how PCGs define a mathematical superset of feedforward artificial neural networks (multilayer perceptrons). This positions PCNs more strongly within contemporary machine learning (ML), and reinforces earlier proposals to study the use of non-hierarchical neural networks for ML tasks, and more generally the notion of topology in neural networks.


【3】The Value of Graph-based Encoding in NBA Salary Prediction
Link: https://arxiv.org/abs/2603.05671

Authors: Junhao Su, David Grimsman, Christopher Archibald
Note: 6 pages, IEEE template conference style. Submitted to IETC 2026; decision expected March 22.
Abstract: Market valuation for professional athletes is a difficult problem, given the amount of variability in performance and location from year to year. In the National Basketball Association (NBA), a straightforward way to address this problem is to build a tabular data set and use supervised machine learning to predict a player's salary based on the player's performance in the previous year. For younger players, whose contracts are mostly built on draft position, this approach works well; however, it can fail for veterans or those whose salaries are on the high tail of the distribution. In this paper, we show that building a knowledge graph with on- and off-court data, embedding that graph in a vector space, and including that vector in the tabular data allows the supervised learning to better understand the landscape of factors that affect salary. We compare several graph embedding algorithms and show that such a process is vital to NBA salary prediction.
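
A sketch of the pipeline the abstract describes: embed the player knowledge graph in a vector space and append the node vectors to the tabular features before supervised salary regression. A simple adjacency spectral embedding stands in for the graph-embedding algorithms the paper compares (e.g., node2vec); the graph and per-player stats are synthetic.
```python
# Graph embedding concatenated onto tabular features (illustrative).
import numpy as np

def spectral_embedding(adj, dim=8):
    """Top-dim eigenvectors of the adjacency, scaled by sqrt(|eigenvalue|)."""
    vals, vecs = np.linalg.eigh(adj)
    idx = np.argsort(-np.abs(vals))[:dim]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

rng = np.random.default_rng(0)
n_players = 50
adj = (rng.random((n_players, n_players)) < 0.1).astype(float)
adj = np.triu(adj, 1); adj = adj + adj.T          # symmetric, no self-loops

tabular = rng.normal(size=(n_players, 12))        # per-year stats (stub)
X = np.hstack([tabular, spectral_embedding(adj)]) # tabular + graph features
print(X.shape)  # feature matrix fed to any supervised regressor
```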


【4】Random Dot Product Graphs as Dynamical Systems: Limitations and Opportunities
Link: https://arxiv.org/abs/2603.05703

Authors: Giulio Valentino Dalla Riva
Note: 39 pages, 3 figures
Abstract: Can we learn the differential equations governing the evolution of a temporal network? We investigate this within Random Dot Product Graphs (RDPGs), where each network snapshot is generated from latent positions evolving under unknown dynamics. We identify three fundamental obstructions: gauge freedom from rotational ambiguity in latent positions, realizability constraints from the manifold structure of the probability matrix, and trajectory recovery artifacts from spectral embedding. We develop a geometric framework based on principal fiber bundles that formalizes these obstructions. We characterize invisible dynamics as exactly the skew-symmetric generators, and show the realizable tangent space has dimension $nd - d(d-1)/2$. A holonomy dichotomy emerges: polynomial dynamics have commuting generators, stationary eigenvectors, and trivial holonomy, making gauge alignment purely statistical; Laplacian dynamics satisfy a non-commutativity criterion producing nontrivial holonomy, with curvature weighted by $1/(λ_i + λ_γ)$ linking gauge sensitivity to the spectral gap. In $d=2$ this yields full restricted holonomy $\mathrm{SO}(2)$; for $d \ge 3$ generic full $\mathrm{SO}(d)$ remains conjectural. Cramér-Rao lower bounds reveal that the same spectral gap controlling curvature and injectivity simultaneously controls Fisher information, so geometric and statistical difficulty are inextricable. We prove an identifiability principle: symmetric dynamics cannot absorb skew-symmetric gauge contamination, so dynamics structure can resolve gauge ambiguity. We demonstrate this constructively with anchor-based alignment and a UDE pipeline recovering vector fields from noisy graph sequences. Yet finite-sample interactions between noise, gauge, and dynamics expressiveness remain beyond the asymptotic theory. We frame this gap as an open challenge.


Transformers (5 papers)

【1】NOBLE: Accelerating Transformers with Nonlinear Low-Rank Branches
Link: https://arxiv.org/abs/2603.06492

Authors: Ethan Smith
Note: 14 pages, 5 figures, 5 tables
Abstract: We introduce NOBLE (Nonlinear lOw-rank Branch for Linear Enhancement), an architectural augmentation that adds nonlinear low-rank branches to transformer linear layers. Unlike LoRA and other parameter-efficient fine-tuning (PEFT) methods, NOBLE is designed for pretraining from scratch. The branch is a permanent part of the architecture as opposed to an adapter for finetuning on top of frozen weights. The branch computes σ(xWdown)Wup where σ is a learnable nonlinearity. We evaluate several activation functions and find that CosNet, a two-layer cosine nonlinearity with learnable frequency and phase with a linear projection in between them in the bottleneck space, performs best. NOBLE achieves substantial improvements with minimal overhead: up to 1.47x step speedup to reach baseline eval loss (up to 32% fewer training steps), with as low as 4% additional parameters and 7% step time overhead, resulting in up to 1.22x net wallclock speedup. Experiments on LLMs (250M and 1.5B parameters), BERT, VQGAN, and ViT consistently show improved training efficiency. We identify one caveat: Mixup/CutMix augmentation interferes with NOBLE's benefits in Imagenet classification along with other stochastic augmentations, but when disabled, ViT also improves. This discrepancy is possibly explained by regularization techniques that encourage smoother fits to the target function while NOBLE may specialize more in sharper aspects of the target function.
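
A PyTorch sketch of a NOBLE-augmented linear layer, following the abstract's description: y = xW + σ(xWdown)Wup, with σ instantiated as CosNet, a two-layer cosine nonlinearity with learnable frequency and phase and a linear projection in between, applied in the bottleneck space. Initialization (zeroing the up-projection so the branch starts as a no-op) and the exact parameterization are assumptions, not the author's released code.
```python
# NOBLE-style nonlinear low-rank branch added to a linear layer (illustrative).
import torch
import torch.nn as nn

class CosNet(nn.Module):
    """Two-layer cosine nonlinearity with learnable frequency/phase."""
    def __init__(self, dim):
        super().__init__()
        self.freq1 = nn.Parameter(torch.ones(dim))
        self.phase1 = nn.Parameter(torch.zeros(dim))
        self.mid = nn.Linear(dim, dim, bias=False)   # linear projection in between
        self.freq2 = nn.Parameter(torch.ones(dim))
        self.phase2 = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        x = torch.cos(self.freq1 * x + self.phase1)
        x = self.mid(x)
        return torch.cos(self.freq2 * x + self.phase2)

class NobleLinear(nn.Module):
    def __init__(self, d_in, d_out, rank=32):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.down = nn.Linear(d_in, rank, bias=False)
        self.sigma = CosNet(rank)
        self.up = nn.Linear(rank, d_out, bias=False)
        nn.init.zeros_(self.up.weight)   # branch starts as a no-op (assumption)

    def forward(self, x):
        return self.base(x) + self.up(self.sigma(self.down(x)))

layer = NobleLinear(512, 512, rank=32)
print(layer(torch.randn(4, 512)).shape)
```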


【2】Dynamic Chunking Diffusion Transformer
Link: https://arxiv.org/abs/2603.06351

Authors: Akash Haridas, Utkarsh Saxena, Parsa Ashrafi Fashi, Mehdi Rezagholizadeh, Vikram Appia, Emad Barsoum
Abstract: Diffusion Transformers process images as fixed-length sequences of tokens produced by a static $\textit{patchify}$ operation. While effective, this design spends uniform compute on low- and high-information regions alike, ignoring that images contain regions of varying detail and that the denoising process progresses from coarse structure at early timesteps to fine detail at late timesteps. We introduce the Dynamic Chunking Diffusion Transformer (DC-DiT), which augments the DiT backbone with a learned encoder-router-decoder scaffold that adaptively compresses the 2D input into a shorter token sequence in a data-dependent manner using a chunking mechanism learned end-to-end with diffusion training. The mechanism learns to compress uniform background regions into fewer tokens and detail-rich regions into more tokens, with meaningful visual segmentations emerging without explicit supervision. Furthermore, it also learns to adapt its compression across diffusion timesteps, using fewer tokens at noisy stages and more tokens as fine details emerge. On class-conditional ImageNet $256{\times}256$, DC-DiT consistently improves FID and Inception Score over both parameter-matched and FLOP-matched DiT baselines across $4{\times}$ and $16{\times}$ compression, showing this is a promising technique with potential further applications to pixel-space, video and 3D generation. Beyond accuracy, DC-DiT is practical: it can be upcycled from pretrained DiT checkpoints with minimal post-training compute (up to $8{\times}$ fewer training steps) and composes with other dynamic computation methods to further reduce generation FLOPs.


【3】Stock Market Prediction Using Node Transformer Architecture Integrated with BERT Sentiment Analysis
Link: https://arxiv.org/abs/2603.05917

Authors: Mohammad Al Ridhawi, Mahtab Haj Ali, Hussein Al Osman
Note: 14 pages, 5 figures, 10 tables, submitted to IEEE Access
Abstract: Stock market prediction presents considerable challenges for investors, financial institutions, and policymakers operating in complex market environments characterized by noise, non-stationarity, and behavioral dynamics. Traditional forecasting methods often fail to capture the intricate patterns and cross-sectional dependencies inherent in financial markets. This paper presents an integrated framework combining a node transformer architecture with BERT-based sentiment analysis for stock price forecasting. The proposed model represents the stock market as a graph structure where individual stocks form nodes and edges capture relationships including sectoral affiliations, correlated price movements, and supply chain connections. A fine-tuned BERT model extracts sentiment from social media posts and combines it with quantitative market features through attention-based fusion. The node transformer processes historical market data while capturing both temporal evolution and cross-sectional dependencies among stocks. Experiments on 20 S&P 500 stocks spanning January 1982 to March 2025 demonstrate that the integrated model achieves a mean absolute percentage error (MAPE) of 0.80% for one-day-ahead predictions, compared to 1.20% for ARIMA and 1.00% for LSTM. Sentiment analysis reduces prediction error by 10% overall and 25% during earnings announcements, while graph-based modeling contributes an additional 15% improvement by capturing inter-stock dependencies. Directional accuracy reaches 65% for one-day forecasts. Statistical validation through paired t-tests confirms these improvements (p < 0.05 for all comparisons). The model maintains MAPE below 1.5% during high-volatility periods where baseline models exceed 2%.


【4】Predicting Atomistic Transitions with Transformers
Link: https://arxiv.org/abs/2603.06526

Authors: Henry Tischler, Wenting Li, Qi Tang, Danny Perez, Thomas Vogel
Note: Presented at the 2025 Conference on Data Analysis (CoDA), February 25-28, Santa Fe, New Mexico
Abstract: Accurate knowledge of the atomistic transition pathways in materials and material surfaces is crucial for many material science problems. However, conventional simulation techniques used to find these transitions are extremely computationally intensive. Even with large-scale, accelerated material simulations, the computational cost constrains the applicable domain in practice. Machine learning models, with the potential to learn the complex emergent behaviors governing atomistic transitions as a fast surrogate model, have great promise to predict transitions with a vastly reduced computational cost. Here, we demonstrate how transformers can be trained to predict atomistic transitions in nano-clusters. We show how we evaluate physical validity of the predictions and how a multitude of additional, different microstates can be generated by slightly varying the data provided to the model.


【5】Clinical-Injection Transformer with Domain-Adapted MAE for Lupus Nephritis Prognosis Prediction
Link: https://arxiv.org/abs/2603.05535

Authors: Yuewen Huang, Zhitao Ye, Guangnan Feng, Fudan Zheng, Xia Gao, Yutong Lu
Abstract: Lupus nephritis (LN) is a severe complication of systemic lupus erythematosus that affects pediatric patients with significantly greater severity and worse renal outcomes compared to adults. Despite the urgent clinical need, predicting pediatric LN prognosis remains unexplored in computational pathology. Furthermore, the only existing histopathology-based approach for LN relies on multiple costly staining protocols and fails to integrate complementary clinical data. To address these gaps, we propose the first multimodal computational pathology framework for three-class treatment response prediction (complete remission, partial response, and no response) in pediatric LN, utilizing only routine PAS-stained biopsies and structured clinical data. Our framework introduces two key methodological innovations. First, a Clinical-Injection Transformer (CIT) embeds clinical features as condition tokens into patch-level self-attention, facilitating implicit and bidirectional cross-modal interactions within a unified attention space. Second, we design a decoupled representation-knowledge adaptation strategy using a domain-adapted Masked Autoencoder (MAE). This strategy explicitly separates self-supervised morphological feature learning from pathological knowledge extraction. Additionally, we introduce a multi-granularity morphological type injection mechanism to bridge distilled classification knowledge with downstream prognostic predictions at both the instance and patient levels. Evaluated on a cohort of 71 pediatric LN patients with KDIGO-standardized labels, our method achieves a three-class accuracy of 90.1% and an AUC of 89.4%, demonstrating its potential as a highly accurate and cost-effective prognostic tool.


GANs | Adversarial | Attacks | Generation (4 papers)

【1】TempoSyncDiff: Distilled Temporally-Consistent Diffusion for Low-Latency Audio-Driven Talking Head Generation
Link: https://arxiv.org/abs/2603.06057

Authors: Soumya Mazumdar, Vineet Kumar Rakesh
Abstract: Diffusion models have recently advanced photorealistic human synthesis, although practical talking-head generation (THG) remains constrained by high inference latency, temporal instability such as flicker and identity drift, and imperfect audio-visual alignment under challenging speech conditions. This paper introduces TempoSyncDiff, a reference-conditioned latent diffusion framework that explores few-step inference for efficient audio-driven talking-head generation. The approach adopts a teacher-student distillation formulation in which a diffusion teacher trained with a standard noise prediction objective guides a lightweight student denoiser capable of operating with significantly fewer inference steps to improve generation stability. The framework incorporates identity anchoring and temporal regularization designed to mitigate identity drift and frame-to-frame flicker during synthesis, while viseme-based audio conditioning provides coarse lip motion control. Experiments on the LRS3 dataset report denoising-stage component-level metrics relative to VAE reconstructions and preliminary latency characterization, including CPU-only and edge computing measurements and feasibility estimates for edge deployment. The results suggest that distilled diffusion models can retain much of the reconstruction behaviour of a stronger teacher while enabling substantially lower latency inference. The study is positioned as an initial step toward practical diffusion-based talking-head generation under constrained computational settings. GitHub: https://mazumdarsoumya.github.io/TempoSyncDiff


【2】Making Reconstruction FID Predictive of Diffusion Generation FID
Link: https://arxiv.org/abs/2603.05630

Authors: Tongda Xu, Mingwei He, Shady Abu-Hussein, Jose Miguel Hernandez-Lobato, Haotian Zhang, Kai Zhao, Chao Zhou, Ya-Qin Zhang, Yan Wang
Abstract: It is well known that the reconstruction FID (rFID) of a VAE is poorly correlated with the generation FID (gFID) of a latent diffusion model. We propose interpolated FID (iFID), a simple variant of rFID that exhibits a strong correlation with gFID. Specifically, for each element in the dataset, we retrieve its nearest neighbor (NN) in the latent space and interpolate their latent representations. We then decode the interpolated latent and compute the FID between the decoded samples and the original dataset. Additionally, we refine the claim that rFID correlates poorly with gFID, by showing that rFID correlates with sample quality in the diffusion refinement phase, whereas iFID correlates with sample quality in the diffusion navigation phase. Furthermore, we provide an explanation for why iFID correlates well with gFID, and why reconstruction metrics are negatively correlated with gFID, by connecting to results in diffusion generalization and hallucination. Empirically, iFID is the first metric to demonstrate a strong correlation with diffusion gFID, achieving Pearson linear and Spearman rank correlations of approximately 0.85. The source code is provided in https://github.com/tongdaxu/Making-rFID-Predictive-of-Diffusion-gFID.
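
A sketch of the iFID recipe from the abstract: pair each latent with its nearest neighbor in latent space, interpolate, decode, and compute FID between the decoded samples and the original dataset. The VAE encoder/decoder and FID feature network are stubbed out as hypothetical functions; `fid_from_features` implements the standard Fréchet-distance formula on feature statistics.
```python
# Interpolated-FID building blocks (illustrative; models are stubbed).
import numpy as np
from scipy.linalg import sqrtm

def interpolate_with_nn(latents, alpha=0.5):
    """Mix each latent with its (non-self) nearest neighbor in latent space."""
    d = ((latents[:, None] - latents[None]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)
    nn_idx = d.argmin(axis=1)
    return (1 - alpha) * latents + alpha * latents[nn_idx]

def fid_from_features(f1, f2):
    """Frechet distance between Gaussian fits of two feature sets."""
    mu1, mu2 = f1.mean(0), f2.mean(0)
    c1, c2 = np.cov(f1, rowvar=False), np.cov(f2, rowvar=False)
    covmean = sqrtm(c1 @ c2).real
    return ((mu1 - mu2) ** 2).sum() + np.trace(c1 + c2 - 2 * covmean)

# Usage sketch with hypothetical encode/decode/features networks:
#   z = encode(images); x_hat = decode(interpolate_with_nn(z))
#   ifid = fid_from_features(features(x_hat), features(images))
rng = np.random.default_rng(0)
z = rng.normal(size=(128, 16))
f1, f2 = rng.normal(size=(256, 8)), rng.normal(size=(256, 8))
print(interpolate_with_nn(z).shape, round(fid_from_features(f1, f2), 3))
```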


【3】Identifying Adversary Characteristics from an Observed Attack
Link: https://arxiv.org/abs/2603.05625

Authors: Soyon Choi, Scott Alfeld, Meiyi Ma
Abstract: When used in automated decision-making systems, machine learning (ML) models are vulnerable to data-manipulation attacks. Some defense mechanisms (e.g., adversarial regularization) directly affect the ML models while others (e.g., anomaly detection) act within the broader system. In this paper we consider a different task for defending against the adversary, focusing on the attacker, rather than the attack. We present and demonstrate a framework for identifying characteristics about the attacker from an observed attack. We prove that, without additional knowledge, the attacker is non-identifiable (multiple potential attackers would perform the same observed attack). To address this challenge, we propose a domain-agnostic framework to identify the most probable attacker. This framework aids the defender in two ways. First, knowledge about the attacker can be leveraged for exogenous mitigation (i.e., addressing the vulnerability by altering the decision-making system outside the learning algorithm and/or limiting the attacker's capability). Second, when implementing defense methods that directly affect the learning process (e.g., adversarial regularization), knowledge of the specific attacker improves performance. We present the details of our framework and illustrate its applicability through specific instantiations on a variety of learners.


【4】RoboLayout: Differentiable 3D Scene Generation for Embodied Agents
Link: https://arxiv.org/abs/2603.05522

Authors: Ali Shamsaddinlou
Abstract: Recent advances in vision language models (VLMs) have shown strong potential for spatial reasoning and 3D scene layout generation from open-ended language instructions. However, generating layouts that are not only semantically coherent but also feasible for interaction by embodied agents remains challenging, particularly in physically constrained indoor environments. In this paper, RoboLayout is introduced as an extension of LayoutVLM that augments the original framework with agent-aware reasoning and improved optimization stability. RoboLayout integrates explicit reachability constraints into a differentiable layout optimization process, enabling the generation of layouts that are navigable and actionable by embodied agents. Importantly, the agent abstraction is not limited to a specific robot platform and can represent diverse entities with distinct physical capabilities, such as service robots, warehouse robots, humans of different age groups, or animals, allowing environment design to be tailored to the intended agent. In addition, a local refinement stage is proposed that selectively reoptimizes problematic object placements while keeping the remainder of the scene fixed, improving convergence efficiency without increasing global optimization iterations. Overall, RoboLayout preserves the strong semantic alignment and physical plausibility of LayoutVLM while enhancing applicability to agent-centric indoor scene generation, as demonstrated by experimental results across diverse scene configurations.


半/弱/无/有监督|不确定性|主动学习(4篇)

【1】Hierarchical Industrial Demand Forecasting with Temporal and Uncertainty Explanations
标题:具有时间和不确定性解释的分层工业需求预测
链接:https://arxiv.org/abs/2603.06555

作者:Harshavardhan Kamarthi,Shangqing Xu,Xinjie Tong,Xingyu Zhou,James Peters,Joseph Czyzyk,B. Aditya Prakash
摘要:分层时间序列预测对于各个行业的需求预测至关重要。虽然机器学习模型在这些预测任务中获得了显著的准确性和可扩展性,但结合应用背景对其预测进行解释在很大程度上仍未被探索。为了弥合这一差距,我们引入了一种新的解释性方法,用于大规模分层概率时间序列预测,在适配通用解释性技术的同时,解决与分层结构和不确定性相关的挑战。我们的方法针对现实世界的工业供应链场景提供了有价值的解释性见解,包括:1)层次结构内各条时间序列以及特定时间点外部变量的重要性;2)不同变量对预测不确定性的影响;3)训练数据集修改后预测发生变化的解释。为了评估可解释性方法,我们基于一家大型化工公司解释一万多种产品分层需求的真实场景,生成了半合成数据集。实验表明,我们的方法成功解释了最先进的工业预测方法,且解释准确率显著更高。此外,我们还提供了多个真实案例研究,展示了我们的方法在识别重要模式和解释方面的有效性,帮助利益相关者更好地理解预测。我们的方法还有助于识别预测需求背后的关键驱动因素,从而实现更明智的决策和战略规划,并帮助在用户之间建立信任和信心,最终促进分层预测模型在实践中的更好采用和利用。
摘要:Hierarchical time-series forecasting is essential for demand prediction across various industries. While machine learning models have obtained significant accuracy and scalability on such forecasting tasks, the interpretability of their predictions, informed by application, is still largely unexplored. To bridge this gap, we introduce a novel interpretability method for large hierarchical probabilistic time-series forecasting, adapting generic interpretability techniques while addressing challenges associated with hierarchical structures and uncertainty. Our approach offers valuable interpretative insights in response to real-world industrial supply chain scenarios, including 1) the significance of various time-series within the hierarchy and external variables at specific time points, 2) the impact of different variables on forecast uncertainty, and 3) explanations for forecast changes in response to modifications in the training dataset. To evaluate the explainability method, we generate semi-synthetic datasets based on real-world scenarios of explaining hierarchical demands for over ten thousand products at a large chemical company. The experiments showed that our explainability method successfully explained state-of-the-art industrial forecasting methods with significantly higher explainability accuracy. Furthermore, we provide multiple real-world case studies that show the efficacy of our approach in identifying important patterns and explanations that help stakeholders better understand the forecasts. Additionally, our method facilitates the identification of key drivers behind forecasted demand, enabling more informed decision-making and strategic planning. Our approach helps build trust and confidence among users, ultimately leading to better adoption and utilization of hierarchical forecasting models in practice.


【2】Contrastive-to-Self-Supervised: A Two-Stage Framework for Script Similarity Learning
标题:从对比到自监督:文字相似性学习的两阶段框架
链接:https://arxiv.org/abs/2603.06180

作者:Claire Roman,Philippe Meyer
摘要:学习字形和书写系统的相似性度量面临一个根本性的挑战:虽然人造字母表中的单个字形可以被可靠地标注,但不同文字之间的历史关系仍然不确定且存在争议。我们提出一个两阶段框架来应对这一认识论约束。首先,我们在有标注的人造字母表上用对比损失训练编码器,建立具有强判别特征的教师模型。其次,我们通过师生蒸馏扩展到历史上有实证的文字:学生在教师知识的指导下学习无监督表示,但可以自由发现潜在的跨文字相似性。这种非对称设置使学生能够学习形变不变的嵌入,同时从干净样例继承判别结构。我们的方法连接了监督对比学习和无监督发现,既能在不同系统之间划出硬边界,也能得到反映潜在历史影响的软相似性。在多样书写系统上的实验表明,该方法无需真实的演化关系标注,即可实现有效的Few-Shot字形识别和有意义的文字聚类。
摘要:Learning similarity metrics for glyphs and writing systems faces a fundamental challenge: while individual graphemes within invented alphabets can be reliably labeled, the historical relationships between different scripts remain uncertain and contested. We propose a two-stage framework that addresses this epistemological constraint. First, we train an encoder with contrastive loss on labeled invented alphabets, establishing a teacher model with robust discriminative features. Second, we extend to historically attested scripts through teacher-student distillation, where the student learns unsupervised representations guided by the teacher's knowledge but free to discover latent cross-script similarities. The asymmetric setup enables the student to learn deformation-invariant embeddings while inheriting discriminative structure from clean examples. Our approach bridges supervised contrastive learning and unsupervised discovery, enabling both hard boundaries between distinct systems and soft similarities reflecting potential historical influences. Experiments on diverse writing systems demonstrate effective few-shot glyph recognition and meaningful script clustering without requiring ground-truth evolutionary relationships.
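下面给出该两阶段流程的一个最小示意(假设性代码,并非论文实现;teacher、student、batch等名称均为示意):第一阶段在有标注字母表上用监督对比损失训练教师编码器,第二阶段让学生在未标注的历史文字上对齐冻结教师的嵌入方向,而不施加硬标签。

    import torch
    import torch.nn.functional as F

    def supervised_contrastive_loss(z, labels, tau=0.1):
        # 第一阶段:同一字形标签互为正样本的监督对比损失(SupCon形式的简化版,
        # 为简洁未从分母中排除自身项)
        z = F.normalize(z, dim=1)
        sim = z @ z.T / tau                      # 余弦相似度 / 温度
        mask = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
        mask.fill_diagonal_(0)                   # 排除锚点自身作为正样本
        logits = sim - sim.max(dim=1, keepdim=True).values.detach()
        log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
        pos_cnt = mask.sum(1).clamp(min=1)
        return -(log_prob * mask).sum(1).div(pos_cnt).mean()

    def distill_loss(student_z, teacher_z):
        # 第二阶段:学生在历史文字上对齐(冻结的)教师嵌入方向,
        # 不用硬标签,保留发现潜在跨文字相似性的自由度
        return 1 - F.cosine_similarity(student_z, teacher_z.detach(), dim=1).mean()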


【3】Margin and Consistency Supervision for Calibrated and Robust Vision Models
标题:校准且稳健的视觉模型的裕度和一致性监督
链接:https://arxiv.org/abs/2603.05812

作者:Salim Khazem
摘要:深度视觉分类器通常可以实现高精度,但在小的分布偏移下仍然校准不良且脆弱。我们提出间隔与一致性监督(MaCS),一个简单的、与架构无关的正则化框架,联合施加logit空间分离和局部预测稳定性。MaCS通过以下两项增强交叉熵:(i)平方铰链间隔惩罚,在正确类别和最强竞争者之间强制执行目标logit差距;(ii)一致性正则化项,最小化干净输入与轻度扰动视图上预测之间的KL散度。我们提供了统一的理论分析,表明在降低(经由Lipschitz型稳定性代理形式化的)局部敏感度的同时增大分类间隔,可以带来更好的泛化保证,以及随间隔-敏感度之比缩放的可证明鲁棒半径界。在多个图像分类基准和涵盖CNN与Vision Transformer的多种骨干网络上,MaCS在保持或提高top-1精度的同时,持续改善校准(更低的ECE和NLL)和对常见损坏的鲁棒性。我们的方法不需要额外数据,无需架构变化,推理开销可以忽略不计,使其成为标准训练目标的有效即插即用替代品。
摘要:Deep vision classifiers often achieve high accuracy while remaining poorly calibrated and fragile under small distribution shifts. We present Margin and Consistency Supervision (MaCS), a simple, architecture-agnostic regularization framework that jointly enforces logit-space separation and local prediction stability. MaCS augments cross-entropy with (i) a hinge-squared margin penalty that enforces a target logit gap between the correct class and the strongest competitor, and (ii) a consistency regularizer that minimizes the KL divergence between predictions on clean inputs and mildly perturbed views. We provide a unifying theoretical analysis showing that increasing classification margin while reducing local sensitivity formalized via a Lipschitz-type stability proxy yields improved generalization guarantees and a provable robustness radius bound scaling with the margin-to-sensitivity ratio. Across several image classification benchmarks and several backbones spanning CNNs and Vision Transformers, MaCS consistently improves calibration (lower ECE and NLL) and robustness to common corruptions while preserving or improving top-1 accuracy. Our approach requires no additional data, no architectural changes, and negligible inference overhead, making it an effective drop-in replacement for standard training objectives.
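按摘要的描述,MaCS的总损失由交叉熵、平方铰链间隔惩罚以及干净/扰动视图间的KL一致性项组成。下面是一个最小示意(margin、lam1、lam2等超参数为假设;KL的方向也是示意性选择):

    import torch
    import torch.nn.functional as F

    def macs_loss(logits_clean, logits_pert, targets, margin=2.0, lam1=0.1, lam2=1.0):
        ce = F.cross_entropy(logits_clean, targets)
        # (i) 平方铰链间隔惩罚:正确类logit与最强竞争者之差至少为margin
        correct = logits_clean.gather(1, targets[:, None]).squeeze(1)
        masked = logits_clean.scatter(1, targets[:, None], float('-inf'))
        runner_up = masked.max(dim=1).values
        margin_pen = F.relu(margin - (correct - runner_up)).pow(2).mean()
        # (ii) 一致性项:干净输入与轻度扰动视图预测分布间的KL散度
        # F.kl_div(log_q, p) 计算 KL(p || q);此处方向为示意选择
        consistency = F.kl_div(F.log_softmax(logits_pert, dim=1),
                               F.softmax(logits_clean, dim=1),
                               reduction='batchmean')
        return ce + lam1 * margin_pen + lam2 * consistency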


【4】Unsupervised domain adaptation for radioisotope identification in gamma spectroscopy
标题:伽玛光谱学中放射性同位素识别的无监督领域适应
链接:https://arxiv.org/abs/2603.05719

作者:Peter Lalor,Ayush Panigrahy,Alex Hagen
备注:32 pages, 3 figures, and 14 tables
摘要:使用伽马能谱训练用于放射性同位素识别的机器学习模型,对许多实际应用来说仍是一个难以企及的挑战,主要原因在于难以获取和标注大量多样的实验数据集。模拟可以缓解这一挑战,但在部署到分布外的操作环境时,在模拟数据上训练的模型的准确性可能大幅下降。在这项研究中,我们证明,只要目标域的未标注数据可用,无监督域自适应(UDA)就能提高在合成数据上训练的模型向新测试域泛化的能力。传统的监督技术无法利用这些数据,因为缺乏同位素标签使得无法定义监督分类损失。相反,我们首先使用有标签的合成数据预训练谱分类器,然后利用未标注的目标数据来对齐源域和目标域之间学到的特征表示。我们比较了一系列不同的UDA技术,发现最小化源和目标特征向量之间的最大均值差异(MMD)对测试分数的改善最为一致。例如,使用一个自定义的基于Transformer的神经网络,在通过MMD最小化执行无监督特征对齐后,我们在实验LaBr$_3$测试集上实现了$0.904 \pm 0.022$的测试精度,而对齐前为$0.754 \pm 0.014$。总体而言,我们的结果突显了使用UDA将基于合成数据训练的放射性同位素分类器适配到真实世界部署的潜力。
摘要:Training machine learning models for radioisotope identification using gamma spectroscopy remains an elusive challenge for many practical applications, largely stemming from the difficulty of acquiring and labeling large, diverse experimental datasets. Simulations can mitigate this challenge, but the accuracy of models trained on simulated data can deteriorate substantially when deployed to an out-of-distribution operational environment. In this study, we demonstrate that unsupervised domain adaptation (UDA) can improve the ability of a model trained on synthetic data to generalize to a new testing domain, provided unlabeled data from the target domain are available. Conventional supervised techniques are unable to utilize this data because the absence of isotope labels precludes defining a supervised classification loss. Instead, we first pretrain a spectral classifier using labeled synthetic data and subsequently leverage unlabeled target data to align the learned feature representations between the source and target domains. We compare a range of different UDA techniques, finding that minimizing the maximum mean discrepancy (MMD) between source and target feature vectors yields the most consistent improvement to testing scores. For instance, using a custom transformer-based neural network, we achieved a testing accuracy of $0.904 \pm 0.022$ on an experimental LaBr$_3$ test set after performing unsupervised feature alignment via MMD minimization, compared to $0.754 \pm 0.014$ before alignment. Overall, our results highlight the potential of using UDA to adapt a radioisotope classifier trained on synthetic data for real-world deployment.
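文中效果最一致的对齐目标是最小化源/目标特征间的最大均值差异(MMD)。下面是常见的高斯核MMD的最小实现(带宽sigma为假设,论文所用核未在摘要中给出):

    import torch

    def gaussian_mmd(x, y, sigma=1.0):
        """x: (n,d) 源域特征, y: (m,d) 目标域特征, 返回有偏的 MMD^2 估计。"""
        def kernel(a, b):
            d2 = torch.cdist(a, b).pow(2)      # 成对平方欧氏距离
            return torch.exp(-d2 / (2 * sigma ** 2))
        return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

    # 训练时可将 gaussian_mmd(src_feat, tgt_feat) 作为附加损失项,
    # 与(仅在有标签源域上计算的)监督分类损失一起最小化。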


迁移|Zero/Few/One-Shot|自适应(8篇)

【1】SCOPE: Scene-Contextualized Incremental Few-Shot 3D Segmentation
标题:SCOPE:场景上下文化的增量Few-Shot 3D分割
链接:https://arxiv.org/abs/2603.06572

作者:Vishal Thengane,Zhaochong An,Tianjin Huang,Son Lam Phung,Abdesselam Bouzerdoum,Lu Yin,Na Zhao,Xiatian Zhu
备注:Accepted at CVPR 2026
摘要:增量式Few-Shot(IFS)分割的目的是随着时间的推移从少量标注中学习新的类别。虽然在2D中得到了广泛研究,但它在3D点云中的研究仍然不足。现有方法或遭受灾难性遗忘,或无法在稀疏监督下学习有判别力的原型,并且经常忽略一个关键线索:新类别经常作为未标注的背景出现在基础训练场景中。我们介绍SCOPE(Scene-COntextualised Prototype Enrichment),一个即插即用的背景引导原型富化框架,可与任何基于原型的3D分割方法集成。在基础训练之后,类无关的分割模型从背景区域中提取高置信度的伪实例以构建原型池。当新类带着少量标注样本到来时,检索相关的背景原型并与Few-Shot原型融合,形成富化表示,而无需重新训练骨干或添加参数。在ScanNet和S3DIS上的实验表明,SCOPE实现了SOTA性能,在保持低遗忘的同时,将新类IoU最多分别提高6.98%和3.61%,平均IoU分别提高2.25%和1.70%。代码可从https://github.com/Surrey-UP-Lab/SCOPE获得。
摘要:Incremental Few-Shot (IFS) segmentation aims to learn new categories over time from only a few annotations. Although widely studied in 2D, it remains underexplored for 3D point clouds. Existing methods suffer from catastrophic forgetting or fail to learn discriminative prototypes under sparse supervision, and often overlook a key cue: novel categories frequently appear as unlabelled background in base-training scenes. We introduce SCOPE (Scene-COntextualised Prototype Enrichment), a plug-and-play background-guided prototype enrichment framework that integrates with any prototype-based 3D segmentation method. After base training, a class-agnostic segmentation model extracts high-confidence pseudo-instances from background regions to build a prototype pool. When novel classes arrive with few labelled samples, relevant background prototypes are retrieved and fused with few-shot prototypes to form enriched representations without retraining the backbone or adding parameters. Experiments on ScanNet and S3DIS show that SCOPE achieves SOTA performance, improving novel-class IoU by up to 6.98% and 3.61%, and mean IoU by 2.25% and 1.70%, respectively, while maintaining low forgetting. Code is available https://github.com/Surrey-UP-Lab/SCOPE.
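“检索相关背景原型并与Few-Shot原型融合”这一步可以用余弦检索加凸组合来示意(以下为假设性简化,top-k与融合系数alpha等细节并非论文给出):

    import torch
    import torch.nn.functional as F

    def enrich_prototype(fs_proto, bg_pool, k=5, alpha=0.7):
        """fs_proto: (d,) 新类的Few-Shot原型; bg_pool: (N,d) 背景伪实例原型池。"""
        sims = F.cosine_similarity(bg_pool, fs_proto[None, :], dim=1)
        topk = sims.topk(min(k, bg_pool.size(0)))
        # 按相似度加权聚合最相关的背景原型
        w = F.softmax(topk.values, dim=0)
        bg_agg = (w[:, None] * bg_pool[topk.indices]).sum(0)
        # 与Few-Shot原型凸组合得到富化表示(无需重训骨干、不加参数)
        return alpha * fs_proto + (1 - alpha) * bg_agg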


【2】CLoPA: Continual Low Parameter Adaptation of Interactive Segmentation for Medical Image Annotation
标题:CLoPA:医学图像注释交互式分割的连续低参数适应
链接:https://arxiv.org/abs/2603.06426

作者:Parhom Esmaeili,Chayanin Tangwiriyasakul,Eli Gibson,Sebastien Ourselin,M. Jorge Cardoso
备注:10 pages, 2 figures
摘要:交互式分割使临床医生能够引导标注,但现有的zero-shot模型(如nnInteractive)无法在各种医学成像任务中始终达到专家级性能。由于标注活动会产生越来越多的任务特定标注数据流,分割模型的在线适配是对zero-shot推理的自然补充。我们提出CLoPA,一种持续适配策略,由轻量级回合调度触发,在标注缓存上微调nnInteractive的一小部分参数。CLoPA不需要新参数,也无需更改推理管道,完全在现有标注工作流中运行。在医学分割十项全能(Medical Segmentation Decathlon)中涵盖不同解剖目标和成像特征的八项任务上,CLoPA迅速将性能提升到专家级,即使是nnInteractive此前失败的任务也不例外,且大部分收益在单个训练回合后即可实现。我们表明,微调不同参数组的收益取决于任务特征和数据规模;并且,对于具有复杂几何形状的目标(例如肝血管),实例归一化和低层特征微调会趋于饱和,这表明在最具挑战性的场景中需要更深层的特征表示对齐。
摘要:Interactive segmentation enables clinicians to guide annotation, but existing zero-shot models like nnInteractive fail to consistently reach expert-level performance across diverse medical imaging tasks. Because annotation campaigns produce a growing stream of task-specific labelled data, online adaptation of the segmentation model is a natural complement to zero-shot inference. We propose CLoPA, a continual adaptation strategy that tunes a small fraction of nnInteractive's parameters on the annotation cache, triggered by lightweight episode scheduling. CLoPA requires no new parameters or changes to the inference pipeline, and operates entirely within the existing annotation workflow. Across eight Medical Segmentation Decathlon tasks spanning diverse anatomical targets and imaging characteristics, CLoPA rapidly elevates performance to expert-level, even for tasks where nnInteractive previously failed, with the majority of gains realised after a single training episode. We show that the benefits of tuning different parameter groups depends on task characteristics and data regimes. Also, that for targets with complex geometries (e.g., hepatic vessels), instance normalisation and low-level feature tuning saturates, suggesting a need for deeper feature-representation alignment in the most challenging scenarios.


【3】Adaptive Lipschitz-Free Conditional Gradient Methods for Stochastic Composite Nonconvex Optimization
标题:随机复合非凸优化的自适应无Lipschitz条件梯度方法
链接:https://arxiv.org/abs/2603.06369

作者:Ganzhao Yuan
摘要:We propose ALFCG (Adaptive Lipschitz-Free Conditional Gradient), the first \textit{adaptive} projection-free framework for stochastic composite nonconvex minimization that \textit{requires neither global smoothness constants nor line search}. Unlike prior conditional gradient methods that use openloop diminishing stepsizes, conservative Lipschitz constants, or costly backtracking, ALFCG maintains a self-normalized accumulator of historical iterate differences to estimate local smoothness and minimize a quadratic surrogate model at each step. This retains the simplicity of Frank-Wolfe while adapting to unknown geometry. We study three variants. ALFCG-FS addresses finite-sum problems with a SPIDER estimator. ALFCG-MVR1 and ALFCG-MVR2 handle stochastic expectation problems by using momentum-based variance reduction with single-batch and two-batch updates, and operate under average and individual smoothness, respectively. To reach an $ε$-stationary point, ALFCG-FS attains $\mathcal{O}(N+\sqrt{N}ε^{-2})$ iteration complexity, while ALFCG-MVR1 and ALFCG-MVR2 achieve $\tilde{\mathcal{O}}(σ^2ε^{-4}+ε^{-2})$ and $\tilde{\mathcal{O}}(σε^{-3}+ε^{-2})$, where $N$ is the number of components and $σ$ is the noise level. In contrast to typical $\mathcal{O}(ε^{-4})$ or $\mathcal{O}(ε^{-3})$ rates, our bounds reduce to the optimal rate up to logarithmic factors $\tilde{\mathcal{O}}(ε^{-2})$ as the noise level $σ\to 0$. Extensive experiments on multiclass classification over nuclear norm balls and $\ell_p$ balls show that ALFCG generally outperforms state-of-the-art conditional gradient baselines.
摘要:We propose ALFCG (Adaptive Lipschitz-Free Conditional Gradient), the first \textit{adaptive} projection-free framework for stochastic composite nonconvex minimization that \textit{requires neither global smoothness constants nor line search}. Unlike prior conditional gradient methods that use openloop diminishing stepsizes, conservative Lipschitz constants, or costly backtracking, ALFCG maintains a self-normalized accumulator of historical iterate differences to estimate local smoothness and minimize a quadratic surrogate model at each step. This retains the simplicity of Frank-Wolfe while adapting to unknown geometry. We study three variants. ALFCG-FS addresses finite-sum problems with a SPIDER estimator. ALFCG-MVR1 and ALFCG-MVR2 handle stochastic expectation problems by using momentum-based variance reduction with single-batch and two-batch updates, and operate under average and individual smoothness, respectively. To reach an $ε$-stationary point, ALFCG-FS attains $\mathcal{O}(N+\sqrt{N}ε^{-2})$ iteration complexity, while ALFCG-MVR1 and ALFCG-MVR2 achieve $\tilde{\mathcal{O}}(σ^2ε^{-4}+ε^{-2})$ and $\tilde{\mathcal{O}}(σε^{-3}+ε^{-2})$, where $N$ is the number of components and $σ$ is the noise level. In contrast to typical $\mathcal{O}(ε^{-4})$ or $\mathcal{O}(ε^{-3})$ rates, our bounds reduce to the optimal rate up to logarithmic factors $\tilde{\mathcal{O}}(ε^{-2})$ as the noise level $σ\to 0$. Extensive experiments on multiclass classification over nuclear norm balls and $\ell_p$ balls show that ALFCG generally outperforms state-of-the-art conditional gradient baselines.
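下面用$\ell_1$球上的一个Frank-Wolfe步骤,示意“用历史迭代差的自归一化累积量估计局部光滑度、再最小化二次替代模型”这一思想(纯属假设性简化,并非论文算法的忠实复现;累积与归一化的具体形式均为示意):

    import numpy as np

    def lmo_l1(grad, radius=1.0):
        # l1球上的线性最小化oracle:在梯度绝对值最大的坐标上放置全部质量
        s = np.zeros_like(grad)
        i = np.argmax(np.abs(grad))
        s[i] = -radius * np.sign(grad[i])
        return s

    def alfcg_like_step(x, g, x_prev, g_prev, acc, eps=1e-12):
        # 用相邻两次迭代的梯度差/迭代差之比累积局部光滑度估计 L_hat
        acc += np.linalg.norm(g - g_prev) / max(np.linalg.norm(x - x_prev), eps)
        L_hat = acc  # 简化:实际方法可能使用更精细的自归一化
        s = lmo_l1(g)
        d = s - x
        # 在二次替代模型 <g, t*d> + (L_hat/2)*t^2*||d||^2 上求最优步长并截断到 [0,1]
        t = float(np.clip(-(g @ d) / (L_hat * (d @ d) + eps), 0.0, 1.0))
        return x + t * d, acc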


【4】Learning Where the Physics Is: Probabilistic Adaptive Sampling for Stiff PDEs
标题:学习物理所在之处:刚性偏微分方程的概率自适应采样
链接:https://arxiv.org/abs/2603.06287

作者:Akshay Govind Srinivasan,Balaji Srinivasan
备注:Accepted at AI&PDE Workshop at the Fourteenth International Conference on Learning Representations
摘要:对具有尖锐梯度的刚性偏微分方程(PDE)进行建模仍然是科学机器学习的一个重大挑战。虽然物理信息神经网络(PINN)受困于频谱偏差和缓慢的训练时间,物理信息极限学习机(PIELM)提供了快速的封闭形式线性求解,但从根本上受限于与物理无关的随机初始化。我们引入高斯混合模型自适应PIELM(GMM-PIELM),一个学习表示“物理所在之处”的概率密度函数、用于自适应采样PIELM核的概率框架。通过采用加权期望最大化(EM)算法,GMM-PIELM自主地将径向基函数中心集中在数值误差较高的区域,如激波阵面和边界层。这种方法动态地改善隐藏层的条件数,而无需(PINN式的)昂贵的基于梯度的优化或贝叶斯搜索。我们在扩散系数$\nu=10^{-4}$的一维奇异摄动对流扩散方程上评估我们的方法。我们的方法实现的$L_2$误差比基线RBF-PIELM低多达7个数量级,成功解析了指数级薄的边界层,同时保留了ELM架构数量级的速度优势。
摘要:Modeling stiff partial differential equations (PDEs) with sharp gradients remains a significant challenge for scientific machine learning. While Physics-Informed Neural Networks (PINNs) struggle with spectral bias and slow training times, Physics-Informed Extreme Learning Machines (PIELMs) offer a rapid, closed-form linear solution but are fundamentally limited by physics-agnostic, random initialization. We introduce the Gaussian Mixture Model Adaptive PIELM (GMM-PIELM), a probabilistic framework that learns a probability density function representing the ``location of physics'' for adaptively sampling kernels of PIELMs. By employing a weighted Expectation-Maximization (EM) algorithm, GMM-PIELM autonomously concentrates radial basis function centers in regions of high numerical error, such as shock fronts and boundary layers. This approach dynamically improves the conditioning of the hidden layer without the expensive gradient-based optimization(of PINNs) or Bayesian search. We evaluate our methodology on 1D singularly perturbed convection-diffusion equations with diffusion coefficients $ν=10^{-4}$. Our method achieves $L_2$ errors up to $7$ orders of magnitude lower than baseline RBF-PIELMs, successfully resolving exponentially thin boundary layers while retaining the orders-of-magnitude speed advantage of the ELM architecture.
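“按误差加权的EM”这一思路可以用按残差幅值重采样来近似加权拟合GMM、再把混合分量中心作为RBF核中心(假设性简化:sklearn的GaussianMixture不直接支持样本权重,故以重采样近似):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fit_error_weighted_gmm(pts, residuals, n_components=16, n_resample=4096, seed=0):
        """pts: (N,1) 配点; residuals: (N,) 各配点的PDE残差幅值。"""
        rng = np.random.default_rng(seed)
        w = np.abs(residuals)
        w = w / w.sum()
        # 按残差大小重采样配点,使高误差区域(激波、边界层)被更密集地代表
        idx = rng.choice(len(pts), size=n_resample, p=w)
        gmm = GaussianMixture(n_components=n_components, random_state=seed).fit(pts[idx])
        return gmm.means_  # 作为下一轮PIELM的RBF中心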


【5】Test-Time Adaptation via Many-Shot Prompting: Benefits, Limits, and Pitfalls
标题:通过多样本提示进行测试时自适应:优势、局限与陷阱
链接:https://arxiv.org/abs/2603.05829

作者:Shubhangi Upasani,Chen Wu,Jay Rainton,Bo Li,Changran Hu,Qizheng Zhang,Urmish Thakker
摘要:测试时自适应使大型语言模型(LLM)能够在不更新模型参数的情况下改变其推理时的行为。一种常见的方法是多样本提示,即注入大量上下文学习(ICL)示例作为输入空间的测试时更新。虽然随着演示数量增加性能可以提升,但这种更新机制的可靠性和局限仍然知之甚少,对开源模型尤其如此。我们对多样本提示在不同任务和模型骨干上进行了实证研究,分析性能如何随更新幅度、示例排序和选择策略变化。我们进一步研究了动态ICL和强化ICL作为替代的测试时更新策略,它们控制注入哪些信息以及如何约束模型行为。我们发现,多样本提示对演示能提供高信息增益的结构化任务是有效的,但对选择策略高度敏感,并且在开放式生成任务上往往收益有限。总的来说,我们刻画了基于提示的测试时自适应的实际局限,并概述了输入空间更新在何时有益、何时有害。
摘要:Test-time adaptation enables large language models (LLMs) to modify their behavior at inference without updating model parameters. A common approach is many-shot prompting, where large numbers of in-context learning (ICL) examples are injected as an input-space test-time update. Although performance can improve as more demonstrations are added, the reliability and limits of this update mechanism remain poorly understood, particularly for open-source models. We present an empirical study of many-shot prompting across tasks and model backbones, analyzing how performance varies with update magnitude, example ordering, and selection policy. We further study Dynamic and Reinforced ICL as alternative test-time update strategies that control which information is injected and how it constrains model behavior. We find that many-shot prompting is effective for structured tasks where demonstrations provide high information gain, but is highly sensitive to selection strategy and often shows limited benefits for open-ended generation tasks. Overall, we characterize the practical limits of prompt-based test-time adaptation and outline when input-space updates are beneficial versus harmful.
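多样本提示本身的机制很简单:把大量(输入, 输出)演示串接进上下文,再附上测试输入。下面是一个最小示意(具体格式为假设):

    def build_many_shot_prompt(examples, query, instruction=""):
        """examples: [(input, output), ...];示例数越多,输入空间的更新幅度越大。"""
        parts = [instruction] if instruction else []
        for x, y in examples:            # 示例排序与选择策略会显著影响效果
            parts.append(f"Input: {x}\nOutput: {y}")
        parts.append(f"Input: {query}\nOutput:")
        return "\n\n".join(parts)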


【6】Self-Auditing Parameter-Efficient Fine-Tuning for Few-Shot 3D Medical Image Segmentation
标题:用于Few-Shot3D医学图像分割的自审核参数高效微调
链接:https://arxiv.org/abs/2603.05822

作者:Son Thai Ly,Hien V. Nguyen
摘要:让基础模型适配新的临床站点在实践中仍然具有挑战性。领域偏移和稀缺的标注必须由专家处理,但许多临床团队并没有熟练的AI工程师随时可用来调整适配器设计和训练配方。因此,适配周期可能长达数周到数月,在Few-Shot设置中尤其如此。现有的PEFT方法要么需要手动配置适配器,要么依赖在Few-Shot 3D设置中计算上不可行的自动搜索。我们提出SEA-PEFT(自审核参数高效微调)来自动化这一过程。SEA-PEFT将适配器配置视为在微调期间求解的在线分配问题,而非手动的固定拓扑选择。SEA-PEFT使用“搜索-审核-分配”循环:训练活动适配器,通过暂时关闭某个适配器来估计其Dice效用,然后使用贪心背包分配器在参数预算下重新选择活动集合。指数移动平均与四分位距平滑,连同一个有限状态排序(Finite-State Ranking)控制器,稳定了该循环并提高了高噪声Few-Shot场景下的可靠性。在TotalSegmentator和FLARE'22上,SEA-PEFT在1/5/10-shot设置下比最强的固定拓扑PEFT基线平均Dice提高2.4-2.8个点,同时只训练<1%的参数。出于可复现性的目的,我们在https://github.com/tsly123/SEA_PEFT上公开了代码。
摘要:Adapting foundation models to new clinical sites remains challenging in practice. Domain shift and scarce annotations must be handled by experts, yet many clinical groups do not have ready access to skilled AI engineers to tune adapter designs and training recipes. As a result, adaptation cycles can stretch from weeks to months, particularly in few-shot settings. Existing PEFT methods either require manual adapter configuration or automated searches that are computationally infeasible in few-shot 3D settings. We propose SEA-PEFT (SElf-Auditing Parameter-Efficient Fine-Tuning) to automate this process. SEA-PEFT treats adapter configuration as an online allocation problem solved during fine-tuning rather than through manual, fixed-topology choices. SEA-PEFT uses a search-audit-allocate loop that trains active adapters, estimates each adapter's Dice utility by momentarily toggling it off, and then reselects the active set under a parameter budget using a greedy knapsack allocator. Exponential Moving Average and Interquartile Range smoothing, together with a Finite-State Ranking controller, stabilize the loop and improve reliability in high-noise few-shot regimes. On TotalSegmentator and FLARE'22, SEA-PEFT improves mean Dice by 2.4--2.8 points over the strongest fixed-topology PEFT baselines across 1/5/10-shot settings while training <1% of parameters. For reproducibility purposes, we made our code publicly available at https://github.com/tsly123/SEA_PEFT
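其中的“贪心背包分配器”可以用按效用/成本比排序的经典贪心来示意(以下为假设性简化;论文中的效用来自暂时关闭适配器引起的Dice变化,并经EMA/IQR平滑):

    def greedy_knapsack_select(adapters, budget):
        """adapters: [(name, dice_utility, param_cost), ...]; budget: 参数预算。"""
        ranked = sorted(adapters, key=lambda a: a[1] / a[2], reverse=True)
        active, used = [], 0
        for name, util, cost in ranked:
            if used + cost <= budget and util > 0:
                active.append(name)
                used += cost
        return active  # 下一训练回合要激活的适配器集合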


【7】JAWS: Enhancing Long-term Rollout of Neural Operators via Spatially-Adaptive Jacobian Regularization
标题:JAWS:通过空间自适应雅可比正则化增强神经运算符的长期滚动
链接:https://arxiv.org/abs/2603.05538

作者:Fengxiang Nie,Yasuhiro Suzuki
备注:11 pages, 16 figures
摘要:数据驱动的代理模型提高了模拟连续动态系统的效率,但其自回归推出往往受到不稳定性和谱爆破的限制。虽然全局正则化技术可以加强收缩动力学,但它们一致地阻尼高频特征,从而引入收缩耗散困境。此外,显式校正漂移的长时间轨迹优化方法受到内存约束的限制。在这项工作中,我们提出了雅可比自适应加权稳定性(JAWS),旨在减轻这些限制的概率正则化策略。通过将算子学习框架为具有空间异方差不确定性的最大后验概率(MAP)估计,JAWS基于局部物理复杂性动态地调节正则化强度。这允许模型在平滑区域强制收缩以抑制噪声,同时放松奇异特征附近的约束以保持梯度,有效地实现类似于数值激波捕获方案的行为。实验表明,这种空间自适应先验作为一个有效的频谱预处理,这减少了基础运营商的负担,处理高频不稳定性。这种减少使内存效率,短期轨迹优化,以匹配或超过长期基线的长期精度。在一维粘性Burgers方程上进行评估,我们的混合方法提高了长期稳定性,冲击保真度和分布外泛化,同时降低了训练计算成本。
摘要:Data-driven surrogate models improve the efficiency of simulating continuous dynamical systems, yet their autoregressive rollouts are often limited by instability and spectral blow-up. While global regularization techniques can enforce contractive dynamics, they uniformly damp high-frequency features, introducing a contraction-dissipation dilemma. Furthermore, long-horizon trajectory optimization methods that explicitly correct drift are bottlenecked by memory constraints. In this work, we propose Jacobian-Adaptive Weighting for Stability (JAWS), a probabilistic regularization strategy designed to mitigate these limitations. By framing operator learning as Maximum A Posteriori (MAP) estimation with spatially heteroscedastic uncertainty, JAWS dynamically modulates the regularization strength based on local physical complexity. This allows the model to enforce contraction in smooth regions to suppress noise, while relaxing constraints near singular features to preserve gradients, effectively realizing a behavior similar to numerical shock-capturing schemes. Experiments demonstrate that this spatially-adaptive prior serves as an effective spectral pre-conditioner, which reduces the base operator's burden of handling high-frequency instabilities. This reduction enables memory-efficient, short-horizon trajectory optimization to match or exceed the long-term accuracy of long-horizon baselines. Evaluated on the 1D viscous Burgers' equation, our hybrid approach improves long-term stability, shock fidelity, and out-of-distribution generalization while reducing training computational costs.


【8】SPPCSO: Adaptive Penalized Estimation Method for High-Dimensional Correlated Data
标题:SPPCSO:高维相关数据的自适应惩罚估计方法
链接:https://arxiv.org/abs/2603.06251

作者:Ying Hu,Hu Yang
摘要:随着高维相关数据的出现,多重共线性对模型稳定性提出了重大挑战,往往导致估计不稳定和预测精度降低。本文提出了单参数主成分选择算子(SPPCSO),它是一种创新的惩罚估计方法,它结合了单参数主成分回归和L1正则化,通过引入主成分信息自适应地调整收缩因子。该方法实现了变量选择和系数估计之间的平衡,即使在高维、高噪声环境下也能确保模型的稳定性和鲁棒估计。主要贡献在于解决传统变量选择方法在应用于高噪声、高维相关数据时的不稳定性。理论上,我们的方法具有选择一致性,并实现了一个较小的估计误差界相比,传统的惩罚估计方法。大量的数值实验表明,SPPCSO不仅能在高噪声环境下提供稳定可靠的估计,而且能在噪声变量高度相关的群体效应结构化数据中准确区分信号变量和噪声变量,有效地消除冗余变量,实现更稳定的变量选择。此外,SPPCSO在基因表达数据分析中成功识别了疾病相关基因,具有较强的实用价值。结果表明,SPPCSO作为一个理想的工具,高维变量的选择,提供了一个有效的和可解释的解决方案,为相关数据建模。
摘要:With the rise of high-dimensional correlated data, multicollinearity poses a significant challenge to model stability, often leading to unstable estimation and reduced predictive accuracy. This work proposes the Single-Parametric Principal Component Selection Operator (SPPCSO), an innovative penalized estimation method that integrates single-parametric principal component regression and $L_{1}$ regularization to adaptively adjust the shrinkage factor by incorporating principal component information. This approach achieves a balance between variable selection and coefficient estimation, ensuring model stability and robust estimation even in high-dimensional, high-noise environments. The primary contribution lies in addressing the instability of traditional variable selection methods when applied to high-noise, high-dimensional correlated data. Theoretically, our method exhibits selection consistency and achieves a smaller estimation error bound compared to traditional penalized estimation approaches. Extensive numerical experiments demonstrate that SPPCSO not only delivers stable and reliable estimation in high-noise settings but also accurately distinguishes signal variables from noise variables in group-effect structured data with highly correlated noise variables, effectively eliminating redundant variables and achieving more stable variable selection. Furthermore, SPPCSO successfully identifies disease-associated genes in gene expression data analysis, showcasing strong practical value. The results indicate that SPPCSO serves as an ideal tool for high-dimensional variable selection, offering an efficient and interpretable solution for modeling correlated data.


强化学习(7篇)

【1】Boosting deep Reinforcement Learning using pretraining with Logical Options
标题:使用具有逻辑选项的预训练来促进深度强化学习
链接:https://arxiv.org/abs/2603.06565

作者:Zihan Ye,Phil Chau,Raban Emunds,Jannis Blüml,Cedric Derstroff,Quentin Delfosse,Oleg Arenz,Kristian Kersting
摘要:深度强化学习智能体常常出现目标错位,因为它们过度利用早期的奖励信号。最近,一些符号方法通过编码稀疏目标以及对齐的计划来应对这些挑战。然而,纯符号架构难以扩展,也难以应用于连续设置。因此,受人类获得新技能能力的启发,我们提出一种混合方法。我们使用一个两阶段框架,在不牺牲深度策略表达能力的前提下,将符号结构注入基于神经网络的强化学习智能体。我们的方法称为混合分层RL(H^2RL),它引入一种基于逻辑选项的预训练策略,引导学习策略远离短期奖励循环、转向目标导向行为,同时允许通过标准环境交互来细化最终策略。实验上,我们表明这种方法持续改善长时程决策,得到的智能体超越了强大的神经、符号和神经符号基线。
摘要:Deep reinforcement learning agents are often misaligned, as they over-exploit early reward signals. Recently, several symbolic approaches have addressed these challenges by encoding sparse objectives along with aligned plans. However, purely symbolic architectures are complex to scale and difficult to apply to continuous settings. Hence, we propose a hybrid approach, inspired by humans' ability to acquire new skills. We use a two-stage framework that injects symbolic structure into neural-based reinforcement learning agents without sacrificing the expressivity of deep policies. Our method, called Hybrid Hierarchical RL (H^2RL), introduces a logical option-based pretraining strategy to steer the learning policy away from short-term reward loops and toward goal-directed behavior while allowing the final policy to be refined via standard environment interaction. Empirically, we show that this approach consistently improves long-horizon decision-making and yields agents that outperform strong neural, symbolic, and neuro-symbolic baselines.


【2】A Reference Architecture of Reinforcement Learning Frameworks
标题:强化学习框架的参考架构
链接:https://arxiv.org/abs/2603.06413

作者:Xiaoran Liu,Istvan David
摘要:强化学习(RL)应用的激增催生了各种支持技术,例如RL框架。然而,这些框架的架构模式在不同实现中并不一致,也没有参考架构(RA)来形成比较、评估和集成的公共基础。为了填补这一空白,我们提出了一个RL框架的RA。通过扎根理论方法,我们分析了18个业界实践中的RL框架,由此识别出反复出现的架构组件及其关系,并将它们编入RA。为了演示我们的RA,我们重构了具有代表性的RL模式。最后,我们识别了架构趋势(例如常用组件),并概述了改进RL框架的途径。
摘要:The surge in reinforcement learning (RL) applications gave rise to diverse supporting technology, such as RL frameworks. However, the architectural patterns of these frameworks are inconsistent across implementations and there exists no reference architecture (RA) to form a common basis of comparison, evaluation, and integration. To address this gap, we propose an RA of RL frameworks. Through a grounded theory approach, we analyze 18 state-of-the-practice RL frameworks and, by that, we identify recurring architectural components and their relationships, and codify them in an RA. To demonstrate our RA, we reconstruct characteristic RL patterns. Finally, we identify architectural trends, e.g., commonly used components, and outline paths to improving RL frameworks.


【3】Synthetic Monitoring Environments for Reinforcement Learning
标题:强化学习的合成监控环境
链接:https://arxiv.org/abs/2603.06252

作者:Leonard Pleiss,Carolin Schmidt,Maximilian Schiffer
摘要:强化学习(RL)缺乏能够对智能体行为进行精确白盒诊断的基准。当前的环境往往纠缠多种复杂性因素,且缺乏真实最优性指标,因此很难隔离算法失败的原因。我们引入合成监控环境(SME),一个无限的连续控制任务套件。SME提供完全可配置的任务特性和已知的最优策略,因而允许精确计算瞬时遗憾。其严格的几何状态空间边界允许系统性的分布内(WD)和分布外(OOD)评估。我们通过对PPO、TD3和SAC的多维消融展示了该框架的好处,揭示了特定的环境属性(如动作或状态空间大小、奖励稀疏性和最优策略的复杂性)如何影响WD和OOD性能。由此我们表明,SME提供了一个标准化、透明的测试平台,可将RL评估从经验基准测试推向严格的科学分析。
摘要:Reinforcement Learning (RL) lacks benchmarks that enable precise, white-box diagnostics of agent behavior. Current environments often entangle complexity factors and lack ground-truth optimality metrics, making it difficult to isolate why algorithms fail. We introduce Synthetic Monitoring Environments (SMEs), an infinite suite of continuous control tasks. SMEs provide fully configurable task characteristics and known optimal policies. As such, SMEs allow for the exact calculation of instantaneous regret. Their rigorous geometric state space bounds allow for systematic within-distribution (WD) and out-of-distribution (OOD) evaluation. We demonstrate the framework's benefit through multidimensional ablations of PPO, TD3, and SAC, revealing how specific environmental properties - such as action or state space size, reward sparsity and complexity of the optimal policy - impact WD and OOD performance. We thereby show that SMEs offer a standardized, transparent testbed for transitioning RL evaluation from empirical benchmarking toward rigorous scientific analysis.
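由于SME的最优策略已知,瞬时遗憾可以逐步精确计算。下面给出一个简化示意(env、optimal_policy等接口为假设;此处以单步奖励差定义遗憾,若按价值函数定义,计算方式类似):

    def instantaneous_regret(env, optimal_policy, state, action):
        """已知最优策略时,单步瞬时遗憾 = 最优动作奖励 - 实际动作奖励(简化定义)。"""
        r_opt = env.reward(state, optimal_policy(state))
        r_act = env.reward(state, action)
        return r_opt - r_act  # 对整条轨迹求和/平均即得到累积遗憾曲线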


【4】TADPO: Reinforcement Learning Goes Off-road
标题:TADPO:强化学习走向越野
链接:https://arxiv.org/abs/2603.05995

作者:Zhouchonghao Wu,Raymond Song,Vedant Mundheda,Luis E. Navarro-Serment,Christof Schoenborn,Jeff Schneider
备注:8 pages, 5 figures, 2 tables. Accepted at ICRA 2026
摘要:越野自动驾驶带来了巨大的挑战,例如在动态不确定且多样的未测绘可变地形上导航。应对这些挑战需要有效的长时程规划和自适应控制。强化学习(RL)通过直接从交互中学习控制策略提供了一个很有前景的解决方案。然而,由于越野驾驶是一项奖励信号稀疏的长时程任务,标准RL方法在这种场景下难以应用。我们介绍TADPO,一种新的策略梯度公式,它扩展了近端策略优化(PPO),利用离策略(off-policy)轨迹进行教师指导、在策略(on-policy)轨迹进行学生探索。在此基础上,我们开发了一个基于视觉的端到端RL系统,用于高速越野驾驶,能够在极端斜坡和障碍物密布的地形中行驶。我们在仿真中展示了性能,更重要的是,在一辆全尺寸越野车上实现了zero-shot的仿真到真实迁移。据我们所知,这项工作是首次在全尺寸越野平台上部署基于RL的策略。
摘要:Off-road autonomous driving poses significant challenges such as navigating unmapped, variable terrain with uncertain and diverse dynamics. Addressing these challenges requires effective long-horizon planning and adaptable control. Reinforcement Learning (RL) offers a promising solution by learning control policies directly from interaction. However, because off-road driving is a long-horizon task with low-signal rewards, standard RL methods are challenging to apply in this setting. We introduce TADPO, a novel policy gradient formulation that extends Proximal Policy Optimization (PPO), leveraging off-policy trajectories for teacher guidance and on-policy trajectories for student exploration. Building on this, we develop a vision-based, end-to-end RL system for high-speed off-road driving, capable of navigating extreme slopes and obstacle-rich terrain. We demonstrate our performance in simulation and, importantly, zero-shot sim-to-real transfer on a full-scale off-road vehicle. To our knowledge, this work represents the first deployment of RL-based policies on a full-scale off-road platform.


【5】MIRACL: A Diverse Meta-Reinforcement Learning for Multi-Objective Multi-Echelon Combinatorial Supply Chain Optimisation
标题:MIRACL:用于多目标多梯队组合供应链优化的多元化元强化学习
链接:https://arxiv.org/abs/2603.05760

作者:Rifny Rachman,Josh Tingey,Richard Allmendinger,Wei Pan,Pradyumn Shukla,Bahrul Ilmi Nasution
摘要:多目标强化学习(MORL)对多级组合供应链优化是有效的,此类任务涉及高维度、不确定性和相互竞争的目标。然而,其在动态环境中的部署受到任务特定再训练需求和大量计算成本的阻碍。我们介绍MIRACL(Meta multI-objective Reinforcement leArning with Composite Learning),一个允许在不同任务间进行Few-Shot泛化的分层Meta-MORL框架。MIRACL将每个任务分解为结构化子问题以实现高效的策略适配,并使用基于Pareto的适配策略跨任务元学习全局策略,以鼓励元训练和微调中的多样性。据我们所知,这是Meta-MORL与组合优化中此类机制的首次结合。MIRACL虽然在供应链领域得到验证,但在理论上与领域无关,适用于更广泛的动态多目标决策问题。实证评估表明,MIRACL在简单到中等的任务中优于传统MORL基线,实现了高达10%的超体积提升和5%的期望效用改进。这些结果强调了MIRACL在多目标问题中实现鲁棒、高效适配的潜力。
摘要:Multi-objective reinforcement learning (MORL) is effective for multi-echelon combinatorial supply chain optimisation, where tasks involve high dimensionality, uncertainty, and competing objectives. However, its deployment in dynamic environments is hindered by the need for task-specific retraining and substantial computational cost. We introduce MIRACL (Meta multI-objective Reinforcement leArning with Composite Learning), a hierarchical Meta-MORL framework that allows for a few-shot generalisation across diverse tasks. MIRACL decomposes each task into structured subproblems for efficient policy adaptation and meta-learns a global policy across tasks using a Pareto-based adaptation strategy to encourage diversity in meta-training and fine-tuning. To our knowledge, this is the first integration of Meta-MORL with such mechanisms in combinatorial optimisation. Although validated in the supply chain domain, MIRACL is theoretically domain-agnostic and applicable to broader dynamic multi-objective decision-making problems. Empirical evaluations show that MIRACL outperforms conventional MORL baselines in simple to moderate tasks, achieving up to 10% higher hypervolume and 5% better expected utility. These results underscore the potential of MIRACL for robust, efficient adaptation in multi-objective problems.


【6】Reinforcement Learning for Power-Flow Network Analysis
标题:用于潮流网络分析的强化学习
链接:https://arxiv.org/abs/2603.05673

作者:Alperen Ergur,Julia Lindberg,Vinny Miller
备注:more experiments will be added soon
摘要:电力系统潮流方程是描述电力网络注入功率与节点电压之间关系的非线性多变量方程。给定一个网络拓扑,我们感兴趣的是找到具有许多平衡点的网络参数。这相当于找到具有许多实数解的潮流方程实例。目前计算代数中最先进的算法无法对变量数超过少数几个的网络回答这个问题。为了解决这个问题,我们设计了一个能很好近似该实根数目的概率奖励函数,以及一个模仿潮流方程空间的状态空间。我们推导了高斯模型下的平均实根数目,并将其用作RL智能体的基线。智能体发现的潮流方程实例,其解的数量远多于平均基线。这表明了RL在潮流网络设计与分析方面的潜力,以及RL对涉及复杂非线性代数或几何的问题做出有意义贡献的潜力。(作者按字母顺序排列,所有作者贡献相同。)
摘要:The power flow equations are non-linear multivariate equations that describe the relationship between power injections and bus voltages of electric power networks. Given a network topology, we are interested in finding network parameters with many equilibrium points. This corresponds to finding instances of the power flow equations with many real solutions. Current state-of-the-art algorithms in computational algebra are not capable of answering this question for networks involving more than a small number of variables. To remedy this, we design a probabilistic reward function that gives a good approximation to this root count, and a state-space that mimics the space of power flow equations. We derive the average root count for a Gaussian model, and use this as a baseline for our RL agents. The agents discover instances of the power flow equations with many more solutions than the average baseline. This demonstrates the potential of RL for power-flow network design and analysis as well as the potential for RL to contribute meaningfully to problems that involve complex non-linear algebra or geometry. \footnote{Author order alphabetic, all authors contributed equally.}


【7】A Novel Hybrid Heuristic-Reinforcement Learning Optimization Approach for a Class of Railcar Shunting Problems
标题:一类铁路调车问题的新型混合启发式-强化学习优化方法
链接:https://arxiv.org/abs/2603.05579

作者:Ruonan Zhao,Joseph Geunes
摘要 :铁路调车是货运铁路站场的核心规划任务,站场规划人员需要拆卸和重新组装成出站列车的铁路车辆组。只能从一端访问的分类轨道可以被视为堆栈结构,其中轨道车仅从一端添加和移除,导致后进先出(LIFO)检索顺序。相比之下,双侧轨道的功能类似于队列结构,允许轨道车从一端添加并从另一端移除,遵循先进先出(FIFO)顺序。我们考虑一个问题,需要组装多个出站列车使用两个机车在铁路站场与两侧分类轨道访问。为了解决这一组合具有挑战性的问题类,我们将问题分解为两个子问题,每个子问题都具有单侧分类轨道访问和每侧的机车。我们提出了一种新的混合启发式强化学习(HHRL)框架,该框架将铁路特定的启发式解决方案与强化学习方法(特别是Q学习)相结合。所提出的框架利用方法来减少状态-动作空间,并在强化学习过程中指导探索。一系列数值实验的结果表明,HHRL算法的效率和质量的单边访问,单机车问题和双边访问,两机车问题。
摘要:Railcar shunting is a core planning task in freight railyards, where yard planners need to disassemble and reassemble groups of railcars to form outbound trains. Classification tracks with access from one side only can be considered as stack structures, where railcars are added and removed from only one end, leading to a last-in-first-out (LIFO) retrieval order. In contrast, two-sided tracks function like queue structures, allowing railcars to be added from one end and removed from the opposite end, following a first-in-first-out (FIFO) order. We consider a problem requiring assembly of multiple outbound trains using two locomotives in a railyard with two-sided classification track access. To address this combinatorially challenging problem class, we decompose the problem into two subproblems, each with one-sided classification track access and a locomotive on each side. We present a novel Hybrid Heuristic-Reinforcement Learning (HHRL) framework that integrates railway-specific heuristic solution approaches with a reinforcement learning method, specifically Q-learning. The proposed framework leverages methods to decrease the state-action space and guide exploration during reinforcement learning. The results of a series of numerical experiments demonstrate the efficiency and quality of the HHRL algorithm in both one-sided access, single-locomotive problems and two-sided access, two-locomotive problems.
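框架中使用的Q学习部分是标准的表格型更新,启发式用于缩减状态-动作空间并引导探索。下面是通用的Q学习更新示意(与铁路调车的具体状态编码无关,仅说明机制):

    import random
    from collections import defaultdict

    Q = defaultdict(float)  # (状态, 动作) -> 估计动作价值

    def q_update(s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
        # 经典Q学习:朝 r + gamma * max_a' Q(s', a') 的TD目标移动
        best_next = max(Q[(s_next, a2)] for a2 in actions) if actions else 0.0
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

    def epsilon_greedy(s, actions, eps=0.1):
        # 探索时可改为由铁路启发式给出候选动作,以缩小搜索空间
        if random.random() < eps:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(s, a)])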


医学相关(2篇)

【1】Artificial Intelligence for Detecting Fetal Orofacial Clefts and Advancing Medical Education
标题:人工智能用于检测胎儿口面部裂缝并推进医学教育
链接:https://arxiv.org/abs/2603.06522

作者:Yuanji Zhang,Yuhao Huang,Haoran Dou,Xiliang Zhu,Chen Ling,Zhong Yang,Lianying Liang,Jiuping Li,Siying Liang,Rui Li,Yan Cao,Yuhan Zhang,Jiewei Lai,Yongsong Zhou,Hongyu Zheng,Xinru Gao,Cheng Yu,Liling Shi,Mengqin Yuan,Honglong Li,Xiaoqiong Huang,Chaoyu Chen,Jialin Zhang,Wenxiong Pan,Alejandro F. Frangi,Guangzhi He,Xin Yang,Yi Xiong,Linliang Yin,Xuedong Deng,Dong Ni
备注:28 pages, 10 figures, 11 tables
摘要:口面裂是最常见的先天性颅面畸形之一,但由于缺乏经验丰富的专家和相对罕见的条件,准确的产前检测仍然具有挑战性。早期和可靠的诊断对于及时进行临床干预和降低相关发病率至关重要。在这里,我们展示了一个人工智能系统,在来自22家医院的9,215名胎儿的超过45,139张超声图像上进行训练,可以诊断胎儿口面裂,灵敏度和特异性分别超过93%和95%,与高级放射科医生的表现相匹配,并大大优于初级放射科医生。当用作医疗副驾驶员时,该系统将初级放射科医生的灵敏度提高了6%以上。除了直接的诊断帮助,该系统还加速了临床专业知识的发展。一项涉及24名放射科医生和学员的试点研究表明,该模型可以提高罕见疾病的专业知识发展。这种双重用途的方法提供了一种可扩展的解决方案,用于在经验丰富的放射科医生稀缺的情况下提高诊断准确性和专业培训。
摘要:Orofacial clefts are among the most common congenital craniofacial abnormalities, yet accurate prenatal detection remains challenging due to the scarcity of experienced specialists and the relative rarity of the condition. Early and reliable diagnosis is essential to enable timely clinical intervention and reduce associated morbidity. Here we show that an artificial intelligence system, trained on over 45,139 ultrasound images from 9,215 fetuses across 22 hospitals, can diagnose fetal orofacial clefts with sensitivity and specificity exceeding 93% and 95% respectively, matching the performance of senior radiologists and substantially outperforming junior radiologists. When used as a medical copilot, the system raises junior radiologists' sensitivity by more than 6%. Beyond direct diagnostic assistance, the system also accelerates the development of clinical expertise. A pilot study involving 24 radiologists and trainees demonstrated that the model can improve the expertise development for rare conditions. This dual-purpose approach offers a scalable solution for improving both diagnostic accuracy and specialist training in settings where experienced radiologists are scarce.


【2】FuseDiff: Symmetry-Preserving Joint Diffusion for Dual-Target Structure-Based Drug Design
标题:FuseDiff:基于双目标结构的药物设计的保持对称的联合扩散
链接:https://arxiv.org/abs/2603.05567

作者:Jianliang Wu,Anjie Qiao,Zhen Wang,Zhewei Wei,Sheng Chen
摘要:基于双靶点结构的药物设计旨在产生单个配体以及两个口袋特异性结合位姿,每个结合位姿均与相应的靶口袋兼容,从而实现具有更高功效和更低耐药性的多药疗法。现有的方法通常依赖于分级管道,其通过条件独立性假设将两个姿势解耦或强制执行过于严格的相关性,因此无法联合生成两个目标特定的绑定模式。为了解决这个问题,我们提出了FuseDiff,一个端到端的扩散模型,共同生成一个配体分子图和两个口袋的条件下的口袋特异性结合姿势。FuseDiff具有一个带有双目标局部上下文融合(DLCF)的消息传递主干,该主干融合了来自两个口袋的每个配体原子的局部上下文,以实现表达性联合建模,同时保留所需的对称性。与显式键生成一起,FuseDiff在共享图下强制两个姿势的拓扑一致性,同时允许每个口袋中的目标特定几何自适应。为了支持有原则的训练和评估,我们导出了一个双目标训练集,并使用一个独立的测试集进行评估。基准测试和真实双目标系统上的实验表明,FuseDiff实现了最先进的对接性能,并能够在基于对接的姿态搜索之前首次系统评估双目标姿态质量。
摘要:Dual-target structure-based drug design aims to generate a single ligand together with two pocket-specific binding poses, each compatible with a corresponding target pocket, enabling polypharmacological therapies with improved efficacy and reduced resistance. Existing approaches typically rely on staged pipelines, which either decouple the two poses via conditional-independence assumptions or enforce overly rigid correlations, and therefore fail to jointly generate two target-specific binding modes. To address this, we propose FuseDiff, an end-to-end diffusion model that jointly generates a ligand molecular graph and two pocket-specific binding poses conditioned on both pockets. FuseDiff features a message-passing backbone with Dual-target Local Context Fusion (DLCF), which fuses each ligand atom's local context from both pockets to enable expressive joint modeling while preserving the desired symmetries. Together with explicit bond generation, FuseDiff enforces topological consistency across the two poses under a shared graph while allowing target-specific geometric adaptation in each pocket. To support principled training and evaluation, we derive a dual-target training set and use an independent held-out test set for evaluation. Experiments on the benchmark and a real-world dual-target system show that FuseDiff achieves state-of-the-art docking performance and enables the first systematic assessment of dual-target pose quality prior to docking-based pose search.


自动驾驶|车辆|车道检测等(1篇)

【1】Spatiotemporal Heterogeneity of AI-Driven Traffic Flow Patterns and Land Use Interaction: A GeoAI-Based Analysis of Multimodal Urban Mobility
标题:人工智能驱动的交通流模式与土地利用相互作用的时空异质性:基于GeoAI的多模式城市交通分析
链接:https://arxiv.org/abs/2603.05581

作者:Olaf Yunus Laitinen Imanov
备注:13 pages, 7 figures, 9 tables. Submitted to Computers, Environment and Urban Systems (Elsevier)
摘要:城市交通流是由土地利用结构和时空异质性交通需求之间复杂的、非线性的相互作用决定的。传统的全局回归和时间序列模型不能同时捕捉多种出行方式上的这些多尺度动态。该研究提出了一个GeoAI混合分析框架,依次集成多尺度地理加权回归(MGWR)、随机森林(RF)和时空图卷积网络(ST-GCN),以建模交通流模式的时空异质性及其与土地利用在三种出行方式(机动车、公共交通和主动交通)上的相互作用。将该框架应用于跨越两种截然不同城市形态的六个城市350个交通分析区的经验校准数据集,得到四个关键发现:(i)GeoAI Hybrid实现了0.119的均方根误差(RMSE)和0.891的R^2,优于所有基准23-62%;(ii)SHAP分析确定土地利用混合度是机动车流量的最强预测因素,公交站点密度是公共交通的最强预测因素;(iii)DBSCAN聚类识别出五种功能不同的城市交通类型,轮廓系数为0.71,且GeoAI Hybrid残差的Moran's I=0.218(p<0.001),相对于OLS基线降低了72%;(iv)跨城市迁移实验显示适度的簇内可迁移性(R^2>=0.78)和有限的跨簇泛化性,凸显了城市形态背景的首要性。该框架为规划者和交通工程师提供了一个可解释、可扩展的工具包,用于基于证据的多式联运管理和土地利用政策设计。
摘要:Urban traffic flow is governed by the complex, nonlinear interaction between land use configuration and spatiotemporally heterogeneous mobility demand. Conventional global regression and time-series models cannot simultaneously capture these multi-scale dynamics across multiple travel modes. This study proposes a GeoAI Hybrid analytical framework that sequentially integrates Multiscale Geographically Weighted Regression (MGWR), Random Forest (RF), and Spatio-Temporal Graph Convolutional Networks (ST-GCN) to model the spatiotemporal heterogeneity of traffic flow patterns and their interaction with land use across three mobility modes: motor vehicle, public transit, and active transport. Applying the framework to an empirically calibrated dataset of 350 traffic analysis zones across six cities spanning two contrasting urban morphologies, four key findings emerge: (i) the GeoAI Hybrid achieves a root mean squared error (RMSE) of 0.119 and an R^2 of 0.891, outperforming all benchmarks by 23-62%; (ii) SHAP analysis identifies land use mix as the strongest predictor for motor vehicle flows and transit stop density as the strongest predictor for public transit; (iii) DBSCAN clustering identifies five functionally distinct urban traffic typologies with a silhouette score of 0.71, and GeoAI Hybrid residuals exhibit Moran's I=0.218 (p<0.001), a 72% reduction relative to OLS baselines; and (iv) cross-city transfer experiments reveal moderate within-cluster transferability (R^2>=0.78) and limited cross-cluster generalisability, underscoring the primacy of urban morphological context. The framework offers planners and transportation engineers an interpretable, scalable toolkit for evidence-based multimodal mobility management and land use policy design.
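文中用于度量残差空间自相关的Moran's I统计量定义明确,可直接按定义计算(以下为通用实现,并非论文代码;w为空间权重矩阵):

    import numpy as np

    def morans_i(x, w):
        """x: (n,) 各分区的残差; w: (n,n) 空间权重矩阵(对角线为0)。"""
        n = len(x)
        z = x - x.mean()
        num = n * (w * np.outer(z, z)).sum()
        den = w.sum() * (z ** 2).sum()
        return num / den  # 接近0表示残差近乎空间随机,正值表示空间聚集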


点云|SLAM|雷达|激光|深度RGBD相关(1篇)

【1】Why Depth Matters in Parallelizable Sequence Models: A Lie Algebraic View
标题:为什么深度在可并行化序列模型中很重要:李代数观点
链接:https://arxiv.org/abs/2603.05573

作者:Gyuryang Heo,Timothy Ngotiaoco,Kazuki Irie,Samuel J. Gershman,Bernardo Sabatini
摘要:可扩展的序列模型,如Transformer变体和结构化状态空间模型,常常以表达能力换取序列级并行性,从而实现高效训练。在这里,我们从李代数控制的视角,研究当模型运行在其表达能力范围之外时误差的界以及误差如何随规模变化。我们的理论建立了序列模型深度与李代数扩张塔之间的对应关系。呼应最近的理论研究,我们刻画了常数深度序列模型所对应的李代数类及其相应的表达能力界。此外,我们解析地推导出近似误差界,并表明误差随深度增加呈指数级减小,这与这些模型强劲的实证表现一致。我们通过符号词和连续值状态跟踪问题上的实验验证了我们的理论预测。
摘要:Scalable sequence models, such as Transformer variants and structured state-space models, often trade expressivity power for sequence-level parallelism, which enables efficient training. Here we examine the bounds on error and how error scales when models operate outside of their expressivity regimes using a Lie-algebraic control perspective. Our theory formulates a correspondence between the depth of a sequence model and the tower of Lie algebra extensions. Echoing recent theoretical studies, we characterize the Lie-algebraic class of constant-depth sequence models and their corresponding expressivity bounds. Furthermore, we analytically derive an approximation error bound and show that error diminishes exponentially as the depth increases, consistent with the strong empirical performance of these models. We validate our theoretical predictions using experiments on symbolic word and continuous-valued state-tracking problems.


推理|分析|理解|解释(5篇)

【1】Adapter-Augmented Bandits for Online Multi-Constrained Multi-Modal Inference Scheduling
标题:用于在线多约束多模态推理调度的适配器增强Bandits
链接:https://arxiv.org/abs/2603.06403

作者:Xianzhi Zhang,Yue Xu,Yinlin Zhu,Di Wu,Yipeng Zhou,Miao Hu,Guocong Quan
摘要:Multi-modal large language model (MLLM) inference scheduling enables strong response quality under practical and heterogeneous budgets, beyond what a homogeneous single-backend setting can offer. Yet online MLLM task scheduling is nontrivial, as requests vary sharply in modality composition and latent reasoning difficulty, while execution backends incur distinct, time-varying costs due to system jitter and network variation. These coupled uncertainties pose two core challenges: deriving semantically faithful yet scheduling-relevant multi-modal task representations, and making low-overhead online decisions over irreversible multi-dimensional budgets. Accordingly, we propose \emph{M-CMAB} (\underline{M}ulti-modal \underline{M}ulti-constraint \underline{C}ontextual \underline{M}ulti-\underline{A}rmed \underline{B}andit), a multi-adapter-enhanced MLLM inference scheduling framework with three components: (i) a CLS-attentive, frozen-backbone \emph{Predictor} that extracts compact task representations and updates only lightweight adapters for action-specific estimation; (ii) a primal-dual \emph{Constrainer} that maintains online Lagrange multipliers to enforce long-horizon constraints via per-round objectives; and (iii) a two-phase \emph{Scheduler} that balances exploration and exploitation under irreversible budgets. We establish a regret guarantee under multi-dimensional knapsack constraints. On a composite multimodal benchmark with heterogeneous backends, \emph{M-CMAB} consistently outperforms state-of-the-art baselines across budget regimes, achieving up to 14.18% higher reward and closely tracking an oracle-aided upper bound. Codes are available at https://anonymous.4open.science/r/M2CMAB/.
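其中原始-对偶Constrainer的核心是为每个长期约束维护一个拉格朗日乘子,并按约束违反量在线更新。下面是这一通用机制的最小示意(并非论文实现;步长eta与接口均为假设):

    import numpy as np

    class PrimalDualConstrainer:
        def __init__(self, n_constraints, eta=0.01):
            self.lam = np.zeros(n_constraints)  # 各约束的拉格朗日乘子
            self.eta = eta

        def penalized_reward(self, reward, costs):
            # 每轮目标 = 奖励 - 乘子加权的多维资源消耗
            return reward - self.lam @ costs

        def update(self, costs, budgets_per_round):
            # 投影次梯度上升:消耗超出每轮预算则增大乘子,反之减小,并保持非负
            self.lam = np.maximum(0.0, self.lam + self.eta * (costs - budgets_per_round))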


【2】Agentic retrieval-augmented reasoning reshapes collective reliability under model variability in radiology question answering
标题:智能体式检索增强推理在模型变异性下重塑放射学问答的集体可靠性
链接:https://arxiv.org/abs/2603.06271

作者:Mina Farajiamiri,Jeta Sopa,Saba Afza,Lisa Adams,Felix Barajas Ordonez,Tri-Thien Nguyen,Mahshad Lotfinia,Sebastian Wind,Keno Bressem,Sven Nebelung,Daniel Truhn,Soroosh Tayebi Arasteh
摘要:智能体式检索增强推理管道越来越多地用于构建大型语言模型(LLM)将外部证据纳入临床决策支持的方式。这些系统迭代地检索精选的领域知识,并在选择答案之前将其综合成结构化报告。虽然这样的管道可以提高性能,但其对模型变异性下可靠性的影响仍不清楚。在实际部署中,异构模型可能以准确率无法捕捉的方式对齐、分歧或同步出错。我们在169个专家精选的公开放射学问题上评估了34个LLM,比较zero-shot推理与放射学特定的多步智能体检索条件,后者中所有模型都收到来自精选放射学知识的相同结构化证据报告。智能体推理降低了模型间决策离散度(中位熵0.48对0.13),并提高了跨模型正确性的鲁棒性(平均值0.74对0.81)。总体上,多数共识也有所增加(P<0.001)。在两种策略下,共识强度和稳健正确性仍然相关(zero-shot:$\rho$=0.88;agentic:$\rho$=0.87),尽管高度一致并不能保证正确性。回答的冗长程度与正确性没有显示出有意义的关联。在572个错误输出中,72%与中度或重度的临床评估严重程度相关,尽管评分者间一致性较低($\kappa$=0.02)。因此,智能体式检索与更集中的决策分布、更强的共识和更高的跨模型正确性鲁棒性相关联。这些发现表明,仅通过准确率或一致性来评估智能体系统可能并不总是足够,还需要对稳定性、跨模型鲁棒性和潜在临床影响进行补充分析,以刻画模型变异性下的可靠性。
摘要:Agentic retrieval-augmented reasoning pipelines are increasingly used to structure how large language models (LLMs) incorporate external evidence in clinical decision support. These systems iteratively retrieve curated domain knowledge and synthesize it into structured reports before answer selection. Although such pipelines can improve performance, their impact on reliability under model variability remains unclear. In real-world deployment, heterogeneous models may align, diverge, or synchronize errors in ways not captured by accuracy. We evaluated 34 LLMs on 169 expert-curated publicly available radiology questions, comparing zero-shot inference with a radiology-specific multi-step agentic retrieval condition in which all models received identical structured evidence reports derived from curated radiology knowledge. Agentic inference reduced inter-model decision dispersion (median entropy 0.48 vs. 0.13) and increased robustness of correctness across models (mean 0.74 vs. 0.81). Majority consensus also increased overall (P<0.001). Consensus strength and robust correctness remained correlated under both strategies ($\rho$=0.88 for zero-shot; $\rho$=0.87 for agentic), although high agreement did not guarantee correctness. Response verbosity showed no meaningful association with correctness. Among 572 incorrect outputs, 72% were associated with moderate or high clinically assessed severity, although inter-rater agreement was low ($\kappa$=0.02). Agentic retrieval therefore was associated with more concentrated decision distributions, stronger consensus, and higher cross-model robustness of correctness. These findings suggest that evaluating agentic systems through accuracy or agreement alone may not always be sufficient, and that complementary analyses of stability, cross-model robustness, and potential clinical impact are needed to characterize reliability under model variability.
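文中的“模型间决策离散度”可用各题上模型答案分布的熵来度量;下面是一个直接的计算示意(假设每题收集34个模型的选项,与文中设置一致):

    import math
    from collections import Counter

    def decision_entropy(answers):
        """answers: 同一道题上各模型给出的选项列表,例如 ['A', 'A', 'C', ...]。"""
        n = len(answers)
        probs = [c / n for c in Counter(answers).values()]
        return -sum(p * math.log2(p) for p in probs)  # 0 表示所有模型完全一致

    # 对全部题目取熵的中位数,即得到文中报告的“中位熵”这类汇总量。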


【3】Revisiting the (Sub)Optimality of Best-of-N for Inference-Time Alignment
标题:重新审视Best-of-N推理时对齐的(次)最优性
链接:https://arxiv.org/abs/2603.05739

作者:Ved Sriraman,Adam Block
备注:52 pages
摘要:Best-of-N(BoN)采样是一种广泛使用的语言模型推理时对齐方法:从参考模型中采样N个候选响应,并选择根据学习到的奖励模型预测奖励最高的那个。尽管其实际应用广泛,最近的理论工作表明它在统计上次优,且容易受到奖励黑客攻击,即模型利用学习到的奖励模型的弱点来获得高估计奖励、而非真正提高性能的过程。我们在比以往工作更贴近实践的假设下重新审视这个问题。特别地,与早期专注于期望真实奖励的分析不同(这在许多实际环境中可能没有意义),我们研究推理时对齐如何影响胜率,这一基于成对比较的度量更贴近奖励模型在实践中的训练和评估方式。我们证明,在对参考模型和学习到的奖励模型质量的最低限度条件下,适当调优的BoN在实现高胜率方面在计算和统计上都是最优的,这部分解释了其广泛的实际成功。由于BoN在这种情境下仍然容易受到奖励黑客攻击,我们提出一个简单实用的变体,可证明地消除奖励黑客攻击,同时保持最优的统计性能。最后,我们证明在考虑胜率时,先前的方法是可证明次优的,凸显了在分析推理时对齐方法时选择适当目标的重要性。
摘要:Best-of-N (BoN) sampling is a widely used inference-time alignment method for language models, whereby N candidate responses are sampled from a reference model and the one with the highest predicted reward according to a learned reward model is selected. Despite its widespread practical use, recent theoretical work has suggested that it is statistically suboptimal and vulnerable to reward hacking, the process by which models exploit weaknesses in the learned reward model to achieve high estimated reward without genuinely improving performance. We revisit this question under assumptions that more closely reflect practice than that of prior work. In particular, in contradistinction to earlier analyses that focused on expected true reward, which may not be meaningful in many practical settings, we investigate how inference-time alignment affects the win-rate, a pairwise comparison-based metric more closely aligned with how reward models are trained and evaluated in practice. We demonstrate that, under minimal conditions on the quality of the reference model and learned reward model, properly tuned BoN is both computationally and statistically optimal in achieving high win-rate, partially explaining its widespread practical success. Because BoN remains susceptible to reward-hacking in this setting, we propose a simple and practical variant that provably eliminates reward-hacking while maintaining optimal statistical performance. Finally, we show that prior approaches are provably suboptimal when considering win-rate, highlighting the importance of choosing appropriate objectives when analyzing inference-time alignment methods.
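BoN采样本身的过程极为简单,这也是其实际可操作性的来源。下面是一个最小示意(generate与reward_model为假设性接口):

    def best_of_n(prompt, generate, reward_model, n=16):
        """从参考模型采样 n 个候选,返回学习到的奖励模型打分最高者。"""
        candidates = [generate(prompt) for _ in range(n)]
        scores = [reward_model(prompt, c) for c in candidates]
        return candidates[max(range(n), key=scores.__getitem__)]
        # 文中指出:合理调节 n 可在胜率意义下同时达到计算与统计最优;
        # 但朴素 BoN 仍可能被“奖励黑客”利用,需对选择规则做进一步修正。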


【4】Prediction-Powered Conditional Inference
标题:预测动力条件推理
链接:https://arxiv.org/abs/2603.05575

作者:Yang Sui,Jin Zhou,Hua Zhou,Xiaowu Dai
摘要:我们研究了在标记数据稀缺,未标记协变量丰富,并且黑盒机器学习预测器可用的情况下的预测动力条件推理。目标是对在固定测试点评估的条件泛函(例如条件均值)执行统计推断,而不对条件关系施加参数模型。我们的方法结合了本地化与基于预测的方差减少。首先,我们介绍了一种基于再生核的本地化方法,该方法从协变量中学习数据自适应权重函数,并将测试点处的目标条件矩重新表示为加权无条件矩。其次,我们通过对该局部时刻进行基于校正的分解来整合机器学习预测,从而产生一个预测动力估计器和置信区间,当预测器提供信息时,可以减少方差,同时保持有效性,而不管预测器的准确性如何。我们建立了非渐近误差界和最小最大最优收敛速度的估计,证明逐点渐近正态性与一致的方差估计,并提供了一个明确的方差分解,其特征在于机器学习预测和未标记的协变量如何提高统计效率。模拟和真实数据集上的数值实验证明了有效的条件覆盖率和比其他方法更清晰的置信区间。
摘要:We study prediction-powered conditional inference in the setting where labeled data are scarce, unlabeled covariates are abundant, and a black-box machine-learning predictor is available. The goal is to perform statistical inference on conditional functionals evaluated at a fixed test point, such as conditional means, without imposing a parametric model for the conditional relationship. Our approach combines localization with prediction-based variance reduction. First, we introduce a reproducing kernel-based localization method that learns a data-adaptive weight function from covariates and reformulates the target conditional moment at the test point as a weighted unconditional moment. Second, we incorporate machine-learning predictions through a correction-based decomposition of this localized moment, yielding a prediction-powered estimator and confidence interval that reduce variance when the predictor is informative while preserving validity regardless of predictor accuracy. We establish nonasymptotic error bounds and minimax-optimal convergence rates for the resulting estimator, prove pointwise asymptotic normality with consistent variance estimation, and provide an explicit variance decomposition that characterizes how machine-learning predictions and unlabeled covariates improve statistical efficiency. Numerical experiments on simulated and real datasets demonstrate valid conditional coverage and substantially sharper confidence intervals than alternative methods.
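作为背景,经典的“预测驱动”均值估计量如下:用大量未标注样本上的模型预测给出主项,再用少量已标注样本估计并校正预测偏差;无论预测器准确与否,校正项都保证了无偏性。本文在此基础上引入核局部化权重以处理条件泛函(下列代码只示意最简单的无条件情形):

    import numpy as np

    def prediction_powered_mean(y_lab, f_lab, f_unlab):
        """y_lab: 已标注标签; f_lab / f_unlab: 模型在已标注 / 未标注样本上的预测。"""
        rectifier = np.mean(y_lab - f_lab)    # 用标注数据估计预测偏差(校正项)
        return np.mean(f_unlab) + rectifier   # 未标注预测均值(主项)+ 校正项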


【5】Machine Learning for analysis of Multiple Sclerosis cross-tissue bulk and single-cell transcriptomics data
标题:机器学习用于分析多发性硬化跨组织批量和单细胞转录组学数据
链接:https://arxiv.org/abs/2603.05572

作者:Francesco Massafra,Samuele Punzo,Silvia Giulia Galfré,Alessandro Maglione,Simone Pernice,Stefano Forti,Simona Rolla,Marco Beccuti,Marinella Clerico,Corrado Priami,Alina Sîrbu
摘要:多发性硬化症(MS)是一种中枢神经系统的慢性自身免疫性疾病,其分子机制仍不完全清楚。在这项研究中,我们开发了一个端到端的机器学习管道来分析来自外周血单核细胞和脑脊液的转录组数据,整合了批量微阵列和单细胞RNA测序数据集(集中在CD 4+和B细胞上)。经过严格的预处理,批量校正和基因去聚类,XGBoost分类器被训练来区分MS患者和健康对照。采用可解释的人工智能工具SHapley Additive exPlanations(SHAP)来识别驱动分类的关键基因,并将结果与差异表达分析(DEA)进行比较。通过相互作用网络和途径富集分析进一步研究SHAP优先基因。这些模型取得了强劲的性能,特别是在CSF B细胞(AUC=0.94)和微阵列(AUC=0.86)中。SHAP基因选择被证明是经典DEA的补充。在多个数据集中识别的基因簇突出了免疫激活,非经典免疫检查点(ITK,CLEC 2D,KLRG 1,CEACAM 1),核糖体和翻译程序,泛素-蛋白酶体调节,脂质运输和EB病毒相关途径。我们的综合性和可解释的框架揭示了传统分析之外的互补见解,并为MS发病机制提供了新的机制假设和潜在的生物标志物。
摘要:Multiple Sclerosis (MS) is a chronic autoimmune disease of the central nervous system whose molecular mechanisms remain incompletely understood. In this study, we developed an end-to-end machine learning pipeline to analyze transcriptomic data from peripheral blood mononuclear cells and cerebrospinal fluid, integrating both bulk microarray and single-cell RNA sequencing datasets (concentrating on CD4+ and B-cells). After rigorous preprocessing, batch correction, and gene declustering, XGBoost classifiers were trained to distinguish MS patients from healthy controls. Explainable AI tools, namely SHapley Additive exPlanations (SHAP), were employed to identify key genes driving classification, and results were compared with Differential Expression Analysis (DEA). SHAP-prioritized genes were further investigated through interaction networks and pathway enrichment analyses. The models achieved strong performance, particularly in CSF B-cells (AUC=0.94) and microarray (AUC=0.86). SHAP gene selection proved to be complementary to classical DEA. Gene clusters identified across multiple datasets highlighted immune activation, non-canonical immune checkpoints (ITK, CLEC2D, KLRG1, CEACAM1), ribosomal and translational programs, ubiquitin-proteasome regulation, lipid trafficking, and Epstein-Barr virus-related pathways. Our integrative and explainable framework reveals complementary insights beyond conventional analysis and provides novel mechanistic hypotheses and potential biomarkers for MS pathogenesis.
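流程中“训练XGBoost分类器并用SHAP给基因排序”两步可用如下玩具示意复现(Python);表达矩阵为随机生成的假设数据,超参数亦为示意,预处理、批量校正与通路富集等步骤从略:

import numpy as np
import xgboost
import shap

rng = np.random.default_rng(0)
X = rng.random((100, 500))        # toy expression matrix: samples x genes
y = rng.integers(0, 2, size=100)  # 1 = MS patient, 0 = healthy control

model = xgboost.XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
model.fit(X, y)
shap_values = shap.TreeExplainer(model).shap_values(X)  # samples x genes
# Rank genes by mean absolute SHAP value, complementary to DEA.
top_genes = np.argsort(np.abs(shap_values).mean(axis=0))[::-1][:20]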


检测相关(2篇)

【1】DQE: A Semantic-Aware Evaluation Metric for Time Series Anomaly Detection
标题:DQE:时间序列异常检测的语义感知评估指标
链接:https://arxiv.org/abs/2603.06131

作者:Yuewei Li,Dalin Zhang,Huan Li,Xinyi Gong,Hongjun Chu,Zhaohui Song
摘要:时间序列异常检测在近年来取得了显著的进展。然而,评价做法尽管至关重要,但得到的关注相对较少。现有的度量表现出几个局限性:(1)偏向点级覆盖,(2)对接近命中(near-miss)的检测不敏感或不一致,(3)对误报惩罚不足,以及(4)阈值或阈值区间选择引起的不一致性。这些限制可能产生不可靠或违反直觉的结果,阻碍客观进展。在这项工作中,我们从检测语义的角度重新审视了时间序列异常检测的评估,并提出了一个用于更全面评估的新度量。我们首先介绍了一种基于检测语义的分区策略,该策略将每个异常的局部时间区域分解为三个功能不同的子区域。使用这种划分,我们评估跨事件的整体检测行为,并为每个子区域设计更细粒度的评分机制,从而实现更可靠和可解释的评估。通过对现有指标的系统研究,我们确定了与阈值区间选择相关的评估偏差,并采用了一种在整个阈值范围内聚合检测质量的方法,从而消除评估不一致性。对合成数据和真实数据的广泛实验表明,与10个广泛使用的指标相比,我们的度量提供了稳定、有区分度且可解释的评价,并实现了更稳健的评估。
摘要:Time series anomaly detection has achieved remarkable progress in recent years. However, evaluation practices have received comparatively less attention, despite their critical importance. Existing metrics exhibit several limitations: (1) bias toward point-level coverage, (2) insensitivity or inconsistency in near-miss detections, (3) inadequate penalization of false alarms, and (4) inconsistency caused by threshold or threshold-interval selection. These limitations can produce unreliable or counterintuitive results, hindering objective progress. In this work, we revisit the evaluation of time series anomaly detection from the perspective of detection semantics and propose a novel metric for more comprehensive assessment. We first introduce a partitioning strategy grounded in detection semantics, which decomposes the local temporal region of each anomaly into three functionally distinct subregions. Using this partitioning, we evaluate overall detection behavior across events and design finer-grained scoring mechanisms for each subregion, enabling more reliable and interpretable assessment. Through a systematic study of existing metrics, we identify an evaluation bias associated with threshold-interval selection and adopt an approach that aggregates detection qualities across the full threshold spectrum, thereby eliminating evaluation inconsistency. Extensive experiments on synthetic and real-world data demonstrate that our metric provides stable, discriminative, and interpretable evaluation, while achieving robust assessment compared with ten widely used metrics.


【2】From Decoupled to Coupled: Robustness Verification for Learning-based Keypoint Detection with Joint Specifications
标题:从去耦合到耦合:具有联合规范的基于学习的关键点检测的鲁棒性验证
链接:https://arxiv.org/abs/2603.05604

作者:Xusheng Luo,Changliu Liu
备注:21 pages, 4 figures, 9 tables. arXiv admin note: text overlap with arXiv:2408.00117
摘要:关键点检测是许多视觉任务的基础,包括姿态估计、视点恢复和3D重建,但现代神经模型仍然容易受到小输入扰动的影响。尽管它的重要性,正式的鲁棒性验证关键点检测器在很大程度上是未开发的,由于高维输入和连续的坐标输出。我们针对基于热图的关键点检测器提出了第一个耦合鲁棒性验证框架,该框架限制了所有关键点的联合偏差,捕获它们的相互依赖性和下游任务要求。与之前的解耦,分类风格的方法,独立验证每个关键点,并产生保守的保证,我们的方法验证集体行为。我们使用混合整数线性规划(MILP)将验证制定为证伪问题,该混合整数线性规划将可达热图集与多面体编码联合偏差约束相结合。不可行性证明了鲁棒性,而可行性提供了反例,我们证明了该方法是合理的:如果它证明模型是鲁棒的,那么关键点检测模型就保证是鲁棒的。实验表明,我们的耦合方法实现了高的验证率,并保持有效的严格的错误阈值下解耦的方法失败。
摘要:Keypoint detection underpins many vision tasks, including pose estimation, viewpoint recovery, and 3D reconstruction, yet modern neural models remain vulnerable to small input perturbations. Despite its importance, formal robustness verification for keypoint detectors is largely unexplored due to high-dimensional inputs and continuous coordinate outputs. We propose the first coupled robustness verification framework for heatmap-based keypoint detectors that bounds the joint deviation across all keypoints, capturing their interdependencies and downstream task requirements. Unlike prior decoupled, classification-style approaches that verify each keypoint independently and yield conservative guarantees, our method verifies collective behavior. We formulate verification as a falsification problem using a mixed-integer linear program (MILP) that combines reachable heatmap sets with a polytope encoding joint deviation constraints. Infeasibility certifies robustness, while feasibility provides counterexamples, and we prove the method is sound: if it certifies the model as robust, then the keypoint detection model is guaranteed to be robust. Experiments show that our coupled approach achieves high verified rates and remains effective under strict error thresholds where decoupled methods fail.


分类|识别(3篇)

【1】Tiny, Hardware-Independent, Compression-based Classification
标题:微型、独立于硬件、基于压缩的分类
链接:https://arxiv.org/abs/2603.06359

作者:Charles Meyers,Aaron MacSween,Erik Elmroth,Tommy Löfstedt
摘要 :机器学习的最新发展凸显了在线平台与用户在隐私方面的冲突。随着监管机构和运营商试图对在线平台进行监管,用户隐私的重要性以及对用户数据权力的争夺已经加剧。随着用户越来越意识到隐私问题,客户端数据存储、管理和分析已成为大规模集中式机器学习的首选方法。然而,最先进的机器学习方法需要大量的标记用户数据,这使得它们不适合驻留在客户端并且只能访问单个用户数据的模型。最先进的方法在计算上也是昂贵的,这降低了在计算有限的硬件上的用户体验,并且还减少了电池寿命。最近的一种替代方法已经证明在各种数据的分类任务中非常成功-使用基于压缩的距离度量(称为归一化压缩距离)来测量经典基于距离的机器学习方法中通用对象之间的距离。在这项工作中,我们证明了标准化压缩距离实际上不是一个度量标准;将其开发用于更广泛的内核方法背景,以允许对复杂数据进行建模;并提出了改进使用此距离度量的模型的训练时间的技术。我们证明了归一化压缩距离的工作以及有时比其他指标和内核更好-而只需要稍微多一点的计算成本,尽管缺乏正式的度量属性。最终结果是一个简单的模型,即使在非常少量的样本上训练,也具有非常高的准确性,允许模型足够小且有效,仅使用用户提供的数据即可完全在客户端设备上运行。
摘要:The recent developments in machine learning have highlighted a conflict between online platforms and their users in terms of privacy. The importance of user privacy and the struggle for power over user data has been intensified as regulators and operators attempt to police online platforms. As users have become increasingly aware of privacy issues, client-side data storage, management, and analysis have become a favoured approach to large-scale centralised machine learning. However, state-of-the-art machine learning methods require vast amounts of labelled user data, making them unsuitable for models that reside client-side and only have access to a single user's data. State-of-the-art methods are also computationally expensive, which degrades the user experience on compute-limited hardware and also reduces battery life. A recent alternative approach has proven remarkably successful in classification tasks across a wide variety of data -- using a compression-based distance measure (called normalised compression distance) to measure the distance between generic objects in classical distance-based machine learning methods. In this work, we demonstrate that the normalised compression distance is actually not a metric; develop it for the wider context of kernel methods to allow modelling of complex data; and present techniques to improve the training time of models that use this distance measure. We demonstrate that the normalised compression distance works as well as and sometimes better than other metrics and kernels -- while requiring only marginally more computational costs and in spite of the lack of formal metric properties. The end result is a simple model with remarkable accuracy even when trained on a very small number of samples, allowing for models that are small and effective enough to run entirely on a client device using only user-supplied data.
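归一化压缩距离有标准的闭式定义,可由任意现成压缩器实现;下面给出基于zlib的最小示意(Python),可直接代入k近邻等经典基于距离的方法:

import zlib

def ncd(x: bytes, y: bytes) -> float:
    # NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    # where C(.) is the compressed length under zlib.
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Example use: label a sample by its NCD-nearest labelled neighbour.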


【2】Mitigating Bias in Concept Bottleneck Models for Fair and Interpretable Image Classification
标题:缓解公平和可解释图像分类的概念瓶颈模型中的偏差
链接:https://arxiv.org/abs/2603.05899

作者:Schrasing Tong,Antoine Salaun,Vincent Yuan,Annabel Adeyeri,Lalana Kagal
摘要:确保图像分类的公平性可以防止模型永久化和放大偏见。概念瓶颈模型(CBM)在通过稀疏的单层分类器进行预测之前,将图像映射到高级的人类可解释的概念。这种结构增强了可解释性,并且在理论上通过屏蔽敏感属性代理(如面部特征)来支持公平性。然而,CBM概念已经被称为泄漏与概念语义无关的信息,早期的结果显示,在像ImSitu这样的数据集上,性别偏见只略有减少。我们提出了三个偏见缓解技术,以提高公平性的建立信任措施:1。使用top-k概念过滤器减少信息泄漏,2.消除偏见的概念,3。对抗性去偏见。我们的结果优于以前的工作在公平性和性能的权衡,表明我们的debiased CBM提供了一个重要的一步,公平和可解释的图像分类。
摘要:Ensuring fairness in image classification prevents models from perpetuating and amplifying bias. Concept bottleneck models (CBMs) map images to high-level, human-interpretable concepts before making predictions via a sparse, one-layer classifier. This structure enhances interpretability and, in theory, supports fairness by masking sensitive attribute proxies such as facial features. However, CBM concepts have been known to leak information unrelated to concept semantics and early results reveal only marginal reductions in gender bias on datasets like ImSitu. We propose three bias mitigation techniques to improve fairness in CBMs: 1. Decreasing information leakage using a top-k concept filter, 2. Removing biased concepts, and 3. Adversarial debiasing. Our results outperform prior work in terms of fairness-performance tradeoffs, indicating that our debiased CBM provides a significant step towards fair and interpretable image classification.
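对第一项技术(top-k概念过滤)的一种直接读法是:每个样本只保留激活最强的k个概念、其余置零,以收窄与概念语义无关的信息泄漏通道;下面是一个最小示意(PyTorch),具体实现细节可能与论文不同:

import torch

def topk_concept_filter(concepts: torch.Tensor, k: int) -> torch.Tensor:
    # Keep each sample's k strongest concept activations, zero the rest.
    idx = torch.topk(concepts.abs(), k, dim=-1).indices
    mask = torch.zeros_like(concepts).scatter_(-1, idx, 1.0)
    return concepts * mask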


【3】Robust support vector model based on bounded asymmetric elastic net loss for binary classification
标题:基于有界非对称弹性网损失的二元分类鲁棒支持向量模型
链接:https://arxiv.org/abs/2603.06257

作者:Haiyan Du,Hu Yang
摘要:本文提出了一种新的有界非对称弹性网络损失函数($L_{baen}$),并将其与支持向量机相结合,得到了BAEN-SVM。$L_{baen}$是有界的和非对称的,并且可以退化为非对称弹性网铰链损失、弹球损失和非对称最小二乘损失。BAEN-SVM不仅有效地处理了噪声污染的数据,而且解决了传统SVM中的几何不合理性。通过证明BAEN-SVM的违规容限上界(VTUB),我们证明了该模型是几何定义良好的。此外,我们推导出BAEN-SVM的影响函数是有界的,从理论上保证了其对噪声的鲁棒性。模型的Fisher相合性进一步保证了模型的泛化能力。由于$L_{baen}$损失是非凸的,我们设计了一个基于裁剪对偶坐标下降的半二次型算法来有效地解决非凸优化问题。人工和基准数据集上的实验结果表明,该方法优于经典和先进的支持向量机,特别是在噪声环境中。
摘要:In this paper, we propose a novel bounded asymmetric elastic net ($L_{baen}$) loss function and combine it with the support vector machine (SVM), resulting in the BAEN-SVM. The $L_{baen}$ is bounded and asymmetric and can degrade to the asymmetric elastic net hinge loss, pinball loss, and asymmetric least squares loss. BAEN-SVM not only effectively handles noise-contaminated data but also addresses the geometric irrationalities in the traditional SVM. By proving the violation tolerance upper bound (VTUB) of BAEN-SVM, we show that the model is geometrically well-defined. Furthermore, we derive that the influence function of BAEN-SVM is bounded, providing a theoretical guarantee of its robustness to noise. The Fisher consistency of the model further ensures its generalization capability. Since the $L_{baen}$ loss is non-convex, we design a clipping dual coordinate descent-based half-quadratic algorithm to solve the non-convex optimization problem efficiently. Experimental results on artificial and benchmark datasets indicate that the proposed method outperforms classical and advanced SVMs, particularly in noisy environments.
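摘要未给出$L_{baen}$的闭式表达,这里仅写出它可退化到的两个已知特例作参照,即标准弹球(分位数)损失与非对称最小二乘(期望分位数)损失(Python):

import numpy as np

def pinball(u, tau=0.7):
    # Standard pinball (quantile) loss, one special case named above.
    return np.where(u >= 0, tau * u, (tau - 1.0) * u)

def asymmetric_least_squares(u, tau=0.7):
    # Asymmetric least squares (expectile) loss, another special case.
    return np.where(u >= 0, tau * u ** 2, (1.0 - tau) * u ** 2)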


表征(1篇)

【1】CLAIRE: Compressed Latent Autoencoder for Industrial Representation and Evaluation -- A Deep Learning Framework for Smart Manufacturing
标题:CLAIRE:用于工业表示和评估的压缩潜在自动编码器--智能制造的深度学习框架
链接:https://arxiv.org/abs/2603.06361

作者:Mohammadhossein Ghahramani,Mengchu Zhou
备注:13 pages. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2026
摘要:由于传感器数据固有的复杂性、噪声和冗余性,高维工业环境中的准确故障检测仍然是一个重大挑战。本文介绍了CLAIRE,即,一个混合端到端学习框架,将无监督深度表示学习与监督分类相结合,用于智能制造系统中的智能质量控制。它采用优化的深度自动编码器将原始输入转换为紧凑的潜在空间,有效地捕获内在数据结构,同时抑制不相关或噪声特征。然后将学习的表示馈送到下游分类器中以执行二进制故障预测。在高维数据集上的实验结果表明,CLAIRE显著优于直接在原始特征上训练的传统分类器。此外,该框架采用了一个事后阶段,使用基于博弈论的可解释性技术,分析潜在的空间,并确定最翔实的输入功能,有助于故障预测。拟议的框架强调了将可解释的人工智能与特征感知正则化集成以实现鲁棒故障检测的潜力。所提出的框架的模块化和可解释性使其具有高度的适应性,在以复杂的高维数据为特征的其他领域(如医疗保健,金融和环境监测)中提供有前途的应用。
摘要 :Accurate fault detection in high-dimensional industrial environments remains a major challenge due to the inherent complexity, noise, and redundancy in sensor data. This paper introduces CLAIRE, i.e., a hybrid end-to-end learning framework that integrates unsupervised deep representation learning with supervised classification for intelligent quality control in smart manufacturing systems. It employs an optimized deep autoencoder to transform raw input into a compact latent space, effectively capturing the intrinsic data structure while suppressing irrelevant or noisy features. The learned representations are then fed into a downstream classifier to perform binary fault prediction. Experimental results on a high-dimensional dataset demonstrate that CLAIRE significantly outperforms conventional classifiers trained directly on raw features. Moreover, the framework incorporates a post hoc phase, using a game-theory-based interpretability technique, to analyze the latent space and identify the most informative input features contributing to fault predictions. The proposed framework highlights the potential of integrating explainable AI with feature-aware regularization for robust fault detection. The modular and interpretable nature of the proposed framework makes it highly adaptable, offering promising applications in other domains characterized by complex, high-dimensional data, such as healthcare, finance, and environmental monitoring.


3D|3D重建等相关(2篇)

【1】3D CBCT Artefact Removal Using Perpendicular Score-Based Diffusion Models
标题:使用垂直分数扩散模型进行3D CBCT伪影去除
链接:https://arxiv.org/abs/2603.06300

作者:Susanne Schaub,Florentin Bieder,Matheus L. Oliveira,Yulan Wang,Dorothea Dagassan-Berndt,Michael M. Bornstein,Philippe C. Cattin
备注:Accepted at DGM4MICCAI 2025
摘要:锥形束计算机断层扫描(CBCT)是牙科中广泛使用的3D成像技术,可提供高分辨率图像,同时最大限度地减少患者的辐射暴露。然而,CBCT非常容易受到高密度物体(如牙科植入物)产生的伪影的影响,这可能会影响图像质量和诊断准确性。为了减少伪影,在投影序列中的植入修复在许多伪影减少方法中起着至关重要的作用。近年来,扩散模型在图像生成中取得了很好的效果,并被广泛应用于图像修复任务中。然而,据我们所知,现有的基于扩散的植入物修复方法在独立的2D投影上操作。这种方法忽略了各个投影之间的相关性,导致重建图像中的不一致性。为了解决这个问题,我们提出了一种基于垂直分数扩散模型的3D牙科种植体修复方法,每个模型在两个不同的平面中训练,并在投影域中操作。投影系列的3D分布通过在采样方案中组合两个2D基于分数的扩散模型来建模。我们的研究结果表明,该方法的有效性,在生产高品质,减少伪影的三维CBCT图像,使其成为一个有前途的解决方案,以改善临床成像。
摘要:Cone-beam computed tomography (CBCT) is a widely used 3D imaging technique in dentistry, offering high-resolution images while minimising radiation exposure for patients. However, CBCT is highly susceptible to artefacts arising from high-density objects such as dental implants, which can compromise image quality and diagnostic accuracy. To reduce artefacts, implant inpainting in the sequence of projections plays a crucial role in many artefact reduction approaches. Recently, diffusion models have achieved state-of-the-art results in image generation and have widely been applied to image inpainting tasks. However, to our knowledge, existing diffusion-based methods for implant inpainting operate on independent 2D projections. This approach neglects the correlations among individual projections, resulting in inconsistencies in the reconstructed images. To address this, we propose a 3D dental implant inpainting approach based on perpendicular score-based diffusion models, each trained in two different planes and operating in the projection domain. The 3D distribution of the projection series is modelled by combining the two 2D score-based diffusion models in the sampling scheme. Our results demonstrate the method's effectiveness in producing high-quality, artefact-reduced 3D CBCT images, making it a promising solution for improving clinical imaging.


【2】Latent Diffusion-Based 3D Molecular Recovery from Vibrational Spectra
标题:基于潜在扩散的振动光谱的3D分子恢复
链接:https://arxiv.org/abs/2603.06113

作者:Wenjin Wu,Aleš Leonardis,Linjiang Chen,Jianbo Jiao
备注:27 pages, 10 figures
摘要:红外(IR)光谱是一种振动光谱,广泛用于分子结构测定,并为化学家提供关键的结构信息。然而,现有的从红外光谱恢复分子结构的方法通常依赖于一维SMILES字符串或二维分子图,这些方法无法捕捉光谱特征和三维分子几何形状之间的复杂关系。扩散模型的最新进展极大地增强了在3D空间中生成分子结构的能力。然而,没有现有的模型已经探索了对应于单个IR光谱的3D分子几何形状的分布。在这项工作中,我们介绍了IR-GeoDiff,一个潜在的扩散模型,恢复3D分子的几何形状从红外光谱的光谱信息集成到节点和边缘表示的分子结构。我们从光谱和结构的角度评估IR-GeoDiff,证明其能够恢复对应于给定IR光谱的分子分布。此外,基于注意力的分析表明,该模型能够集中在红外光谱中的特征官能团区域,定性地与常见的化学解释实践相一致。
摘要:Infrared (IR) spectroscopy, a type of vibrational spectroscopy, is widely used for molecular structure determination and provides critical structural information for chemists. However, existing approaches for recovering molecular structures from IR spectra typically rely on one-dimensional SMILES strings or two-dimensional molecular graphs, which fail to capture the intricate relationship between spectral features and three-dimensional molecular geometry. Recent advances in diffusion models have greatly enhanced the ability to generate molecular structures in 3D space. Yet, no existing model has explored the distribution of 3D molecular geometries corresponding to a single IR spectrum. In this work, we introduce IR-GeoDiff, a latent diffusion model that recovers 3D molecular geometries from IR spectra by integrating spectral information into both node and edge representations of molecular structures. We evaluate IR-GeoDiff from both spectral and structural perspectives, demonstrating its ability to recover the molecular distribution corresponding to a given IR spectrum. Furthermore, an attention-based analysis reveals that the model is able to focus on characteristic functional group regions in IR spectra, qualitatively consistent with common chemical interpretation practices.


编码器(1篇)

【1】A Persistent-State Dataflow Accelerator for Memory-Bound Linear Attention Decode on FPGA
标题:一种用于FPGA上受内存限制的线性注意力解码的持久状态数据流加速器
链接:https://arxiv.org/abs/2603.05931

作者:Neelesh Gupta,Peter Wang,Rajgopal Kannan,Viktor K. Prasanna
备注:6 pages, 6 figures
摘要:Gated DeltaNet(GDN)是一种线性注意力机制,它用固定大小的递归状态替换不断增长的KV缓存。像Qwen 3-Next这样的混合LLM使用75%的GDN层,并实现了与仅关注模型竞争的准确性。然而,在batch-1,GDN解码在GPU上是内存受限的,因为完整的递归状态必须通过HBM每个令牌进行往返。我们发现,这个瓶颈是架构,而不是算法,因为所有次二次序列模型在解码时表现出低于1 FLOP/B的算术强度,使它们比标准的Transformers更受内存限制。我们提出了一个FPGA加速器,消除了这一瓶颈,保持完整的2 MB经常性的状态持续在片上BRAM,转换的工作负载从内存绑定到计算绑定。我们的设计融合GDN递归到一个五阶段的流水线数据路径,每个令牌的每个状态矩阵只执行一次读取和一次写入,利用分组值注意力进行双头并行,并通过并行流水线重叠准备,计算和输出存储。我们探索了四个设计点的AMD Alveo U 55 C使用Vitis HLS,不同的头级并行从2到16价值头每次迭代。我们的最快配置实现了每个令牌63 $μ$s,比NVIDIA H100 PCIe上的GPU参考快4.5$\times$。实施后的功耗分析报告片内功率为9.96 W,解码每个令牌的能效提高了60倍。
摘要 :Gated DeltaNet (GDN) is a linear attention mechanism that replaces the growing KV cache with a fixed-size recurrent state. Hybrid LLMs like Qwen3-Next use 75% GDN layers and achieve competitive accuracy to attention-only models. However, at batch-1, GDN decode is memory-bound on GPUs since the full recurrent state must be round-tripped through HBM every token. We show that this bottleneck is architectural, not algorithmic, as all subquadratic sequence models exhibit arithmetic intensities below 1 FLOP/B at decode time, making them more memory-bound than standard Transformers. We present an FPGA accelerator that eliminates this bottleneck by holding the full 2 MB recurrent state persistently in on-chip BRAM, converting the workload from memory-bound to compute-bound. Our design fuses the GDN recurrence into a five-phase pipelined datapath that performs only one read and one write pass over each state matrix per token, exploits Grouped Value Attention for paired-head parallelism, and overlaps preparation, computation, and output storage via dataflow pipelining. We explore four design points on an AMD Alveo U55C using Vitis HLS, varying head-level parallelism from 2 to 16 value-heads per iteration. Our fastest configuration achieves 63 $μ$s per token, 4.5$\times$ faster than the GPU reference on NVIDIA H100 PCIe. Post-implementation power analysis reports 9.96 W on-chip, yielding up to 60$\times$ greater energy efficiency per token decoded.
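摘要中“解码算术强度低于1 FLOP/B”的量级可以用一笔粗账核对;下面的估算里,2 MB状态取自摘要,fp16存储与每元素FLOP数为示意性假设:

state_bytes = 2 * 1024 * 1024   # 2 MB recurrent state (from the abstract)
elems = state_bytes // 2        # assuming fp16, 2 bytes per element
flops_per_elem = 2              # assumed multiply-add per element per token
bytes_moved = 2 * state_bytes   # one read + one write pass per token
print(flops_per_elem * elems / bytes_moved, "FLOP/B")  # ~0.5, below 1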


优化|敛散性(5篇)

【1】SAHOO: Safeguarded Alignment for High-Order Optimization Objectives in Recursive Self-Improvement
标题:SAHOO:递归自我改进中高阶优化目标的受保障对齐
链接:https://arxiv.org/abs/2603.06333

作者:Subramanyam Sahoo,Aman Chadha,Vinija Jain,Divya Chaudhary
备注:Published at ICLR 2026 Workshop on AI with Recursive Self-Improvement. 20 pages, 5 figures
摘要:递归的自我改进正在从理论走向实践:现代系统可以批评、修改和评估自己的输出,但迭代的自我修改可能会带来微妙的对齐漂移。我们介绍了SAHOO,这是一个实用的框架,通过三项保障措施来监控和控制漂移:(i)目标漂移指数(GDI),一个结合了语义、词汇、结构和分布测量的学习多信号检测器;(ii)约束保持检查,强制执行安全关键不变量,例如语法正确性和非幻觉;以及(iii)回归风险量化,以标记撤销先前增益的改进周期。在代码生成、数学推理和真实性的189个任务中,SAHOO产生了实质性的质量提升,包括代码任务提高了18.3%,推理提高了16.8%,同时在两个领域保持约束不被违反,并在真实性上保持较低的违规率。阈值在一个涵盖三个周期、包含18个任务的小型验证集上进行校准。我们进一步绘制了能力-对齐边界,显示早期改进周期效率较高,但后期对齐成本上升,并揭示了特定领域的张力,例如流畅性与真实性之间的权衡。因此,SAHOO使递归自我改进过程中的对齐保持变得可测量、可部署,并可在规模上进行系统验证。
摘要:Recursive self-improvement is moving from theory to practice: modern systems can critique, revise, and evaluate their own outputs, yet iterative self-modification risks subtle alignment drift. We introduce SAHOO, a practical framework to monitor and control drift through three safeguards: (i) the Goal Drift Index (GDI), a learned multi-signal detector combining semantic, lexical, structural, and distributional measures; (ii) constraint preservation checks that enforce safety-critical invariants such as syntactic correctness and non-hallucination; and (iii) regression-risk quantification to flag improvement cycles that undo prior gains. Across 189 tasks in code generation, mathematical reasoning, and truthfulness, SAHOO produces substantial quality gains, including 18.3 percent improvement in code tasks and 16.8 percent in reasoning, while preserving constraints in two domains and maintaining low violations in truthfulness. Thresholds are calibrated on a small validation set of 18 tasks across three cycles. We further map the capability-alignment frontier, showing efficient early improvement cycles but rising alignment costs later and exposing domain-specific tensions such as fluency versus factuality. SAHOO therefore makes alignment preservation during recursive self-improvement measurable, deployable, and systematically validated at scale.


【2】Agnostic learning in (almost) optimal time via Gaussian surface area
标题:通过高斯表面积在(几乎)最佳时间内进行不可知学习
链接:https://arxiv.org/abs/2603.06027

作者:Lucas Pesenti,Lucas Slot,Manuel Wiedmer
备注:20 pages
摘要:在困难的不可知模型中,在高斯边缘分布下学习一个概念类的复杂性与其低次多项式的$L_1$-逼近性密切相关。对于任何高斯表面积不超过$Γ$的概念类,Klivans等人(2008)证明了度$d = O(Γ^2 /\varepsilon^4)$足以实现$\varepsilon$-近似。这导致了学习各种概念类的复杂性的最著名的界限。在本文中,我们改进了他们的分析,证明了度$d = \tilde O(Γ^2 /\varepsilon^2)$就足够了。鉴于Diakonikolas等人(2021)的下界,这在统计查询模型中产生了不可知学习多项式阈值函数复杂性的(接近)最优界限。我们的证明依赖于Feldman等人(2020)的一个构造的直接类比,Feldman等人考虑了布尔超立方体上的$L_1$-近似。
摘要:The complexity of learning a concept class under Gaussian marginals in the difficult agnostic model is closely related to its $L_1$-approximability by low-degree polynomials. For any concept class with Gaussian surface area at most $Γ$, Klivans et al. (2008) show that degree $d = O(Γ^2 / \varepsilon^4)$ suffices to achieve an $\varepsilon$-approximation. This leads to the best-known bounds on the complexity of learning a variety of concept classes. In this note, we improve their analysis by showing that degree $d = \tilde O (Γ^2 / \varepsilon^2)$ is enough. In light of lower bounds due to Diakonikolas et al. (2021), this yields (near) optimal bounds on the complexity of agnostically learning polynomial threshold functions in the statistical query model. Our proof relies on a direct analogue of a construction of Feldman et al. (2020), who considered $L_1$-approximation on the Boolean hypercube.


【3】Omni-Masked Gradient Descent: Memory-Efficient Optimization via Mask Traversal with Improved Convergence
标题:全掩码梯度下降:通过掩码遍历实现具有改进收敛性的内存高效优化
链接:https://arxiv.org/abs/2603.05960

作者:Hui Yang,Tao Ren,Jinyang Jiang,Wan Tian,Yijie Peng
摘要:内存效率优化方法最近越来越受到关注,用于在GPU内存瓶颈下扩展大型语言模型的全参数训练。现有的方法要么缺乏明确的收敛性保证,要么只能达到标准的${\mathcal{O}}(ε^{-4})$迭代复杂性在非凸设置。我们提出了Omni-Masked Gradient Descent(OMGD),一种基于掩码遍历的优化方法,用于内存高效训练,并提供了一个非凸收敛分析,该分析建立了一个严格改进的迭代复杂度为$\tilde{\mathcal{O}}(ε^{-3})$,用于寻找$ε$-近似稳定点。从经验上讲,OMGD是一种轻量级的即插即用方法,可以无缝集成到大多数主流优化器中,在微调和预训练任务中都能获得与竞争基准相比的一致改进。
摘要:Memory-efficient optimization methods have recently gained increasing attention for scaling full-parameter training of large language models under the GPU-memory bottleneck. Existing approaches either lack clear convergence guarantees, or only achieve the standard ${\mathcal{O}}(ε^{-4})$ iteration complexity in the nonconvex settings. We propose Omni-Masked Gradient Descent (OMGD), an optimization method based on mask traversal for memory efficient training, and provide a nonconvex convergence analysis that establishes a strictly improved iteration complexity of $\tilde{\mathcal{O}}(ε^{-3})$ for finding an $ε$-approximate stationary point. Empirically, OMGD is a lightweight, plug-and-play approach that integrates seamlessly into most mainstream optimizers, yielding consistent improvements over competitive baselines in both fine-tuning and pre-training tasks.
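“掩码遍历”的一种示意性读法:每步只更新遍历调度选中的那一块参数,其余块的梯度与优化器状态开销得以省去。下面是一个最小示意(PyTorch),分块方式与调度均为假设,论文的实际构造可能不同:

import torch

def masked_block_step(params, grads, lr, step, num_blocks=4):
    # Update only the parameter block selected by the traversal schedule.
    block = step % num_blocks
    for p, g in zip(params, grads):
        flat_p, flat_g = p.data.view(-1), g.view(-1)
        n = flat_p.numel()
        lo, hi = block * n // num_blocks, (block + 1) * n // num_blocks
        flat_p[lo:hi] -= lr * flat_g[lo:hi]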


【4】First-Order Softmax Weighted Switching Gradient Method for Distributed Stochastic Minimax Optimization with Stochastic Constraints
标题:带随机约束的分布式随机极小极大优化的一阶Softmax加权切换梯度法
链接:https://arxiv.org/abs/2603.05774

作者:Zhankun Luo,Antesh Upadhyay,Sang Bin Moon,Abolfazl Hashemi
摘要:研究了带随机约束的分布式随机极大极小优化问题。我们提出了一种为联邦学习量身定制的新型一阶Softmax加权切换梯度方法。在完全客户端参与下,我们的算法达到了标准的$\mathcal{O}(ε^{-4})$ oracle复杂度,以满足最优性差距和可行性容差的统一界限$ε$。通过随机优势假设量化客户端抽样噪声,我们将理论分析扩展到实际的部分参与机制。此外,通过放松对目标函数的标准有界性假设,我们为softmax超参数建立了严格更紧的下界。我们提供了一个统一的误差分解,并建立了一个尖锐的$\mathcal{O}(\log\frac{1}{δ})$高概率收敛保证。最后,我们的框架表明,单循环纯原始切换机制为优化最坏情况客户端性能提供了一个稳定的替代方案,有效地绕过了传统原始-对偶或基于惩罚方法中经常遇到的超参数敏感性和收敛振荡。通过对Neyman-Pearson(NP)分类和公平分类任务的实验,验证了该算法的有效性。
摘要 :This paper addresses the distributed stochastic minimax optimization problem subject to stochastic constraints. We propose a novel first-order Softmax-Weighted Switching Gradient method tailored for federated learning. Under full client participation, our algorithm achieves the standard $\mathcal{O}(ε^{-4})$ oracle complexity to satisfy a unified bound $ε$ for both the optimality gap and feasibility tolerance. We extend our theoretical analysis to the practical partial participation regime by quantifying client sampling noise through a stochastic superiority assumption. Furthermore, by relaxing standard boundedness assumptions on the objective functions, we establish a strictly tighter lower bound for the softmax hyperparameter. We provide a unified error decomposition and establish a sharp $\mathcal{O}(\log\frac{1}δ)$ high-probability convergence guarantee. Ultimately, our framework demonstrates that a single-loop primal-only switching mechanism provides a stable alternative for optimizing worst-case client performance, effectively bypassing the hyperparameter sensitivity and convergence oscillations often encountered in traditional primal-dual or penalty-based approaches. We verify the efficacy of our algorithm via experiment on the Neyman-Pearson (NP) classification and fair classification tasks.


【5】Learning Optimal Distributionally Robust Individualized Treatment Rules Integrating Multi-Source Data
标题:集成多源数据学习最佳分布稳健的个性化治疗规则
链接:https://arxiv.org/abs/2603.05568

作者:Wenhai Cui,Wen Su,Xingqiu Zhao
摘要:综合分析多个数据集以估计最佳个体化治疗规则(ITR)可以提高决策效率。中心挑战是后移,其中给定协变量的潜在结果的条件分布在源人群和目标人群之间不同。我们提出了一个先验信息为基础的分布鲁棒ITR(PDRO-ITR),最大化最坏情况下的政策价值的协变量依赖的分布不确定性集,确保鲁棒性能下后移。不确定性集被构造为源分布的个性化组合,权重结合先验源成员资格概率和偏差项约束到概率单纯形以适应后移。我们推导出一个封闭形式的解决方案的PDRO-ITR和开发一个自适应的程序来调整的不确定性水平。我们建立了PDRO-ITR估计的风险界限,保证了最坏情况下的鲁棒性能。大量的仿真和两个实际数据的应用表明,所提出的方法实现了优越的性能相比,现有的方法。
摘要:Integrative analysis of multiple datasets for estimating optimal individualized treatment rules (ITRs) can enhance decision efficiency. A central challenge is posterior shift, wherein the conditional distribution of potential outcomes given covariates differs between source and target populations. We propose a prior information-based distributionally robust ITR (PDRO-ITR) that maximizes the worst-case policy value over a covariate-dependent distributional uncertainty set, ensuring robust performance under posterior shift. The uncertainty set is constructed as an individualized combination of source distributions, with weights combining prior source-membership probabilities and deviation terms constrained to the probability simplex to accommodate posterior shift. We derive a closed-form solution for the PDRO-ITR and develop an adaptive procedure to tune the uncertainty level. We establish risk bounds for the PDRO-ITR estimator, which guarantees robust performance under the worst case. Extensive simulations and two real-data applications demonstrate that the proposed method achieves superior performance compared to existing approaches.


预测|估计(4篇)

【1】Improved high-dimensional estimation with Langevin dynamics and stochastic weight averaging
标题:利用Langevin动力学和随机权重平均改进高维估计
链接:https://arxiv.org/abs/2603.06028

作者:Stanley Wei,Alex Damian,Jason D. Lee
摘要:最近的重要工作研究了梯度下降在不同的高维设置(包括张量PCA和单索引模型)中恢复隐藏的种植方向$θ^\star \in S^{d-1}$的能力。决定梯度下降穿越这些景观的能力的关键量是信息指数$k^\star$(Ben Arous等人,2021),它对应于种群景观中初始化处的鞍点阶数。Ben Arous等人(2021)表明,$n \gtrsim d^{\max(1,k^\star-1)}$个样本对于在线SGD恢复$θ^\star$是必要且充分的,Ben Arous等人(2020)证明了朗之万动力学的类似下界。最近,Damian等人(2023)表明,通过在平滑后的景观上运行梯度下降可以规避这些下界,并且该算法使用$n \gtrsim d^{\max(1,k^\star/2)}$个样本即可成功,这在最坏情况下是最优的。这就提出了一个问题:是否有可能在没有显式平滑的情况下实现相同的速率。在本文中,我们表明,只要考虑平均迭代而非最后一次迭代,朗之万动力学就能以$n \gtrsim d^{ k^\star/2 }$个样本成功。关键的想法是,噪声注入与迭代平均的组合能够模拟景观平滑的效果。我们将此结果应用于张量PCA和单索引模型两种设置。最后,我们推测minibatch SGD也可以在不添加任何额外噪声的情况下达到相同的速率。
摘要:Significant recent work has studied the ability of gradient descent to recover a hidden planted direction $θ^\star \in S^{d-1}$ in different high-dimensional settings, including tensor PCA and single-index models. The key quantity that governs the ability of gradient descent to traverse these landscapes is the information exponent $k^\star$ (Ben Arous et al., (2021)), which corresponds to the order of the saddle at initialization in the population landscape. Ben Arous et al., (2021) showed that $n \gtrsim d^{\max(1, k^\star-1)}$ samples were necessary and sufficient for online SGD to recover $θ^\star$, and Ben Arous et al., (2020) proved a similar lower bound for Langevin dynamics. More recently, Damian et al., (2023) showed it was possible to circumvent these lower bounds by running gradient descent on a smoothed landscape, and that this algorithm succeeds with $n \gtrsim d^{\max(1, k^\star/2)}$ samples, which is optimal in the worst case. This raises the question of whether it is possible to achieve the same rate without explicit smoothing. In this paper, we show that Langevin dynamics can succeed with $n \gtrsim d^{ k^\star/2 }$ samples if one considers the average iterate, rather than the last iterate. The key idea is that the combination of noise-injection and iterate averaging is able to emulate the effect of landscape smoothing. We apply this result to both the tensor PCA and single-index model settings. Finally, we conjecture that minibatch SGD can also achieve the same rate without adding any additional noise.
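下面给出“带迭代平均的朗之万动力学”的一个通用离散化示意(Python/NumPy):在梯度步上注入各向同性高斯噪声,并返回迭代的滑动平均而非最后一个迭代;步长与温度取值仅为示意:

import numpy as np

def averaged_langevin(grad, theta0, eta=1e-3, beta=1e3, steps=100000, seed=0):
    # theta <- theta - eta * grad(theta) + sqrt(2 * eta / beta) * noise,
    # returning the running average of iterates, not the last iterate.
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    avg = np.zeros_like(theta)
    for t in range(steps):
        noise = rng.standard_normal(theta.shape)
        theta = theta - eta * grad(theta) + np.sqrt(2 * eta / beta) * noise
        avg += (theta - avg) / (t + 1)  # online mean of iterates
    return avg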


【2】Stochastic Event Prediction via Temporal Motif Transitions
标题:通过时间主题转变预测随机事件
链接:https://arxiv.org/abs/2603.05874

作者:İbrahim Bahadır Altun,Ahmet Erdem Sarıyüce
摘要:带有时间戳的交互网络出现在社会、金融和生物领域,预测未来事件需要对不断演变的拓扑结构和时间顺序进行建模。时间链接预测方法通常将任务框定为具有负采样的二分类,丢弃了现实世界交互的顺序性和相关性。我们介绍了STEP(STochastic Event Predictor),一个将时间链接预测重新表述为连续时间中顺序预测问题的框架。STEP通过泊松过程控制的离散时间模体转移对事件动态进行建模,维护一组随着新交互的到来而演变的开放模体实例。在每一步,框架决定是启动一个新的时间模体还是扩展现有模体,并通过对时间似然与结构先验的贝叶斯评分选择最可能的事件。STEP还生成紧凑的、基于时间模体的特征向量,这些特征向量可以与现有的时间图神经网络输出拼接,在不修改架构的情况下丰富其表示。在五个真实世界数据集上的实验表明,在分类任务中相对最先进基线的平均精度增益高达21%,在后续$k$个事件的顺序预测中精度达到0.99,且运行时间始终低于其他模体感知方法。
摘要:Networks of timestamped interactions arise across social, financial, and biological domains, where forecasting future events requires modeling both evolving topology and temporal ordering. Temporal link prediction methods typically frame the task as binary classification with negative sampling, discarding the sequential and correlated nature of real-world interactions. We introduce STEP (STochastic Event Predictor), a framework that reformulates temporal link prediction as a sequential forecasting problem in continuous time. STEP models event dynamics through discrete temporal motif transitions governed by Poisson processes, maintaining a set of open motif instances that evolve as new interactions arrive. At each step, the framework decides whether to initiate a new temporal motif or extend an existing one, selecting the most probable event via Bayesian scoring of temporal likelihoods and structural priors. STEP also produces compact, temporal motif-based feature vectors that can be concatenated with existing temporal graph neural network outputs, enriching their representations without architectural modifications. Experiments on five real-world datasets demonstrate up to 21% average precision gains over state-of-the-art baselines in classification and 0.99 precision in next $k$ sequential forecasting, with consistently lower runtime than competing motif-aware methods.
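摘要中“对时间似然与结构先验的贝叶斯评分”可示意为:候选事件得分等于结构先验乘以泊松过程下事件间隔的指数似然(Python);这只是对摘要描述的一种示意性读法:

import math

def event_score(rate, dt, prior):
    # Structural prior times the exponential inter-arrival likelihood
    # of a Poisson process with the motif's transition rate.
    return prior * rate * math.exp(-rate * dt)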


【3】Towards Efficient and Stable Ocean State Forecasting: A Continuous-Time Koopman Approach
标题:实现高效稳定的海洋状态预测:连续时间库普曼方法
链接:https://arxiv.org/abs/2603.05560

作者:Rares Grozavescu,Pengyu Zhang,Mark Girolami,Etienne Meunier
摘要:我们研究了连续时间Koopman自动编码器(CT-KAE),将其作为两层准地转(QG)系统中长时程海洋状态预报的轻量级代理模型。通过将非线性动力学投影到由线性常微分方程控制的潜在空间中,该模型强制执行结构化且可解释的时间演化,同时通过矩阵指数公式实现时间分辨率不变的预测。在2083天的滚动预报中,CT-KAE表现出有界的误差增长和稳定的大尺度统计特性;相比之下,自回归Transformer基线在长期滚动预报中表现出逐步的误差放大和能量漂移。虽然细尺度湍流结构被部分耗散,但整体能量谱、拟能演化和自相关结构在长时程内保持一致。与数值求解器相比,该模型实现了数量级更快的推理,这表明连续时间Koopman代理为高效稳定的物理-机器学习混合气候模型提供了一个有前途的骨干。
摘要:We investigate the Continuous-Time Koopman Autoencoder (CT-KAE) as a lightweight surrogate model for long-horizon ocean state forecasting in a two-layer quasi-geostrophic (QG) system. By projecting nonlinear dynamics into a latent space governed by a linear ordinary differential equation, the model enforces structured and interpretable temporal evolution while enabling temporally resolution-invariant forecasting via a matrix exponential formulation. Across 2083-day rollouts, CT-KAE exhibits bounded error growth and stable large-scale statistics, in contrast to autoregressive Transformer baselines which exhibit gradual error amplification and energy drift over long rollouts. While fine-scale turbulent structures are partially dissipated, bulk energy spectra, enstrophy evolution, and autocorrelation structure remain consistent over long horizons. The model achieves orders-of-magnitude faster inference compared to the numerical solver, suggesting that continuous-time Koopman surrogates offer a promising backbone for efficient and stable hybrid physical-machine learning climate models.
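其中“矩阵指数公式带来分辨率不变的预测”可示意如下(Python):潜在状态服从线性常微分方程 dz/dt = A z,任意时刻的预测由 z(t) = exp(A t) z(0) 直接给出;编码器与解码器从略:

import numpy as np
from scipy.linalg import expm

def koopman_rollout(z0, A, times):
    # Latent linear ODE dz/dt = A z; read out z(t) = expm(A t) @ z0 at
    # arbitrary times, independent of the training time step.
    return np.stack([expm(A * t) @ z0 for t in times])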


【4】U6G XL-MIMO Radiomap Prediction: Multi-Config Dataset and Beam Map Approach
标题:U6 G XL-MMO无线电地图预测:多配置数据集和射束图方法
链接:https://arxiv.org/abs/2603.06401

作者:Xiaojie Li,Yu Han,Zhizheng Lu,Shi Jin,Chao-Kai Wen
备注:This work has been submitted to the IEEE for possible publication
摘要:具有XL-MIMO的上6 GHz(U6 G)频段是第六代无线系统的关键推动因素,但此类系统的智能无线电地图预测仍然具有挑战性。现有的数据集仅支持具有主要各向同性天线的小规模阵列(高达8x8),与6 G设想的1024个元件的定向阵列相去甚远。此外,目前的方法将阵列配置编码为标量参数,迫使神经网络外推阵列特定的辐射模式,这在预测训练数据中不存在的配置的无线电地图时失败。为了解决数据稀缺和推广局限性,本文从三个方面提出了XL-MIMO无线电地图预测。为了克服数据的局限性,我们构建了第一个XL-MIMO无线电地图数据集,包含78400无线电地图在800个城市场景,5个频段(1.8-6.7 GHz),和9个阵列配置高达32 x32均匀的平面阵列与定向元素。为了进行系统的评估,我们建立了一个全面的基准框架,涵盖了从没有现场测量的覆盖估计到看不见的配置和环境的泛化的实际场景。为了能够推广到任意波束配置而无需重新训练,我们提出了波束图,这是一种物理信息空间特征,可以分析计算阵列特定的覆盖模式。通过将确定性阵列辐射与多径传播数据解耦,波束图将泛化从神经网络外推转移到基于物理的计算。将波束图集成到现有架构中,当推广到看不见的配置时,平均绝对误差可降低60.0%,当转移到看不见的环境时,平均绝对误差可降低50.5%。完整的数据集和代码可在https://lxj321.github.io/MulticonfigRadiomapDataset/上公开获取。
摘要:The upper 6 GHz (U6G) band with XL-MIMO is a key enabler for sixth-generation wireless systems, yet intelligent radiomap prediction for such systems remains challenging. Existing datasets support only small-scale arrays (up to 8x8) with predominantly isotropic antennas, far from the 1024-element directional arrays envisioned for 6G. Moreover, current methods encode array configurations as scalar parameters, forcing neural networks to extrapolate array-specific radiation patterns, which fails when predicting radiomaps for configurations absent from training data. To jointly address data scarcity and generalization limitations, this paper advances XL-MIMO radiomap prediction from three aspects. To overcome data limitations, we construct the first XL-MIMO radiomap dataset containing 78400 radiomaps across 800 urban scenes, five frequency bands (1.8-6.7 GHz), and nine array configurations up to 32x32 uniform planar arrays with directional elements. To enable systematic evaluation, we establish a comprehensive benchmark framework covering practical scenarios from coverage estimation without field measurements to generalization across unseen configurations and environments. To enable generalization to arbitrary beam configurations without retraining, we propose the beam map, a physics-informed spatial feature that analytically computes array-specific coverage patterns. By decoupling deterministic array radiation from data learned multipath propagation, beam maps shift generalization from neural network extrapolation to physics-based computation. Integrating beam maps into existing architectures reduces mean absolute error by up to 60.0% when generalizing to unseen configurations and up to 50.5% when transferring to unseen environments. The complete dataset and code are publicly available at https://lxj321.github.io/MulticonfigRadiomapDataset/.
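波束图所编码的确定性阵列方向图可以解析计算;下面以均匀平面阵的标准阵因子为例给出一个示意(Python/NumPy),单元方向图与论文中波束图的具体构造从略:

import numpy as np

def upa_array_factor(theta, phi, m=32, n=32, d=0.5, steer=(0.0, 0.0)):
    # Power pattern of an m x n uniform planar array with element spacing
    # d (in wavelengths), steered to (theta0, phi0): a standard analytic
    # piece of the array-specific coverage that a beam map encodes.
    th0, ph0 = steer
    u = np.sin(theta) * np.cos(phi) - np.sin(th0) * np.cos(ph0)
    v = np.sin(theta) * np.sin(phi) - np.sin(th0) * np.sin(ph0)
    mx = np.arange(m)[:, None]
    ny = np.arange(n)[None, :]
    af = np.exp(1j * 2.0 * np.pi * d * (mx * u + ny * v)).sum()
    return np.abs(af) ** 2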


其他神经网络|深度学习|模型|建模(22篇)

【1】Causal Interpretation of Neural Network Computations with Contribution Decomposition
标题:利用贡献分解进行神经网络计算的因果解释
链接:https://arxiv.org/abs/2603.06557

作者:Joshua Brendan Melander,Zaki Alaoui,Shenghua Liu,Surya Ganguli,Stephen A. Baccus
备注:32 pages, 19 figures. ICLR 2026 poster
摘要:理解神经网络如何将输入转换为输出对于解释和操纵其行为至关重要。大多数现有的方法通过识别与人类可解释的概念相关的隐藏层激活模式来分析内部表征。在这里,我们采取直接的方法来研究隐藏的神经元如何驱动网络输出。我们引入了CODEC(贡献分解),这是一种使用稀疏自编码器将网络行为分解为隐藏神经元贡献的稀疏基序的方法,揭示了无法通过单独分析激活来确定的因果过程。将CODEC应用于基准图像分类网络,我们发现贡献在各层的稀疏性和维度上都有所增长,并且出乎意料地,它们对网络输出的积极和消极影响逐渐去相关。我们进一步表明,将贡献分解为稀疏模式可以更好地控制和解释中间层,支持对网络输出的因果操作和对不同图像组件的人类可解释的可视化,这些组件结合起来驱动输出。最后,通过分析脊椎动物视网膜神经活动的最新模型,我们证明了CODEC揭示了模型中间神经元的组合动作,并确定了动态感受野的来源。总的来说,CODEC提供了一个丰富且可解释的框架,用于理解非线性计算如何在分层中演变,建立贡献模式作为分析人工神经网络的机械见解的信息单元。
摘要:Understanding how neural networks transform inputs into outputs is crucial for interpreting and manipulating their behavior. Most existing approaches analyze internal representations by identifying hidden-layer activation patterns correlated with human-interpretable concepts. Here we take a direct approach to examine how hidden neurons act to drive network outputs. We introduce CODEC (Contribution Decomposition), a method that uses sparse autoencoders to decompose network behavior into sparse motifs of hidden-neuron contributions, revealing causal processes that cannot be determined by analyzing activations alone. Applying CODEC to benchmark image-classification networks, we find that contributions grow in sparsity and dimensionality across layers and, unexpectedly, that they progressively decorrelate positive and negative effects on network outputs. We further show that decomposing contributions into sparse modes enables greater control and interpretation of intermediate layers, supporting both causal manipulations of network output and human-interpretable visualizations of distinct image components that combine to drive that output. Finally, by analyzing state-of-the-art models of neural activity in the vertebrate retina, we demonstrate that CODEC uncovers combinatorial actions of model interneurons and identifies the sources of dynamic receptive fields. Overall, CODEC provides a rich and interpretable framework for understanding how nonlinear computations evolve across hierarchical layers, establishing contribution modes as an informative unit of analysis for mechanistic insights into artificial neural networks.


【2】When One Modality Rules Them All: Backdoor Modality Collapse in Multimodal Diffusion Models
标题:当一种模式统治所有人时:多模式扩散模型中的后门模式崩溃
链接:https://arxiv.org/abs/2603.06508

作者:Qitong Wang,Haoran Dai,Haotian Zhang,Christopher Rasmussen,Binghui Wang
备注:Accepted to the ICLR 2026 Workshop on Principled Design for Trustworthy AI. The first two authors contributed equally
摘要:虽然扩散模型已经彻底改变了视觉内容生成,但其快速普及凸显了调查其漏洞(例如对后门攻击的脆弱性)的迫切需要。在多模态扩散模型中,人们自然会期望同时攻击多个模态(例如文本和图像)将产生互补效应,并加强整体后门。在本文中,我们通过研究后门模态崩溃现象来挑战这一假设:后门机制退化为主要依赖于模态的一个子集,使其他模态变得多余。为了严格量化这种行为,我们引入了两个新的指标:触发器模态归因(TMA)和交叉触发器交互(CTI)。通过在多模态条件扩散的不同训练配置中进行广泛的实验,我们始终观察到后门行为中的“赢家通吃”动态。我们的研究结果表明,(1)攻击往往崩溃为子集模态主导,(2)跨模态交互可以忽略不计,甚至为负,与协同脆弱性的直觉相矛盾。这些发现突出了当前评估中的一个关键盲点,表明高攻击成功率往往掩盖了对一部分模态的根本依赖。这为机理分析和未来防御方法的发展奠定了原则性基础。
摘要:While diffusion models have revolutionized visual content generation, their rapid adoption has underscored the critical need to investigate vulnerabilities, e.g., to backdoor attacks. In multimodal diffusion models, it is natural to expect that attacking multiple modalities simultaneously (e.g., text and image) would yield complementary effects and strengthen the overall backdoor. In this paper, we challenge this assumption by investigating the phenomenon of Backdoor Modality Collapse, a scenario where the backdoor mechanism degenerates to rely predominantly on a subset of modalities, rendering others redundant. To rigorously quantify this behavior, we introduce two novel metrics: Trigger Modality Attribution (TMA) and Cross-Trigger Interaction (CTI). Through extensive experiments across diverse training configurations in multimodal conditional diffusion, we consistently observe a ``winner-takes-all'' dynamic in backdoor behavior. Our results reveal that (1) attacks often collapse into subset-modality dominance, and (2) cross-modal interaction is negligible or even negative, contradicting the intuition of synergistic vulnerability. These findings highlight a critical blind spot in current assessments, suggesting that high attack success rates often mask a fundamental reliance on a subset of modalities. This establishes a principled foundation for mechanistic analysis and future defense development.


【3】Certified and accurate computation of function space norms of deep neural networks
标题:深度神经网络功能空间规范的认证且准确计算
链接:https://arxiv.org/abs/2603.06431

作者:Johannes Gründler,Moritz Maibaum,Philipp Petersen
摘要:偏微分方程的神经网络方法需要在函数空间范数下进行可靠的误差控制。然而,经过训练的神经网络通常只能在有限数量的点上取值探测。如果没有强假设,仅凭点评估并不能提供足够的信息来推导出严格的确定性且有保证的函数空间范数界限。在这项工作中,我们超越了纯粹的黑盒设置,直接利用神经网络结构。我们提出了一个用于认证且精确计算神经网络积分量(包括Lebesgue和Sobolev范数)的框架,其方法是将轴对齐盒上的区间算术外壳与自适应标记/细化以及基于求积的聚合相结合。在每个盒子上,我们计算函数值和导数的保证上下界,并将这些局部证书传播为目标积分的全局上下界。我们的分析为这类认证的自适应求积过程给出了一个一般的收敛定理,并将其实例化用于函数值、雅可比和海森矩阵,从而实现对$L^p$、$W^{1,p}$和$W^{2,p}$范数的认证计算。我们进一步展示了这些组成部分如何导出PINN内部残差的实用认证界限。数值实验表明了所提出方法的准确性和实用性。
摘要:Neural network methods for PDEs require reliable error control in function space norms. However, trained neural networks can typically only be probed at a finite number of point values. Without strong assumptions, point evaluations alone do not provide enough information to derive tight deterministic and guaranteed bounds on function space norms. In this work, we move beyond a purely black-box setting and exploit the neural network structure directly. We present a framework for the certified and accurate computation of integral quantities of neural networks, including Lebesgue and Sobolev norms, by combining interval arithmetic enclosures on axis-aligned boxes with adaptive marking/refinement and quadrature-based aggregation. On each box, we compute guaranteed lower and upper bounds for function values and derivatives, and propagate these local certificates to global lower and upper bounds for the target integrals. Our analysis provides a general convergence theorem for such certified adaptive quadrature procedures and instantiates it for function values, Jacobians, and Hessians, yielding certified computation of $L^p$, $W^{1,p}$, and $W^{2,p}$ norms. We further show how these ingredients lead to practical certified bounds for PINN interior residuals. Numerical experiments illustrate the accuracy and practical behavior of the proposed methods.
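单个盒子上的区间算术外壳可用标准的区间界传播实现;下面示意如何在盒约束输入下为仿射层与ReLU求得保证的上下界(Python/NumPy),自适应细化与认证求积的聚合从略:

import numpy as np

def affine_bounds(lo, hi, W, b):
    # Guaranteed bounds of W x + b over the box lo <= x <= hi.
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b

def relu_bounds(lo, hi):
    # ReLU is monotone, so it maps interval bounds elementwise.
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)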


【4】Kinetic-based regularization: Learning spatial derivatives and PDE applications
标题:基于动力学的正则化:学习空间导数和PDE应用
链接:https://arxiv.org/abs/2603.06380

作者:Abhisek Ganguly,Santosh Ansumali,Sauro Succi
备注:Published as a conference paper at ICLR 2026 Workshop AI and PDE
摘要:从离散和噪声数据中精确估计空间导数是科学机器学习和偏微分方程数值解的核心。我们扩展了基于动力学的正则化(KBR),这是一种具有单个可训练参数的局部多维核回归方法,可以在1D中学习具有可证明的二阶精度的空间导数。提出了两种导数学习方案:基于封闭形式预测表达式的显式方案和在感兴趣的点处求解扰动线性系统的隐式方案。完全本地化的配方,使有效的,噪声自适应导数估计,而不需要全局系统求解或启发式平滑。这两种方法都表现出二次收敛,匹配二阶有限差分干净的数据,以及一个可能的高维配方。初步结果表明,耦合KBR与保守的求解器,使稳定的冲击捕获在一维双曲偏微分方程,作为一个步骤,解决偏微分方程的不规则点云在更高的维度,同时保持守恒定律。
摘要:Accurate estimation of spatial derivatives from discrete and noisy data is central to scientific machine learning and numerical solutions of PDEs. We extend kinetic-based regularization (KBR), a localized multidimensional kernel regression method with a single trainable parameter, to learn spatial derivatives with provable second-order accuracy in 1D. Two derivative-learning schemes are proposed: an explicit scheme based on the closed-form prediction expressions, and an implicit scheme that solves a perturbed linear system at the points of interest. The fully localized formulation enables efficient, noise-adaptive derivative estimation without requiring global system solving or heuristic smoothing. Both approaches exhibit quadratic convergence, matching second-order finite difference for clean data, along with a possible high-dimensional formulation. Preliminary results show that coupling KBR with conservative solvers enables stable shock capture in 1D hyperbolic PDEs, acting as a step towards solving PDEs on irregular point clouds in higher dimensions while preserving conservation laws.


【5】Frequency-Separable Hamiltonian Neural Network for Multi-Timescale Dynamics
标题:用于多时间尺度动力学的频率可分离Hamilton神经网络
链接:https://arxiv.org/abs/2603.06354

作者:Yaojun Li,Yulong Yang,Christine Allen-Blanchette
摘要 :虽然哈密顿力学为神经网络建模动态系统提供了强大的归纳偏差,但哈密顿神经网络及其变体通常无法捕获跨越多个时间尺度的复杂时间动态。这种限制通常与深度神经网络的频谱偏差有关,这有利于学习低频,缓慢变化的动态。先前的方法已经试图通过辛积分方案来解决这个问题,辛积分方案强制执行能量守恒,或者通过结合几何约束来在配置空间上施加结构。然而,这样的方法要么仍然是有限的,在他们的能力,以充分捕捉多尺度动态或需要大量的特定领域的假设。在这项工作中,我们利用观察到的哈密顿函数承认分解成明确的快,慢模式,并可以从这些组件重建。我们介绍了频率可分离的哈密顿神经网络(FS-HNN),它使用多个网络来参数化系统哈密顿量,每个网络都由哈密顿动力学控制,并在不同时间尺度上采样的数据上进行训练。通过学习状态和边界条件辛算子,我们进一步将这个框架扩展到偏微分方程。从经验上讲,我们表明,FS-HNN提高了具有挑战性的动力系统的长期外推性能,并在广泛的ODE和PDE问题的推广。
摘要:While Hamiltonian mechanics provides a powerful inductive bias for neural networks modeling dynamical systems, Hamiltonian Neural Networks and their variants often fail to capture complex temporal dynamics spanning multiple timescales. This limitation is commonly linked to the spectral bias of deep neural networks, which favors learning low-frequency, slow-varying dynamics. Prior approaches have sought to address this issue through symplectic integration schemes that enforce energy conservation or by incorporating geometric constraints to impose structure on the configuration-space. However, such methods either remain limited in their ability to fully capture multiscale dynamics or require substantial domain-specific assumptions. In this work, we exploit the observation that Hamiltonian functions admit decompositions into explicit fast and slow modes and can be reconstructed from these components. We introduce the Frequency-Separable Hamiltonian Neural Network (FS-HNN), which parameterizes the system Hamiltonian using multiple networks, each governed by Hamiltonian dynamics and trained on data sampled at distinct timescales. We further extend this framework to partial differential equations by learning state- and boundary-conditioned symplectic operators. Empirically, we show that FS-HNN improves long-horizon extrapolation performance on challenging dynamical systems and generalizes across a broad range of ODE and PDE problems.
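FS-HNN的基本构件是哈密顿神经网络的动力学读取:以网络参数化H(q, p),用自动微分得到 dq/dt = ∂H/∂p、dp/dt = -∂H/∂q;将分别在不同时间尺度数据上训练的多个网络相加即得快慢分解。下面是该读取步骤的最小示意(PyTorch):

import torch

def hamiltonian_field(H, q, p):
    # Hamilton's equations via autograd: dq/dt = dH/dp, dp/dt = -dH/dq.
    q = q.detach().requires_grad_(True)
    p = p.detach().requires_grad_(True)
    dHdq, dHdp = torch.autograd.grad(H(q, p).sum(), (q, p), create_graph=True)
    return dHdp, -dHdq

# A frequency-separable Hamiltonian is then, schematically,
# H(q, p) = H_slow(q, p) + H_fast(q, p) with two separate networks.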


【6】Learning to Solve Orienteering Problem with Time Windows and Variable Profits
标题:带时间窗和可变收益的定向运动问题的求解
链接:https://arxiv.org/abs/2603.06260

作者:Songqun Gao,Zanxi Ruan,Patrick Floor,Marco Roveri,Luigi Palopoli,Daniele Fontanelli
备注:Accepted at ICLR 2026
摘要:带时间窗和可变收益的定向越野问题(OPTWVP)在许多现实世界的应用中很常见,并且涉及连续时间变量。目前的方法尚无法为这一同时包含离散与连续变量的定向越野问题变体提供高效的求解器。在本文中,我们提出了一种基于学习的两阶段方法DeCoST(带服务时间引导轨迹的解耦离散-连续优化),旨在有效解耦OPTWVP问题中的离散与连续决策变量,同时实现二者之间高效且可学习的协调。在第一阶段,采用并行解码结构来预测路径和初始服务时间分配。第二阶段通过线性规划(LP)公式优化服务时间,并为结构估计提供长时程的学习信号。我们严格证明了第二阶段解的全局最优性。OPTWVP实例上的实验表明,DeCoST在解质量和计算效率方面均优于最先进的构造式求解器和最新的元启发式算法,在节点少于500个的实例上实现了高达6.6倍的推理加速。此外,所提出的框架与各种构造式求解器兼容,并能持续提高OPTWVP的解质量。
摘要:The orienteering problem with time windows and variable profits (OPTWVP) is common in many real-world applications and involves continuous time variables. Current approaches fail to develop an efficient solver for this orienteering problem variant with discrete and continuous variables. In this paper, we propose a learning-based two-stage DEcoupled discrete-Continuous optimization with Service-time-guided Trajectory (DeCoST), which aims to effectively decouple the discrete and continuous decision variables in the OPTWVP problem, while enabling efficient and learnable coordination between them. In the first stage, a parallel decoding structure is employed to predict the path and the initial service time allocation. The second stage optimizes the service times through a linear programming (LP) formulation and provides a long-horizon learning of structure estimation. We rigorously prove the global optimality of the second-stage solution. Experiments on OPTWVP instances demonstrate that DeCoST outperforms both state-of-the-art constructive solvers and the latest meta-heuristic algorithms in terms of solution quality and computational efficiency, achieving up to 6.6x inference speedup on instances with fewer than 500 nodes. Moreover, the proposed framework is compatible with various constructive solvers and consistently enhances the solution quality for OPTWVP.


【7】DC-Merge: Improving Model Merging with Directional Consistency
标题:DC-Merge:以方向一致性改进模型合并
链接:https://arxiv.org/abs/2603.06242

作者:Han-Chen Zhang,Zi-Hao Zhou,Mao-Lin Luo,Shimin Di,Min-Ling Zhang,Tong Wei
备注:Accepted by CVPR 2026 Main Track
摘要:模型合并旨在将多个任务适应模型集成到一个统一的模型中,以保留每个任务的知识。在本文中,我们确定,这种知识保留的关键在于保持合并的多任务向量和单个任务向量之间的奇异空间的方向一致性。然而,这种一致性经常受到两个问题的损害:i)任务向量内的不平衡能量分布,其中一小部分奇异值支配总能量,导致在合并时忽略语义上重要但较弱的分量,以及ii)参数空间中任务向量的几何不一致性,这导致直接合并扭曲其底层方向几何。为了解决这些挑战,我们提出了DC-Merge,一种方向一致的模型合并方法。它首先通过平滑其奇异值来平衡每个任务向量的能量分布,确保所有知识分量都得到充分表示。然后将这些能量平衡的向量投影到共享的正交子空间上,以最小的重建误差对齐它们的方向几何。最后,对齐的向量在共享正交子空间中聚合并投影回原始参数空间。对视觉和视觉语言基准测试的广泛实验表明,DC-Merge在完全微调和LoRA设置中始终实现最先进的性能。实现代码可在https://github.com/Tobeginwith/DC-Merge上获得。
摘要:Model merging aims to integrate multiple task-adapted models into a unified model that preserves the knowledge of each task. In this paper, we identify that the key to this knowledge retention lies in maintaining the directional consistency of singular spaces between merged multi-task vector and individual task vectors. However, this consistency is frequently compromised by two issues: i) an imbalanced energy distribution within task vectors, where a small fraction of singular values dominate the total energy, leading to the neglect of semantically important but weaker components upon merging, and ii) the geometric inconsistency of task vectors in parameter space, which causes direct merging to distort their underlying directional geometry. To address these challenges, we propose DC-Merge, a method for directional-consistent model merging. It first balances the energy distribution of each task vector by smoothing its singular values, ensuring all knowledge components are adequately represented. These energy-balanced vectors are then projected onto a shared orthogonal subspace to align their directional geometries with minimal reconstruction error. Finally, the aligned vectors are aggregated in the shared orthogonal subspace and projected back to the original parameter space. Extensive experiments on vision and vision-language benchmarks show that DC-Merge consistently achieves state-of-the-art performance in both full fine-tuning and LoRA settings. The implementation code is available at https://github.com/Tobeginwith/DC-Merge.
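其中“通过平滑奇异值平衡能量分布”一步可示意如下(Python/NumPy):将任务向量的奇异值向其均值插值,使少数主导奇异值不再淹没语义上重要但较弱的分量;插值系数alpha为假设的超参数,论文的具体平滑规则可能不同:

import numpy as np

def smooth_task_vector(delta, alpha=0.5):
    # Interpolate singular values toward their mean, then reconstruct.
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    s_smooth = (1.0 - alpha) * s + alpha * s.mean()
    return (U * s_smooth) @ Vt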


【8】Efficient Vector Search in the Wild: One Model for Multi-K Queries
标题:野外有效的载体搜索:多K搜索的一种模型
链接:https://arxiv.org/abs/2603.06159

作者:Yifan Peng,Jiafei Fan,Xingda Wei,Sijie Shen,Rong Chen,Jianning Wang,Xiaojian Luo,Wenyuan Yu,Jingren Zhou,Haibo Chen
摘要:学习top-K搜索是一种很有前途的向量查询服务方法,具有高精度和高性能。然而,目前针对特定K值训练的模型无法推广到现实世界的多K查询:它们遭受准确性下降(对于较大的Ks)和性能损失(对于较小的Ks)。训练模型以在不同的Ks上泛化需要多几个数量级的预处理时间,并且不适合在野外服务向量查询。我们提出了OMEGA,一个K-泛化学习的top-K搜索方法,同时实现高精度,高性能和低预处理成本的多K向量查询。关键的想法是,使用我们基于轨迹的特征在K=1上正确训练的基础模型可以用于通过动态细化过程准确预测较大的Ks,并以最小的性能损失准确预测较小的Ks。为了使我们的改进有效,我们进一步利用top-K搜索的统计特性来减少过多的模型调用。对多个公共和生产数据集的广泛评估表明,在相同的预处理预算下,与最先进的学习搜索方法相比,OMEGA的平均延迟降低了6-33%,而所有系统都实现了相同的召回目标。仅需16-30%的预处理时间,OMEGA就能达到这些基线的最佳平均延迟的1.01-1.28倍。
摘要 :Learned top-K search is a promising approach for serving vector queries with both high accuracy and performance. However, current models trained for a specific K value fail to generalize to real-world multi-K queries: they suffer from accuracy degradation (for larger Ks) and performance loss (for smaller Ks). Training the model to generalize on different Ks requires orders of magnitude more preprocessing time and is not suitable for serving vector queries in the wild. We present OMEGA, a K-generalizable learned top-K search method that simultaneously achieves high accuracy, high performance, and low preprocessing cost for multi-K vector queries. The key idea is that a base model properly trained on K=1 with our trajectory-based features can be used to accurately predict larger Ks with a dynamic refinement procedure and smaller Ks with minimal performance loss. To make our refinements efficient, we further leverage the statistical properties of top-K searches to reduce excessive model invocations. Extensive evaluations on multiple public and production datasets show that, under the same preprocessing budgets, OMEGA achieves 6-33% lower average latency compared to state-of-the-art learned search methods, while all systems achieve the same recall target. With only 16-30% of the preprocessing time, OMEGA attains 1.01-1.28x of the optimal average latency of these baselines.


【9】Dynamic Momentum Recalibration in Online Gradient Learning
标题:在线梯度学习中的动态动量重新校准
链接:https://arxiv.org/abs/2603.06120

作者:Zhipeng Yao,Rui Yu,Guisong Chang,Ying Li,Yu Zhang,Dazhou Li
备注:Accepted by CVPR 2026
摘要:随机梯度下降(SGD)及其动量变体构成了深度学习优化的支柱,但其梯度行为的潜在动力学仍然没有得到充分的理解。在这项工作中,我们通过信号处理的镜头重新解释梯度更新,并揭示固定的动量系数固有地扭曲了偏差和方差之间的平衡,导致偏斜或次优的参数更新。为了解决这个问题,我们提出了SGDF(SGD with Filter),这是一种受最优线性过滤原理启发的优化器。SGDF计算在线时变增益,通过最小化均方误差来动态优化梯度估计,从而实现噪声抑制和信号保留之间的最佳权衡。此外,我们的方法可以扩展到其他优化器,展示其对优化框架的广泛适用性。在不同的架构和基准测试中进行的大量实验表明,SGDF超越了传统的动量方法,并实现了与最先进的优化器相当或超过最先进的优化器的性能。
摘要:Stochastic Gradient Descent (SGD) and its momentum variants form the backbone of deep learning optimization, yet the underlying dynamics of their gradient behavior remain insufficiently understood. In this work, we reinterpret gradient updates through the lens of signal processing and reveal that fixed momentum coefficients inherently distort the balance between bias and variance, leading to skewed or suboptimal parameter updates. To address this, we propose SGDF (SGD with Filter), an optimizer inspired by the principles of Optimal Linear Filtering. SGDF computes an online, time-varying gain to dynamically refine gradient estimation by minimizing the mean-squared error, thereby achieving an optimal trade-off between noise suppression and signal preservation. Furthermore, our approach could extend to other optimizers, showcasing its broad applicability to optimization frameworks. Extensive experiments across diverse architectures and benchmarks demonstrate SGDF surpasses conventional momentum methods and achieves performance on par with or surpassing state-of-the-art optimizers.
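SGDF“在线时变增益、按最小均方误差精炼梯度估计”的思想可用一个标量卡尔曼式滤波器来示意(Python/NumPy);过程噪声与观测噪声方差均为假设值,并非论文的精确更新规则:

import numpy as np

def filtered_gradients(grads, q=0.01, r=1.0):
    # Fuse each new stochastic gradient with the running estimate using
    # a time-varying gain k that minimizes the mean-squared error.
    est, p, out = grads[0], 1.0, [grads[0]]
    for g in grads[1:]:
        p = p + q                   # predict: estimate variance grows
        k = p / (p + r)             # MSE-optimal blending gain
        est = est + k * (g - est)   # update toward the new gradient
        p = (1.0 - k) * p
        out.append(est)
    return out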


【10】Preventing Learning Stagnation in PPO by Scaling to 1 Million Parallel Environments
标题:通过扩展到100万个并行环境来防止PPO中的学习停滞
链接:https://arxiv.org/abs/2603.06009

作者:Michael Beukman,Khimya Khetarpal,Zeyu Zheng,Will Dabney,Jakob Foerster,Michael Dennis,Clare Lyle
摘要:性能平台期(plateau),即智能体的性能停滞在次优水平,是深度同策略(on-policy)强化学习中的常见问题。鉴于PPO被广泛采用,我们聚焦于PPO,表明某些情形下的平台期并非源于已知的探索、容量或优化难题,而是因为在训练过程中,基于样本的损失估计最终成为真实目标的糟糕代理。简要回顾:PPO在"使用当前策略从多个并行环境在线采样轨迹"(我们称之为外循环)与"针对该离线数据集执行重复的小批量SGD步骤"(内循环)之间交替。在本工作中,我们只考虑外循环,并在概念上将其建模为随机优化:步长由朝向先前策略的正则化强度控制,梯度噪声由策略更新步骤之间收集的样本数量控制。该模型预测,如果外循环步长相对于噪声过大,性能将停滞在次优水平。从这个角度重新审视PPO可以清楚地看到,有两种方法可以解决这种特殊类型的学习停滞:要么减小步长,要么增加更新之间收集的样本数量。我们首先验证了该模型的预测,并研究超参数选择如何影响步长和更新噪声,得出的结论是:增加并行环境的数量是同时降低这两个因素的一种简单而稳健的方法。接下来,我们提出了在增加并行度时如何协同缩放其他超参数的方案,并表明不正确地这样做会导致严重的性能下降。最后,通过将PPO扩展到超过100万个并行环境,我们在一个复杂的开放式领域中大幅超越了先前的基线,从而实现了高达一万亿次状态转移(transitions)的单调性能提升。
摘要:Plateaus, where an agent's performance stagnates at a suboptimal level, are a common problem in deep on-policy RL. Focusing on PPO due to its widespread adoption, we show that plateaus in certain regimes arise not because of known exploration, capacity, or optimization challenges, but because sample-based estimates of the loss eventually become poor proxies for the true objective over the course of training. As a recap, PPO switches between sampling rollouts from several parallel environments online using the current policy (which we call the outer loop) and performing repeated minibatch SGD steps against this offline dataset (the inner loop). In our work we consider only the outer loop, and conceptually model it as stochastic optimization. The step size is then controlled by the regularization strength towards the previous policy and the gradient noise by the number of samples collected between policy update steps. This model predicts that performance will plateau at a suboptimal level if the outer step size is too large relative to the noise. Recasting PPO in this light makes it clear that there are two ways to address this particular type of learning stagnation: either reduce the step size or increase the number of samples collected between updates. We first validate the predictions of our model and investigate how hyperparameter choices influence the step size and update noise, concluding that increasing the number of parallel environments is a simple and robust way to reduce both factors. Next, we propose a recipe for how to co-scale the other hyperparameters when increasing parallelization, and show that incorrectly doing so can lead to severe performance degradation. Finally, we vastly outperform prior baselines in a complex open-ended domain by scaling PPO to more than 1M parallel environments, thereby enabling monotonic performance improvement up to one trillion transitions.


【11】Implicit Style Conditioning: A Structured Style-Rewrite Framework for Low-Resource Character Modeling
标题:隐性风格条件反射:低资源角色建模的结构化风格重写框架
链接:https://arxiv.org/abs/2603.05933

作者:Chanhui Zhu
备注:26 pages, 4 figures. Preprint
摘要:大型语言模型(LLM)在角色扮演(RP)方面表现出了令人印象深刻的能力;然而,由于数据稀缺和风格解耦的复杂性,让小型语言模型(SLM)扮演高度风格化的人物角色仍然是一个挑战。标准监督微调(SFT)往往只捕捉表层语义,而无法再现角色复杂的句法和语用细微差别,导致"出戏"(Out-Of-Character,OOC)的生成。为了解决这个问题,我们提出了一个结构化的风格重写框架,将风格显式解耦为三个可解释的维度:词汇签名(通过PMI)、句法模式(基于PCFG规则)和语用风格。此外,我们通过思维链(CoT)蒸馏引入了一种隐式风格调节策略。通过在训练过程中利用显式推理轨迹作为强归纳偏置,我们的方法将模型的潜在表示与结构化风格特征对齐,从而无需在推理期间使用显式推理令牌即可实现高保真的风格化生成。在特定的高风格化领域(动漫角色)上的大量实验表明,我们的方法使Qwen-1.7B模型在风格一致性和语义忠实度方面显著优于规模更大的基线(例如4B Vanilla SFT)。我们的方法为在消费级硬件上普及推理与部署提供了一个数据高效的范例。
摘要 :Large Language Models (LLMs) have demonstrated impressive capabilities in role-playing (RP); however, small Language Models (SLMs) with highly stylized personas remains a challenge due to data scarcity and the complexity of style disentanglement. Standard Supervised Fine-Tuning (SFT) often captures surface-level semantics while failing to reproduce the intricate syntactic and pragmatic nuances of a character, leading to "Out-Of-Character" (OOC) generation. To address this, we propose a Structured Style-Rewrite Framework that explicitly disentangles style into three interpretable dimensions: lexical signatures (via PMI), syntactic patterns (grounded in PCFG rules), and pragmatic style. Furthermore, we introduce an implicit style conditioning strategy via Chain-of-Thought (CoT) distillation. By leveraging explicit reasoning traces during training as a strong inductive bias, our approach aligns the model's latent representations with structured style features, enabling high-fidelity stylized generation without requiring explicit reasoning tokens during inference. Extensive experiments on a specific high-stylization domain (anime characters) demonstrate that our method enables a Qwen-1.7B model to outperform significantly larger baselines (e.g., 4B Vanilla SFT) in style consistency and semantic fidelity. Our approach offers a data-efficient paradigm for democratizing inference and deployment on consumer hardware.


【12】Weak-SIGReg: Covariance Regularization for Stable Deep Learning
标题:Weak-SIGReg:稳定深度学习的协方差正则化
链接:https://arxiv.org/abs/2603.05924

作者:Habibullah Akbar
备注:Accepted at GRaM workshop (ICLR 2026). Code & supplementary: https://github.com/kreasof-ai/sigreg
摘要:现代神经网络优化在很大程度上依赖批量归一化和残差连接等架构先验来稳定训练动态。缺少这些先验,或在采用激进数据增强的低数据情形下,像视觉Transformer(ViT)这样的低偏置架构常常遭遇优化崩溃。这项工作采用最近在LeJEPA自监督框架中引入的草图各向同性高斯正则化(SIGReg),并将其重新用作监督学习的通用优化稳定器。原始公式以完整的特征函数为目标,而本文推导出一个计算高效的变体Weak-SIGReg,它通过随机草图(random sketching)以协方差矩阵为目标。受相互作用粒子系统的启发,表示崩溃被视为随机漂移;SIGReg将表示密度约束为各向同性高斯,从而缓解这种漂移。实验上,SIGReg在CIFAR-100上将ViT的训练从崩溃的20.73%恢复到72.02%的准确率,且无需任何架构技巧,并显著改善了使用纯SGD训练的深层香草MLP的收敛性。代码见 github.com/kreasof-ai/sigreg。
摘要:Modern neural network optimization relies heavily on architectural priors, such as Batch Normalization and Residual connections, to stabilize training dynamics. Without these, or in low-data regimes with aggressive augmentation, low-bias architectures like Vision Transformers (ViTs) often suffer from optimization collapse. This work adopts Sketched Isotropic Gaussian Regularization (SIGReg), recently introduced in the LeJEPA self-supervised framework, and repurposes it as a general optimization stabilizer for supervised learning. While the original formulation targets the full characteristic function, a computationally efficient variant is derived, Weak-SIGReg, which targets the covariance matrix via random sketching. Inspired by interacting particle systems, representation collapse is viewed as stochastic drift; SIGReg constrains the representation density towards an isotropic Gaussian, mitigating this drift. Empirically, SIGReg recovers the training of a ViT on CIFAR-100 from a collapsed 20.73\% to 72.02\% accuracy without architectural hacks and significantly improves the convergence of deep vanilla MLPs trained with pure SGD. Code is available at \href{https://github.com/kreasof-ai/sigreg}{github.com/kreasof-ai/sigreg}.
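
根据摘要的描述("通过随机草图以协方差矩阵为目标,把表示密度约束为各向同性高斯"),下面给出一个最小化的示意实现:沿随机单位方向惩罚批内投影方差对1的偏离。函数名 weak_sigreg_penalty 与草图数量等均为本文假设,具体损失形式请以论文与官方仓库为准。

```python
import torch

def weak_sigreg_penalty(z: torch.Tensor, num_sketches: int = 16) -> torch.Tensor:
    """朝各向同性高斯方向的协方差正则(随机草图版)的示意实现。
    z: (batch, dim) 表示;仅为对摘要思想的演示,非论文精确损失。"""
    z = z - z.mean(dim=0, keepdim=True)           # 批内去均值
    u = torch.randn(z.shape[1], num_sketches, device=z.device)
    u = u / u.norm(dim=0, keepdim=True)           # 随机单位方向(草图)
    proj = z @ u                                  # (batch, num_sketches)
    var = proj.var(dim=0, unbiased=False)         # 每个方向上的 u^T Cov(z) u
    return ((var - 1.0) ** 2).mean()              # 把方向方差压向 1(各向同性)
```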


【13】Sparse Crosscoders for diffing MoEs and Dense models
标题:用于对比(diffing)MoE与密集模型的稀疏交叉编码器
链接:https://arxiv.org/abs/2603.05805

作者:Marmik Chaudhari,Nishkal Hundia,Idhant Gulati
备注:5 pages, 3 figures
摘要:混合专家(MoE)通过稀疏专家路由实现参数高效的扩展,但与密集模型相比,其内部表示仍然知之甚少。我们使用交叉编码器(crosscoders,一种联合建模多个激活空间的稀疏自编码器变体)对MoE和密集模型的内部进行了系统比较。我们在代码、科学文本和英语故事共1B令牌上训练了5层的密集模型和MoE(激活参数量相同)。使用带有显式指定共享特征的BatchTopK交叉编码器,我们达到约 $87\%$ 的解释方差比例,并发现了特征组织上的具体差异。与密集模型相比,MoE学到的独有特征明显更少;MoE特有特征的激活密度也高于共享特征,而密集模型特有特征的激活密度则更低。我们的分析表明,MoE发展出更专门、更集中的表示,而密集模型将信息分布在更广泛、更通用的特征上。
摘要:Mixture of Experts (MoE) achieve parameter-efficient scaling through sparse expert routing, yet their internal representations remain poorly understood compared to dense models. We present a systematic comparison of MoE and dense model internals using crosscoders, a variant of sparse autoencoders, that jointly models multiple activation spaces. We train 5-layer dense and MoEs (equal active parameters) on 1B tokens across code, scientific text, and english stories. Using BatchTopK crosscoders with explicitly designated shared features, we achieve $\sim 87\%$ fractional variance explained and uncover concrete differences in feature organization. The MoE learns significantly fewer unique features compared to the dense model. MoE-specific features also exhibit higher activation density than shared features, whereas dense-specific features show lower density. Our analysis reveals that MoEs develop more specialized, focused representations while dense models distribute information across broader, more general-purpose features.
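
摘要提到使用 BatchTopK 交叉编码器。BatchTopK 的核心激活规则(在整个批次范围内保留 k×batch 个最大激活,而非逐样本 top-k)可如下示意;编码器/解码器权重与共享特征的处理在此省略,阈值化写法为演示性假设(平手时可能略多保留几个)。

```python
import torch

def batch_topk(acts: torch.Tensor, k_per_sample: int) -> torch.Tensor:
    """BatchTopK 激活的示意实现:在全批次范围内保留 k_per_sample * batch
    个最大的预激活,其余置零。acts: (batch, num_features) 非负预激活。"""
    batch, _ = acts.shape
    k_total = k_per_sample * batch
    flat = acts.flatten()
    if k_total >= flat.numel():
        return acts                                   # 预算超过总数,全部保留
    threshold = flat.topk(k_total).values.min()       # 全局第 k_total 大的值
    return acts * (acts >= threshold)                 # 批级稀疏掩码
```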


【14】Bridging Domains through Subspace-Aware Model Merging
标题:通过子空间感知的模型合并桥接不同领域
链接:https://arxiv.org/abs/2603.05768

作者:Levy Chaves,Chao Zhou,Rebekka Burkholz,Eduardo Valle,Sandra Avila
备注:Accepted at the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026 (Main Track)
摘要:模型合并将多个特定于任务的模型整合为一个统一的模型。最近的研究在提升分布内(in-distribution)或多任务场景下的合并性能方面取得了进展,但模型合并中的领域泛化仍未得到充分研究。我们研究合并在不同领域上微调的模型如何影响对未见领域的泛化。通过使用奇异值分解分析任务矩阵中的参数竞争,我们表明,与传统的多任务设置相比,合并在不同分布偏移下训练的模型会在其子空间之间引发更强的冲突。为了缓解这个问题,我们提出了SCORE(Subspace COnflict-Resolving mErging),一种旨在缓解这类奇异子空间冲突的方法。SCORE通过计算所有模型级联后的前导奇异向量的主成分来找到共享正交基,然后将每个任务矩阵投影到共享基中,并修剪非对角分量以去除相互冲突的奇异方向。在各种架构和模型规模的领域泛化设置中,SCORE的平均性能始终优于现有的模型合并方法,证明了其有效性和可扩展性。
摘要:Model merging integrates multiple task-specific models into a single consolidated one. Recent research has made progress in improving merging performance for in-distribution or multi-task scenarios, but domain generalization in model merging remains underexplored. We investigate how merging models fine-tuned on distinct domains affects generalization to unseen domains. Through an analysis of parameter competition in the task matrix using singular value decomposition, we show that merging models trained under different distribution shifts induces stronger conflicts between their subspaces compared to traditional multi-task settings. To mitigate this issue, we propose SCORE (Subspace COnflict-Resolving mErging), a method designed to alleviate such singular subspace conflicts. SCORE finds a shared orthogonal basis by computing the principal components of the concatenated leading singular vectors of all models. It then projects each task matrix into the shared basis, pruning off-diagonal components to remove conflicting singular directions. SCORE consistently outperforms, on average, existing model merging approaches in domain generalization settings across a variety of architectures and model scales, demonstrating its effectiveness and scalability.
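
下面用 numpy 给出 SCORE 思路的粗略草图:对每个任务矩阵取前导左奇异向量,拼接后再做一次 SVD 得到共享正交基,并把各任务矩阵投影回该子空间后平均。其中 rank 的取值以及对"修剪非对角分量"的处理只是对摘要的宽松解读,并非论文的精确规则。

```python
import numpy as np

def score_merge_sketch(task_mats, rank=8):
    """SCORE 风格合并的示意草图(单层)。task_mats: 若干 (d_out, d_in) 任务矩阵。
    rank 为演示性选择;"修剪非对角分量"此处松弛为共享子空间投影。"""
    lead = []
    for W in task_mats:
        U, _, _ = np.linalg.svd(W, full_matrices=False)
        lead.append(U[:, :rank])                 # 每个模型的前导左奇异向量
    concat = np.concatenate(lead, axis=1)        # (d_out, rank * n_models)
    B, _, _ = np.linalg.svd(concat, full_matrices=False)
    B = B[:, :rank]                              # 共享正交基(主成分)

    merged = np.zeros_like(task_mats[0])
    for W in task_mats:
        merged += B @ (B.T @ W)                  # 投影到共享子空间后累加
    return merged / len(task_mats)
```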


【15】Warm Starting State-Space Models with Automata Learning
标题:具有自动机学习的热启动状态空间模型
链接:https://arxiv.org/abs/2603.05694

作者:William Fishell,Sam Nicholas Kouteili,Mark Santolucito
摘要 :我们证明了摩尔机可以完全实现为状态空间模型(SSM),建立了符号自动机和这些连续机器学习架构之间的正式对应。这些Moore-SSM保留了原始Moore机的完整符号结构和输入输出行为,但在欧几里得空间中运行。通过这种对应关系,我们将SSM的训练与被动和主动自动机学习进行了比较。在从SYNTCOMP基准测试中恢复自动机时,我们表明SSM需要比符号方法多几个数量级的数据,并且无法学习状态结构。这表明符号结构为学习这些系统提供了强烈的归纳偏见。我们利用这种洞察力结合自动机学习和SSM的优势,以有效地学习复杂系统。我们从SYNTCOMP学习了一套仲裁器的自适应仲裁策略,并表明使用符号学习的近似值初始化SSM学习得更快,更好。与随机初始化模型相比,我们看到收敛速度快2 - 5倍,并且测试数据的整体模型准确度更高。我们的工作将自动机学习从纯粹的离散空间中提升出来,使连续域中的符号结构能够在复杂环境中有效学习。
摘要:We prove that Moore machines can be exactly realized as state-space models (SSMs), establishing a formal correspondence between symbolic automata and these continuous machine learning architectures. These Moore-SSMs preserve both the complete symbolic structure and input-output behavior of the original Moore machine, but operate in Euclidean space. With this correspondence, we compare the training of SSMs with both passive and active automata learning. In recovering automata from the SYNTCOMP benchmark, we show that SSMs require orders of magnitude more data than symbolic methods and fail to learn state structure. This suggests that symbolic structure provides a strong inductive bias for learning these systems. We leverage this insight to combine the strengths of both automata learning and SSMs in order to learn complex systems efficiently. We learn an adaptive arbitration policy on a suite of arbiters from SYNTCOMP and show that initializing SSMs with symbolically-learned approximations learn both faster and better. We see 2-5 times faster convergence compared to randomly initialized models and better overall model accuracies on test data. Our work lifts automata learning out of purely discrete spaces, enabling principled exploitation of symbolic structure in continuous domains for efficiently learning in complex settings.


【16】On the Value of Tokeniser Pretraining in Physics Foundation Models
标题:物理基础模型中标记器(Tokeniser)预训练的价值
链接:https://arxiv.org/abs/2603.05598

作者:Hadi Sotoudeh,Payel Mukhopadhyay,Ruben Ohana,Michael McCabe,Neil D. Lawrence,Shirley Ho,Miles Cranmer
备注:16 pages, 4 figures. Workshop paper at ICLR 2026 AI & PDE
摘要:我们研究了标记器(tokeniser)预训练对物理仿真精度和效率的影响。现代高分辨率模拟产生了跨越不同物理机制和尺度的大量数据。训练基础模型来学习这些数据背后的动力学,可以对复杂的多物理场现象进行建模,特别是在数据有限的情况下。新兴的物理基础模型通常旨在联合学习两项任务:(i)提取高分辨率时空数据的紧凑表示,以及(ii)捕获支配性的物理动力学。然而,同时从头学习这两项任务可能会妨碍彼此的效果。我们证明,在训练动力学模型之前用自编码目标预训练标记器,可以提高下游任务的计算效率。值得注意的是,这种收益的大小取决于领域对齐:在与下游任务相同的物理系统上预训练带来最大的改进,而在其他系统上预训练只带来适度的收益。与从头训练相比,域内预训练在10,500个训练步骤后将VRMSE降低了64%。据我们所知,这是首个针对物理基础模型标记器预训练的系统研究。我们进一步引入灵活的时空压缩操作,扩展因果卷积以支持运行时可调的压缩比,使其能高效适配多样的下游任务。我们的研究结果为训练高效的物理仿真器提供了实用指导,并强调了策略性选择预训练数据的重要性。
摘要:We investigate the impact of tokeniser pretraining on the accuracy and efficiency of physics emulation. Modern high-resolution simulations produce vast volumes of data spanning diverse physical regimes and scales. Training foundation models to learn the dynamics underlying such data enables the modelling of complex multiphysics phenomena, especially in data-limited settings. The emerging class of physics foundation models typically aims to learn two tasks jointly: (i) extracting compact representations of high-resolution spatiotemporal data, and (ii) capturing governing physical dynamics. However, learning both tasks from scratch simultaneously can impede the effectiveness of either process. We demonstrate that pretraining the tokeniser with an autoencoding objective prior to training the dynamics model enhances computational efficiency for downstream tasks. Notably, the magnitude of this benefit depends on domain alignment: pretraining on the same physical system as the downstream task yields the largest improvements, while pretraining on other systems provides moderate gains. In-domain pretraining reduces VRMSE by 64% after 10,500 training steps compared to training from scratch. To our knowledge, this is the first systematic investigation of tokeniser pretraining for physics foundation models. We further introduce flexible spatiotemporal compression operations that extend causal convolutions to support runtime-adjustable compression ratios, enabling efficient adaptation to diverse downstream tasks. Our findings provide practical guidance for training efficient physics emulators and highlight the importance of strategic pretraining data selection.


【17】Bias In, Bias Out? Finding Unbiased Subnetworks in Vanilla Models
标题:偏见进来,偏见出去?在香草模型中寻找无偏子网络
链接:https://arxiv.org/abs/2603.05582

作者:Ivan Luiz De Moura Matos,Abdel Djalil Sad Saoud,Ekaterina Iakovleva,Vito Paolo Pastore,Enzo Tartaglione
备注:This work has been accepted for publication at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026
摘要:深度学习中的算法偏差问题导致了各种去偏差技术的发展,其中许多技术执行复杂的训练过程或数据集操作。然而,一个有趣的问题出现了:是否有可能从标准的香草训练模型中提取公平和偏见不可知的子网络,而不依赖于额外的数据,如无偏见的训练集?在这项工作中,我们引入了偏置不变子网络提取(BISE),这是一种学习策略,可以识别和隔离传统训练模型中已经存在的“无偏置”子网络,而无需重新训练或微调原始参数。我们的方法表明,这样的子网络可以通过修剪来提取,并且可以在不修改的情况下运行,有效地减少对有偏见的特征的依赖,并保持稳健的性能。我们的研究结果有助于通过参数删除对预训练神经网络进行结构调整来有效地减轻偏差,而不是以数据为中心或涉及(重新)训练所有模型参数的昂贵策略。在常见基准上进行的大量实验表明,我们的方法在所得到的去偏模型的性能和计算效率方面具有优势。
摘要:The issue of algorithmic biases in deep learning has led to the development of various debiasing techniques, many of which perform complex training procedures or dataset manipulation. However, an intriguing question arises: is it possible to extract fair and bias-agnostic subnetworks from standard vanilla-trained models without relying on additional data, such as unbiased training set? In this work, we introduce Bias-Invariant Subnetwork Extraction (BISE), a learning strategy that identifies and isolates "bias-free" subnetworks that already exist within conventionally trained models, without retraining or finetuning the original parameters. Our approach demonstrates that such subnetworks can be extracted via pruning and can operate without modification, effectively relying less on biased features and maintaining robust performance. Our findings contribute towards efficient bias mitigation through structural adaptation of pre-trained neural networks via parameter removal, as opposed to costly strategies that are either data-centric or involve (re)training all model parameters. Extensive experiments on common benchmarks show the advantages of our approach in terms of the performance and computational efficiency of the resulting debiased model.


【18】Autocorrelation effects in a stochastic-process model for decision making via time series
标题:时间序列决策随机过程模型中的自相关效应
链接:https://arxiv.org/abs/2603.05559

作者:Tomoki Yamagami,Mikio Hasegawa,Takatomo Mihana,Ryoichi Horisaki,Atsushi Uchida
备注:21 pages, 10 figures
摘要:利用半导体激光器产生的光子混沌动力学的决策器,通过把时间光信号用作顺序决策的驱动源,为求解多臂老虎机问题提供了一种超快方法。在这类系统中,混沌波形的采样间隔决定了所得时间序列的时间相关性,实验已报告决策精度强烈依赖于这种自相关特性。然而,自相关带来的收益能否用一个极简的数学模型来解释仍不清楚。在这里,我们分析了一个基于时间序列、采用拔河(tug-of-war)原理求解双臂老虎机问题的随机过程决策模型,其中阈值与二值马尔可夫信号共同演化。数值结果揭示了一种依赖环境的结构:在奖励丰富(奖励稀少)的环境中,负(正)自相关是最优的。这些发现表明,当获胜概率之和大于 $1$ 时,时间序列的负自相关是有利的;而当获胜概率之和小于 $1$ 时,正自相关是有用的。此外,当获胜概率之和恰为 $1$ 时,性能与自相关无关,这一点得到了数学上的澄清。这项研究为改进无线通信和机器人技术中强化学习应用的决策方案铺平了道路。
摘要 :Decision makers exploiting photonic chaotic dynamics obtained by semiconductor lasers provide an ultrafast approach to solving multi-armed bandit problems by using a temporal optical signal as the driving source for sequential decisions. In such systems, the sampling interval of the chaotic waveform shapes the temporal correlation of the resulting time series, and experiments have reported that decision accuracy depends strongly on this autocorrelation property. However, it remains unclear whether the benefit of autocorrelation can be explained by a minimal mathematical model. Here, we analyze a stochastic-process model of the time-series-based decision making using the tug-of-war principle for solving the two-armed bandit problem, where the threshold and a two-valued Markov signal evolve jointly. Numerical results reveal an environment-dependent structure: negative (positive) autocorrelation is optimal in reward-rich (reward-poor) environments. These findings show that negative autocorrelation of the time series is advantageous when the sum of the winning probabilities is more than $1$, whereas positive autocorrelation is useful when the sum of the winning probabilities is less than $1$. Moreover, the performance is independent of autocorrelation if the sum of the winning probabilities equals $1$, which is mathematically clarified. This study paves the way for improving the decision-making scheme for reinforcement learning applications in wireless communications and robotics.
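
摘要研究二值马尔可夫信号的自相关对决策的影响。下面是一个可运行的小示例,生成具有给定一阶自相关的 ±1 对称马尔可夫信号(对称两态链的一阶自相关等于 2p−1,p 为停留概率,这是标准结果);论文中的拔河式决策规则与阈值演化在此省略。

```python
import numpy as np

def markov_binary_signal(n, autocorr, rng=None):
    """生成 ±1 对称马尔可夫信号,一阶自相关为 autocorr(-1 < autocorr < 1)。
    停留概率 p = (1 + autocorr) / 2。仅为演示摘要所研究的驱动信号。"""
    rng = rng or np.random.default_rng()
    p_stay = (1.0 + autocorr) / 2.0
    x = np.empty(n, dtype=int)
    x[0] = rng.choice([-1, 1])
    for t in range(1, n):
        # 以 p_stay 的概率保持当前值,否则翻转
        x[t] = x[t - 1] if rng.random() < p_stay else -x[t - 1]
    return x

signal = markov_binary_signal(10, autocorr=-0.8)   # 负自相关:快速交替的信号
```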


【19】IntSeqBERT: Learning Arithmetic Structure in OEIS via Modulo-Spectrum Embeddings
标题:IntSeqBERT:通过模谱嵌入在OEIS中学习算术结构
链接:https://arxiv.org/abs/2603.05556

作者:Kazuhisa Nakasho
摘要:OEIS中的整数序列涵盖了从个位数常量到天文数字级的阶乘与指数的取值,这使得标准的令牌化模型难以进行预测,因为它们无法处理词表外的数值,也无法利用周期性的算术结构。我们提出IntSeqBERT,一个用于OEIS掩码整数序列建模的双流Transformer编码器。每个序列元素沿两个互补的轴编码:连续的对数尺度幅值嵌入,以及100个余数的sin/cos模嵌入(模 $2$ 到 $101$),二者通过FiLM融合。三个预测头(幅值回归、符号分类和100个模数的模预测)在274,705条OEIS序列上联合训练。在Large规模(91.5M参数)下,IntSeqBERT在测试集上取得95.85%的幅值准确率和50.38%的平均模准确率(MMA),分别比标准令牌化Transformer基线高出 $+8.9$ pt和 $+4.5$ pt。去除模流的消融实验证实,它贡献了MMA增益中的 $+15.2$ pt,并为幅值准确率额外贡献 $+6.2$ pt。一个基于概率化中国剩余定理(CRT)的求解器把模型的预测转换为具体整数,使下一项预测相比令牌化Transformer基线提升7.4倍(Top-1:19.09% vs. 2.59%)。模谱分析揭示了归一化信息增益(NIG)与欧拉函数比值 $\varphi(m)/m$ 之间的强负相关($r = -0.851$,$p < 10^{-28}$),为合数模数通过CRT聚合更高效地捕获OEIS算术结构提供了经验证据。
摘要:Integer sequences in the OEIS span values from single-digit constants to astronomical factorials and exponentials, making prediction challenging for standard tokenised models that cannot handle out-of-vocabulary values or exploit periodic arithmetic structure. We present IntSeqBERT, a dual-stream Transformer encoder for masked integer-sequence modelling on OEIS. Each sequence element is encoded along two complementary axes: a continuous log-scale magnitude embedding and sin/cos modulo embeddings for 100 residues (moduli $2$--$101$), fused via FiLM. Three prediction heads (magnitude regression, sign classification, and modulo prediction for 100 moduli) are trained jointly on 274,705 OEIS sequences. At the Large scale (91.5M parameters), IntSeqBERT achieves 95.85% magnitude accuracy and 50.38% Mean Modulo Accuracy (MMA) on the test set, outperforming a standard tokenised Transformer baseline by $+8.9$ pt and $+4.5$ pt, respectively. An ablation removing the modulo stream confirms it accounts for $+15.2$ pt of the MMA gain and contributes an additional $+6.2$ pt to magnitude accuracy. A probabilistic Chinese Remainder Theorem (CRT)-based Solver converts the model's predictions into concrete integers, yielding a 7.4-fold improvement in next-term prediction over the tokenised-Transformer baseline (Top-1: 19.09% vs. 2.59%). Modulo spectrum analysis reveals a strong negative correlation between Normalised Information Gain (NIG) and Euler's totient ratio $\varphi(m)/m$ ($r = -0.851$, $p < 10^{-28}$), providing empirical evidence that composite moduli capture OEIS arithmetic structure more efficiently via CRT aggregation.
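
摘要中的求解器用中国剩余定理(CRT)把各模数下的余数预测合成为整数。下面给出其确定性核心的标准实现(假设模数两两互素;论文使用 2–101 的全部模数并采用概率化处理,且还需幅值/符号头来选出正确的代表元,这些在此省略):

```python
from math import gcd

def crt_reconstruct(residues, moduli):
    """按中国剩余定理合并余数:返回满足 x ≡ r_i (mod m_i) 的最小非负整数。
    仅为确定性 CRT 核心的演示,要求模数两两互素(Python 3.8+)。"""
    x, m = 0, 1
    for r, mod in zip(residues, moduli):
        assert gcd(m, mod) == 1, "moduli must be pairwise coprime"
        # 求 t 使得 x + m*t ≡ r (mod mod)
        t = ((r - x) * pow(m, -1, mod)) % mod
        x += m * t
        m *= mod
    return x % m

# 23 mod 5 = 3, 23 mod 7 = 2
assert crt_reconstruct([3, 2], [5, 7]) == 23
```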


【20】Semantics-Aware Caching for Concept Learning
标题:概念学习的语义感知缓存
链接:https://arxiv.org/abs/2603.06506

作者:Louis Mozart Kamdem Teyou,Caglar Demir,Axel-Cyrille Ngonga Ngomo
摘要:概念学习是一种在描述逻辑知识库上运行的有监督机器学习。最先进的概念学习器通常依赖于在可数无限的概念空间中进行迭代搜索:每次迭代中,它们检索候选解的实例,为下一次迭代选择最佳概念。简单的学习问题可能只需几十次实例检索调用即可找到合适的解,而复杂的学习问题可能需要数千次调用。我们通过提出一种语义感知的缓存方法来缓解由此产生的运行时挑战。我们的缓存本质上是一个蕴含感知(subsumption-aware)的映射,通过经典(crisp)集合运算将概念关联到实例集合。我们在5个数据集上、结合4个符号推理机、1个神经符号推理机和5种常见分页策略进行的实验表明,我们的缓存可以将概念检索和概念学习的运行时间降低一个数量级,并且对符号推理机和神经符号推理机均有效。
摘要:Concept learning is a form of supervised machine learning that operates on knowledge bases in description logics. State-of-the-art concept learners often rely on an iterative search through a countably infinite concept space. In each iteration, they retrieve instances of candidate solutions to select the best concept for the next iteration. While simple learning problems might require a few dozen instance retrieval calls to find a fitting solution, complex learning problems might necessitate thousands of calls. We alleviate the resulting runtime challenge by presenting a semantics-aware caching approach. Our cache is essentially a subsumption-aware map that links concepts to a set of instances via crisp set operations. Our experiments on 5 datasets with 4 symbolic reasoners, a neuro-symbolic reasoner, and 5 popular pagination policies demonstrate that our cache can reduce the runtime of concept retrieval and concept learning by an order of magnitude while being effective for both symbolic and neuro-symbolic reasoners.


【21】Quantum Diffusion Models: Score Reversal Is Not Free in Gaussian Dynamics
标题:量子扩散模型:高斯动力学中的分数反转并非免费
链接:https://arxiv.org/abs/2603.06488

作者:Ammar Fayad
摘要:基于扩散的生成建模提出通过添加分数漂移来反转加噪半群。对于连续变量高斯马尔可夫动力学,完全正性(CP)在生成元层面将漂移与扩散耦合在一起。对于具有热参数 $\nu$ 和压缩参数 $r$ 的量子极限衰减器,当且仅当 $\cosh(2r) > \nu$ 时,固定扩散的Wigner分数(Bayes)反向漂移违反完全正性。任何高斯CP修复都必须注入额外的扩散,这意味着 $-2\ln F \ge c_{\text{geom}}(\nu_{\min}) I_{\mathrm{dec}}^{\mathrm{wc}}$。
摘要:Diffusion-based generative modeling suggests reversing a noising semigroup by adding a score drift. For continuous-variable Gaussian Markov dynamics, complete positivity couples drift and diffusion at the generator level. For a quantum-limited attenuator with thermal parameter $\nu$ and squeezing $r$, the fixed-diffusion Wigner-score (Bayes) reverse drift violates CP iff $\cosh(2r)>\nu$. Any Gaussian CP repair must inject extra diffusion, implying $-2\ln F\ge c_{\text{geom}}(\nu_{\min})I_{\mathrm{dec}}^{\mathrm{wc}}$.


【22】Behavior-dLDS: A decomposed linear dynamical systems model for neural activity partially constrained by behavior
标题:行为dLDS:部分受行为约束的神经活动的分解线性动力系统模型
链接:https://arxiv.org/abs/2603.05612

作者:Eva Yezerets,En Yang,Misha B. Ahrens,Adam S. Charles
摘要:对大规模神经元网络的全脑记录如今为"大脑如何驱动行为"提供了前所未有的视角。然而,大脑活动既包含与行为直接相关的信息,也可能承载大量内部计算。此外,可观察到的行为不仅由大脑执行,还由脊髓和外周神经系统执行。行为是神经活动的粗粒度产物,因此我们认为它最好由低维潜在神经动力学来表示。要在捕捉这种间接关系的同时,把行为生成网络与并行运行的内部计算区分开,就需要能够体现大规模神经群体并行与分布式特性的新建模方法。为此,我们提出行为分解线性动力系统(b-dLDS),以解缠同时记录的子系统,并识别潜在神经子系统与行为之间的关系。我们在受控的仿真数据上展示了b-dLDS解耦行为计算与内部计算的能力,其表现优于一个用行为来监督全部动态的最先进模型。随后我们表明,b-dLDS可以进一步扩展到数万个神经元:我们将模型应用于斑马鱼后脑在复杂位置稳态行为期间的大规模记录,b-dLDS从中突出了与行为相关的动态连接网络。
摘要:Brain-wide recordings of large-scale networks of neurons now provide an unprecedented view into how the brain drives behavior. However, brain activity contains both information directly related to behavior as well as the potential for many internal computations. Moreover, observable behavior is executed not only by the brain, but also by the spinal cord and peripheral nervous system. Behavior is a coarse-grained product of neural activity, and we thus take the view that it can be best represented by lower-dimensional latent neural dynamics. Capturing this indirect relationship while disambiguating behavior-generating networks from internal computations running in parallel requires new modeling approaches that can embody the parallel and distributed nature of large-scale neural populations. We thus present behavior-decomposed linear dynamical systems (b-dLDS) to disentangle simultaneously recorded subsystems and identify how the latent neural subsystems relate to behavior. We demonstrate the ability of b-dLDS to decouple behavioral vs. internal computations on controlled, simulated data, showing improvements over a state-of-the-art model that uses behavior to supervise all dynamics based on behavior. We then show that b-dLDS can further scale up to tens of thousands of neurons by applying our model to large-scale recording of a zebrafish hindbrain during the complex positional homeostasis behavior, wherein b-dLDS highlights behavior-related dynamic connectivity networks.


其他(32篇)

【1】A recipe for scalable attention-based MLIPs: unlocking long-range accuracy with all-to-all node attention
标题:可扩展的基于注意力的MLIP配方:通过全对全节点注意力释放长程精度
链接:https://arxiv.org/abs/2603.06567

作者:Eric Qu,Brandon M. Wood,Aditi S. Krishnapriyan,Zachary W. Ulissi
摘要:机器学习原子间势(MLIP)发展迅速,许多顶级模型依赖于强的基于物理的归纳偏置。然而,当模型扩展到生物分子和电解质等更大的系统时,它们很难准确捕获长程(LR)相互作用,导致现有方法依赖显式的基于物理的项或组件。在这项工作中,我们提出AllScAIP,一个简单的、基于注意力且能量守恒的MLIP模型,可扩展到O(1亿)量级的训练样本。它使用数据驱动的全对全节点注意力组件来应对长程挑战。大量消融实验表明,在低数据/小模型情形下,归纳偏置能提高样本效率;但随着数据和模型规模的扩大,这些收益会减少甚至逆转,而全对全注意力对于捕获长程相互作用始终至关重要。我们的模型在分子系统以及多项基于物理的评估(OMol25)上实现了最先进的能量/力精度,同时在材料(OMat24)和催化剂(OC20)上具有竞争力。此外,它能够进行稳定的长时间尺度MD模拟,准确复现实验可观测量,包括密度和蒸发热预测。
摘要:Machine-learning interatomic potentials (MLIPs) have advanced rapidly, with many top models relying on strong physics-based inductive biases. However, as models scale to larger systems like biomolecules and electrolytes, they struggle to accurately capture long-range (LR) interactions, leading current approaches to rely on explicit physics-based terms or components. In this work, we propose AllScAIP, a straightforward, attention-based, and energy-conserving MLIP model that scales to O(100 million) training samples. It addresses the long-range challenge using an all-to-all node attention component that is data-driven. Extensive ablations reveal that in low-data/small-model regimes, inductive biases improve sample efficiency. However, as data and model size scale, these benefits diminish or even reverse, while all-to-all attention remains critical for capturing LR interactions. Our model achieves state-of-the-art energy/force accuracy on molecular systems, as well as a number of physics-based evaluations (OMol25), while being competitive on materials (OMat24) and catalysts (OC20). Furthermore, it enables stable, long-timescale MD simulations that accurately recover experimental observables, including density and heat of vaporization predictions.


【2】Toward Generative Quantum Utility via Correlation-Complexity Map
标题:通过相关复杂性地图实现生成量子效用
链接:https://arxiv.org/abs/2603.06440

作者:Chen-Yu Liu,Leonardo Placidi,Eric Brunner,Enrico Rinaldi
备注:33 pages, 8 figures
摘要:我们提出相关性-复杂度图(Correlation-Complexity Map),作为一种实用诊断工具,用于判断真实世界的数据分布何时在结构上与IQP型量子生成模型对齐。它由两个互补指标刻画:(i)量子相关相似性指标(QCLI),从数据集按相互作用阶聚合的相关阶(Walsh-Hadamard/Fourier)功率谱计算,并通过与i.i.d.二项式参考分布的Jensen-Shannon散度加以量化;(ii)经典相关复杂度指标(CCI),定义为最优Chow-Liu树近似未能捕获的总相关的比例,并用总相关归一化。我们通过把QCLI与支撑集失配机制相联系提供理论支持:对于以MMD为目标训练的固定架构IQP族,更高的QCLI意味着更小的不可约逼近下限。利用该图,我们发现经典湍流数据既与IQP兼容又具有经典复杂性(高QCLI/高CCI)。在这一定位的指导下,我们使用可逆的浮点-比特串表示和一种潜在参数自适应方案,该方案通过学习并插值低维潜在轨迹,在时间序列上复用紧凑的IQP电路。在与受限玻尔兹曼机(RBM)和深度卷积生成对抗网络(DCGAN)等经典模型的对比评估中,IQP方法在使用显著更少的训练快照和较小潜在模块的情况下实现了有竞争力的分布对齐,支持把QCLI/CCI用作定位IQP对齐领域、推进生成式量子效用的实用指标。
摘要:We propose a Correlation-Complexity Map as a practical diagnostic tool for determining when real-world data distributions are structurally aligned with IQP-type quantum generative models. Characterized by two complementary indicators: (i) a Quantum Correlation-Likeness Indicator (QCLI), computed from the dataset's correlation-order (Walsh-Hadamard/Fourier) power spectrum aggregated by interaction order and quantified via Jensen-Shannon divergence from an i.i.d. binomial reference; and (ii) a Classical Correlation-Complexity Indicator (CCI), defined as the fraction of total correlation not captured by the optimal Chow-Liu tree approximation, normalized by total correlation. We provide theoretical support by relating QCLI to a support-mismatch mechanism, for fixed-architecture IQP families trained with an MMD objective, higher QCLI implies a smaller irreducible approximation floor. Using the map, we identify the classical turbulence data as both IQP-compatible and classically complex (high QCLI/high CCI). Guided by this placement, we use an invertible float-to-bitstring representation and a latent-parameter adaptation scheme that reuses a compact IQP circuit over a temporal sequence by learning and interpolating a low-dimensional latent trajectory. In comparative evaluations against classical models such as Restricted Boltzmann Machine (RBM) and Deep Convolutional Generative Adversarial Networks (DCGAN), the IQP approach achieves competitive distributional alignment while using substantially fewer training snapshots and a small latent block, supporting the use of QCLI/CCI as practical indicators for locating IQP-aligned domains and advancing generative quantum utility.


【3】Efficient, Property-Aligned Fan-Out Retrieval via RL-Compiled Diffusion
标题:基于RL编译扩散的高效属性对齐扇出检索
链接:https://arxiv.org/abs/2603.06397

作者:Pengcheng Jiang,Judith Yue Li,Moonkyung Ryu,R. Lily Hu,Kun Su,Zhong Yi Wan,Liam Hebert,Hao Peng,Jiawei Han,Dima Kuzmin,Craig Boutilier
摘要:许多现代检索问题都是集值的:给定一个宽泛的意图,系统必须返回一个优化高阶属性(例如多样性、覆盖面、互补性、一致性)的结果集合,同时相对于固定数据库保持有据可依。集值目标通常不可分解,也无法被现有只关注top-1检索的监督(查询,内容)数据集所刻画。因此,通常采用扇出检索来生成多样的子查询以检索项目集。虽然强化学习(RL)可以通过交互来优化集合级目标,但在推理时部署RL调优的LLM进行扇出检索的代价极其高昂。相反,基于扩散的生成式检索能够在嵌入空间中进行高效的单遍扇出,但需要与目标对齐的训练标签。为了解决这些问题,我们提出R4T(Retrieve-for-Train),它在三步流程中仅使用一次RL作为目标转换器(objective transducer):(i)用复合的集合级奖励训练一个扇出LLM,(ii)合成与目标一致的训练对,(iii)训练一个轻量级扩散检索器来建模集值输出的条件分布。在由精选项目集构成的大规模时尚和音乐基准上,我们表明R4T相对强基线提升了检索质量,同时将查询时的扇出延迟降低了一个数量级。
摘要:Many modern retrieval problems are set-valued: given a broad intent, the system must return a collection of results that optimizes higher-order properties (e.g., diversity, coverage, complementarity, coherence) while remaining grounded with respect to a fixed database. Set-valued objectives are typically non-decomposable and are not captured by existing supervised (query, content) datasets which only prioritize top-1 retrieval. Consequently, fan-out retrieval is often employed to generate diverse subqueries to retrieve item sets. While reinforcement learning (RL) can optimize set-level objectives via interaction, deploying an RL-tuned LLM for fan-out retrieval is prohibitively expensive at inference time. Conversely, diffusion-based generative retrieval enables efficient single-pass fan-out in embedding space, but requires objective-aligned training targets. To address these issues, we propose R4T (Retrieve-for-Train), which uses RL once as an objective transducer in a three-step process: (i) train a fan-out LLM with composite set-level rewards, (ii) synthesize objective-consistent training pairs, and (iii) train a lightweight diffusion retriever to model the conditional distribution of set-valued outputs. Across large-scale fashion and music benchmarks consisting of curated item sets, we show that R4T improves retrieval quality relative to strong baselines while reducing query-time fan-out latency by an order of magnitude.


【4】Talk Freely, Execute Strictly: Schema-Gated Agentic AI for Flexible and Reproducible Scientific Workflows
标题:自由交谈,严格执行:模式门控的代理式AI,实现灵活且可复现的科学工作流
链接:https://arxiv.org/abs/2603.06394

作者:Joel Strickland,Arjun Vijeta,Chris Moores,Oliwia Bodek,Bogdan Nenchev,Thomas Whitehead,Charles Phillips,Karl Tassenberg,Gareth Conduit,Ben Pellegrini
摘要:大型语言模型(LLM)如今可以把研究人员的自然语言目标转换为可执行的计算,但科学工作流需要确定性、可溯源性和治理,而当由LLM决定运行什么时,这些都难以保证。对来自10个工业研发相关方的18位专家进行的半结构化访谈揭示了2个相互竞争的要求(确定性、受约束的执行,以及不带工作流刚性的对话灵活性),以及任何解决方案都必须满足的边界属性(人在回路控制与透明度)。我们提出模式门控编排(schema-gated orchestration)作为化解原则:模式在组合工作流层面成为强制性的执行边界,因此除非完整的动作(包括跨步骤的依赖关系)通过机器可检查规范的校验,否则任何东西都不会运行。我们将这2个要求操作化为执行确定性(ED)和对话灵活性(CF),并用这两个轴沿验证范围谱系评审了横跨5个架构组的20个系统。评分通过多模型协议给出(3个LLM系列共15次独立会话),得到相当高到接近完美的模型间一致性(ED的Krippendorff α=0.80,CF的α=0.98),表明多模型LLM评分可以作为人类专家小组的可复用替代方案,用于架构评估。由此得到的格局揭示了一条经验帕累托前沿:没有任何被评审的系统同时实现高灵活性和高确定性;但在生成式与工作流中心两个极端之间出现了一个收敛区。我们认为,将对话权限与执行权限分离的模式门控架构有望解耦这种权衡,并提炼出3条操作原则(执行前澄清、受约束的计划-行动编排、工具到工作流级门控)以指导采用。
摘要:Large language models (LLMs) can now translate a researcher's plain-language goal into executable computation, yet scientific workflows demand determinism, provenance, and governance that are difficult to guarantee when an LLM decides what runs. Semi-structured interviews with 18 experts across 10 industrial R&D stakeholders surface 2 competing requirements--deterministic, constrained execution and conversational flexibility without workflow rigidity--together with boundary properties (human-in-the-loop control and transparency) that any resolution must satisfy. We propose schema-gated orchestration as the resolving principle: the schema becomes a mandatory execution boundary at the composed-workflow level, so that nothing runs unless the complete action--including cross-step dependencies--validates against a machine-checkable specification.   We operationalize the 2 requirements as execution determinism (ED) and conversational flexibility (CF), and use these axes to review 20 systems spanning 5 architectural groups along a validation-scope spectrum. Scores are assigned via a multi-model protocol--15 independent sessions across 3 LLM families--yielding substantial-to-near-perfect inter-model agreement (Krippendorff α=0.80 for ED and α=0.98 for CF), demonstrating that multi-model LLM scoring can serve as a reusable alternative to human expert panels for architectural assessment.   The resulting landscape reveals an empirical Pareto front--no reviewed system achieves both high flexibility and high determinism--but a convergence zone emerges between the generative and workflow-centric extremes. We argue that a schema-gated architecture, separating conversational from execution authority, is positioned to decouple this trade-off, and distill 3 operational principles--clarification-before-execution, constrained plan-act orchestration, and tool-to-workflow-level gating--to guide adoption.


【5】Polarized Direct Cross-Attention Message Passing in GNNs for Machinery Fault Diagnosis
标题:GNN中极化直接交叉注意消息传递用于机械故障诊断
链接:https://arxiv.org/abs/2603.06303

作者:Zongyu Shi,Laibin Zhang,Maoyin Chen
摘要:安全关键型工业系统的可靠性取决于对旋转机械准确而鲁棒的故障诊断。用于机械故障诊断的传统图神经网络(GNN)由于依赖预定义的静态图结构和同质的聚合方案,在建模复杂动态交互方面存在局限。为克服这些挑战,本文提出极化直接交叉注意(PolaDCA),一种通过数据驱动的图构建实现自适应消息传递的新型关系学习框架。我们的方法建立在直接交叉注意(DCA)机制之上,它从三类语义不同的节点特征(个体特征、邻域共识、邻域多样性)动态推断注意力权重,而无需固定的邻接矩阵。理论分析确立了PolaDCA相对传统GNN的更强噪声鲁棒性。在工业数据集(XJTUSuprgear、CWRUBearing和三相流设施数据集)上的大量实验表明,该方法在不同噪声条件下取得了最先进的诊断精度和更强的泛化能力,优于七种有竞争力的基线方法。所提框架为安全关键的工业应用提供了一种有效的解决方案。
摘要:The reliability of safety-critical industrial systems hinges on accurate and robust fault diagnosis in rotating machinery. Conventional graph neural networks (GNNs) for machinery fault diagnosis face limitations in modeling complex dynamic interactions due to their reliance on predefined static graph structures and homogeneous aggregation schemes. To overcome these challenges, this paper introduces polarized direct cross-attention (PolaDCA), a novel relational learning framework that enables adaptive message passing through data-driven graph construction. Our approach builds upon a direct cross-attention (DCA) mechanism that dynamically infers attention weights from three semantically distinct node features (such as individual characteristics, neighborhood consensus, and neighborhood diversity) without requiring fixed adjacency matrices. Theoretical analysis establishes PolaDCA's superior noise robustness over conventional GNNs. Extensive experiments on industrial datasets (i.e., XJTUSuprgear, CWRUBearing and Three-Phase Flow Facility datasets) demonstrate state-of-the-art diagnostic accuracy and enhanced generalization under varying noise conditions, outperforming seven competitive baseline methods. The proposed framework provides an effective solution for safety-critical industrial applications.


【6】Stem: Rethinking Causal Information Flow in Sparse Attention
标题:Stem:重新思考稀疏注意力中的因果信息流
链接:https://arxiv.org/abs/2603.06274

作者:Lin Niu,Xin Luo,Linchuan Xie,Yifu Sun,Guanghua Yu,Jianchen Zhu,S Kevin Zhou
备注:12 pages, preprint
摘要:自注意力的二次计算复杂度仍然是将大型语言模型(LLM)扩展到长上下文的根本瓶颈,特别是在预填充阶段。本文从信息流的角度重新思考因果注意力机制。由于因果约束,位于起始位置的令牌会参与其后每个令牌的聚合;然而,现有的稀疏方法通常在一层内的所有令牌位置上应用统一的top-k选择,忽略了因果架构中令牌信息固有的累积依赖性。为此,我们提出Stem,一种与信息流对齐的新型即插即用稀疏模块。首先,Stem采用令牌位置衰减(Token Position-Decay)策略,在每一层中应用位置相关的top-k,以保留被递归依赖的起始令牌。其次,为了保留信息丰富的令牌,Stem利用输出感知度量(Output-Aware Metric),依据近似的输出幅值对高影响令牌进行优先排序。广泛的评估表明,Stem以更少的计算量和预填充延迟实现了更优的准确率。
摘要 :The quadratic computational complexity of self-attention remains a fundamental bottleneck for scaling Large Language Models (LLMs) to long contexts, particularly during the pre-filling phase. In this paper, we rethink the causal attention mechanism from the perspective of information flow. Due to causal constraints, tokens at initial positions participate in the aggregation of every subsequent token. However, existing sparse methods typically apply a uniform top-k selection across all token positions within a layer, ignoring the cumulative dependency of token information inherent in causal architectures. To address this, we propose Stem, a novel, plug-and-play sparsity module aligned with information flow. First, Stem employs the Token Position-Decay strategy, applying position-dependent top-k within each layer to retain initial tokens for recursive dependencies. Second, to preserve information-rich tokens, Stem utilizes the Output-Aware Metric. It prioritizes high-impact tokens based on approximate output magnitude. Extensive evaluations demonstrate that Stem achieves superior accuracy with reduced computation and pre-filling latency.


【7】Looking Through Glass Box
标题:透过玻璃盒看
链接:https://arxiv.org/abs/2603.06272

作者:Alexis Kafantaris
备注:This is a theoretical framework with some empirical validation
摘要:本文介绍模糊认知图(FCM)的一种神经实现(FHM)及相应的评估。首先,我们设计了一个行为方式与FCM相同的神经网络:它接受多个模糊认知图作为输入,并对其进行传播以学习因果关系模式。此外,该网络使用Langevin微分动力学(有助于避免过拟合),按照给定策略对输出节点值进行逆求解。获得逆解为用户提供了一个修改准则;有了该准则,便可以酌情调整信息,以便更好地匹配不同的服务或产品。最后,我们在若干数据集上进行了评估,以检验网络的性能。
摘要:This essay is about a neural implementation of the fuzzy cognitive map, the FHM, and corresponding evaluations. Firstly, a neural net has been designed to behave the same way that an FCM does; as inputs it accepts many fuzzy cognitive maps and propagates them in order to learn causality patterns. Moreover, the network uses Langevin differential dynamics, which avoid overfitting, to inverse-solve the output node values according to some policy. Nevertheless, having obtained an inverse solution provides the user a modification criterion. Having the modification criterion suggests that information is now according to discretion as a different service or product is a better fit. Lastly, evaluation has been done on several data sets in order to examine the network's performance.


【8】Gradient Flow Polarizes Softmax Outputs towards Low-Entropy Solutions
标题:梯度流将Softmax输出两极分化为低熵解决方案
链接:https://arxiv.org/abs/2603.06248

作者:Aditya Varre,Mark Rofin,Nicolas Flammarion
备注:35 pages, 21 figures
摘要:理解基于softmax的模型复杂的非凸训练动态,对于解释Transformer的经验成功至关重要。本文分析了value-softmax模型的梯度流动力学,该模型定义为 $L(\mathbf{V}\sigma(\mathbf{a}))$,其中 $\mathbf{V}$ 和 $\mathbf{a}$ 分别是可学习的值矩阵和注意力向量。由于"矩阵乘以softmax向量"的参数化构成了自注意力的核心构件,我们的分析为Transformer的训练动态提供了直接洞察。我们揭示,这种结构上的梯度流会内在地把优化推向以低熵输出为特征的解。我们证明了这种极化效应在逻辑损失和平方损失等多种目标下的普遍性。此外,我们讨论了这些理论结果的实际意义,为注意力汇聚(attention sinks)和巨量激活(massive activations)等经验现象提供了形式化机制。
摘要:Understanding the intricate non-convex training dynamics of softmax-based models is crucial for explaining the empirical success of transformers. In this article, we analyze the gradient flow dynamics of the value-softmax model, defined as $L(\mathbf{V} \sigma(\mathbf{a}))$, where $\mathbf{V}$ and $\mathbf{a}$ are a learnable value matrix and attention vector, respectively. As the matrix times softmax vector parameterization constitutes the core building block of self-attention, our analysis provides direct insight into transformer's training dynamics. We reveal that gradient flow on this structure inherently drives the optimization toward solutions characterized by low-entropy outputs. We demonstrate the universality of this polarizing effect across various objectives, including logistic and square loss. Furthermore, we discuss the practical implications of these theoretical results, offering a formal mechanism for empirical phenomena such as attention sinks and massive activations.
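
按摘要给出的模型定义 $L(\mathbf{V}\sigma(\mathbf{a}))$,利用标准的 softmax 雅可比可写出其梯度流常微分方程。以下推导只依赖链式法则,属于示意性补充(记 $u=\mathbf{V}\sigma(\mathbf{a})$),并非论文中的完整分析:

```latex
% softmax 雅可比:J_\sigma(\mathbf{a}) = \mathrm{diag}(\sigma(\mathbf{a})) - \sigma(\mathbf{a})\sigma(\mathbf{a})^\top
\begin{aligned}
  \dot{\mathbf{V}} &= -\nabla_{\mathbf{V}} L
     = -\,\nabla_{u} L\,\sigma(\mathbf{a})^{\top}, \\
  \dot{\mathbf{a}} &= -\nabla_{\mathbf{a}} L
     = -\bigl(\mathrm{diag}(\sigma(\mathbf{a})) - \sigma(\mathbf{a})\sigma(\mathbf{a})^{\top}\bigr)\,
       \mathbf{V}^{\top}\nabla_{u} L .
\end{aligned}
```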


【9】FedSCS-XGB -- Federated Server-centric surrogate XGBoost for continual health monitoring
标题:FedSCS-XGB -- 以联邦服务器为中心的代理XGBoost,用于持续健康监测
链接:https://arxiv.org/abs/2603.06224

作者:Felix Walger,Mehdi Ejtehadi,Anke Schmeink,Diego Paez-Granados
备注:Submitted to IEEE EMBC 2026
摘要:具有本地数据处理功能的可穿戴传感器可以及早检测健康威胁,增强文档记录,并支持个性化治疗。在脊髓损伤(SCI)的背景下,涉及压力损伤和血压不稳定等风险,持续监测可以通过早期检测和干预来帮助减轻这些风险。在这项工作中,我们提出了一种新的分布式机器学习(DML)协议,用于基于梯度提升决策树(XGBoost)的可穿戴传感器数据的人类活动识别(HAR)。所提出的架构受到Party-Adaptive XGBoost(PAX)的启发,同时明确保留了标准XGBoost的关键结构和优化特性,包括基于直方图的分裂构造和树系综动态。首先,我们提供了一个理论分析,表明在适当的数据条件和适当的超参数选择下,所提出的分布式协议可以收敛到相当于集中式XGBoost训练的解决方案。其次,该协议是经验性的评估代表性的可穿戴传感器HAR数据集,反映了异构性和数据碎片典型的远程监控方案。对集中式XGBoost和IBM PAX的基准测试表明,理论上的收敛特性在实践中得到了反映。结果表明,该方法可以在保持XGBoost在分布式可穿戴HAR环境中的结构优势的同时,将集中式性能的差距缩小到1%以下。
摘要:Wearable sensors with local data processing can detect health threats early, enhance documentation, and support personalized therapy. In the context of spinal cord injury (SCI), which involves risks such as pressure injuries and blood pressure instability, continuous monitoring can help mitigate these by enabling early detection and intervention. In this work, we present a novel distributed machine learning (DML) protocol for human activity recognition (HAR) from wearable sensor data based on gradient-boosted decision trees (XGBoost). The proposed architecture is inspired by Party-Adaptive XGBoost (PAX) while explicitly preserving key structural and optimization properties of standard XGBoost, including histogram-based split construction and tree-ensemble dynamics. First, we provide a theoretical analysis showing that, under appropriate data conditions and suitable hyperparameter selection, the proposed distributed protocol can converge to solutions equivalent to centralized XGBoost training. Second, the protocol is empirically evaluated on a representative wearable-sensor HAR dataset, reflecting the heterogeneity and data fragmentation typical of remote monitoring scenarios. Benchmarking against centralized XGBoost and IBM PAX demonstrates that the theoretical convergence properties are reflected in practice. The results indicate that the proposed approach can match centralized performance up to a gap under 1\% while retaining the structural advantages of XGBoost in distributed wearable-based HAR settings.


【10】Topological descriptors of foot clearance gait dynamics improve differential diagnosis of Parkinsonism
标题:足部离地间隙步态动力学的拓扑描述符改善帕金森综合征的鉴别诊断
链接:https://arxiv.org/abs/2603.06212

作者:Jhonathan Barrios,Wolfram Erlhagen,Miguel F. Gago,Estela Bicho,Flora Ferreira
备注:17 pages, 12 figures, Under review
摘要 :帕金森综合征的鉴别诊断仍然是一个临床挑战,由于重叠的运动症状和微妙的步态异常。准确的鉴别诊断对于治疗计划和预后至关重要。虽然步态分析是用于评估运动损伤的成熟方法,但是常规方法经常忽略嵌入在足部间隙模式中的隐藏的非线性和结构特征。我们评估拓扑数据分析(TDA)作为帕金森病分类的补充工具,使用足部间隙时间序列。持久同源性产生贝蒂曲线,持久性景观和轮廓,它们被用作随机森林分类器的特征。该数据集包括15名对照(CO),15名特发性帕金森病(IPD)和14名血管性帕金森综合征(VaP)。采用留一法交叉验证(LOOCV)对模型进行评估。贝蒂曲线描述符始终产生最强的结果。对于IPD与VaP,足部间隙变量最小脚趾间隙、最大脚趾后期摆动和最大脚跟间隙在药物(开)状态下在LOOCV下达到83%的准确度和AUC=0.89。性能在开状态下得到改善,并且在考虑关和开状态时得到进一步改善,表明拓扑特征对左旋多巴相关步态变化的敏感性。这些发现支持将TDA与机器学习相结合,以改善临床步态分析,并帮助对帕金森病进行鉴别诊断。
摘要:Differential diagnosis among parkinsonian syndromes remains a clinical challenge due to overlapping motor symptoms and subtle gait abnormalities. Accurate differentiation is crucial for treatment planning and prognosis. While gait analysis is a well established approach for assessing motor impairments, conventional methods often overlook hidden nonlinear and structural features embedded in foot clearance patterns. We evaluated Topological Data Analysis (TDA) as a complementary tool for Parkinsonism classification using foot clearance time series. Persistent homology produced Betti curves, persistence landscapes, and silhouettes, which were used as features for a Random Forest classifier. The dataset comprised 15 controls (CO), 15 idiopathic Parkinson's disease (IPD), and 14 vascular Parkinsonism (VaP). Models were assessed with leave-one-out cross-validation (LOOCV). Betti-curve descriptors consistently yielded the strongest results. For IPD vs VaP, foot clearance variables minimum toe clearance, maximum toe late swing, and maximum heel clearance achieved 83% accuracy and AUC=0.89 under LOOCV in the medicated (On) state. Performance improved in the On state and further when both Off and On states were considered, indicating sensitivity of the topological features to levodopa related gait changes. These findings support integrating TDA with machine learning to improve clinical gait analysis and aid differential diagnosis across parkinsonian disorders.


【11】EvoESAP: Non-Uniform Expert Pruning for Sparse MoE
标题:EvoESAP:稀疏MoE的非均匀专家修剪
链接:https://arxiv.org/abs/2603.06003

作者:Zongfang Liu,Shengkun Tang,Boyang Sun,Zhiqiang Shen,Xin Yuan
摘要:稀疏混合专家(SMoE)语言模型以较低的每令牌计算量实现了强大的能力,但由于必须存储并服务完整的专家池,部署仍然受内存和吞吐量的限制。训练后的专家修剪可以降低这种成本,但大多数方法只关注每层内修剪哪些专家,并默认采用统一的逐层稀疏分配,尽管这种分配会强烈影响性能。我们将修剪解耦为层内专家排序和跨层预算分配,并引入预期投机接受代理(Expected Speculative Acceptance Proxy,ESAP),一种受投机解码启发的教师强制(teacher-forced)度量,用于衡量修剪后的模型与完整模型的匹配程度。ESAP有界且稳定,无需昂贵的自回归解码即可对大量候选方案进行低成本比较。在ESAP的基础上,我们提出EvoESAP,一个在固定全局预算下优化非均匀逐层稀疏分配的进化搜索框架;它保持层内修剪顺序不变,因此可以与Frequency、EAN、SEER和REAP等准则即插即用地结合。在25%和50%稀疏度下的7B-30B SMoE LLM上,EvoESAP始终能发现改善开放式生成的非均匀分配(在50%稀疏度下,MATH-500上最高提升+19.6%),同时与相同稀疏度下的均匀修剪相比保持有竞争力的多项选择准确率。
摘要:Sparse Mixture-of-Experts (SMoE) language models achieve strong capability at low per-token compute, yet deployment remains memory- and throughput-bound because the full expert pool must be stored and served. Post-training expert pruning reduces this cost, but most methods focus on which experts to prune within each layer and default to a uniform layer-wise sparsity allocation, even though the allocation can strongly affect performance. We decouple pruning into within-layer expert ranking and across-layer budget allocation, and introduce \textbf{E}xpected \textbf{S}peculative \textbf{A}cceptance \textbf{P}roxy (\textbf{ESAP}), a speculative-decoding-inspired, teacher-forced metric that measures how well a pruned model matches the full model. ESAP is bounded and stable, enabling cheap comparison of many candidates without costly autoregressive decoding. Building on ESAP, we propose EvoESAP, an evolutionary searching framework that optimizes a non-uniform layer-wise sparsity allocation under a fixed global budget while holding the within-layer pruning order fixed, making it a plug-and-play method with criteria such as Frequency, EAN, SEER, and REAP. Across 7B--30B SMoE LLMs at 25\% and 50\% sparsity, EvoESAP consistently discovers non-uniform allocations that improve open-ended generation (up to \textbf{+19.6\%} on MATH-500 at 50\% sparsity) while preserving competitive multiple-choice accuracy compared with uniform pruning at the same sparsity.
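
下面是"在固定全局预算下对逐层稀疏分配做进化搜索"的一个极简草图;score_fn 代表论文的 ESAP 代理(此处作为传入的黑盒假设,得分越高越好),变异算子与种群设置均为演示性选择,并非论文的搜索配置。

```python
import numpy as np

def evolve_allocation(score_fn, n_layers, budget, pop=16, gens=30, rng=None):
    """在 sum(alloc) == budget 的约束下,进化搜索逐层修剪预算分配。仅为示意。"""
    rng = rng or np.random.default_rng()

    def random_alloc():
        # 将全局预算随机分到各层(自动满足预算约束)
        return rng.multinomial(budget, np.ones(n_layers) / n_layers)

    def mutate(a):
        a = a.copy()
        i, j = rng.choice(n_layers, size=2, replace=False)
        if a[i] > 0:                # 在两层之间移动一个单位的预算
            a[i] -= 1
            a[j] += 1
        return a

    population = [random_alloc() for _ in range(pop)]
    for _ in range(gens):
        elites = sorted(population, key=score_fn, reverse=True)[: pop // 4]
        population = elites + [mutate(elites[rng.integers(len(elites))])
                               for _ in range(pop - len(elites))]
    return max(population, key=score_fn)
```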


【12】Addressing the Ecological Fallacy in Larger LMs with Human Context
标题:利用人类上下文解决较大规模LM中的生态谬误
链接:https://arxiv.org/abs/2603.05928

作者:Nikita Soni,Dhruv Vijay Kunjadiya,Pratham Piyush Shah,Dikshya Mohanty,H. Andrew Schwartz,Niranjan Balasubramanian
摘要:语言模型的训练和推理忽略了一个基本的语言事实:同一个人写下的多段文本之间存在相依性。先前的工作表明,解决这种形式的"生态谬误"可以大幅提升多个较小规模(约124M)基于GPT的模型的性能。在这项工作中,我们探讨:通过一个特定的LM任务(称为HuLM)建模作者的语言上下文来解决生态谬误,能否为更大规模的模型(一个8B的Llama模型)带来类似收益。为此,我们探索了在作者其他按时间排序的文本的上下文中处理其语言的多种变体。我们研究了使用HuLM目标、带作者上下文进行预训练的效果,以及在微调中使用作者上下文(HuFT:Human-aware Fine-Tuning)的效果。实证比较表明,仅在微调阶段用QLoRA解决生态谬误,即可让更大的8B模型超越标准微调。此外,基于QLoRA的持续HuLM预训练得到了一个人类感知的模型,仅需训练线性任务分类器即可在八个下游任务上取得更好的泛化性能。这些结果表明,在语言的原始生成者(即作者)的上下文中建模语言既实用又重要。
摘要:Language model training and inference ignore a fundamental linguistic fact -- there is a dependence between multiple sequences of text written by the same person. Prior work has shown that addressing this form of \textit{ecological fallacy} can greatly improve the performance of multiple smaller (~124M) GPT-based models. In this work, we ask if addressing the ecological fallacy by modeling the author's language context with a specific LM task (called HuLM) can provide similar benefits for a larger-scale model, an 8B Llama model. To this end, we explore variants that process an author's language in the context of their other temporally ordered texts. We study the effect of pre-training with this author context using the HuLM objective, as well as using it during fine-tuning with author context (\textit{HuFT:Human-aware Fine-Tuning}). Empirical comparisons show that addressing the ecological fallacy during fine-tuning alone using QLoRA improves the performance of the larger 8B model over standard fine-tuning. Additionally, QLoRA-based continued HuLM pre-training results in a human-aware model generalizable for improved performance over eight downstream tasks with linear task classifier training alone. These results indicate the utility and importance of modeling language in the context of its original generators, the authors.


【13】Design Experiments to Compare Multi-armed Bandit Algorithms
标题:设计实验以比较多臂强盗算法
链接:https://arxiv.org/abs/2603.05919

作者:Huiling Meng,Ningyuan Chen,Xuefeng Gao
摘要:在线平台经常比较UCB和Thompson采样等多臂老虎机算法,以选出表现最好的策略。与针对静态处理的标准A/B测试不同,对 $T$ 个用户运行一次老虎机算法只产生一条相依轨迹,因为算法的决策依赖于全部历史交互。因此,可靠的推断需要对算法进行多次独立重启,这使得实验成本高昂并延迟了部署决策。我们提出人工重放(Artificial Replay,AR)作为针对该问题的新实验设计。AR首先运行一个策略并记录其轨迹;当执行第二个策略时,只要它选择了第一个策略已经采取过的动作,就复用已记录的奖励,否则才查询真实环境。我们为该设计建立了新的分析框架,并证明了所得估计量的三个关键性质:它是无偏的;相比处理与对照策略各运行一次所需的 $2T$ 次交互,它只需要 $T + o(T)$ 次用户交互,当两个策略都具有次线性遗憾时,实验成本几乎减半;其方差在 $T$ 上呈次线性增长,而朴素设计的估计量方差呈线性增长。使用UCB、Thompson采样和 ε-greedy 策略的数值实验证实了这些理论收益。
摘要 :Online platforms routinely compare multi-armed bandit algorithms, such as UCB and Thompson Sampling, to select the best-performing policy. Unlike standard A/B tests for static treatments, each run of a bandit algorithm over $T$ users produces only one dependent trajectory, because the algorithm's decisions depend on all past interactions. Reliable inference therefore demands many independent restarts of the algorithm, making experimentation costly and delaying deployment decisions. We propose Artificial Replay (AR) as a new experimental design for this problem. AR first runs one policy and records its trajectory. When the second policy is executed, it reuses a recorded reward whenever it selects an action the first policy already took, and queries the real environment only otherwise. We develop a new analytical framework for this design and prove three key properties of the resulting estimator: it is unbiased; it requires only $T + o(T)$ user interactions instead of $2T$ for a run of the treatment and control policies, nearly halving the experimental cost when both policies have sub-linear regret; and its variance grows sub-linearly in $T$, whereas the estimator from a naïve design has a linearly-growing variance. Numerical experiments with UCB, Thompson Sampling, and $ε$-greedy policies confirm these theoretical gains.
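
人工重放(AR)设计的核心逻辑可以直接写成代码:先运行策略A并按臂记录奖励;再运行策略B,命中已记录且未消费的奖励时直接复用,否则才查询真实环境。其中 env_pull 与策略对象的 select/update 接口均为本文假设的最小演示接口。

```python
def run_with_artificial_replay(env_pull, policy_a, policy_b, T, n_arms):
    """人工重放(AR)实验设计的最小草图;接口为假设的演示接口。"""
    logs = [[] for _ in range(n_arms)]   # 每个臂上未消费的已记录奖励
    for _ in range(T):                   # 第一阶段:策略A,全部真实拉臂
        arm = policy_a.select()
        r = env_pull(arm)
        policy_a.update(arm, r)
        logs[arm].append(r)

    extra_pulls = 0                      # 策略B需要的额外真实交互数
    for _ in range(T):                   # 第二阶段:策略B,优先复用记录
        arm = policy_b.select()
        if logs[arm]:
            r = logs[arm].pop()          # 复用一次已记录的奖励
        else:
            r = env_pull(arm)            # 仅在无记录可用时查询真实环境
            extra_pulls += 1
        policy_b.update(arm, r)
    # 总用户交互数为 T + extra_pulls;两策略遗憾均次线性时为 T + o(T)
    return extra_pulls
```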


【14】PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction
标题:PixARMesh:自回归网格原生的单视图场景重建
链接:https://arxiv.org/abs/2603.05888

作者:Xiang Zhang,Sohyun Yoo,Hongrui Wu,Chuan Li,Jianwen Xie,Zhuowen Tu
备注:CVPR 2026. Project Page: https://mlpc-ucsd.github.io/PixARMesh
摘要:我们介绍PixARMesh,一种直接从单个RGB图像自回归重建完整的3D室内场景网格的方法。与依赖于隐式符号距离场和事后布局优化的先前方法不同,PixARMesh在统一模型中联合预测对象布局和几何形状,在单个向前传递中生成连贯和艺术家就绪的网格。基于网格生成模型的最新进展,我们通过交叉注意增强了点云编码器的像素对齐图像特征和全局场景上下文,从而实现了从单个图像进行精确的空间推理。场景是从包含上下文、姿势和网格的统一令牌流自回归生成的,从而产生具有高保真几何形状的紧凑网格。在合成和真实世界数据集上的实验表明,PixARMesh实现了最先进的重建质量,同时生成了轻量级的高质量网格,可用于下游应用。
摘要:We introduce PixARMesh, a method to autoregressively reconstruct complete 3D indoor scene meshes directly from a single RGB image. Unlike prior methods that rely on implicit signed distance fields and post-hoc layout optimization, PixARMesh jointly predicts object layout and geometry within a unified model, producing coherent and artist-ready meshes in a single forward pass. Building on recent advances in mesh generative models, we augment a point-cloud encoder with pixel-aligned image features and global scene context via cross-attention, enabling accurate spatial reasoning from a single image. Scenes are generated autoregressively from a unified token stream containing context, pose, and mesh, yielding compact meshes with high-fidelity geometry. Experiments on synthetic and real-world datasets show that PixARMesh achieves state-of-the-art reconstruction quality while producing lightweight, high-quality meshes ready for downstream applications.


【15】MoE Lens -- An Expert Is All You Need
标题:MoE Lens -- 只需一个专家
链接:https://arxiv.org/abs/2603.05806

作者:Marmik Chaudhari,Idhant Gulati,Nishkal Hundia,Pranav Karra,Shivam Raval
备注:15 pages, 10 figures, ICLR 2025 Workshop on Sparsity in LLMs (SLLM)
摘要:混合专家(MoE)模型通过稀疏的专家激活实现参数高效的扩展,但由于对其专门化行为的理解有限,优化其推理和内存成本仍然具有挑战性。我们通过两种互补的方法对MoE中的专家专门化进行了系统分析:特定领域的路由模式,以及一个追踪专家对输出表示贡献的早期解码框架。我们对DeepSeekMoE模型的分析表明,尽管它有64个路由专家、每层计算激活6个专家,但该模型主要依赖少数专门化的专家,权重最高的专家的输出非常接近完整集成(ensemble)的预测。我们通过对令牌路由分布的系统分析定量验证了这些发现,表明在不同的专门领域中,极少数专家处理了超过50%的路由决策。每一层上单专家与集成专家之间的隐藏状态相似度都非常高,某些层的余弦相似度高达0.95;在全部三个领域中仅使用单个专家时,困惑度仅上升5%。我们的结果表明,混合专家模型表现出高度集中的专门知识,这凸显了在保持模型性能的同时通过有针对性的专家修剪进行推理优化的潜在机会,并为研究这些模型中所学知识的局部化开辟了途径。
摘要:Mixture of Experts (MoE) models enable parameter-efficient scaling through sparse expert activations, yet optimizing their inference and memory costs remains challenging due to limited understanding of their specialization behavior. We present a systematic analysis of expert specialization in MoEs through two complementary approaches: domain-specific routing patterns and an early decoding framework that tracks expert contributions to output representations. Our analysis of the DeepSeekMoE model reveals that despite having 64 routed experts with 6 active for each layer's computation, the model predominantly relies on a few specialized experts, with the top-weighted expert's output closely approximating the full ensemble prediction. We quantitatively validate these findings through a systematic analysis of the token routing distribution, demonstrating that very few experts handle over 50\% of routing decisions across different specialized domains. Hidden state similarity between single and ensemble experts for every layer is extremely high, with some layers having cosine similarity as high as 0.95 and perplexity increasing by only 5\% when using a single expert across all three domains. Our results indicate that Mixture of Experts models exhibit concentrated expertise highlighting potential opportunities for inference optimization through targeted expert pruning while maintaining model performance and opening avenues towards studying localization of learned knowledge in these models.


【16】The Coordination Gap: Alternation Metrics for Temporal Dynamics in Multi-Agent Battle of the Exes
标题:协调差距:多智能体Battle of the Exes中时间动态的交替指标
链接:https://arxiv.org/abs/2603.05789

作者:Nikolaos Al. Papadopoulos,Konstantinos Psannis
备注:40 pages, 5 figures, 4 tables. Submitted to Mathematical Social Sciences
摘要:多智能体协调困境暴露了个体优化与集体福利之间的根本张力,而刻画这种协调需要对时间结构和集体动态敏感的指标。作为诊断试验台,我们研究了Battle of the Exes博弈的一个多智能体变体(BoE衍生),并将其形式化为一个马尔可夫博弈,其中轮流(turn-taking)作为一种周期性协调机制涌现。传统的基于结果的指标(例如效率和min/max公平性)对时间是盲的:它们无法区分结构化的交替与垄断式或随机式的访问模式;并且公平性比率随着n的增长而失去区分能力,从而掩盖了不公平。为了解决这一局限,我们引入完美交替(PA)作为参考协调机制,并提出六个新的交替(ALT)指标,作为对时间敏感的协调质量可观测量。以Q学习智能体作为最小的自适应诊断基线,并与随机策略零过程进行比较,我们发现了一个明显的测量失效:尽管传统指标看起来高得具有欺骗性(例如奖励公平性常超过0.9),在ALT类指标评估下,学得的策略比随机基线最多低81%;这一缺陷在双智能体情形下就已存在,并随n的增长而加剧。这些结果表明,在该设定下,高总收益可以与糟糕的时间协调并存,而传统指标可能严重错判涌现动态。我们的发现强调了在多智能体博弈中用时间感知的可观测量来分析协调的必要性,并强调随机策略基线是相对机会水平解释协调结果的基本零过程。
摘要:Multi-agent coordination dilemmas expose a fundamental tension between individual optimization and collective welfare, yet characterizing such coordination requires metrics sensitive to temporal structure and collective dynamics. As a diagnostic testbed, we study a BoE-derived multi-agent variant of the Battle of the Exes, formalizing it as a Markov game in which turn-taking emerges as a periodic coordination regime. Conventional outcome-based metrics (e.g., efficiency and min/max fairness) are temporally blind -- they cannot distinguish structured alternation from monopolistic or random access patterns -- and fairness ratios lose discriminative power as n grows, obscuring inequities.   To address this limitation, we introduce Perfect Alternation (PA) as a reference coordination regime and propose six novel Alternation (ALT) metrics designed as temporally sensitive observables of coordination quality. Using Q-learning agents as a minimal adaptive diagnostic baseline, and comparing against random-policy null processes, we uncover a clear measurement failure: despite exhibiting deceptively high traditional metrics (e.g., reward fairness often exceeding 0.9), learned policies perform up to 81% below random baselines under ALT-variant evaluation -- a deficit already present in the two-agent case and intensifying as n grows.   These results demonstrate, in this setting, that high aggregate payoffs can coexist with poor temporal coordination, and that conventional metrics may severely mischaracterize emergent dynamics. Our findings underscore the necessity of temporally aware observables for analyzing coordination in multi-agent games and highlight random-policy baselines as essential null processes for interpreting coordination outcomes relative to chance-level behavior.


【17】TML-Bench: Benchmark for Data Science Agents on Tabular ML Tasks
Link: https://arxiv.org/abs/2603.05764

Authors: Mykola Pinchuk
Comments: 19 pages, 16 tables and figures
Abstract: Autonomous coding agents can produce strong tabular baselines quickly on Kaggle-style tasks. Practical value depends on end-to-end correctness and reliability under time limits. This paper introduces TML-Bench, a tabular benchmark for data science agents on Kaggle-style tasks. It evaluates 10 OSS LLMs on four Kaggle competitions and three time budgets (240s, 600s, and 1200s). Each model is run five times per task and budget. A run is successful if it produces a valid submission and a private-holdout score on hidden labels that are not accessible to the agent. The paper reports median performance, success rates, and run-to-run variability. The MiniMax-M2.1 model achieves the best aggregate performance score on all four competitions under the paper's primary aggregation. Average performance improves with larger time budgets. Scaling is noisy for some individual models at the current run count. Code and materials are available at https://github.com/MykolaPinchuk/TML-bench/tree/master.
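
Given the protocol -- five runs per model, task, and budget, with a run counting as successful only if it yields a scored submission -- the headline statistics are a small aggregation exercise. A pandas sketch with hypothetical column names and toy values:

```python
import pandas as pd

# Hypothetical run log: one row per (model, task, budget) run; a missing
# score marks a failed run (no valid submission / no holdout score).
runs = pd.DataFrame({
    "model":  ["m1"] * 5 + ["m2"] * 5,
    "task":   ["titanic"] * 10,
    "budget": [240] * 10,
    "score":  [0.78, 0.80, None, 0.79, 0.81, 0.70, 0.71, 0.69, None, None],
})
runs["success"] = runs["score"].notna()

agg = runs.groupby(["model", "task", "budget"]).agg(
    success_rate=("success", "mean"),
    median_score=("score", "median"),   # pandas skips NaNs from failed runs
    score_iqr=("score", lambda s: s.quantile(0.75) - s.quantile(0.25)),
)
print(agg)
```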


【18】Score-Guided Proximal Projection: A Unified Geometric Framework for Rectified Flow Editing
Link: https://arxiv.org/abs/2603.05761

Authors: Vansh Bansal, James G Scott
Abstract: Rectified Flow (RF) models achieve state-of-the-art generation quality, yet controlling them for precise tasks -- such as semantic editing or blind image recovery -- remains a challenge. Current approaches bifurcate into inversion-based guidance, which suffers from "geometric locking" by rigidly adhering to the source trajectory, and posterior sampling approximations (e.g., DPS), which are computationally expensive and unstable. In this work, we propose Score-Guided Proximal Projection (SGPP), a unified framework that bridges the gap between deterministic optimization and stochastic sampling. We reformulate the recovery task as a proximal optimization problem, defining an energy landscape that balances fidelity to the input with realism from the pre-trained score field. We theoretically prove that this objective induces a normal contraction property, geometrically guaranteeing that out-of-distribution inputs are snapped onto the data manifold, and it effectively reaches the posterior mode constrained to the manifold. Crucially, we demonstrate that SGPP generalizes state-of-the-art editing methods: RF-inversion is effectively a limiting case of our framework. By relaxing the proximal variance, SGPP enables "soft guidance," offering a continuous, training-free trade-off between strict identity preservation and generative freedom.
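
Schematically, the proximal view optimizes an energy of the form E(x) = ||x - y||^2 / (2λ) - log p(x), with the pretrained score standing in for ∇ log p. A toy gradient-descent sketch under a Gaussian prior, where the minimizer is known in closed form (the paper's exact energy, time schedule, and variance handling will differ):

```python
import numpy as np

def sgpp_step_sketch(x, y, score_fn, lam=0.5, lr=0.1):
    """One gradient step on E(x) = ||x - y||^2 / (2*lam) - log p(x),
    where score_fn(x) approximates the pretrained score grad_x log p(x).
    Larger lam relaxes fidelity to the input y ('soft guidance')."""
    grad = (x - y) / lam - score_fn(x)
    return x - lr * grad

# Toy example: p is a standard Gaussian, so the score is -x and the
# minimizer of E is y / (1 + lam) -- pulled from y toward the mode.
score_fn = lambda x: -x
y = np.array([3.0, -2.0])
x = y.copy()
for _ in range(200):
    x = sgpp_step_sketch(x, y, score_fn)
print(x, y / (1 + 0.5))  # both approach [2.0, -1.333...]
```

Shrinking λ enforces identity preservation; growing it hands control to the score field, which is the soft-guidance trade-off described above.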


【19】Full Dynamic Range Sky-Modelling For Image Based Lighting
Link: https://arxiv.org/abs/2603.05758

Authors: Ian J. Maquignaz
Abstract: Accurate environment maps are a key component of modelling real-world outdoor scenes. They enable captivating visual arts, immersive virtual reality and a wide range of scientific and engineering applications. To alleviate the burden of physical capture, physical simulation and volumetric rendering, sky-models have been proposed as fast, flexible, and cost-saving alternatives. In recent years, sky-models have been extended through deep learning to be more comprehensive and inclusive of cloud formations, but recent work has demonstrated these models fall short in faithfully recreating accurate and photorealistic natural skies. Particularly at higher resolutions, DNN sky-models struggle to accurately model the 14EV+ class-imbalanced solar region, resulting in poor visual quality and scenes illuminated with skewed light transmission, shadows and tones. In this work, we propose Icarus, an all-weather sky-model capable of learning the exposure range of Full Dynamic Range (FDR) physically captured outdoor imagery. Our model allows conditional generation of environment maps with intuitive user-positioning of solar and cloud formations, and extends on the current state-of-the-art to enable user-controlled texturing of atmospheric formations. Through our evaluation, we demonstrate Icarus is interchangeable with FDR physically captured outdoor imagery or parametric sky-models, and illuminates scenes with unprecedented accuracy, photorealism, lighting directionality (shadows), and tones in Image Based Lighting (IBL).


【20】Improved Scaling Laws via Weak-to-Strong Generalization in Random Feature Ridge Regression
Link: https://arxiv.org/abs/2603.05691

Authors: Diyuan Wu, Lehan Chen, Theodor Misiakiewicz, Marco Mondelli
摘要:在机器学习中,使用学习的模型来标记数据,然后使用这些数据来训练更有能力的模型越来越常见。从弱到强的泛化现象证明了这种两阶段过程的优势:一个强学生在从弱教师那里获得的不完美标签上进行训练,然而强学生的表现优于弱教师。在本文中,我们表明,潜在的改善是实质性的,在这个意义上说,它影响的标度律其次是测试误差。具体来说,我们考虑通过随机特征岭回归(RFRR)培训的学生和教师。我们的主要技术贡献是获得一个确定性的等价物的超额测试误差的学生训练的标签通过教师获得。通过这种确定性的等价物,我们然后确定制度,其中的学生的缩放法改进后的教师,揭示了改善可以实现在偏差主导和方差主导的设置。引人注目的是,学生可以达到最小最大最优率,而不管教师的比例法则如何--事实上,当教师的测试误差甚至不随样本大小衰减时。
Abstract: It is increasingly common in machine learning to use learned models to label data and then employ such data to train more capable models. The phenomenon of weak-to-strong generalization exemplifies the advantage of this two-stage procedure: a strong student is trained on imperfect labels obtained from a weak teacher, and yet the strong student outperforms the weak teacher. In this paper, we show that the potential improvement is substantial, in the sense that it affects the scaling law followed by the test error. Specifically, we consider students and teachers trained via random feature ridge regression (RFRR). Our main technical contribution is to derive a deterministic equivalent for the excess test error of the student trained on labels obtained via the teacher. Via this deterministic equivalent, we then identify regimes in which the scaling law of the student improves upon that of the teacher, unveiling that the improvement can be achieved both in bias-dominated and variance-dominated settings. Strikingly, the student may attain the minimax optimal rate regardless of the scaling law of the teacher -- in fact, even when the test error of the teacher does not decay with the sample size.
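
The two-stage pipeline itself is short to state in code: fit a weak teacher by random feature ridge regression on noisy labels, relabel fresh inputs with it, then fit a wider student on the pseudo-labels. A NumPy sketch with arbitrary toy widths, ridge strengths, and target function (whether the student actually improves on the teacher depends on the regime, which is exactly what the paper characterizes):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20
w_star = rng.normal(size=d)
f_star = lambda X: np.tanh(X @ w_star / np.sqrt(d))   # toy target function

def rfrr_fit(X, y, width, lam, rng):
    """Random feature ridge regression with ReLU features phi(x) = relu(Wx)."""
    W = rng.normal(size=(width, X.shape[1])) / np.sqrt(X.shape[1])
    Phi = np.maximum(X @ W.T, 0.0)
    a = np.linalg.solve(Phi.T @ Phi + lam * np.eye(width), Phi.T @ y)
    return lambda Xn: np.maximum(Xn @ W.T, 0.0) @ a

# Stage 1: weak teacher -- few noisy labels, narrow width, strong ridge.
X1 = rng.normal(size=(500, d))
teacher = rfrr_fit(X1, f_star(X1) + 0.5 * rng.normal(size=500),
                   width=100, lam=1.0, rng=rng)

# Stage 2: strong student -- many fresh inputs labeled by the teacher.
X2 = rng.normal(size=(5000, d))
student = rfrr_fit(X2, teacher(X2), width=2000, lam=1e-3, rng=rng)

Xte = rng.normal(size=(2000, d))
for name, model in [("teacher", teacher), ("student", student)]:
    print(name, "excess test MSE:", np.mean((model(Xte) - f_star(Xte)) ** 2))
```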


【21】When Rubrics Fail: Error Enumeration as Reward in Reference-Free RL Post-Training for Virtual Try-On
Link: https://arxiv.org/abs/2603.05659

Authors: Wisdom Ikezogwo, Mehmet Saygin Seyfioglu, Ranjay Krishna, Karim Bouyarmane
Abstract: Reinforcement learning with verifiable rewards (RLVR) and Rubrics as Rewards (RaR) have driven strong gains in domains with clear correctness signals, and even in subjective domains, by synthesizing evaluation criteria from ideal reference answers. But many real-world tasks admit multiple valid outputs and lack the single ideal answer that rubric generation depends on. We identify this reference-free setting as a gap in current post-training methods and propose Implicit Error Counting (IEC) to fill it. Instead of checking what a response gets right against a rubric, IEC enumerates what it gets wrong, applying severity-weighted scores across task-relevant axes and converting them into calibrated per-aspect rewards. We show that naïve explicit enumeration is too noisy for stable optimization, and that two design choices -- implicit score emission and group calibration -- are necessary to make error counting a reliable reward. As a case study, we validate IEC on virtual try-on (VTO), a domain that is simultaneously too constrained for holistic scoring and too permissive for rubric-based evaluation: subtle garment errors are unacceptable, yet many output variations are correct. We introduce Cascaded Error Counting (CEC) as an evaluation metric, which tracks human preferences well (60% top-1 vs. 30% others), and curate Mismatch-DressCode (MDressBench), a benchmark with maximal attribute mismatch to stress-test reward designs. On MDressBench, IEC outperforms RaR across all metrics (CEC: 5.31 vs. 5.60 on flat references; 5.20 vs. 5.53 on non-flat). On VITON-HD and DressCode, IEC matches or surpasses six baselines on 6 of 8 perceptual metrics. These results suggest that when ideal answers are unavailable, counting errors provides a stronger signal than constructing rubrics.
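
Stripped of the judging model, the reward construction comes down to severity-weighted error counts per aspect plus calibration within a sampling group. A schematic sketch in which the aspect names, severity weights, and the mapping to (0, 1] are all invented for illustration (the group calibration here is plain mean-centering, in the spirit of GRPO-style baselines):

```python
import numpy as np

SEVERITY = {"minor": 1.0, "major": 3.0, "critical": 9.0}
ASPECTS = ["garment_shape", "texture", "body_consistency"]

def iec_reward(error_counts: dict) -> np.ndarray:
    """Per-aspect reward from enumerated errors: sum severity-weighted counts
    per aspect, then map to (0, 1] so fewer/milder errors score higher."""
    pen = np.array([sum(SEVERITY[s] * c for s, c in error_counts[a].items())
                    for a in ASPECTS])
    return 1.0 / (1.0 + pen)

# Group calibration: center rewards across the G responses sampled for the
# same prompt, so the policy gradient compares responses relatively.
group = [
    {"garment_shape": {"minor": 1}, "texture": {}, "body_consistency": {}},
    {"garment_shape": {"critical": 1}, "texture": {"major": 2}, "body_consistency": {}},
    {"garment_shape": {}, "texture": {"minor": 2}, "body_consistency": {"major": 1}},
]
R = np.stack([iec_reward(e) for e in group])       # (G, num_aspects)
advantages = R - R.mean(axis=0, keepdims=True)     # calibrated per aspect
print(advantages)
```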


【22】RACAS: Controlling Diverse Robots With a Single Agentic System
Link: https://arxiv.org/abs/2603.05621

Authors: Dylan R. Ashley, Jan Przepióra, Yimeng Chen, Ali Abualsaud, Nurzhan Yesmagambet, Shinkyu Park, Eric Feron, Jürgen Schmidhuber
Comments: 7 pages in main text + 1 page of appendices + 1 page of references, 5 figures in main text + 1 figure in appendices, 2 tables in main text
Abstract: Many robotic platforms expose an API through which external software can command their actuators and read their sensors. However, transitioning from these low-level interfaces to high-level autonomous behaviour requires a complicated pipeline, whose components demand distinct areas of expertise. Existing approaches to bridging this gap either require retraining for every new embodiment or have only been validated across structurally similar platforms. We introduce RACAS (Robot-Agnostic Control via Agentic Systems), a cooperative agentic architecture in which three LLM/VLM-based modules (Monitors, a Controller, and a Memory Curator) communicate exclusively through natural language to provide closed-loop robot control. RACAS requires only a natural language description of the robot, a definition of available actions, and a task specification; no source code, model weights, or reward functions need to be modified to move between platforms. We evaluate RACAS on several tasks using a wheeled ground robot, a recently published novel multi-jointed robotic limb, and an underwater vehicle. RACAS consistently solved all assigned tasks across these radically different platforms, demonstrating the potential of agentic AI to substantially reduce the barrier to prototyping robotic solutions.
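
The defining constraint -- modules exchanging only natural-language strings around a minimal robot API -- fits in a plain control loop. The sketch below is a schematic reconstruction, not the paper's system: `llm` stands for any chat-completion callable, and every method and prompt here is hypothetical:

```python
def racas_loop_sketch(llm, robot, robot_description, actions, task, max_steps=50):
    """Monitors describe state, a Controller picks the next action, and a
    Memory Curator keeps a running natural-language summary. All three
    roles communicate exclusively through strings."""
    memory = ""
    for _ in range(max_steps):
        # Monitor: turn raw sensor readings into a natural-language report.
        report = llm(f"Describe this robot state in one sentence: {robot.sensors()}")
        # Controller: choose one of the declared actions from text alone.
        action = llm(
            f"Robot: {robot_description}\nActions: {actions}\nTask: {task}\n"
            f"Memory: {memory}\nState: {report}\n"
            "Reply with exactly one action name (or DONE)."
        ).strip()
        if action == "DONE":
            return True
        robot.execute(action)  # the only non-linguistic interface
        # Memory Curator: compress the episode so far into a short summary.
        memory = llm(f"Update this summary: {memory}\nNew step: {report} -> {action}")
    return False
```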


【23】Koopman Regularized Deep Speech Disentanglement for Speaker Verification
Link: https://arxiv.org/abs/2603.05577

Authors: Nikos Chazaridis, Mohammad Belal, Rafael Mestre, Timothy J. Norman, Christine Evers
Comments: This work has been submitted to the IEEE for possible publication
Abstract: Human speech contains both linguistic content and speaker-dependent characteristics, making speaker verification a key technology in identity-critical applications. Modern deep learning speaker verification systems aim to learn speaker representations that are invariant to semantic content and nuisance factors such as ambient noise. However, many existing approaches depend on labelled data, textual supervision or large pretrained models as feature extractors, limiting scalability and practical deployment and raising sustainability concerns. We propose the Deep Koopman Speech Disentanglement Autoencoder (DKSD-AE), a structured autoencoder that combines a novel multi-step Koopman operator learning module with instance normalization to disentangle speaker and content dynamics. Quantitative experiments across multiple datasets demonstrate that DKSD-AE achieves improved or competitive speaker verification performance compared to state-of-the-art baselines while maintaining high content EER, confirming effective disentanglement. These results are obtained with substantially fewer parameters and without textual supervision. Moreover, performance remains stable under increased evaluation scale, highlighting representation robustness and generalization. Our findings suggest that Koopman-based temporal modelling, when combined with instance normalization, provides an efficient and principled solution for speaker-focused representation learning.
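
The multi-step Koopman module amounts to penalizing deviations from linear latent dynamics over several horizons, roughly as follows. This is a schematic NumPy version only; the paper's encoder, instance normalization, and full training loss are not reproduced here:

```python
import numpy as np

def multistep_koopman_loss(Z, K, horizon=4):
    """Z: (T, d) latent trajectory from an encoder; K: (d, d) Koopman matrix.
    Penalizes ||z_{t+k} - K^k z_t||^2 for k = 1..horizon so the latent
    dynamics stay linear over several steps, not just one."""
    T = Z.shape[0]
    loss, Kk = 0.0, np.eye(K.shape[0])
    for k in range(1, horizon + 1):
        Kk = Kk @ K                    # K^k
        pred = Z[: T - k] @ Kk.T       # advance each z_t by k steps
        loss += np.mean((Z[k:] - pred) ** 2)
    return loss / horizon

# Sanity check on an exactly linear latent system.
rng = np.random.default_rng(0)
d, T = 8, 100
K_true = np.linalg.qr(rng.normal(size=(d, d)))[0] * 0.99  # contracting rotation
Z = np.empty((T, d))
Z[0] = rng.normal(size=d)
for t in range(1, T):
    Z[t] = K_true @ Z[t - 1]
print("true K  :", multistep_koopman_loss(Z, K_true))   # ~ 0
print("identity:", multistep_koopman_loss(Z, np.eye(d)))
```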


【24】Aligning the True Semantics: Constrained Decoupling and Distribution Sampling for Cross-Modal Alignment
Link: https://arxiv.org/abs/2603.05566

Authors: Xiang Ma, Lexin Fang, Litian Xu, Caiming Zhang
Comments: AAAI 2026 poster
Abstract: Cross-modal alignment is a crucial task in multimodal learning aimed at achieving semantic consistency between vision and language. This requires that image-text pairs exhibit similar semantics. Traditional algorithms pursue embedding consistency to achieve semantic consistency, ignoring the non-semantic information present in the embedding. An intuitive approach is to decouple the embeddings into semantic and modality components, aligning only the semantic component. However, this introduces two main challenges: (1) There is no established standard for distinguishing semantic and modal information. (2) The modality gap can cause semantic alignment deviation or information loss. To align the true semantics, we propose a novel cross-modal alignment algorithm via Constrained Decoupling and Distribution Sampling (CDDS). Specifically, (1) a dual-path UNet is introduced to adaptively decouple the embeddings, applying multiple constraints to ensure effective separation; (2) a distribution sampling method is proposed to bridge the modality gap, ensuring the rationality of the alignment process. Extensive experiments on various benchmarks and model backbones demonstrate the superiority of CDDS, outperforming state-of-the-art methods by 6.6% to 14.2%.


【25】When AI Levels the Playing Field: Skill Homogenization, Asset Concentration, and Two Regimes of Inequality
Link: https://arxiv.org/abs/2603.05565

Authors: Xupeng Chen, Shuchen Meng
Abstract: Generative AI compresses within-task skill differences while shifting economic value toward concentrated complementary assets, creating an apparent paradox: the technology that equalizes individual performance may widen aggregate inequality. We formalize this tension in a task-based model with endogenous education, employer screening, and heterogeneous firms. The model yields two regimes whose boundary depends on AI's technology structure (proprietary vs. commodity) and labor market institutions (rent-sharing elasticity, asset concentration). A scenario analysis via Method of Simulated Moments, matching six empirical targets, disciplines the model's quantitative magnitudes; a sensitivity decomposition reveals that the five non-ΔGini moments identify mechanism rates but not the aggregate sign, which at the calibrated parameters is pinned by m_6 and ξ, while AI's technology structure (η_1 vs. η_0) independently crosses the boundary. The contribution is the mechanism -- not a verdict on the sign. Occupation-level regressions using BLS OEWS data (2019--2023) illustrate why such data cannot test the model's task-level predictions. The predictions are testable with within-occupation, within-task panel data that do not yet exist at scale.


【26】VDCook:DIY video data cook your MLLMs
Link: https://arxiv.org/abs/2603.05539

Authors: Chengwei Wu
Abstract: We introduce VDCook: a self-evolving video data operating system, a configurable video data construction platform for researchers and vertical domain teams. Users initiate data requests via natural language queries and adjustable parameters (scale, retrieval-synthesis ratio, quality threshold). The system automatically performs query optimization, concurrently running real video retrieval and controlled synthesis modules. It ultimately generates in-domain data packages with complete provenance and metadata, along with reproducible Notebooks. Unlike traditional static, one-time-built datasets, VDCook enables continuous updates and domain expansion through its automated data ingestion mechanism based on MCP (Model Context Protocol), transforming datasets into dynamically evolving open ecosystems. The system also provides multi-dimensional metadata annotation (scene segmentation, motion scoring, OCR ratio, automatic captioning, etc.), laying the foundation for flexible subsequent data 'cooking' and indexing. This platform aims to significantly lower the barrier to constructing specialized video training datasets through infrastructure-level solutions, while supporting community contributions and a governance-enabled data expansion paradigm. Project demo: https://screenapp.io/app/v/WP0SvffgsH


【27】Traversal-as-Policy: Log-Distilled Gated Behavior Trees as Externalized, Verifiable Policies for Safe, Robust, and Efficient Agents
Link: https://arxiv.org/abs/2603.05517

Authors: Peiran Li, Jiashuo Sun, Fangzhou Lin, Shuo Xing, Tianfu Fu, Suofei Feng, Chaoqun Ni, Zhengzhong Tu
Comments: 30 pages, 1 figure, 23 tables
Abstract: Autonomous LLM agents fail because long-horizon policy remains implicit in model weights and transcripts, while safety is retrofitted post hoc. We propose Traversal-as-Policy: distill sandboxed OpenHands execution logs into a single executable Gated Behavior Tree (GBT) and treat tree traversal -- rather than unconstrained generation -- as the control policy whenever a task is in coverage. Each node encodes a state-conditioned action macro mined and merge-checked from successful trajectories; macros implicated by unsafe traces attach deterministic pre-execution gates over structured tool context and bounded history, updated under experience-grounded monotonicity so previously rejected unsafe contexts cannot be re-admitted. At runtime, a lightweight traverser matches the base model's intent to child macros, executes one macro at a time under global and node-local gating, and when stalled performs risk-aware shortest-path recovery to a feasible success leaf; the visited path forms a compact spine memory that replaces transcript replay. Evaluated in a unified OpenHands sandbox on 15+ software, web, reasoning, and safety/security benchmarks, GBT improves success while driving violations toward zero and reducing cost. On SWE-bench Verified (Protocol A, 500 issues), GBT-SE raises success from 34.6% to 73.6%, reduces violations from 2.8% to 0.2%, and cuts token/character usage from 208k/820k to 126k/490k; with the same distilled tree, 8B executors more than double success on SWE-bench Verified (14.0% → 58.8%) and WebArena (9.1% → 37.3%).
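
The gate mechanics can be rendered in a few lines: each node pairs an action macro with a grow-only set of rejected contexts (the monotonicity property), and traversal executes a macro only if its gate admits the current context. A schematic reconstruction, not the paper's implementation -- all names and the context encoding are invented:

```python
from dataclasses import dataclass, field

@dataclass
class GBTNode:
    name: str
    macro: list                                   # state-conditioned action macro
    children: list = field(default_factory=list)
    rejected: set = field(default_factory=set)    # contexts from unsafe traces

    def gate(self, context: frozenset) -> bool:
        """Deterministic pre-execution gate: admit unless this context was
        previously implicated in an unsafe trace."""
        return context not in self.rejected

    def record_unsafe(self, context: frozenset) -> None:
        # Monotone update: rejections are only ever added, never removed,
        # so a previously rejected unsafe context cannot be re-admitted.
        self.rejected.add(context)

def traverse(node: GBTNode, match_intent, context: frozenset):
    """Execute one macro at a time; tree traversal *is* the policy."""
    while node.children:
        node = match_intent(node.children)   # lightweight intent matcher
        if not node.gate(context):
            raise PermissionError(f"gate rejected macro at {node.name}")
        for action in node.macro:
            print("execute:", action)        # placeholder for real tool calls
    return node  # success leaf reached; the visited path is the spine memory
```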


【28】AI End-to-End Radiation Treatment Planning Under One Second
Link: https://arxiv.org/abs/2603.06338

Authors: Simon Arberet, Riqiang Gao, Martin Kraus, Florin C. Ghesu, Wilko Verbakel, Mamadou Diallo, Anthony Magliari, Venkatesan Karuppusamy, Sushil Beriwal, REQUITE Consortium, Ali Kamen, Dorin Comaniciu
Abstract: Artificial intelligence-based radiation therapy (RT) planning has the potential to reduce planning time and inter-planner variability, improving efficiency and consistency in clinical workflows. Most existing automated approaches rely on multiple dose evaluations and corrections, resulting in plan generation times of several minutes. We introduce AIRT (Artificial Intelligence-based Radiotherapy), an end-to-end deep-learning framework that directly infers deliverable treatment plans from CT images and structure contours. AIRT generates single-arc VMAT prostate plans, from imaging and anatomical inputs to leaf sequencing, in under one second on a single Nvidia A100 GPU. The framework includes differentiable dose feedback, adversarial fluence-map shaping, and plan-generation augmentation to improve plan quality and robustness. The model was trained on more than 10,000 intact prostate cases. Non-inferiority to RapidPlan Eclipse was demonstrated across target coverage and OAR sparing metrics. Target homogeneity (HI = 0.10 ± 0.01) and OAR sparing were similar to reference plans when evaluated using AcurosXB. These results represent a significant step toward ultra-fast standardized RT planning and a streamlined clinical workflow.


【29】Random Quadratic Form on a Sphere: Synchronization by Common Noise
Link: https://arxiv.org/abs/2603.06187

Authors: Maximilian Engel, Anna Shalova
Abstract: We introduce the Random Quadratic Form (RQF): a stochastic differential equation which formally corresponds to the gradient flow of a random quadratic functional on a sphere. While the one-point dynamics of the system is a Brownian motion and thus has no preferred direction, the two-point motion exhibits nontrivial synchronizing behaviour. In this work we study synchronization of the RQF; namely, we give both distributional and path-wise characterizations of the solutions by studying invariant measures and random attractors of the system. The RQF model is motivated by the study of the role of linear layers in transformers and illustrates the synchronization-by-common-noise phenomenon arising in simplified models of transformers. In particular, we provide an alternative (independent of self-attention) explanation of the clustering behaviour in deep transformers and show that tokens cluster even in the absence of the self-attention mechanism.
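
The synchronization effect is easy to probe numerically: drive two different points on the sphere with the same random quadratic increments and track the distance between them, modulo the antipodal symmetry x ↦ -x that a quadratic form cannot distinguish. The Euler-type discretization below is only an illustration of the common-noise setup, not a claim about the paper's exact SDE:

```python
import numpy as np

rng = np.random.default_rng(0)
d, dt, steps = 3, 1e-3, 20000

def step(z, A):
    v = A @ z
    z = z + (v - (v @ z) * z)          # tangential part of the quadratic drive
    return z / np.linalg.norm(z)       # retract back onto the sphere

x = rng.normal(size=d); x /= np.linalg.norm(x)
y = rng.normal(size=d); y /= np.linalg.norm(y)
d0 = min(np.linalg.norm(x - y), np.linalg.norm(x + y))

for _ in range(steps):
    G = rng.normal(size=(d, d)) * np.sqrt(dt)
    A = (G + G.T) / 2                  # the SAME random quadratic form drives both
    x, y = step(x, A), step(y, A)      # common noise, different initial points

d1 = min(np.linalg.norm(x - y), np.linalg.norm(x + y))
print(f"distance mod antipodes: {d0:.3f} -> {d1:.3e}")  # typically shrinks
```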


【30】The Rise of AI in Weather and Climate Information and its Impact on Global Inequality
Link: https://arxiv.org/abs/2603.05710

Authors: Amirpasha Mozaffari, Amanda Duarte, Lina Teckentrup, Stefano Materia, Gina E. C. Charnley, Lluis Palma, Eulalia Baulenas Serra, Dragana Bojovic, Paula Checchia, Aude Carreric, Francisco Doblas-Reyes
Abstract: The rapid adoption of AI in Earth system science promises unprecedented speed and fidelity in the generation of climate information. However, this technological prowess rests on a fragile and unequal foundation: the current trajectory of AI development risks further automating and amplifying the North-South divide in the global climate information system. We outline the global asymmetry in High-Performance Computing and data infrastructure, demonstrating that the development of foundation models is almost exclusively concentrated in the Global North. Using three different domains, we show how this infrastructure inequality continues through models' inputs, processes and outputs. As an example, in weather and climate modelling, the reliance on historically biased data leads to systematic performance gaps that disproportionately affect the most vulnerable regions. In climate impact modelling, data sparsity and unrepresentative validation risk driving misleading interventions and maladaptation. Finally, in large language models, dependence on dominant textualised forms of climate knowledge risks reinforcing existing biases. We conclude that addressing these disparities demands revisiting the three phases, i.e., model Input, Process, and Output. This involves (i) a perspective shift from model-centric to data-centric development, (ii) the establishment of a Climate Digital Public Infrastructure and human-centric evaluation metrics, and (iii) a move from producer-consumer dynamics toward knowledge co-production. This integration of diverse knowledge systems would truly democratise compute sovereignty and ensure that the AI revolution fosters genuine systemic resilience rather than exacerbating inequity.


【31】An intuitive rearranging of the Yates covariance decomposition for probabilistic verification of forecasts with the Brier score
Link: https://arxiv.org/abs/2603.05544

Authors: Bruno Hebling Vieira
Comments: 4 pages, 0 figures
Abstract: Proper scoring rules are essential for evaluating probabilistic forecasts. We propose a simple algebraic rearrangement of the Yates covariance decomposition of the Brier score into three independently non-negative terms: a variance mismatch term, a correlation deficit term, and a calibration-in-the-large term. This rearrangement makes the optimality conditions for perfect forecasting transparent: the optimal forecast must simultaneously match the variance of outcomes, achieve perfect positive correlation with outcomes, and match the mean of outcomes. Any deviation from these conditions results in a positive contribution to the Brier score.
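
The rearrangement follows in two lines from the variance decomposition of the Brier score. Writing f for forecasts, o for binary outcomes, ρ for their correlation, and σ for standard deviations, a reconstruction consistent with the three terms named above (the paper's notation may differ) is:

```latex
\mathrm{BS} = \overline{(f-o)^2}
            = \sigma_f^2 + \sigma_o^2 - 2\rho\,\sigma_f\sigma_o + (\bar f - \bar o)^2
            = \underbrace{(\sigma_f - \sigma_o)^2}_{\text{variance mismatch}}
            + \underbrace{2\,\sigma_f\sigma_o(1-\rho)}_{\text{correlation deficit}}
            + \underbrace{(\bar f - \bar o)^2}_{\text{calibration-in-the-large}}
```

Each term is non-negative (since ρ ≤ 1 and the standard deviations are non-negative), and the score vanishes exactly when σ_f = σ_o, ρ = 1, and the means match -- the three optimality conditions stated in the abstract.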


【32】A mixed-frequency approach for exchange rates predictions
Link: https://arxiv.org/abs/2106.00283

Authors: Raffaele Mattera, Michelangelo Misuraca, Germana Scepi, Maria Spano
Abstract: Selecting an appropriate statistical model to forecast exchange rates is still today a relevant issue for policymakers and central bankers. The so-called Meese and Rogoff puzzle asserts that exchange rate fluctuations are unpredictable. In the literature, many studies have tried to solve the puzzle by finding alternative predictors and statistical models based on temporal aggregation. In this paper, we propose an approach based on mixed frequency models to overcome the loss of information caused by temporal aggregation. We show the effectiveness of our approach in comparison with other proposed methods by performing CAD/USD exchange rate predictions.
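
A standard way to exploit the untouched high-frequency information is a MIDAS regression: a low-frequency target loads on many high-frequency lags through a low-dimensional parametric weight function. The sketch below uses exponential Almon weights on synthetic data; the paper's exact specification is not reproduced here:

```python
import numpy as np
from scipy.optimize import minimize

def almon_weights(theta, n_lags):
    """Exponential Almon lag polynomial: positive weights summing to one."""
    j = np.arange(1, n_lags + 1)
    w = np.exp(theta[0] * j + theta[1] * j ** 2)
    return w / w.sum()

def midas_loss(params, y, X_hf):
    """y: (T,) low-frequency target; X_hf: (T, n_lags) high-frequency lags."""
    b0, b1, t1, t2 = params
    yhat = b0 + b1 * X_hf @ almon_weights((t1, t2), X_hf.shape[1])
    return np.mean((y - yhat) ** 2)

# Toy data: the true weights decay across 22 daily lags within each month.
rng = np.random.default_rng(0)
T, n_lags = 200, 22
X_hf = rng.normal(size=(T, n_lags))
w_true = almon_weights((-0.1, -0.01), n_lags)
y = 0.2 + 1.5 * X_hf @ w_true + 0.1 * rng.normal(size=T)

res = minimize(midas_loss, x0=np.array([0.0, 1.0, 0.0, 0.0]),
               args=(y, X_hf), method="Nelder-Mead")
print("estimated (b0, b1):", res.x[:2])
```

The point of the parametric weights is parsimony: dozens of high-frequency lags enter through just two shape parameters, which is what lets the mixed-frequency model avoid aggregating the regressors to the target's frequency.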

