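Visit arxivdaily.com, covering CS | Physics | Mathematics | Economics | Statistics | Finance | Biology | Electrical Engineering, with search, bookmarking, and other features.
cs.LG: 151 papers today
LLM-related (14 papers)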
【1】Dynamic Personality Adaptation in Large Language Models via State Machines
Link: https://arxiv.org/abs/2602.22157
Authors: Leon Pielage, Ole Hätscher, Mitja Back, Bernhard Marschall, Benjamin Risse
Comments: 22 pages, 5 figures, submitted to ICPR 2026
Abstract: The inability of Large Language Models (LLMs) to modulate their personality expression in response to evolving dialogue dynamics hinders their performance in complex, interactive contexts. We propose a model-agnostic framework for dynamic personality simulation that employs state machines to represent latent personality states, where transition probabilities are dynamically adapted to the conversational context. Part of our architecture is a modular pipeline for continuous personality scoring that evaluates dialogues along latent axes while remaining agnostic to the specific personality models, their dimensions, transition mechanisms, or LLMs used. These scores function as dynamic state variables that systematically reconfigure the system prompt, steering behavioral alignment throughout the interaction. We evaluate this framework by operationalizing the Interpersonal Circumplex (IPC) in a medical education setting. Results demonstrate that the system not only successfully adapts its personality state to user inputs but also influences user behavior, thereby facilitating de-escalation training. Notably, the scoring pipeline maintains comparable precision even when utilizing lightweight, fine-tuned classifiers instead of large-scale LLMs. This work demonstrates the feasibility of modular, personality-adaptive architectures for education, customer support, and broader human-computer interaction.
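To make the state-machine idea concrete, here is a minimal sketch under stated assumptions. It uses three hypothetical personality states, one row of base transition logits per state, and a context score vector (standing in for the scoring pipeline's output) that shifts those logits before a softmax. The sampled state then selects a system prompt. The state names, prompts, `gain` parameter, and function names are illustrative and are not taken from the paper.

```python
import math
import random

# Hypothetical states loosely inspired by the Interpersonal Circumplex;
# the paper's actual state set and prompts are not reproduced here.
STATES = ["friendly", "neutral", "hostile"]
PROMPTS = {s: f"Stay in character and respond in a {s} manner." for s in STATES}


def transition_probs(base_logits, context_scores, gain=1.0):
    """Softmax over base transition logits shifted by context scores."""
    logits = [b + gain * c for b, c in zip(base_logits, context_scores)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def step(state, base, context_scores, rng):
    """Sample the next state and return it together with its system prompt.

    base maps each current state to its row of transition logits;
    context_scores stands in for the per-turn personality scores.
    """
    probs = transition_probs(base[state], context_scores)
    nxt = rng.choices(STATES, weights=probs)[0]
    return nxt, PROMPTS[nxt]


if __name__ == "__main__":
    base = {s: [0.0, 1.0, 0.0] for s in STATES}  # mild pull toward "neutral"
    rng = random.Random(0)
    state = "neutral"
    for scores in ([2.0, 0.0, -2.0], [-2.0, 0.0, 2.0]):
        state, prompt = step(state, base, scores, rng)
        print(state, "->", prompt)
```

Keeping the context scores separate from the base logits mirrors the abstract's point that the scoring pipeline is agnostic to the transition mechanism: any scorer that returns one number per state can drive the same machine.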
【2】Provable Last-Iterate Convergence for Multi-Objective Safe LLM Alignment via Optimistic Primal-Dual
Link: https://arxiv.org/abs/2602.22146
Authors: Yining Li, Peizhong Ju, Ness Shroff
Abstract: Reinforcement Learning from Human Feedback (RLHF) plays a significant role in aligning Large Language Models (LLMs) with human preferences. While RLHF with expected reward constraints can be formulated as a primal-dual optimization problem, standard primal-dual methods only guarantee convergence with a distributional policy where the saddle-point problem is in convex-concave form. Moreover, standard primal-dual methods may exhibit instability or divergence in the last iterate under policy parameterization in practical applications. In this work, we propose a universal primal-dual framework for safe RLHF that unifies a broad class of existing alignment algorithms, including safe-RLHF, one-shot, and multi-shot based methods. Building on this framework, we introduce an optimistic primal-dual (OPD) algorithm that incorporates predictive updates for both primal and dual variables to stabilize saddle-point dynamics. We establish last-iterate convergence guarantees for the proposed method, covering both exact policy optimization in the distributional space and convergence to a neighborhood of the optimal solution whose gap is related to approximation error and bias under parameterized policies. Our analysis reveals that optimism plays a crucial role in mitigating oscillations inherent to constrained alignment objectives, thereby closing a key theoretical gap between constrained RL and practical RLHF.
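The role of optimism can be illustrated on the classic toy saddle problem f(x, y) = x·y. Here x plays the primal (policy) role and y plays the dual (multiplier) role. This is a minimal sketch of optimistic gradient descent-ascent, not the paper's OPD algorithm. It ignores the nonnegativity of real Lagrange multipliers, and the function name and step size are assumptions. Plain descent-ascent spirals away from the saddle point (0, 0), while the optimistic update, which steps along the extrapolated gradient 2g_t - g_{t-1}, converges to it.

```python
def gda(x, y, lr, steps, optimistic=False):
    """Run (optimistic) gradient descent-ascent on f(x, y) = x * y.

    x is the minimizing (primal) player and y the maximizing (dual) player.
    Plain GDA uses the current gradient; the optimistic variant uses
    2 * g_t - g_{t-1}, a one-step prediction of the next gradient.
    """
    gx_prev, gy_prev = y, x  # so the first optimistic step equals a GDA step
    for _ in range(steps):
        gx, gy = y, x  # df/dx = y, df/dy = x
        if optimistic:
            step_x = 2 * gx - gx_prev
            step_y = 2 * gy - gy_prev
        else:
            step_x, step_y = gx, gy
        # Simultaneous update: descend in x, ascend in y.
        x, y = x - lr * step_x, y + lr * step_y
        gx_prev, gy_prev = gx, gy
    return x, y


if __name__ == "__main__":
    print("plain GDA:     ", gda(1.0, 1.0, 0.1, 2000))
    print("optimistic GDA:", gda(1.0, 1.0, 0.1, 2000, optimistic=True))
```

With step size 0.1, each plain step multiplies the distance to the origin by sqrt(1.01), so the last iterate diverges. The optimistic iterates contract toward (0, 0). This is the kind of last-iterate oscillation that the abstract says predictive updates mitigate.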
【3】SWE-Protégé: Learning to Selectively Collaborate With an Expert Unlocks Small Language Models as Software Engineering Agents
Link: https://arxiv.org/abs/2602.22124
Authors: Patrick Tser Jern Kon, Archana Pradeep, Ang Chen, Alexander P. Ellis, Warren Hunt, Zijian Wang, John Yang, Samuel Thompson
Abstract: Small language models (SLMs) offer compelling advantages in cost, latency, and adaptability, but have so far lagged behind larger models on long-horizon software engineering tasks such as SWE-bench, where they suffer from pervasive action looping and low resolution rates. We introduce SWE-Protégé, a post-training framework that reframes software repair as an expert-protégé collaboration problem. In SWE-Protégé, an SLM remains the sole decision-maker while learning to selectively seek guidance from a strong expert model, recognize stalled states, and follow through on expert feedback. Our approach combines supervised fine-tuning on expert-augmented trajectories with agentic reinforcement learning that explicitly discourages degenerative looping and unproductive expert collaboration. We lightly post-train Qwen2.5-Coder-7B-Instruct to achieve 42.4% Pass@1 on SWE-bench Verified, a +25.4% improvement over the prior SLM state of the art, while using expert assistance sparsely (~4 calls per task and 11% of total tokens).
【4】Enhancing LLM-Based Test Generation by Eliminating Covered Code
Link: https://arxiv.org/abs/2602.21997
Authors: WeiZhe Xu, Mengyu Liu, Fanxin Kong
Comments: 9 pages, 4 figures, supplementary material included
Abstract: Automated test generation is essential for software quality assurance, with coverage rate serving as a key metric to ensure thorough testing. Recent advancements in Large Language Models (LLMs) have shown promise in improving test generation, particularly in achieving higher coverage. However, while existing LLM-based test generation solutions perform well on small, isolated code snippets, they struggle when applied to complex methods under test. To address these issues, we propose a scalable LLM-based unit test generation method. Our approach consists of two key steps. The first step is context information retrieval, which uses both LLMs and static analysis to gather relevant contextual information associated with the complex methods under test. The second step, iterative test generation with code elimination, repeatedly generates unit tests for the code slice, tracks the achieved coverage, and selectively removes code segments that have already been covered. This process simplifies the testing task and mitigates issues arising from token limits or reduced reasoning effectiveness associated with excessively long contexts. Through comprehensive evaluations on open-source projects, our approach outperforms state-of-the-art LLM-based and search-based methods, demonstrating its effectiveness in achieving high coverage on complex methods.
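The second step (iterative generation with code elimination) can be sketched as a loop. This is a hypothetical outline, not the authors' implementation. `generate_tests` stands in for the LLM call and `run_coverage` stands in for a coverage tool. Both are passed in as callbacks so the control flow can run with stubs.

```python
def iterative_test_generation(code_lines, generate_tests, run_coverage, max_rounds=5):
    """Repeatedly generate tests for the not-yet-covered part of a method.

    code_lines: source lines of the method under test.
    generate_tests: hypothetical callback (e.g. an LLM prompt) that receives
        {line_number: source} for the remaining code and returns new tests.
    run_coverage: hypothetical callback that runs all tests so far and
        returns the set of covered line numbers of the original method.
    Returns (all generated tests, sorted line numbers still uncovered).
    """
    remaining = dict(enumerate(code_lines))
    tests = []
    for _ in range(max_rounds):
        if not remaining:
            break  # everything is covered
        tests.extend(generate_tests(dict(remaining)))
        covered = run_coverage(tests)
        before = len(remaining)
        for line_no in covered:
            remaining.pop(line_no, None)  # eliminate covered code from the next prompt
        if len(remaining) == before:
            break  # no progress this round; stop instead of looping
    return tests, sorted(remaining)


if __name__ == "__main__":
    method = ["def f(x):", "    if x > 0:", "        return 1", "    return 0"]

    def stub_generate(remaining):
        # Stub "LLM": emit one test, named after the first uncovered line.
        return [min(remaining)]

    def stub_coverage(tests):
        return set(tests)

    print(iterative_test_generation(method, stub_generate, stub_coverage))
```

Each round shrinks the prompt to the still-uncovered lines. That is how the abstract's elimination step keeps long methods within token limits. The no-progress check is a design choice added here so that a stuck generator cannot spin until `max_rounds`.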
【5】Offline Reasoning for Efficient Recommendation: LLM-Empowered Persona-Profiled Item Indexing
Link: https://arxiv.org/abs/2602.21756
Authors: Deogyong Kim, Junseong Lee, Jeongeun Lee, Changhoe Kim, Junguel Lee, Jungseok Lee, Dongha Lee
Comments: Under review
Abstract: Recent advances in large language models (LLMs) offer new opportunities for recommender systems by capturing the nuanced semantics of user interests and item characteristics through rich semantic understanding and contextual reasoning. In particular, LLMs have been employed as rerankers that reorder candidate items based on inferred user-item relevance. However, these approaches often require expensive online inference-time reasoning, leading to high latency that hampers real-world deployment. In this work, we introduce Persona4Rec, a recommendation framework that performs offline reasoning to construct interpretable persona representations of items, enabling lightweight and scalable real-time inference. In the offline stage, Persona4Rec leverages LLMs to reason over item reviews, inferring diverse user motivations that explain why different types of users may engage with an item; these inferred motivations are materialized as persona representations, providing multiple, human-interpretable views of each item. Unlike conventional approaches that rely on a single item representation, Persona4Rec learns to align user profiles with the most plausible item-side persona through a dedicated encoder, effectively transforming user-item relevance into user-persona relevance. At the online stage, this persona-profiled item index allows fast relevance computation without invoking expensive LLM reasoning. Extensive experiments show that Persona4Rec achieves performance comparable to recent LLM-based rerankers while substantially reducing inference time. Moreover, qualitative analysis confirms that persona representations not only drive efficient scoring but also provide intuitive, review-grounded explanations. These results demonstrate that Persona4Rec offers a practical and interpretable solution for next-generation recommender systems.
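The online stage reduces to vector scoring against a precomputed index. This is a minimal sketch, assuming that each item stores several persona embeddings and that an item's score is its best cosine match with the user embedding. The paper instead learns the user-persona alignment with a dedicated encoder. All names here are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def score_items(user_vec, persona_index):
    """Score each item by its best-matching persona.

    persona_index maps item id -> list of persona embeddings built offline,
    so no LLM call is needed at request time.
    """
    return {item: max(cosine(user_vec, p) for p in personas)
            for item, personas in persona_index.items()}


def rank(user_vec, persona_index, k):
    """Return the ids of the k highest-scoring items."""
    scores = score_items(user_vec, persona_index)
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    index = {"camera": [[1.0, 0.0], [0.0, 1.0]], "novel": [[0.6, 0.8]]}
    print(rank([1.0, 0.0], index, 2))
```

Taking the maximum over personas lets an item match a user through whichever motivation fits best. A single averaged item vector would blur those motivations together.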
【6】Breaking Semantic-Aware Watermarks via LLM-Guided Coherence-Preserving Semantic Injection
Link: https://arxiv.org/abs/2602.21593
Authors: Zheng Gao, Xiaoyu Li, Zhicheng Bao, Xiaoyan Feng, Jiaojiao Jiang
Comments: Accepted by The Web Conference 2026 (Short Paper Track)
Abstract: Generative images have proliferated on Web platforms in social media and online copyright distribution scenarios, and semantic watermarking has increasingly been integrated into diffusion models to support reliable provenance tracking and forgery prevention for web content. Traditional noise-layer-based watermarking, however, remains vulnerable to inversion attacks that can recover embedded signals. To mitigate this, recent content-aware semantic watermarking schemes bind watermark signals to high-level image semantics, constraining local edits that would otherwise disrupt global coherence. Yet, large language models (LLMs) possess structured reasoning capabilities that enable targeted exploration of semantic spaces, allowing locally fine-grained but globally coherent semantic alterations that invalidate such bindings. To expose this overlooked vulnerability, we introduce a Coherence-Preserving Semantic Injection (CSI) attack that leverages LLM-guided semantic manipulation under embedding-space similarity constraints. This alignment enforces visual-semantic consistency while selectively perturbing watermark-relevant semantics, ultimately inducing detector misclassification. Extensive empirical results show that CSI consistently outperforms prevailing attack baselines against content-aware semantic watermarking, revealing a fundamental security weakness of current semantic watermark designs when confronted with LLM-driven semantic perturbations.
【7】Duel-Evolve: Reward-Free Test-Time Scaling via LLM Self-Preferences
Link: https://arxiv.org/abs/2602.21585
Authors: Sweta Karlekar, Carolina Zheng, Magnus Saebo, Nicolas Beltran-Velez, Shuyang Yu, John Bowlan, Michal Kucer, David Blei
Abstract: Many applications seek to optimize LLM outputs at test time by iteratively proposing, scoring, and refining candidates over a discrete output space. Existing methods use a calibrated scalar evaluator for the target objective to guide search, but for many tasks such scores are unavailable, too sparse, or unreliable. Pairwise comparisons, by contrast, are often easier to elicit, still provide useful signal on improvement directions, and can be obtained from the LLM itself without external supervision. Building on this observation, we introduce Duel-Evolve, an evolutionary optimization algorithm that replaces external scalar rewards with pairwise preferences elicited from the same LLM used to generate candidates. Duel-Evolve aggregates these noisy candidate comparisons via a Bayesian Bradley-Terry model, yielding uncertainty-aware estimates of candidate quality. These quality estimates guide allocation of the comparison budget toward plausible optima using Double Thompson Sampling, as well as selection of high-quality parents to generate improved candidates. We evaluate Duel-Evolve on MathBench, where it achieves 20 percentage points higher accuracy than existing methods and baselines, and on LiveCodeBench, where it improves over comparable iterative methods by over 12 percentage points. Notably, the method requires no reward model, no ground-truth labels during search, and no hand-crafted scoring function. Results show that pairwise self-preferences provide strong optimization signal for test-time improvement over large, discrete output spaces.
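Aggregating noisy pairwise preferences into per-candidate strengths can be sketched with a Bradley-Terry model. The block below fits a point estimate using Hunter's minorization-maximization (MM) updates, swapped in for the paper's Bayesian treatment. It therefore yields no uncertainty estimates, and it omits Double Thompson Sampling entirely. The `wins` dictionary layout is an assumption.

```python
def bradley_terry(n_items, wins, iters=200):
    """Fit Bradley-Terry strengths p_i with Hunter's MM updates.

    Model: P(i preferred over j) = p_i / (p_i + p_j).
    wins[(i, j)] = number of comparisons in which candidate i beat j.
    Returns strengths normalised to sum to 1 (the model is scale-free).
    Assumes every candidate has won at least one comparison.
    """
    p = [1.0 / n_items] * n_items
    for _ in range(iters):
        new_p = []
        for i in range(n_items):
            w_i = sum(c for (a, _), c in wins.items() if a == i)  # total wins of i
            denom = 0.0
            for j in range(n_items):
                if j == i:
                    continue
                n_ij = wins.get((i, j), 0) + wins.get((j, i), 0)
                if n_ij:
                    denom += n_ij / (p[i] + p[j])
            new_p.append(w_i / denom if denom else p[i])
        total = sum(new_p)
        p = [x / total for x in new_p]
    return p


if __name__ == "__main__":
    # Hypothetical self-preference tallies among three candidates.
    wins = {(0, 1): 8, (1, 0): 2, (1, 2): 7, (2, 1): 3, (0, 2): 9, (2, 0): 1}
    print(bradley_terry(3, wins))
```

A Bayesian version would place a prior on the strengths and sample from the posterior. Such posterior samples are what a Thompson-style rule would use to pick the next pair to compare.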
【8】GradAlign: Gradient-Aligned Data Selection for LLM Reinforcement Learning
Link: https://arxiv.org/abs/2602.21492
Authors: Ningyuan Yang, Weihua Du, Weiwei Sun, Sean Welleck, Yiming Yang
Comments: 14 pages. Preliminary work
Abstract: Reinforcement learning (RL) has become a central post-training paradigm for large language models (LLMs), but its performance is highly sensitive to the quality of training problems. This sensitivity stems from the non-stationarity of RL: rollouts are generated by an evolving policy, and learning is shaped by exploration and reward feedback, unlike supervised fine-tuning (SFT) with fixed trajectories. As a result, prior work often relies on manual curation or simple heuristic filters (e.g., accuracy), which can admit incorrect or low-utility problems. We propose GradAlign, a gradient-aligned data selection method for LLM reinforcement learning that uses a small, trusted validation set to prioritize training problems whose policy gradients align with validation gradients, yielding an adaptive curriculum. We evaluate GradAlign across three challenging data regimes: unreliable reward signals, distribution imbalance, and low-utility training corpus, showing that GradAlign consistently outperforms existing baselines, underscoring the importance of directional gradient signals in navigating non-stationary policy optimization and yielding more stable training and improved final performance. We release our implementation at https://github.com/StigLidu/GradAlign
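The core selection signal is the alignment between a training problem's policy gradient and a validation gradient, which can be sketched with cosine similarity. This is a simplified illustration under several assumptions: gradients are given as flat lists, the validation signal is their mean, and problems are ranked once rather than re-scored as the policy evolves. The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_aligned(train_grads, val_grads, k):
    """Return the k training problems most aligned with the validation gradient.

    train_grads: problem id -> flattened policy gradient (list of floats).
    val_grads: list of flattened gradients from the trusted validation set.
    Returns (top-k problem ids, all alignment scores).
    """
    dim = len(val_grads[0])
    val_mean = [sum(g[d] for g in val_grads) / len(val_grads) for d in range(dim)]
    scores = {pid: cosine(g, val_mean) for pid, g in train_grads.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


if __name__ == "__main__":
    train = {"a": [1.0, 0.0], "b": [-1.0, 0.0], "c": [0.9, 0.1]}
    val = [[1.0, 0.0], [1.0, 0.2]]
    print(select_aligned(train, val, 2))
```

A negative score (problem "b" in the usage example) marks a problem whose update would push against the validation direction. A noisy or mislabeled reward can produce exactly this pattern.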
【9】Causal Decoding for Hallucination-Resistant Multimodal Large Language Models
Link: https://arxiv.org/abs/2602.21441
Authors: Shiwei Tan, Hengyi Wang, Weiyi Qin, Qi Xu, Zhigang Hua, Hao Wang
Comments: Published in Transactions on Machine Learning Research (TMLR), 2026
Abstract: Multimodal Large Language Models (MLLMs) deliver detailed responses on vision-language tasks, yet remain susceptible to object hallucination (introducing objects not present in the image), undermining reliability in practice. Prior efforts often rely on heuristic penalties, post-hoc correction, or generic decoding tweaks, which do not directly intervene in the mechanisms that trigger object hallucination and thus yield limited gains. To address this challenge, we propose a causal decoding framework that applies targeted causal interventions during generation to curb spurious object mentions. By reshaping the decoding dynamics to attenuate spurious dependencies, our approach reduces false object tokens while maintaining descriptive quality. Across captioning and QA benchmarks, our framework substantially lowers object-hallucination rates and achieves state-of-the-art faithfulness without degrading overall output quality.
【10】PSF-Med: Measuring and Explaining Paraphrase Sensitivity in Medical Vision Language Models
Link: https://arxiv.org/abs/2602.21428
Authors: Binesh Sadanandan, Vahid Behzadan
Abstract: Medical Vision Language Models (VLMs) can change their answers when clinicians rephrase the same question, which raises deployment risks. We introduce Paraphrase Sensitivity Failure (PSF)-Med, a benchmark of 19,748 chest X-ray questions paired with about 92,000 meaning-preserving paraphrases across MIMIC-CXR and PadChest. Across six medical VLMs, we measure yes/no flips for the same image and find flip rates from 8% to 58%. However, a low flip rate does not imply visual grounding: text-only baselines show that some models stay consistent even when the image is removed, suggesting they rely on language priors. To study mechanisms in one model, we apply GemmaScope 2 Sparse Autoencoders (SAEs) to MedGemma 4B and analyze FlipBank, a curated set of 158 flip cases. We identify a sparse feature at layer 17 that correlates with prompt framing and predicts decision margin shifts. In causal patching, removing this feature's contribution recovers 45% of the yes-minus-no logit margin on average and fully reverses 15% of flips. Acting on this finding, we show that clamping the identified feature at inference reduces flip rates by 31% (relative) at a cost of only 1.3 percentage points of accuracy, while also decreasing text-prior reliance. These results suggest that flip rate alone is not enough; robustness evaluations should test both paraphrase stability and image reliance.
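The abstract mixes relative and absolute changes: a 31% relative drop in flip rate versus a 1.3 percentage-point accuracy cost. The sketch below keeps the two apart. It assumes flip rate is counted over (question, paraphrase) pairs against the answer to the original wording. The benchmark's exact definition may differ, and the example values in the code are hypothetical, not results from the paper.

```python
def flip_rate(answers):
    """Fraction of (question, paraphrase) pairs whose yes/no answer differs
    from the answer given to the original wording of the same question.

    answers maps question id -> (original_answer, [paraphrase_answers]).
    """
    flips = total = 0
    for original, paraphrased in answers.values():
        for ans in paraphrased:
            total += 1
            flips += ans != original  # bool counts as 0 or 1
    return flips / total if total else 0.0


def relative_reduction(before, after):
    """Relative change as a fraction of the starting value.

    Hypothetical example: a flip rate falling from 0.40 to 0.276 is a 31%
    relative reduction, even though it is 12.4 points in absolute terms.
    """
    return (before - after) / before


if __name__ == "__main__":
    answers = {"q1": ("yes", ["yes", "no"]), "q2": ("no", ["no", "no"])}
    print(flip_rate(answers))             # fraction of flipped pairs
    print(relative_reduction(0.40, 0.276))
```

Accuracy costs, by contrast, are usually reported as a plain difference (for example 0.800 - 0.787 = 1.3 points), so the two numbers in the abstract are not on the same scale.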
【11】Small Language Models for Privacy-Preserving Clinical Information Extraction in Low-Resource Languages
Link: https://arxiv.org/abs/2602.21374
Authors: Mohammadreza Ghaffarzadeh-Esfahani, Nahid Yousefian, Ebrahim Heidari-Farsani, Ali Akbar Omidvarian, Sepehr Ghahraei, Atena Farangi, AmirBahador Boroumand
Comments: 16 pages, 3 figures, 2 supplementary files
Abstract: Extracting clinical information from medical transcripts in low-resource languages remains a significant challenge in healthcare natural language processing (NLP). This study evaluates a two-step pipeline that combines Aya-expanse-8B, used as a Persian-to-English translation model, with five open-source small language models (SLMs), namely Qwen2.5-7B-Instruct, Llama-3.1-8B-Instruct, Llama-3.2-3B-Instruct, Qwen2.5-1.5B-Instruct, and Gemma-3-1B-it, for binary extraction of 13 clinical features from 1,221 anonymized Persian transcripts collected at a cancer palliative care call center. Using a few-shot prompting strategy without fine-tuning, models were assessed on macro-averaged F1-score, Matthews Correlation Coefficient (MCC), sensitivity, and specificity to account for class imbalance. Qwen2.5-7B-Instruct achieved the highest overall performance (median macro-F1: 0.899; MCC: 0.797), while Gemma-3-1B-it showed the weakest results. Larger models (7B to 8B parameters) consistently outperformed smaller counterparts in sensitivity and MCC. A bilingual analysis of Aya-expanse-8B revealed that translating Persian transcripts to English improved sensitivity, reduced missing outputs, and boosted metrics robust to class imbalance, though at the cost of slightly lower specificity and precision. Feature-level results showed reliable extraction of physiological symptoms across most models, whereas psychological complaints, administrative requests, and complex somatic features remained challenging. These findings establish a practical, privacy-preserving blueprint for deploying open-source SLMs in multilingual clinical NLP settings with limited infrastructure and annotation resources, and highlight the importance of jointly optimizing model scale and input language strategy for sensitive healthcare applications.
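The two headline metrics for each binary feature can be computed from a confusion matrix, as sketched below. This sketch assumes macro-F1 averages the F1 of the positive and negative classes; the study may aggregate across features or classes differently. MCC uses the standard binary formula and returns 0 when any confusion-matrix margin is empty.

```python
import math


def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN), defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Average of positive-class and negative-class F1."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    # For the negative class, true negatives play the role of true positives,
    # and the roles of false positives and false negatives swap.
    return (f1(tp, fp, fn) + f1(tn, fn, fp)) / 2


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary predictions."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 1]
    print(macro_f1(y_true, y_pred), mcc(y_true, y_pred))
```

Both metrics penalize a model that simply predicts the majority class. This is why the abstract pairs them with sensitivity and specificity under class imbalance.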
【12】Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data
Link: https://arxiv.org/abs/2602.21320
Authors: Emre Can Acikgoz, Cheng Qian, Jonas Hübotter, Heng Ji, Dilek Hakkani-Tür, Gokhan Tur
Abstract: Large language models (LLMs) are becoming the foundation for autonomous agents that can use tools to solve complex tasks. Reinforcement learning (RL) has emerged as a common approach for injecting such agentic capabilities, but typically under tightly controlled training setups. It often depends on carefully constructed task-solution pairs and substantial human supervision, which creates a fundamental obstacle to open-ended self-evolution toward superintelligent systems. In this paper, we propose the Tool-R0 framework for training general-purpose tool-calling agents from scratch with self-play RL, under a zero-data assumption. Initialized from the same base LLM, Tool-R0 co-evolves a Generator and a Solver with complementary rewards: one proposes targeted challenging tasks at the other's competence frontier, and the other learns to solve them with real-world tool calls. This creates a self-evolving cycle that requires no pre-existing tasks or datasets. Evaluation on different tool-use benchmarks shows that Tool-R0 yields 92.5 relative improvement over the base model and surpasses fully supervised tool-calling baselines under the same setting. Our work further provides empirical insights into self-play LLM agents by analyzing co-evolution, curriculum dynamics, and scaling behavior.
【13】Under the Influence: Quantifying Persuasion and Vigilance in Large Language Models
Link: https://arxiv.org/abs/2602.21262
Authors: Sasha Robinson, Kerem Oktar, Katherine M. Collins, Ilia Sucholutsky, Kelsey R. Allen
Abstract: With increasing integration of Large Language Models (LLMs) into areas of high-stakes human decision-making, it is important to understand the risks they introduce as advisors. To be useful advisors, LLMs must sift through large amounts of content, written with both benevolent and malicious intent, and then use this information to convince a user to take a specific action. This involves two social capacities: vigilance (the ability to determine which information to use, and which to discard) and persuasion (synthesizing the available evidence to make a convincing argument). While existing work has investigated these capacities in isolation, there has been little prior investigation of how these capacities may be linked. Here, we use a simple multi-turn puzzle-solving game, Sokoban, to study LLMs' abilities to persuade and be rationally vigilant towards other LLM agents. We find that puzzle-solving performance, persuasive capability, and vigilance are dissociable capacities in LLMs. Performing well on the game does not automatically mean a model can detect when it is being misled, even if the possibility of deception is explicitly mentioned. However, LLMs do consistently modulate their token use, using fewer tokens to reason when advice is benevolent and more when it is malicious, even if they are still persuaded to take actions leading them to failure. To our knowledge, our work presents the first investigation of the relationship between persuasion, vigilance, and task performance in LLMs, and suggests that monitoring all three independently will be critical for future work in AI safety.
【14】Reasoning-Driven Design of Single Atom Catalysts via a Multi-Agent Large Language Model Framework
Link: https://arxiv.org/abs/2602.21533
Authors: Dong Hyeon Mok, Seoin Back, Victor Fung, Guoxiang Hu
Abstract: Large language models (LLMs) are becoming increasingly applied beyond natural language processing, demonstrating strong capabilities in complex scientific tasks that traditionally require human expertise. This progress has extended into materials discovery, where LLMs introduce a new paradigm by leveraging reasoning and in-context learning, capabilities absent from conventional machine learning approaches. Here, we present a Multi-Agent-based Electrocatalyst Search Through Reasoning and Optimization (MAESTRO) framework in which multiple LLMs with specialized roles collaboratively discover high-performance single atom catalysts for the oxygen reduction reaction. Within an autonomous design loop, agents iteratively reason, propose modifications, reflect on results and accumulate design history. Through in-context learning enabled by this iterative process, MAESTRO identified design principles not explicitly encoded in the LLMs' background knowledge and successfully discovered catalysts that break conventional scaling relations between reaction intermediates. These results highlight the potential of multi-agent LLM frameworks as a powerful strategy to generate chemical insight and discover promising catalysts.
Graph-related (graph learning | graph neural networks | graph optimization, etc.) (3 papers)
【1】RABot: Reinforcement-Guided Graph Augmentation for Imbalanced and Noisy Social Bot Detection
Link: https://arxiv.org/abs/2602.21749
Authors: Longlong Zhang, Xi Wang, Haotong Du, Yangyi Xu, Zhuo Liu, Yang Liu
Abstract: Social bot detection is pivotal for safeguarding the integrity of online information ecosystems. Although recent graph neural network (GNN) solutions achieve strong results, they remain hindered by two practical challenges: (i) severe class imbalance arising from the high cost of generating bots, and (ii) topological noise introduced by bots that skillfully mimic human behavior and forge deceptive links. We propose the Reinforcement-guided graph Augmentation social Bot detector (RABot), a multi-granularity graph-augmentation framework that addresses both issues in a unified manner. RABot employs a neighborhood-aware oversampling strategy that linearly interpolates minority-class embeddings within local subgraphs, thereby stabilizing the decision boundary under low-resource regimes. Concurrently, a reinforcement-learning-driven edge-filtering module combines similarity-based edge features with adaptive threshold optimization to excise spurious interactions during message passing, yielding a cleaner topology. Extensive experiments on three real-world benchmarks and four GNN backbones demonstrate that RABot consistently surpasses state-of-the-art baselines. In addition, since its augmentation and filtering modules are orthogonal to the underlying architecture, RABot can be seamlessly integrated into existing GNN pipelines to boost performance with minimal overhead.
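The neighborhood-aware oversampling step can be sketched as SMOTE-style interpolation restricted to graph neighbors. This is a minimal illustration under assumptions: embeddings are plain lists, and each synthetic node is drawn on the segment between a minority node and a random minority neighbor. The function name and seed handling are hypothetical. The reinforcement-learned edge filter is not modeled.

```python
import random


def neighborhood_oversample(emb, adj, labels, minority, n_new, seed=0):
    """Create synthetic minority embeddings by interpolating along graph edges.

    emb: node -> embedding (list of floats); adj: node -> list of neighbors;
    labels: node -> class label; minority: the under-represented label.
    Returns a list of n_new synthetic embeddings (empty if no anchor exists).
    """
    rng = random.Random(seed)
    # Anchor nodes: minority nodes with at least one minority neighbor.
    anchors = [v for v in emb
               if labels[v] == minority
               and any(labels[u] == minority for u in adj.get(v, ()))]
    synthetic = []
    if not anchors:
        return synthetic
    for _ in range(n_new):
        v = rng.choice(anchors)
        u = rng.choice([w for w in adj[v] if labels[w] == minority])
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(emb[v], emb[u])])
    return synthetic


if __name__ == "__main__":
    emb = {0: [0.0, 0.0], 1: [2.0, 2.0], 2: [10.0, 10.0]}
    adj = {0: [1, 2], 1: [0], 2: [0]}
    labels = {0: 1, 1: 1, 2: 0}  # label 1 = bot (minority), 0 = human
    print(neighborhood_oversample(emb, adj, labels, minority=1, n_new=3))
```

Restricting interpolation partners to graph neighbors keeps synthetic bots inside local subgraphs. Pairing arbitrary minority nodes across the graph would not have that property.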
【2】NGDB-Zoo: Towards Efficient and Scalable Neural Graph Databases Training
Link: https://arxiv.org/abs/2602.21597
Authors: Zhongwei Xie, Jiaxin Bai, Shujie Liu, Haoyu Huang, Yufei Li, Yisen Gao, Hong Ting Tsang, Yangqiu Song
Abstract: Neural Graph Databases (NGDBs) facilitate complex logical reasoning over incomplete knowledge structures, yet their training efficiency and expressivity are constrained by rigid query-level batching and structure-exclusive embeddings. We present NGDB-Zoo, a unified framework that resolves these bottlenecks by synergizing operator-level training with semantic augmentation. By decoupling logical operators from query topologies, NGDB-Zoo transforms the training loop into a dynamically scheduled data-flow execution, enabling multi-stream parallelism and achieving $1.8\times$ to $6.8\times$ higher throughput than baselines. Furthermore, we formalize a decoupled architecture to integrate high-dimensional semantic priors from Pre-trained Text Encoders (PTEs) without triggering I/O stalls or memory overflows. Extensive evaluations on six benchmarks, including massive graphs like ogbl-wikikg2 and ATLAS-Wiki, demonstrate that NGDB-Zoo maintains high GPU utilization across diverse logical patterns and significantly mitigates representation friction in hybrid neuro-symbolic reasoning.
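Operator-level training can be pictured as pooling ready operators of the same type across many query graphs into one batch, instead of batching whole queries that share an identical shape. A minimal sketch of that idea, assuming each query is a small DAG of named operators (the dict format and operator names such as "proj" and "and" are hypothetical). The abstract does not describe NGDB-Zoo's scheduler; this level-by-level ready-set grouping is an illustrative assumption, and multi-stream execution and PTE integration are not modelled.

```python
from collections import defaultdict

def operator_level_batches(queries):
    """Batch same-type operators that become ready at the same time, across all queries.

    Illustrative sketch under an assumed input format, not NGDB-Zoo's scheduler.
    queries: list of dicts node_id -> (op_name, [dependency node_ids]).
    Returns a list of rounds; each round maps op_name -> list of (query_index, node_id).
    """
    done = set()
    pending = {(qi, n): spec for qi, q in enumerate(queries) for n, spec in q.items()}
    rounds = []
    while pending:
        batch = defaultdict(list)
        # Collect every node, from any query, whose dependencies have all been computed.
        for (qi, n), (op, deps) in pending.items():
            if all((qi, d) in done for d in deps):
                batch[op].append((qi, n))
        if not batch:
            raise ValueError("cycle in a query graph")
        # Each op_name group would run as one batched kernel call.
        for nodes in batch.values():
            for key in nodes:
                done.add(key)
                del pending[key]
        rounds.append(dict(batch))
    return rounds
```

Two queries with different topologies still share the first "proj" batch, which is the kind of reuse that query-level batching cannot exploit.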
【3】Archetypal Graph Generative Models: Explainable and Identifiable Communities via Anchor-Dominant Convex Hulls
Link: https://arxiv.org/abs/2602.21342
Authors: Nikolaos Nakis,Chrysoula Kosma,Panagiotis Promponas,Michail Chatzianastasis,Giannis Nikolentzos
Comments: Accepted to AISTATS26 (Spotlight)
Abstract: Representation learning has been essential for graph machine learning tasks such as link prediction, community detection, and network visualization. Despite recent advances in achieving high performance on these downstream tasks, little progress has been made toward self-explainable models. Understanding the patterns behind predictions is equally important, motivating recent interest in explainable machine learning. In this paper, we present GraphHull, an explainable generative model that represents networks using two levels of convex hulls. At the global level, the vertices of a convex hull are treated as archetypes, each corresponding to a pure community in the network. At the local level, each community is refined by a prototypical hull whose vertices act as representative profiles, capturing community-specific variation. This two-level construction yields clear multi-scale explanations: a node's position relative to global archetypes and its local prototypes directly accounts for its edges. The geometry is well-behaved by design, while local hulls are kept disjoint by construction. To further encourage diversity and stability, we place principled priors, including determinantal point processes, and fit the model under MAP estimation with scalable subsampling. Experiments on real networks demonstrate the ability of GraphHull to recover multi-level community structure and to achieve competitive or superior performance in link prediction and community detection, while naturally providing interpretable predictions.
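To make the archetype idea concrete: any point inside a convex hull is a convex combination of the hull's vertices, so its weights read directly as "how much of each pure community" a node contains. A minimal sketch, assuming node positions are parameterised by softmax weights over K archetypes (a common way to stay inside the hull). GraphHull's exact parameterisation, priors, and edge likelihood are not given in the abstract, so the distance-based `link_prob` below is a hypothetical stand-in.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def hull_position(logits, archetypes):
    """Map unconstrained logits to a point inside the convex hull of `archetypes`."""
    w = softmax(logits)  # non-negative weights summing to 1 (membership in each archetype)
    dim = len(archetypes[0])
    return [sum(w[k] * archetypes[k][d] for k in range(len(archetypes))) for d in range(dim)]

def link_prob(z_i, z_j, beta=1.0):
    """Hypothetical distance-based link score: closer nodes are more likely to connect."""
    return 1.0 / (1.0 + math.exp(-(beta - math.dist(z_i, z_j))))
```

With three archetypes at the corners of a triangle, equal logits place a node at the centroid, i.e. an even mixture of the three communities.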
Transformer (3 papers)
【1】TiMi: Empower Time Series Transformers with Multimodal Mixture of Experts
Link: https://arxiv.org/abs/2602.21693
Authors: Jiafeng Lin,Yuxuan Wang,Huakun Luo,Zhongyi Pei,Jianmin Wang
Abstract: Multimodal time series forecasting has garnered significant attention for its potential to provide more accurate predictions than traditional single-modality models by leveraging rich information inherent in other modalities. However, due to fundamental challenges in modality alignment, existing methods often struggle to effectively incorporate multimodal data into predictions, particularly textual information that has a causal influence on time series fluctuations, such as emergency reports and policy announcements. In this paper, we reflect on the role of textual information in numerical forecasting and propose TiMi (Time series transformers with Multimodal Mixture-of-Experts) to unleash the causal reasoning capabilities of LLMs. Concretely, TiMi utilizes LLMs to generate inferences on future developments, which serve as guidance for time series forecasting. To seamlessly integrate both exogenous factors and time series into predictions, we introduce a Multimodal Mixture-of-Experts (MMoE) module as a lightweight plug-in to empower Transformer-based time series models for multimodal forecasting, eliminating the need for explicit representation-level alignment. Experimentally, our proposed TiMi demonstrates consistent state-of-the-art performance on sixteen real-world multimodal forecasting benchmarks, outperforming advanced baselines while offering both strong adaptability and interpretability.
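A mixture-of-experts layer combines several expert outputs using input-dependent gate weights. A minimal sketch of a generic dense softmax-gated mixture, assuming each expert is a plain function and the gate scores come from a linear map over the concatenated time-series and text features. TiMi's actual MMoE design (expert architectures, how LLM inferences are embedded, routing sparsity) is not described in the abstract, so `mmoe_forward` and its arguments are hypothetical.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def mmoe_forward(ts_feat, text_feat, experts, gate_weights):
    """Dense softmax-gated mixture of experts over concatenated modality features.

    Hypothetical sketch, not TiMi's MMoE module.
    experts: list of callables mapping a feature list to an output list.
    gate_weights: one weight vector per expert (an assumed linear gate).
    Returns (mixed output, gate weights).
    """
    x = ts_feat + text_feat  # list concatenation of the two modalities
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    gates = softmax(scores)
    outputs = [f(x) for f in experts]
    # Weighted sum of expert outputs, element by element.
    mixed = [sum(g * out[d] for g, out in zip(gates, outputs)) for d in range(len(outputs[0]))]
    return mixed, gates
```

Because the gate reads the text features, an informative textual signal can shift weight toward a different expert without aligning the two modalities in a shared representation space.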
【2】Trie-Aware Transformers for Generative Recommendation
Link: https://arxiv.org/abs/2602.21677
Authors: Zhenxiang Xu,Jiawei Chen,Sirui Chen,Yong He,Jieyu Yang,Chuan Yuan,Ke Ding,Can Wang
Abstract: Generative recommendation (GR) aligns with advances in generative AI by casting next-item prediction as token-level generation rather than score-based ranking. Most GR methods adopt a two-stage pipeline: (i) \textit{item tokenization}, which maps each item to a sequence of discrete, hierarchically organized tokens; and (ii) \textit{autoregressive generation}, which predicts the next item's tokens conditioned on the tokens of the user's interaction history. Although hierarchical tokenization induces a prefix tree (trie) over items, standard autoregressive modeling with conventional Transformers often flattens item tokens into a linear stream and overlooks the underlying topology. To address this, we propose TrieRec, a trie-aware generative recommendation method that augments Transformers with structural inductive biases via two positional encodings. First, a \textit{trie-aware absolute positional encoding} aggregates a token's (node's) local structural context (e.g., depth, ancestors, and descendants) into the token representation. Second, a \textit{topology-aware relative positional encoding} injects pairwise structural relations into self-attention to capture topology-induced semantic relatedness. TrieRec is also model-agnostic, efficient, and hyperparameter-free. In our experiments, we implement TrieRec within three representative GR backbones, achieving notable improvements of 8.83\% on average across four real-world datasets.
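The trie in question is the prefix tree formed by items' token sequences. A minimal sketch of the raw structural quantities the two encodings could draw on (node depth, number of items below a node, and the tree distance between two nodes), assuming items are tuples of semantic-ID tokens. How TrieRec actually embeds and combines these features is not specified in the abstract, so `trie_features` and `tree_distance` are illustrative names.

```python
from collections import defaultdict

def trie_features(item_codes):
    """Per-node structural features of the trie induced by hierarchical item tokenization.

    Illustrative sketch, not TrieRec's encoding.
    item_codes: list of token tuples, one per item.
    Returns dict prefix -> {"depth", "n_items_below"}; a node's ancestors are its shorter prefixes.
    """
    below = defaultdict(int)
    for code in item_codes:
        # Every prefix of an item's code is a trie node on that item's root-to-leaf path.
        for d in range(1, len(code) + 1):
            below[code[:d]] += 1
    return {p: {"depth": len(p), "n_items_below": n} for p, n in below.items()}

def tree_distance(a, b):
    """Number of trie edges between two nodes: up to the longest common prefix, then down."""
    lcp = 0
    while lcp < min(len(a), len(b)) and a[lcp] == b[lcp]:
        lcp += 1
    return (len(a) - lcp) + (len(b) - lcp)
```

A flat token stream sees (1, 4, 7) and (1, 4, 8) as unrelated positions; `tree_distance` exposes that they are siblings two edges apart.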
【3】Benchmarking State Space Models, Transformers, and Recurrent Networks for US Grid Forecasting
Link: https://arxiv.org/abs/2602.21415
Authors: Sunki Hong,Jisoo Lee,Yuanyuan Shi
Comments: 11 pages, 2 figures, 8 tables
Abstract: Selecting the right deep learning model for power grid forecasting is challenging, as performance heavily depends on the data available to the operator. This paper presents a comprehensive benchmark of five modern neural architectures: two state space models (PowerMamba, S-Mamba), two Transformers (iTransformer, PatchTST), and a traditional LSTM. We evaluate these models on hourly electricity demand across six diverse US power grids for forecast windows between 24 and 168 hours. To ensure a fair comparison, we adapt each model with specialized temporal processing and a modular layer that cleanly integrates weather covariates. Our results reveal that there is no single best model for all situations. When forecasting using only historical load, PatchTST and the state space models provide the highest accuracy. However, when explicit weather data is added to the inputs, the rankings reverse: iTransformer improves its accuracy three times more efficiently than PatchTST. By controlling for model size, we confirm that this advantage stems from the architecture's inherent ability to mix information across different variables. Extending our evaluation to solar generation, wind power, and wholesale prices further demonstrates that model rankings depend on the forecast task: PatchTST excels on highly rhythmic signals like solar, while state space models are better suited for the chaotic fluctuations of wind and price. Ultimately, this benchmark provides grid operators with actionable guidelines for selecting the optimal forecasting architecture based on their specific data environments.
GAN | Adversarial | Attacks | Generation (7 papers)
【1】Bayesian Generative Adversarial Networks via Gaussian Approximation for Tabular Data Synthesis
Link: https://arxiv.org/abs/2602.21948
Authors: Bahrul Ilmi Nasution,Mark Elliot,Richard Allmendinger
Comments: 28 pages, 5 Figures, Accepted in Transactions on Data Privacy
Abstract: Generative Adversarial Networks (GANs) have been used in many studies to synthesise mixed tabular data. Conditional tabular GAN (CTGAN) has been the most popular variant but struggles to navigate the risk-utility trade-off effectively. Bayesian GANs have received less attention for tabular data, but have been explored with unstructured data such as images and text. The most common technique in Bayesian GANs is Markov Chain Monte Carlo (MCMC), but it is computationally intensive, particularly in terms of weight storage. In this paper, we introduce Gaussian Approximation of CTGAN (GACTGAN), which integrates the Bayesian posterior approximation technique Stochastic Weight Averaging-Gaussian (SWAG) into the CTGAN generator to synthesise tabular data, reducing computational overhead after the training phase. We demonstrate that GACTGAN yields better synthetic data than CTGAN, better preserving tabular structure and inferential statistics with less privacy risk. These results highlight GACTGAN as a simpler, effective implementation of Bayesian tabular synthesis.
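SWAG fits a Gaussian to the weights that SGD visits over the final stretch of training, so a posterior sample costs one random draw instead of a stored MCMC chain. A minimal sketch of the diagonal variant (SWAG-Diagonal), assuming the generator's parameters are flattened into a list of floats; full SWAG also keeps a low-rank covariance term, which is omitted here, and this is the generic recipe rather than GACTGAN's code.

```python
import math
import random

class DiagonalSWAG:
    """SWAG-Diagonal: fit N(mean, diag(var)) to weight snapshots collected during training.

    Generic sketch, not GACTGAN's implementation. Stores only a running mean and a
    running second moment per parameter, i.e. two weight-sized buffers in total.
    """

    def __init__(self, n_params):
        self.n = 0
        self.mean = [0.0] * n_params
        self.sq_mean = [0.0] * n_params

    def collect(self, weights):
        # Incremental update of the first and second moments with one more snapshot.
        self.n += 1
        for i, w in enumerate(weights):
            self.mean[i] += (w - self.mean[i]) / self.n
            self.sq_mean[i] += (w * w - self.sq_mean[i]) / self.n

    def variance(self):
        # Clamp tiny negative values caused by floating-point error.
        return [max(s - m * m, 0.0) for m, s in zip(self.mean, self.sq_mean)]

    def sample(self, rng=random):
        # Draw one weight vector from the fitted diagonal Gaussian.
        return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(self.mean, self.variance())]
```

Usage: call `collect` on the flattened generator weights at the end of each late-training epoch, then call `sample` once per synthetic dataset to be generated.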
【2】DocDjinn: Controllable Synthetic Document Generation with VLMs and Handwriting Diffusion
Link: https://arxiv.org/abs/2602.21824
Authors: Marcel Lamott,Saifullah Saifullah,Nauman Riaz,Yves-Noel Weweler,Tobias Alt-Veit,Ahmad Sarmad Ali,Muhammad Armaghan Shakir,Adrian Kalwa,Momina Moetesum,Andreas Dengel,Sheraz Ahmed,Faisal Shafait,Ulrich Schwanecke,Adrian Ulges
Abstract: Effective document intelligence models rely on large amounts of annotated training data. However, procuring sufficient and high-quality data poses significant challenges due to the labor-intensive and costly nature of data acquisition. Additionally, leveraging language models to annotate real documents raises concerns about data privacy. Synthetic document generation has emerged as a promising, privacy-preserving alternative. We propose DocDjinn, a novel framework for controllable synthetic document generation using Vision-Language Models (VLMs) that produces annotated documents from unlabeled seed samples. Our approach generates visually plausible and semantically consistent synthetic documents that follow the distribution of an existing source dataset through clustering-based seed selection with parametrized sampling. By enriching documents with realistic diffusion-based handwriting and contextual visual elements via semantic-visual decoupling, we generate diverse, high-quality annotated synthetic documents. We evaluate across eleven benchmarks spanning key information extraction, question answering, document classification, and document layout analysis. To our knowledge, this is the first work demonstrating that VLMs can generate faithful annotated document datasets at scale from unlabeled seeds that can effectively enrich or approximate real, manually annotated data for diverse document understanding tasks. We show that with only 100 real training samples, our framework achieves on average $87\%$ of the performance of the full real-world dataset. We publicly release our code and 140k+ synthetic document samples.
【3】Primary-Fine Decoupling for Action Generation in Robotic Imitation
Link: https://arxiv.org/abs/2602.21684
Authors: Xiaohan Lei,Min Wang,Wengang Zhou,Xingyu Lu,Houqiang Li
Comments: The Fourteenth International Conference on Learning Representations (ICLR), 2026
Abstract: Multi-modal distributions in robotic manipulation action sequences pose critical challenges for imitation learning. To this end, existing approaches often model the action space as either a discrete set of tokens or a continuous, latent-variable distribution. However, both approaches present trade-offs: some methods discretize actions into tokens and therefore lose fine-grained action variations, while others, which generate continuous actions in a single stage, tend to produce unstable mode transitions. To address these limitations, we propose Primary-Fine Decoupling for Action Generation (PF-DAG), a two-stage framework that decouples coarse action consistency from fine-grained variations. First, we compress action chunks into a small set of discrete modes, enabling a lightweight policy to select consistent coarse modes and avoid mode bouncing. Second, a mode-conditioned MeanFlow policy is learned to generate high-fidelity continuous actions. Theoretically, we prove that PF-DAG's two-stage design achieves a strictly lower MSE bound than single-stage generative policies. Empirically, PF-DAG outperforms state-of-the-art baselines across 56 tasks from the Adroit, DexArt, and MetaWorld benchmarks. It further generalizes to real-world tactile dexterous manipulation tasks. Our work demonstrates that explicit mode-level decoupling enables both robust multi-modal modeling and reactive closed-loop control for robotic manipulation.
【4】Training-free Composition of Pre-trained GFlowNets for Multi-Objective Generation
Link: https://arxiv.org/abs/2602.21565
Authors: Seokwon Yoon,Youngbin Choi,Seunghyuk Cho,Seungbeom Lee,MoonJeong Park,Dongwoo Kim
Comments: 22 pages, 12 figures, 12 tables
Abstract: Generative Flow Networks (GFlowNets) learn to sample diverse candidates in proportion to a reward function, making them well-suited for scientific discovery, where exploring multiple promising solutions is crucial. Further extending GFlowNets to multi-objective settings has attracted growing interest since real-world applications often involve multiple, conflicting objectives. However, existing approaches require additional training for each set of objectives, limiting their applicability and incurring substantial computational overhead. We propose a training-free mixing policy that composes pre-trained GFlowNets at inference time, enabling rapid adaptation without finetuning or retraining. Importantly, our framework is flexible, capable of handling diverse reward combinations ranging from linear scalarization to complex non-linear logical operators, which are often handled separately in previous literature. We prove that our method exactly recovers the target distribution for linear scalarization and quantify the approximation quality for nonlinear operators through a distortion factor. Experiments on a synthetic 2D grid and real-world molecule-generation tasks demonstrate that our approach achieves performance comparable to baselines that require additional training.
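For the linear case, one way to see why exact recovery is possible: GFlowNet edge flows are additive, so the flow for reward $\sum_i w_i R_i$ is $\sum_i w_i F_i$, and its forward policy mixes the individual policies with weights proportional to $w_i F_i(s)$. A minimal sketch on a toy tree, assuming each pre-trained model exposes exact state flows (the tree, rewards, and function names are hypothetical). The abstract does not state that the paper's mixing policy is exactly this formula, and the non-linear operators it handles are not covered here.

```python
from collections import defaultdict

# Toy tree-shaped state space (hypothetical): every leaf is a terminal object x.
CHILDREN = {"s0": ["a", "b"], "a": ["x1", "x2"], "b": ["x3"]}

def exact_flows(reward):
    """State flows of a perfectly trained GFlowNet on the toy tree: F(s) = total reward of leaves below s."""
    def flow(s):
        if s not in CHILDREN:
            return reward[s]
        return sum(flow(c) for c in CHILDREN[s])
    return {s: flow(s) for s in list(CHILDREN) + list(reward)}

def mixed_policy(all_flows, weights, s):
    """Forward policy of the summed flow sum_i w_i F_i, i.e. a GFlowNet for reward sum_i w_i R_i."""
    z = sum(w * f[s] for w, f in zip(weights, all_flows))
    mix = defaultdict(float)
    for w, f in zip(weights, all_flows):
        if f[s] == 0:
            continue  # this model routes no flow through s, so it contributes nothing here
        for c in CHILDREN[s]:
            # w_i F_i(s) * P_i(c | s), where P_i(c | s) = F_i(c) / F_i(s) on a tree
            mix[c] += w * f[s] * (f[c] / f[s]) / z
    return dict(mix)

def terminal_distribution(policy_fn, s="s0", prob=1.0, out=None):
    """Exact probability of ending at each leaf, multiplying policy probabilities along the path."""
    out = {} if out is None else out
    if s not in CHILDREN:
        out[s] = out.get(s, 0.0) + prob
        return out
    for c, p in policy_fn(s).items():
        terminal_distribution(policy_fn, c, prob * p, out)
    return out
```

With rewards R1 = (4, 1, 1) and R2 = (0, 1, 3) on (x1, x2, x3) and equal weights, the mixed policy samples x1, x2, x3 with probabilities 0.4, 0.2, 0.4, which is exactly proportional to 0.5·R1 + 0.5·R2.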
【5】Adversarial Intent is a Latent Variable: Stateful Trust Inference for Securing Multimodal Agentic RAG
Link: https://arxiv.org/abs/2602.21447
Authors: Inderjeet Singh,Vikas Pahuja,Aishvariya Priya Rathina Sabapathy,Chiara Picardi,Amit Giloni,Roman Vainshtein,Andrés Murillo,Hisashi Kojima,Motoyoshi Sekiya,Yuki Unno,Junichi Suga
Comments: 13 pages, 2 figures, 5 tables
Abstract: Current stateless defences for multimodal agentic RAG fail to detect adversarial strategies that distribute malicious semantics across retrieval, planning, and generation components. We formulate this security challenge as a Partially Observable Markov Decision Process (POMDP), where adversarial intent is a latent variable inferred from noisy multi-stage observations. We introduce MMA-RAG$^T$, an inference-time control framework governed by a Modular Trust Agent (MTA) that maintains an approximate belief state via structured LLM reasoning. Operating as a model-agnostic overlay, MMA-RAG$^T$ mediates a configurable set of internal checkpoints to enforce stateful defence-in-depth. Extensive evaluation on 43,774 instances demonstrates a 6.50x average reduction factor in Attack Success Rate relative to undefended baselines, with negligible utility cost. Crucially, a factorial ablation validates our theoretical bounds: while statefulness and spatial coverage are individually necessary (26.4 pp and 13.6 pp gains, respectively), stateless multi-point intervention can yield zero marginal benefit under homogeneous stateless filtering when checkpoint detections are perfectly correlated.
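To illustrate what maintaining a belief state over latent adversarial intent means, here is a minimal sketch of an exact Bayes filter over a binary intent variable, assuming each pipeline stage's checkpoint emits a single alert flag with known true- and false-positive rates. The abstract says MMA-RAG$^T$ approximates the belief through structured LLM reasoning, so this explicit filter is a swapped-in textbook stand-in; the rates, threshold, and function names are hypothetical.

```python
def update_belief(prior, flagged, tpr, fpr):
    """One Bayes-filter step for P(adversarial intent | observations so far).

    Textbook stand-in, not the paper's Modular Trust Agent.
    prior: current belief that the session is adversarial.
    flagged: whether this stage's checkpoint raised an alert.
    tpr / fpr: hypothetical detector true- and false-positive rates.
    """
    like_adv = tpr if flagged else 1.0 - tpr
    like_benign = fpr if flagged else 1.0 - fpr
    num = like_adv * prior
    return num / (num + like_benign * (1.0 - prior))

def run_session(flags, prior=0.05, tpr=0.6, fpr=0.1, block_at=0.5):
    """Carry the belief across retrieval/planning/generation stages; block once it crosses a threshold."""
    belief = prior
    for stage, flagged in enumerate(flags):
        belief = update_belief(belief, flagged, tpr, fpr)
        if belief >= block_at:
            return "block", stage, belief
    return "allow", len(flags) - 1, belief
```

With these illustrative numbers, one alert lifts the belief from 0.05 to 0.24, and a second alert lifts it to about 0.65 and triggers a block; a stateless filter that reset to the prior at every stage would never exceed 0.24.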
【6】Defensive Generation
Link: https://arxiv.org/abs/2602.21390
Authors: Gabriele Farina,Juan Carlos Perdomo
Abstract: We study the problem of efficiently producing, in an online fashion, generative models of scalar, multiclass, and vector-valued outcomes that cannot be falsified on the basis of the observed data and a pre-specified collection of computational tests. Our contributions are twofold. First, we expand on connections between online high-dimensional multicalibration with respect to an RKHS and recent advances in expected variational inequality problems, enabling efficient algorithms for the former. We then apply this algorithmic machinery to the problem of outcome indistinguishability. Our procedure, Defensive Generation, is the first to efficiently produce online outcome indistinguishable generative models of non-Bernoulli outcomes that are unfalsifiable with respect to infinite classes of tests, including those that examine higher-order moments of the generated distributions. Furthermore, our method runs in near-linear time in the number of samples and achieves the optimal, vanishing $T^{-1/2}$ rate for generation error.
【7】EPSVec: Efficient and Private Synthetic Data Generation via Dataset Vectors
Link: https://arxiv.org/abs/2602.21218
Authors: Amin Banayeeanzade,Qingchuan Yang,Deqing Fu,Spencer Hong,Erin Babinsky,Alfy Samuel,Anoop Kumar,Robin Jia,Sai Praneeth Karimireddy
Abstract: High-quality data is essential for modern machine learning, yet many valuable corpora are sensitive and cannot be freely shared. Synthetic data offers a practical substitute for downstream development, and large language models (LLMs) have emerged as powerful engines for generating it. However, existing private text generation methods are severely inefficient: they are data-intensive, computationally slow, and often require large private corpora or batch sizes to achieve usable quality. We introduce EPSVec, a differentially private, lightweight alternative that steers LLM generation using *dataset vectors*, i.e., directions in activation space that capture the distributional gap between private data and public priors. EPSVec extracts and sanitizes steering vectors just once and then performs standard decoding. This decouples the privacy budget from generation, enabling arbitrarily many synthetic samples without additional privacy cost and yielding strong fidelity even in low-data regimes. Furthermore, we enhance our method by utilizing pretrained (base) models and introducing fixed-shot prompting to boost generation diversity and fidelity. Our experiments demonstrate that EPSVec outperforms existing baselines in distributional alignment and downstream utility, particularly in low-data regimes, while significantly reducing computational overhead.
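A minimal sketch of how a steering vector could be privatised once with the Gaussian mechanism, assuming the dataset vector is the mean of norm-clipped private activations minus a public mean, and that neighbouring datasets differ by replacing one record (so the clipped mean has L2 sensitivity $2C/n$). EPSVec's actual extraction layer, clipping rule, and noise calibration are not given in the abstract; the classic calibration $\sigma = \Delta\sqrt{2\ln(1.25/\delta)}/\varepsilon$ used here is only valid for $\varepsilon < 1$, and all function names are hypothetical.

```python
import math
import random

def clip(v, c):
    # Scale v down so that its L2 norm is at most c.
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm <= c else [x * c / norm for x in v]

def private_dataset_vector(private_acts, public_mean, clip_norm, epsilon, delta, rng=random):
    """Noisy mean-difference steering vector under the (epsilon, delta) Gaussian mechanism.

    Hypothetical sketch, not EPSVec's code. Assumes replace-one neighbouring datasets.
    Returns (steering vector, noise standard deviation).
    """
    n = len(private_acts)
    dim = len(public_mean)
    clipped = [clip(a, clip_norm) for a in private_acts]
    mean = [sum(a[d] for a in clipped) / n for d in range(dim)]
    sensitivity = 2.0 * clip_norm / n  # max L2 change of the mean when one record is replaced
    sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
    noisy = [m + rng.gauss(0.0, sigma) for m in mean]
    # Direction from the public prior towards the privatised private distribution.
    # The public mean costs no privacy budget, so subtracting it is post-processing.
    return [p - q for p, q in zip(noisy, public_mean)], sigma
```

Because noise is added only once, every subsequent decoding step that adds this vector to the model's activations is post-processing and, under these assumptions, consumes no further privacy budget; the noise scale also shrinks as $1/n$, which matters in the low-data regimes the abstract highlights.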
Semi-/Weakly/Un-/Supervised Learning | Uncertainty | Active Learning (6 papers)
【1】GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL
Link: https://arxiv.org/abs/2602.22190
Authors: Rui Yang,Qianhui Wu,Zhaoyang Wang,Hanyang Chen,Ke Yang,Hao Cheng,Huaxiu Yao,Baoling Peng,Huan Zhang,Jianfeng Gao,Tong Zhang
Comments: 57 pages, 17 figures
Abstract: Open-source native GUI agents still lag behind closed-source systems on long-horizon navigation tasks. This gap stems from two limitations: a shortage of high-quality, action-aligned reasoning data, and the direct adoption of generic post-training pipelines that overlook the unique challenges of GUI agents. We identify two fundamental issues in these pipelines: (i) standard SFT with CoT reasoning often hurts grounding, and (ii) step-wise RLVR-style training faces partial verifiability, where multiple actions can be correct but only a single demonstrated action is used for verification. This makes offline step-wise metrics weak predictors of online task success. In this work, we present GUI-Libra, a tailored training recipe that addresses these challenges. First, to mitigate the scarcity of action-aligned reasoning data, we introduce a data construction and filtering pipeline and release a curated 81K GUI reasoning dataset. Second, to reconcile reasoning with grounding, we propose action-aware SFT that mixes reasoning-then-action and direct-action data and reweights tokens to emphasize action and grounding. Third, to stabilize RL under partial verifiability, we identify the overlooked importance of KL regularization in RLVR and show that a KL trust region is critical for improving offline-to-online predictability; we further introduce success-adaptive scaling to downweight unreliable negative gradients. Across diverse web and mobile benchmarks, GUI-Libra consistently improves both step-wise accuracy and end-to-end task completion. Our results suggest that carefully designed post-training and data curation can unlock significantly stronger task-solving capabilities without costly online data collection. We release our dataset, code, and models to facilitate further research on data-efficient post-training for reasoning-capable GUI agents.
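One way to read "reweights tokens to emphasize action and grounding" is as a weighted token-level cross-entropy. A minimal sketch, assuming per-token log-probabilities of the target sequence are already available and that tokens in the action/grounding span receive a larger weight; the weights, the normalisation by total weight, and the function name are hypothetical, since the abstract does not give GUI-Libra's actual scheme.

```python
def weighted_sft_loss(token_logprobs, is_action, action_weight=2.0, reasoning_weight=1.0):
    """Weighted negative log-likelihood over one target sequence.

    Hypothetical sketch, not GUI-Libra's loss.
    token_logprobs: log p(target token) at each position.
    is_action: True for tokens in the action / grounding span (e.g. a click coordinate).
    Dividing by the total weight keeps the loss scale comparable across sequences.
    """
    weights = [action_weight if a else reasoning_weight for a in is_action]
    total = sum(w * -lp for w, lp in zip(weights, token_logprobs))
    return total / sum(weights)
```

For log-probabilities (-1, -1, -3) where only the last token is an action, the weighted loss is 2.0 versus 5/3 for the unweighted mean, so errors on the action token dominate the gradient more than they would under plain SFT.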
【2】Learning Complex Physical Regimes via Coverage-oriented Uncertainty Quantification: An application to the Critical Heat Flux
Link: https://arxiv.org/abs/2602.21701
Authors: Michele Cazzola,Alberto Ghione,Lucia Sargentini,Julien Nespoulous,Riccardo Finotello
Comments: 34 pages, 14 figures
Abstract: A central challenge in scientific machine learning (ML) is the correct representation of physical systems governed by multi-regime behaviours. In these scenarios, standard data analysis techniques often fail to capture the nature of the data, as the system's response varies significantly across the state space due to its stochasticity and the different physical regimes. Uncertainty quantification (UQ) should thus not be viewed merely as a safety assessment, but as a support to the learning task itself, guiding the model to internalise the behaviour of the data. We address this by focusing on the Critical Heat Flux (CHF) benchmark and dataset presented by the OECD/NEA Expert Group on Reactor Systems Multi-Physics. This case study represents a test for scientific ML due to the non-linear dependence of CHF on the inputs and the existence of distinct microscopic physical regimes. These regimes exhibit diverse statistical profiles, a complexity that requires UQ techniques to internalise the data behaviour and ensure reliable predictions. In this work, we conduct a comparative analysis of UQ methodologies to determine their impact on physical representation. We contrast post-hoc methods, specifically conformal prediction, against end-to-end coverage-oriented pipelines, including (Bayesian) heteroscedastic regression and quality-driven losses. These approaches treat uncertainty not as a final metric, but as an active component of the optimisation process, modelling the prediction and its behaviour simultaneously. We show that while post-hoc methods ensure statistical calibration, coverage-oriented learning effectively reshapes the model's representation to match the complex physical regimes. The result is a model that delivers not only high predictive accuracy but also a physically consistent uncertainty estimation that adapts dynamically to the intrinsic variability of the CHF.
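For reference, the post-hoc baseline the abstract contrasts against, split conformal prediction, wraps any already-fitted regressor. A minimal sketch with absolute-residual scores, assuming exchangeable calibration and test data; the function name and toy numbers are illustrative and unrelated to the CHF benchmark.

```python
import math

def split_conformal_interval(cal_preds, cal_targets, test_pred, alpha=0.1):
    """Prediction interval with marginal coverage >= 1 - alpha under exchangeability.

    cal_preds / cal_targets: model predictions and true values on a held-out calibration set.
    """
    scores = sorted(abs(y - p) for p, y in zip(cal_preds, cal_targets))
    n = len(scores)
    # Finite-sample corrected quantile: the ceil((n + 1)(1 - alpha))-th smallest score (1-based).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return (-math.inf, math.inf)  # too few calibration points for this alpha
    q = scores[k - 1]
    return (test_pred - q, test_pred + q)
```

The interval half-width `q` is the same for every input, which illustrates the abstract's point: post-hoc calibration guarantees coverage on average, but a constant-width band cannot adapt to regimes with very different noise levels the way a heteroscedastic, coverage-oriented model can.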
【3】Error-awareness Accelerates Active Automata Learning
Link: https://arxiv.org/abs/2602.21674
Authors: Loes Kruger,Sebastian Junges,Jurriaan Rot
Abstract: Active automata learning (AAL) algorithms can learn a behavioral model of a system by interacting with it. The primary challenge remains scaling to larger models, in particular in the presence of many possible inputs to the system. Modern AAL algorithms fail to scale even if, in every state, most inputs lead to errors. In various challenging problems from the literature, these errors are observable, i.e., they emit a known error output. Motivated by these problems, we study how to learn such systems more efficiently. Further, we consider various degrees of knowledge about which inputs do not produce errors in which states. For each level of knowledge, we provide a matching adaptation of the state-of-the-art AAL algorithm L# that makes the most of this domain knowledge. Our empirical evaluation demonstrates that the methods accelerate learning by orders of magnitude given strong but realistic domain knowledge, and by a single order of magnitude given limited domain knowledge.
【4】Uncertainty-Aware Diffusion Model for Multimodal Highway Trajectory Prediction via DDIM Sampling
Link: https://arxiv.org/abs/2602.21319
Authors: Marion Neumeier,Niklas Roßberg,Michael Botsch,Wolfgang Utschick
Comments: Accepted as a conference paper in IEEE Intelligent Vehicles Symposium (IV) 2026, Detroit, MI, United States
Abstract: Accurate and uncertainty-aware trajectory prediction remains a core challenge for autonomous driving, driven by complex multi-agent interactions, diverse scene contexts and the inherently stochastic nature of future motion. Diffusion-based generative models have recently shown strong potential for capturing multimodal futures, yet existing approaches such as cVMD suffer from slow sampling, limited exploitation of generative diversity and brittle scenario encodings. This work introduces cVMDx, an enhanced diffusion-based trajectory prediction framework that improves efficiency, robustness and multimodal predictive capability. Through DDIM sampling, cVMDx achieves up to a 100x reduction in inference time, enabling practical multi-sample generation for uncertainty estimation. A fitted Gaussian Mixture Model further provides tractable multimodal predictions from the generated trajectories. In addition, a CVQ-VAE variant is evaluated for scenario encoding. Experiments on the publicly available highD dataset show that cVMDx achieves higher accuracy and significantly improved efficiency over cVMD, enabling fully stochastic, multimodal trajectory prediction.
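The speed-up attributed to DDIM comes from its deterministic update, which can skip most of the training timesteps. A minimal sketch of the eta = 0 DDIM step on a scalar, assuming a noise predictor `eps_model(x, t)` and a cumulative noise schedule `alpha_bar`; cVMDx's network, conditioning, and step schedule are not given in the abstract, so these names are hypothetical.

```python
import math

def ddim_sample(x_T, eps_model, alpha_bar, steps):
    """Deterministic DDIM sampling (eta = 0) over a decreasing subsequence of timesteps.

    Generic DDIM sketch, not cVMDx's sampler.
    alpha_bar: cumulative products of (1 - beta_t), indexed by timestep t.
    steps: decreasing timesteps to visit, e.g. every 100th of 1000.
    """
    x = x_T
    for i, t in enumerate(steps):
        a_t = alpha_bar[t]
        # The final step lands on clean data (alpha_bar = 1).
        a_prev = alpha_bar[steps[i + 1]] if i + 1 < len(steps) else 1.0
        eps = eps_model(x, t)
        # Estimate the clean sample implied by the predicted noise.
        x0_pred = (x - math.sqrt(1.0 - a_t) * eps) / math.sqrt(a_t)
        # Re-noise that estimate to the previous (less noisy) level with the same eps.
        x = math.sqrt(a_prev) * x0_pred + math.sqrt(1.0 - a_prev) * eps
    return x
```

Usage: with a 1000-step linear beta schedule, `steps = list(range(999, -1, -100))` visits only 10 timesteps. Plugging in an oracle noise predictor for a known clean value recovers that value exactly, which checks the update arithmetic; a trained network would replace the oracle.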
【5】Unsupervised Discovery of Intermediate Phase Order in the Frustrated $J_1$-$J_2$ Heisenberg Model via Prometheus Framework
Link: https://arxiv.org/abs/2602.21468
Authors: Brandon Yee, Wilson Collins, Maximilian Rutkowski
Abstract: The spin-$1/2$ $J_1$-$J_2$ Heisenberg model on the square lattice exhibits a debated intermediate phase between the Néel antiferromagnetic and stripe-ordered regimes, with competing theories proposing plaquette valence-bond, nematic, and quantum spin liquid ground states. We apply the Prometheus variational autoencoder framework, previously validated on classical (2D and 3D Ising) and quantum (disordered transverse-field Ising) phase transitions, to systematically explore the $J_1$-$J_2$ phase diagram via unsupervised analysis of exact-diagonalization ground states on a $4 \times 4$ lattice. Through dense parameter scans of $J_2/J_1 \in [0.3, 0.7]$ with step size 0.01 and comprehensive latent-space analysis, we investigate the nature of the intermediate regime using unsupervised order-parameter discovery and critical-point detection via multiple independent methods. This work demonstrates the application of rigorously validated machine learning methods to open questions in frustrated quantum magnetism, where traditional order-parameter identification is challenged by competing interactions and the limited system sizes that are accessible.
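The arithmetic below gives a rough sense of the problem size. A $4 \times 4$ spin-1/2 lattice has $2^{16}$ basis states. Restricting to the total $S^z = 0$ sector is a common exact-diagonalization reduction, but that is an assumption here, since the abstract does not describe the ED setup. The stated scan on $[0.3, 0.7]$ with step 0.01 contains 41 points.

```python
from math import comb

n_sites = 4 * 4                            # 4x4 square lattice
full_dim = 2 ** n_sites                    # two states (up/down) per spin-1/2 site
sz0_dim = comb(n_sites, n_sites // 2)      # total S^z = 0 sector: 8 up spins, 8 down spins

# J2/J1 grid on [0.3, 0.7] with step 0.01, built from integer offsets to avoid float drift
ratios = [round(0.30 + 0.01 * k, 2) for k in range(41)]

print(full_dim, sz0_dim, len(ratios))      # 65536 12870 41
```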
【6】ConformalHDC: Uncertainty-Aware Hyperdimensional Computing with Application to Neural Decoding
Link: https://arxiv.org/abs/2602.21446
Authors: Ziyi Liang, Hamed Poursiami, Zhishun Yang, Keiland Cooper, Akhilesh Jaiswal, Maryam Parsa, Norbert Fortin, Babak Shahbaba
Abstract: Hyperdimensional Computing (HDC) offers a computationally efficient paradigm for neuromorphic learning. Yet it lacks rigorous uncertainty quantification, which leads to open decision boundaries and, consequently, vulnerability to outliers, adversarial perturbations, and out-of-distribution inputs. To address these limitations, we introduce ConformalHDC, a unified framework that combines the statistical guarantees of conformal prediction with the computational efficiency of HDC. Within this framework, we propose two complementary variants. First, the set-valued formulation provides finite-sample, distribution-free coverage guarantees. Using carefully designed conformity scores, it forms enclosed decision boundaries that improve robustness to non-conforming inputs. Second, the point-valued formulation leverages the same conformity scores to produce a single prediction when desired, potentially improving accuracy over traditional HDC by accounting for class interactions. We demonstrate the broad applicability of the proposed framework through evaluations on multiple real-world datasets. In particular, we apply our method to the challenging problem of decoding non-spatial stimulus information from the spiking activity of hippocampal neurons recorded while subjects performed a sequence memory task. Our results show that ConformalHDC not only accurately decodes the stimulus information represented in the neural activity data, but also provides rigorous uncertainty estimates and correctly abstains when presented with data from other behavioral states. Overall, these capabilities position the framework as a reliable, uncertainty-aware foundation for neuromorphic computing.
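A set-valued conformal classifier can be built with generic split-conformal prediction, sketched below. This is not ConformalHDC's specific conformity score. As an assumption, the nonconformity score of a label is 1 minus the cosine similarity between the query hypervector and that class's prototype. Under exchangeability, including every label whose score is at most the calibrated threshold gives at least 1 - alpha coverage, and an empty set is a natural "abstain".

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold from nonconformity scores of held-out calibration points."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))   # rank of the required order statistic
    if k > n:
        return math.inf                    # too few calibration points: admit every label
    return sorted(cal_scores)[k - 1]

def hdc_scores(query, prototypes):
    """Assumed score: 1 - cosine similarity between a query hypervector and each class prototype."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return {label: 1.0 - cos(query, hv) for label, hv in prototypes.items()}

def prediction_set(class_scores, q_hat):
    """Labels whose score is within the threshold; an empty set means abstain."""
    return {label for label, s in class_scores.items() if s <= q_hat}
```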
Transfer|Zero/Few/One-Shot|Adaptive (9 papers)
【1】Robustness in sparse artificial neural networks trained with adaptive topology
Link: https://arxiv.org/abs/2602.21961
Authors: Bendegúz Sulyok, Gergely Palla, Filippo Radicchi, Santo Fortunato
Abstract: We investigate the robustness of sparse artificial neural networks trained with adaptive topology. We focus on a simple yet effective architecture consisting of three sparse layers with 99% sparsity followed by a dense layer, applied to image classification tasks such as MNIST and Fashion-MNIST. By updating the topology of the sparse layers after every epoch, we achieve competitive accuracy despite the significantly reduced number of weights. Our primary contribution is a detailed analysis of the robustness of these networks, exploring their performance under various perturbations, including random link removal, adversarial attacks, and link-weight shuffling. Through extensive experiments, we demonstrate that adaptive topology not only improves efficiency but also maintains robustness. This work highlights adaptive sparse networks as a promising direction for developing efficient and reliable deep learning models.
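The abstract does not say how the topology is updated between epochs. One common scheme is SET-style pruning and regrowth: remove the weakest links by magnitude, then regrow the same number at random positions. The sketch below implements that scheme purely as an illustrative assumption. It stores a sparse layer as a dict of (row, col) pairs, so the number of weights, and hence the 99% sparsity, stays fixed.

```python
import random

def prune_and_regrow(weights, n_in, n_out, zeta=0.3, rng=None):
    """One topology update for a sparse layer stored as {(i, j): w}.

    Drops the fraction zeta of links with the smallest |w|, then regrows the same number
    at random positions not currently in the layer, so the link count is unchanged.
    """
    rng = rng or random.Random(0)
    n_remove = int(zeta * len(weights))
    by_magnitude = sorted(weights, key=lambda e: abs(weights[e]))
    new = {e: weights[e] for e in by_magnitude[n_remove:]}
    while len(new) < len(weights):
        e = (rng.randrange(n_in), rng.randrange(n_out))
        if e not in new:
            new[e] = rng.gauss(0.0, 0.01)   # small random init for a regrown link
    return new

# A 100x100 layer with 100 links, i.e. 99% sparsity.
rng = random.Random(1)
positions = rng.sample(range(100 * 100), 100)
w = {(p // 100, p % 100): rng.gauss(0.0, 1.0) for p in positions}
w_next = prune_and_regrow(w, 100, 100, zeta=0.3, rng=rng)
```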
【2】RAMSeS: Robust and Adaptive Model Selection for Time-Series Anomaly Detection Algorithms
Link: https://arxiv.org/abs/2602.21766
Authors: Mohamed Abdelmaksoud, Sheng Ding, Andrey Morozov, Ziawasch Abedjan
Abstract: Time-series data vary widely across domains, making a universal anomaly detector impractical. Methods that perform well on one dataset often fail to transfer because what counts as an anomaly is context-dependent. The key challenge is to design a method that performs well in specific contexts while remaining adaptable across domains with varying data complexity. We present RAMSeS (Robust and Adaptive Model Selection for Time-Series Anomaly Detection), a framework with two branches: (i) a stacking ensemble, optimized with a genetic algorithm, that leverages complementary detectors; and (ii) an adaptive model-selection branch that identifies the best single detector using techniques including Thompson sampling, robustness testing with generative adversarial networks, and Monte Carlo simulations. This dual strategy exploits the collective strength of multiple models and adapts to dataset-specific characteristics. We evaluate RAMSeS and show that it outperforms prior methods in F1 score.
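Thompson sampling can drive model selection, as in the abstract's second branch. Below is a minimal Beta-Bernoulli sketch over candidate detectors. The binary reward (for example, "the detector handled this validation window correctly") and the success rates are hypothetical. RAMSeS's actual reward signal is not described in the abstract.

```python
import random

def thompson_select(successes, failures, rng):
    """Draw from each detector's Beta posterior and pick the detector with the largest draw."""
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)

def run(true_rates, rounds=2000, seed=0):
    """Simulate selection among detectors with hypothetical per-round success probabilities."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes, failures, pulls = [0] * k, [0] * k, [0] * k
    for _ in range(rounds):
        arm = thompson_select(successes, failures, rng)
        pulls[arm] += 1
        if rng.random() < true_rates[arm]:   # hypothetical Bernoulli reward
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls

pulls = run([0.2, 0.5, 0.8])
```

Sampling from the posteriors, rather than picking the best current mean, keeps some exploration of weaker detectors. That exploration fades as the evidence accumulates.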
【3】MMLoP: Multi-Modal Low-Rank Prompting for Efficient Vision-Language Adaptation
Link: https://arxiv.org/abs/2602.21397
Authors: Sajjad Ghiasvand, Haniyeh Ehsani Oskouie, Mahnoosh Alizadeh, Ramtin Pedarsani
Abstract: Prompt learning has become a dominant paradigm for adapting vision-language models (VLMs) such as CLIP to downstream tasks without modifying pretrained weights. Extending prompts to both the vision and text encoders across multiple transformer layers significantly boosts performance, but it also dramatically increases the number of trainable parameters: state-of-the-art methods require millions of parameters, abandoning the parameter efficiency that makes prompt tuning attractive. In this work, we propose MMLoP (Multi-Modal Low-Rank Prompting), a framework that achieves deep multi-modal prompting with only 11.5K trainable parameters, comparable to early text-only methods such as CoOp. MMLoP parameterizes the vision and text prompts at each transformer layer through a low-rank factorization, which serves as an implicit regularizer against overfitting on few-shot training data. To further close the accuracy gap with state-of-the-art methods, we introduce three complementary components: a self-regulating consistency loss that anchors prompted representations to frozen zero-shot CLIP features at both the feature and logit levels; a uniform drift correction that removes the global embedding shift induced by prompt tuning, preserving class-discriminative structure; and a shared up-projection that couples vision and text prompts through a common low-rank factor to enforce cross-modal alignment. Extensive experiments across three benchmarks and 11 diverse datasets demonstrate that MMLoP achieves a highly favorable accuracy-efficiency trade-off, outperforming most existing methods, including those with orders of magnitude more parameters, and achieving a harmonic mean of 79.70% on base-to-novel generalization.
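A low-rank factorization reduces the number of prompt parameters, as the counting below shows. A per-layer prompt matrix $P$ ($n$ tokens by $d$ dimensions) is written as $P = AB$ with $A \in \mathbb{R}^{n \times r}$ and $B \in \mathbb{R}^{r \times d}$, so it costs $r(n + d)$ parameters instead of $nd$. The token count, width, rank, and layer count below are hypothetical and do not reproduce MMLoP's 11.5K figure. If one factor is shared across modalities, as the abstract's "shared up-projection" suggests, it would be counted once rather than twice, and this sketch does not model that.

```python
def matmul(A, B):
    """Plain-Python matrix product: (m x r) @ (r x d) -> (m x d)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def prompt_param_counts(n_tokens, d, rank, n_layers):
    """Trainable parameters for per-layer prompts: dense P vs. factorized P = A @ B."""
    dense = n_layers * n_tokens * d
    low_rank = n_layers * rank * (n_tokens + d)
    return dense, low_rank

# Hypothetical sizes: 4 prompt tokens, width 512, rank 2, 9 prompted layers.
dense, low_rank = prompt_param_counts(4, 512, 2, 9)   # (18432, 9288)
```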
【4】MrBERT: Modern Multilingual Encoders via Vocabulary, Domain, and Dimensional Adaptation
Link: https://arxiv.org/abs/2602.21379
Authors: Daniel Tamayo, Iñaki Lacunza, Paula Rivera-Hidalgo, Severino Da Dalt, Javier Aula-Blasco, Aitor Gonzalez-Agirre, Marta Villegas
Comments: 24 pages, 14 tables and 4 figures
Abstract: We introduce MrBERT, a family of 150M-300M-parameter encoders built on the ModernBERT architecture and pre-trained on 35 languages and code. Through targeted adaptation, this model family achieves state-of-the-art results on Catalan- and Spanish-specific tasks while establishing robust performance across specialized biomedical and legal domains. To bridge the gap between research and production, we incorporate Matryoshka Representation Learning (MRL), enabling flexible vector sizing that significantly reduces inference and storage costs. Ultimately, the MrBERT family demonstrates that modern encoder architectures can be optimized for both localized linguistic excellence and efficient, high-stakes domain specialization. We open-source the complete model family on Hugging Face.
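Matryoshka-style embeddings offer flexible vector sizing at inference time: MRL trains embeddings so that their leading coordinates form usable lower-dimensional embeddings. A consumer can therefore keep a prefix and re-normalize it before computing cosine similarity. The minimal sketch below shows only that inference-side step. The 768 and 256 dimensions in the storage arithmetic are hypothetical and are not MrBERT's documented sizes.

```python
import math

def truncate_embedding(vec, k):
    """Keep the first k coordinates of a Matryoshka-style embedding and re-normalize to unit length."""
    head = vec[:k]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def float32_index_bytes(dim, n_vectors):
    """Raw storage for n_vectors float32 embeddings of size dim (4 bytes per value)."""
    return 4 * dim * n_vectors

full = float32_index_bytes(768, 1_000_000)    # 3,072,000,000 bytes
small = float32_index_bytes(256, 1_000_000)   # 1,024,000,000 bytes
```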
【5】The Mean is the Mirage: Entropy-Adaptive Model Merging under Heterogeneous Domain Shifts in Medical Imaging
Link: https://arxiv.org/abs/2602.21372
Authors: Sameer Ambekar, Reza Nasirigerdeh, Peter J. Schuffler, Lina Felsner, Daniel M. Lang, Julia A. Schnabel
Abstract: Under unseen test-time distribution shifts, naive model-merging strategies such as mean averaging often become unreliable. This challenge is especially acute in medical imaging, where models are fine-tuned locally at clinics on private data, producing domain-specific models that differ by scanner, protocol, and population. When deployed at an unseen clinical site, test cases arrive in unlabeled, non-i.i.d. batches, and the model must adapt immediately without labels. In this work, we introduce an entropy-adaptive, fully online model-merging method that yields a batch-specific merged model using only forward passes, effectively leveraging target information. We further show why mean merging is prone to failure and misalignment under heterogeneous domain shifts. Next, we mitigate encoder-classifier mismatch by decoupling the encoder and classification head and merging them with separate coefficients. We extensively evaluate our method against state-of-the-art baselines using two backbones across nine medical and natural-domain generalization image classification datasets, showing consistent gains in both standard evaluation and challenging scenarios. These gains are achieved while retaining single-model inference at test time, demonstrating the effectiveness of our method.
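The sketch below shows one plausible reading of "entropy-adaptive" merging, not the paper's rule. On the current unlabeled batch, each source model's average prediction entropy is computed, and a softmax over negative entropies gives merging weights, so more confident models count more. The softmax rule, the temperature, and the flattened-parameter representation are all assumptions. The paper uses separate coefficients for the encoder and the head, but the abstract does not say how they are derived, so only a single weight vector is shown.

```python
import math

def mean_entropy(probs):
    """Average Shannon entropy (nats) of per-sample class probabilities for one batch."""
    return sum(-sum(p * math.log(p) for p in row if p > 0) for row in probs) / len(probs)

def entropy_weights(batch_probs_per_model, temperature=1.0):
    """Softmax over negative batch entropy: more confident source models get larger weights."""
    logits = [-mean_entropy(p) / temperature for p in batch_probs_per_model]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def merge(params_per_model, weights):
    """Weighted average of flattened parameter vectors (one list of floats per model)."""
    return [sum(w * p[i] for w, p in zip(weights, params_per_model))
            for i in range(len(params_per_model[0]))]
```

Because the weights are recomputed for every batch from forward-pass outputs alone, the merged model can be batch-specific without any labels or gradient steps.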
【6】ACAR: Adaptive Complexity Routing for Multi-Model Ensembles with Auditable Decision Traces
Link: https://arxiv.org/abs/2602.21231
Authors: Ramchand Kumaresan
Comments: 12 pages, 9 figures. Measurement framework for adaptive multi-model routing with auditable execution traces
Abstract: We present ACAR (Adaptive Complexity and Attribution Routing), a measurement framework for studying multi-model orchestration under auditable conditions. ACAR uses self-consistency variance (sigma), computed from N=3 probe samples, to route tasks across single-model, two-model, and three-model execution modes. The system is implemented on top of TEAMLLM, a deterministic execution substrate with immutable artifacts and complete decision traces. We evaluate ACAR on 1,510 tasks spanning four benchmarks (MathArena, Reasoning Gym, LiveCodeBench, and SuperGPQA) using Claude Sonnet 4, GPT-4o, and Gemini 2.0 Flash, producing more than 7,550 auditable runs. Results show that sigma-based routing achieves 55.6 percent accuracy, exceeding the two-model baseline of 54.4 percent while avoiding full ensembling on 54.2 percent of tasks. The routing mechanism is model-agnostic and requires no learned components. We also document negative results. First, retrieval augmentation reduced accuracy by 3.4 percentage points, as median retrieval similarity was only 0.167, demonstrating that experience injection without semantic alignment introduces noise rather than grounding. Second, when models agree on incorrect answers (sigma equals zero), no downstream ensemble can recover; this agreement-but-wrong failure mode is intrinsic to self-consistency and bounds achievable accuracy at approximately eight percentage points below full ensembling. Third, attribution estimates based on proxy signals such as response similarity and entropy showed weak correlation with ground-truth leave-one-out values, indicating that practical attribution requires explicit counterfactual computation. This work documents which assumptions fail in practice and provides falsifiable baselines for future research on routing, retrieval, and multi-model attribution.
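The sketch below illustrates how sigma-based routing over three probe answers could work. The abstract does not give the exact formula for sigma or the routing thresholds. As a stand-in, sigma here is the share of probes that disagree with the most common answer, and the 1/2/3-model thresholds are hypothetical. Exact fractions avoid floating-point surprises at the 1/3 boundary.

```python
from collections import Counter
from fractions import Fraction

def disagreement(answers):
    """Stand-in for sigma: 1 - (share of probe answers matching the most common answer)."""
    top = Counter(answers).most_common(1)[0][1]
    return 1 - Fraction(top, len(answers))

def route(answers):
    """Map N=3 probe answers to a 1-, 2-, or 3-model execution mode (thresholds are assumptions)."""
    sigma = disagreement(answers)
    if sigma == 0:
        return 1          # unanimous: a single model suffices
    if sigma <= Fraction(1, 3):
        return 2          # 2-of-3 agreement: escalate to two models
    return 3              # no majority: full three-model ensemble
```

The agreement-but-wrong failure mode in the abstract is visible here: three identical but incorrect probe answers give sigma = 0, so the task is routed to a single model and never reaches an ensemble that might correct it.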
【7】Learning spatially adaptive sparsity level maps for arbitrary convolutional dictionaries
Link: https://arxiv.org/abs/2602.21707
Authors: Joshua Schulz, David Schote, Christoph Kolbitsch, Kostas Papafitsoros, Andreas Kofler
Abstract: State-of-the-art learned reconstruction methods often rely on black-box modules that, despite their strong performance, raise questions about interpretability and robustness. Here, we build on a recently proposed image reconstruction method that embeds data-driven information into a model-based convolutional dictionary regularization via spatially adaptive sparsity level maps inferred by a neural network. Through an improved network design and dedicated training strategies, we extend the method to achieve filter-permutation invariance and to allow the convolutional dictionary to be changed at inference time. We apply our method to low-field MRI and compare it with several other recent deep learning-based methods, including on in vivo data, where we showcase the benefit of using a different dictionary. We further assess the method's robustness on in-distribution and out-of-distribution data. On the latter, the proposed method suffers less from the data distribution shift than the other learned methods, which we attribute to its reduced reliance on training data owing to its underlying model-based reconstruction component.
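A spatially adaptive sparsity level map works through the weighted $\ell_1$ penalty $\sum_k \lambda_k |z_k|$. The proximal operator of that penalty is soft-thresholding with a separate threshold per coefficient, so a larger $\lambda_k$ enforces more sparsity at that location. Below is a minimal sketch with one proximal-gradient (ISTA) step. Here the map is set by hand rather than inferred by a network, and coefficients are flattened to 1D, ignoring the convolutional structure of the dictionary.

```python
import math

def soft_threshold(coeffs, lam_map):
    """Prox of sum_k lam_k * |z_k|: shrink each coefficient toward 0 by its own threshold."""
    return [math.copysign(max(abs(z) - lam, 0.0), z) for z, lam in zip(coeffs, lam_map)]

def ista_step(z, grad, step, lam_map):
    """One ISTA step for 0.5*||D z - y||^2 + ||lam_map * z||_1, with grad = D^T (D z - y)."""
    return soft_threshold([zi - step * gi for zi, gi in zip(z, grad)],
                          [step * lam for lam in lam_map])
```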
【8】Efficient Inference after Directionally Stable Adaptive Experiments
Link: https://arxiv.org/abs/2602.21478
Authors: Zikai Shen, Houssam Zenati, Nathan Kallus, Arthur Gretton, Koulik Khamaru, Aurélien Bibaut
Comments: 34 pages
Abstract: We study inference on scalar-valued pathwise differentiable targets after adaptive data collection, for example by a bandit algorithm. We introduce a novel target-specific condition, directional stability, which is strictly weaker than previously imposed target-agnostic stability conditions. Under directional stability, we show that estimators that would have been efficient under i.i.d. data remain asymptotically normal and semiparametrically efficient when computed from adaptively collected trajectories. The canonical gradient has a martingale form, and directional stability guarantees the stabilization of its predictable quadratic variation, enabling high-dimensional asymptotic normality. We characterize efficiency using a convolution theorem for the adaptive-data setting and give a condition under which the one-step estimator attains the efficiency bound. We verify directional stability for LinUCB, yielding the first semiparametric efficiency guarantee for a regular scalar target under LinUCB sampling.
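The paper's contribution is an inference guarantee; the sketch below shows only the kind of adaptive sampling rule it analyzes. It is a minimal LinUCB-style linear bandit with one shared ridge-regression estimate, an optimism bonus of alpha times the confidence width, and a Sherman-Morrison update of the inverse design matrix. The parameter names and defaults are illustrative. Because each arm choice depends on all past rewards, the collected data are not i.i.d., which is the setting the abstract addresses.

```python
import math

class LinUCB:
    """Linear bandit with a shared ridge estimate and an upper-confidence bonus."""

    def __init__(self, d, alpha=1.0, ridge=1.0):
        self.alpha = alpha
        # A = ridge * I + sum of x x^T, tracked through its inverse.
        self.A_inv = [[1.0 / ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
        self.b = [0.0] * d                     # sum of reward * x

    @staticmethod
    def _matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    def ucb(self, x):
        theta = self._matvec(self.A_inv, self.b)          # ridge estimate
        Ax = self._matvec(self.A_inv, x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(xi * a for xi, a in zip(x, Ax)))
        return mean + self.alpha * width

    def select(self, arms):
        scores = [self.ucb(x) for x in arms]
        return max(range(len(arms)), key=scores.__getitem__)

    def update(self, x, reward):
        # Sherman-Morrison (A_inv is symmetric):
        # (A + x x^T)^-1 = A^-1 - (A^-1 x)(A^-1 x)^T / (1 + x^T A^-1 x)
        Ax = self._matvec(self.A_inv, x)
        denom = 1.0 + sum(xi * a for xi, a in zip(x, Ax))
        d = len(x)
        self.A_inv = [[self.A_inv[i][j] - Ax[i] * Ax[j] / denom for j in range(d)]
                      for i in range(d)]
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]
```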
【9】Towards single-shot coherent imaging via overlap-free ptychography
Link: https://arxiv.org/abs/2602.21361
Authors: Oliver Hoidn, Aashwin Mishra, Steven Henke, Albert Vong, Matthew Seaberg
Abstract: Ptychographic imaging at synchrotron and XFEL sources requires dense overlapping scans, which limits throughput and increases dose. Extending coherent diffractive imaging to overlap-free operation on extended samples remains an open problem. Here, we extend PtychoPINN (O. Hoidn et al., Scientific Reports 13, 22789, 2023) to deliver overlap-free, single-shot reconstructions in a Fresnel coherent diffraction imaging (CDI) geometry while also accelerating conventional multi-shot ptychography. The framework couples a differentiable forward model of coherent scattering with a Poisson photon-counting likelihood; real-space overlap enters as a tunable parameter via coordinate-based grouping rather than as a hard requirement. On synthetic benchmarks, reconstructions remain accurate at low counts ($\sim 10^4$ photons/frame), and overlap-free single-shot reconstruction with an experimental probe reaches an amplitude structural similarity (SSIM) of 0.904, compared with 0.968 for overlap-constrained reconstruction. Against a data-saturated supervised model with the same backbone (16,384 training images), PtychoPINN achieves higher SSIM with only 1,024 images and generalizes to unseen illumination profiles. Per-GPU throughput is approximately $40\times$ that of least-squares maximum-likelihood (LSQ-ML) reconstruction at a matched $128\times128$ resolution. These results, validated on experimental data from the Advanced Photon Source and the Linac Coherent Light Source, unify single-exposure Fresnel CDI and overlapped ptychography within one framework, supporting dose-efficient, high-throughput imaging at modern light sources.
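The Poisson photon-counting likelihood can be written as a training loss. For predicted mean intensity $\lambda$ and measured count $y$, the negative log-likelihood of one detector pixel is $\lambda - y \log \lambda + \log y!$, summed over pixels. In a physics-informed setup, $\lambda$ would come from the differentiable forward model. In this minimal sketch, the intensities are passed in directly, and the small floor that guards dark pixels is an assumption.

```python
import math

def poisson_nll(predicted, counts, eps=1e-12):
    """Poisson negative log-likelihood of photon counts given predicted mean intensities.

    Per pixel: lam - y * log(lam) + log(y!), summed over the detector.
    """
    total = 0.0
    for lam, y in zip(predicted, counts):
        lam = max(lam, eps)                      # guard against log(0) for dark pixels
        total += lam - y * math.log(lam) + math.lgamma(y + 1)
    return total
```

Unlike a squared-error loss, this likelihood weights each pixel by the counting statistics of its intensity, which is the relevant noise model in the low-count regime the abstract mentions.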
Reinforcement Learning (2 papers)
【1】Hierarchical Lead Critic based Multi-Agent Reinforcement Learning
Link: https://arxiv.org/abs/2602.21680
Authors: David Eckel, Henri Meeß
Comments: 16 pages, 10 figures, preprint
Abstract: Cooperative Multi-Agent Reinforcement Learning (MARL) solves complex tasks that require coordination among multiple agents, but it is often limited to either a local (independent learning) or a global (centralized learning) perspective. In this paper, we introduce a novel sequential training scheme and MARL architecture that learns from multiple perspectives at different hierarchy levels. We propose the Hierarchical Lead Critic (HLC), inspired by the distribution of roles that naturally emerges in team structures, where high-level objectives are combined with low-level execution. HLC demonstrates that introducing multiple hierarchies that leverage both local and global perspectives can improve performance, with high sample efficiency and robust policies. Experimental results on cooperative, non-communicative, and partially observable MARL benchmarks show that HLC outperforms single-hierarchy baselines and scales robustly with increasing numbers of agents and increasing difficulty.
【2】Overconfident Errors Need Stronger Correction: Asymmetric Confidence Penalties for Reinforcement Learning
Link: https://arxiv.org/abs/2602.21420
Authors: Yuanda Xu, Hejian Sang, Zhengze Zhou, Ran He, Zhipeng Wang
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has become the leading paradigm for enhancing reasoning in Large Language Models (LLMs). However, standard RLVR algorithms suffer from a well-documented pathology: while they improve Pass@1 accuracy through sharpened sampling, they simultaneously narrow the model's reasoning boundary and reduce generation diversity. We identify a root cause that existing methods overlook: the uniform penalization of errors. Current approaches, whether data-filtering methods that select prompts by difficulty or advantage-normalization schemes, treat all incorrect rollouts within a group identically. We show that this uniformity allows overconfident errors (incorrect reasoning paths that the RL process has spuriously reinforced) to persist and monopolize probability mass, ultimately suppressing valid exploratory trajectories. To address this, we propose the Asymmetric Confidence-aware Error Penalty (ACE). ACE introduces a per-rollout confidence-shift metric, $c_i = \log\left(\pi_\theta(y_i \mid x) / \pi_{\mathrm{ref}}(y_i \mid x)\right)$, to dynamically modulate negative advantages. Theoretically, we show that ACE's gradient can be decomposed into the gradient of a selective regularizer restricted to overconfident errors, plus a well-characterized residual that partially moderates the regularizer's strength. We conduct extensive experiments fine-tuning Qwen2.5-Math-7B, Qwen3-8B-Base, and Llama-3.1-8B-Instruct on the DAPO-Math-17K dataset using GRPO and DAPO within the VERL framework. Evaluated on MATH-500 and AIME 2025, ACE composes seamlessly with existing methods and consistently improves the full Pass@k spectrum across all three model families and both benchmarks.
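The confidence shift $c_i$ is a sequence-level log-ratio. By the chain rule, it equals the sum of per-token log-probability differences between the policy and the reference model. The abstract defines $c_i$ but not the function that modulates the negative advantages. The rule below is therefore hypothetical: it scales only negative GRPO-style advantages, and only when $c_i > 0$, by $1 + \beta \max(c_i, 0)$. It illustrates the asymmetry, namely that errors the policy has become more confident in are penalized harder.

```python
import math
import statistics

def confidence_shift(token_logp_policy, token_logp_ref):
    """c_i = log pi_theta(y_i|x) - log pi_ref(y_i|x), summed over the tokens of one rollout."""
    return sum(lp - lr for lp, lr in zip(token_logp_policy, token_logp_ref))

def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: rewards standardized within the rollout group."""
    mu = statistics.fmean(rewards)
    sd = statistics.pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]

def asymmetric_penalty(rewards, shifts, beta=1.0):
    """Hypothetical ACE-like rule: amplify negative advantages of rollouts with c_i > 0."""
    out = []
    for adv, c in zip(group_advantages(rewards), shifts):
        if adv < 0:
            adv *= 1.0 + beta * max(c, 0.0)   # overconfident errors are pushed down harder
        out.append(adv)
    return out
```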
Symbolic|Symbolic Learning (2 papers)
【1】Geometric Priors for Generalizable World Models via Vector Symbolic Architecture
Link: https://arxiv.org/abs/2602.21467
Authors: William Youngwoo Chung, Calvin Yeung, Hansen Jin Lillemark, Zhuowen Zou, Xiangjian Liu, Mohsen Imani
Comments: 9 pages, accepted to the NeurIPS 2025 Workshop on Symmetry and Geometry in Neural Representations
Abstract: A key challenge in artificial intelligence and neuroscience is understanding how neural systems learn representations that capture the underlying dynamics of the world. Most world models represent the transition function with unstructured neural networks, limiting interpretability, sample efficiency, and generalization to unseen states or action compositions. We address these issues with a generalizable world model grounded in Vector Symbolic Architecture (VSA) principles as geometric priors. Our approach uses learnable Fourier Holographic Reduced Representation (FHRR) encoders to map states and actions into a high-dimensional complex vector space with learned group structure, and models transitions with element-wise complex multiplication. We formalize the framework's group-theoretic foundation and show how training such structured representations to be approximately invariant enables strong multi-step composition directly in latent space, as well as generalization across various experiments. In a discrete grid-world environment, our model achieves 87.5% zero-shot accuracy on unseen state-action pairs, obtains 53.6% higher accuracy on 20-timestep rollouts, and shows 4x higher robustness to noise relative to an MLP baseline. These results highlight how training for latent group structure yields generalizable, data-efficient, and interpretable world models, providing a principled pathway toward structured models for real-world planning and reasoning.
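To see why element-wise complex multiplication gives transitions a group structure, consider FHRR vectors of unit phasors. Binding multiplies the vectors element-wise, which adds their phases. If a 1D grid position is encoded as a fractional power (phase proportional to position), binding the state code with a "move by +1" code yields exactly the code of the next position. The frequencies below are random and fixed, whereas the paper learns its encoders; the integer 1D grid is also an assumption.

```python
import cmath
import math
import random

D = 1024
rng = random.Random(0)
freqs = [rng.uniform(-math.pi, math.pi) for _ in range(D)]   # fixed here; learned in the paper

def encode(x):
    """Fractional-power FHRR code: one unit phasor per dimension, phase proportional to x."""
    return [cmath.exp(1j * w * x) for w in freqs]

def bind(a, b):
    """FHRR binding: element-wise complex multiplication (phases add)."""
    return [p * q for p, q in zip(a, b)]

def similarity(a, b):
    """Normalized real inner product; 1.0 for identical unit-phasor vectors, about 0 for unrelated ones."""
    return sum((p * q.conjugate()).real for p, q in zip(a, b)) / len(a)

state, move_right = encode(3), encode(1)
next_state = bind(state, move_right)            # one transition in latent space
two_steps = bind(next_state, move_right)        # multi-step composition
```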
【2】SymTorch: A Framework for Symbolic Distillation of Deep Neural Networks
Link: https://arxiv.org/abs/2602.21307
Authors: Elizabeth S. Z. Tan, Adil Soubki, Miles Cranmer
Abstract: Symbolic distillation replaces neural networks, or components thereof, with interpretable, closed-form mathematical expressions. This approach has shown promise in discovering physical laws and mathematical relationships directly from trained deep learning models, yet adoption remains limited due to the engineering barrier of integrating symbolic regression into deep learning workflows. We introduce SymTorch, a library that automates this distillation by wrapping neural network components, collecting their input-output behavior, and approximating them with human-readable equations via PySR. SymTorch handles the engineering challenges that have hindered adoption: GPU-CPU data transfer, input-output caching, model serialization, and seamless switching between neural and symbolic forward passes. We demonstrate SymTorch across diverse architectures, including GNNs, PINNs, and transformer models. Finally, we present a proof of concept for accelerating LLM inference by replacing MLP layers with symbolic surrogates, achieving an 8.3% throughput improvement with moderate performance degradation.
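The distillation loop described above is wrap, record input-output pairs, then fit an equation. Below is a toy sketch of that loop. It is not SymTorch's API. Recorder and fit_single_term are hypothetical helpers, and the tiny one-term least-squares search stands in for PySR's much richer search over expression trees. The lambda stands in for a trained network component.

```python
import math

class Recorder:
    """Hypothetical wrapper: calls a black-box component and caches its input-output pairs."""

    def __init__(self, fn):
        self.fn, self.xs, self.ys = fn, [], []

    def __call__(self, x):
        y = self.fn(x)
        self.xs.append(x)
        self.ys.append(y)
        return y

CANDIDATES = {"x": lambda x: x, "x^2": lambda x: x * x, "sin(x)": math.sin, "exp(x)": math.exp}

def fit_single_term(xs, ys):
    """Least-squares fit of y ~ a * f(x) for each candidate f; return (name, a, mse) of the best."""
    best = None
    for name, f in CANDIDATES.items():
        fx = [f(x) for x in xs]
        a = sum(u * y for u, y in zip(fx, ys)) / sum(u * u for u in fx)
        mse = sum((a * u - y) ** 2 for u, y in zip(fx, ys)) / len(xs)
        if best is None or mse < best[2]:
            best = (name, a, mse)
    return best

layer = Recorder(lambda x: 3.0 * x * x)          # stand-in for a trained network component
for x in [i / 10 for i in range(-20, 21)]:
    layer(x)
name, a, mse = fit_single_term(layer.xs, layer.ys)
```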
Medical (3 papers)
【1】Disease Progression and Subtype Modeling for Combined Discrete and Continuous Input Data
Link: https://arxiv.org/abs/2602.22018
Authors: Sterre de Jonge, Elisabeth J. Vinke, Meike W. Vernooij, Daniel C. Alexander, Alexandra L. Young, Esther E. Bron
Comments: Accepted for publication, 2026 IEEE 23rd International Symposium on Biomedical Imaging (ISBI), April 2026, London, United Kingdom
Abstract: Disease progression modeling provides a robust framework to identify long-term disease trajectories from short-term biomarker data. It is a valuable tool for gaining a deeper understanding of diseases with a long disease trajectory, such as Alzheimer's disease. A key limitation of most disease progression models is that they are specific to a single data type (e.g., continuous data), thereby limiting their applicability to heterogeneous, real-world datasets. To address this limitation, we propose the Mixed Events model, a novel disease progression model that handles both discrete and continuous data types. This model is implemented within the Subtype and Stage Inference (SuStaIn) framework, resulting in Mixed-SuStaIn, which enables subtype and progression modeling. We demonstrate the effectiveness of Mixed-SuStaIn through simulation experiments and real-world data from the Alzheimer's Disease Neuroimaging Initiative, showing that it performs well on mixed datasets. The code is available at: https://github.com/ucl-pond/pySuStaIn.
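Mixed data types fit naturally into event-based staging. The likelihood of a patient being at stage $k$ is the product, over biomarkers, of "event occurred" likelihoods for the first $k$ events in the sequence and "not yet occurred" likelihoods for the rest. Each biomarker only needs to supply its own likelihood, so a categorical term and a Gaussian density can sit in the same product. The biomarkers, probabilities, and distributions below are hypothetical. The sketch also omits SuStaIn's sequence and subtype inference and is not pySuStaIn's API.

```python
import math

def gaussian_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Hypothetical biomarker models: MODELS[marker][event_occurred](observation) -> likelihood.
MODELS = {
    "amyloid_positive": {                 # discrete biomarker (True/False)
        True: lambda v: 0.9 if v else 0.1,
        False: lambda v: 0.1 if v else 0.9,
    },
    "hippocampal_z": {                    # continuous biomarker (z-score)
        True: lambda v: gaussian_pdf(v, -2.0, 1.0),
        False: lambda v: gaussian_pdf(v, 0.0, 1.0),
    },
}

def stage_likelihoods(obs, sequence):
    """P(obs | stage k) for k = 0..len(sequence): the first k events in the sequence have occurred."""
    out = []
    for k in range(len(sequence) + 1):
        lik = 1.0
        for i, marker in enumerate(sequence):
            lik *= MODELS[marker][i < k](obs[marker])
        out.append(lik)
    return out

sequence = ["amyloid_positive", "hippocampal_z"]
```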
【2】Brain Tumor Segmentation with Special Emphasis on the Non-Enhancing Brain Tumor Compartment
Link: https://arxiv.org/abs/2602.21703
Authors: T. Schaffer, A. Brawanski, S. Wein, A. M. Tomé, E. W. Lang
Abstract: A U-Net-based deep learning architecture is designed to segment brain tumors as they appear on various MRI modalities. Special emphasis is placed on the non-enhancing tumor compartment. This compartment is no longer considered in recent brain tumor segmentation challenges such as the MICCAI challenges. However, it is considered indicative of the patient's survival time as well as of areas of further tumor growth. Hence, it is deemed essential to have a means of automatically delineating its extent within the tumor.
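Per-compartment segmentation quality is commonly scored with the Dice overlap, computed separately for each tumor label. The abstract does not name its evaluation metric, so this is offered only as a standard illustration. The label id used for the non-enhancing compartment is hypothetical, and the label maps are flattened to 1D lists.

```python
NON_ENHANCING = 3   # hypothetical label id for the non-enhancing tumor compartment

def dice(pred, truth, label):
    """Dice overlap 2|P∩T| / (|P| + |T|) for one label in flattened label maps."""
    p = [v == label for v in pred]
    t = [v == label for v in truth]
    inter = sum(a and b for a, b in zip(p, t))
    denom = sum(p) + sum(t)
    return 1.0 if denom == 0 else 2.0 * inter / denom   # both empty: perfect agreement
```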
【3】Multimodal Survival Modeling and Fairness-Aware Clinical Machine Learning for 5-Year Breast Cancer Risk Prediction
Link: https://arxiv.org/abs/2602.21648
Authors: Toktam Khatibi
Abstract: Clinical risk prediction models often underperform in real-world settings due to poor calibration, limited transportability, and subgroup disparities. These challenges are amplified in high-dimensional multimodal cancer datasets characterized by complex feature interactions and a p >> n structure. We present a fully reproducible multimodal machine learning framework for 5-year overall survival prediction in breast cancer, integrating clinical variables with high-dimensional transcriptomic and copy-number alteration (CNA) features from the METABRIC cohort. After variance- and sparsity-based filtering and dimensionality reduction, models were trained using stratified train/validation/test splits with validation-based hyperparameter tuning. Two survival approaches were compared: an elastic-net regularized Cox model (CoxNet) and a gradient-boosted survival tree model implemented using XGBoost. CoxNet provides embedded feature selection and stable estimation, whereas XGBoost captures nonlinear effects and higher-order interactions. Performance was assessed using time-dependent area under the ROC curve (AUC), average precision (AP), calibration curves, Brier score, and bootstrapped 95 percent confidence intervals. CoxNet achieved validation and test AUCs of 98.3 and 96.6, with AP values of 90.1 and 80.4. XGBoost achieved validation and test AUCs of 98.6 and 92.5, with AP values of 92.5 and 79.9. Fairness diagnostics showed stable discrimination across age groups, estrogen receptor status, molecular subtypes, and menopausal state. This work introduces a governance-oriented multimodal survival framework emphasizing calibration, fairness auditing, robustness, and reproducibility for high-dimensional clinical machine learning.
Distillation|Knowledge Extraction (1 paper)
【1】Latent Context Compilation: Distilling Long Context into Compact Portable Memory
Link: https://arxiv.org/abs/2602.21221
Authors: Zeju Li,Yizhou Zhou,Qiang Xu
Abstract: Efficient long-context LLM deployment is stalled by a dichotomy between amortized compression, which struggles with out-of-distribution generalization, and Test-Time Training, which incurs prohibitive synthetic data costs and requires modifying model weights, creating stateful parameters that complicate concurrent serving. We propose Latent Context Compilation, a framework that fundamentally shifts context processing from adaptation to compilation. By utilizing a disposable LoRA module as a compiler, we distill long contexts into compact buffer tokens -- stateless, portable memory artifacts that are plug-and-play compatible with frozen base models. Crucially, we introduce a self-aligned optimization strategy that eliminates the need for synthetic context-relevant QA pairs. By regularizing the context reconstruction task with context-agnostic random queries, we force compressed tokens to reside within the model's existing instruction-following manifold. Experiments with Llama-3.1-8B demonstrate that Latent Context Compilation preserves fine-grained details and reasoning capabilities where prior methods falter, effectively decoupling memory density from model parameters even at a 16x compression ratio.
Recommendation (1 paper)
【1】Learning to Collaborate via Structures: Cluster-Guided Item Alignment for Federated Recommendation
Link: https://arxiv.org/abs/2602.21957
Authors: Yuchun Tu,Zhiwei Li,Bingli Sun,Yixuan Li,Xiao Song
Comments: 18 pages, 9 figures
Abstract: Federated recommendation facilitates collaborative model training across distributed clients while keeping sensitive user interaction data local. Conventional approaches typically rely on synchronizing high-dimensional item representations between the server and clients. This paradigm implicitly assumes that precise geometric alignment of embedding coordinates is necessary for collaboration across clients. We posit that establishing relative semantic relationships among items is more effective than enforcing shared representations. Specifically, global semantic relations serve as structural constraints for items. Within these constraints, the framework allows item representations to vary locally on each client, a flexibility that enables the model to capture fine-grained user personalization while maintaining global consistency. To this end, we propose Cluster-Guided FedRec framework (CGFedRec), a framework that transforms uploaded embeddings into compact cluster labels. In this framework, the server functions as a global structure discoverer to learn item clusters and distributes only the resulting labels. This mechanism explicitly cuts off the downstream transmission of item embeddings, relieving clients from maintaining global shared item embeddings. Consequently, CGFedRec achieves the effective injection of global collaborative signals into local item representations without transmitting full embeddings. Extensive experiments demonstrate that our approach significantly improves communication efficiency while maintaining superior recommendation accuracy across multiple datasets.
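The server-side step, "learn item clusters and distribute only the resulting labels", can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm CGFedRec uses, so plain k-means here is an assumption, and `cluster_labels` is a hypothetical helper. The point it shows is that only small integers leave the server, never the embedding vectors.

```python
import random
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_labels(embeddings: List[List[float]], k: int,
                   iters: int = 20, seed: int = 0) -> List[int]:
    """Plain k-means over uploaded item embeddings (assumed algorithm).

    Only the integer cluster label of each item is returned, so the server
    can broadcast labels instead of the embedding matrix itself.
    """
    rng = random.Random(seed)
    centers = [list(e) for e in rng.sample(embeddings, k)]
    labels = [0] * len(embeddings)
    for _ in range(iters):
        # Assignment step: each item goes to its nearest center.
        labels = [min(range(k), key=lambda c: squared_distance(e, centers[c]))
                  for e in embeddings]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [e for e, lab in zip(embeddings, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

For example, `cluster_labels([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0]], k=2)` puts the first two items in one cluster and the last two in the other. What a client does with those labels locally (the "structural constraints") is not specified in the abstract and is not modeled here.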
Clustering (2 papers)
【1】Deep Clustering based Boundary-Decoder Net for Inter and Intra Layer Stress Prediction of Heterogeneous Integrated IC Chip
Link: https://arxiv.org/abs/2602.21601
Authors: Kart Leong Lim,Ji Lin
Abstract: High stress occurs when 3D heterogeneous IC packages are subjected to thermal cycling at extreme temperatures, mainly at the interfaces between different materials. We investigate stress images using a latent space representation based on a deep generative model (DGM). However, most DGM approaches are unsupervised, meaning they resort to image pairing (input and output) to train the DGM. Instead, we rely on a recent boundary-decoder (BD) net, which uses both boundary conditions and image pairing for stress modeling. The boundary net maps material parameters to the latent space shared with its image counterpart. Because such a setup is dimensionally ill-posed, we further couple the BD net with deep clustering. To assess the performance of our proposed method, we simulate an IC chip dataset comprising 1825 stress images. We compare our new approach against variants of the BD net as well as a baseline approach, and show that it outperforms all compared methods in terms of training and test error reduction.
【2】Fair Model-based Clustering
Link: https://arxiv.org/abs/2602.21509
Authors: Jinwon Park,Kunwoong Kim,Jihu Lee,Yongdai Kim
Comments: Accepted by AAAI 2026 (Main Track, Oral presentation)
Abstract: The goal of fair clustering is to find clusters such that the proportion of sensitive attributes (e.g., gender, race, etc.) in each cluster is similar to that of the entire dataset. Various fair clustering algorithms have been proposed that modify standard K-means clustering to satisfy a given fairness constraint. A critical limitation of several existing fair clustering algorithms is that the number of parameters to be learned is proportional to the sample size because the cluster assignment of each datum should be optimized simultaneously with the cluster center, and thus scaling up the algorithms is difficult. In this paper, we propose a new fair clustering algorithm based on a finite mixture model, called Fair Model-based Clustering (FMC). A main advantage of FMC is that the number of learnable parameters is independent of the sample size and thus can be scaled up easily. In particular, mini-batch learning can be used to obtain clusters that are approximately fair. Moreover, FMC can be applied to non-metric data (e.g., categorical data) as long as the likelihood is well-defined. Theoretical and empirical justifications for the superiority of the proposed algorithm are provided.
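The fairness goal stated in the first sentence can be made concrete with a simple diagnostic: for every cluster and sensitive group, compare the group's share inside the cluster with its share in the whole dataset. The sketch below reports the largest such gap. It is a generic check of that definition, not FMC's training objective. The name `max_proportion_gap` is hypothetical.

```python
from collections import Counter
from typing import Hashable, List


def max_proportion_gap(labels: List[int], groups: List[Hashable]) -> float:
    """Largest |share of group g in cluster c - share of g overall|.

    0.0 means every cluster mirrors the dataset-wide group proportions
    exactly; larger values mean some cluster over- or under-represents a group.
    """
    n = len(groups)
    overall = {g: cnt / n for g, cnt in Counter(groups).items()}
    gap = 0.0
    for cluster in set(labels):
        members = [g for g, lab in zip(groups, labels) if lab == cluster]
        counts = Counter(members)
        for g, p in overall.items():
            gap = max(gap, abs(counts.get(g, 0) / len(members) - p))
    return gap
```

With `labels=[0, 0, 1, 1]`, the groups `['a', 'b', 'a', 'b']` give a gap of 0.0 (perfectly balanced). The groups `['a', 'a', 'b', 'b']` give 0.5, because each cluster contains only one group.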
Autonomous Driving|Vehicles|Lane Detection, etc. (3 papers)
【1】Learning from Yesterday's Error: An Efficient Online Learning Method for Traffic Demand Prediction
Link: https://arxiv.org/abs/2602.21757
Authors: Xiannan Huang,Quan Yuan,Chao Yang
Abstract: Accurately predicting short-term traffic demand is critical for intelligent transportation systems. While deep learning models achieve strong performance under stationary conditions, their accuracy often degrades significantly when faced with distribution shifts caused by external events or evolving urban dynamics. Frequent model retraining to adapt to such changes incurs prohibitive computational costs, especially for large-scale or foundation models. To address this challenge, we propose FORESEE (Forecasting Online with Residual Smoothing and Ensemble Experts), a lightweight online adaptation framework that is accurate, robust, and computationally efficient. FORESEE operates without any parameter updates to the base model. Instead, it corrects today's forecast in each region using yesterday's prediction error, stabilized through exponential smoothing guided by a mixture-of-experts mechanism that adapts to recent error dynamics. Moreover, an adaptive spatiotemporal smoothing component propagates error signals across neighboring regions and time slots, capturing coherent shifts in demand patterns. Extensive experiments on seven real-world datasets with three backbone models demonstrate that FORESEE consistently improves prediction accuracy, maintains robustness even when distribution shifts are minimal (avoiding performance degradation), and achieves the lowest computational overhead among existing online methods. By enabling real-time adaptation of traffic forecasting models with negligible computational cost, FORESEE paves the way for deploying reliable, up-to-date prediction systems in dynamic urban environments. Code and data are available at https://github.com/xiannanhuang/FORESEE
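The core idea, "correct today's forecast using yesterday's prediction error, stabilized through exponential smoothing", can be sketched without touching the base model. Several simplifications are assumed here. There is one fixed smoothing factor `alpha` instead of FORESEE's mixture of experts, and no spatiotemporal propagation across regions. The helper `corrected_forecasts` is hypothetical and is not the code in the linked repository.

```python
from typing import Dict, List


def corrected_forecasts(base: List[Dict[str, float]],
                        actual: List[Dict[str, float]],
                        alpha: float = 0.5) -> List[Dict[str, float]]:
    """Residual correction with a single exponential-smoothing factor.

    base[t][r]   -- frozen model's forecast for day t, region r
    actual[t][r] -- observed demand for day t, region r
    Day t is corrected using only errors observed up to day t-1; the base
    model's parameters are never updated.
    """
    smoothed: Dict[str, float] = {}
    out: List[Dict[str, float]] = []
    for t, forecast in enumerate(base):
        # Apply the current smoothed error estimate to today's raw forecast.
        out.append({r: v + smoothed.get(r, 0.0) for r, v in forecast.items()})
        # Once day t is observed, fold its raw error into the running estimate.
        for r, v in forecast.items():
            err = actual[t][r] - v
            smoothed[r] = alpha * err + (1 - alpha) * smoothed.get(r, 0.0)
    return out
```

If the base model forecasts 10 every day and the true demand is a constant 14, the corrected forecasts move toward the truth (10, 12, 13, ...). No retraining is involved, which is the property the abstract emphasizes.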
【2】INTACT: Intent-Aware Representation Learning for Cryptographic Traffic Violation Detection
Link: https://arxiv.org/abs/2602.21252
Authors: Rahul D Ray
Comments: 13 pages, 3 figures
Abstract: Security monitoring systems typically treat anomaly detection as identifying statistical deviations from observed data distributions. In cryptographic traffic analysis, however, violations are defined not by rarity but by explicit policy constraints, including key reuse prohibition, downgrade prevention, and bounded key lifetimes. This fundamental mismatch limits the interpretability and adaptability of conventional anomaly detection methods. We introduce INTACT (INTent-Aware Cryptographic Traffic), a policy-conditioned framework that reformulates violation detection as conditional constraint learning. Instead of learning a static decision boundary over behavioral features, INTACT models the probability of violation conditioned on both observed behavior and declared security intent. The architecture factorizes representation learning into behavioral and intent encoders whose fused embeddings produce a violation score, yielding a policy-parameterized family of decision boundaries. We evaluate the framework on a real-world network flow dataset and a 210,000-trace synthetic multi-intent cryptographic dataset. INTACT matches or exceeds strong unsupervised and supervised baselines, achieving near-perfect discrimination (AUROC up to 1.0000) in the real dataset and consistent superiority in detecting relational and composite violations in the synthetic setting. These results demonstrate that explicit intent conditioning improves discrimination, interpretability, and robustness in cryptographic monitoring.
【3】Urban Vibrancy Embedding and Application on Traffic Prediction
Link: https://arxiv.org/abs/2602.21232
Authors: Sumin Han,Jisun An,Dongman Lee
Abstract: Urban vibrancy reflects the dynamic human activity within urban spaces and is often measured using mobile data that captures floating population trends. This study proposes a novel approach to derive Urban Vibrancy embeddings from real-time floating population data to enhance traffic prediction models. Specifically, we utilize variational autoencoders (VAE) to compress this data into actionable embeddings, which are then integrated with long short-term memory (LSTM) networks to predict future embeddings. These are subsequently applied in a sequence-to-sequence framework for traffic forecasting. Our contributions are threefold: (1) We use principal component analysis (PCA) to interpret the embeddings, revealing temporal patterns such as weekday versus weekend distinctions and seasonal patterns; (2) We propose a method that combines VAE and LSTM, enabling the forecasting of dynamic urban knowledge embeddings; and (3) Our approach improves accuracy and responsiveness in traffic prediction models, including RNN, DCRNN, GTS, and GMAN. This study demonstrates the potential of Urban Vibrancy embeddings to advance traffic prediction and offer a more nuanced analysis of urban mobility.
Point Clouds|SLAM|Radar|Laser|Depth/RGB-D (1 paper)
【1】From Words to Amino Acids: Does the Curse of Depth Persist?
Link: https://arxiv.org/abs/2602.21750
Authors: Aleena Siji,Amir Mohammad Karimi Mamaghan,Ferdinand Kapl,Tobias Höppe,Emmanouil Angelis,Andrea Dittadi,Maurice Brenner,Michael Heinzinger,Karl Henrik Johansson,Kaitlin Maile,Johannes von Oswald,Stefan Bauer
Abstract: Protein language models (PLMs) have become widely adopted as general-purpose models, demonstrating strong performance in protein engineering and de novo design. Like large language models (LLMs), they are typically trained as deep transformers with next-token or masked-token prediction objectives on massive sequence corpora and are scaled by increasing model depth. Recent work on autoregressive LLMs has identified the Curse of Depth: later layers contribute little to the final output predictions. These findings naturally raise the question of whether a similar depth inefficiency also appears in PLMs, where many widely used models are not autoregressive, and some are multimodal, accepting both protein sequence and structure as input. In this work, we present a depth analysis of six popular PLMs across model families and scales, spanning three training objectives, namely autoregressive, masked, and diffusion, and quantify how layer contributions evolve with depth using a unified set of probing- and perturbation-based measurements. Across all models, we observe consistent depth-dependent patterns that extend prior findings on LLMs: later layers depend less on earlier computations and mainly refine the final output distribution, and these effects are increasingly pronounced in deeper models. Taken together, our results suggest that PLMs exhibit a form of depth inefficiency, motivating future work on more depth-efficient architectures and training methods.
Federated Learning|Privacy Protection|Encryption (4 papers)
【1】GFPL: Generative Federated Prototype Learning for Resource-Constrained and Data-Imbalanced Vision Task
Link: https://arxiv.org/abs/2602.21873
Authors: Shiwei Lu,Yuhang He,Jiashuo Li,Qiang Wang,Yihong Gong
Abstract: Federated learning (FL) facilitates the secure utilization of decentralized images, advancing applications in medical image recognition and autonomous driving. However, conventional FL faces two critical challenges in real-world deployment: ineffective knowledge fusion caused by model updates biased toward majority-class features, and prohibitive communication overhead due to frequent transmissions of high-dimensional model parameters. Inspired by the human brain's efficiency in knowledge integration, we propose a novel Generative Federated Prototype Learning (GFPL) framework to address these issues. Within this framework, a prototype generation method based on Gaussian Mixture Model (GMM) captures the statistical information of class-wise features, while a prototype aggregation strategy using Bhattacharyya distance effectively fuses semantically similar knowledge across clients. In addition, these fused prototypes are leveraged to generate pseudo-features, thereby mitigating feature distribution imbalance across clients. To further enhance feature alignment during local training, we devise a dual-classifier architecture, optimized via a hybrid loss combining Dot Regression and Cross-Entropy. Extensive experiments on benchmarks show that GFPL improves model accuracy by 3.6% under imbalanced data settings while maintaining low communication cost.
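The aggregation step compares Gaussian prototypes with the Bhattacharyya distance. For two Gaussians with diagonal covariances, the standard closed form sums, over dimensions, $\frac{1}{8}\frac{(\mu_1-\mu_2)^2}{\sigma^2} + \frac{1}{2}\ln\frac{\sigma^2}{\sqrt{\sigma_1^2\sigma_2^2}}$, where $\sigma^2 = (\sigma_1^2+\sigma_2^2)/2$. The sketch below implements that formula. Two things are assumptions: that components are diagonal (GFPL may use full covariances), and how a distance is turned into a "fuse or keep separate" decision, which the abstract does not specify.

```python
import math
from typing import Sequence


def bhattacharyya_diag(mu1: Sequence[float], var1: Sequence[float],
                       mu2: Sequence[float], var2: Sequence[float]) -> float:
    """Bhattacharyya distance between two diagonal-covariance Gaussians.

    Each dimension contributes a mean-separation term scaled by the averaged
    variance, plus a term penalizing mismatched variances.
    """
    total = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        v = 0.5 * (v1 + v2)  # averaged variance for this dimension
        total += 0.125 * (m1 - m2) ** 2 / v
        total += 0.5 * math.log(v / math.sqrt(v1 * v2))
    return total
```

Identical components have distance 0. Shifting a unit-variance 1-D Gaussian by 2 gives $\frac{1}{8}\cdot 4 = 0.5$. A server could then fuse prototypes whose distance falls below some threshold, but that threshold is a hypothetical design choice here.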
【2】JSAM: Privacy Straggler-Resilient Joint Client Selection and Incentive Mechanism Design in Differentially Private Federated Learning
Link: https://arxiv.org/abs/2602.21844
Authors: Ruichen Xu,Ying-Jun Angela Zhang,Jianwei Huang
Abstract: Differentially private federated learning faces a fundamental tension: privacy protection mechanisms that safeguard client data simultaneously create quantifiable privacy costs that discourage participation, undermining the collaborative training process. Existing incentive mechanisms rely on unbiased client selection, forcing servers to compensate even the most privacy-sensitive clients ("privacy stragglers"), leading to systemic inefficiency and suboptimal resource allocation. We introduce JSAM (Joint client Selection and privacy compensAtion Mechanism), a Bayesian-optimal framework that simultaneously optimizes client selection probabilities and privacy compensation to maximize training effectiveness under budget constraints. Our approach transforms a complex 2N-dimensional optimization problem into an efficient three-dimensional formulation through novel theoretical characterization of optimal selection strategies. We prove that servers should preferentially select privacy-tolerant clients while excluding high-sensitivity participants, and uncover the counter-intuitive insight that clients with minimal privacy sensitivity may incur the highest cumulative costs due to frequent participation. Extensive evaluations on MNIST and CIFAR-10 demonstrate that JSAM achieves up to 15% improvement in test accuracy compared to existing unbiased selection mechanisms while maintaining cost efficiency across varying data heterogeneity levels.
【3】Private and Robust Contribution Evaluation in Federated Learning
Link: https://arxiv.org/abs/2602.21721
Authors: Delio Jaramillo Velez,Gergely Biczok,Alexandre Graell i Amat,Johan Ostman,Balazs Pejo
Abstract: Cross-silo federated learning allows multiple organizations to collaboratively train machine learning models without sharing raw data, but client updates can still leak sensitive information through inference attacks. Secure aggregation protects privacy by hiding individual updates, yet it complicates contribution evaluation, which is critical for fair rewards and detecting low-quality or malicious participants. Existing marginal-contribution methods, such as the Shapley value, are incompatible with secure aggregation, and practical alternatives, such as Leave-One-Out, are crude and rely on self-evaluation. We introduce two marginal-difference contribution scores compatible with secure aggregation. Fair-Private satisfies standard fairness axioms, while Everybody-Else eliminates self-evaluation and provides resistance to manipulation, addressing a largely overlooked vulnerability. We provide theoretical guarantees for fairness, privacy, robustness, and computational efficiency, and evaluate our methods on multiple medical image datasets and CIFAR10 in cross-silo settings. Our scores consistently outperform existing baselines, better approximate Shapley-induced client rankings, and improve downstream model performance as well as misbehavior detection. These results demonstrate that fairness, privacy, robustness, and practical utility can be achieved jointly in federated contribution evaluation, offering a principled solution for real-world cross-silo deployments.
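The two baselines named in the abstract, the Shapley value and Leave-One-Out, can be contrasted on a toy utility function, which shows why Leave-One-Out is called "crude". This sketch is not the paper's Fair-Private or Everybody-Else score. The `utility` callable is a hypothetical stand-in for "validation performance of a model trained on this set of clients". Exact Shapley enumerates all subsets, so it is only feasible for a handful of clients.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List

Utility = Callable[[FrozenSet[str]], float]


def shapley(clients: List[str], utility: Utility) -> Dict[str, float]:
    """Exact Shapley values: weighted average marginal gain over all subsets."""
    n = len(clients)
    values: Dict[str, float] = {}
    for c in clients:
        others = [o for o in clients if o != c]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                s = frozenset(subset)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (utility(s | {c}) - utility(s))
        values[c] = total
    return values


def leave_one_out(clients: List[str], utility: Utility) -> Dict[str, float]:
    """Marginal loss from removing a single client from the full coalition."""
    full = frozenset(clients)
    return {c: utility(full) - utility(full - {c}) for c in clients}


# Toy utility: any coalition containing 'a' or 'b' reaches full accuracy.
redundant = lambda s: 1.0 if ('a' in s or 'b' in s) else 0.0
```

With the redundant utility above, Leave-One-Out gives both `a` and `b` zero credit, because each one alone can be removed without loss. Shapley splits the credit 0.5/0.5. This is the kind of distortion that better approximations of Shapley-induced rankings aim to avoid.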
【4】FedVG: Gradient-Guided Aggregation for Enhanced Federated Learning
Link: https://arxiv.org/abs/2602.21399
Authors: Alina Devkota,Jacob Thrasher,Donald Adjeroh,Binod Bhattarai,Prashnna K. Gyawali
Comments: Accepted to CVPR 2026 (Findings Track)
Abstract: Federated Learning (FL) enables collaborative model training across multiple clients without sharing their private data. However, data heterogeneity across clients leads to client drift, which degrades the overall generalization performance of the model. This effect is further compounded by overemphasis on poorly performing clients. To address this problem, we propose FedVG, a novel gradient-based federated aggregation framework that leverages a global validation set to guide the optimization process. Such a global validation set can be established using readily available public datasets, ensuring accessibility and consistency across clients without compromising privacy. In contrast to conventional approaches that prioritize client dataset volume, FedVG assesses the generalization ability of client models by measuring the magnitude of validation gradients across layers. Specifically, we compute layerwise gradient norms to derive a client-specific score that reflects how much each client needs to adjust for improved generalization on the global validation set, thereby enabling more informed and adaptive federated aggregation. Extensive experiments on both natural and medical image benchmarking datasets, across diverse model architectures, demonstrate that FedVG consistently improves performance, particularly in highly heterogeneous settings. Moreover, FedVG is modular and can be seamlessly integrated with various state-of-the-art FL algorithms, often further improving their results. Our code is available at https://github.com/alinadevkota/FedVG.
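The scoring idea can be sketched as follows: compute layerwise gradient norms on the shared validation set, then turn them into aggregation weights. The abstract does not give the exact score or the score-to-weight mapping. Summing layer norms and applying a softmax over negative scores (smaller gradients mean less adjustment is needed, so the client gets more weight) are both assumptions, and `aggregation_weights` is a hypothetical helper, not FedVG's code.

```python
import math
from typing import Dict, List


def layer_norms(grads_per_layer: List[List[float]]) -> List[float]:
    """L2 norm of each layer's (flattened) validation gradient."""
    return [math.sqrt(sum(g * g for g in layer)) for layer in grads_per_layer]


def aggregation_weights(client_grads: Dict[str, List[List[float]]],
                        temperature: float = 1.0) -> Dict[str, float]:
    """Map per-client validation-gradient magnitudes to weights summing to 1.

    Assumed scoring: score = sum of layerwise norms; weight = softmax(-score/T).
    """
    scores = {c: sum(layer_norms(g)) for c, g in client_grads.items()}
    best = min(scores.values())  # shift for numerical stability
    exps = {c: math.exp(-(s - best) / temperature) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}
```

A client whose model already fits the validation set (near-zero gradients) receives a larger share than one whose validation gradients are large. The `temperature` knob controls how sharply the weights concentrate.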
Inference|Analysis|Understanding|Explanation (10 papers)
【1】SigmaQuant: Hardware-Aware Heterogeneous Quantization Method for Edge DNN Inference
Link: https://arxiv.org/abs/2602.22136
Authors: Qunyou Liu,Pengbo Yu,Marina Zapater,David Atienza
Abstract: Deep neural networks (DNNs) are essential for performing advanced tasks on edge or mobile devices, yet their deployment is often hindered by severe resource constraints, including limited memory, energy, and computational power. While uniform quantization provides a straightforward approach to compressing models and reducing hardware requirements, it fails to fully leverage the varying robustness across layers and often leads to accuracy degradation or suboptimal resource usage, particularly at low bitwidths. In contrast, heterogeneous quantization, which allocates different bitwidths to individual layers, can mitigate these drawbacks. Nonetheless, current heterogeneous quantization methods either require a huge brute-force design-space search or lack the adaptability to meet different hardware conditions, such as memory size, energy budget, and latency requirements. Filling these gaps, this work introduces SigmaQuant, an adaptive layer-wise heterogeneous quantization framework designed to efficiently balance accuracy and resource usage for varied edge environments without exhaustive search.
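To make "allocating different bitwidths to individual layers" concrete, here is a minimal sketch of per-layer symmetric uniform (fake) quantization with a per-layer bit plan, plus the weight-storage cost that plan implies. The quantizer is a standard textbook scheme, assumed for illustration. SigmaQuant's actual method for choosing the bit plan is not described in the abstract and is not reproduced here. The layer names and the `bit_plan` dictionary are hypothetical.

```python
from typing import Dict, List


def quantize_symmetric(weights: List[float], bits: int) -> List[float]:
    """Round to a symmetric grid of 2**(bits-1)-1 levels per sign, then dequantize."""
    if bits < 2:
        raise ValueError("symmetric quantization needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)
    scale = max_abs / qmax  # one scale per layer (per-tensor quantization)
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def quantize_model(layers: Dict[str, List[float]],
                   bit_plan: Dict[str, int]) -> Dict[str, List[float]]:
    """Heterogeneous quantization: each layer uses its own bitwidth."""
    return {name: quantize_symmetric(w, bit_plan[name]) for name, w in layers.items()}


def storage_bits(layers: Dict[str, List[float]], bit_plan: Dict[str, int]) -> int:
    """Weight storage implied by a bit plan (ignores per-layer scale overhead)."""
    return sum(len(w) * bit_plan[name] for name, w in layers.items())
```

Running the same weights through an 8-bit and a 2-bit layer shows the trade-off that a bit-allocation method has to balance. The 2-bit version costs a quarter of the storage, but its maximum rounding error is far larger.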
【2】Slice and Explain: Logic-Based Explanations for Neural Networks through Domain Slicing
Link: https://arxiv.org/abs/2602.22115
Authors: Luiz Fernando Paulino Queiroz,Carlos Henrique Leitão Cavalcante,Thiago Alves Rocha
Comments: Preprint version. For the final published version, see the DOI below
Abstract: Neural networks (NNs) are pervasive across various domains but often lack interpretability. To address the growing need for explanations, logic-based approaches have been proposed to explain predictions made by NNs, offering correctness guarantees. However, scalability remains a concern in these methods. This paper proposes an approach leveraging domain slicing to facilitate explanation generation for NNs. By reducing the complexity of logical constraints through slicing, we decrease explanation time by up to 40%, as shown by comparative experiments. Our findings highlight the efficacy of domain slicing in enhancing explanation efficiency for NNs.
【3】Learning Unknown Interdependencies for Decentralized Root Cause Analysis in Nonlinear Dynamical Systems
Link: https://arxiv.org/abs/2602.21928
Authors: Ayush Mohanty,Paritosh Ramanan,Nagi Gebraeel
Comments: Manuscript under review
Abstract: Root cause analysis (RCA) in networked industrial systems, such as supply chains and power networks, is notoriously difficult due to unknown and dynamically evolving interdependencies among geographically distributed clients. These clients represent heterogeneous physical processes and industrial assets equipped with sensors that generate large volumes of nonlinear, high-dimensional, and heterogeneous IoT data. Classical RCA methods require partial or full knowledge of the system's dependency graph, which is rarely available in these complex networks. While federated learning (FL) offers a natural framework for decentralized settings, most existing FL methods assume homogeneous feature spaces and retrainable client models. These assumptions are not compatible with our problem setting. Different clients have different data features and often run fixed, proprietary models that cannot be modified. This paper presents a federated cross-client interdependency learning methodology for feature-partitioned, nonlinear time-series data, without requiring access to raw sensor streams or modifying proprietary client models. Each proprietary local client model is augmented with a Machine Learning (ML) model that encodes cross-client interdependencies. These ML models are coordinated via a global server that enforces representation consistency while preserving privacy through calibrated differential privacy noise. RCA is performed using model residuals and anomaly flags. We establish theoretical convergence guarantees and validate our approach on extensive simulations and a real-world industrial cybersecurity dataset.
【4】xai-cola: A Python library for sparsifying counterfactual explanations
Link: https://arxiv.org/abs/2602.21845
Authors: Lin Zhu, Lei You
Comments: 5 pages, 1 figure
Abstract: Counterfactual explanation (CE) is an important domain within post-hoc explainability. However, the explanations generated by most CE generators are often highly redundant. This work introduces xai-cola, an open-source Python library that provides an end-to-end pipeline for sparsifying CEs produced by arbitrary generators, reducing superfluous feature changes while preserving their validity. It offers a documented API that takes as input raw tabular data as a pandas DataFrame, a preprocessing object (for standardization and encoding), and a trained scikit-learn or PyTorch model. On this basis, users can employ either built-in or externally imported CE generators. The library also implements several sparsification policies and includes visualization routines for analysing and comparing sparsified counterfactuals. xai-cola is released under the MIT license and can be installed from PyPI. Empirical experiments indicate that xai-cola produces sparser counterfactuals across several CE generators, reducing the number of modified features by up to 50% in our setting. The source code is available at https://github.com/understanding-ml/COLA.
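The core idea of sparsifying a counterfactual can be illustrated with a simple greedy policy: starting from a dense counterfactual, revert each changed feature to its original value whenever the model's prediction stays at the desired class. This is a hypothetical stand-alone sketch that uses a plain Python callable as the model; it is not the xai-cola API, and the library's own sparsification policies may work differently.

```python
from typing import Callable, Dict

Row = Dict[str, float]


def sparsify_counterfactual(
    factual: Row,
    counterfactual: Row,
    predict: Callable[[Row], int],
    target: int,
) -> Row:
    """Greedily undo feature changes that are not needed to keep the target prediction."""
    if predict(counterfactual) != target:
        raise ValueError("input counterfactual is not valid for the target class")
    result = dict(counterfactual)
    # Try reverting the smallest changes first; they are often the superfluous ones.
    changed = sorted(
        (f for f in factual if factual[f] != counterfactual[f]),
        key=lambda f: abs(counterfactual[f] - factual[f]),
    )
    for feature in changed:
        trial = dict(result)
        trial[feature] = factual[feature]
        if predict(trial) == target:  # still valid without this change
            result = trial
    return result


# Hypothetical toy model: approve (1) when income + 2 * savings >= 10.
def toy_model(row: Row) -> int:
    return int(row["income"] + 2 * row["savings"] >= 10)


x = {"income": 4.0, "savings": 1.0, "age": 30.0}
x_cf = {"income": 9.0, "savings": 1.5, "age": 31.0}  # three features changed
print(sparsify_counterfactual(x, x_cf, toy_model, target=1))
# {'income': 9.0, 'savings': 1.0, 'age': 30.0}  -> only one feature still changed
```

Greedy reversion keeps the result valid by construction, but the final sparsity depends on the order in which features are tried.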
【5】Revisiting the Bertrand Paradox via Equilibrium Analysis of No-regret Learners
Link: https://arxiv.org/abs/2602.21620
Authors: Arnab Maiti, Junyan Liu, Kevin Jamieson, Lillian J. Ratliff
Comments: 36 pages, 34 figures
Abstract: We study the discrete Bertrand pricing game with a non-increasing demand function. The game has $n \ge 2$ players who simultaneously choose prices from the set $\{1/k, 2/k, \ldots, 1\}$, where $k\in\mathbb{N}$. The player who sets the lowest price captures the entire demand; if multiple players tie for the lowest price, they split the demand equally. We study the Bertrand paradox, where classical theory predicts low prices, yet real markets often sustain high prices. To understand this gap, we analyze a repeated-game model in which firms set prices using no-regret learners. Our goal is to characterize the equilibrium outcomes that can arise under different no-regret learning guarantees. We are particularly interested in questions such as whether no-external-regret learners can converge to undesirable high-price outcomes, and how stronger guarantees such as no-swap regret shape the emergence of competitive low-price behavior. We address these and related questions through a theoretical analysis, complemented by experiments that support the theory and reveal surprising phenomena for no-swap regret learners.
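The stage-game payoffs described above can be written down directly. The sketch assumes a hypothetical non-increasing demand D(p) = 1 - p/2 (the paper only requires D to be non-increasing) and gives each firm revenue p·D(p)/m when it is one of m firms tied at the lowest price, and zero otherwise. Exact fractions avoid rounding on the price grid {1/k, ..., 1}.

```python
from fractions import Fraction
from typing import Callable, List, Sequence


def price_grid(k: int) -> List[Fraction]:
    """The discrete price set {1/k, 2/k, ..., 1}."""
    return [Fraction(i, k) for i in range(1, k + 1)]


def bertrand_payoffs(
    prices: Sequence[Fraction], demand: Callable[[Fraction], Fraction]
) -> List[Fraction]:
    """Lowest price takes all demand; firms tied at the lowest price split it equally."""
    low = min(prices)
    winners = [i for i, p in enumerate(prices) if p == low]
    share = low * demand(low) / len(winners)
    return [share if i in winners else Fraction(0) for i in range(len(prices))]


def demand(p: Fraction) -> Fraction:
    # Hypothetical non-increasing demand curve, chosen only for illustration.
    return 1 - p / 2


grid = price_grid(4)  # prices 1/4, 1/2, 3/4, 1
print(bertrand_payoffs([grid[1], grid[1], grid[3]], demand))
# [Fraction(3, 16), Fraction(3, 16), Fraction(0, 1)]

# Undercutting alone to 1/4 earns 1/4 * 7/8 = 7/32 > 3/16: the usual Bertrand pressure.
print(bertrand_payoffs([grid[0], grid[1], grid[3]], demand)[0])  # 7/32
```

A repeated-game experiment would feed these payoffs (or the full payoff vector over the grid) to each firm's no-regret learner every round.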
【6】MINAR: Mechanistic Interpretability for Neural Algorithmic Reasoning
Link: https://arxiv.org/abs/2602.21442
Authors: Jesse He, Helen Jenne, Max Vargas, Davis Brown, Gal Mishne, Yusu Wang, Henry Kvinge
Abstract: The recent field of neural algorithmic reasoning (NAR) studies the ability of graph neural networks (GNNs) to emulate classical algorithms like Bellman-Ford, a phenomenon known as algorithmic alignment. At the same time, recent advances in large language models (LLMs) have spawned the study of mechanistic interpretability, which aims to identify granular model components like circuits that perform specific computations. In this work, we introduce Mechanistic Interpretability for Neural Algorithmic Reasoning (MINAR), an efficient circuit discovery toolbox that adapts attribution patching methods from mechanistic interpretability to the GNN setting. We show through two case studies that MINAR recovers faithful neuron-level circuits from GNNs trained on algorithmic tasks. Our study sheds new light on the process of circuit formation and pruning during training, as well as giving new insight into how GNNs trained to perform multiple tasks in parallel reuse circuit components for related tasks. Our code is available at https://github.com/pnnl/MINAR.
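Attribution patching, the technique MINAR adapts, replaces many expensive activation-patching runs with a first-order Taylor estimate: the effect of patching one component from its clean activation to its corrupted activation is approximated by (a_corrupt - a_clean) · ∂m/∂a, with the gradient taken on the clean run, so one forward and backward pass scores every component. The toy below uses a hypothetical two-neuron "model" with a hand-written gradient to compare that estimate with the exact patched effect; it is not MINAR's code and involves no GNN.

```python
from typing import List


# Hypothetical toy metric over two "neuron" activations: m(a) = 3*a0 + a1**2.
def metric(a: List[float]) -> float:
    return 3.0 * a[0] + a[1] ** 2


def metric_grad(a: List[float]) -> List[float]:
    return [3.0, 2.0 * a[1]]


def exact_patch_effect(clean: List[float], corrupt: List[float], i: int) -> float:
    """Change in the metric when only neuron i is patched to its corrupted value."""
    patched = list(clean)
    patched[i] = corrupt[i]
    return metric(patched) - metric(clean)


def attribution_patch_effect(clean: List[float], corrupt: List[float], i: int) -> float:
    """First-order estimate: (a_corrupt - a_clean) * dm/da, gradient on the clean run."""
    return (corrupt[i] - clean[i]) * metric_grad(clean)[i]


clean, corrupt = [1.0, 2.0], [0.5, 2.5]
for i in range(2):
    print(i, exact_patch_effect(clean, corrupt, i), attribution_patch_effect(clean, corrupt, i))
# 0 -1.5 -1.5   (exact for the linear neuron)
# 1 2.25 2.0    (first-order error on the quadratic neuron)
```

The gap on the second neuron shows why attribution scores are usually treated as a cheap ranking, with the top candidates then verified by real patching.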
【7】Shared Nature, Unique Nurture: PRISM for Pluralistic Reasoning via In-context Structure Modeling
Link: https://arxiv.org/abs/2602.21317
Authors: Guancheng Tu, Shiyang Zhang, Tianyu Zhang, Yi Zhang, Diji Yang
Abstract: Large Language Models (LLMs) are converging towards a singular Artificial Hivemind, where shared Nature (pre-training priors) results in a profound collapse of distributional diversity, limiting the distinct perspectives necessary for creative exploration and scientific discovery. To address this, we propose to equip models with inference-time Nurture (individualized epistemic trajectories) using an Epistemic Evolution paradigm that progresses through three stages: explore, internalize, and express. We instantiate this via PRISM (Pluralistic Reasoning via In-context Structure Modeling), a model-agnostic system that augments LLMs with dynamic On-the-fly Epistemic Graphs. On three creativity benchmarks, PRISM achieves state-of-the-art novelty and significantly expands distributional diversity. Moreover, we evaluate real-world utility on a challenging rare-disease diagnosis benchmark. Results demonstrate that PRISM successfully uncovers correct long-tail diagnoses that standard LLMs miss, confirming that its divergence stems from meaningful exploration rather than incoherent noise. Overall, this work establishes a new paradigm for Pluralistic AI, moving beyond monolithic consensus toward a diverse ecosystem of unique cognitive individuals capable of collective, multi-perspective discovery.
【8】ToolMATH: A Math Tool Benchmark for Realistic Long-Horizon Multi-Tool Reasoning
Link: https://arxiv.org/abs/2602.21265
Authors: Hyeonje Choi, Jeongsoo Lee, Hyojun Lee, Jay-Yoon Lee
Comments: Submitted to ICML 2026. 8 pages (+ abstract 16 pages), 5 figures
Abstract: We introduce ToolMATH, a math-grounded benchmark that evaluates tool-augmented language models in realistic multi-tool environments where the output depends on calling schema-specified tools and sustaining multi-step execution. It turns math problems into a controlled, correctness-checkable benchmark with tool sets, enabling systematic evaluation of model reliability under (1) large, overlapping tool catalogs and (2) the absence of the intended capability. ToolMATH provides actionable diagnostic evidence of failure modes in tool-augmented agents, helping identify the control mechanisms required for robustness. ToolMATH contains roughly 8k questions and 12k tools; we also provide an additional hard set, ToolMATHHard, with questions and tools. Our evaluation reveals that the key failure factor is an inability to reason, which lets errors in intermediate results accumulate and constrain later decisions. Tool-list redundancy does not simply add noise; it amplifies small early deviations into irreversible execution drift. The benchmark highlights that when the intended capability is missing, distractor tools can sometimes serve as partial substitutes in solution paths, yet they can also mislead models into ungrounded tool trajectories. Finally, comparisons between tool-use protocols emphasize that improvements come less from local action selection and more from long-range plan coherence and disciplined use of observations.
【9】Architecture-Agnostic Curriculum Learning for Document Understanding: Empirical Evidence from Text-Only and Multimodal
Link: https://arxiv.org/abs/2602.21225
Authors: Mohammed Hamdan, Vincenzo Dentamaro, Giuseppe Pirlo, Mohamed Cheriet
Abstract: We investigate whether progressive data scheduling, a curriculum learning strategy that incrementally increases training data exposure (33% $\rightarrow$ 67% $\rightarrow$ 100%), yields consistent efficiency gains across architecturally distinct document understanding models. By evaluating BERT (text-only, 110M parameters) and LayoutLMv3 (multimodal, 126M parameters) on the FUNSD and CORD benchmarks, we establish that this schedule reduces wall-clock training time by approximately 33%, commensurate with the reduction from 10.0 to 6.67 effective epoch-equivalents of data. To isolate curriculum effects from compute reduction, we introduce matched-compute baselines (Standard-7) that control for total gradient updates. On the FUNSD dataset, the curriculum significantly outperforms the matched-compute baseline for BERT ($\Delta$F1 = +0.023, $p=0.022$, $d_z=3.83$), constituting evidence for a genuine scheduling benefit in capacity-constrained models. In contrast, no analogous benefit is observed for LayoutLMv3 ($p=0.621$), whose multimodal representations provide sufficient inductive bias. On the CORD dataset, all conditions converge to equivalent F1 scores ($\geq$0.947) irrespective of scheduling, indicating a performance ceiling. Schedule ablations comparing progressive, two-phase, reverse, and random pacing confirm that the efficiency gain derives from reduced data volume rather than ordering. Taken together, these findings demonstrate that progressive scheduling is a reliable compute-reduction strategy across model families, with curriculum-specific benefits contingent on the interaction between model capacity and task complexity.
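The effective-epoch figure can be reproduced with a few lines of arithmetic, assuming (our assumption; the abstract does not give phase lengths) a 10-epoch budget split into three equal phases that train on nested random subsets of 1/3, 2/3, and all of the data. Under that assumption the data seen is 10/3 · (1/3 + 2/3 + 1) = 20/3 ≈ 6.67 full-data epochs, a one-third saving. The helper names are hypothetical.

```python
import random
from fractions import Fraction
from typing import List, Sequence


def effective_epochs(total_epochs: int, fractions: Sequence[Fraction]) -> Fraction:
    """Full-data epoch equivalents when each phase lasts total_epochs / len(fractions) epochs."""
    phase_len = Fraction(total_epochs) / len(fractions)
    return sum((phase_len * f for f in fractions), Fraction(0))


def phase_subsets(n_examples: int, fractions: Sequence[Fraction], seed: int = 0) -> List[List[int]]:
    """Nested training subsets: each phase keeps the previous examples and adds more."""
    order = list(range(n_examples))
    random.Random(seed).shuffle(order)
    return [order[: round(f * n_examples)] for f in fractions]


schedule = (Fraction(1, 3), Fraction(2, 3), Fraction(1))
eff = effective_epochs(10, schedule)
print(eff, float(eff))                                          # 20/3 6.666666666666667
print(f"saving vs. 10 full epochs: {float(1 - eff / 10):.1%}")  # saving vs. 10 full epochs: 33.3%
print([len(s) for s in phase_subsets(300, schedule)])            # [100, 200, 300]
```

A matched-compute baseline would then train on the full data for about the same number of gradient updates, which is the role the Standard-7 baseline plays above.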
【10】Scalable Kernel-Based Distances for Statistical Inference and Integration
Link: https://arxiv.org/abs/2602.21846
Authors: Masha Naslidnyk
Comments: PhD thesis
Abstract: Representing, comparing, and measuring the distance between probability distributions is a key task in computational statistics and machine learning. The choice of representation and the associated distance determine properties of the methods in which they are used: for example, certain distances can allow one to encode robustness or smoothness of the problem. Kernel methods offer flexible and rich Hilbert space representations of distributions that allow the modeller to enforce properties through the choice of kernel, and estimate associated distances at efficient nonparametric rates. In particular, the maximum mean discrepancy (MMD), a kernel-based distance constructed by comparing Hilbert space mean functions, has received significant attention due to its computational tractability and is favoured by practitioners. In this thesis, we conduct a thorough study of kernel-based distances with a focus on efficient computation, with core contributions in Chapters 3 to 6. Part I of the thesis is focused on the MMD, specifically on improved MMD estimation. In Chapter 3 we propose a theoretically sound, improved estimator for MMD in simulation-based inference. Then, in Chapter 4, we propose an MMD-based estimator for conditional expectations, a ubiquitous task in statistical computation. Closing Part I, in Chapter 5 we study the problem of calibration when MMD is applied to the task of integration. In Part II, motivated by the recent developments in kernel embeddings beyond the mean, we introduce a family of novel kernel-based discrepancies: kernel quantile discrepancies. These address some of the pitfalls of MMD, and are shown through both theoretical results and an empirical study to offer a competitive alternative to MMD and its fast approximations. We conclude with a discussion on broader lessons and future work emerging from the thesis.
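For reference, the standard unbiased (U-statistic) estimator of squared MMD, the quantity Part I sets out to estimate better, is sketched below with a Gaussian kernel on 1-D samples. The kernel, the bandwidth of 1, and the sample sizes are illustrative assumptions; the thesis's improved estimators and kernel quantile discrepancies are not reproduced. Note the quadratic cost in the sample size, which is what fast MMD approximations aim to reduce.

```python
import math
import random
from typing import Sequence


def gaussian_kernel(x: float, y: float, bandwidth: float = 1.0) -> float:
    return math.exp(-((x - y) ** 2) / (2 * bandwidth ** 2))


def mmd2_unbiased(xs: Sequence[float], ys: Sequence[float], bandwidth: float = 1.0) -> float:
    """Unbiased estimate of MMD^2: within-sample terms exclude the diagonal i == j."""
    m, n = len(xs), len(ys)
    k_xx = sum(gaussian_kernel(xs[i], xs[j], bandwidth)
               for i in range(m) for j in range(m) if i != j)
    k_yy = sum(gaussian_kernel(ys[i], ys[j], bandwidth)
               for i in range(n) for j in range(n) if i != j)
    k_xy = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2 * k_xy / (m * n)


rng = random.Random(0)
p = [rng.gauss(0, 1) for _ in range(200)]
q_same = [rng.gauss(0, 1) for _ in range(200)]
q_shift = [rng.gauss(1.5, 1) for _ in range(200)]
print(mmd2_unbiased(p, q_same))   # close to 0 (may be slightly negative)
print(mmd2_unbiased(p, q_shift))  # clearly positive
```

Because the diagonal k(x, x) terms are excluded from the within-sample averages, the estimate can dip below zero when the two distributions coincide; that is the price of unbiasedness.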
Classification | Recognition (2 papers)
【1】Robust Long-Form Bangla Speech Processing: Automatic Speech Recognition and Speaker Diarization
Link: https://arxiv.org/abs/2602.21741
Authors: MD. Sagor Chowdhury, Adiba Fairooz Chowdhury
Comments: 6 pages, 5 figures, 3 tables; system paper submitted to DL Sprint 4.0 (Kaggle)
Abstract: We describe our end-to-end system for Bengali long-form speech recognition (ASR) and speaker diarization submitted to the DL Sprint 4.0 competition on Kaggle. Bengali presents substantial challenges for both tasks: a large phoneme inventory, significant dialectal variation, frequent code-mixing with English, and a relative scarcity of large-scale labelled corpora. For ASR we achieve a best private Word Error Rate (WER) of 0.37738 and public WER of 0.36137, combining a BengaliAI fine-tuned Whisper medium model with Demucs source separation for vocal isolation, silence-boundary chunking, and carefully tuned generation hyperparameters. For speaker diarization we reach a best private Diarization Error Rate (DER) of 0.27671 and public DER of 0.20936 by replacing the default segmentation model inside the pyannote.audio pipeline with a Bengali-fine-tuned variant, pairing it with wespeaker-voxceleb-resnet34-LM embeddings and centroid-based agglomerative clustering. Our experiments demonstrate that domain-specific fine-tuning of the segmentation component, vocal source separation, and natural silence-aware chunking are the three most impactful design choices for low-resource Bengali speech processing.
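Silence-boundary chunking can be sketched as cutting long audio at the last low-energy frame before a length limit. This is a hypothetical energy-threshold version operating on raw float samples: the 30 s default limit (chosen to match Whisper's 30-second input window), the 20 ms frames, and the RMS threshold are assumptions, and the authors' system may instead rely on a voice-activity detector or different parameters.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms(samples: Sequence[float], frame: int) -> List[float]:
    """Root-mean-square energy of consecutive non-overlapping frames."""
    out = []
    for i in range(0, len(samples), frame):
        chunk = samples[i : i + frame]
        out.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    return out


def silence_chunks(
    samples: Sequence[float],
    sample_rate: int,
    max_chunk_s: float = 30.0,
    frame_s: float = 0.02,
    threshold: float = 0.01,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges, cutting at the last silent frame before a
    chunk would exceed max_chunk_s; fall back to a hard cut when no silence exists."""
    frame = max(1, round(frame_s * sample_rate))
    rms = frame_rms(samples, frame)
    max_frames = max(1, round(max_chunk_s / frame_s))
    chunks, start = [], 0
    while start < len(rms):
        end = min(start + max_frames, len(rms))
        if end < len(rms):  # more audio remains: look for a silent cut point
            silent = [f for f in range(start + 1, end) if rms[f] < threshold]
            if silent:
                end = silent[-1]
        chunks.append((start * frame, min(end * frame, len(samples))))
        start = end
    return chunks


# 0.5 s of speech, 0.1 s of silence, 0.6 s of speech at a toy 100 Hz sample rate.
audio = [0.5] * 50 + [0.0] * 10 + [0.5] * 60
print(silence_chunks(audio, sample_rate=100, max_chunk_s=1.0, frame_s=0.1))
# [(0, 50), (50, 120)]
```

Cutting at silence rather than at fixed offsets avoids splitting words across chunks, which is the motivation the abstract gives for silence-aware chunking.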
【2】Effects of Training Data Quality on Classifier Performance
Link: https://arxiv.org/abs/2602.21462
Authors: Alan F. Karr, Regina Ruane
Abstract: We describe extensive numerical experiments assessing and quantifying how classifier performance depends on the quality of the training data, a frequently neglected component of the analysis of classifiers. More specifically, in the scientific context of metagenomic assembly of short DNA reads into "contigs," we examine the effects of degrading the quality of the training data by multiple mechanisms, and for four classifiers: Bayes classifiers, neural nets, partition models, and random forests. We investigate both individual behavior and congruence among the classifiers. We find breakdown-like behavior that holds for all four classifiers: as degradation increases, they move from being mostly correct to only coincidentally correct, because they are wrong in the same way. In the process, a picture of spatial heterogeneity emerges: as the training data move farther from the analysis data, classifier decisions degenerate, the boundary becomes less dense, and congruence increases.
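Two quantities in this study, degraded training data and congruence among classifiers, are easy to make concrete. The sketch assumes label flipping as the degradation mechanism (one plausible choice, not necessarily among the paper's mechanisms) and defines congruence as the fraction of test points on which all classifiers predict the same label; `joint_error_rate` counts points where they agree and are all wrong, the "wrong in the same way" regime. These definitions and names are our assumptions.

```python
import random
from typing import List, Sequence


def flip_labels(labels: Sequence[int], rate: float, n_classes: int, seed: int = 0) -> List[int]:
    """Degrade training labels: a fraction `rate` of them is set to a different random class."""
    rng = random.Random(seed)
    out = list(labels)
    for i in rng.sample(range(len(out)), k=round(rate * len(out))):
        out[i] = rng.choice([c for c in range(n_classes) if c != out[i]])
    return out


def congruence(predictions: Sequence[Sequence[int]]) -> float:
    """Fraction of test points on which every classifier predicts the same label."""
    agree = [len(set(point)) == 1 for point in zip(*predictions)]
    return sum(agree) / len(agree)


def joint_error_rate(predictions: Sequence[Sequence[int]], truth: Sequence[int]) -> float:
    """Fraction of test points where all classifiers agree and are all wrong."""
    hits = [len(set(point)) == 1 and point[0] != t
            for point, t in zip(zip(*predictions), truth)]
    return sum(hits) / len(hits)


# Rows: three hypothetical classifiers; columns: four test points.
preds = [[0, 1, 1, 0], [0, 1, 0, 0], [0, 1, 1, 1]]
print(congruence(preds), joint_error_rate(preds, [0, 0, 1, 0]))  # 0.5 0.25
```

Rising congruence together with rising joint error is the signature of the breakdown described above: agreement stops being evidence of correctness.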
Representation (3 papers)
【1】From Basis to Basis: Gaussian Particle Representation for Interpretable PDE Operators
Link: https://arxiv.org/abs/2602.21551
Authors: Zhihao Li, Yu Feng, Zhilu Lai, Wei Wang
Abstract: Learning PDE dynamics for fluids increasingly relies on neural operators and Transformer-based models, yet these approaches often lack interpretability and struggle with localized, high-frequency structures while incurring quadratic cost in spatial samples. We propose representing fields with a Gaussian basis, where learned atoms carry explicit geometry (centers, anisotropic scales, weights) and form a compact, mesh-agnostic, directly visualizable state. Building on this representation, we introduce a Gaussian Particle Operator that acts in modal space: learned Gaussian modal windows perform a Petrov-Galerkin measurement, and PG Gaussian Attention enables global cross-scale coupling. This basis-to-basis design is resolution-agnostic and achieves near-linear complexity in N for a fixed modal budget, supporting irregular geometries and seamless 2D-to-3D extension. On standard PDE benchmarks and real datasets, our method attains accuracy competitive with the state of the art while providing intrinsic interpretability.
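To make the representation concrete, the sketch below evaluates a 2-D field written as a weighted sum of anisotropic Gaussian atoms, each carrying a center, per-axis scales, a rotation angle, and a weight. The parameterization (in particular the rotation angle) is an assumption for illustration; the paper's modal windows, Petrov-Galerkin measurement, and PG Gaussian Attention are not reproduced.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class GaussianAtom:
    center: Tuple[float, float]  # (cx, cy)
    scale: Tuple[float, float]   # standard deviations along the atom's principal axes
    angle: float                 # rotation of the principal axes, in radians
    weight: float                # amplitude contributed to the field


def evaluate_field(atoms: Sequence[GaussianAtom], x: float, y: float) -> float:
    """u(x, y) = sum_k w_k * exp(-0.5 * d_k^T Sigma_k^{-1} d_k), with d_k = (x, y) - c_k."""
    total = 0.0
    for a in atoms:
        dx, dy = x - a.center[0], y - a.center[1]
        # Rotate the offset into the atom's principal-axis frame (R^T d).
        c, s = math.cos(a.angle), math.sin(a.angle)
        u, v = c * dx + s * dy, -s * dx + c * dy
        total += a.weight * math.exp(-0.5 * ((u / a.scale[0]) ** 2 + (v / a.scale[1]) ** 2))
    return total


atoms = [
    GaussianAtom(center=(0.0, 0.0), scale=(1.0, 0.2), angle=0.0, weight=2.0),   # elongated in x
    GaussianAtom(center=(1.0, 1.0), scale=(0.3, 0.3), angle=0.0, weight=-1.0),  # small isotropic dip
]
print(evaluate_field(atoms, 0.5, 0.0), evaluate_field(atoms, 0.0, 0.5))
```

Because the state is just a list of atoms, the field can be queried at arbitrary points, which is what makes such a representation mesh-agnostic.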
【2】WaterVIB: Learning Minimal Sufficient Watermark Representations via Variational Information Bottleneck
Link: https://arxiv.org/abs/2602.21508
Authors: Haoyuan He, Yu Zheng, Jie Zhou, Jiwen Lu
Comments: 22 pages, 7 figures. Preprint
Abstract: Robust watermarking is critical for intellectual property protection, yet existing methods are severely vulnerable to regeneration-based AIGC attacks. We identify that existing methods fail because they entangle the watermark with high-frequency cover texture, which is susceptible to being rewritten during generative purification. To address this, we propose WaterVIB, a theoretically grounded framework that reformulates the encoder as an information sieve via the Variational Information Bottleneck. Instead of overfitting to fragile cover details, our approach forces the model to learn a Minimal Sufficient Statistic of the message. This effectively filters out redundant cover nuances prone to generative shifts, retaining only the essential signal invariant to regeneration. We theoretically prove that optimizing this bottleneck is a necessary condition for robustness against distribution-shifting attacks. Extensive experiments demonstrate that WaterVIB significantly outperforms state-of-the-art methods, achieving superior zero-shot resilience against unknown diffusion-based editing.
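For readers unfamiliar with the Variational Information Bottleneck, the generic objective pairs a task loss with β times the KL divergence between a diagonal-Gaussian encoding q(z|x) = N(μ, diag(σ²)) and a standard-normal prior, which has the closed form ½ Σ (μ² + σ² − 1 − log σ²). The sketch assumes that standard formulation; WaterVIB's actual encoder, prior, message-decoding loss, and β are not given in the abstract.

```python
import math
import random
from typing import List, Sequence


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def reparameterize(mu: Sequence[float], log_var: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps; in an autodiff framework this keeps z differentiable."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def vib_loss(decode_loss: float, mu: Sequence[float], log_var: Sequence[float],
             beta: float = 1e-3) -> float:
    """Generic VIB objective: recovery loss plus beta times the compression penalty."""
    return decode_loss + beta * kl_to_standard_normal(mu, log_var)


z = reparameterize([1.0, 0.0], [0.0, -1.0], random.Random(0))
print(kl_to_standard_normal([1.0], [0.0]), vib_loss(0.2, [1.0], [0.0], beta=0.1))
```

Raising β squeezes more information out of z, which is the lever a VIB-style watermark encoder uses to drop cover details that the message does not need.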
【3】Learning Recursive Multi-Scale Representations for Irregular Multivariate Time Series Forecasting
Link: https://arxiv.org/abs/2602.21498
Authors: Boyuan Li, Zhen Liu, Yicheng Luo, Qianli Ma
Comments: Accepted at ICLR 2026
Abstract: Irregular Multivariate Time Series (IMTS) are characterized by uneven intervals between consecutive timestamps, which carry sampling-pattern information that is valuable for learning temporal and variable dependencies. In addition, IMTS often exhibit diverse dependencies across multiple time scales. However, many existing multi-scale IMTS methods use resampling to obtain the coarse series, which can alter the original timestamps and disrupt the sampling pattern information. To address the challenge, we propose ReIMTS, a Recursive multi-scale modeling approach for Irregular Multivariate Time Series forecasting. Instead of resampling, ReIMTS keeps timestamps unchanged and recursively splits each sample into subsamples with progressively shorter time periods. Based on the original sampling timestamps in these long-to-short subsamples, an irregularity-aware representation fusion mechanism is proposed to capture global-to-local dependencies for accurate forecasting. Extensive experiments demonstrate an average performance improvement of 27.1% in the forecasting task across different models and real-world datasets. Our code is available at https://github.com/Ladbaby/PyOmniTS.
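The split-without-resampling idea can be sketched as recursive halving of the time window, where every observation keeps its original timestamp and simply lands in the sub-window that contains it. Binary halving, half-open windows, and the (timestamp, variable, value) tuple layout are assumptions for illustration; ReIMTS's actual split rule and its irregularity-aware fusion are not reproduced.

```python
from typing import List, Sequence, Tuple

Obs = Tuple[float, int, float]  # (timestamp, variable id, value)


def split_recursively(obs: Sequence[Obs], start: float, end: float,
                      depth: int) -> List[List[List[Obs]]]:
    """Return subsamples per level: level 0 is [whole window], level d has 2**d windows.
    Windows are half-open [start, end); observations are never moved or interpolated."""
    inside = [o for o in obs if start <= o[0] < end]
    if depth == 0:
        return [[inside]]
    mid = (start + end) / 2
    left = split_recursively(inside, start, mid, depth - 1)
    right = split_recursively(inside, mid, end, depth - 1)
    # Concatenate the left and right windows level by level.
    return [[inside]] + [l + r for l, r in zip(left, right)]


obs = [(0.3, 0, 1.0), (1.7, 1, 2.0), (2.2, 0, 0.5), (5.9, 1, 1.1), (7.4, 0, 0.9)]
levels = split_recursively(obs, 0.0, 8.0, depth=2)
print([[len(s) for s in lvl] for lvl in levels])  # [[5], [3, 2], [2, 1, 1, 1]]
```

Each coarse level is the union of its children, so uneven gaps between timestamps survive at every scale instead of being averaged away by resampling.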
Optimization | Convergence (7 papers)
【1】Neural solver for Wasserstein Geodesics and optimal transport dynamics
Link: https://arxiv.org/abs/2602.22003
Authors: Hailiang Liu, Yan-Han Chen
Comments: 28 pages, 22 figures
Abstract: In recent years, the machine learning community has increasingly embraced the optimal transport (OT) framework for modeling distributional relationships. In this work, we introduce a sample-based neural solver for computing the Wasserstein geodesic between a source and target distribution, along with the associated velocity field. Building on the dynamical formulation of the OT problem, we recast the constrained optimization as a minimax problem, using deep neural networks to approximate the relevant functions. This approach not only provides the Wasserstein geodesic but also recovers the OT map, enabling direct sampling from the target distribution. By estimating the OT map, we obtain velocity estimates along particle trajectories, which in turn allow us to learn the full velocity field. The framework is flexible and readily extends to general cost functions, including the commonly used quadratic cost. We demonstrate the effectiveness of our method through experiments on both synthetic and real datasets.
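A useful sanity check for any neural Wasserstein-geodesic solver is the one-dimensional case with quadratic cost, where the optimal transport map between two equal-size empirical samples is the monotone rearrangement: pair the sorted points. Particles then follow straight lines x_t = (1 − t)x + tT(x) with constant velocity T(x) − x. The sketch below computes this closed-form reference; it is not the paper's minimax solver.

```python
from typing import List, Sequence, Tuple


def ot_map_1d(source: Sequence[float], target: Sequence[float]) -> List[Tuple[float, float]]:
    """Optimal pairing for quadratic cost between equal-size 1-D samples: match sorted orders."""
    if len(source) != len(target):
        raise ValueError("this sketch assumes equal sample sizes")
    return list(zip(sorted(source), sorted(target)))


def geodesic(pairs: Sequence[Tuple[float, float]], t: float) -> List[float]:
    """Displacement interpolation: particle positions at time t in [0, 1]."""
    return [(1 - t) * x + t * y for x, y in pairs]


def velocities(pairs: Sequence[Tuple[float, float]]) -> List[float]:
    """Particles move in straight lines at constant velocity T(x) - x."""
    return [y - x for x, y in pairs]


pairs = ot_map_1d([2.0, 0.0, 1.0], [13.0, 10.0, 11.0])
print(pairs)                 # [(0.0, 10.0), (1.0, 11.0), (2.0, 13.0)]
print(geodesic(pairs, 0.5))  # [5.0, 6.0, 7.5]
print(velocities(pairs))     # [10.0, 10.0, 11.0]
```

Comparing a learned map and velocity field against this exact solution on 1-D data is a cheap way to catch solver bugs before moving to higher dimensions.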
【2】Outpatient Appointment Scheduling Optimization with a Genetic Algorithm Approach
Link: https://arxiv.org/abs/2602.21995
Authors: Ana Rodrigues, Rui Rego
Comments: 7 pages, 4 figures
Abstract: The optimization of complex medical appointment scheduling remains a significant operational challenge in multi-center healthcare environments, where clinical safety protocols and patient logistics must be reconciled. This study proposes and evaluates a Genetic Algorithm (GA) framework designed to automate the scheduling of multiple medical acts while adhering to rigorous inter-procedural incompatibility rules. Using a synthetic dataset encompassing 50 medical acts across four healthcare facilities, we compared two GA variants, Pre-Ordered and Unordered, against deterministic First-Come, First-Served (FCFS) and Random Choice baselines. Our results demonstrate that the GA framework achieved a 100% constraint fulfillment rate, effectively resolving temporal overlaps and clinical incompatibilities that the FCFS baseline failed to address in 60% and 40% of cases, respectively. Furthermore, the GA variants demonstrated statistically significant improvements (p < 0.001) in patient-centric metrics, achieving an Idle Time Ratio (ITR) frequently below 0.4 and reducing inter-healthcenter trips. While the GA (Ordered) variant provided a superior initial search locus, both evolutionary models converged to comparable global optima by the 100th generation. These findings suggest that transitioning from manual, human-mediated scheduling to an automated metaheuristic approach enhances clinical integrity, reduces administrative overhead, and significantly improves the patient experience by minimizing wait times and logistical burdens.
【3】Estimation and Optimization of Ship Fuel Consumption in Maritime: Review, Challenges and Future Directions
Link: https://arxiv.org/abs/2602.21959
Authors: Dusica Marijan, Hamza Haruna Mohammed, Bakht Zaman
Comments: 23 pages, 4 figures. Published in Journal of Marine Science and Technology (2026)
Abstract: To reduce carbon emissions and minimize shipping costs, improving the fuel efficiency of ships is crucial. Various measures are taken to reduce the total fuel consumption of ships, including optimizing vessel parameters and selecting routes with the lowest fuel consumption. Different estimation methods have been proposed for predicting fuel consumption, and various optimization methods have been proposed to minimize fuel oil consumption. This paper provides a comprehensive review of methods for estimating and optimizing fuel oil consumption in maritime transport. Our novel contributions include categorizing fuel oil consumption estimation methods into physics-based, machine-learning, and hybrid models and exploring their strengths and limitations. Furthermore, we highlight the importance of data fusion techniques, which combine AIS, onboard sensors, and meteorological data to enhance accuracy. We make the first attempt to discuss the emerging role of Explainable AI in enhancing model transparency for decision-making. Uniquely, we identify key challenges, including data quality, availability, and the need for real-time optimization, and propose future research directions to address these gaps, with a focus on hybrid models, real-time optimization, and the standardization of datasets.
【4】Mitigating Structural Noise in Low-Resource S2TT: An Optimized Cascaded Nepali-English Pipeline with Punctuation Restoration
Link: https://arxiv.org/abs/2602.21647
Authors: Tangsang Chongbang, Pranesh Pyara Shrestha, Amrit Sarki, Anku Jaiswal
Comments: 13 pages, 4 figures, 12 tables
Abstract: This paper presents and evaluates an optimized cascaded Nepali speech-to-English text translation (S2TT) system, focusing on mitigating structural noise introduced by Automatic Speech Recognition (ASR). We first establish highly proficient ASR and NMT components: a Wav2Vec2-XLS-R-300m model achieved a state-of-the-art 2.72% CER on OpenSLR-54, and a multi-stage fine-tuned MarianMT model reached a 28.32 BLEU score on the FLORES-200 benchmark. We empirically investigate the influence of punctuation loss, demonstrating that unpunctuated ASR output significantly degrades translation quality, causing a 20.7% relative BLEU drop on the FLORES benchmark. To overcome this, we propose and evaluate an intermediate Punctuation Restoration Module (PRM). The final S2TT pipeline was tested across three configurations on a custom dataset. The optimal configuration, which applied the PRM directly to ASR output, achieved a 4.90 BLEU point gain over the direct ASR-to-NMT baseline (BLEU 36.38 vs. 31.48). This improvement was validated by human assessment, which confirmed the optimized pipeline's superior Adequacy (3.673) and Fluency (3.804). This work validates that targeted punctuation restoration is the most effective intervention for mitigating structural noise in the Nepali S2TT pipeline. It establishes an optimized baseline and demonstrates a critical architectural insight for developing cascaded speech translation systems for similar low-resource languages.
【5】Neural network optimization strategies and the topography of the loss landscape
Link: https://arxiv.org/abs/2602.21276
Authors: Jianneng Yu, Alexandre V. Morozov
Comments: 12 pages in the main text + 5 pages in the supplement. 6 figures + 1 table in the main text, 4 figures and 1 table in the supplement
Abstract: Neural networks are trained by optimizing multi-dimensional sets of fitting parameters on non-convex loss landscapes. Low-loss regions of the landscapes correspond to the parameter sets that perform well on the training data. A key issue in machine learning is the performance of trained neural networks on previously unseen test data. Here, we investigate neural network training by stochastic gradient descent (SGD), a non-convex global optimization algorithm that relies only on the gradient of the objective function. We contrast SGD solutions with those obtained via a non-stochastic quasi-Newton method, which utilizes curvature information to determine step direction and Golden Section Search to choose step size. We use several computational tools to investigate neural network parameters obtained by these two optimization methods, including kernel Principal Component Analysis and FourierPathFinder, a novel, general-purpose algorithm for finding low-height paths between pairs of points on loss or energy landscapes. We find that the choice of the optimizer profoundly affects the nature of the resulting solutions. SGD solutions tend to be separated by lower barriers than quasi-Newton solutions, even if both sets of solutions are regularized by early stopping to ensure adequate performance on test data. When allowed to fit extensively on the training data, quasi-Newton solutions occupy deeper minima on the loss landscapes that are not reached by SGD. However, these solutions generalize less well to the test data. Overall, SGD explores smooth basins of attraction, while quasi-Newton optimization is capable of finding deeper, more isolated minima that are more spread out in the parameter space. Our findings help understand both the topography of the loss landscapes and the fundamental role of landscape exploration strategies in creating robust, transferable neural network models.
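Golden Section Search, which the quasi-Newton method above uses to choose its step size, shrinks a bracket around the minimum of a unimodal one-dimensional function by a factor of about 0.618 per iteration while reusing one function evaluation each time. The sketch applies it as a line search along the steepest-descent direction of a hypothetical quadratic loss; the bracket [0, 1] and the loss are assumptions, and the paper's curvature-based direction is not reproduced.

```python
import math
from typing import Callable, Sequence


def golden_section_search(f: Callable[[float], float], a: float, b: float,
                          tol: float = 1e-8) -> float:
    """Minimize a unimodal f on [a, b]; each step keeps a 1/phi fraction of the bracket."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:  # minimum lies in [a, d]; old c becomes the new d
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:        # minimum lies in [c, b]; old d becomes the new c
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


# Hypothetical loss L(w) = (w0 - 1)^2 + 10 (w1 + 2)^2, starting from w = (0, 0).
def loss(w: Sequence[float]) -> float:
    return (w[0] - 1) ** 2 + 10 * (w[1] + 2) ** 2


w0 = (0.0, 0.0)
grad = (2 * (w0[0] - 1), 20 * (w0[1] + 2))  # (-2, 40)
direction = (-grad[0], -grad[1])            # steepest descent
alpha = golden_section_search(lambda s: loss((w0[0] + s * direction[0],
                                               w0[1] + s * direction[1])), 0.0, 1.0)
print(alpha)  # about 0.0501, i.e. g.g / g.H.g = 1604 / 32008 for this quadratic
```

For a quadratic the exact line-search step is g·g / g·Hg, which gives an independent check on the search result.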
【6】Group Orthogonalized Policy Optimization: Group Policy Optimization as Orthogonal Projection in Hilbert Space
Link: https://arxiv.org/abs/2602.21269
Authors: Wang Zixian
Abstract: We present Group Orthogonalized Policy Optimization (GOPO), a new alignment algorithm for large language models derived from the geometry of Hilbert function spaces. Instead of optimizing on the probability simplex and inheriting the exponential curvature of Kullback-Leibler divergence, GOPO lifts alignment into the Hilbert space L2(pi_k) of square-integrable functions with respect to the reference policy. Within this space, the simplex constraint reduces to a linear orthogonality condition <v, 1> = 0, defining a codimension-one subspace H0. Minimizing distance to an unconstrained target u_star yields the work-dissipation functional J(v) = - (mu / 2) ||v||^2, whose maximizer follows directly from the Hilbert projection theorem. Enforcing the boundary v >= -1 produces a bounded Hilbert projection that induces exact sparsity, assigning zero probability to catastrophically poor actions through a closed-form threshold. To connect this functional theory with practice, GOPO projects from infinite-dimensional L2(pi_k) to a finite empirical subspace induced by group sampling. Because group-normalized advantages sum to zero, the Lagrange multiplier enforcing probability conservation vanishes exactly, reducing the constrained projection to an unconstrained empirical loss. The resulting objective has constant Hessian curvature mu I, non-saturating linear gradients, and an intrinsic dead-zone mechanism without heuristic clipping. Experiments on mathematical reasoning benchmarks show that GOPO achieves competitive generalization while maintaining stable gradient dynamics and entropy preservation in regimes where clipping-based methods plateau.
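The step "group-normalized advantages sum to zero, so the multiplier vanishes" can be checked with a few lines of arithmetic. This is a minimal sketch, assuming GRPO-style normalization (subtract the group mean, divide by the group standard deviation), uniform empirical weights over the group, and a plain quadratic objective with curvature mu; it is not GOPO's full loss, and the rewards are made up.

```python
import math


def group_normalized_advantages(rewards, eps=1e-8):
    """Standardize rewards within one sampled group (GRPO-style)."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def constrained_maximizer(adv, mu):
    """Maximize sum(a_i * v_i) - (mu / 2) * sum(v_i ** 2) subject to sum(v_i) == 0.

    Stationarity of the Lagrangian gives v_i = (a_i - lam) / mu, and the
    constraint forces lam = mean(a). Zero-mean advantages make lam vanish.
    """
    lam = sum(adv) / len(adv)
    return lam, [(a - lam) / mu for a in adv]


rewards = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.5, 0.0]  # made-up group of 8 samples
adv = group_normalized_advantages(rewards)
lam, v = constrained_maximizer(adv, mu=2.0)
```

Because the advantages are centered before scaling, `lam` comes out as zero up to rounding, so the constrained solution coincides with the unconstrained one, `v = adv / mu`.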
【7】Efficient Uncoupled Learning Dynamics with $\tilde{O}\!\left(T^{-1/4}\right)$ Last-Iterate Convergence in Bilinear Saddle-Point Problems over Convex Sets under Bandit Feedback
Link: https://arxiv.org/abs/2602.21436
Authors: Arnab Maiti,Claire Jie Zhang,Kevin Jamieson,Jamie Heather Morgenstern,Ioannis Panageas,Lillian J. Ratliff
Comments: 19 pages, Accepted at AISTATS 2026
Abstract: In this paper, we study last-iterate convergence of learning algorithms in bilinear saddle-point problems, a preferable notion of convergence that captures the day-to-day behavior of learning dynamics. We focus on the challenging setting where players select actions from compact convex sets and receive only bandit feedback. Our main contribution is the design of an uncoupled learning algorithm that guarantees last-iterate convergence to the Nash equilibrium with high probability. We establish a convergence rate of $\tilde{O}(T^{-1/4})$ up to polynomial factors in problem parameters. Crucially, our proposed algorithm is computationally efficient, requiring only an efficient linear optimization oracle over the players' compact action sets. The algorithm is obtained by combining techniques from experimental design and the classic Follow-The-Regularized-Leader (FTRL) framework, with a carefully chosen regularizer function tailored to the geometry of the action set of each learner.
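For orientation only, here is the textbook full-information special case of FTRL: a negative-entropy regularizer over the probability simplex, whose closed form is the exponential-weights (Hedge) update, run in self-play on matching pennies. The paper's setting (bandit feedback, general compact convex sets, geometry-tailored regularizers, experimental design) is substantially harder and is not reproduced; the game, learning rate, and horizon are illustrative choices.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def ftrl_entropy(cum_payoff, eta):
    """FTRL step on the simplex with a negative-entropy regularizer.

    argmax_x <cum_payoff, x> - (1/eta) * sum_i x_i log x_i
    has the closed form softmax(eta * cum_payoff), i.e. Hedge.
    """
    return softmax([eta * s for s in cum_payoff])


# Matching pennies: the row player receives A[i][j], the column player -A[i][j].
A = [[1.0, -1.0], [-1.0, 1.0]]


def self_play(T=20000, eta=0.01):
    row_cum = [1.0, 0.0]  # asymmetric start so play does not begin at equilibrium
    col_cum = [0.0, 0.0]
    avg_row, avg_col = [0.0, 0.0], [0.0, 0.0]
    for _ in range(T):
        x = ftrl_entropy(row_cum, eta)
        y = ftrl_entropy(col_cum, eta)
        for k in range(2):
            avg_row[k] += x[k] / T
            avg_col[k] += y[k] / T
        # Full-information feedback: expected payoff of every pure action.
        for i in range(2):
            row_cum[i] += sum(A[i][j] * y[j] for j in range(2))
        for j in range(2):
            col_cum[j] -= sum(A[i][j] * x[i] for i in range(2))
    return x, y, avg_row, avg_col


last_x, last_y, avg_x, avg_y = self_play()
```

The time-averaged strategies approach the (1/2, 1/2) equilibrium, but the last iterates of plain Hedge do not converge to it in general; guarantees about the last iterate, as in this paper, address exactly that gap.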
Prediction|Estimation (6 papers)
【1】Sample Complexity Bounds for Robust Mean Estimation with Mean-Shift Contamination
Link: https://arxiv.org/abs/2602.22130
Authors: Ilias Diakonikolas,Giannis Iakovidis,Daniel M. Kane,Sihan Liu
Abstract: We study the basic task of mean estimation in the presence of mean-shift contamination. In the mean-shift contamination model, an adversary is allowed to replace a small constant fraction of the clean samples by samples drawn from arbitrarily shifted versions of the base distribution. Prior work characterized the sample complexity of this task for the special cases of the Gaussian and Laplace distributions. Specifically, it was shown that consistent estimation is possible in these cases, a property that is provably impossible in Huber's contamination model. An open question posed in earlier work was to determine the sample complexity of mean estimation in the mean-shift contamination model for general base distributions. In this work, we study and essentially resolve this open question. Specifically, we show that, under mild spectral conditions on the characteristic function of the (potentially multivariate) base distribution, there exists a sample-efficient algorithm that estimates the target mean to any desired accuracy. We complement our upper bound with a qualitatively matching sample complexity lower bound. Our techniques make critical use of Fourier analysis, and in particular introduce the notion of a Fourier witness as an essential ingredient of our upper and lower bounds.
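To make the contamination model and the Fourier-side object concrete, the sketch below draws samples from a one-dimensional mean-shift model with a standard Gaussian base and evaluates the empirical characteristic function. The Gaussian base, the shift locations, and all parameter values are illustrative assumptions; the paper's estimator and its spectral conditions are not shown.

```python
import cmath
import random


def sample_mean_shift(n, mu, eps, shifts, rng):
    """Draw n points from a 1-D mean-shift contamination model.

    With probability 1 - eps a point comes from the base distribution
    N(mu, 1); otherwise it comes from the same base shifted to an
    adversarially chosen location, here picked from `shifts`.
    """
    xs = []
    for _ in range(n):
        center = mu if rng.random() >= eps else rng.choice(shifts)
        xs.append(center + rng.gauss(0.0, 1.0))
    return xs


def empirical_cf(xs, t):
    """Empirical characteristic function (1/n) * sum_j exp(i * t * x_j)."""
    return sum(cmath.exp(1j * t * x) for x in xs) / len(xs)


rng = random.Random(0)
clean = sample_mean_shift(20000, mu=1.5, eps=0.0, shifts=[0.0], rng=rng)
dirty = sample_mean_shift(20000, mu=1.5, eps=0.2, shifts=[-4.0, 6.0], rng=rng)
# For a N(mu, 1) base the population CF is exp(i*t*mu - t**2 / 2).
target = cmath.exp(1j * 1.0 * 1.5 - 0.5)
```

Because the noise is independent of the (possibly shifted) center, the contaminated population CF factorizes as exp(-t^2/2) * [(1 - eps) * exp(i t mu) + eps * E exp(i t z)], which is the kind of structure Fourier-based arguments exploit.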
【2】DualWeaver: Synergistic Feature Weaving Surrogates for Multivariate Forecasting with Univariate Time Series Foundation Models
Link: https://arxiv.org/abs/2602.22066
Authors: Jinpeng Li,Zhongyi Pei,Huaze Xue,Bojian Zheng,Chen Wang,Jianmin Wang
Comments: 16 pages. Preprint
Abstract: Time-series foundation models (TSFMs) have achieved strong univariate forecasting through large-scale pre-training, yet effectively extending this success to multivariate forecasting remains challenging. To address this, we propose DualWeaver, a novel framework that adapts univariate TSFMs (Uni-TSFMs) for multivariate forecasting by using a pair of learnable, structurally symmetric surrogate series. Generated by a shared auxiliary feature-fusion module that captures cross-variable dependencies, these surrogates are mapped to TSFM-compatible series via the forecasting objective. The symmetric structure enables parameter-free reconstruction of final predictions directly from the surrogates, without additional parametric decoding. A theoretically grounded regularization term is further introduced to enhance robustness against adaptation collapse. Extensive experiments on diverse real-world datasets show that DualWeaver outperforms state-of-the-art multivariate forecasters in both accuracy and stability. We release the code at https://github.com/li-jinpeng/DualWeaver.
【3】Physics-Informed Machine Learning for Vessel Shaft Power and Fuel Consumption Prediction: Interpretable KAN-based Approach
Link: https://arxiv.org/abs/2602.22055
Authors: Hamza Haruna Mohammed,Dusica Marijan,Arnbjørn Maressa
Comments: 10 pages, 5 figures, IEEE conference paper format; under review
Abstract: Accurate prediction of shaft rotational speed, shaft power, and fuel consumption is crucial for enhancing operational efficiency and sustainability in maritime transportation. Conventional physics-based models provide interpretability but struggle with real-world variability, while purely data-driven approaches achieve accuracy at the expense of physical plausibility. This paper introduces a Physics-Informed Kolmogorov-Arnold Network (PI-KAN), a hybrid method that integrates interpretable univariate feature transformations with a physics-informed loss function and a leakage-free chained prediction pipeline. Using operational and environmental data from five cargo vessels, PI-KAN consistently outperforms the traditional polynomial method and neural network baselines. The model achieves the lowest mean absolute error (MAE) and root mean squared error (RMSE), and the highest coefficient of determination (R^2) for shaft power and fuel consumption across all vessels, while maintaining physically consistent behavior. Interpretability analysis reveals rediscovery of domain-consistent dependencies, such as cubic-like speed-power relationships and cosine-like wave and wind effects. These results demonstrate that PI-KAN achieves both predictive accuracy and interpretability, offering a robust tool for vessel performance monitoring and decision support in operational settings.
【4】AgentLTV: An Agent-Based Unified Search-and-Evolution Framework for Automated Lifetime Value Prediction
Link: https://arxiv.org/abs/2602.21634
Authors: Chaowei Wu,Huazhu Chen,Congde Yuan,Qirui Yang,Guoqing Song,Yue Gao,Li Luo,Frank Youhua Chen,Mengzhuo Guo
Comments: 12 pages, 4 figures, submitted to KDD 2026: 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, ADS Track
Abstract: Lifetime Value (LTV) prediction is critical in advertising, recommender systems, and e-commerce. In practice, LTV data patterns vary across decision scenarios. As a result, practitioners often build complex, scenario-specific pipelines and iterate over feature processing, objective design, and tuning. This process is expensive and hard to transfer. We propose AgentLTV, an agent-based unified search-and-evolution framework for automated LTV modeling. AgentLTV treats each candidate solution as an executable pipeline program. LLM-driven agents generate code, run and repair pipelines, and analyze execution feedback. Two decision agents coordinate a two-stage search. The Monte Carlo Tree Search (MCTS) stage explores a broad space of modeling choices under a fixed budget, guided by the Polynomial Upper Confidence bounds for Trees criterion and a Pareto-aware multi-metric value function. The Evolutionary Algorithm (EA) stage refines the best MCTS program via island-based evolution with crossover, mutation, and migration. Experiments on a large-scale proprietary dataset and a public benchmark show that AgentLTV consistently discovers strong models across ranking and error metrics. Online bucket-level analysis further indicates improved ranking consistency and value calibration, especially for high-value and negative-LTV segments. We summarize practitioner-oriented takeaways: use MCTS for rapid adaptation to new data patterns, use EA for stable refinement, and validate deployment readiness with bucket-level ranking and calibration diagnostics. The proposed AgentLTV has been successfully deployed online.
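The EA stage is described as island-based evolution with crossover, mutation, and migration. A minimal generic island model is sketched below; the bitstring individuals and the OneMax fitness are stand-ins for AgentLTV's LLM-generated pipeline programs and their scores, and every parameter (island count, population size, ring migration every 10 generations) is an illustrative assumption.

```python
import random


def onemax(bits):
    # Stand-in fitness: number of ones (a real system would score a pipeline)
    return sum(bits)


def evolve_islands(n_islands=4, pop_size=20, n_bits=40, generations=60,
                   migrate_every=10, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
               for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        for k, pop in enumerate(islands):
            pop.sort(key=onemax, reverse=True)
            parents = pop[: pop_size // 2]  # truncation selection keeps the elite
            children = []
            while len(parents) + len(children) < pop_size:
                a, b = rng.sample(parents, 2)
                # Uniform crossover followed by bit-flip mutation
                child = [ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b)]
                child = [1 - g if rng.random() < mutation_rate else g for g in child]
                children.append(child)
            islands[k] = parents + children
        if gen % migrate_every == 0:
            # Ring migration: island k's worst member is replaced by island k-1's best
            bests = [list(max(pop, key=onemax)) for pop in islands]
            for k, pop in enumerate(islands):
                pop.sort(key=onemax)
                pop[0] = bests[(k - 1) % n_islands]
    return max((ind for pop in islands for ind in pop), key=onemax)


best = evolve_islands()
```

Islands evolve independently between migrations, which keeps diversity higher than a single pooled population; periodic migration then spreads good solutions across islands.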
【5】Extending Sequence Length is Not All You Need: Effective Integration of Multimodal Signals for Gene Expression Prediction
Link: https://arxiv.org/abs/2602.21550
Authors: Zhao Yang,Yi Duan,Jiwei Zhu,Ying Ba,Chuan Cao,Bing Su
Comments: Accepted at ICLR 2026
Abstract: Gene expression prediction, which predicts mRNA expression levels from DNA sequences, presents significant challenges. Previous works often focus on extending input sequence length to locate distal enhancers, which may influence target genes from hundreds of kilobases away. Our work first reveals that for current models, long sequence modeling can decrease performance. Even carefully designed algorithms only mitigate the performance degradation caused by long sequences. Instead, we find that proximal multimodal epigenomic signals near target genes prove more essential. Hence we focus on how to better integrate these signals, which has been overlooked. We find that different signal types serve distinct biological roles, with some directly marking active regulatory elements while others reflect background chromatin patterns that may introduce confounding effects. Simple concatenation may lead models to develop spurious associations with these background patterns. To address this challenge, we propose Prism, a framework that learns multiple combinations of high-dimensional epigenomic features to represent distinct background chromatin states and uses backdoor adjustment to mitigate confounding effects. Our experimental results demonstrate that proper modeling of multimodal epigenomic signals achieves state-of-the-art performance using only short sequences for gene expression prediction.
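Prism relies on backdoor adjustment, P(Y | do(X=x)) = sum_z P(Y | X=x, Z=z) P(Z=z). The sketch below applies the textbook formula to discrete data; the binary variable Z plays the role of a "background chromatin state", and the counts are made-up toy numbers chosen so that Z confounds X and Y. Prism's learned, high-dimensional version is not reproduced.

```python
from collections import Counter


def backdoor_adjusted(records, x_value):
    """Estimate P(Y=1 | do(X=x)) = sum_z P(Y=1 | X=x, Z=z) * P(Z=z).

    `records` holds (x, z, y) tuples with binary y; z is assumed to be a
    discrete variable satisfying the backdoor criterion for X -> Y.
    """
    n = len(records)
    z_counts = Counter(z for _, z, _ in records)
    total = 0.0
    for z, count_z in z_counts.items():
        ys = [y for x, zz, y in records if x == x_value and zz == z]
        if not ys:
            raise ValueError(f"no samples with X={x_value}, Z={z}")
        total += (sum(ys) / len(ys)) * (count_z / n)
    return total


def naive_conditional(records, x_value):
    """Unadjusted P(Y=1 | X=x) for comparison."""
    ys = [y for x, _, y in records if x == x_value]
    return sum(ys) / len(ys)


def make_records():
    # (x, z, total count, count with y=1): made-up numbers where Z confounds X and Y
    spec = [(1, 0, 20, 10), (0, 0, 80, 32), (1, 1, 80, 64), (0, 1, 20, 14)]
    records = []
    for x, z, n_total, n_pos in spec:
        records += [(x, z, 1)] * n_pos + [(x, z, 0)] * (n_total - n_pos)
    return records


records = make_records()
```

Here the naive contrast P(Y=1|X=1) - P(Y=1|X=0) = 0.74 - 0.46 = 0.28 shrinks to 0.65 - 0.55 = 0.10 after adjustment, because Z raises the chance of both X=1 and Y=1.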
【6】Forecasting Future Language: Context Design for Mention Markets
Link: https://arxiv.org/abs/2602.21229
Authors: Sumin Kim,Jihoon Kwon,Yoon Kim,Nicole Kagan,Raffi Khatchadourian,Wonbin Ahn,Alejandro Lopez-Lira,Jaewon Lee,Yoontae Hwang,Oscar Levy,Yongjae Lee,Chanyeol Choi
Comments: 10 pages
Abstract: Mention markets, a type of prediction market in which contracts resolve based on whether a specified keyword is mentioned during a future public event, require accurate probabilistic forecasts of keyword-mention outcomes. While recent work shows that large language models (LLMs) can generate forecasts competitive with human forecasters, it remains unclear how input context should be designed to support accurate prediction. In this paper, we study this question through experiments on earnings-call mention markets, which require forecasting whether a company will mention a specified keyword during its upcoming call. We run controlled comparisons varying (i) which contextual information is provided (news and/or prior earnings-call transcripts) and (ii) how the market probability (i.e., the prediction-market contract price) is used. We introduce Market-Conditioned Prompting (MCP), which explicitly treats the market-implied probability as a prior and instructs the LLM to update this prior using textual evidence, rather than re-predicting the base rate from scratch. In our experiments, we find three insights: (1) richer context consistently improves forecasting performance; (2) market-conditioned prompting (MCP), which treats the market probability as a prior and updates it using textual evidence, yields better-calibrated forecasts; and (3) a mixture of the market probability and MCP (MixMCP) outperforms the market baseline. By dampening the LLM's posterior update with the market prior, MixMCP yields more robust predictions than either the market or the LLM alone.
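The abstract does not specify how MixMCP combines the market probability with the MCP forecast. A minimal sketch, assuming a simple linear opinion pool with weight 0.5, scored with the Brier score; the contract prices, LLM posteriors, and outcomes below are hypothetical.

```python
def mix_forecast(p_market, p_llm, w_market=0.5):
    """Linear opinion pool that shrinks the LLM forecast toward the market price."""
    if not 0.0 <= w_market <= 1.0:
        raise ValueError("w_market must lie in [0, 1]")
    return w_market * p_market + (1.0 - w_market) * p_llm


def brier(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Hypothetical contracts: market price, LLM posterior, resolved outcome
market = [0.70, 0.20, 0.55, 0.90]
llm = [0.95, 0.05, 0.30, 0.60]
outcome = [1, 0, 1, 1]
mixed = [mix_forecast(m, q) for m, q in zip(market, llm)]
```

Because the Brier score is convex in the forecast, the pooled forecast's score is never worse than the weighted average of the two component scores, one simple reason mixtures tend to be more robust than either source alone.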
Other Neural Networks|Deep Learning|Models|Modeling (18 papers)
【1】Surrogate models for Rock-Fluid Interaction: A Grid-Size-Invariant Approach
Link: https://arxiv.org/abs/2602.22188
Authors: Nathalie C. Pinheiro,Donghu Guo,Hannah P. Menke,Aniket C. Joshi,Claire E. Heaney,Ahmed H. ElSheikh,Christopher C. Pain
Abstract: Modelling rock-fluid interaction requires solving a set of partial differential equations (PDEs) to predict the flow behaviour and the reactions of the fluid with the rock on the interfaces. Conventional high-fidelity numerical models require a high resolution to obtain reliable results, resulting in huge computational expense. This restricts the applicability of these models for multi-query problems, such as uncertainty quantification and optimisation, which require running numerous scenarios. As a cheaper alternative to high-fidelity models, this work develops eight surrogate models for predicting the fluid flow in porous media. Four of these are reduced-order models (ROM) based on one neural network for compression and another for prediction. The other four are single neural networks with the property of grid-size invariance; a term which we use to refer to image-to-image models that are capable of inferring on computational domains that are larger than those used during training. In addition to the novel grid-size-invariant framework for surrogate models, we compare the predictive performance of UNet and UNet++ architectures, and demonstrate that UNet++ outperforms UNet for surrogate models. Furthermore, we show that the grid-size-invariant approach is a reliable way to reduce memory consumption during training, resulting in good correlation between predicted and ground-truth values and outperforming the ROMs analysed. The application analysed is particularly challenging because fluid-induced rock dissolution results in a non-static solid field, which consequently cannot be used to help adjust future predictions.
【2】Learning and Naming Subgroups with Exceptional Survival Characteristics
Link: https://arxiv.org/abs/2602.22179
Authors: Mhd Jawad Al Rahwanji,Sascha Xu,Nils Philipp Walter,Jilles Vreeken
Abstract: In many applications, it is important to identify subpopulations that survive longer or shorter than the rest of the population. In medicine, for example, it allows determining which patients benefit from treatment, and in predictive maintenance, which components are more likely to fail. Existing methods for discovering subgroups with exceptional survival characteristics require restrictive assumptions about the survival model (e.g. proportional hazards), pre-discretized features, and, as they compare average statistics, tend to overlook individual deviations. In this paper, we propose Sysurv, a fully differentiable, non-parametric method that leverages random survival forests to learn individual survival curves, automatically learns conditions and how to combine these into inherently interpretable rules, so as to select subgroups with exceptional survival characteristics. Empirical evaluation on a wide range of datasets and settings, including a case study on cancer data, shows that Sysurv reveals insightful and actionable survival subgroups.
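Sysurv itself learns individual curves with random survival forests and differentiable rules, which is beyond a short example. A standard way to inspect any discovered subgroup, though, is to compare its Kaplan-Meier curve with that of the rest of the population. A minimal estimator follows; the toy times and event flags are made up, and tied times are processed together.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function S(t).

    times: follow-up times; events: 1 if the event (death, failure) was
    observed at that time, 0 if the subject was censored.
    Returns (t, S(t)) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    idx = 0
    while idx < len(data):
        t = data[idx][0]
        deaths = removed = 0
        while idx < len(data) and data[idx][0] == t:
            deaths += data[idx][1]
            removed += 1
            idx += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # events and censorings both leave the risk set
    return curve


# Toy subgroup: follow-up times (months) and event indicators
curve = kaplan_meier([2, 3, 3, 5, 8, 9], [1, 1, 0, 1, 0, 1])
```

The curve steps down only at observed event times (2, 3, 5, 9 here); the censored subjects at times 3 and 8 shrink the risk set without producing a step.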
【3】Don't stop me now: Rethinking Validation Criteria for Model Parameter Selection
Link: https://arxiv.org/abs/2602.22107
Authors: Andrea Apicella,Francesco Isgrò,Andrea Pollastro,Roberto Prevete
Abstract: Despite the extensive literature on training loss functions, the evaluation of generalization on the validation set remains underexplored. In this work, we conduct a systematic empirical and statistical study of how the validation criterion used for model selection affects test performance in neural classifiers, with attention to early stopping. Using fully connected networks on standard benchmarks under $k$-fold evaluation, we compare: (i) early stopping with patience and (ii) post-hoc selection over all epochs (i.e. no early stopping). Models are trained with cross-entropy, C-Loss, or PolyLoss; the model parameter selection on the validation set is made using accuracy or one of the three loss functions, each considered independently. Three main findings emerge. (1) Early stopping based on validation accuracy performs worst, consistently selecting checkpoints with lower test accuracy than both loss-based early stopping and post-hoc selection. (2) Loss-based validation criteria yield comparable and more stable test accuracy. (3) Across datasets and folds, any single validation rule often underperforms the test-optimal checkpoint. Overall, the selected model typically achieves test-set performance statistically lower than the best performance across all epochs, regardless of the validation criterion. Our results suggest avoiding validation accuracy (in particular with early stopping) for parameter selection, favoring loss-based validation criteria.
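The two selection protocols compared above can be stated precisely on a per-epoch validation curve. A minimal sketch, assuming "patience" means the number of consecutive non-improving epochs tolerated before stopping; the validation values are made up.

```python
def early_stopping_epoch(val_scores, patience, higher_is_better=False):
    """Epoch chosen by patience-based early stopping.

    Training halts once `patience` consecutive epochs fail to strictly improve
    on the best score so far; the best epoch seen up to that point is returned.
    """
    sign = -1.0 if higher_is_better else 1.0
    best_epoch, best = 0, sign * val_scores[0]
    waited = 0
    for epoch in range(1, len(val_scores)):
        score = sign * val_scores[epoch]
        if score < best:
            best_epoch, best, waited = epoch, score, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


def post_hoc_epoch(val_scores, higher_is_better=False):
    """Best epoch over the full run (no early stopping)."""
    sign = -1.0 if higher_is_better else 1.0
    return min(range(len(val_scores)), key=lambda e: sign * val_scores[e])


val_loss = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.65, 0.6, 0.62]
val_acc = [0.5, 0.6, 0.6, 0.61, 0.61]
```

On `val_loss`, early stopping with patience 3 halts after epoch 5 and keeps epoch 2, while post-hoc selection finds the later, lower minimum at epoch 7. Plateaus in a coarse metric such as accuracy make this premature stop more likely.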
【4】Learning in the Null Space: Small Singular Values for Continual Learning
Link: https://arxiv.org/abs/2602.21919
Authors: Cuong Anh Pham,Praneeth Vepakomma,Samuel Horváth
Comments: 17 pages, accepted as Oral presentation at the Third Conference on Parsimony and Learning (CPAL 2026)
Abstract: Alleviating catastrophic forgetting while enabling further learning is a primary challenge in continual learning (CL). Orthogonal-based training methods have gained attention for their efficiency and strong theoretical properties, and many existing approaches enforce orthogonality through gradient projection. In this paper, we revisit orthogonality and exploit the fact that small singular values correspond to directions that are nearly orthogonal to the input space of previous tasks. Building on this principle, we introduce NESS (Null-space Estimated from Small Singular values), a CL method that applies orthogonality directly in the weight space rather than through gradient manipulation. Specifically, NESS constructs an approximate null space using the smallest singular values of each layer's input representation and parameterizes task-specific updates via a compact low-rank adaptation (LoRA-style) formulation constrained to this subspace. The subspace basis is fixed to preserve the null-space constraint, and only a single trainable matrix is learned for each task. This design ensures that the resulting updates remain approximately in the null space of previous inputs while enabling adaptation to new tasks. Our theoretical analysis and experiments on three benchmark datasets demonstrate competitive performance, low forgetting, and stable accuracy across tasks, highlighting the role of small singular values in continual learning. The code is available at https://github.com/pacman-ctm/NESS.
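One reading of the construction above, sketched with NumPy under stated assumptions: take the right singular vectors of a layer's previous-task inputs X that belong to the k smallest singular values, keep that basis V_k fixed, and parameterize the update as delta_W = B V_k^T with B as the only trainable matrix. The synthetic inputs, the sizes, and the random stand-in for B are hypothetical; the official code at the repository linked above may differ.

```python
import numpy as np


def small_singular_basis(X, k):
    """Right singular vectors of X (n_samples x d) for its k smallest singular values.

    Previous-task inputs have little energy along these directions, so X @ V_k
    is small and an update confined to them barely changes old outputs.
    """
    _, _, vt = np.linalg.svd(X, full_matrices=True)
    return vt[-k:].T  # shape (d, k), orthonormal columns


rng = np.random.default_rng(0)
d, d_out, k = 16, 8, 4
# Hypothetical old-task inputs: mostly in a 12-dim subspace, plus small noise
mix = rng.standard_normal((d, 12))
X_old = rng.standard_normal((200, 12)) @ mix.T + 1e-3 * rng.standard_normal((200, d))

V_k = small_singular_basis(X_old, k)  # fixed basis, not trained
B = rng.standard_normal((d_out, k))    # the one trainable matrix (random stand-in here)
delta_W = B @ V_k.T                    # rank-k update living in the small-singular subspace

old_change = np.linalg.norm(X_old @ delta_W.T) / np.linalg.norm(X_old)
x_new = rng.standard_normal(d)
new_change = np.linalg.norm(delta_W @ x_new) / np.linalg.norm(x_new)
```

The relative change on old inputs is orders of magnitude smaller than on a generic new input, which is the forgetting-versus-plasticity trade-off the method targets.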
【5】The Error of Deep Operator Networks Is the Sum of Its Parts: Branch-Trunk and Mode Error Decompositions
Link: https://arxiv.org/abs/2602.21910
Authors: Alexander Heinlein,Johannes Taraz
Comments: 29 pages, 12 figures
Abstract: Operator learning has the potential to strongly impact scientific computing by learning solution operators for differential equations, potentially accelerating multi-query tasks such as design optimization and uncertainty quantification by orders of magnitude. Despite proven universal approximation properties, deep operator networks (DeepONets) often exhibit limited accuracy and generalization in practice, which hinders their adoption. Understanding these limitations is therefore crucial for further advancing the approach. This work analyzes performance limitations of the classical DeepONet architecture. It is shown that the approximation error is dominated by the branch network when the internal dimension is sufficiently large, and that the learned trunk basis can often be replaced by classical basis functions without a significant impact on performance. To investigate this further, a modified DeepONet is constructed in which the trunk network is replaced by the left singular vectors of the training solution matrix. This modification yields several key insights. First, a spectral bias in the branch network is observed, with coefficients of dominant, low-frequency modes learned more effectively. Second, due to singular-value scaling of the branch coefficients, the overall branch error is dominated by modes with intermediate singular values rather than the smallest ones. Third, using a shared branch network for all mode coefficients, as in the standard architecture, improves generalization of small modes compared to a stacked architecture in which coefficients are computed separately. Finally, strong and detrimental coupling between modes in parameter space is identified.
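The modified DeepONet replaces the trunk with left singular vectors of the training solution matrix. The sketch below builds such a basis with NumPy on a toy solution set (three sine modes with decaying amplitudes, an illustrative assumption) and computes the projection coefficients a branch network would have to predict. It shows the basis construction only, not the authors' training setup.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 128)
# Toy training set: 200 solutions built from three sine modes with decaying amplitude
amps = rng.standard_normal((3, 200)) * np.array([[1.0], [0.1], [0.01]])
modes = np.stack([np.sin((2 * m + 1) * np.pi * x) for m in range(3)], axis=1)  # (128, 3)
U = modes @ amps  # (128, 200) solution matrix, one column per sample

# "Trunk" basis: leading left singular vectors of the solution matrix
left, svals, _ = np.linalg.svd(U, full_matrices=False)
p = 3
trunk = left[:, :p]  # (128, p), orthonormal columns

# Targets a branch network would have to predict: projection coefficients
coeffs = trunk.T @ U  # (p, 200)
rel_err = np.linalg.norm(U - trunk @ coeffs) / np.linalg.norm(U)
```

Each row of `coeffs` has norm equal to the matching singular value, so the branch targets are scaled by the singular values; this is the scaling that the abstract links to which modes dominate the branch error.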
【6】Easy to Learn, Yet Hard to Forget: Towards Robust Unlearning Under Bias
Link: https://arxiv.org/abs/2602.21773
Authors: JuneHyoung Kwon,MiHyeon Kim,Eunju Lee,Yoonji Lee,Seunghoon Lee,YoungBin Kim
Comments: Accepted to AAAI 2026
Abstract: Machine unlearning, which enables a model to forget specific data, is crucial for ensuring data privacy and model reliability. However, its effectiveness can be severely undermined in real-world scenarios where models learn unintended biases from spurious correlations within the data. This paper investigates the unique challenges of unlearning from such biased models. We identify a novel phenomenon we term "shortcut unlearning," where models exhibit an "easy to learn, yet hard to forget" tendency. Specifically, models struggle to forget easily-learned, bias-aligned samples; instead of forgetting the class attribute, they unlearn the bias attribute, which can paradoxically improve accuracy on the class intended to be forgotten. To address this, we propose CUPID, a new unlearning framework inspired by the observation that samples with different biases exhibit distinct loss landscape sharpness. Our method first partitions the forget set into causal- and bias-approximated subsets based on sample sharpness, then disentangles model parameters into causal and bias pathways, and finally performs a targeted update by routing refined causal and bias gradients to their respective pathways. Extensive experiments on biased datasets including Waterbirds, BAR, and Biased NICO++ demonstrate that our method achieves state-of-the-art forgetting performance and effectively mitigates the shortcut unlearning problem.
【7】ABM-UDE: Developing Surrogates for Epidemic Agent-Based Models via Scientific Machine Learning
Link: https://arxiv.org/abs/2602.21588
Authors: Sharv Murgai,Utkarsh Utkarsh,Kyle C. Nguyen,Alan Edelman,Erin C. S. Acquesta,Christopher Vincent Rackauckas
Comments: 25 pages, 4 figures
Abstract: Agent-based epidemic models (ABMs) encode behavioral and policy heterogeneity but are too slow for nightly hospital planning. We develop county-ready surrogates that learn directly from exascale ABM trajectories using Universal Differential Equations (UDEs): mechanistic SEIR-family ODEs with a neural-parameterized contact rate κ_φ(u,t) (no additive residual). Our contributions are threefold: we adapt multiple shooting and an observer-based prediction-error method (PEM) to stabilize identification of neural-augmented epidemiological dynamics across intervention-driven regime shifts; we enforce positivity and mass conservation and show the learned contact-rate parameterization yields a well-posed vector field; and we quantify accuracy, calibration, and compute against ABM ensembles and UDE baselines. On a representative ExaEpi scenario, PEM-UDE reduces mean MSE by 77% relative to single-shooting UDE (3.00 vs. 13.14) and by 20% relative to MS-UDE (3.75). Reliability improves in parallel: empirical coverage of the ABM 10-90% and 25-75% bands rises from 0.68/0.43 (UDE) and 0.79/0.55 (MS-UDE) to 0.86/0.61 with PEM-UDE and 0.94/0.69 with MS+PEM-UDE, indicating calibrated uncertainty rather than overconfident fits. Inference runs in seconds on commodity CPUs (20-35 s per ~90-day forecast), enabling nightly "what-if" sweeps on a laptop. Relative to a ~100 CPU-hour ABM reference run, this yields ~10^4× lower wall-clock time per scenario. This closes the realism-cadence gap, supports threshold-aware decision-making (e.g., maintaining ICU occupancy <75%), preserves mechanistic interpretability, and enables calibrated, risk-aware scenario planning on standard institutional hardware. Beyond epidemics, the ABM→UDE recipe provides a portable path to distill agent-based simulators into fast, trustworthy surrogates for other scientific domains.
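The surrogate structure is an SEIR ODE whose only learned piece is the contact rate. A minimal sketch follows, with a hypothetical softplus-linear function standing in for the neural network, invented rates sigma and gamma, and a fixed-step RK4 integrator chosen for the example. Since the contact rate only scales the S to E flux, the right-hand side sums to zero, which gives the mass conservation mentioned above.

```python
import math


def contact_rate(u, t, params):
    """Hypothetical stand-in for the neural contact rate kappa_phi(u, t).

    Softplus of a linear function of the infectious fraction and time, so the
    rate is always positive. A trained UDE would use a small neural net here.
    """
    w_i, w_t, bias = params
    s, e, i, r = u
    z = w_i * i / (s + e + i + r) + w_t * t + bias
    return math.log1p(math.exp(z))


def seir_rhs(u, t, params, sigma=1 / 5.2, gamma=1 / 7.0):
    s, e, i, r = u
    n = s + e + i + r
    infection = contact_rate(u, t, params) * s * i / n  # S -> E flux
    return [-infection, infection - sigma * e, sigma * e - gamma * i, gamma * i]


def rk4_solve(u0, days, dt, params):
    u, t = list(u0), 0.0
    traj = [list(u)]
    for _ in range(int(round(days / dt))):
        k1 = seir_rhs(u, t, params)
        k2 = seir_rhs([a + 0.5 * dt * b for a, b in zip(u, k1)], t + 0.5 * dt, params)
        k3 = seir_rhs([a + 0.5 * dt * b for a, b in zip(u, k2)], t + 0.5 * dt, params)
        k4 = seir_rhs([a + dt * b for a, b in zip(u, k3)], t + dt, params)
        u = [a + dt / 6.0 * (p + 2.0 * q + 2.0 * w + z)
             for a, p, q, w, z in zip(u, k1, k2, k3, k4)]
        t += dt
        traj.append(list(u))
    return traj


params = (-2.0, -0.005, -0.8)  # invented weights, not fitted to any ABM
traj = rk4_solve([9990.0, 0.0, 10.0, 0.0], days=90, dt=0.25, params=params)
```

RK4 preserves linear invariants, so total population stays constant up to rounding; positivity is not guaranteed by RK4 in general but holds here with this step size.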
【8】Mamba Meets Scheduling: Learning to Solve Flexible Job Shop Scheduling with Efficient Sequence Modeling
Link: https://arxiv.org/abs/2602.21546
Authors: Zhi Cao, Cong Zhang, Yaoxin Wu, Yaqing Hou, Hongwei Ge
Abstract: The Flexible Job Shop Problem (FJSP) is a well-studied combinatorial optimization problem with extensive applications for manufacturing and production scheduling. It involves assigning jobs to various machines to optimize criteria, such as minimizing total completion time. Current learning-based methods in this domain often rely on localized feature extraction models, limiting their capacity to capture overarching dependencies spanning operations and machines. This paper introduces an innovative architecture that harnesses Mamba, a state-space model with linear computational complexity, to facilitate comprehensive sequence modeling tailored for FJSP. In contrast to prevalent graph-attention-based frameworks that are computationally intensive for FJSP, we show our model is more efficient. Specifically, the proposed model consists of an encoder and a decoder. The encoder incorporates a dual Mamba block to extract operation and machine features separately. Additionally, we introduce an efficient cross-attention decoder to learn interactive embeddings of operations and machines. Our experimental results demonstrate that our method achieves faster solving speed and surpasses the performance of state-of-the-art learning-based methods for FJSP across various benchmarks.
【9】The Design Space of Tri-Modal Masked Diffusion Models
Link: https://arxiv.org/abs/2602.21472
Authors: Louis Bethune, Victor Turrisi, Bruno Kacper Mlodozeniec, Pau Rodriguez Lopez, Lokesh Boominathan, Nikhil Bhendawade, Amitis Shidani, Joris Pelemans, Theo X. Olausson, Devon Hjelm, Paul Dixon, Joao Monteiro, Pierre Ablin, Vishnu Banna, Arno Blaas, Nick Henderson, Kari Noriy, Dan Busbridge, Josh Susskind, Marco Cuturi, Irina Belousova, Luca Zappella, Russ Webb, Jason Ramapuram
Comments: 41 pages, 29 figures, 10 tables
Abstract: Discrete diffusion models have emerged as strong alternatives to autoregressive language models, with recent work initializing and fine-tuning a base unimodal model for bimodal generation. Diverging from previous approaches, we introduce the first tri-modal masked diffusion model pretrained from scratch on text, image-text, and audio-text data. We systematically analyze multimodal scaling laws, modality mixing ratios, noise schedules, and batch-size effects, and we provide optimized inference sampling defaults. Our batch-size analysis yields a novel stochastic differential equation (SDE)-based reparameterization that eliminates the need for tuning the optimal batch size as reported in recent work. This reparameterization decouples the physical batch size, often chosen based on compute constraints (GPU saturation, FLOP efficiency, wall-clock time), from the logical batch size, chosen to balance gradient variance during stochastic optimization. Finally, we pretrain a preliminary 3B-parameter tri-modal model on 6.4T tokens, demonstrating the capabilities of a unified design and achieving strong results in text generation, text-to-image tasks, and text-to-speech tasks. Our work represents the largest-scale systematic open study of multimodal discrete diffusion models conducted to date, providing insights into scaling behaviors across multiple modalities.
【10】When Learning Hurts: Fixed-Pole RNN for Real-Time Online Training
Link: https://arxiv.org/abs/2602.21454
Authors: Alexander Morgan, Ummay Sumaya Khan, Lingjia Liu, Lizhong Zheng
Abstract: Recurrent neural networks (RNNs) can be interpreted as discrete-time state-space models, where the state evolution corresponds to an infinite-impulse-response (IIR) filtering operation governed by both feedforward weights and recurrent poles. While, in principle, all parameters including pole locations can be optimized via backpropagation through time (BPTT), such joint learning incurs substantial computational overhead and is often impractical for applications with limited training data. Echo state networks (ESNs) mitigate this limitation by fixing the recurrent dynamics and training only a linear readout, enabling efficient and stable online adaptation. In this work, we analytically and empirically examine why learning recurrent poles does not provide tangible benefits in data-constrained, real-time learning scenarios. Our analysis shows that pole learning renders the weight optimization problem highly non-convex, requiring significantly more training samples and iterations for gradient-based methods to converge to meaningful solutions. Empirically, we observe that for complex-valued data, gradient descent frequently exhibits prolonged plateaus, and advanced optimizers offer limited improvement. In contrast, fixed-pole architectures induce stable and well-conditioned state representations even with limited training data. Numerical results demonstrate that fixed-pole networks achieve superior performance with lower training complexity, making them more suitable for online real-time tasks.
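A rough sketch of the fixed-pole idea: a bank of first-order IIR filters with hand-picked real poles produces the state, and only the linear readout is fitted online with recursive least squares (RLS). The real-valued poles, the synthetic target, and the choice of RLS as the online solver are assumptions made for illustration; the paper also treats complex-valued data, and its exact training procedure may differ.

```python
import random

def fixed_pole_states(x, poles):
    """Run first-order IIR filters h_i[t] = p_i * h_i[t-1] + x[t].

    The poles are fixed (never trained), as in an echo-state-style reservoir."""
    h = [0.0] * len(poles)
    states = []
    for xt in x:
        h = [p * hi + xt for p, hi in zip(poles, h)]
        states.append(h)
    return states

def rls_readout(states, targets, lam=1.0, delta=1e-3):
    """Recursive least squares for the linear readout y[t] ~ w . h[t].

    lam is the forgetting factor; P starts at I / delta (weak regularization)."""
    n = len(states[0])
    w = [0.0] * n
    P = [[(1.0 / delta) if i == j else 0.0 for j in range(n)] for i in range(n)]
    for h, y in zip(states, targets):
        Ph = [sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]
        denom = lam + sum(h[i] * Ph[i] for i in range(n))
        k = [v / denom for v in Ph]  # gain vector
        err = y - sum(wi * hi for wi, hi in zip(w, h))
        w = [wi + ki * err for wi, ki in zip(w, k)]
        # P <- (P - k h^T P) / lam; h^T P equals Ph^T because P stays symmetric
        P = [[(P[i][j] - k[i] * Ph[j]) / lam for j in range(n)] for i in range(n)]
    return w

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
poles = [0.0, 0.5, 0.9]  # fixed, hand-picked poles (assumption)
states = fixed_pole_states(x, poles)
targets = [2.0 * h[1] - 1.0 * h[2] for h in states]  # synthetic target in the span of the states
w = rls_readout(states, targets)
print([round(v, 3) for v in w])
```

Because the poles never change, fitting the readout stays a convex least-squares problem solved in one pass, which is the contrast the abstract draws with BPTT over pole locations.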
【11】HiPPO Zoo: Explicit Memory Mechanisms for Interpretable State Space Models
Link: https://arxiv.org/abs/2602.21340
Authors: Jack Goffinet, Casey Hanks, David E. Carlson
Comments: 20 pages, 6 figures
Abstract: Representing the past in a compressed, efficient, and informative manner is a central problem for systems trained on sequential data. The HiPPO framework, originally proposed by Gu & Dao et al., provides a principled approach to sequential compression by projecting signals onto orthogonal polynomial (OP) bases via structured linear ordinary differential equations. Subsequent works have embedded these dynamics in state space models (SSMs), where HiPPO structure serves as an initialization. Nonlinear successors of these SSM methods such as Mamba are state-of-the-art for many tasks with long-range dependencies, but the mechanisms by which they represent and prioritize history remain largely implicit. In this work, we revisit the HiPPO framework with the goal of making these mechanisms explicit. We show how polynomial representations of history can be extended to support capabilities of modern SSMs such as adaptive allocation of memory and associative memory while retaining direct interpretability in the OP basis. We introduce a unified framework comprising five such extensions, which we collectively refer to as a "HiPPO zoo." Each extension exposes a specific modeling capability through an explicit, interpretable modification of the HiPPO framework. The resulting models adapt their memory online and train in streaming settings with efficient updates. We illustrate the behaviors and modeling advantages of these extensions through a range of synthetic sequence modeling tasks, demonstrating that capabilities typically associated with modern SSMs can be realized through explicit, interpretable polynomial memory structures.
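For reference, the original HiPPO-LegS memory that these extensions build on can be sketched in a few lines. The sketch assumes the LegS form dc/dt = -(1/t) A c + (1/t) B f, with A lower triangular (A_nk = sqrt((2n+1)(2k+1)) for n > k, A_nn = n + 1) and B_n = sqrt(2n+1), discretized here with backward Euler, which is one of several possible discretizations. It does not implement any of the five "zoo" extensions.

```python
import math

def legs_matrices(N):
    """HiPPO-LegS transition (A, B) for N Legendre coefficients."""
    A = [[0.0] * N for _ in range(N)]
    for n in range(N):
        for k in range(n + 1):
            A[n][k] = n + 1.0 if n == k else math.sqrt((2 * n + 1) * (2 * k + 1))
    B = [math.sqrt(2 * n + 1) for n in range(N)]
    return A, B

def legs_step(c, f, k, A, B):
    """Backward-Euler step of dc/dt = -(A c - B f) / t from t = k to k + 1.

    Solves (I + A / k) c_new = c + B f / k; A is lower triangular, so
    forward substitution suffices."""
    new = []
    for n in range(len(c)):
        rhs = c[n] + B[n] * f / k - sum(A[n][j] * new[j] for j in range(n)) / k
        new.append(rhs / (1.0 + A[n][n] / k))
    return new

def compress(signal, N=8):
    """Online compression of a scalar stream into N Legendre coefficients."""
    A, B = legs_matrices(N)
    c = [0.0] * N
    for k, f in enumerate(signal, start=1):
        c = legs_step(c, f, k, A, B)
    return c

coeffs = compress([1.0] * 2000)
print([round(v, 3) for v in coeffs])
```

Because the first column of A equals B, a constant input f is a fixed point at c = (f, 0, ..., 0): all of the memory sits in the zeroth Legendre coefficient, which is the kind of directly readable representation the abstract refers to.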
【12】AngelSlim: A more accessible, comprehensive, and efficient toolkit for large model compression
Link: https://arxiv.org/abs/2602.21233
Authors: Rui Cen, QiangQiang Hu, Hong Huang, Hong Liu, Song Liu, Xin Luo, Lin Niu, Yifan Tan, Decheng Wu, Linchuan Xie, Rubing Yang, Guanghua Yu, Jianchen Zhu
Abstract: This technical report introduces AngelSlim, a comprehensive and versatile toolkit for large model compression developed by the Tencent Hunyuan team. By consolidating cutting-edge algorithms, including quantization, speculative decoding, token pruning, and distillation, AngelSlim provides a unified pipeline that streamlines the transition from model compression to industrial-scale deployment. To facilitate efficient acceleration, we integrate state-of-the-art FP8 and INT8 Post-Training Quantization (PTQ) algorithms alongside pioneering research in ultra-low-bit regimes, featuring HY-1.8B-int2 as the first industrially viable 2-bit large model. Beyond quantization, we propose a training-aligned speculative decoding framework compatible with multimodal architectures and modern inference engines, achieving 1.8x to 2.0x throughput gains without compromising output correctness. Furthermore, we develop a training-free sparse attention framework that reduces Time-to-First-Token (TTFT) in long-context scenarios by decoupling sparse kernels from model architectures through a hybrid of static patterns and dynamic token selection. For multimodal models, AngelSlim incorporates specialized pruning strategies, namely IDPruner for optimizing vision tokens via Maximal Marginal Relevance and Samp for adaptive audio token merging and pruning. By integrating these compression strategies from low-level implementations, AngelSlim enables algorithm-focused research and tool-assisted deployment.
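The abstract does not spell out AngelSlim's verification rule. For orientation, the standard speculative-sampling acceptance rule from the literature is sketched below: a draft token x ~ q is accepted with probability min(1, p(x)/q(x)), and otherwise a replacement is drawn from the normalized residual max(p - q, 0). This rule is what makes speculative decoding lossless in distribution; whether AngelSlim uses exactly this rule is an assumption.

```python
import random

def residual_distribution(p, q):
    """Normalized max(p - q, 0): where the draft under-covers the target."""
    r = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    z = sum(r)
    return [ri / z for ri in r] if z > 0 else list(p)

def speculative_step(p, q, rng):
    """Propose one token from the draft q and accept or correct it against the target p."""
    x = rng.choices(range(len(q)), weights=q)[0]
    if rng.random() < min(1.0, p[x] / q[x]):
        return x  # draft token accepted
    r = residual_distribution(p, q)
    return rng.choices(range(len(p)), weights=r)[0]  # rejected: resample from residual

def output_distribution(p, q):
    """Exact distribution of speculative_step's output; it equals p."""
    accepted = [min(pi, qi) for pi, qi in zip(p, q)]  # q(x) * min(1, p(x)/q(x))
    reject_mass = 1.0 - sum(accepted)
    r = residual_distribution(p, q)
    return [a + reject_mass * ri for a, ri in zip(accepted, r)]

p = [0.5, 0.3, 0.2]  # target model's next-token distribution (toy)
q = [0.2, 0.2, 0.6]  # draft model's next-token distribution (toy)
rng = random.Random(0)
n_samples = 50000
counts = [0] * len(p)
for _ in range(n_samples):
    counts[speculative_step(p, q, rng)] += 1
print(output_distribution(p, q), [c / n_samples for c in counts])
```

Summing the accepted mass min(p, q) and the rejected mass redistributed over (p - q)^+ gives back p exactly, so throughput gains come only from how often drafts are accepted, not from changing the output distribution.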
【13】Probing the Geometry of Diffusion Models with the String Method
Link: https://arxiv.org/abs/2602.22122
Authors: Elio Moreau, Florentin Coeurdoux, Grégoire Ferre, Eric Vanden-Eijnden
Abstract: Understanding the geometry of learned distributions is fundamental to improving and interpreting diffusion models, yet systematic tools for exploring their landscape remain limited. Standard latent-space interpolations fail to respect the structure of the learned distribution, often traversing low-density regions. We introduce a framework based on the string method that computes continuous paths between samples by evolving curves under the learned score function. Operating on pretrained models without retraining, our approach interpolates between three regimes: pure generative transport, which yields continuous sample paths; gradient-dominated dynamics, which recover minimum energy paths (MEPs); and finite-temperature string dynamics, which compute principal curves (self-consistent paths that balance energy and entropy). We demonstrate that the choice of regime matters in practice. For image diffusion models, MEPs contain high-likelihood but unrealistic "cartoon" images, confirming prior observations that likelihood maxima appear unrealistic; principal curves instead yield realistic morphing sequences despite lower likelihood. For protein structure prediction, our method computes transition pathways between metastable conformers directly from models trained on static structures, yielding paths with physically plausible intermediates. Together, these results establish the string method as a principled tool for probing the modal structure of diffusion models: identifying modes, characterizing barriers, and mapping connectivity in complex learned distributions.
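As a toy illustration of the gradient-dominated regime, the zero-temperature (simplified) string method below evolves a discretized curve on a 2D potential and re-spaces its images at equal arc length after every step. The analytic potential V stands in for -log p, so `-grad_V` plays the role of a learned score; the potential, image count, step size, and iteration count are all assumptions, and the finite-temperature variant that yields principal curves is not shown.

```python
import math

def grad_V(x, y):
    """Gradient of the toy potential V = (x^2 - 1)^2 + (y - 0.5 (x^2 - 1))^2."""
    r = y - 0.5 * (x * x - 1.0)
    return (4.0 * x * (x * x - 1.0) - 2.0 * x * r, 2.0 * r)

def reparametrize(path):
    """Redistribute images at equal arc length along the piecewise-linear curve."""
    s = [0.0]
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        s.append(s[-1] + math.hypot(x1 - x0, y1 - y0))
    n, total = len(path), s[-1]
    out, j = [], 0
    for i in range(n):
        target = total * i / (n - 1)
        while j < n - 2 and s[j + 1] < target:
            j += 1
        seg = s[j + 1] - s[j]
        w = 0.0 if seg == 0.0 else (target - s[j]) / seg
        (x0, y0), (x1, y1) = path[j], path[j + 1]
        out.append((x0 + w * (x1 - x0), y0 + w * (y1 - y0)))
    return out

def string_method(n_images=21, steps=2000, dt=0.01):
    """Simplified string method between the minima (-1, 0) and (1, 0)."""
    path = [(-1.0 + 2.0 * i / (n_images - 1), 0.0) for i in range(n_images)]
    for _ in range(steps):
        # 1) move every interior image downhill; the endpoints sit at minima
        moved = [path[0]]
        for x, y in path[1:-1]:
            gx, gy = grad_V(x, y)
            moved.append((x - dt * gx, y - dt * gy))
        moved.append(path[-1])
        # 2) re-impose equal arc-length spacing
        path = reparametrize(moved)
    return path

path = string_method()
print(path[len(path) // 2])  # midpoint image
```

On this potential the straight initial path crosses (0, 0), where V = 1.25, while the converged string bends down through the saddle at (0, -0.5), where V = 1; the midpoint image lands on that barrier.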
【14】MBD-ML: Many-body dispersion from machine learning for molecules and materials
Link: https://arxiv.org/abs/2602.22086
Authors: Evgeny Moerman, Adil Kabylda, Almaz Khabibrakhmanov, Alexandre Tkatchenko
Comments: 22 pages, 6 figures, Supplementary Information (12 figures)
Abstract: Van der Waals (vdW) interactions are essential for describing molecules and materials, from drug design and catalysis to battery applications. These omnipresent interactions must also be accurately included in machine-learned force fields. The many-body dispersion (MBD) method stands out as one of the most accurate and transferable approaches to capture vdW interactions, requiring only atomic $C_6$ coefficients and polarizabilities as input. We present MBD-ML, a pretrained message passing neural network that predicts these atomic properties directly from atomic structures. Through seamless integration with libMBD, our method enables the immediate calculation of MBD-inclusive total energies, forces, and stress tensors. By eliminating the need for intermediate electronic structure calculations, MBD-ML offers a practical and streamlined tool that simplifies the incorporation of state-of-the-art vdW interactions into any electronic structure code, as well as empirical and machine-learned force fields.
【15】Learning Quantum Data Distribution via Chaotic Quantum Diffusion Model
Link: https://arxiv.org/abs/2602.22061
Authors: Quoc Hoan Tran, Koki Chinzei, Yasuhiro Endo, Hirotaka Oshima
Comments: 12 pages, 7 figures; extended version from Poster in Workshop: Machine Learning and the Physical Sciences https://neurips.cc/virtual/2025/loc/san-diego/123072
Abstract: Generative models for quantum data pose significant challenges but hold immense potential in fields such as chemoinformatics and quantum physics. Quantum denoising diffusion probabilistic models (QuDDPMs) enable efficient learning of quantum data distributions by progressively scrambling and denoising quantum states; however, existing implementations typically rely on circuit-based random unitary dynamics that can be costly to realize and sensitive to control imperfections, particularly on analog quantum hardware. We propose the chaotic quantum diffusion model, a framework that generates projected ensembles via chaotic Hamiltonian time evolution, providing a flexible and hardware-compatible diffusion mechanism. Requiring only global, time-independent control, our approach substantially reduces implementation overhead across diverse analog quantum platforms while achieving accuracy comparable to QuDDPMs. This method improves trainability and robustness, broadening the applicability of quantum generative modeling.
【16】Neural Learning of Fast Matrix Multiplication Algorithms: A StrassenNet Approach
Link: https://arxiv.org/abs/2602.21797
Authors: Paolo Andreini, Alessandra Bernardi, Monica Bianchini, Barbara Toniella Corradini, Sara Marziali, Giacomo Nunziati, Franco Scarselli
Comments: 16 pages, 5 figures
Abstract: Fast matrix multiplication can be described as searching for low-rank decompositions of the matrix-multiplication tensor. We design a neural architecture, StrassenNet, which reproduces the Strassen algorithm for $2\times 2$ multiplication. Across many independent runs, the network always converges to a rank-$7$ tensor, thus numerically recovering Strassen's optimal algorithm. We then train the same architecture on $3\times 3$ multiplication with rank $r\in\{19,\dots,23\}$. Our experiments reveal a clear numerical threshold: models with $r=23$ attain significantly lower validation error than those with $r\le 22$, suggesting that $r=23$ could actually be the smallest effective rank of the $3\times 3$ matrix-multiplication tensor. We also sketch an extension of the method to border-rank decompositions via an $\varepsilon$-parametrisation and report preliminary results consistent with the known bounds for the border rank of the $3\times 3$ matrix-multiplication tensor.
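To make the tensor view concrete, the sketch below writes Strassen's classical $2\times 2$ algorithm as factor matrices `U`, `V`, `W` (one row per rank-one term, seven rows in total) and checks that the sum of their outer products reproduces the $2\times 2$ matrix-multiplication tensor exactly. These are the textbook Strassen coefficients; the real-valued factors StrassenNet converges to need not coincide with them entry by entry.

```python
import random

# Strassen's rank-7 decomposition of the 2x2 matrix-multiplication tensor.
# Flattened layout: a = [a11, a12, a21, a22], and likewise for b and c.
U = [[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1],
     [1, 1, 0, 0], [-1, 0, 1, 0], [0, 1, 0, -1]]
V = [[1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, -1], [-1, 0, 1, 0],
     [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
W = [[1, 0, 0, 1], [0, 0, 1, -1], [0, 1, 0, 1], [1, 0, 1, 0],
     [-1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0]]

def matmul_tensor():
    """T[i][j][k] = 1 iff a_i * b_j contributes to c_k in C = A B (2x2)."""
    T = [[[0] * 4 for _ in range(4)] for _ in range(4)]
    for p in range(2):
        for r in range(2):
            for q in range(2):
                T[2 * p + r][2 * r + q][2 * p + q] = 1
    return T

def cp_reconstruct(U, V, W):
    """Sum of the rank-one terms u_r (x) v_r (x) w_r."""
    return [[[sum(u[i] * v[j] * w[k] for u, v, w in zip(U, V, W))
              for k in range(4)] for j in range(4)] for i in range(4)]

def fast_matmul(a, b):
    """Multiply flattened 2x2 matrices with 7 scalar products instead of 8."""
    m = [sum(ui * ai for ui, ai in zip(u, a)) * sum(vj * bj for vj, bj in zip(v, b))
         for u, v in zip(U, V)]
    return [sum(W[r][k] * m[r] for r in range(7)) for k in range(4)]

def naive_matmul(a, b):
    return [a[0] * b[0] + a[1] * b[2], a[0] * b[1] + a[1] * b[3],
            a[2] * b[0] + a[3] * b[2], a[2] * b[1] + a[3] * b[3]]

rng = random.Random(0)
pairs = [([rng.randint(-9, 9) for _ in range(4)], [rng.randint(-9, 9) for _ in range(4)])
         for _ in range(100)]
print(cp_reconstruct(U, V, W) == matmul_tensor(),
      all(fast_matmul(a, b) == naive_matmul(a, b) for a, b in pairs))
```

Searching for such (U, V, W) with r rows is exactly the rank-r decomposition problem the abstract describes; a network with r = 7 rank-one terms has just enough capacity for the $2\times 2$ case.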
【17】Goodness-of-Fit Tests for Latent Class Models with Ordinal Categorical Data
Link: https://arxiv.org/abs/2602.21572
Authors: Huan Qing
Comments: 50 pages, 4 tables, 3 figures
Abstract: Ordinal categorical data are widely collected in psychology, education, and other social sciences, appearing commonly in questionnaires, assessments, and surveys. Latent class models provide a flexible framework for uncovering unobserved heterogeneity by grouping individuals into homogeneous classes based on their response patterns. A fundamental challenge in applying these models is determining the number of latent classes, which is unknown and must be inferred from data. In this paper, we propose a test statistic for this problem. The test statistic centers the largest singular value of a normalized residual matrix by a simple sample-size adjustment. Under the null hypothesis that the candidate number of latent classes is correct, its upper bound converges to zero in probability. Under an under-fitted alternative, the statistic itself exceeds a fixed positive constant with probability approaching one. This sharp dichotomous behavior of the test statistic yields two sequential testing algorithms that consistently estimate the true number of latent classes. Extensive experimental studies confirm the theoretical findings and demonstrate their accuracy and reliability in determining the number of latent classes.
【18】How many asymmetric communities are there in multi-layer directed networks?
Link: https://arxiv.org/abs/2602.21569
Authors: Huan Qing
Comments: 44 pages, 4 tables, 2 figures
Abstract: Estimating the asymmetric numbers of communities in multi-layer directed networks is a challenging problem due to the multi-layer structures and inherent directional asymmetry, leading to possibly different numbers of sender and receiver communities. This work addresses this issue under the multi-layer stochastic co-block model, a model for multi-layer directed networks with distinct community structures in sending and receiving sides, by proposing a novel goodness-of-fit test. The test statistic relies on the deviation of the largest singular value of an aggregated normalized residual matrix from the constant 2. The test statistic exhibits a sharp dichotomy: Under the null hypothesis of correct model specification, its upper bound converges to zero with high probability; under underfitting, the test statistic itself diverges to infinity. With this property, we develop a sequential testing procedure that searches through candidate pairs of sender and receiver community numbers in a lexicographic order. The process stops at the smallest such pair where the test statistic drops below a decaying threshold. For robustness, we also propose a ratio-based variant algorithm, which detects sharp changes in the sequence of test statistics by comparing consecutive candidates. Both methods are proven to consistently determine the true numbers of sender and receiver communities under the multi-layer stochastic co-block model.
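A rough single-layer illustration of the dichotomy both goodness-of-fit tests exploit: when the fitted edge probabilities are correct, the entrywise-standardized residual matrix behaves like a random matrix whose largest singular value is close to 2; when the model is under-fitted, leftover block structure pushes it well above 2. The sketch uses the oracle probabilities as the "correct" fit, a one-block fit as the under-fitted case, and power iteration for the singular value. The toy two-block data, the normalization, and all parameters are assumptions; this is not the paper's aggregated multi-layer statistic or its threshold.

```python
import math
import random

def simulate_directed(n=100, p_in=0.6, p_out=0.2, seed=1):
    """One directed layer with two planted blocks (toy data)."""
    rng = random.Random(seed)
    z = [0] * (n // 2) + [1] * (n - n // 2)
    P = [[p_in if z[i] == z[j] else p_out for j in range(n)] for i in range(n)]
    A = [[1.0 if rng.random() < P[i][j] else 0.0 for j in range(n)] for i in range(n)]
    return A, P

def normalized_residual(A, P_hat):
    """Entrywise (A - P_hat) / sqrt(n * P_hat * (1 - P_hat))."""
    n = len(A)
    return [[(A[i][j] - P_hat[i][j]) / math.sqrt(n * P_hat[i][j] * (1.0 - P_hat[i][j]))
             for j in range(n)] for i in range(n)]

def top_singular_value(M, iters=150, seed=0):
    """Power iteration on M^T M; returns an estimate of sigma_max(M)."""
    rng = random.Random(seed)
    MT = [list(col) for col in zip(*M)]
    v = [rng.gauss(0.0, 1.0) for _ in range(len(MT))]
    sigma = 0.0
    for _ in range(iters):
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        u = [sum(a * b for a, b in zip(row, v)) for row in M]   # u = M v
        sigma = math.sqrt(sum(x * x for x in u))                 # ||M v|| with ||v|| = 1
        v = [sum(a * b for a, b in zip(col, u)) for col in MT]  # v = M^T u
    return sigma

A, P = simulate_directed()
n = len(A)
p_bar = sum(map(sum, A)) / (n * n)  # under-fitted model: a single block
sigma_fit = top_singular_value(normalized_residual(A, P))
sigma_under = top_singular_value(normalized_residual(A, [[p_bar] * n for _ in range(n)]))
print(round(sigma_fit, 2), round(sigma_under, 2))
```

Centering such a value at 2 and comparing it with a shrinking threshold is the general shape of the sequential procedure described above; the paper's exact centering, aggregation over layers, and threshold schedule are not reproduced here.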
Other (44 papers)
【1】Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets
Link: https://arxiv.org/abs/2602.22207
Authors: Hanna Yukhymenko, Anton Alexandrov, Martin Vechev
Abstract: The reliability of multilingual Large Language Model (LLM) evaluation is currently compromised by the inconsistent quality of translated benchmarks. Existing resources often suffer from semantic drift and context loss, which can lead to misleading performance metrics. In this work, we present a fully automated framework designed to address these challenges by enabling scalable, high-quality translation of datasets and benchmarks. We demonstrate that adapting test-time compute scaling strategies, specifically Universal Self-Improvement (USI) and our proposed multi-round ranking method, T-RANK, allows for significantly higher quality outputs compared to traditional pipelines. Our framework ensures that benchmarks preserve their original task structure and linguistic nuances during localization. We apply this approach to translate popular benchmarks and datasets into eight Eastern and Southern European languages (Ukrainian, Bulgarian, Slovak, Romanian, Lithuanian, Estonian, Turkish, Greek). Evaluations using both reference-based metrics and LLM-as-a-judge show that our translations surpass existing resources, resulting in more accurate downstream model assessment. We release both the framework and the improved benchmarks to facilitate robust and reproducible multilingual AI development.
【2】On Imbalanced Regression with Hoeffding Trees
Link: https://arxiv.org/abs/2602.22101
Authors: Pantia-Marina Alchirch, Dimitrios I. Diochnos
Comments: 13 pages, 6 figures, 1 table, 2 algorithms, authors' version of paper accepted in PAKDD 2026 special session on Data Science: Foundations and Applications (DSFA)
Abstract: Many real-world applications provide a continuous stream of data that is subsequently used by machine learning models to solve regression tasks of interest. Hoeffding trees and their variants have a long-standing tradition due to their effectiveness, either alone or as base models in broader ensembles. At the same time, a recent line of work in batch learning has shown that kernel density estimation (KDE) is an effective approach for smoothed predictions in imbalanced regression tasks [Yang et al., 2021]. Moreover, another recent line of work for batch learning, called hierarchical shrinkage (HS) [Agarwal et al., 2022], has introduced a post-hoc regularization method for decision trees that does not alter the structure of the learned tree. Using a telescoping argument, we cast KDE to streaming environments and extend the implementation of HS to incremental decision tree models. Armed with these extensions, we investigate the performance of decision trees that may enjoy such options in datasets commonly used for regression in online settings. We conclude that KDE is beneficial in the early parts of the stream, while HS hardly, if ever, offers performance benefits. Our code is publicly available at: https://github.com/marinaAlchirch/DSFA_2026.
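For readers unfamiliar with hierarchical shrinkage, here is a minimal sketch as we understand the formula from Agarwal et al. (2022): the prediction starts at the root mean and adds each child-minus-parent increment along the root-to-leaf path, damped by 1 + λ/N(parent), where N is the number of samples that reached the parent node. The `RunningMean` helper shows how a streaming tree node could keep counts and means incrementally; both names are hypothetical, and this is not the authors' released code.

```python
def hs_predict(path_means, path_counts, lam):
    """Hierarchical shrinkage along one root-to-leaf path.

    path_means[l]:  mean target in node t_l (l = 0 is the root)
    path_counts[l]: number of samples that reached node t_l (assumed >= 1)
    Each increment E[t_l] - E[t_{l-1}] is divided by 1 + lam / N(t_{l-1})."""
    pred = path_means[0]
    for l in range(1, len(path_means)):
        pred += (path_means[l] - path_means[l - 1]) / (1.0 + lam / path_counts[l - 1])
    return pred

class RunningMean:
    """Incremental count and mean, as a streaming tree node could keep them."""

    def __init__(self):
        self.n, self.mean = 0, 0.0

    def update(self, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n

node = RunningMean()
for y in [1.0, 2.0, 3.0, 4.0]:
    node.update(y)
print(hs_predict([1.0, 2.0, 3.0], [100, 10, 1], lam=0.0),
      hs_predict([1.0, 2.0, 3.0], [100, 10, 1], lam=10.0))
```

With λ = 0 the prediction is the leaf mean; as λ grows it collapses toward the root mean, and the tree structure itself never changes, which is why HS can be applied post hoc to an incrementally grown tree.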
【3】FlowCorrect: Efficient Interactive Correction of Generative Flow Policies for Robotic Manipulation
Link: https://arxiv.org/abs/2602.22056
Authors: Edgar Welte, Yitian Shi, Rosa Wolf, Maximillian Gilles, Rania Rayyes
Comments: 8 pages, 5 figures
Abstract: Generative manipulation policies can fail catastrophically under deployment-time distribution shift, yet many failures are near-misses: the robot reaches almost-correct poses and would succeed with a small corrective motion. We present FlowCorrect, a deployment-time correction framework that converts near-miss failures into successes using sparse human nudges, without full policy retraining. During execution, a human provides brief corrective pose nudges via a lightweight VR interface. FlowCorrect uses these sparse corrections to locally adapt the policy, improving actions without retraining the backbone while preserving the model performance on previously learned scenarios. We evaluate on a real-world robot across three tabletop tasks: pick-and-place, pouring, and cup uprighting. With a low correction budget, FlowCorrect improves success on hard cases by 85% while preserving performance on previously solved scenarios. The results demonstrate that FlowCorrect learns from very few demonstrations and enables fast, sample-efficient, incremental human-in-the-loop corrections of generative visuomotor policies at deployment time in real-world robotics.
【4】Function-Space Empirical Bayes Regularisation with Student's t Priors
Link: https://arxiv.org/abs/2602.22015
Authors: Pengcheng Hao, Ercan Engin Kuruoglu
Abstract: Bayesian deep learning (BDL) has emerged as a principled approach to produce reliable uncertainty estimates by integrating deep neural networks with Bayesian inference, but selecting informative prior distributions remains a significant challenge. Various function-space variational inference (FSVI) regularisation methods have been presented, assigning meaningful priors over model predictions. However, these methods typically rely on a Gaussian prior, which fails to capture the heavy-tailed statistical characteristics inherent in neural network outputs. By contrast, this work proposes a novel function-space empirical Bayes regularisation framework, termed ST-FS-EB, which employs heavy-tailed Student's $t$ priors in both parameter and function spaces. We approximate the posterior distribution through variational inference (VI), inducing an evidence lower bound (ELBO) objective based on Monte Carlo (MC) dropout. Furthermore, the proposed method is evaluated against various VI-based BDL baselines, and the results demonstrate its robust performance in in-distribution prediction, out-of-distribution (OOD) detection and handling distribution shifts.
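To see what "heavy-tailed" buys, a small sketch compares the Student-t and Gaussian log-densities and builds a hypothetical function-space penalty: the negative Student-t log prior of a network's predictions at a set of context points. The penalty's form, its name `function_space_penalty`, and the default location and scale are assumptions for illustration; the paper's ST-FS-EB objective (including its empirical Bayes and MC-dropout components) is not reproduced.

```python
import math

def student_t_logpdf(x, nu, loc=0.0, scale=1.0):
    """Log-density of a location-scale Student's t distribution with nu degrees of freedom."""
    z = (x - loc) / scale
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(scale)
            - (nu + 1) / 2 * math.log1p(z * z / nu))

def gaussian_logpdf(x, loc=0.0, scale=1.0):
    z = (x - loc) / scale
    return -0.5 * math.log(2 * math.pi) - math.log(scale) - 0.5 * z * z

def function_space_penalty(preds, nu, loc=0.0, scale=1.0):
    """Hypothetical regularizer: negative Student-t log prior of predictions at context points."""
    return -sum(student_t_logpdf(f, nu, loc, scale) for f in preds)

# An outlying prediction at 6 standard units is far less penalized under the t prior.
print(student_t_logpdf(6.0, nu=3.0), gaussian_logpdf(6.0))
```

The Gaussian log-density falls off quadratically, while the Student-t one falls off only logarithmically, so occasional large network outputs are not forced toward the prior mean as aggressively.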
【5】Compact Circulant Layers with Spectral Priors
Link: https://arxiv.org/abs/2602.21965
Authors: Joseph Margaryan, Thomas Hamelryck
Abstract: Critical applications in areas such as medicine, robotics and autonomous systems require compact (i.e., memory efficient), uncertainty-aware neural networks suitable for edge and other resource-constrained deployments. We study compact spectral circulant and block-circulant-with-circulant-blocks (BCCB) layers: FFT-diagonalizable circular convolutions whose weights live directly in the real FFT (RFFT) half (1D) or half-plane (2D). Parameterizing filters in the frequency domain lets us impose simple spectral structure, perform structured variational inference in a low-dimensional weight space, and calculate exact layer spectral norms, enabling inexpensive global Lipschitz bounds and margin-based robustness diagnostics. By placing independent complex Gaussians on the Hermitian support we obtain a discrete instance of the spectral representation of stationary kernels, inducing an exact stationary Gaussian-process prior over filters on the discrete circle/torus. We exploit this to define a practical spectral prior and a Hermitian-aware low-rank-plus-diagonal variational posterior in real coordinates. Empirically, spectral circulant/BCCB layers are effective compact building blocks in both (variational) Bayesian and point estimate regimes: compact Bayesian neural networks on MNIST->Fashion-MNIST, variational heads on frozen CIFAR-10 features, and deterministic ViT projections on CIFAR-10/Tiny ImageNet; spectral layers match strong baselines while using substantially fewer parameters and with tighter Lipschitz certificates.
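A minimal sketch of a 1D spectral circulant layer, assuming its parameters are the n // 2 + 1 complex entries of an RFFT-style half spectrum (for even n, entries 0 and n // 2 must be real). The full spectrum follows by Hermitian symmetry, the layer multiplies in the frequency domain, and because a circulant matrix is normal its exact spectral norm is the largest eigenvalue modulus. A naive O(n^2) DFT keeps the example dependency-free; a practical layer would use an FFT, and the function names here are hypothetical.

```python
import cmath
import math

def dft(x, sign=-1):
    """Naive DFT (sign=-1) or unnormalized inverse DFT (sign=+1)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def full_spectrum(half, n):
    """Expand an RFFT-style half spectrum (n // 2 + 1 entries) by Hermitian symmetry."""
    return list(half) + [half[n - k].conjugate() for k in range(n // 2 + 1, n)]

def circulant_apply(half, x):
    """y = C x for the real circulant C whose eigenvalues are the full spectrum."""
    n = len(x)
    H = full_spectrum(half, n)
    y = dft([h * xk for h, xk in zip(H, dft(x))], sign=+1)
    return [v.real / n for v in y]  # imaginary parts vanish up to rounding

def spectral_norm(half):
    """Exact operator 2-norm of the layer: the largest eigenvalue modulus."""
    return max(abs(h) for h in half)

half = [2.0, 1 + 1j, 0.5j, -0.3 + 0.2j, 0.7]  # n = 8: entries 0 and 4 are real
x = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 0.25, 1.0]
y = circulant_apply(half, x)

# Cross-check against an explicit circular convolution with the first column of C.
n = len(x)
c = [v.real / n for v in dft(full_spectrum(half, n), sign=+1)]
y_direct = [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]
print(spectral_norm(half), max(abs(a - b) for a, b in zip(y, y_direct)))
```

Since each layer's spectral norm is available in closed form, multiplying these norms across layers gives the kind of inexpensive global Lipschitz bound the abstract mentions.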
【6】Bridging Through Absence: How Comeback Researchers Bridge Knowledge Gaps Through Structural Re-emergence
Link: https://arxiv.org/abs/2602.21926
Authors: Somyajit Chakraborty, Angshuman Jana, Avijit Gayen
Comments: Preprint; 25 pages, 14 figures, 7 tables, Submitted to Scientometrics 2025
Abstract: Understanding the role of researchers who return to academia after prolonged inactivity, termed "comeback researchers", is crucial for developing inclusive models of scientific careers. This study investigates the structural and semantic behaviors of comeback researchers, focusing on their role in cross-disciplinary knowledge transfer and network reintegration. Using the AMiner citation dataset, we analyze 113,637 early-career researchers and identify 1,425 comeback cases based on a three-year-or-longer publication gap followed by renewed activity. We find that comeback researchers cite 126% more distinct communities and exhibit 7.6% higher bridging scores compared to dropouts. They also demonstrate 74% higher gap entropy, reflecting more irregular yet strategically impactful publication trajectories. Predictive models trained on these bridging- and entropy-based features achieve a 97% ROC-AUC, far outperforming the 54% ROC-AUC of baseline models using traditional metrics like publication count and h-index. Finally, we substantiate these results via a multi-lens validation. These findings highlight the unique contributions of comeback researchers and offer data-driven tools for their early identification and institutional support.
【7】2-Step Agent: A Framework for the Interaction of a Decision Maker with AI Decision Support
Link: https://arxiv.org/abs/2602.21889
Authors: Otto Nyberg, Fausto Carcassi, Giovanni Cinà
Comments: 17 pages, 17 figures
Abstract: Across a growing number of fields, human decision making is supported by predictions from AI models. However, we still lack a deep understanding of the effects of adoption of these technologies. In this paper, we introduce a general computational framework, the 2-Step Agent, which models the effects of AI-assisted decision making. Our framework uses Bayesian methods for causal inference to model 1) how a prediction on a new observation affects the beliefs of a rational Bayesian agent, and 2) how this change in beliefs affects the downstream decision and subsequent outcome. Using this framework, we show by simulations how a single misaligned prior belief can be sufficient for decision support to result in worse downstream outcomes compared to no decision support. Our results reveal several potential pitfalls of AI-driven decision support and highlight the need for thorough model documentation and proper user training.
【8】Distill and Align Decomposition for Enhanced Claim Verification
Link: https://arxiv.org/abs/2602.21857
Authors: Jabez Magomere, Elena Kochkina, Samuel Mensah, Simerjot Kaur, Fernando Acero, Arturo Oncevay, Charese H. Smiley, Xiaomo Liu, Manuela Veloso
Comments: EACL Findings 2026
Abstract: Complex claim verification requires decomposing sentences into verifiable subclaims, yet existing methods struggle to align decomposition quality with verification performance. We propose a reinforcement learning (RL) approach that jointly optimizes decomposition quality and verifier alignment using Group Relative Policy Optimization (GRPO). Our method integrates: (i) structured sequential reasoning; (ii) supervised finetuning on teacher-distilled exemplars; and (iii) a multi-objective reward balancing format compliance, verifier alignment, and decomposition quality. Across six evaluation settings, our trained 8B decomposer improves downstream verification performance to 71.75% macro-F1, outperforming prompt-based approaches (+1.99 and +6.24) and existing RL methods (+5.84). Human evaluation confirms the high quality of the generated subclaims. Our framework enables smaller language models to achieve state-of-the-art claim verification by jointly optimising for verification accuracy and decomposition quality.
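GRPO scores each sampled output relative to the other samples drawn for the same prompt rather than against a learned value function. A minimal sketch of that advantage computation follows, together with a hypothetical weighted combination of the three reward terms the abstract lists; the function names, the weights, and the use of the population standard deviation are assumptions, not the paper's settings.

```python
import statistics

def combined_reward(fmt_ok, verifier_agrees, quality, weights=(0.2, 0.5, 0.3)):
    """Hypothetical weighted sum of format compliance, verifier alignment, and decomposition quality."""
    w_fmt, w_ver, w_q = weights
    return w_fmt * float(fmt_ok) + w_ver * float(verifier_agrees) + w_q * quality

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one prompt's group of samples."""
    mu = statistics.fmean(rewards)
    sd = statistics.pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]

# Four sampled decompositions for the same claim (toy scores).
rewards = [
    combined_reward(True, True, 0.9),
    combined_reward(True, False, 0.4),
    combined_reward(False, False, 0.1),
    combined_reward(True, True, 0.6),
]
adv = group_relative_advantages(rewards)
print([round(a, 3) for a in adv])
```

Because advantages are centered within each group, a decomposition is only reinforced when it beats its siblings on the combined reward; if every sample scores the same, all advantages are zero and the group contributes no update.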
【9】Excitation: Momentum For Experts
Link: https://arxiv.org/abs/2602.21798
Authors: Sagi Shaier
Abstract: We propose Excitation, a novel optimization framework designed to accelerate learning in sparse architectures such as Mixture-of-Experts (MoEs). Unlike traditional optimizers that treat all parameters uniformly, Excitation dynamically modulates updates using batch-level expert utilization. It introduces a competitive update dynamic that amplifies updates to highly-utilized experts and can selectively suppress low-utilization ones, effectively sharpening routing specialization. Notably, we identify a phenomenon of "structural confusion" in deep MoEs, where standard optimizers fail to establish functional signal paths; Excitation acts as a specialization catalyst, "rescuing" these models and enabling stable training where baselines remain trapped. Excitation is optimizer-, domain-, and model-agnostic, requires minimal integration effort, and introduces neither additional per-parameter optimizer state nor learnable parameters, making it highly viable for memory-constrained settings. Across language and vision tasks, Excitation consistently improves convergence speed and final performance in MoE models, indicating that active update modulation is a key mechanism for effective conditional computation.
【10】DHP: Efficient Scaling of MLLM Training with Dynamic Hybrid Parallelism
Link: https://arxiv.org/abs/2602.21788
Authors: Yifan Niu, Han Xiao, Dongyi Liu, Wei Zhou, Jia Li
Abstract: Scaling long-context capabilities is crucial for Multimodal Large Language Models (MLLMs). However, real-world multimodal datasets are extremely heterogeneous. Existing training frameworks predominantly rely on static parallelism strategies, which suffer from severe load imbalance, redundant communication, and suboptimal hardware utilization under data heterogeneity. In this work, we propose Dynamic Hybrid Parallelism (DHP), an efficient parallelism strategy that adaptively reconfigures communication groups and parallelism degrees during MLLM training. We generalize parallelism to non-power-of-two degrees and develop a polynomial-time algorithm to generate near-optimal parallelism strategies with only millisecond-level overhead per training batch. DHP is able to maintain high hardware efficiency even under extreme data variability. Experimental results demonstrate that DHP significantly outperforms Megatron-LM and DeepSpeed, achieving up to 1.36$\times$ speedup in training throughput while maintaining near-linear scaling efficiency across large-scale NPU clusters.
【11】Therapist-Robot-Patient Physical Interaction is Worth a Thousand Words: Enabling Intuitive Therapist Guidance via Remote Haptic Control
标题:治疗师-机器人-患者的物理互动价值千言万语:通过远程触觉控制实现直观的治疗师指导
链接:https://arxiv.org/abs/2602.21783
作者:Beatrice Luciani,Alex van den Berg,Matti Lang,Alexandre L. Ratschat,Laura Marchal-Crespo
备注:14 pages, 5 figures, 3 tables
摘要:Robotic systems can enhance the amount and repeatability of physically guided motor training. Yet their real-world adoption is limited, partly due to non-intuitive trainer/therapist-trainee/patient interactions. To address this gap, we present a haptic teleoperation system for trainers to remotely guide and monitor the movements of a trainee wearing an arm exoskeleton. The trainer can physically interact with the exoskeleton through a commercial handheld haptic device via virtual contact points at the exoskeleton's elbow and wrist, allowing intuitive guidance. Thirty-two participants tested the system in a trainer-trainee paradigm, comparing our haptic demonstration system with conventional visual demonstration in guiding trainees in executing arm poses. Quantitative analyses showed that haptic demonstration significantly reduced movement completion time and improved smoothness, while speech analysis using large language models for automated transcription and categorization of verbal commands revealed fewer verbal instructions. The haptic demonstration did not result in higher reported mental and physical effort by trainers compared to the visual demonstration, while trainers reported greater competence and trainees lower physical demand. These findings support the feasibility of our proposed interface for effective remote human-robot physical interaction. Future work should assess its usability and efficacy for clinical populations in restoring clinicians' sense of agency during robot-assisted therapy.
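One common way to realise a virtual contact point is a spring-damper coupling between the haptic handle and the exoskeleton point. The sketch below assumes that scheme with hypothetical gains `k` and `d`; the abstract does not state the paper's actual control law.

```python
import numpy as np

def virtual_contact_force(x_dev, v_dev, x_exo, v_exo, k=200.0, d=5.0):
    """Spring-damper coupling between the haptic handle and one virtual contact point.

    k [N/m] and d [N*s/m] are hypothetical gains. Returns (force on the exoskeleton
    contact point, reaction force rendered to the trainer's hand).
    """
    x_dev, v_dev, x_exo, v_exo = (np.asarray(a, dtype=float) for a in (x_dev, v_dev, x_exo, v_exo))
    f_exo = k * (x_dev - x_exo) + d * (v_dev - v_exo)
    return f_exo, -f_exo

# Usage: trainer holds the handle 1 cm ahead of the exoskeleton wrist point
f_exo, f_hand = virtual_contact_force([0.01, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0])
```

In a setup like the one described, one such coupling would be instantiated per contact point (elbow and wrist).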
【12】Generalisation of RLHF under Reward Shift and Clipped KL Regularisation
标题:奖励偏移与截断KL正则化下RLHF的泛化性
链接:https://arxiv.org/abs/2602.21765
作者:Kenton Tang,Yuzhu Chen,Fengxiang He
摘要:Alignment and adaptation in large language models heavily rely on reinforcement learning from human feedback (RLHF); yet, theoretical understanding of its generalisability remains limited, especially when the learned reward could shift, and the KL control is estimated and clipped. To address this issue, we develop generalisation theory for RLHF that explicitly accounts for (1) \emph{reward shift}: reward models are trained on preference data from earlier or mixed behaviour policies while RLHF optimises the current policy on its own rollouts; and (2) \emph{clipped KL regularisation}: the KL regulariser is estimated from sampled log-probability ratios and then clipped for stabilisation, which introduces an additional error into RLHF. We present generalisation bounds for RLHF, suggesting that the generalisation error stems from a sampling error from prompts and rollouts, a reward shift error, and a KL clipping error. We also discuss special cases of (1) initialising RLHF parameters with a uniform prior over a finite space, and (2) training RLHF by stochastic gradient descent, modelled as an Ornstein-Uhlenbeck process. The theory yields practical implications for (1) the optimal KL clipping threshold, and (2) budget allocation across prompts, rollouts, and preference data.
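The clipped KL regulariser described above can be illustrated with a minimal sketch, assuming the common single-sample estimator (log-probability under the policy minus log-probability under the reference, evaluated on tokens sampled from the policy) and symmetric clipping at a hypothetical threshold `clip`. The paper's exact estimator may differ; the gap between the two returned values is one batch's instance of the KL clipping error.

```python
import numpy as np

def clipped_kl_estimate(logp_policy, logp_ref, clip=1.0):
    """Per-sample log-ratio KL estimate on rollouts drawn from the current policy.

    Returns (clipped estimate used as the regulariser, unclipped estimate)."""
    log_ratio = np.asarray(logp_policy, dtype=float) - np.asarray(logp_ref, dtype=float)
    clipped = np.clip(log_ratio, -clip, clip)   # stabilises training, but biases the estimate
    return float(clipped.mean()), float(log_ratio.mean())

kl_clipped, kl_raw = clipped_kl_estimate([-1.0, -0.5, -2.0], [-1.5, -3.0, -2.0])
# log ratios are [0.5, 2.5, 0.0]: clipped mean 0.5, unclipped mean 1.0
```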
【13】C$^{2}$TC: A Training-Free Framework for Efficient Tabular Data Condensation
标题:C$^{2}$TC:用于高效表格数据压缩的免训练框架
链接:https://arxiv.org/abs/2602.21717
作者:Sijia Xu,Fan Li,Xiaoyang Wang,Zhengyi Yang,Xuemin Lin
摘要:Tabular data is the primary data format in industrial relational databases, underpinning modern data analytics and decision-making. However, the increasing scale of tabular data poses significant computational and storage challenges to learning-based analytical systems. This highlights the need for data-efficient learning, which enables effective model training and generalization using substantially fewer samples. Dataset condensation (DC) has emerged as a promising data-centric paradigm that synthesizes small yet informative datasets to preserve data utility while reducing storage and training costs. However, existing DC methods are computationally intensive due to reliance on complex gradient-based optimization. Moreover, they often overlook key characteristics of tabular data, such as heterogeneous features and class imbalance. To address these limitations, we introduce C$^{2}$TC (Class-Adaptive Clustering for Tabular Condensation), the first training-free tabular dataset condensation framework that jointly optimizes class allocation and feature representation, enabling efficient and scalable condensation. Specifically, we reformulate the dataset condensation objective into a novel class-adaptive cluster allocation problem (CCAP), which eliminates costly training and integrates adaptive label allocation to handle class imbalance. To solve the NP-hard CCAP, we develop HFILS, a heuristic local search that alternates between soft allocation and class-wise clustering to efficiently obtain high-quality solutions. Moreover, a hybrid categorical feature encoding (HCFE) is proposed for semantics-preserving clustering of heterogeneous discrete attributes. Extensive experiments on 10 real-world datasets demonstrate that C$^{2}$TC improves efficiency by at least 2 orders of magnitude over state-of-the-art baselines, while achieving superior downstream performance.
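A simplified, training-free condensation baseline in the same spirit: split a row budget across classes proportionally (at least one row per class) and replace each class by k-means centroids. This is an assumption-laden sketch, not C$^{2}$TC itself: it omits the CCAP formulation, the HFILS local search, and the HCFE categorical encoding, and assumes purely numeric features.

```python
import numpy as np

def allocate_budget(labels, budget):
    """Split `budget` synthetic rows across classes: proportional, at least 1 per class."""
    classes, counts = np.unique(labels, return_counts=True)
    alloc = np.maximum(1, np.floor(budget * counts / counts.sum()).astype(int))
    for i in np.argsort(-counts):          # hand leftover slots to the largest classes first
        if alloc.sum() >= budget:
            break
        alloc[i] += 1
    return dict(zip(classes.tolist(), alloc.tolist()))

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd iterations; returns k centroids."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(assign == j):
                C[j] = X[assign == j].mean(0)
    return C

def condense(X, y, budget):
    """Replace each class by its cluster centroids (one synthetic row per centroid)."""
    Xs, ys = [], []
    for c, k in allocate_budget(y, budget).items():
        Xc = X[y == c]
        C = kmeans(Xc, min(k, len(Xc)))
        Xs.append(C)
        ys.extend([c] * len(C))
    return np.vstack(Xs), np.array(ys)
```

The minimum of one row per class is a crude guard against class imbalance; the paper instead treats label allocation as part of the optimization.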
【14】Revisiting RAG Retrievers: An Information Theoretic Benchmark
标题:重温RAG检索器:信息论基准
链接:https://arxiv.org/abs/2602.21553
作者:Wenqing Zheng,Dmitri Kalaev,Noah Fatsi,Daniel Barcklow,Owen Reinert,Igor Melnyk,Senthil Kumar,C. Bayan Bruss
摘要:Retrieval-Augmented Generation (RAG) systems rely critically on the retriever module to surface relevant context for large language models. Although numerous retrievers have recently been proposed, each built on different ranking principles such as lexical matching, dense embeddings, or graph citations, there remains a lack of systematic understanding of how these mechanisms differ and overlap. Existing benchmarks primarily compare entire RAG pipelines or introduce new datasets, providing little guidance on selecting or combining retrievers themselves. Those that do compare retrievers directly use a limited set of evaluation tools which fail to capture complementary and overlapping strengths. This work presents MIGRASCOPE, a Mutual Information based RAG Retriever Analysis Scope. We revisit state-of-the-art retrievers and introduce principled metrics grounded in information and statistical estimation theory to quantify retrieval quality, redundancy, synergy, and marginal contribution. We further show that if chosen carefully, an ensemble of retrievers outperforms any single retriever. We leverage the developed tools over major RAG corpora to provide unique insights on contribution levels of the state-of-the-art retrievers. Our findings provide a fresh perspective on the structure of modern retrieval techniques and actionable guidance for designing robust and efficient RAG systems.
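As an illustration of an information-theoretic retriever comparison, the sketch below computes a plug-in mutual information between two retrievers' per-query hit indicators, a simple proxy for redundancy. The indicator data and the use of plug-in MI are assumptions for illustration; MIGRASCOPE's own estimators and its synergy and marginal-contribution metrics are not reproduced here.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits for two paired discrete sequences."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in joint.items())

# Usage (hypothetical per-query indicators): did retriever A / B surface a gold passage?
hits_a = [1, 1, 0, 0, 1, 0]
hits_b = [1, 1, 0, 0, 1, 0]
redundancy = mutual_information(hits_a, hits_b)   # identical behaviour gives 1 bit
```

High MI between two retrievers suggests one adds little beyond the other; low MI with both being individually useful is the situation in which an ensemble can help.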
【15】Muon+: Towards Better Muon via One Additional Normalization Step
标题:Muon+:通过一个额外的归一化步骤改进Muon
链接:https://arxiv.org/abs/2602.21545
作者:Ruijie Zhang,Yequan Zhao,Ziyue Liu,Zhengyang Wang,Zheng Zhang
摘要:The Muon optimizer has demonstrated promising performance in pre-training large language models through gradient (or momentum) orthogonalization. In this work, we propose a simple yet effective enhancement to Muon, namely Muon+, which introduces an additional normalization step after orthogonalization. We demonstrate the effectiveness of Muon+ through extensive pre-training experiments across a wide range of model scales and architectures. Our evaluation includes GPT-style models ranging from 130M to 774M parameters and LLaMA-style models ranging from 60M to 1B parameters. We comprehensively evaluate the effectiveness of Muon+ in the compute-optimal training regime and further extend the token-to-parameter (T2P) ratio to an industrial level of $\approx 200$. Experimental results show that Muon+ provides a consistent boost on training and validation perplexity over Muon. We provide our code here: https://github.com/K1seki221/MuonPlus.
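A NumPy sketch of the orthogonalization step, using the quintic Newton-Schulz iteration with the coefficients (3.4445, -4.7750, 2.0315) from the publicly released Muon implementation. The abstract does not say which normalization Muon+ adds, so the row-wise unit-norm step in `muon_plus_direction` is a hypothetical placeholder; see the linked repository for the actual choice.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Approximate the orthogonal polar factor of G with a quintic Newton-Schulz iteration."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + eps)      # Frobenius scaling puts singular values in (0, 1]
    transposed = X.shape[0] > X.shape[1]
    if transposed:                         # iterate on the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * A @ A
        X = a * X + B @ X
    return X.T if transposed else X

def muon_plus_direction(momentum, eps=1e-7):
    """Muon direction plus one extra normalization (hypothetical: unit-norm rows)."""
    O = newton_schulz_orthogonalize(momentum)
    return O / (np.linalg.norm(O, axis=1, keepdims=True) + eps)
```

The iteration only drives singular values into a band around 1 rather than exactly to 1, which is one motivation for considering a further normalization.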
【16】LiLo-VLA: Compositional Long-Horizon Manipulation via Linked Object-Centric Policies
标题:LiLo-VLA:基于链接式以对象为中心策略的组合式长时程操作
链接:https://arxiv.org/abs/2602.21531
作者:Yue Yang,Shuo Cheng,Yu Fang,Homanga Bharadhwaj,Mingyu Ding,Gedas Bertasius,Daniel Szafir
摘要:General-purpose robots must master long-horizon manipulation, defined as tasks involving multiple kinematic structure changes (e.g., attaching or detaching objects) in unstructured environments. While Vision-Language-Action (VLA) models offer the potential to master diverse atomic skills, they struggle with the combinatorial complexity of sequencing them and are prone to cascading failures due to environmental sensitivity. To address these challenges, we propose LiLo-VLA (Linked Local VLA), a modular framework capable of zero-shot generalization to novel long-horizon tasks without ever being trained on them. Our approach decouples transport from interaction: a Reaching Module handles global motion, while an Interaction Module employs an object-centric VLA to process isolated objects of interest, ensuring robustness against irrelevant visual features and invariance to spatial configurations. Crucially, this modularity facilitates robust failure recovery through dynamic replanning and skill reuse, effectively mitigating the cascading errors common in end-to-end approaches. We introduce a 21-task simulation benchmark consisting of two challenging suites: LIBERO-Long++ and Ultra-Long. In these simulations, LiLo-VLA achieves a 69% average success rate, outperforming Pi0.5 by 41% and OpenVLA-OFT by 67%. Furthermore, real-world evaluations across 8 long-horizon tasks demonstrate an average success rate of 85%. Project page: https://yy-gx.github.io/LiLo-VLA/.
【17】Training Generalizable Collaborative Agents via Strategic Risk Aversion
标题:通过策略性风险规避训练可泛化的协作智能体
链接:https://arxiv.org/abs/2602.21515
作者:Chengrui Qu,Yizhou Zhang,Nicholas Lanzetti,Eric Mazumdar
摘要:Many emerging agentic paradigms require agents to collaborate with one another (or people) to achieve shared goals. Unfortunately, existing approaches to learning policies for such collaborative problems produce brittle solutions that fail when paired with new partners. We attribute these failures to a combination of free-riding during training and a lack of strategic robustness. To address these problems, we study the concept of strategic risk aversion and interpret it as a principled inductive bias for generalizable cooperation with unseen partners. While strategically risk-averse players are robust to deviations in their partner's behavior by design, we show that, in collaborative games, they also (1) can have better equilibrium outcomes than those at classical game-theoretic concepts like Nash, and (2) exhibit less or no free-riding. Inspired by these insights, we develop a multi-agent reinforcement learning (MARL) algorithm that integrates strategic risk aversion into standard policy optimization methods. Our empirical results across collaborative benchmarks (including an LLM collaboration task) validate our theory and demonstrate that our approach consistently achieves reliable collaboration with heterogeneous and previously unseen partners across collaborative tasks.
【18】D-Flow SGLD: Source-Space Posterior Sampling for Scientific Inverse Problems with Flow Matching
标题:D-Flow SGLD:具有流匹配的科学逆问题的源空间后验抽样
链接:https://arxiv.org/abs/2602.21469
作者:Meet Hemant Parikh,Yaqin Chen,Jian-Xun Wang
摘要:Data assimilation and scientific inverse problems require reconstructing high-dimensional physical states from sparse and noisy observations, ideally with uncertainty-aware posterior samples that remain faithful to learned priors and governing physics. While training-free conditional generation is well developed for diffusion models, corresponding conditioning and posterior sampling strategies for Flow Matching (FM) priors remain comparatively under-explored, especially on scientific benchmarks where fidelity must be assessed beyond measurement misfit. In this work, we study training-free conditional generation for scientific inverse problems under FM priors and organize existing inference-time strategies by where measurement information is injected: (i) guided transport dynamics that perturb sampling trajectories using likelihood information, and (ii) source-distribution inference that performs posterior inference over the source variable while keeping the learned transport fixed. Building on the latter, we propose D-Flow SGLD, a source-space posterior sampling method that augments differentiable source inference with preconditioned stochastic gradient Langevin dynamics, enabling scalable exploration of the source posterior induced by new measurement operators without retraining the prior or modifying the learned FM dynamics. We benchmark representative methods from both families on a hierarchy of problems: 2D toy posteriors, chaotic Kuramoto-Sivashinsky trajectories, and wall-bounded turbulence reconstruction. Across these settings, we quantify trade-offs among measurement assimilation, posterior diversity, and physics/statistics fidelity, and establish D-Flow SGLD as a practical FM-compatible posterior sampler for scientific inverse problems.
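A minimal sketch of preconditioned Langevin sampling in source space, assuming a toy frozen "transport" $T(z) = 2z + 1$, a standard normal source prior, and a Gaussian measurement $y = T(z) + \mathcal{N}(0, \sigma^2)$. The names are hypothetical; this uses full-batch gradients and an analytic map instead of differentiating through a learned Flow Matching ODE, so it shows only the sampling mechanism, not D-Flow SGLD itself. The exact posterior mean here is $0.8$.

```python
import numpy as np

def preconditioned_langevin(grad_log_post, z0, step, precond, n_steps, rng):
    """Unadjusted Langevin with a fixed diagonal preconditioner M:
    z <- z + (step/2) * M * grad + sqrt(step) * sqrt(M) * xi."""
    z = np.array(z0, dtype=float)
    out = np.empty((n_steps,) + z.shape)
    sqrt_m = np.sqrt(precond)
    for t in range(n_steps):
        xi = rng.standard_normal(z.shape)
        z = z + 0.5 * step * precond * grad_log_post(z) + np.sqrt(step) * sqrt_m * xi
        out[t] = z
    return out

# Toy posterior over the source variable z, transport T kept fixed
y_obs, sigma = 3.0, 1.0
def grad_log_post(z):
    # d/dz [ -z^2/2 - (y - T(z))^2 / (2 sigma^2) ] with T(z) = 2z + 1 (chain rule gives the factor 2)
    return -z + 2.0 * (y_obs - (2.0 * z + 1.0)) / sigma**2
```

Because the transport is never modified, changing the measurement operator only changes `grad_log_post`, which mirrors the "no retraining" property the abstract emphasises.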
【19】Asymptotically Fast Clebsch-Gordan Tensor Products with Vector Spherical Harmonics
标题:基于矢量球谐函数的渐近快速Clebsch-Gordan张量积
链接:https://arxiv.org/abs/2602.21466
作者:YuQing Xie,Ameya Daigavane,Mit Kotak,Tess Smidt
备注:28 pages, 2 figures. arXiv admin note: text overlap with arXiv:2506.13523
摘要:$E(3)$-equivariant neural networks have proven to be effective in a wide range of 3D modeling tasks. A fundamental operation of such networks is the tensor product, which allows interaction between different feature types. Because this operation scales poorly, there has been considerable work towards accelerating this interaction. However, recently \citet{xieprice} have pointed out that most speedups come from a reduction in expressivity rather than true algorithmic improvements on computing Clebsch-Gordan tensor products. A modification of the Gaunt tensor product \citep{gaunt} can give a true asymptotic speedup but is incomplete and misses many interactions. In this work, we provide the first complete algorithm that truly provides asymptotic benefits for Clebsch-Gordan tensor products. For the full CGTP, our algorithm reduces runtime complexity from the naive $O(L^6)$ to $O(L^4\log^2 L)$, close to the lower bound of $O(L^4)$. We first show how generalizing fast Fourier-based convolution naturally leads to the previously proposed Gaunt tensor product \citep{gaunt}. To remedy antisymmetry issues, we generalize from scalar signals to irrep-valued signals, giving us tensor spherical harmonics. We prove a generalized Gaunt formula for the tensor harmonics. Finally, we show that we only need up to vector-valued signals to recover the missing interactions of the Gaunt tensor product.
【20】Provably Safe Generative Sampling with Constricting Barrier Functions
标题:基于收缩屏障函数的可证明安全生成采样
链接:https://arxiv.org/abs/2602.21429
作者:Darshan Gadginmath,Ahmed Allibhoy,Fabio Pasqualetti
备注:25 pages, 7 figures
摘要:Flow-based generative models, such as diffusion models and flow matching models, have achieved remarkable success in learning complex data distributions. However, a critical gap remains for their deployment in safety-critical domains: the lack of formal guarantees that generated samples will satisfy hard constraints. We address this by proposing a safety filtering framework that acts as an online shield for any pre-trained generative model. Our key insight is to cooperate with the generative process rather than override it. We define a constricting safety tube that is relaxed at the initial noise distribution and progressively tightens to the target safe set at the final data distribution, mirroring the coarse-to-fine structure of the generative process itself. By characterizing this tube via Control Barrier Functions (CBFs), we synthesize a feedback control input through a convex Quadratic Program (QP) at each sampling step. As the tube is loosest when noise is high and intervention is cheapest in terms of control energy, most constraint enforcement occurs when it least disrupts the model's learned structure. We prove that this mechanism guarantees safe sampling while minimizing the distributional shift from the original model at each sampling step, as quantified by the KL divergence. Our framework applies to any pre-trained flow-based generative scheme requiring no retraining or architectural modifications. We validate the approach across constrained image generation, physically-consistent trajectory sampling, and safe robotic manipulation policies, achieving 100% constraint satisfaction while preserving semantic fidelity.
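For a single ball-shaped constraint the per-step QP has a closed form, which makes the mechanism easy to see. The sketch below assumes a hypothetical tube $h(x,t) = r(t)^2 - \lVert x - c\rVert^2$ whose radius shrinks linearly from `r0` (noise end, $t=0$) to `r1` (data end, $t=1$), and enforces $\partial_t h + \nabla_x h \cdot u \ge -\alpha h$ on the sampler's velocity. The tube shape, schedule, and `alpha` are assumptions; the paper solves a general convex QP at each step.

```python
import numpy as np

def tube_radius(t, r0=3.0, r1=1.0):
    """Hypothetical linear schedule: radius r0 at t=0 (noise), r1 at t=1 (data)."""
    return r1 + (r0 - r1) * (1.0 - t)

def cbf_filter(u_nom, x, t, center, alpha=5.0, r0=3.0, r1=1.0):
    """Closed-form solution of min ||u - u_nom||^2 s.t. dh/dt + grad_h . u >= -alpha * h
    for the time-varying barrier h(x, t) = r(t)^2 - ||x - center||^2."""
    u_nom, x, center = (np.asarray(v, dtype=float) for v in (u_nom, x, center))
    r = tube_radius(t, r0, r1)
    h = r**2 - np.sum((x - center) ** 2)
    dh_dt = 2.0 * r * (-(r0 - r1))        # the radius shrinks as t goes from 0 to 1
    a = -2.0 * (x - center)               # gradient of h with respect to x
    b = -alpha * h - dh_dt                # the constraint reads a . u >= b
    slack = a @ u_nom - b
    if slack >= 0:
        return u_nom                      # nominal velocity already satisfies the CBF condition
    if a @ a < 1e-12:
        return u_nom                      # at the centre the gradient vanishes; nothing to project onto
    return u_nom + (-slack / (a @ a)) * a # minimal correction along a, making the constraint tight
```

Only the component of the model's velocity that violates the tube condition is removed, which is the "cooperate rather than override" idea in its simplest form.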
【21】Proximal-IMH: Proximal Posterior Proposals for Independent Metropolis-Hastings with Approximate Operators
标题:Proximal-IMH:面向带近似算子的独立Metropolis-Hastings的近端后验提议
链接:https://arxiv.org/abs/2602.21426
作者:Youguang Chen,George Biros
摘要:We consider the problem of sampling from a posterior distribution arising in Bayesian inverse problems in science, engineering, and imaging. Our method belongs to the family of independence Metropolis-Hastings (IMH) sampling algorithms, which are common in Bayesian inference. Relying on the existence of an approximate posterior distribution that is cheaper to sample from but may have significant bias, we introduce Proximal-IMH, a scheme that removes this bias by correcting samples from the approximate posterior through an auxiliary optimization problem. This yields a local adjustment that trades off adherence to the exact model against stability around the approximate reference point. For idealized settings, we prove that the proximal correction tightens the match between approximate and exact posteriors, thereby improving acceptance rates and mixing. The method applies to both linear and nonlinear input-output operators and is particularly suitable for inverse problems where exact posterior sampling is too expensive. We present numerical experiments including multimodal and data-driven priors with nonlinear input-output operators. The results show that Proximal-IMH reliably outperforms existing IMH variants.
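For reference, the plain independence Metropolis-Hastings scheme that Proximal-IMH builds on can be sketched as follows, with a toy exact posterior $\mathcal{N}(1, 1)$ and a cheap but biased approximate posterior $\mathcal{N}(0, 2^2)$ as the proposal (both assumptions). The proximal correction, which moves each approximate sample through an auxiliary optimization problem before the accept/reject step, is not shown.

```python
import numpy as np

def independence_mh(log_target, sample_q, log_q, n, rng):
    """Independence Metropolis-Hastings: every proposal is drawn from the fixed density q."""
    x = sample_q(rng)
    log_w = log_target(x) - log_q(x)                     # log importance weight of the current state
    chain, accepted = [], 0
    for _ in range(n):
        x_new = sample_q(rng)
        log_w_new = log_target(x_new) - log_q(x_new)
        if np.log(rng.uniform()) < log_w_new - log_w:    # accept with prob min(1, w_new / w)
            x, log_w = x_new, log_w_new
            accepted += 1
        chain.append(x)
    return np.array(chain), accepted / n

# Toy setting (unnormalised log-densities suffice because constants cancel in the ratio)
log_target = lambda x: -0.5 * (x - 1.0) ** 2
log_q = lambda x: -x ** 2 / 8.0
sample_q = lambda rng: rng.normal(0.0, 2.0)
```

The acceptance rate depends entirely on how well the proposal matches the target, which is why tightening the approximate posterior, as the abstract claims for the proximal correction, improves acceptance and mixing.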
【22】On the Structural Non-Preservation of Epistemic Behaviour under Policy Transformation
标题:论策略变换下认知行为的结构性不保持
链接:https://arxiv.org/abs/2602.21424
作者:Alexander Galozy
备注:15 pages, 3 figures. Under review at RLC 2026
摘要:Reinforcement learning (RL) agents under partial observability often condition actions on internally accumulated information such as memory or inferred latent context. We formalise such information-conditioned interaction patterns as behavioural dependency: variation in action selection with respect to internal information under fixed observations. This induces a probe-relative notion of $\varepsilon$-behavioural equivalence and a within-policy behavioural distance that quantifies probe sensitivity. We establish three structural results. First, the set of policies exhibiting non-trivial behavioural dependency is not closed under convex aggregation. Second, behavioural distance contracts under convex combination. Third, we prove a sufficient local condition under which gradient ascent on a skewed mixture objective decreases behavioural distance when a dominant-mode gradient aligns with the direction of steepest contraction. Minimal bandit and partially observable gridworld experiments provide controlled witnesses of these mechanisms. In the examined settings, behavioural distance decreases under convex aggregation and under continued optimisation with skewed latent priors, and in these experiments it precedes degradation under latent prior shift. These results identify structural conditions under which probe-conditioned behavioural separation is not preserved under common policy transformations.
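The first two structural results can be witnessed on a one-observation toy. The sketch assumes that behavioural distance is the maximum total-variation distance between action distributions across internal states at a fixed observation (a plausible instantiation, not necessarily the paper's definition). Two policies that each depend strongly on the latent `m`, in opposite directions, average to a policy with no dependency at all.

```python
import numpy as np

def behavioural_distance(policy, obs, latents):
    """Max total-variation distance between action distributions across internal
    states, for a fixed observation (an assumed instantiation of the distance)."""
    dists = [np.asarray(policy(obs, m), dtype=float) for m in latents]
    return max(0.5 * np.abs(p - q).sum() for p in dists for q in dists)

def mix(pi_a, pi_b, w):
    """Convex aggregation of two stochastic policies."""
    return lambda obs, m: w * np.asarray(pi_a(obs, m)) + (1 - w) * np.asarray(pi_b(obs, m))

# Two policies with strong, opposite dependence on the internal state m
pi_a = lambda obs, m: [0.9, 0.1] if m == 0 else [0.1, 0.9]
pi_b = lambda obs, m: [0.1, 0.9] if m == 0 else [0.9, 0.1]
```

Here each component has distance 0.8, the 75/25 mixture has 0.4, and the 50/50 mixture has 0, so dependency is lost under averaging.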
【23】ECHOSAT: Estimating Canopy Height Over Space And Time
标题:ECHOSAT:估计空间和时间上的树冠高度
链接:https://arxiv.org/abs/2602.21421
作者:Jan Pauls,Karsten Schrödter,Sven Ligensa,Martin Schwartz,Berkant Turan,Max Zimmer,Sassan Saatchi,Sebastian Pokutta,Philippe Ciais,Fabian Gieseke
备注:19 pages, 12 figures, 6 tables
摘要:Forest monitoring is critical for climate change mitigation. However, existing global tree height maps provide only static snapshots and do not capture temporal forest dynamics, which are essential for accurate carbon accounting. We introduce ECHOSAT, a global and temporally consistent tree height map at 10 m resolution spanning multiple years. To this end, we resort to multi-sensor satellite data to train a specialized vision transformer model, which performs pixel-level temporal regression. A self-supervised growth loss regularizes the predictions to follow growth curves that are in line with natural tree development, including gradual height increases over time, but also abrupt declines due to forest loss events such as fires. Our experimental evaluation shows that our model improves state-of-the-art accuracies in the context of single-year predictions. We also provide the first global-scale height map that accurately quantifies tree growth and disturbances over time. We expect ECHOSAT to advance global efforts in carbon monitoring and disturbance assessment. The maps can be accessed at https://github.com/ai4forest/echosat.
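A hypothetical growth regulariser in the spirit of the description: small year-over-year declines are penalised (trees rarely shrink gradually), large drops are left alone as plausible disturbance events, and implausibly fast growth is penalised. The thresholds `max_growth` and `disturbance_drop` and the squared-hinge form are assumptions, not ECHOSAT's actual loss.

```python
import numpy as np

def growth_penalty(heights, max_growth=2.0, disturbance_drop=5.0):
    """Hypothetical regulariser on one pixel's predicted yearly canopy heights (metres)."""
    d = np.diff(np.asarray(heights, dtype=float))
    # gradual shrinking (0 < drop < disturbance_drop) is implausible; large drops are treated as disturbances
    shrink = np.where((d < 0) & (-d < disturbance_drop), -d, 0.0)
    # growth faster than max_growth per year is also implausible
    too_fast = np.maximum(d - max_growth, 0.0)
    return float(np.mean(shrink**2 + too_fast**2))
```

For example, a steady 1 m/yr series and a series with a single 19 m collapse both incur zero penalty, while a 1 m dip followed by recovery does not.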
【24】Generative Bayesian Computation as a Scalable Alternative to Gaussian Process Surrogates
标题:生成式贝叶斯计算:高斯过程代理模型的可扩展替代方案
链接:https://arxiv.org/abs/2602.21408
作者:Nick Polson,Vadim Sokolov
摘要:Gaussian process (GP) surrogates are the default tool for emulating expensive computer experiments, but cubic cost, stationarity assumptions, and Gaussian predictive distributions limit their reach. We propose Generative Bayesian Computation (GBC) via Implicit Quantile Networks (IQNs) as a surrogate framework that targets all three limitations. GBC learns the full conditional quantile function from input--output pairs; at test time, a single forward pass per quantile level produces draws from the predictive distribution. Across fourteen benchmarks we compare GBC to four GP-based methods. GBC improves CRPS by 11--26\% on piecewise jump-process benchmarks, by 14\% on a ten-dimensional Friedman function, and scales linearly to 90,000 training points where dense-covariance GPs are infeasible. A boundary-augmented variant matches or outperforms Modular Jump GPs on two-dimensional jump datasets (up to 46\% CRPS improvement). In active learning, a randomized-prior IQN ensemble achieves nearly three times lower RMSE than deep GP active learning on Rocket LGBB. Overall, GBC records a favorable point estimate in 12 of 14 comparisons. GPs retain an edge on smooth surfaces where their smoothness prior provides effective regularization.
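The two ingredients named above can be sketched generically: the pinball (quantile) loss that quantile networks such as IQNs are trained with, inverse-transform sampling from a learned quantile function, and the sample-based CRPS used for evaluation. `quantile_fn` stands in for a trained network and is hypothetical; the network architecture is not shown.

```python
import numpy as np

def pinball_loss(y, q_pred, tau):
    """Quantile (pinball) loss; minimised in expectation when q_pred is the tau-quantile of y."""
    diff = np.asarray(y, dtype=float) - np.asarray(q_pred, dtype=float)
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))

def sample_predictive(quantile_fn, x, n, rng):
    """Inverse-transform sampling: evaluate the quantile function Q(tau | x) at tau ~ U(0, 1)."""
    taus = rng.uniform(size=n)
    return quantile_fn(x, taus)

def crps_from_samples(samples, y):
    """Sample-based CRPS, E|X - y| - 0.5 E|X - X'| (lower is better)."""
    s = np.asarray(samples, dtype=float)
    return float(np.mean(np.abs(s - y)) - 0.5 * np.mean(np.abs(s[:, None] - s[None, :])))
```

Each sample costs one evaluation of the quantile function, which matches the "single forward pass per quantile level" claim. The $O(n^2)$ pairwise term in the CRPS estimate is fine for modest sample counts.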
【25】VCDF: A Validated Consensus-Driven Framework for Time Series Causal Discovery
标题:VCDF:一个经过验证的共识驱动时间序列因果发现框架
链接:https://arxiv.org/abs/2602.21381
作者:Gene Yu,Ce Guo,Wayne Luk
备注:This paper has been accepted to PAKDD 2026. Please cite the proceedings version when available
摘要:Time series causal discovery is essential for understanding dynamic systems, yet many existing methods remain sensitive to noise, non-stationarity, and sampling variability. We propose the Validated Consensus-Driven Framework (VCDF), a simple and method-agnostic layer that improves robustness by evaluating the stability of causal relations across blocked temporal subsets. VCDF requires no modification to base algorithms and can be applied to methods such as VAR-LiNGAM and PCMCI. Experiments on synthetic datasets show that VCDF improves VAR-LiNGAM by approximately 0.08-0.12 in both window and summary F1 scores across diverse data characteristics, with gains most pronounced for moderate-to-long sequences. The framework also benefits from longer sequences, yielding up to 0.18 absolute improvement on time series of length 1000 and above. Evaluations on simulated fMRI data and IT-monitoring scenarios further demonstrate enhanced stability and structural accuracy under realistic noise conditions. VCDF provides an effective reliability layer for time series causal discovery without altering underlying modeling assumptions.
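The blocked-subset consensus idea can be sketched with a deliberately simple stand-in base method (thresholded lag-1 cross-correlation) in place of VAR-LiNGAM or PCMCI. The block count, the consensus fraction `min_frac`, and the correlation threshold are assumptions; VCDF's actual validation procedure may differ.

```python
import numpy as np

def lagged_edges(data, threshold=0.3):
    """Stand-in base method: edge i -> j if |corr(x_i[t-1], x_j[t])| > threshold."""
    past, present = data[:-1], data[1:]
    d = data.shape[1]
    edges = set()
    for i in range(d):
        for j in range(d):
            if i != j:
                r = np.corrcoef(past[:, i], present[:, j])[0, 1]
                if abs(r) > threshold:
                    edges.add((i, j))
    return edges

def consensus_edges(data, base=lagged_edges, n_blocks=5, min_frac=0.8):
    """Keep edges that the base method finds in at least min_frac of contiguous time blocks."""
    counts = {}
    for block in np.array_split(data, n_blocks):
        for e in base(block):
            counts[e] = counts.get(e, 0) + 1
    return {e for e, c in counts.items() if c / n_blocks >= min_frac}
```

Contiguous blocks, rather than random subsamples, preserve the temporal order that lagged methods rely on. The base method is passed in unchanged, which reflects the "no modification to base algorithms" property.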
【26】Interleaved Head Attention
标题:交错头注意力
链接:https://arxiv.org/abs/2602.21371
作者:Sai Surya Duvvuri,Chanakya Ekbote,Rachit Bansal,Rishabh Tiwari,Devvrit Khatri,David Brandfonbrener,Paul Liang,Inderjit Dhillon,Manzil Zaheer
摘要:Multi-Head Attention (MHA) is the core computational primitive underlying modern Large Language Models (LLMs). However, MHA suffers from a fundamental linear scaling limitation: $H$ attention heads produce exactly $H$ independent attention matrices, with no communication between heads during attention computation. This becomes problematic for multi-step reasoning, where correct answers depend on aggregating evidence from multiple parts of the context and composing latent token-to-token relations over a chain of intermediate inferences. To address this, we propose Interleaved Head Attention (IHA), which enables cross-head mixing by constructing $P$ pseudo-heads per head (typically $P=H$), where each pseudo query/key/value is a learned linear combination of all $H$ original queries, keys and values respectively. Interactions between pseudo-query and pseudo-key heads induce up to $P^2$ attention patterns per head with modest parameter overhead $\mathcal{O}(H^2P)$. We provide theory showing improved efficiency in terms of number of parameters on the synthetic Polynomial task (IHA uses $\Theta(\sqrt{k}n^2)$ parameters vs. $\Theta(kn^2)$ for MHA) and on the synthetic order-sensitive CPM-3 task (IHA uses $\lceil\sqrt{N_{\max}}\rceil$ heads vs. $N_{\max}$ for MHA). On real-world benchmarks, IHA improves Multi-Key retrieval on RULER by 10-20% (4k-16k) and, after fine-tuning for reasoning on OpenThoughts, improves GSM8K by 5.8% and MATH-500 by 2.8% (Majority Vote) over full attention.
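The pseudo-head construction can be sketched with `einsum`: each of the $P$ pseudo-queries (or keys) of head $h$ is a learned linear combination of all $H$ original heads, and pairing every pseudo-query with every pseudo-key yields $P^2$ attention maps per head. How those maps are combined into the head output is not specified in the abstract and is omitted; the mixing tensor layout `(H, P, H)` and the absence of causal masking are assumptions.

```python
import numpy as np

def pseudo_heads(X, mix):
    """X: (H, T, d) per-head queries, keys, or values; mix: (H, P, H) learned weights.
    Pseudo-head (h, p) = sum_g mix[h, p, g] * X[g]. Returns (H, P, T, d)."""
    return np.einsum("hpg,gtd->hptd", mix, X)

def pseudo_attention_patterns(Q, K, mix_q, mix_k):
    """All pseudo-query / pseudo-key pairings within each head: (H, P, P, T, T) softmax maps."""
    Qp, Kp = pseudo_heads(Q, mix_q), pseudo_heads(K, mix_k)
    d = Q.shape[-1]
    scores = np.einsum("hptd,hqsd->hpqts", Qp, Kp) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)     # numerically stable softmax over key positions
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)
```

Each mixing tensor holds $H \cdot P \cdot H = H^2P$ weights, matching the stated $\mathcal{O}(H^2P)$ overhead; with an identity mixing the pseudo-heads reduce to the original heads.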
【27】Black-Box Reliability Certification for AI Agents via Self-Consistency Sampling and Conformal Calibration
标题:通过自一致性抽样和保形校准进行人工智能代理的黑匣子可靠性认证
链接:https://arxiv.org/abs/2602.21368
作者:Charafeddine Mouzouni
备注:41 pages, 11 figures, 10 tables, including appendices
摘要:Given a black-box AI system and a task, at what confidence level can a practitioner trust the system's output? We answer with a reliability level -- a single number per system-task pair, derived from self-consistency sampling and conformal calibration, that serves as a black-box deployment gate with exact, finite-sample, distribution-free guarantees. Self-consistency sampling reduces uncertainty exponentially; conformal calibration guarantees correctness within 1/(n+1) of the target level, regardless of the system's errors -- made transparently visible through larger answer sets for harder questions. Weaker models earn lower reliability levels (not accuracy -- see Definition 2.4): GPT-4.1 earns 94.6% on GSM8K and 96.8% on TruthfulQA, while GPT-4.1-nano earns 89.8% on GSM8K and 66.5% on MMLU. We validate across five benchmarks, five models from three families, and both synthetic and real data. Conditional coverage on solvable items exceeds 0.93 across all configurations; sequential stopping reduces API costs by around 50%.
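A generic split-conformal sketch on top of self-consistency, assuming the nonconformity score is one minus the sampled frequency of the true answer. Under exchangeability, the set returned by `prediction_set` contains the true answer with probability at least $1-\alpha$ (and at most $1-\alpha + 1/(n+1)$ when scores have no ties). This is a textbook construction, not the paper's exact reliability-level derivation.

```python
import math
from collections import Counter

def consistency_scores(samples):
    """Self-consistency: relative frequency of each distinct answer among k samples."""
    counts = Counter(samples)
    return {a: c / len(samples) for a, c in counts.items()}

def conformal_threshold(cal_samples, cal_answers, alpha):
    """Split-conformal threshold on the nonconformity score 1 - freq(true answer)."""
    scores = sorted(1.0 - consistency_scores(s).get(y, 0.0)
                    for s, y in zip(cal_samples, cal_answers))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))     # finite-sample corrected rank
    return scores[k - 1] if k <= n else math.inf

def prediction_set(samples, threshold):
    """Sampled answers whose nonconformity 1 - freq is within the threshold."""
    return {a for a, f in consistency_scores(samples).items() if 1.0 - f <= threshold}
```

Harder questions spread the sampled answers, lowering every frequency, so more answers fall under the threshold and the set grows; this is the "larger answer sets for harder questions" behaviour.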
【28】Towards Controllable Video Synthesis of Routine and Rare OR Events
标题:迈向常规与罕见手术室事件的可控视频合成
链接:https://arxiv.org/abs/2602.21365
作者:Dominik Schneider,Lalithkumar Seenivasan,Sampath Rapuri,Vishalroshan Anil,Aiza Maksutova,Yiqing Shen,Jan Emily Mangulabnan,Hao Ding,Jose L. Porras,Masaru Ishii,Mathias Unberath
备注:Accepted to IPCAI 2026 and submitted to IJCARS
摘要:Purpose: Curating large-scale datasets of operating room (OR) workflow, encompassing rare, safety-critical, or atypical events, remains operationally and ethically challenging. This data bottleneck complicates the development of ambient intelligence for detecting, understanding, and mitigating rare or safety-critical events in the OR. Methods: This work presents an OR video diffusion framework that enables controlled synthesis of rare and safety-critical events. The framework integrates a geometric abstraction module, a conditioning module, and a fine-tuned diffusion model to first transform OR scenes into abstract geometric representations, then condition the synthesis process, and finally generate realistic OR event videos. Using this framework, we also curate a synthetic dataset to train and validate AI models for detecting near-misses of sterile-field violations. Results: In synthesizing routine OR events, our method outperforms off-the-shelf video diffusion baselines, achieving lower FVD/LPIPS and higher SSIM/PSNR in both in- and out-of-domain datasets. Through qualitative results, we illustrate its ability for controlled video synthesis of counterfactual events. An AI model trained and validated on the generated synthetic data achieved a recall of 70.13% in detecting near-misses of safety-critical events. Finally, we conduct an ablation study to quantify performance gains from key design choices. Conclusion: Our solution enables controlled synthesis of routine and rare OR events from abstract geometric representations. Beyond demonstrating its capability to generate rare and safety-critical scenarios, we show its potential to support the development of ambient intelligence models.
【29】Efficient Opportunistic Approachability
标题:高效的机会主义可接近性
链接:https://arxiv.org/abs/2602.21328
作者:Teodor Vanislavov Marinov,Mehryar Mohri,Princewill Okoroafor,Jon Schneider,Julian Zimmert
摘要:We study the problem of opportunistic approachability: a generalization of Blackwell approachability where the learner would like to obtain stronger guarantees (i.e., approach a smaller set) when their adversary limits themselves to a subset of their possible action space. Bernstein et al. (2014) introduced this problem and presented an algorithm that guarantees sublinear approachability rates for opportunistic approachability. However, this algorithm requires the ability to produce calibrated online predictions of the adversary's actions, a problem whose standard implementations require time exponential in the ambient dimension and result in approachability rates that scale as $T^{-O(1/d)}$. In this paper, we present an efficient algorithm for opportunistic approachability that achieves a rate of $O(T^{-1/4})$ (and an inefficient one that achieves a rate of $O(T^{-1/3})$), bypassing the need for an online calibration subroutine. Moreover, in the case where the dimension of the adversary's action set is at most two, we show it is possible to obtain the optimal rate of $O(T^{-1/2})$.
【30】Equitable Evaluation via Elicitation
标题:通过诱导式提问实现公平评估
链接:https://arxiv.org/abs/2602.21327
作者:Elbert Du,Cynthia Dwork,Lunjia Hu,Reid McIlroy-Young,Han Shao,Linjun Zhang
备注:27 pages, 3 figures, 2 tables
摘要:Individuals with similar qualifications and skills may vary in their demeanor, or outward manner: some tend toward self-promotion while others are modest to the point of omitting crucial information. Comparing the self-descriptions of equally qualified job-seekers with different self-presentation styles is therefore problematic. We build an interactive AI for skill elicitation that provides accurate determination of skills while simultaneously allowing individuals to speak in their own voice. Such a system can be deployed, for example, when a new user joins a professional networking platform, or when matching employees to needs during a company reorganization. To obtain sufficient training data, we train an LLM to act as synthetic humans. Elicitation mitigates endogenous bias arising from individuals' own self-reports. To address systematic model bias we enforce a mathematically rigorous notion of equitability ensuring that the covariance between self-presentation manner and skill evaluation error is small.
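The equitability condition can be checked directly once self-presentation manner is encoded as a number (an assumption, for example negative for modest and positive for self-promoting). The sketch computes the sample covariance between manner and skill-estimation error and compares it with a hypothetical tolerance `eps`; the paper's formal notion may be stated differently.

```python
import numpy as np

def manner_error_covariance(manner, true_skill, estimated_skill):
    """Sample covariance between a self-presentation score and skill-estimation error."""
    error = np.asarray(estimated_skill, dtype=float) - np.asarray(true_skill, dtype=float)
    return float(np.cov(np.asarray(manner, dtype=float), error)[0, 1])

def is_equitable(manner, true_skill, estimated_skill, eps=0.05):
    """Equitable if estimation error is (nearly) uncorrelated with presentation style."""
    return abs(manner_error_covariance(manner, true_skill, estimated_skill)) <= eps
```

An evaluator that simply believes self-reports overrates self-promoters and underrates modest individuals, which shows up as a positive covariance.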
【31】Dynamic Symmetric Point Tracking: Tackling Non-ideal Reference in Analog In-memory Training
Link: https://arxiv.org/abs/2602.21321
Authors: Quan Xiao, Jindan Li, Zhaoxian Wu, Tayfun Gokmen, Tianyi Chen
Abstract: Analog in-memory computing (AIMC) performs computation directly within resistive crossbar arrays, offering an energy-efficient platform for scaling large vision and language models. However, non-ideal analog device properties make training on AIMC devices challenging. In particular, asymmetric device updates can induce a systematic drift of the weights toward a device-specific symmetric point (SP), which typically does not coincide with the optimum of the training objective. To mitigate this bias, most existing works assume that the SP is known and pre-calibrate it to zero before training by setting the reference point to the SP. Nevertheless, calibrating AIMC devices requires costly pulse updates, and residual calibration error can directly degrade training accuracy. In this work, we present the first theoretical characterization of the pulse complexity of SP calibration and of the resulting estimation error. We further propose a dynamic SP estimation method that tracks the SP during model training, and we establish its convergence guarantees. In addition, we develop an enhanced variant based on chopping and filtering techniques from digital signal processing. Numerical experiments demonstrate both the efficiency and effectiveness of the proposed method.
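To see why an unknown SP biases training, consider a simple soft-bounds device response, assumed here purely for illustration (the abstract does not state the paper's device model): an up pulse changes the weight by d_up*(1 - w/tau) and a down pulse by -d_down*(1 + w/tau). Feeding the device a zero-mean signal of alternating pulses drives the weight to where the two step sizes balance, regardless of where the training optimum lies. All parameter values below are hypothetical.

```python
def pulse_up(w, d_up=0.011, tau=1.0):
    """Weight increase from one 'up' pulse (soft-bounds model, hypothetical values)."""
    return d_up * (1 - w / tau)


def pulse_down(w, d_down=0.009, tau=1.0):
    """Weight decrease from one 'down' pulse under the same model."""
    return d_down * (1 + w / tau)


def symmetric_point(d_up=0.011, d_down=0.009, tau=1.0):
    """Weight at which up and down steps have equal magnitude."""
    return tau * (d_up - d_down) / (d_up + d_down)


def alternate_pulses(w, n_pairs):
    """Apply n_pairs of (up, down) pulses, i.e. a zero-mean update signal."""
    for _ in range(n_pairs):
        w += pulse_up(w)
        w -= pulse_down(w)
    return w


# From either side, the weight settles near the SP (0.1 here), up to a small
# offset caused by applying the two pulses one after the other.
print(symmetric_point(), alternate_pulses(-0.5, 5000), alternate_pulses(0.6, 5000))
```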
【32】Precedence-Constrained Decision Trees and Coverings
Link: https://arxiv.org/abs/2602.21312
Authors: Michał Szyfelbein, Dariusz Dereniowski
Abstract: This work considers a number of optimization problems and the reductions between them. The two main problems we are interested in are Optimal Decision Tree and Set Cover. We study these two fundamental tasks under precedence constraints: if a test (or set) $X$ is a predecessor of $Y$, then in any feasible decision tree $X$ must be an ancestor of $Y$ (respectively, if $Y$ is added to the set cover, then $X$ must be added as well). For Optimal Decision Tree we consider two optimization criteria: worst-case identification time (the height of the tree) and average identification time. Similarly, for Set Cover we study two cost measures: the size of the cover and the average cover time. Our approach is to develop a number of algorithmic reductions, in which an approximation algorithm for one problem yields an approximation for another through black-box use of the former. En route, we introduce further optimization problems, either to complete the 'reduction landscape' or because they capture the essence of the combinatorial structure of our problems. The latter role is played by the problem of finding a maximum-density precedence-closed subfamily, where the density is defined as the ratio of the number of items the family covers to its size. In this way we obtain $\mathcal{O}^*(\sqrt{m})$-approximation algorithms for all of the aforementioned problems. The picture is complemented by a number of hardness reductions that yield $o(m^{1/12-\varepsilon})$-inapproximability results for the decision tree and covering problems. Besides giving a complete set of results for general precedence constraints, we also provide polylogarithmic approximation guarantees for the two most commonly studied and applicable precedence types, outforests and inforests. Through corresponding hardness results, we show these guarantees to be tight.
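The maximum-density precedence-closed subfamily problem can be pinned down on a toy instance. The brute-force sketch below only illustrates the definitions (the set names and contents are made up); it is exponential in the number of sets and is not the paper's $\mathcal{O}^*(\sqrt{m})$-approximation.

```python
from itertools import combinations


def is_closed(family, preds):
    """True if every set in `family` has all of its predecessors in `family`."""
    return all(p in family for s in family for p in preds.get(s, ()))


def max_density_closed(sets, preds):
    """Exhaustively find a non-empty precedence-closed subfamily maximizing
    |items covered| / |subfamily|. Illustration only: exponential time."""
    names = list(sets)
    best, best_density = None, -1.0
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            family = set(combo)
            if not is_closed(family, preds):
                continue
            covered = set().union(*(sets[s] for s in family))
            density = len(covered) / len(family)
            if density > best_density:
                best, best_density = family, density
    return best, best_density


# Hypothetical instance: B covers many items but requires its predecessor A.
sets = {"A": {1}, "B": {2, 3, 4, 5}, "C": {1, 2}}
preds = {"B": ["A"]}
print(max_density_closed(sets, preds))  # {'A', 'B'} covers 5 items with 2 sets
```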
【33】Robust AI Evaluation through Maximal Lotteries
Link: https://arxiv.org/abs/2602.21297
Authors: Hadi Khalaf, Serena L. Wang, Daniel Halpern, Itai Shapira, Flavio du Pin Calmon, Ariel D. Procaccia
Abstract: The standard way to evaluate language models on subjective tasks is through pairwise comparisons: an annotator chooses the "better" of two responses to a prompt. Leaderboards aggregate these comparisons into a single Bradley-Terry (BT) ranking, forcing heterogeneous preferences into a total order and violating basic social-choice desiderata. In contrast, social choice theory provides an alternative approach called maximal lotteries, which aggregates pairwise preferences without imposing any assumptions on their structure. However, we show that maximal lotteries are highly sensitive to preference heterogeneity and can favor models that severely underperform on specific tasks or user subpopulations. We introduce robust lotteries that optimize worst-case performance under plausible shifts in the preference data. On large-scale preference datasets, robust lotteries provide more reliable win-rate guarantees across the annotator distribution and recover a stable set of top-performing models. By moving from rankings to pluralistic sets of winners, robust lotteries offer a principled step toward an ecosystem of complementary AI systems that serve the full spectrum of human preferences.
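The defining property of a maximal lottery is easy to check: a lottery p over models is maximal when, against every single model b, its expected margin sum_a p[a] * M[a][b] is non-negative, where M is the skew-symmetric margin matrix. The sketch below builds margins from (winner, loser) pairs as per-pair win-share differences, which is an assumption about preprocessing, and only checks the condition; computing a maximal lottery requires solving a linear program, and the paper's robust lotteries are not reproduced.

```python
from collections import defaultdict


def margin_matrix(comparisons, models):
    """Skew-symmetric margins M[a][b] = share of a-vs-b comparisons won by a
    minus share won by b. `comparisons` is a list of (winner, loser) pairs."""
    wins = defaultdict(int)
    for winner, loser in comparisons:
        wins[(winner, loser)] += 1
    m = {a: {b: 0.0 for b in models} for a in models}
    for a in models:
        for b in models:
            total = wins[(a, b)] + wins[(b, a)]
            if a != b and total:
                m[a][b] = (wins[(a, b)] - wins[(b, a)]) / total
    return m


def is_maximal_lottery(p, m, tol=1e-9):
    """p (model -> probability) is maximal iff sum_a p[a] * M[a][b] >= 0 for all b."""
    return all(sum(p[a] * m[a][b] for a in m) >= -tol for b in m)
```

With a preference cycle A > B > C > A, the uniform lottery is maximal and no single model is; with a Condorcet winner, the point mass on that winner is maximal.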
【34】Make Every Draft Count: Hidden State based Speculative Decoding
Link: https://arxiv.org/abs/2602.21224
Authors: Yuetao Chen, Xuliang Wang, Xinzhou Zheng, Ming Li, Peng Wang, Hong Xu
Abstract: Speculative decoding has emerged as a pivotal technique for accelerating LLM inference: a lightweight draft model generates candidate tokens that are subsequently verified by the target model in parallel. However, while this paradigm successfully increases the arithmetic intensity of memory-bound inference, it causes significant compute inefficiency: the majority of draft tokens fail verification and are discarded, wasting computation. Motivated by the goal of recovering this wasted computation, we propose a novel system that transforms discarded drafts into reusable tokens. Our key insight is to perform autoregressive prediction at the hidden-state level and to postpone integrating token information until after the hidden states are generated, so that the draft hidden states are not contaminated by incorrect tokens, enabling hidden-state reuse. To implement such a system, we first introduce a draft model architecture based on autoregressive hidden states, which preserves richer semantics than token-based drafters and thereby facilitates draft repurposing. Second, we design an efficient token-information injection mechanism that leverages our specialized draft model to construct high-quality draft token trees and enables resampling tokens from verification failures. Third, we eliminate the overhead hidden in our design to further maximize hardware utilization. Extensive evaluations against various baselines demonstrate up to a 3.3x speedup over standard speculative decoding.
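The waste described above comes from the standard verification rule of speculative sampling: each draft token x is accepted with probability min(1, p(x)/q(x)), where q is the draft distribution and p the target distribution; at the first rejection a replacement is drawn from the normalized residual max(p - q, 0), and every later draft is discarded. A minimal token-level sketch of that baseline rule (not the paper's hidden-state method), with distributions as plain dicts:

```python
import random


def verify_drafts(draft_tokens, q_dists, p_dists, rng):
    """Standard speculative-sampling verification.

    draft_tokens[i] was sampled from the draft distribution q_dists[i];
    p_dists[i] is the target model's distribution at the same position.
    Returns (output_tokens, n_discarded_drafts). The usual extra token sampled
    from the target after full acceptance is omitted for brevity.
    """
    out = []
    for i, x in enumerate(draft_tokens):
        p, q = p_dists[i], q_dists[i]
        if rng.random() < min(1.0, p.get(x, 0.0) / q[x]):
            out.append(x)
            continue
        # Rejected: resample from the residual max(p - q, 0), renormalized.
        residual = {t: max(p.get(t, 0.0) - q.get(t, 0.0), 0.0) for t in p}
        tokens = list(residual)
        out.append(rng.choices(tokens, weights=[residual[t] for t in tokens])[0])
        # Every draft after the rejected one is thrown away: the wasted work.
        return out, len(draft_tokens) - i - 1
    return out, 0
```

When the target assigns zero probability to the first draft token, the whole remaining draft is lost even if later tokens would have been acceptable, which is the inefficiency the paper aims to recover.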
【35】Task-Aware LoRA Adapter Composition via Similarity Retrieval in Vector Databases
Link: https://arxiv.org/abs/2602.21222
Authors: Riya Adsul, Balachandra Devarangadi Sunil, Isha Nalawade, Sudharshan Govindan
Abstract: Parameter-efficient fine-tuning methods such as LoRA have enabled task-specific adaptation of large language models, but efficiently composing multiple specialized adapters for unseen tasks remains challenging. We present a novel framework for dynamic LoRA adapter composition that leverages similarity retrieval in vector databases to enable zero-shot generalization across diverse NLP tasks. Our approach constructs a task-aware vector database by embedding training examples from 22 datasets spanning commonsense reasoning, question answering, natural language inference, and sentiment analysis. At inference time, we retrieve the most similar training examples, compute task similarity distributions via nucleus sampling, and dynamically merge the relevant LoRA adapters using retrieval-weighted fusion strategies. We evaluate four merging methods (Linear, Concatenation, TIES, and Magnitude Prune), demonstrating that our dataset-centric retrieval approach often matches or exceeds the performance of individually fine-tuned task-specific adapters. Notably, Linear merging achieves 70.95% on PIQA and 77.62% on RTE, substantially outperforming single-task baselines (46% and 52%, respectively). Our framework requires no additional retriever training, operates with frozen embeddings, and enables efficient, interpretable adapter composition. These results suggest that retrieval-based dynamic merging offers a promising direction for scalable, parameter-efficient multitask learning without requiring full model retraining for each new task.
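One plausible reading of retrieval-weighted Linear merging, sketched under stated assumptions: cosine similarity over example embeddings, a temperature-scaled softmax over the top-k hits, aggregation of probability mass per task, and nucleus (top-p) truncation before renormalizing. The values of `k`, `top_p` and `temperature`, and the flattening of each LoRA delta into a list, are hypothetical choices, not the paper's settings.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def task_weights(query, index, k=5, top_p=0.9, temperature=0.1):
    """index: list of (embedding, task_name). Returns {task: weight} after top-k
    retrieval, softmax over similarities, and nucleus (top-p) truncation."""
    hits = sorted(((cosine(query, e), t) for e, t in index), reverse=True)[:k]
    exps = [math.exp(s / temperature) for s, _ in hits]
    z = sum(exps)
    mass = defaultdict(float)
    for (_, task), e in zip(hits, exps):
        mass[task] += e / z
    kept, cum = {}, 0.0
    for task, w in sorted(mass.items(), key=lambda kv: -kv[1]):
        kept[task] = w
        cum += w
        if cum >= top_p:
            break
    total = sum(kept.values())
    return {task: w / total for task, w in kept.items()}


def linear_merge(adapters, weights):
    """Weighted sum of flattened LoRA deltas: sum_t w_t * delta_t."""
    dim = len(next(iter(adapters.values())))
    merged = [0.0] * dim
    for task, w in weights.items():
        for i, x in enumerate(adapters[task]):
            merged[i] += w * x
    return merged
```

A query close to PIQA-like examples then yields weights concentrated on the PIQA adapter, and `linear_merge` blends adapter deltas in proportion to those weights.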
【36】Field-Theoretic Memory for AI Agents: Continuous Dynamics for Context Preservation
Link: https://arxiv.org/abs/2602.21220
Authors: Subhadip Mitra
Comments: 15 pages, 6 figures. Code: https://github.com/rotalabs/rotalabs-fieldmem
Abstract: We present a memory system for AI agents that treats stored information as continuous fields governed by partial differential equations rather than as discrete entries in a database. The approach draws on classical field theory: memories diffuse through semantic space, decay thermodynamically according to their importance, and interact through field coupling in multi-agent scenarios. We evaluate the system on two established long-context benchmarks: LoCoMo (ACL 2024), with 300-turn conversations across 35 sessions, and LongMemEval (ICLR 2025), which tests multi-session reasoning over 500+ turns. On LongMemEval, the field-theoretic approach achieves significant improvements: +116% F1 on multi-session reasoning (p < 0.01, d = 3.06), +43.8% on temporal reasoning (p < 0.001, d = 9.21), and +27.8% retrieval recall on knowledge updates (p < 0.001, d = 5.00). Multi-agent experiments show near-perfect collective intelligence (>99.8%) through field coupling. Code is available at github.com/rotalabs/rotalabs-fieldmem.
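The abstract does not give the governing equations, so the following is only one possible discretization, assuming a 1-D semantic axis, a diffusion-plus-decay PDE du/dt = D * d²u/dx² - (base_decay / importance) * u, and hypothetical parameter values. It reproduces the two behaviors described: activation spreads to neighboring positions, and less important entries fade faster.

```python
def diffuse_and_decay(field, importance, diffusion=0.1, base_decay=0.05, dt=0.1, dx=1.0):
    """One explicit-Euler step of du/dt = D * d2u/dx2 - (base_decay / importance) * u
    on a 1-D grid with zero-flux (reflecting) boundaries."""
    r = diffusion * dt / dx ** 2
    if r > 0.5:
        raise ValueError("explicit scheme unstable: need diffusion*dt/dx**2 <= 0.5")
    n = len(field)
    new_field = []
    for i, u in enumerate(field):
        left = field[i - 1] if i > 0 else u          # mirror the edge value
        right = field[i + 1] if i < n - 1 else u
        decay_rate = base_decay / importance[i]      # important entries decay slower
        new_field.append(u + r * (left - 2 * u + right) - dt * decay_rate * u)
    return new_field


# A single memory at the center spreads out; with base_decay=0 total mass is conserved.
field = [0.0, 0.0, 1.0, 0.0, 0.0]
for _ in range(10):
    field = diffuse_and_decay(field, [1.0] * 5, base_decay=0.0)
print([round(v, 3) for v in field], round(sum(field), 6))
```

The stability check reflects the usual constraint on explicit finite-difference schemes for diffusion; an implicit scheme would lift it at the cost of a linear solve per step.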
【37】Disaster Question Answering with LoRA Efficiency and Accurate End Position
Link: https://arxiv.org/abs/2602.21212
Authors: Takato Yasuno
Comments: 12 pages, 5 figures
Abstract: Natural disasters such as earthquakes, torrential rainfall, floods, and volcanic eruptions occur with extremely low frequency and affect limited geographic areas. When individuals face disaster situations, they often experience confusion and lack the domain-specific knowledge and experience needed to determine appropriate responses and actions. Although disaster information is continuously updated, obtaining relevant domain knowledge about natural disasters, and experiences similar to one's specific situation, is not guaranteed even when RAG search and large language models are used for inquiries. When disaster question answering includes hallucinations, artificial misinformation may spread and exacerbate confusion. This work introduces a disaster-focused question answering system based on Japanese disaster situations and response experiences. Using the cl-tohoku/bert-base-japanese-v3 + Bi-LSTM + Enhanced Position Heads architecture with LoRA efficiency optimization, we achieved 70.4% End Position accuracy with only 5.7% of the total parameters (6.7M/117M). Experimental results demonstrate that the combination of Japanese BERT-base optimization and Bi-LSTM contextual understanding achieves accuracy levels suitable for real disaster response scenarios, attaining a Span F1 score of 0.885. Future challenges include establishing natural disaster Q&A benchmark datasets, fine-tuning foundation models with disaster knowledge, developing lightweight and power-efficient edge AI disaster Q&A applications for situations with insufficient power and communication during disasters, and addressing disaster knowledge base updates and continual learning capabilities.
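The two reported metrics can be computed as follows. A minimal sketch assuming SQuAD-style token-overlap F1 and exact-match end-index accuracy; the abstract does not state the tokenization, so tokens are taken as given:

```python
from collections import Counter


def span_f1(pred_tokens, gold_tokens):
    """Token-overlap F1 between predicted and gold answer spans."""
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def end_position_accuracy(pred_ends, gold_ends):
    """Fraction of examples whose predicted end index equals the gold end index."""
    return sum(p == g for p, g in zip(pred_ends, gold_ends)) / len(gold_ends)
```

For instance, a prediction sharing two of its three tokens with a four-token gold span has precision 2/3 and recall 1/2, so its span F1 is 4/7.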
【38】Exploiting Low-Rank Structure in Max-K-Cut Problems
Link: https://arxiv.org/abs/2602.20376
Authors: Ria Stevens, Fangshuo Liao, Barbara Su, Jianqiang Li, Anastasios Kyrillidis
Abstract: We approach the Max-3-Cut problem through the lens of maximizing complex-valued quadratic forms and demonstrate that low-rank structure in the objective matrix can be exploited, leading to alternatives to classical semidefinite programming (SDP) relaxations and heuristic techniques. We propose an algorithm for maximizing these quadratic forms over a domain of size $K$ that enumerates and evaluates a set of $O\left(n^{2r-1}\right)$ candidate solutions, where $n$ is the dimension of the matrix and $r$ is the rank of an approximation of the objective. We prove that this candidate set is guaranteed to include the exact maximizer when $K=3$ (corresponding to Max-3-Cut) and the objective is low-rank, and we provide approximation guarantees when the objective is a perturbation of a low-rank matrix. This construction yields a family of novel, inherently parallelizable, and theoretically motivated algorithms for Max-3-Cut. Extensive experiments demonstrate that our approach performs comparably to existing algorithms across a wide range of graphs while being highly scalable.
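The complex-valued view rests on a simple identity: if each vertex label in {0, 1, 2} is mapped to a cube root of unity x_i = ω^label with ω = e^{2πi/3}, then the indicator that i and j are in different parts equals (2/3)(1 - Re(x_i * conj(x_j))). The sketch below checks this identity by brute force on a tiny graph; the paper's $O(n^{2r-1})$ low-rank candidate enumeration is not reproduced.

```python
import cmath
from itertools import product

OMEGA = cmath.exp(2j * cmath.pi / 3)  # primitive cube root of unity


def cut_value(weights, labels):
    """Total weight of edges whose endpoints receive different labels in {0, 1, 2}."""
    return sum(w for (i, j), w in weights.items() if labels[i] != labels[j])


def cut_via_complex(weights, labels):
    """Same quantity via x_i = OMEGA**label_i and the identity
    [i, j separated] = (2/3) * (1 - Re(x_i * conj(x_j)))."""
    x = [OMEGA ** label for label in labels]
    return sum(
        w * (2 / 3) * (1 - (x[i] * x[j].conjugate()).real)
        for (i, j), w in weights.items()
    )


def brute_force_max3cut(weights, n):
    """Exhaustive search over all 3**n labelings (tiny graphs only)."""
    return max(product(range(3), repeat=n), key=lambda lab: cut_value(weights, lab))


triangle = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0}
best = brute_force_max3cut(triangle, 3)
print(best, cut_value(triangle, best), round(cut_via_complex(triangle, best), 9))
```

Because the cut is an affine function of Re(x^H W x) under this encoding, maximizing the cut is equivalent to optimizing a complex quadratic form over the cube roots of unity, which is the setting the abstract describes.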
【39】Coarsening Bias from Variable Discretization in Causal Functionals
Link: https://arxiv.org/abs/2602.22083
Authors: Xiaxian Ou, Razieh Nabi
Abstract: A class of causal effect functionals requires integration over conditional densities of continuous variables, as in mediation effects and nonparametric identification in causal graphical models. Estimating such densities and evaluating the resulting integrals can be statistically and computationally demanding. A common workaround is to discretize the variable and replace integrals with finite sums. Although convenient, discretization alters the population-level functional and can induce non-negligible approximation bias, even under correct identification. Under smoothness conditions, we show that this coarsening bias is first-order in the bin width and arises at the level of the target functional, distinct from statistical estimation error. We propose a simple bias-reduced functional that evaluates the outcome regression at within-bin conditional means, eliminating the leading term and yielding a second-order approximation error. We derive plug-in and one-step estimators for the bias-reduced functional. Simulations demonstrate substantial bias reduction and near-nominal confidence interval coverage, even under coarse binning. Our results provide a simple framework for controlling the impact of variable discretization on parameter approximation and estimation.
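A toy numerical illustration of the bin-width orders, not the paper's estimator: take X with density 2x on [0, 1] and m(x) = x², so E[m(X)] = 1/2. The abstract does not say which point a naive coarsened functional uses; purely for illustration, the sketch plugs in each bin's lower edge and compares it with the within-bin conditional mean E[X | bin]. Halving the bin width roughly halves the first error and roughly quarters the second.

```python
def bin_stats(k, n_bins):
    """Lower edge, probability, and conditional mean of bin k under density 2x on [0, 1]."""
    a, b = k / n_bins, (k + 1) / n_bins
    prob = b ** 2 - a ** 2
    cond_mean = (2 / 3) * (b ** 3 - a ** 3) / prob
    return a, prob, cond_mean


def coarsened(n_bins, m, use_conditional_mean):
    """Sum over bins of P(bin) * m(representative point of the bin)."""
    total = 0.0
    for k in range(n_bins):
        a, prob, cond_mean = bin_stats(k, n_bins)
        total += prob * m(cond_mean if use_conditional_mean else a)
    return total


def square(x):
    return x * x


TRUE_VALUE = 0.5  # E[X^2] = integral of 2x^3 over [0, 1]
for n_bins in (10, 20):
    err_edge = abs(coarsened(n_bins, square, False) - TRUE_VALUE)
    err_cmean = abs(coarsened(n_bins, square, True) - TRUE_VALUE)
    print(n_bins, err_edge, err_cmean)
```

For this particular m the conditional-mean error equals the probability-weighted within-bin variance of X, which shrinks with the square of the bin width.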
【40】A Researcher's Guide to Empirical Risk Minimization
Link: https://arxiv.org/abs/2602.21501
Authors: Lars van der Laan
Abstract: This guide develops high-probability regret bounds for empirical risk minimization (ERM). The presentation is modular: we state broadly applicable guarantees under high-level conditions and give tools for verifying them for specific losses and function classes. We emphasize that many ERM rate derivations can be organized around a three-step recipe (a basic inequality, a uniform local concentration bound, and a fixed-point argument), which yields regret bounds in terms of a critical radius, defined via localized Rademacher complexity, under a mild Bernstein-type variance-risk condition. To make these bounds concrete, we upper-bound the critical radius using local maximal inequalities and metric-entropy integrals, recovering familiar rates for VC-subgraph, Sobolev/Hölder, and bounded-variation classes. We also review ERM with nuisance components (including weighted ERM and Neyman-orthogonal losses) as they arise in causal inference, missing data, and domain adaptation. Following the orthogonal learning framework, we highlight that these problems often admit regret-transfer bounds linking regret under an estimated loss to population regret under the target loss. These bounds typically decompose regret into (i) statistical error under the estimated (optimized) loss and (ii) approximation error due to nuisance estimation. Under sample splitting or cross-fitting, the first term can be controlled using standard fixed-loss ERM regret bounds, while the second term depends only on nuisance-estimation accuracy. We also treat the in-sample regime, where nuisances and the ERM are fit on the same data, deriving regret bounds and giving sufficient conditions for fast rates.
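The critical radius above is defined via localized Rademacher complexity. As a concrete anchor for the underlying quantity, the sketch below computes the ordinary (non-localized) empirical Rademacher complexity of a finite function class exactly, by averaging over all 2^n sign vectors. It only illustrates the definition and is feasible for tiny n.

```python
from itertools import product


def empirical_rademacher(function_values):
    """Exact empirical Rademacher complexity of a finite class on n points:
    E_sigma[ max_f (1/n) * sum_i sigma_i * f(x_i) ], averaged over all 2**n
    sign vectors. `function_values` holds one tuple (f(x_1), ..., f(x_n)) per f."""
    n = len(function_values[0])
    total = 0.0
    for sigma in product((-1, 1), repeat=n):
        total += max(sum(s * v for s, v in zip(sigma, f)) for f in function_values) / n
    return total / 2 ** n
```

A single fixed function has complexity zero, the pair {f, -f} with f = (1, 1) has complexity 1/2, and the class of all sign patterns on two points attains the maximum value 1.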
【41】Global Sequential Testing for Multi-Stream Auditing
Link: https://arxiv.org/abs/2602.21479
Authors: Beepul Bharti, Ambar Pal, Jeremias Sulam
Abstract: In many risk-sensitive areas, it is critical to continuously audit the performance of machine learning systems and to detect any unusual behavior quickly. This can be modeled as a sequential hypothesis testing problem with $k$ incoming streams of data and a global null hypothesis asserting that the system is working as expected across all $k$ streams. The standard global test employs a Bonferroni correction and has an expected stopping time bound of $O\left(\ln\frac{k}{\alpha}\right)$ when $k$ is large and the significance level of the test, $\alpha$, is small. In this work, we construct new sequential tests by merging test martingales, with different trade-offs in expected stopping time under different (sparse or dense) alternative hypotheses. We further derive a new, balanced test whose expected stopping time bound matches Bonferroni's in the sparse setting but naturally becomes $O\left(\frac{1}{k}\ln\frac{1}{\alpha}\right)$ under a dense alternative. We empirically demonstrate the effectiveness of our proposed tests on synthetic and real-world data.
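The sparse/dense trade-off can be seen by simulating simple likelihood-ratio test martingales on k Bernoulli streams and merging them in three ways. This is a toy sketch, not the paper's balanced test: the null p0 = 0.5, the fixed alternative p1 = 0.8 inside each likelihood ratio, and the merging rules are assumptions for illustration. Averaging test martingales that share a filtration yields another test martingale, whereas the product additionally needs the streams to be independent, which holds in this simulation.

```python
import math
import random


def log_lr(x, p0=0.5, p1=0.8):
    """Log-likelihood ratio of one Bernoulli outcome: alternative p1 vs null p0."""
    return math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))


def stopping_time(stream_probs, merge, alpha=0.05, max_steps=10_000, seed=0):
    """First step at which the merged test martingale crosses its threshold
    (Ville's inequality), or None if it never does within max_steps.

    merge = "bonferroni": reject when max_i E_i >= k / alpha
    merge = "average":    reject when mean_i E_i >= 1 / alpha
    merge = "product":    reject when prod_i E_i >= 1 / alpha
                          (valid here because streams are simulated independently)
    """
    rng = random.Random(seed)
    k = len(stream_probs)
    logs = [0.0] * k  # log E_i for each stream
    for t in range(1, max_steps + 1):
        for i, p in enumerate(stream_probs):
            logs[i] += log_lr(rng.random() < p)
        if merge == "bonferroni":
            crossed = max(logs) >= math.log(k / alpha)
        elif merge == "average":
            top = max(logs)  # log-mean-exp, computed stably
            log_mean = top + math.log(sum(math.exp(v - top) for v in logs) / k)
            crossed = log_mean >= math.log(1 / alpha)
        elif merge == "product":
            crossed = sum(logs) >= math.log(1 / alpha)
        else:
            raise ValueError(f"unknown merge rule: {merge}")
        if crossed:
            return t
    return None


# Dense alternative: every stream deviates, so the product accumulates evidence fastest.
dense = [0.8] * 10
print({rule: stopping_time(dense, rule) for rule in ("product", "average", "bonferroni")})
```

Under a sparse alternative (one deviating stream among many null streams), the product is dragged down by the null streams, while the Bonferroni and averaging rules still detect the deviation; this is the tension the balanced test is designed to resolve.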
【42】A Knowledge-Driven Approach to Music Segmentation, Music Source Separation and Cinematic Audio Source Separation
Link: https://arxiv.org/abs/2602.21476
Authors: Chun-wei Ho, Sabato Marco Siniscalchi, Kai Li, Chin-Hui Lee
Abstract: We propose a knowledge-driven, model-based approach to segmenting audio into single-category and mixed-category chunks, with applications to source separation. "Knowledge" here denotes information associated with the data, such as music scores. "Model" here refers to tools that can be used for audio segmentation and recognition, such as hidden Markov models. In contrast to conventional learning, which often relies on annotated data with given segment categories and their corresponding boundaries to guide the learning process, the proposed framework does not depend on any pre-segmented training data; it learns directly from the input audio and its related knowledge sources to build all necessary models autonomously. Evaluation on simulated data shows that score-guided learning achieves very good music segmentation and separation results. Tests on movie audio tracks for cinematic audio source separation also show that using sound-category knowledge yields better separation than data-driven techniques that do not use such information.
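HMM-based segmentation ultimately comes down to decoding the most likely hidden-state sequence. A generic log-domain Viterbi decoder is sketched below; the two states, the discrete observation symbols, and all probabilities in the usage note are hypothetical, and the paper's use of scores and other knowledge sources to build the models is not reproduced.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most likely hidden-state path for a discrete-observation HMM (log domain).

    log_start[s], log_trans[r][s] and log_emit[s][o] are log-probabilities.
    """
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for o in obs[1:]:
        scores, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[-1][r] + log_trans[r][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][o]
            ptr[s] = prev
        best.append(scores)
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for ptr in reversed(back[1:]):
        path.append(ptr[path[-1]])
    return path[::-1]
```

With sticky transitions (stay probability 0.9) and emissions that favor "one" in a hypothetical "single" state and "many" in a "mixed" state, the sequence one, one, one, many, many, many decodes as three "single" frames followed by three "mixed" frames.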
【43】Conditional neural control variates for variance reduction in Bayesian inverse problems
Link: https://arxiv.org/abs/2602.21357
Authors: Ali Siahkoohi, Hyunwoo Oh
Abstract: Bayesian inference for inverse problems involves computing expectations under posterior distributions (e.g., posterior means, variances, or predictive quantities), typically via Monte Carlo (MC) estimation. When the quantity of interest varies significantly under the posterior, accurate estimates demand many samples, a cost often prohibitive for problems constrained by partial differential equations (PDEs). To address this challenge, we introduce conditional neural control variates, a modular method that learns amortized control variates from joint model-data samples to reduce the variance of MC estimators. To scale to high-dimensional problems, we leverage Stein's identity to design an architecture based on an ensemble of hierarchical coupling layers with tractable Jacobian trace computation. Training requires (i) samples from the joint distribution of unknown parameters and observed data, and (ii) the posterior score function, which can be computed from physics-based likelihood evaluations, neural operator surrogates, or learned generative models such as conditional normalizing flows. Once trained, the control variates generalize across observations without retraining. We validate our approach on stylized and PDE-constrained Darcy flow inverse problems, demonstrating substantial variance reduction even when the analytical score is replaced by a learned surrogate.
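The building block behind Stein-based control variates is the identity E_p[φ'(X) + φ(X) * d/dx log p(X)] = 0 (under mild boundary conditions), which yields zero-mean control variates from the score alone. In the sketch below, φ is hand-picked rather than learned and the target is a 1-D standard normal, both assumptions for illustration; the paper instead learns amortized, observation-conditioned control variates with neural networks.

```python
import random
from statistics import fmean


def stein_cv(phi, dphi, score):
    """Zero-mean control variate h(x) = phi'(x) + phi(x) * score(x) (Stein's identity)."""
    return lambda x: dphi(x) + phi(x) * score(x)


def cv_estimate(f, h, samples):
    """Control-variate estimate of E[f(X)] using the variance-minimizing coefficient."""
    fs = [f(x) for x in samples]
    hs = [h(x) for x in samples]
    mf, mh = fmean(fs), fmean(hs)
    cov = fmean((a - mf) * (b - mh) for a, b in zip(fs, hs))
    var = fmean((b - mh) ** 2 for b in hs)
    c = cov / var
    return fmean(a - c * b for a, b in zip(fs, hs))


def f(x):
    return x + x * x  # E[f(X)] = 1 for X ~ N(0, 1)


# The score of N(0, 1) is -x; phi(x) = 1 + x is a hand-picked choice.
h = stein_cv(lambda x: 1 + x, lambda x: 1.0, lambda x: -x)
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(1_000)]
print(fmean(f(x) for x in xs), cv_estimate(f, h, xs))
```

With this φ the control variate equals 1 - f(x) exactly, so the controlled estimate returns 1 up to rounding while the plain MC mean fluctuates; a learned φ would only approximate such a perfect control.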
【44】Counterdiabatic Hamiltonian Monte Carlo
Link: https://arxiv.org/abs/2602.21272
Authors: Reuben Cohn-Gordon, Uroš Seljak, Dries Sels
Abstract: Hamiltonian Monte Carlo (HMC) is a state-of-the-art method for sampling from distributions with differentiable densities, but it can converge slowly on challenging multimodal problems. Running HMC with a time-varying Hamiltonian, which interpolates from an initial tractable distribution to the target of interest, can address this problem. Combined with a weighting scheme to eliminate bias, this can be viewed as a special case of Sequential Monte Carlo (SMC) sampling [doucet2001introduction]. However, this approach can be inefficient, since the interpolation between the initial and final distributions must proceed slowly. Inspired by [sels2017minimizing], where a learned counterdiabatic term added to the Hamiltonian allows efficient quantum state preparation, we propose Counterdiabatic Hamiltonian Monte Carlo (CHMC), which can be viewed as an SMC sampler with a more efficient kernel. We establish its relationship to recent proposals for accelerating gradient-based sampling with learned drift terms, and we demonstrate it on simple benchmark problems.
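For reference, the baseline HMC transition that CHMC builds on is sketched below for a 1-D target with unit mass: fresh Gaussian momentum, leapfrog integration of Hamilton's equations, and a Metropolis correction. The counterdiabatic term, the time-varying Hamiltonian, and the SMC weighting are not included, and the step size and number of leapfrog steps are hypothetical defaults.

```python
import math
import random


def leapfrog(x, p, grad_u, step, n_steps):
    """Leapfrog integration of H(x, p) = U(x) + p**2 / 2 (unit mass)."""
    p = p - 0.5 * step * grad_u(x)
    for i in range(n_steps):
        x = x + step * p
        if i < n_steps - 1:
            p = p - step * grad_u(x)
    p = p - 0.5 * step * grad_u(x)
    return x, p


def hmc_step(x, log_prob, grad_log_prob, rng, step=0.1, n_steps=20):
    """One standard HMC transition: fresh momentum, leapfrog, Metropolis correction."""
    p0 = rng.gauss(0.0, 1.0)
    x1, p1 = leapfrog(x, p0, lambda y: -grad_log_prob(y), step, n_steps)
    h0 = -log_prob(x) + 0.5 * p0 ** 2
    h1 = -log_prob(x1) + 0.5 * p1 ** 2
    accept_prob = math.exp(min(0.0, h0 - h1))
    return x1 if rng.random() < accept_prob else x


# Sampling a standard normal (log density up to a constant: -x**2 / 2).
rng = random.Random(1)
x, samples = 3.0, []
for _ in range(2000):
    x = hmc_step(x, lambda y: -0.5 * y * y, lambda y: -y, rng)
    samples.append(x)
print(sum(samples) / len(samples))
```

Leapfrog is used because it is time-reversible and volume-preserving, which keeps the Metropolis correction simple; on a multimodal target, however, this fixed-Hamiltonian kernel can mix slowly, which is the motivation for the annealed and counterdiabatic variants discussed above.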
Machine translation provided by Tencent Interactive Translation; for reference only.