Click "Read the original" to visit arxivdaily.com, covering CS | Physics | Math | Economics | Statistics | Finance | Biology | Electrical Engineering, with search, favorites, and more!
cs.LG: 170 papers today
Large language models (15 papers)
【1】Align Once, Benefit Multilingually: Enforcing Multilingual Consistency for LLM Safety Alignment
Link: https://arxiv.org/abs/2602.16660
Authors: Yuyan Bu, Xiaohao Liu, ZhaoXing Ren, Yaodong Yang, Juntao Dai
Note: Accepted by ICLR 2026
Abstract: The widespread deployment of large language models (LLMs) across linguistic communities necessitates reliable multilingual safety alignment. However, recent efforts to extend alignment to other languages often require substantial resources, either through large-scale, high-quality supervision in the target language or through pairwise alignment with high-resource languages, which limits scalability. In this work, we propose a resource-efficient method for improving multilingual safety alignment. We introduce a plug-and-play Multi-Lingual Consistency (MLC) loss that can be integrated into existing monolingual alignment pipelines. By improving collinearity between multilingual representation vectors, our method encourages directional consistency at the multilingual semantic level in a single update. This allows simultaneous alignment across multiple languages using only multilingual prompt variants without requiring additional response-level supervision in low-resource languages. We validate the proposed method across different model architectures and alignment paradigms, and demonstrate its effectiveness in enhancing multilingual safety with limited impact on general model utility. Further evaluation across languages and tasks indicates improved cross-lingual generalization, suggesting the proposed approach as a practical solution for multilingual consistency alignment under limited supervision.
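The abstract does not give the exact form of the MLC loss; as a rough illustration, a collinearity-style consistency penalty over multilingual representations of the same prompt might look like the following sketch (pure Python; the pairing against a pivot-language representation and the `1 - cosine` form are assumptions, not the paper's definition):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mlc_loss(reps):
    """Hypothetical multilingual-consistency penalty: average
    (1 - cosine) between each non-pivot representation and the
    pivot-language representation reps[0]. Zero when all
    representations point in the same direction."""
    pivot, others = reps[0], reps[1:]
    return sum(1.0 - cosine(pivot, r) for r in others) / len(others)

# Identical directions incur zero penalty; misaligned ones are penalized.
aligned = [[1.0, 0.0], [2.0, 0.0]]   # same direction, different scale
mixed   = [[1.0, 0.0], [0.0, 1.0]]   # orthogonal directions
print(mlc_loss(aligned))  # 0.0
print(mlc_loss(mixed))    # 1.0
```

In a real pipeline this term would be added to the monolingual alignment loss, with `reps` being hidden-state vectors of the same prompt in different languages.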
【2】Who can we trust? LLM-as-a-jury for Comparative Assessment
Link: https://arxiv.org/abs/2602.16610
Authors: Mengjie Qian, Guangzhi Sun, Mark J. F. Gales, Kate M. Knill
Abstract: Large language models (LLMs) are increasingly applied as automatic evaluators for natural language generation assessment, often using pairwise comparative judgements. Existing approaches typically rely on single judges or aggregate multiple judges assuming equal reliability. In practice, LLM judges vary substantially in performance across tasks and aspects, and their judgement probabilities may be biased and inconsistent. Furthermore, human-labelled supervision for judge calibration may be unavailable. We first empirically demonstrate that inconsistencies in LLM comparison probabilities exist and show that they limit the effectiveness of direct probability-based ranking. To address this, we study the LLM-as-a-jury setting and propose BT-sigma, a judge-aware extension of the Bradley-Terry model that introduces a discriminator parameter for each judge to jointly infer item rankings and judge reliability from pairwise comparisons alone. Experiments on benchmark NLG evaluation datasets show that BT-sigma consistently outperforms averaging-based aggregation methods, and that the learned discriminator strongly correlates with independent measures of the cycle consistency of LLM judgements. Further analysis reveals that BT-sigma can be interpreted as an unsupervised calibration mechanism that improves aggregation by modelling judge reliability.
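The BT-sigma model is described only at a high level; a minimal sketch of a judge-aware Bradley-Terry comparison probability with a per-judge discriminator could look like this (the sigmoid parameterization over a score difference is an assumption based on the standard Bradley-Terry form, not the paper's exact model):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bt_sigma_prob(s_i, s_k, sigma_j):
    """Hypothetical judge-aware Bradley-Terry: probability that judge j
    prefers item i (score s_i) over item k (score s_k), scaled by a
    per-judge discriminator sigma_j. sigma_j -> 0 makes the judge
    uninformative (p -> 0.5); large sigma_j sharpens the judgement."""
    return sigmoid(sigma_j * (s_i - s_k))

# A reliable judge (sigma=2) is more decisive than a noisy one (sigma=0.2).
print(bt_sigma_prob(1.0, 0.0, 2.0))   # ~0.88
print(bt_sigma_prob(1.0, 0.0, 0.2))   # ~0.55
print(bt_sigma_prob(1.0, 0.0, 0.0))   # 0.5 (uninformative judge)
```

Jointly fitting the item scores and the per-judge sigmas by maximum likelihood is what lets the model down-weight unreliable judges without labelled supervision.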
【3】From Growing to Looping: A Unified View of Iterative Computation in LLMs
Link: https://arxiv.org/abs/2602.16490
Authors: Ferdinand Kapl, Emmanouil Angelis, Kaitlin Maile, Johannes von Oswald, Stefan Bauer
Abstract: Looping (reusing a block of layers across depth) and depth growing (training shallow-to-deep models by duplicating middle layers) have both been linked to stronger reasoning, but their relationship remains unclear. We provide a mechanistic unification: looped and depth-grown models exhibit convergent depth-wise signatures, including increased reliance on late layers and recurring patterns aligned with the looped or grown block. These shared signatures support the view that their gains stem from a common form of iterative computation. Building on this connection, we show that the two techniques are adaptable and composable: applying inference-time looping to the middle blocks of a depth-grown model improves accuracy on some reasoning primitives by up to $2\times$, despite the model never being trained to loop. Both approaches also adapt better than the baseline when given more in-context examples or additional supervised fine-tuning data. Additionally, depth-grown models achieve the largest reasoning gains when using higher-quality, math-heavy cooldown mixtures, which can be further boosted by adapting a middle block to loop. Overall, our results position depth growth and looping as complementary, practical methods for inducing and scaling iterative computation to improve reasoning.
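Inference-time looping of a block can be illustrated with a toy forward pass (purely illustrative: real models re-apply transformer blocks to hidden states, not scalar functions):

```python
def forward(x, blocks, loop_block=None, n_loops=1):
    """Toy forward pass: each 'block' is a function; optionally re-apply
    one middle block n_loops times at inference time, mimicking looping
    the duplicated middle layers of a depth-grown model."""
    for i, block in enumerate(blocks):
        reps = n_loops if i == loop_block else 1
        for _ in range(reps):
            x = block(x)
    return x

# Three toy "layers"; looping the middle one iterates its computation.
blocks = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
print(forward(1, blocks))                            # ((1+1)*2)-3 = 1
print(forward(1, blocks, loop_block=1, n_loops=3))   # ((1+1)*2*2*2)-3 = 13
```

The point of the paper's experiment is that a depth-grown model tolerates exactly this kind of extra repetition of its middle block at inference, without ever being trained for it.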
【4】Intra-Fairness Dynamics: The Bias Spillover Effect in Targeted LLM Alignment
Link: https://arxiv.org/abs/2602.16438
Authors: Eva Paraschou, Line Harder Clemmensen, Sneha Das
Note: Submitted to the BiAlign CHI Workshop 2026
Abstract: Conventional large language model (LLM) fairness alignment largely focuses on mitigating bias along single sensitive attributes, overlooking fairness as an inherently multidimensional and context-specific value. This approach risks creating systems that achieve narrow fairness metrics while exacerbating disparities along untargeted attributes, a phenomenon known as bias spillover. While extensively studied in machine learning, bias spillover remains critically underexplored in LLM alignment. In this work, we investigate how targeted gender alignment affects fairness across nine sensitive attributes in three state-of-the-art LLMs (Mistral 7B, Llama 3.1 8B, Qwen 2.5 7B). Using Direct Preference Optimization and the BBQ benchmark, we evaluate fairness under ambiguous and disambiguated contexts. Our findings reveal noticeable bias spillover: while aggregate results show improvements, context-aware analysis exposes significant degradations in ambiguous contexts, particularly for physical appearance ($p < 0.001$ across all models), sexual orientation, and disability status. We demonstrate that improving fairness along one attribute can inadvertently worsen disparities in others under uncertainty, highlighting the necessity of context-aware, multi-attribute fairness evaluation frameworks.
【5】Helpful to a Fault: Measuring Illicit Assistance in Multi-Turn, Multilingual LLM Agents
Link: https://arxiv.org/abs/2602.16346
Authors: Nivya Talokar, Ayush K Tarun, Murari Mandal, Maksym Andriushchenko, Antoine Bosselut
Abstract: LLM-based agents execute real-world workflows via tools and memory. These affordances enable ill-intended adversaries to also use these agents to carry out complex misuse scenarios. Existing agent misuse benchmarks largely test single-prompt instructions, leaving a gap in measuring how agents end up helping with harmful or illegal tasks over multiple turns. We introduce STING (Sequential Testing of Illicit N-step Goal execution), an automated red-teaming framework that constructs a step-by-step illicit plan grounded in a benign persona and iteratively probes a target agent with adaptive follow-ups, using judge agents to track phase completion. We further introduce an analysis framework that models multi-turn red-teaming as a time-to-first-jailbreak random variable, enabling analysis tools like discovery curves, hazard-ratio attribution by attack language, and a new metric: Restricted Mean Jailbreak Discovery. Across AgentHarm scenarios, STING yields substantially higher illicit-task completion than single-turn prompting and chat-oriented multi-turn baselines adapted to tool-using agents. In multilingual evaluations across six non-English settings, we find that attack success and illicit-task completion do not consistently increase in lower-resource languages, diverging from common chatbot findings. Overall, STING provides a practical way to evaluate and stress-test agent misuse in realistic deployment settings, where interactions are inherently multi-turn and often multilingual.
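The exact definition of Restricted Mean Jailbreak Discovery is not given in the abstract; by analogy with the restricted mean survival time it is presumably a truncated average of the first-jailbreak time. A plausible sketch (the truncation-at-horizon treatment of never-jailbroken runs is an assumption):

```python
def restricted_mean_discovery(times, horizon):
    """Hypothetical 'Restricted Mean Jailbreak Discovery'-style summary:
    for first-jailbreak turns `times` (None = never jailbroken within
    the budget), average the discovery turn truncated at `horizon`.
    Lower values mean the agent is compromised earlier on average."""
    truncated = [min(t, horizon) if t is not None else horizon
                 for t in times]
    return sum(truncated) / len(truncated)

# Two attacks succeed at turns 2 and 5; one never succeeds in 10 turns.
print(restricted_mean_discovery([2, 5, None], horizon=10))  # (2+5+10)/3
```

Treating the censored (never-jailbroken) runs at the horizon rather than dropping them is what makes such a metric comparable across attack languages with different success rates.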
【6】HiPER: Hierarchical Reinforcement Learning with Explicit Credit Assignment for Large Language Model Agents
Link: https://arxiv.org/abs/2602.16165
Authors: Jiangweizhi Peng, Yuanxin Liu, Ruida Zhou, Charles Fleming, Zhaoran Wang, Alfredo Garcia, Mingyi Hong
Abstract: Training LLMs as interactive agents for multi-turn decision-making remains challenging, particularly in long-horizon tasks with sparse and delayed rewards, where agents must execute extended sequences of actions before receiving meaningful feedback. Most existing reinforcement learning (RL) approaches model LLM agents as flat policies operating at a single time scale, selecting one action at each turn. In sparse-reward settings, such flat policies must propagate credit across the entire trajectory without explicit temporal abstraction, which often leads to unstable optimization and inefficient credit assignment. We propose HiPER, a novel Hierarchical Plan-Execute RL framework that explicitly separates high-level planning from low-level execution. HiPER factorizes the policy into a high-level planner that proposes subgoals and a low-level executor that carries them out over multiple action steps. To align optimization with this structure, we introduce a key technique called hierarchical advantage estimation (HAE), which carefully assigns credit at both the planning and execution levels. By aggregating returns over the execution of each subgoal and coordinating updates across the two levels, HAE provides an unbiased gradient estimator and provably reduces variance compared to flat generalized advantage estimation. Empirically, HiPER achieves state-of-the-art performance on challenging interactive benchmarks, reaching 97.4% success on ALFWorld and 83.3% on WebShop with Qwen2.5-7B-Instruct (+6.6% and +8.3% over the best prior method), with especially large gains on long-horizon tasks requiring multiple dependent subtasks. These results highlight the importance of explicit hierarchical decomposition for scalable RL training of multi-turn LLM agents.
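The first ingredient of hierarchical credit assignment, aggregating step rewards into per-subgoal returns for the planner, can be sketched as follows (HAE's full variance-reduced estimator is not reproduced here; the `boundaries` interface marking where each subgoal's execution ends is hypothetical):

```python
def subgoal_returns(step_rewards, boundaries):
    """Aggregate per-step rewards into per-subgoal returns: the planner
    is credited at subgoal granularity while the executor still sees
    step granularity (simplified sketch of HAE's aggregation step)."""
    returns, start = [], 0
    for end in boundaries:
        returns.append(sum(step_rewards[start:end]))
        start = end
    return returns

# A 6-step trajectory split into two subgoals of 3 steps each.
rewards = [0.0, 0.0, 1.0, 0.0, 0.5, 0.5]
print(subgoal_returns(rewards, [3, 6]))   # [1.0, 1.0]
```

Crediting the planner with these aggregated returns, rather than with every raw step reward, is what shortens the horizon over which credit must propagate.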
【7】CLAA: Cross-Layer Attention Aggregation for Accelerating LLM Prefill
Link: https://arxiv.org/abs/2602.16054
Authors: Bradley McDanel, Steven Li, Harshit Khaitan
Note: 15 pages, 8 figures
Abstract: The prefill stage in long-context LLM inference remains a computational bottleneck. Recent token-ranking heuristics accelerate inference by selectively processing a subset of semantically relevant tokens. However, existing methods suffer from unstable token importance estimation, often varying between layers. Evaluating token-ranking quality independently from heuristic-specific architectures is challenging. To address this, we introduce an Answer-Informed Oracle, which defines ground-truth token importance by measuring attention from generated answers back to the prompt. This oracle reveals that existing heuristics exhibit high variance across layers: rankings can degrade sharply at specific layers, a failure mode invisible to end-to-end benchmarks. The diagnosis suggests a simple fix: aggregate scores across layers rather than relying on any single one. We implement this as Cross-Layer Attention Aggregation (CLAA), which closes the gap to the oracle upper bound and reduces Time-to-First-Token (TTFT) by up to 39% compared to the Full KV Cache baseline.
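The core aggregation idea, summing each token's importance over all layers instead of trusting any single layer, can be sketched in a few lines (the scoring itself, e.g. attention mass per token, is abstracted into the input lists):

```python
def aggregate_token_scores(per_layer_scores):
    """Cross-layer aggregation in the spirit of CLAA: sum each token's
    importance score over all layers, so one degraded layer cannot
    dominate the ranking (simplified sketch)."""
    n_tokens = len(per_layer_scores[0])
    return [sum(layer[t] for layer in per_layer_scores)
            for t in range(n_tokens)]

def top_k_tokens(scores, k):
    """Indices of the k highest-scoring tokens, for selective prefill."""
    return sorted(range(len(scores)),
                  key=lambda i: scores[i], reverse=True)[:k]

# Layer 1's ranking is degraded for token 0, but aggregation recovers it.
layers = [
    [0.9, 0.1, 0.5],   # layer 0
    [0.1, 0.2, 0.6],   # layer 1 (degraded ranking)
    [0.8, 0.1, 0.7],   # layer 2
]
agg = aggregate_token_scores(layers)   # roughly [1.8, 0.4, 1.8]
print(top_k_tokens(agg, 2))            # [0, 2]
```

A single-layer heuristic reading only layer 1 would have demoted token 0; the cross-layer sum keeps it in the selected subset.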
【8】Multi-Objective Alignment of Language Models for Personalized Psychotherapy
Link: https://arxiv.org/abs/2602.16053
Authors: Mehrab Beikzadeh, Yasaman Asadollah Salmanpour, Ashima Suvarna, Sriram Sankararaman, Matteo Malgaroli, Majid Sarrafzadeh, Saadia Gabriel
Abstract: Mental health disorders affect over 1 billion people worldwide, yet access to care remains limited by workforce shortages and cost constraints. While AI systems show therapeutic promise, current alignment approaches optimize objectives independently, failing to balance patient preferences with clinical safety. We survey 335 individuals with lived mental health experience to collect preference rankings across therapeutic dimensions, then develop a multi-objective alignment framework using direct preference optimization. We train reward models for six criteria -- empathy, safety, active listening, self-motivated change, trust/rapport, and patient autonomy -- and systematically compare multi-objective approaches against single-objective optimization, supervised fine-tuning, and parameter merging. Multi-objective DPO (MODPO) achieves superior balance (77.6% empathy, 62.6% safety) compared to single-objective optimization (93.6% empathy, 47.8% safety), and therapeutic criteria outperform general communication principles by 17.2%. Blinded clinician evaluation confirms MODPO is consistently preferred, with LLM-evaluator agreement comparable to inter-clinician reliability.
【9】ReLoop: Structured Modeling and Behavioral Verification for Reliable LLM-Based Optimization
Link: https://arxiv.org/abs/2602.15983
Authors: Junbo Jacob Lian, Yujun Sun, Huiling Chen, Chaoyu Zhang, Chung-Piaw Teo
Note: Code and benchmark: https://github.com/junbolian/ReLoop
Abstract: Large language models (LLMs) can translate natural language into optimization code, but silent failures pose a critical risk: code that executes and returns solver-feasible solutions may encode semantically incorrect formulations, creating a feasibility-correctness gap of up to 90 percentage points on compositional problems. We introduce ReLoop, addressing silent failures from two complementary directions. Structured generation decomposes code production into a four-stage reasoning chain (understand, formalize, synthesize, verify) that mirrors expert modeling practice, with explicit variable-type reasoning and self-verification to prevent formulation errors at their source. Behavioral verification detects errors that survive generation by testing whether the formulation responds correctly to solver-based parameter perturbation, without requiring ground truth -- an external semantic signal that bypasses the self-consistency problem inherent in LLM-based code review. The two mechanisms are complementary: structured generation dominates on complex compositional problems, while behavioral verification becomes the largest single contributor on problems with localized formulation defects. Together with execution recovery via IIS-enhanced diagnostics, ReLoop raises correctness from 22.6% to 31.1% and execution from 72.1% to 100.0% on the strongest model, with consistent gains across five models spanning three paradigms (foundation, SFT, RL) and three benchmarks. We additionally release RetailOpt-190, 190 compositional retail optimization scenarios targeting the multi-constraint interactions where LLMs most frequently fail.
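Behavioral verification can be illustrated with a closed-form stand-in for the solver: perturb one parameter, re-solve, and check that the objective moves in the expected direction, with no ground-truth solution needed. The `solve` function and parameter names below are hypothetical, not ReLoop's API:

```python
def solve(params):
    """Stand-in 'model + solver': production is bounded by capacity and
    demand; objective = units sold * unit profit (closed form)."""
    units = min(params["capacity"], params["demand"])
    return units * params["unit_profit"]

def behavioral_check(solve_fn, params, key, factor, expect):
    """Behavioral verification in the spirit of ReLoop: perturb one
    parameter by `factor`, re-solve, and test the direction of the
    objective change ('up', 'down', or 'same') against expectation."""
    base = solve_fn(params)
    perturbed = dict(params)
    perturbed[key] = perturbed[key] * factor
    new = solve_fn(perturbed)
    direction = "up" if new > base else "down" if new < base else "same"
    return direction == expect

params = {"capacity": 100, "demand": 80, "unit_profit": 3.0}
# Raising unit profit must raise the objective; adding slack capacity must not.
print(behavioral_check(solve, params, "unit_profit", 1.1, "up"))   # True
print(behavioral_check(solve, params, "capacity", 1.1, "same"))    # True
```

A formulation that, say, ignored the demand bound would fail the second check, even though it executes and returns a feasible-looking number.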
【10】Can Vision-Language Models See Squares? Text-Recognition Mediates Spatial Reasoning Across Three Model Families
Link: https://arxiv.org/abs/2602.15950
Authors: Yuval Levental
Note: 9 pages, 3 figures, 2 tables. Workshop-length paper
Abstract: We present a simple experiment that exposes a fundamental limitation in vision-language models (VLMs): the inability to accurately localize filled cells in binary grids when those cells lack textual identity. We generate fifteen 15x15 grids with varying density (10.7%-41.8% filled cells) and render each as two image types -- text symbols (. and #) and filled squares without gridlines -- then ask three frontier VLMs (Claude Opus, ChatGPT 5.2, and Gemini 3 Thinking) to transcribe them. In the text-symbol condition, Claude and ChatGPT achieve approximately 91% cell accuracy and 84% F1, while Gemini achieves 84% accuracy and 63% F1. In the filled-squares condition, all three models collapse to 60-73% accuracy and 29-39% F1. Critically, all conditions pass through the same visual encoder -- the text symbols are images, not tokenized text. The text-vs-squares F1 gap ranges from 34 to 54 points across models, demonstrating that VLMs behave as if they possess a high-fidelity text-recognition pathway for spatial reasoning that dramatically outperforms their native visual pathway. Each model exhibits a distinct failure mode in the squares condition -- systematic under-counting (Claude), massive over-counting (ChatGPT), and template hallucination (Gemini) -- but all share the same underlying deficit: severely degraded spatial localization for non-textual visual elements.
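The cell-accuracy and F1 metrics used for this comparison are standard; a small sketch over binary grids, treating filled cells as the positive class (matching the paper's setup):

```python
def cell_metrics(pred, truth):
    """Cell accuracy and F1 for binary grid transcription, with filled
    cells (1) as the positive class."""
    tp = fp = fn = correct = total = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            total += 1
            correct += (p == t)
            tp += (p == 1 and t == 1)
            fp += (p == 1 and t == 0)
            fn += (p == 0 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return correct / total, f1

truth = [[1, 0], [0, 1]]
pred  = [[1, 0], [1, 1]]  # one spurious filled cell (over-counting)
acc, f1 = cell_metrics(pred, truth)
print(acc)  # 0.75
print(f1)   # ~0.8
```

Note how a single over-counted cell on a sparse grid hurts F1 (via precision) more than raw accuracy, which is why the paper reports both.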
【11】Quality-constrained Entropy Maximization Policy Optimization for LLM Diversity
Link: https://arxiv.org/abs/2602.15894
Authors: Haihui Pan, Yuzhong Hong, Shaoke Lv, Junwei Bao, Hongfei Jiang, Yang Song
Abstract: Recent research indicates that while alignment methods significantly improve the quality of large language model (LLM) outputs, they simultaneously reduce the diversity of the models' outputs. Although some methods have been proposed to enhance LLM output diversity, they often come at the cost of reduced performance. In this work, we first theoretically demonstrate that the alignment task can be decomposed into two distributions: quality and diversity. To enhance the diversity of LLM outputs while ensuring quality, we propose Quality-constrained Entropy Maximization Policy Optimization (QEMPO). QEMPO aims to maximize the output entropy of the policy while ensuring output quality. By adding different constraints to QEMPO, we obtain different policies. To optimize policies, we propose both online and offline training methods. Experiments validate that QEMPO achieves performance comparable to or even better than RLHF while improving output diversity.
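The quality-constrained entropy-maximization objective can be illustrated with a toy discrete selection (QEMPO optimizes a policy directly; picking the max-entropy member of a fixed candidate set is only an illustration of the constrained objective):

```python
import math

def entropy(p):
    """Shannon entropy of a discrete distribution (natural log)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def expected_quality(p, q):
    """Expected quality of distribution p under per-output scores q."""
    return sum(pi * qi for pi, qi in zip(p, q))

def qempo_select(candidates, q, min_quality):
    """Toy quality-constrained entropy maximization: among candidate
    output distributions, return the one with maximum entropy whose
    expected quality meets the constraint."""
    feasible = [p for p in candidates if expected_quality(p, q) >= min_quality]
    return max(feasible, key=entropy)

q = [1.0, 0.9, 0.2]        # per-output quality scores
cands = [
    [1.0, 0.0, 0.0],       # greedy: high quality, zero entropy
    [0.5, 0.5, 0.0],       # diverse over the two good outputs
    [1/3, 1/3, 1/3],       # uniform: diverse but too low quality
]
print(qempo_select(cands, q, min_quality=0.9))  # [0.5, 0.5, 0.0]
```

The selected distribution spreads mass over both high-quality outputs: more entropy than greedy decoding, without violating the quality floor.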
【12】MARVL: Multi-Stage Guidance for Robotic Manipulation via Vision-Language Models
Link: https://arxiv.org/abs/2602.15872
Authors: Xunlan Zhou, Xuanlin Chen, Shaowei Zhang, Xiangkun Li, ShengHua Wan, Xiaohai Hu, Yuan Lei, Le Gan, De-chuan Zhan
Abstract: Designing dense reward functions is pivotal for efficient robotic Reinforcement Learning (RL). However, most dense rewards rely on manual engineering, which fundamentally limits the scalability and automation of reinforcement learning. While Vision-Language Models (VLMs) offer a promising path to reward design, naive VLM rewards often misalign with task progress, struggle with spatial grounding, and show limited understanding of task semantics. To address these issues, we propose MARVL (Multi-stAge guidance for Robotic manipulation via Vision-Language models). MARVL fine-tunes a VLM for spatial and semantic consistency and decomposes tasks into multi-stage subtasks with task direction projection for trajectory sensitivity. Empirically, MARVL significantly outperforms existing VLM-reward methods on the Meta-World benchmark, demonstrating superior sample efficiency and robustness on sparse-reward manipulation tasks.
【13】Do Personality Traits Interfere? Geometric Limitations of Steering in Large Language Models
Link: https://arxiv.org/abs/2602.15847
Authors: Pranav Bhandari, Usman Naseem, Mehwish Nasim
Abstract: Personality steering in large language models (LLMs) commonly relies on injecting trait-specific steering vectors, implicitly assuming that personality traits can be controlled independently. In this work, we examine whether this assumption holds by analysing the geometric relationships between Big Five personality steering directions. We study steering vectors extracted from two model families (LLaMA-3-8B and Mistral-8B) and apply a range of geometric conditioning schemes, from unconstrained directions to soft and hard orthonormalisation. Our results show that personality steering directions exhibit substantial geometric dependence: steering one trait consistently induces changes in others, even when linear overlap is explicitly removed. While hard orthonormalisation enforces geometric independence, it does not eliminate cross-trait behavioural effects and can reduce steering strength. These findings suggest that personality traits in LLMs occupy a slightly coupled subspace, limiting fully independent trait control.
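Hard orthonormalisation of steering directions can be sketched with classical Gram-Schmidt (a simplified stand-in; the paper's soft and hard conditioning schemes are not reproduced here):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def scale(u, c):
    return [a * c for a in u]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def norm(u):
    return math.sqrt(dot(u, u))

def gram_schmidt(vectors):
    """Hard orthonormalisation of steering directions: remove each
    vector's linear overlap with the earlier ones, then normalise."""
    basis = []
    for v in vectors:
        for b in basis:
            v = sub(v, scale(b, dot(v, b)))  # remove component along b
        basis.append(scale(v, 1.0 / norm(v)))
    return basis

# Two correlated "trait" directions become exactly orthogonal.
traits = [[1.0, 0.0], [1.0, 1.0]]
ortho = gram_schmidt(traits)
print(dot(ortho[0], ortho[1]))  # 0.0 (up to float error)
```

The paper's point is that even after such geometric decoupling, steering along one orthonormalised direction can still shift other traits behaviourally.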
【14】Empirical Cumulative Distribution Function Clustering for LLM-based Agent System Analysis
Link: https://arxiv.org/abs/2602.16131
Authors: Chihiro Watanabe, Jingyu Sun
Abstract: Large language models (LLMs) are increasingly used as agents to solve complex tasks such as question answering (QA), scientific debate, and software development. A standard evaluation procedure aggregates multiple responses from LLM agents into a single final answer, often via majority voting, and compares it against reference answers. However, this process can obscure the quality and distributional characteristics of the original responses. In this paper, we propose a novel evaluation framework based on the empirical cumulative distribution function (ECDF) of cosine similarities between generated responses and reference answers. This enables a more nuanced assessment of response quality beyond exact match metrics. To analyze the response distributions across different agent configurations, we further introduce a clustering method for ECDFs using their distances and the $k$-medoids algorithm. Our experiments on a QA dataset demonstrate that ECDFs can distinguish between agent settings with similar final accuracies but different quality distributions. The clustering analysis also reveals interpretable group structures in the responses, offering insights into the impact of temperature, persona, and question topics.
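The ECDF-based comparison can be sketched in a few lines: build each setting's ECDF of similarity scores and compare the curves on a shared grid. The L1-style ECDF distance below is an assumption (the paper does not specify its distance in the abstract); $k$-medoids would then cluster the resulting pairwise distance matrix:

```python
def ecdf(samples):
    """Return the empirical CDF of `samples` as a step function."""
    xs = sorted(samples)
    n = len(xs)
    def F(t):
        return sum(x <= t for x in xs) / n
    return F

def ecdf_distance(sa, sb, grid):
    """L1-style distance between two ECDFs on a shared evaluation grid,
    a simple stand-in for the distance fed to k-medoids."""
    Fa, Fb = ecdf(sa), ecdf(sb)
    return sum(abs(Fa(t) - Fb(t)) for t in grid) / len(grid)

# Similarity samples from two agent settings with the same mean quality
# but different spread are still distinguishable via their ECDFs.
tight  = [0.79, 0.80, 0.81]
spread = [0.60, 0.80, 1.00]
grid = [i / 20 for i in range(21)]   # evaluation points in [0, 1]
print(ecdf_distance(tight, tight, grid))       # 0.0
print(ecdf_distance(tight, spread, grid) > 0)  # True
```

Both settings have mean similarity 0.8 and could produce identical majority-vote accuracy, yet their ECDFs separate them, which is exactly the information aggregate accuracy discards.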
【15】MadEvolve: Evolutionary Optimization of Cosmological Algorithms with Large Language Models
Link: https://arxiv.org/abs/2602.15951
Authors: Tianyi Li, Shihui Zang, Moritz Münchmeyer
Abstract: We develop a general framework to discover scientific algorithms and apply it to three problems in computational cosmology. Our code, MadEvolve, is similar to Google's AlphaEvolve, but places a stronger emphasis on free parameters and their optimization. Our code starts with a baseline human algorithm implementation, and then optimizes its performance metrics by making iterative changes to its code. As a further convenient feature, MadEvolve automatically generates a report that compares the input algorithm with the evolved algorithm, describes the algorithmic innovations and lists the free parameters and their function. Our code supports both auto-differentiable, gradient-based parameter optimization and gradient-free optimization methods. We apply MadEvolve to the reconstruction of cosmological initial conditions, 21cm foreground contamination reconstruction and effective baryonic physics in N-body simulations. In all cases, we find substantial improvements over the base algorithm. We make MadEvolve and our three tasks publicly available at madevolve.org.
Graph-related (graph learning | graph neural networks | graph optimization, etc.) (6 papers)
【1】Hardware-accelerated graph neural networks: an alternative approach for neuromorphic event-based audio classification and keyword spotting on SoC FPGA
Link: https://arxiv.org/abs/2602.16442
Authors: Kamil Jeziorek, Piotr Wzorek, Krzysztof Blachut, Hiroshi Nakano, Manon Dampfhoffer, Thomas Mesquida, Hiroaki Nishi, Thomas Dalgaty, Tomasz Kryjak
Note: Under revision in TRETS Journal
Abstract: As the volume of data recorded by embedded edge sensors increases, particularly from neuromorphic devices producing discrete event streams, there is a growing need for hardware-aware neural architectures that enable efficient, low-latency, and energy-conscious local processing. We present an FPGA implementation of event-graph neural networks for audio processing. We utilise an artificial cochlea that converts time-series signals into sparse event data, reducing memory and computation costs. Our architecture was implemented on a SoC FPGA and evaluated on two open-source datasets. For the classification task, our baseline floating-point model achieves 92.7% accuracy on the SHD dataset - only 2.4% below the state of the art - while requiring over 10x and 67x fewer parameters. On SSC, our models achieve 66.9-71.0% accuracy. Compared to FPGA-based spiking neural networks, our quantised model reaches 92.3% accuracy, outperforming them by up to 19.3% while reducing resource usage and latency. For SSC, we report the first hardware-accelerated evaluation. We further demonstrate the first end-to-end FPGA implementation of event-audio keyword spotting, combining graph convolutional layers with recurrent sequence modelling. The system achieves up to 95% word-end detection accuracy, with a latency of only 10.53 microseconds and 1.18 W power consumption, establishing a strong benchmark for energy-efficient event-driven KWS.
【2】A Graph Meta-Network for Learning on Kolmogorov-Arnold Networks
Link: https://arxiv.org/abs/2602.16316
Authors: Guy Bar-Shalom, Ami Tavory, Itay Evron, Maya Bechler-Speicher, Ido Guy, Haggai Maron
Abstract: Weight-space models learn directly from the parameters of neural networks, enabling tasks such as predicting their accuracy on new datasets. Naive methods -- like applying MLPs to flattened parameters -- perform poorly, making the design of better weight-space architectures a central challenge. While prior work leveraged permutation symmetries in standard networks to guide such designs, no analogous analysis or tailored architecture yet exists for Kolmogorov-Arnold Networks (KANs). In this work, we show that KANs share the same permutation symmetries as MLPs, and propose the KAN-graph, a graph representation of their computation. Building on this, we develop WS-KAN, the first weight-space architecture that learns on KANs, which naturally accounts for their symmetry. We analyze WS-KAN's expressive power, showing it can replicate an input KAN's forward pass - a standard approach for assessing expressiveness in weight-space architectures. We construct a comprehensive "zoo" of trained KANs spanning diverse tasks, which we use as benchmarks to empirically evaluate WS-KAN. Across all tasks, WS-KAN consistently outperforms structure-agnostic baselines, often by a substantial margin. Our code is available at https://github.com/BarSGuy/KAN-Graph-Metanetwork.
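The permutation symmetry in question is easy to demonstrate on a tiny MLP: permuting the hidden units (rows of W1 and b1, plus the matching entries of W2) leaves the network function unchanged. The paper's contribution is showing that KANs admit the same symmetry; the sketch below only demonstrates the MLP case:

```python
def mlp_forward(x, W1, b1, W2):
    """Tiny one-hidden-layer MLP with ReLU (pure-Python lists)."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    return sum(w * hi for w, hi in zip(W2, h))

def permute_hidden(W1, b1, W2, perm):
    """Apply one permutation to the hidden rows of W1, the entries of
    b1, and the entries of W2: the function computed is unchanged --
    the symmetry a weight-space architecture must respect."""
    return ([W1[p] for p in perm],
            [b1[p] for p in perm],
            [W2[p] for p in perm])

W1, b1, W2 = [[1.0, -2.0], [0.5, 0.5]], [0.1, -0.1], [2.0, 3.0]
x = [0.3, 0.7]
pW1, pb1, pW2 = permute_hidden(W1, b1, W2, [1, 0])
print(mlp_forward(x, W1, b1, W2) == mlp_forward(x, pW1, pb1, pW2))  # True
```

A structure-agnostic model reading the flattened weights sees two different inputs here, which is exactly why symmetry-aware architectures like WS-KAN help.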
【3】Graph neural network for colliding particles with an application to sea ice floe modeling
标题:碰撞粒子的图神经网络及其在海冰浮冰建模中的应用
链接:https://arxiv.org/abs/2602.16213
作者:Ruibiao Zhu
摘要:本文介绍了一种使用图神经网络(GNN)进行海冰建模的新方法,该方法利用海冰的自然图结构:节点表示单个冰块,边建模包括碰撞在内的物理相互作用。作为基础性的第一步,这一概念在一维框架内展开。传统的数值方法虽然有效,但计算密集且可扩展性较差。通过利用GNN,所提出的模型(称为碰撞捕获网络,CN)集成了数据同化(DA)技术,以有效地学习和预测各种条件下的海冰动态。该方法使用有观测数据点和无观测数据点的合成数据进行了验证,结果表明该模型在不影响精度的情况下加速了轨迹模拟。这一进展为边缘冰区(MIZ)的预报提供了更高效的工具,并突显了将机器学习与数据同化相结合以实现更有效、更高效建模的潜力。
摘要:This paper introduces a novel approach to sea ice modeling using Graph Neural Networks (GNNs), utilizing the natural graph structure of sea ice, where nodes represent individual ice pieces, and edges model the physical interactions, including collisions. This concept is developed within a one-dimensional framework as a foundational step. Traditional numerical methods, while effective, are computationally intensive and less scalable. By utilizing GNNs, the proposed model, termed the Collision-captured Network (CN), integrates data assimilation (DA) techniques to effectively learn and predict sea ice dynamics under various conditions. The approach was validated using synthetic data, both with and without observed data points, and it was found that the model accelerates the simulation of trajectories without compromising accuracy. This advancement offers a more efficient tool for forecasting in marginal ice zones (MIZ) and highlights the potential of combining machine learning with data assimilation for more effective and efficient modeling.
【4】Investigating GNN Convergence on Large Randomly Generated Graphs with Realistic Node Feature Correlations
标题:研究具有现实节点特征相关性的大型随机生成图上的GNN收敛性
链接:https://arxiv.org/abs/2602.16145
作者:Mohammed Zain Ali Ahmed
备注:8 pages, 1 figure
摘要:已有一些研究分析图神经网络在大型随机图上的收敛行为。遗憾的是,这些研究大多没有对节点特征之间的相关性进行建模,而这种相关性在各种现实网络中自然存在。因此,由这种收敛行为推导出的GNN局限性,并不能真正反映GNN应用于现实图时的表达能力。在本文中,我们将介绍一种生成具有相关节点特征的随机图的新方法:以确保相邻节点之间存在相关性的方式对节点特征进行采样。作为采样方案选择的动机,我们将诉诸现实图所表现出的属性,特别是Barabási-Albert模型所刻画的属性。理论分析将有力地表明,在某些情况下可以避免收敛,我们将在用新方法生成的大型随机图上对此进行经验验证。观察到的发散行为提供了证据,表明GNN可能比早期研究所显示的更具表达力,尤其是在现实图上。
摘要:There are a number of existing studies analysing the convergence behaviour of graph neural networks on large random graphs. Unfortunately, the majority of these studies do not model correlations between node features, which would naturally exist in a variety of real-life networks. Consequently, the derived limitations of GNNs, resulting from such convergence behaviour, is not truly reflective of the expressive power of GNNs when applied to realistic graphs. In this paper, we will introduce a novel method to generate random graphs that have correlated node features. The node features will be sampled in such a manner to ensure correlation between neighbouring nodes. As motivation for our choice of sampling scheme, we will appeal to properties exhibited by real-life graphs, particularly properties that are captured by the Barabási-Albert model. A theoretical analysis will strongly indicate that convergence can be avoided in some cases, which we will empirically validate on large random graphs generated using our novel method. The observed divergent behaviour provides evidence that GNNs may be more expressive than initial studies would suggest, especially on realistic graphs.
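摘要未给出具体的采样方案;以下是一个假设性的NumPy示意,展示"在Barabási-Albert图上生成相关节点特征"这一思路:先用优先连接生成图,再对独立初始特征做一次闭邻域平均,使相邻节点的特征产生正相关。其中的平滑方式是为演示而选取的假设,并非论文方法。

```python
import numpy as np

rng = np.random.default_rng(8)

def barabasi_albert(n, m, rng):
    # 优先连接:新节点按度数比例连接m个互不相同的已有节点
    edges = []
    repeated = []                  # 每个节点按其度数重复出现,用于按度抽样
    targets = list(range(m))       # 初始m个节点
    for v in range(m, n):
        edges.extend((v, t) for t in targets)
        repeated.extend(targets)
        repeated.extend([v] * m)
        targets = []
        while len(targets) < m:
            t = repeated[rng.integers(len(repeated))]
            if t not in targets:
                targets.append(t)
    return edges

n, m = 300, 2
edges = barabasi_albert(n, m, rng)

# 相关节点特征:对独立特征做一次闭邻域平均
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
x0 = rng.normal(size=n)
x = (A @ x0 + x0) / (A.sum(axis=1) + 1)

u_idx, v_idx = map(np.array, zip(*edges))
corr = np.corrcoef(x[u_idx], x[v_idx])[0, 1]
print(f"edges={len(edges)}, neighbour feature corr={corr:.2f}")
```

相邻节点的特征相关系数明显为正,而独立采样的特征则没有这种结构。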
【5】Feature-based morphological analysis of shape graph data
标题:形状图数据的基于特征的形态学分析
链接:https://arxiv.org/abs/2602.16120
作者:Murad Hossen,Demetrio Labate,Nicolas Charon
摘要:本文介绍并演示了一种用于形状图数据集统计分析的计算管道,即嵌入2D或3D空间的几何网络。与传统的抽象图不同,我们的目的不仅是检索和区分数据的连接结构的变化,而且网络分支的几何差异。我们提出的方法依赖于提取一组专门策划且明确的拓扑、几何和方向特征,旨在满足关键的不变性属性。我们利用由此产生的功能表示的任务,如组比较,聚类和分类的形状图的队列。在几个真实世界的数据集,包括城市道路/街道网络,神经元的痕迹和星形胶质细胞成像的有效性进行评估。这些结果与几种替代方法(基于特征的和不基于特征的)进行了基准测试。
摘要:This paper introduces and demonstrates a computational pipeline for the statistical analysis of shape graph datasets, namely geometric networks embedded in 2D or 3D spaces. Unlike traditional abstract graphs, our purpose is not only to retrieve and distinguish variations in the connectivity structure of the data but also geometric differences of the network branches. Our proposed approach relies on the extraction of a specifically curated and explicit set of topological, geometric and directional features, designed to satisfy key invariance properties. We leverage the resulting feature representation for tasks such as group comparison, clustering and classification on cohorts of shape graphs. The effectiveness of this representation is evaluated on several real-world datasets including urban road/street networks, neuronal traces and astrocyte imaging. These results are benchmarked against several alternative methods, both feature-based and not.
【6】Edge-Local and Qubit-Efficient Quantum Graph Learning for the NISQ Era
标题:NISQ时代的边局部且量子比特高效的量子图学习
链接:https://arxiv.org/abs/2602.16018
作者:Armin Ahmadkhaniha,Jake Doliskani
摘要:图神经网络(GNN)是从图结构数据中学习表示的强大框架,但由于电路深度、多量子比特相互作用和量子比特可扩展性限制,它们在近期量子硬件上的直接实现仍然具有挑战性。在这项工作中,我们引入了一个完全量子的图卷积架构,该架构专为噪声中等规模量子(NISQ)机制下的无监督学习而设计。我们的方法将变分量子特征提取层与受量子交替算子拟设(QAOA)框架启发的、边局部且量子比特高效的量子消息传递机制相结合。与依赖全局操作或多控制幺正门的先前模型不同,我们的模型仅使用硬件原生的单量子比特和双量子比特门,将消息传递分解为沿图边的成对相互作用。对于具有$N$个节点和$n$量子比特特征寄存器的图,该设计将量子比特需求从$O(Nn)$降低到$O(n)$,使得无论图的大小如何,都能在当前量子设备上实现。我们使用Deep Graph Infomax目标训练模型,以执行无监督节点表示学习。在Cora引文网络和大规模基因组SNP数据集上的实验表明,我们的模型与先前的量子和混合方法相比仍然具有竞争力。
摘要:Graph neural networks (GNNs) are a powerful framework for learning representations from graph-structured data, but their direct implementation on near-term quantum hardware remains challenging due to circuit depth, multi-qubit interactions, and qubit scalability constraints. In this work, we introduce a fully quantum graph convolutional architecture designed explicitly for unsupervised learning in the noisy intermediate-scale quantum (NISQ) regime. Our approach combines a variational quantum feature extraction layer with an edge-local and qubit-efficient quantum message-passing mechanism inspired by the Quantum Alternating Operator Ansatz (QAOA) framework. Unlike prior models that rely on global operations or multi-controlled unitaries, our model decomposes message passing into pairwise interactions along graph edges using only hardware-native single- and two-qubit gates. This design reduces the qubit requirement from $O(Nn)$ to $O(n)$ for a graph with $N$ nodes and $n$-qubit feature registers, enabling implementation on current quantum devices regardless of graph size. We train the model using the Deep Graph Infomax objective to perform unsupervised node representation learning. Experiments on the Cora citation network and a large-scale genomic SNP dataset demonstrate that our model remains competitive with prior quantum and hybrid approaches.
Transformer(5篇)
【1】Explainable AI: Context-Aware Layer-Wise Integrated Gradients for Explaining Transformer Models
标题:可解释人工智能:用于解释Transformer模型的上下文感知分层集成梯度
链接:https://arxiv.org/abs/2602.16608
作者:Melkamu Abay Mersha,Jugal Kalita
摘要:Transformer模型实现了跨领域和跨任务的最新性能,但其深层表示使其预测难以解释。现有的可解释性方法依赖于最后一层的归因,只能孤立地捕获局部的标记级归因或全局的注意力模式,缺乏对标记间依赖关系和结构组件的上下文感知。它们也未能捕捉相关性如何在各层之间演变,以及结构组件如何塑造决策。为了解决这些局限性,我们提出了上下文感知分层集成梯度(CA-LIG)框架,这是一个统一的分层归因框架,计算每个Transformer块内的分层集成梯度,并将这些标记级归因与特定于类的注意力梯度融合。这种集成产生了带符号的、上下文敏感的归因图,能够捕获支持性和反对性证据,同时追踪相关性在Transformer各层中的层次化流动。我们在不同的任务、领域和Transformer模型家族中评估了CA-LIG框架,包括使用BERT进行情感分析以及长文档和多类文档分类,使用XLM-R和AfroLM在低资源语言环境中进行仇恨言论检测,以及使用掩码自编码器视觉Transformer模型进行图像分类。在所有任务和架构中,CA-LIG都提供了更忠实的归因,对上下文依赖表现出更强的敏感性,并产生比现有可解释性方法更清晰、语义上更连贯的可视化。这些结果表明,CA-LIG为Transformer决策提供了更全面、更上下文感知和更可靠的解释,同时提升了深度神经模型的实际可解释性和概念理解。
摘要:Transformer models achieve state-of-the-art performance across domains and tasks, yet their deeply layered representations make their predictions difficult to interpret. Existing explainability methods rely on final-layer attributions, capture either local token-level attributions or global attention patterns without unification, and lack context-awareness of inter-token dependencies and structural components. They also fail to capture how relevance evolves across layers and how structural components shape decision-making. To address these limitations, we proposed the Context-Aware Layer-wise Integrated Gradients (CA-LIG) Framework, a unified hierarchical attribution framework that computes layer-wise Integrated Gradients within each Transformer block and fuses these token-level attributions with class-specific attention gradients. This integration yields signed, context-sensitive attribution maps that capture supportive and opposing evidence while tracing the hierarchical flow of relevance through the Transformer layers. We evaluate the CA-LIG Framework across diverse tasks, domains, and transformer model families, including sentiment analysis and long and multi-class document classification with BERT, hate speech detection in a low-resource language setting with XLM-R and AfroLM, and image classification with Masked Autoencoder vision Transformer model. Across all tasks and architectures, CA-LIG provides more faithful attributions, shows stronger sensitivity to contextual dependencies, and produces clearer, more semantically coherent visualizations than established explainability methods. These results indicate that CA-LIG provides a more comprehensive, context-aware, and reliable explanation of Transformer decision-making, advancing both the practical interpretability and conceptual understanding of deep neural models.
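CA-LIG以集成梯度为基础构件。下面给出集成梯度本身的一个极简NumPy示意(使用一个假设的简单可微模型,并非CA-LIG的分层与注意力融合实现),并数值验证其完备性公理:各维归因之和等于f(x)与f(基线)之差。

```python
import numpy as np

rng = np.random.default_rng(7)

# 一个简单的可微标量模型及其解析梯度(演示假设)
w = rng.normal(size=8)

def f(x):
    s = x @ w
    return s + 0.5 * s ** 2

def grad_f(x):
    return (1.0 + x @ w) * w

def integrated_gradients(x, baseline, steps=100):
    # IG_i(x) = (x_i - x'_i) * ∫_0^1 ∂f/∂x_i(x' + α(x - x')) dα,中点黎曼和近似
    alphas = (np.arange(steps) + 0.5) / steps
    avg_grad = np.mean([grad_f(baseline + a * (x - baseline)) for a in alphas], axis=0)
    return (x - baseline) * avg_grad

x = rng.normal(size=8)
baseline = np.zeros(8)
attr = integrated_gradients(x, baseline)
# 完备性:sum_i IG_i ≈ f(x) - f(baseline)
print(abs(attr.sum() - (f(x) - f(baseline))) < 1e-8)
```

CA-LIG在此基础上逐块计算该量,并与类相关的注意力梯度融合;上面的示意只覆盖集成梯度这一步。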
【2】Synthesis and Verification of Transformer Programs
标题:Transformer程序的综合与验证
链接:https://arxiv.org/abs/2602.16473
作者:Hongjian Jiang,Matthew Hague,Philipp Rümmer,Anthony Widjaja Lin
摘要:C-RASP是一种简单的编程语言,最近被证明可以刻画Transformer可表达的概念。在本文中,我们开发了自动验证C-RASP程序的新算法技术。为此,我们将其与Lustre中同步数据流程序的验证建立了联系,这使我们能够利用基于高度优化SMT求解器的最先进模型检查器。我们的第二个贡献解决的是如何首先学习出一个C-RASP程序的问题:我们提供了一种使用局部搜索从示例中学习C-RASP的新算法。我们在文献中的C-RASP基准上证明了我们实现的有效性,特别是在以下应用中:(1)Transformer程序优化,以及(2)Transformer程序的约束学习(基于部分规范)。
摘要:C-RASP is a simple programming language that was recently shown to capture concepts expressible by transformers. In this paper, we develop new algorithmic techniques for automatically verifying C-RASPs. To this end, we establish a connection to the verification of synchronous dataflow programs in Lustre, which enables us to exploit state-of-the-art model checkers utilizing highly optimized SMT-solvers. Our second contribution addresses learning a C-RASP program in the first place. To this end, we provide a new algorithm for learning a C-RASP from examples using local search. We demonstrate efficacy of our implementation for benchmarks of C-RASPs in the literature, in particular in connection to the following applications: (1) transformer program optimization, and (2) constrained learning of transformer programs (based on a partial specification).
【3】BAT: Better Audio Transformer Guided by Convex Gated Probing
标题:BAT:由凸门控探测引导的更好的音频Transformer
链接:https://arxiv.org/abs/2602.16305
作者:Houtan Ghaffari,Lukas Rauch,Christoph Scholz,Paul Devos
摘要:探测(probing)在计算机视觉中被广泛采用,以忠实地评估自监督学习(SSL)嵌入,因为微调可能会歪曲其内在质量。相比之下,音频SSL模型仍然依赖于微调,因为简单的探测无法释放其全部潜力,并会在AudioSet上竞争SOTA时改变模型排名。因此,需要一种强大而高效的探测机制来引导音频SSL的发展轨迹走向可靠且可复现的方法。我们提出了凸门控探测(CGP),这是一种基于原型的方法,可以大幅缩小音频领域微调与探测之间的差距。CGP通过门控机制有效利用所有冻结层,并揭示任务相关潜在信息所在的位置。在CGP的指导下,我们重构了当前SOTA音频模型的整个SSL管道,这些模型沿用了先前SSL方法的遗留实现。通过改进数据预处理、模型架构和预训练配方,我们提出了Better Audio Transformer(BAT),并在音频基准上确立了新的SOTA。
摘要:Probing is widely adopted in computer vision to faithfully evaluate self-supervised learning (SSL) embeddings, as fine-tuning may misrepresent their inherent quality. In contrast, audio SSL models still rely on fine-tuning because simple probing fails to unlock their full potential and alters their rankings when competing for SOTA on AudioSet. Hence, a robust and efficient probing mechanism is required to guide the trajectory of audio SSL towards reliable and reproducible methods. We introduce Convex Gated Probing (CGP), a prototype-based method that drastically closes the gap between fine-tuning and probing in audio. CGP efficiently utilizes all frozen layers via a gating mechanism and exposes the location of latent task-relevant information. Guided by CGP, we rework the entire SSL pipeline of current SOTA audio models that use legacy implementations of prior SSL methods. By refining data preprocessing, model architecture, and pre-training recipe, we introduce Better Audio Transformer (BAT), and establish new SOTA on audio benchmarks.
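摘要未给出CGP的精确公式;下面是一个假设性的NumPy前向示意,体现其中两个要素:用softmax产生的凸权重对所有冻结层的嵌入做门控组合,再用类原型打分。所有维度与参数均为演示假设。

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

L, d, C, B = 12, 64, 10, 4                 # 层数、特征维、类别数、批大小
layer_feats = rng.normal(size=(B, L, d))   # 冻结骨干各层的嵌入(示意数据)
gate_logits = rng.normal(size=L)           # 可学习门控参数
prototypes = rng.normal(size=(C, d))       # 每类一个可学习原型

gate = softmax(gate_logits)                # 凸组合权重:非负且和为1
pooled = np.einsum("bld,l->bd", layer_feats, gate)

# 原型打分:到各类原型的负平方距离,softmax得到类别概率
logits = -((pooled[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
probs = softmax(logits)
print(gate.sum(), probs.shape)
```

门控权重同时可作为诊断信号:哪一层的权重大,任务相关信息就主要位于哪一层。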
【4】UCTECG-Net: Uncertainty-aware Convolution Transformer ECG Network for Arrhythmia Detection
标题:UCTECG-Net:用于心律失常检测的不确定性感知卷积Transformer心电图网络
链接:https://arxiv.org/abs/2602.16216
作者:Hamzeh Asgharnezhad,Pegah Tabarisaadi,Abbas Khosravi,Roohallah Alizadehsani,U. Rajendra Acharya
摘要:深度学习改进了自动心电图(ECG)分类,但对预测可靠性的有限了解阻碍了其在安全关键环境中的使用。本文提出了UCTECG-Net,一种不确定性感知的混合架构,结合一维卷积和Transformer编码器来联合处理原始ECG信号及其频谱图。在MIT-BIH心律失常和PTB诊断数据集上的评估表明,UCTECG-Net在准确率、精确率、召回率和F1分数方面优于LSTM、CNN1D和Transformer基线,在MIT-BIH上的准确率高达98.58%,在PTB上高达99.14%。为了评估预测可靠性,我们将三种不确定性量化方法(Monte Carlo Dropout、Deep Ensembles和Ensemble Monte Carlo Dropout)集成到所有模型中,并使用不确定性感知混淆矩阵和衍生指标分析它们的行为。结果表明,UCTECG-Net(尤其是与Deep Ensembles或EMCD结合时)提供了比竞争架构更可靠、对齐更好的不确定性估计,为风险感知的ECG决策支持提供了更坚实的基础。
摘要:Deep learning has improved automated electrocardiogram (ECG) classification, but limited insight into prediction reliability hinders its use in safety-critical settings. This paper proposes UCTECG-Net, an uncertainty-aware hybrid architecture that combines one-dimensional convolutions and Transformer encoders to process raw ECG signals and their spectrograms jointly. Evaluated on the MIT-BIH Arrhythmia and PTB Diagnostic datasets, UCTECG-Net outperforms LSTM, CNN1D, and Transformer baselines in terms of accuracy, precision, recall and F1 score, achieving up to 98.58% accuracy on MIT-BIH and 99.14% on PTB. To assess predictive reliability, we integrate three uncertainty quantification methods (Monte Carlo Dropout, Deep Ensembles, and Ensemble Monte Carlo Dropout) into all models and analyze their behavior using an uncertainty-aware confusion matrix and derived metrics. The results show that UCTECG-Net, particularly with Ensemble or EMCD, provides more reliable and better-aligned uncertainty estimates than competing architectures, offering a stronger basis for risk-aware ECG decision support.
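作为摘要中提到的Monte Carlo Dropout不确定性量化的通用示意(玩具两层网络,并非UCTECG-Net本身):推理时保持dropout开启,进行多次随机前向传播,用预测均值及其熵来刻画不确定性。

```python
import numpy as np

rng = np.random.default_rng(4)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# 玩具两层网络(随机权重,仅作演示)
W1 = rng.normal(size=(16, 32))
W2 = rng.normal(size=(32, 5)) * 0.3

def forward_mc(x, p_drop=0.5):
    # 推理阶段保持dropout开启,得到一次随机前向传播
    h = np.maximum(x @ W1, 0.0)
    mask = rng.random(h.shape) > p_drop
    h = h * mask / (1.0 - p_drop)
    return softmax(h @ W2)

x = rng.normal(size=(1, 16))
T = 200
samples = np.stack([forward_mc(x) for _ in range(T)])      # (T, 1, 5)
p_mean = samples.mean(axis=0)                              # 预测均值
entropy = float(-(p_mean * np.log(p_mean + 1e-12)).sum())  # 预测熵:总不确定性
print(p_mean.round(3), entropy)
```

Deep Ensembles与EMCD的思路类似,只是把多次随机前向换成(或叠加)多个独立训练的模型。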
【5】RefineFormer3D: Efficient 3D Medical Image Segmentation via Adaptive Multi-Scale Transformer with Cross Attention Fusion
标题:RefineFormer 3D:通过具有交叉注意融合的自适应多尺度Transformer进行高效的3D医学图像分割
链接:https://arxiv.org/abs/2602.16320
作者:Kavyansh Tyagi,Vishwas Rathi,Puneet Goyal
备注:13 pages, 5 figures, 7 tables
摘要:准确且计算高效的3D医学图像分割仍然是临床工作流程中的关键挑战。基于transformer的架构通常表现出卓越的全局上下文建模,但代价是过多的参数计数和内存需求,限制了其临床部署。我们提出了RefineFormer3D,一个轻量级的分层Transformer架构,平衡分割精度和计算效率的体积医学成像。该架构集成了三个关键组件:(i)基于GhostConv3D的补丁嵌入,用于以最小冗余进行高效特征提取,(ii)MixFFN3D模块,具有低秩投影和深度卷积,用于参数高效特征提取,以及(iii)交叉注意融合解码器,实现自适应多尺度跳过连接集成。RefineFormer3D仅包含294万个参数,远远少于当代基于transformer的方法。在ACDC和BraTS基准测试上进行的大量实验表明,RefineFormer3D分别获得了93.44%和85.9%的平均Dice分数,优于或匹配最先进的方法,同时需要的参数明显较少。此外,该模型实现了快速推理(GPU上每卷8.35 ms),内存需求低,支持在资源受限的临床环境中部署。这些结果确立了RefineFormer3D作为实际3D医学图像分割的有效和可扩展的解决方案。
摘要:Accurate and computationally efficient 3D medical image segmentation remains a critical challenge in clinical workflows. Transformer-based architectures often demonstrate superior global contextual modeling but at the expense of excessive parameter counts and memory demands, restricting their clinical deployment. We propose RefineFormer3D, a lightweight hierarchical transformer architecture that balances segmentation accuracy and computational efficiency for volumetric medical imaging. The architecture integrates three key components: (i) GhostConv3D-based patch embedding for efficient feature extraction with minimal redundancy, (ii) MixFFN3D module with low-rank projections and depthwise convolutions for parameter-efficient feature extraction, and (iii) a cross-attention fusion decoder enabling adaptive multi-scale skip connection integration. RefineFormer3D contains only 2.94M parameters, substantially fewer than contemporary transformer-based methods. Extensive experiments on ACDC and BraTS benchmarks demonstrate that RefineFormer3D achieves 93.44% and 85.9% average Dice scores respectively, outperforming or matching state-of-the-art methods while requiring significantly fewer parameters. Furthermore, the model achieves fast inference (8.35 ms per volume on GPU) with low memory requirements, supporting deployment in resource-constrained clinical environments. These results establish RefineFormer3D as an effective and scalable solution for practical 3D medical image segmentation.
GAN|对抗|攻击|生成相关(4篇)
【1】Sequential Membership Inference Attacks
标题:顺序会员推断攻击
链接:https://arxiv.org/abs/2602.16596
作者:Thomas Michel,Debabrota Basu,Emilie Kaufmann
备注:27 pages, 10 figures
摘要:现代AI模型不是静态的,它们在生命周期中会经历多次更新。因此,利用模型动态来构造更强的成员推断(MI)攻击和更严格的隐私审计是当下的重要问题。虽然已有文献经验性地表明,使用一系列模型更新可以增强MI攻击的威力,但对“最优”MI攻击的严格分析仍局限于具有无限样本的静态模型。为此,我们开发了一种“最优”MI攻击SeMI*,它利用模型更新序列来识别在某个更新步骤插入的目标样本是否存在。针对经验均值计算,我们推导出SeMI*在访问有限数量样本(有或没有隐私保护)时的最优功效。我们的结果可以恢复现有的渐近分析。我们观察到,访问模型序列可以避免MI信号被稀释——这与针对最终模型的现有攻击不同,在后者中,MI信号会随着训练数据的积累而消失。此外,攻击者可以使用SeMI*来调整插入时间和金丝雀样本,以产生更严格的隐私审计。最后,我们在多种数据分布以及使用DP-SGD训练或微调的模型上进行实验,证明SeMI*的实用变体比基线带来更严格的隐私审计。
摘要:Modern AI models are not static. They go through multiple updates in their lifecycles. Thus, exploiting the model dynamics to create stronger Membership Inference (MI) attacks and tighter privacy audits are timely questions. Though the literature empirically shows that using a sequence of model updates can increase the power of MI attacks, rigorous analysis of the `optimal' MI attacks is limited to static models with infinite samples. Hence, we develop an `optimal' MI attack, SeMI*, that uses the sequence of model updates to identify the presence of a target inserted at a certain update step. For the empirical mean computation, we derive the optimal power of SeMI*, while accessing a finite number of samples with or without privacy. Our results retrieve the existing asymptotic analysis. We observe that having access to the model sequence avoids the dilution of MI signals unlike the existing attacks on the final model, where the MI signal vanishes as training data accumulates. Furthermore, an adversary can use SeMI* to tune both the insertion time and the canary to yield tighter privacy audits. Finally, we conduct experiments across data distributions and models trained or fine-tuned with DP-SGD demonstrating that practical variants of SeMI* lead to tighter privacy audits than the baselines.
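SeMI*的最优检验形式未在摘要中给出;下面用一个玩具模拟说明"利用模型更新序列"为何有效:若目标样本在第t_ins步被插入,成员样本的损失轨迹会在该步之后下降,聚合插入前后的平均损失之差即可得到一个简单的序列式MI分数。所有数值均为演示假设,并非论文的最优检验。

```python
import numpy as np

rng = np.random.default_rng(5)

T, t_ins = 10, 4          # 模型更新次数与目标插入步

def loss_trajectory(is_member):
    # 玩具模拟:成员样本在插入步之后损失下降,非成员保持不变
    traj = 1.0 + 0.05 * rng.normal(size=T)
    if is_member:
        traj[t_ins:] -= 0.5
    return traj

def seq_mi_score(traj):
    # 聚合整个检查点序列:插入前后平均损失之差
    return traj[:t_ins].mean() - traj[t_ins:].mean()

members = np.array([seq_mi_score(loss_trajectory(True)) for _ in range(200)])
nonmembers = np.array([seq_mi_score(loss_trajectory(False)) for _ in range(200)])

thr = 0.25                # 简单阈值判别
tpr = float((members > thr).mean())
fpr = float((nonmembers > thr).mean())
print(f"TPR={tpr:.2f}  FPR={fpr:.2f}")
```

仅看最终模型时,插入早期的信号会被后续训练稀释;整条轨迹则保留了插入步前后的对比。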
【2】RoboGene: Boosting VLA Pre-training via Diversity-Driven Agentic Framework for Real-World Task Generation
标题:RoboGene:通过面向现实世界任务生成的多样性驱动智能体框架增强VLA预训练
链接:https://arxiv.org/abs/2602.16444
作者:Yixue Zhang,Kun Wu,Zhi Gao,Zhen Zhao,Pei Ren,Zhiyuan Xu,Fei Liao,Xinhua Wang,Shichao Fan,Di Wu,Qiuxuan Feng,Meng Li,Zhengping Che,Chang Liu,Jian Tang
摘要:对通用机器人操作的追求受到多样化现实世界交互数据稀缺的阻碍。与视觉或语言领域从网络收集数据不同,机器人数据收集是一个主动过程,会产生高昂的物理成本。因此,最大化数据价值的自动化任务管理仍然是一个关键但尚未充分探索的挑战。现有的手动方法不可扩展,且偏向于常见任务,而现成的基础模型往往会幻觉出物理上不可行的指令。为了解决这个问题,我们引入了RoboGene,一个智能体框架,旨在自动生成覆盖单臂、双臂和移动机器人的多样化、物理上合理的操作任务。RoboGene集成了三个核心组件:用于广泛任务覆盖的多样性驱动采样、用于强制执行物理约束的自我反思机制,以及用于持续改进的人在回路细化。我们进行了广泛的定量分析和大规模真实世界实验,收集了包含1.8万条轨迹的数据集,并引入了新的指标来评估任务质量、可行性和多样性。结果表明,RoboGene显著优于最先进的基础模型(例如GPT-4o、Gemini 2.5 Pro)。此外,真实世界实验表明,使用RoboGene预训练的VLA模型实现了更高的成功率和更优的泛化能力,强调了高质量任务生成的重要性。我们的项目可在https://robogene-boost-vla.github.io上找到。
摘要:The pursuit of general-purpose robotic manipulation is hindered by the scarcity of diverse, real-world interaction data. Unlike data collection from web in vision or language, robotic data collection is an active process incurring prohibitive physical costs. Consequently, automated task curation to maximize data value remains a critical yet under-explored challenge. Existing manual methods are unscalable and biased toward common tasks, while off-the-shelf foundation models often hallucinate physically infeasible instructions. To address this, we introduce RoboGene, an agentic framework designed to automate the generation of diverse, physically plausible manipulation tasks across single-arm, dual-arm, and mobile robots. RoboGene integrates three core components: diversity-driven sampling for broad task coverage, self-reflection mechanisms to enforce physical constraints, and human-in-the-loop refinement for continuous improvement. We conduct extensive quantitative analysis and large-scale real-world experiments, collecting datasets of 18k trajectories and introducing novel metrics to assess task quality, feasibility, and diversity. Results demonstrate that RoboGene significantly outperforms state-of-the-art foundation models (e.g., GPT-4o, Gemini 2.5 Pro). Furthermore, real-world experiments show that VLA models pre-trained with RoboGene achieve higher success rates and superior generalization, underscoring the importance of high-quality task generation. Our project is available at https://robogene-boost-vla.github.io.
【3】Discrete Stochastic Localization for Non-autoregressive Generation
标题:非自回归生成的离散随机局部化
链接:https://arxiv.org/abs/2602.16169
作者:Yunshu Wu,Jiayi Cheng,Partha Thakuria,Rob Brekelmans,Evangelos E. Papalexakis,Greg Ver Steeg
摘要:非自回归(NAR)生成通过并行预测多个标记来降低解码延迟,但迭代细化通常会在自生成草稿下遭受误差累积和分布偏移。掩码扩散语言模型(MDLM)及其重掩码采样器(例如ReMDM)可以被视为现代的NAR迭代细化,其中生成过程反复修改部分观察到的草稿。在这项工作中,我们表明,仅靠训练本身就可以大幅提高MDLM/ReMDM采样的步数效率。我们提出了DSL(离散随机局部化),它在一个连续的损坏水平区间上训练单个对信噪比不变的去噪器,在一个扩散Transformer内桥接中间草稿噪声和掩码式端点损坏。在OpenWebText上,DSL微调在低步数预算下带来较大的MAUVE增益,以约4倍更少的去噪器评估次数超过MDLM+ReMDM基线,并在高预算下达到自回归质量。分析表明,其自校正和不确定性校准得到改善,使重掩码的计算效率显著提高。
摘要:Non-autoregressive (NAR) generation reduces decoding latency by predicting many tokens in parallel, but iterative refinement often suffers from error accumulation and distribution shift under self-generated drafts. Masked diffusion language models (MDLMs) and their remasking samplers (e.g., ReMDM) can be viewed as modern NAR iterative refinement, where generation repeatedly revises a partially observed draft. In this work we show that training alone can substantially improve the step-efficiency of MDLM/ReMDM sampling. We propose DSL (Discrete Stochastic Localization), which trains a single SNR-invariant denoiser across a continuum of corruption levels, bridging intermediate draft noise and mask-style endpoint corruption within one Diffusion Transformer. On OpenWebText, DSL fine-tuning yields large MAUVE gains at low step budgets, surpassing the MDLM+ReMDM baseline with ~4x fewer denoiser evaluations, and matches autoregressive quality at high budgets. Analyses show improved self-correction and uncertainty calibration, making remasking markedly more compute-efficient.
【4】Visual Memory Injection Attacks for Multi-Turn Conversations
标题:针对多回合对话的视觉记忆注入攻击
链接:https://arxiv.org/abs/2602.15927
作者:Christian Schlarmann,Matthias Hein
摘要:生成式大型视觉语言模型(LVLM)最近取得了令人印象深刻的性能提升,其用户群正在迅速增长。然而,LVLM的安全性,特别是在长上下文多轮对话环境中,在很大程度上仍未得到充分研究。在本文中,我们考虑一个现实场景:攻击者将经过操纵的图像上传到网络/社交媒体,一名善意用户下载此图像并将其用作LVLM的输入。我们新颖的隐蔽视觉记忆注入(VMI)攻击的设计是:在正常提示下,LVLM表现出正常行为,但一旦用户给出触发提示,LVLM就会输出特定的预设目标消息来操纵用户,例如用于对抗性营销或政治说服。与以前专注于单轮攻击的工作相比,VMI即使在与用户进行长时间的多轮对话后仍然有效。我们在最近的几个开放权重LVLM上演示了我们的攻击。本文由此表明,在多轮对话设置中,利用扰动图像大规模操纵用户是可行的,这呼吁LVLM对此类攻击具备更好的鲁棒性。我们在https://github.com/chs20/visual-memory-injection上发布了源代码。
摘要:Generative large vision-language models (LVLMs) have recently achieved impressive performance gains, and their user base is growing rapidly. However, the security of LVLMs, in particular in a long-context multi-turn setting, is largely underexplored. In this paper, we consider the realistic scenario in which an attacker uploads a manipulated image to the web/social media. A benign user downloads this image and uses it as input to the LVLM. Our novel stealthy Visual Memory Injection (VMI) attack is designed such that on normal prompts the LVLM exhibits nominal behavior, but once the user gives a triggering prompt, the LVLM outputs a specific prescribed target message to manipulate the user, e.g. for adversarial marketing or political persuasion. Compared to previous work that focused on single-turn attacks, VMI is effective even after a long multi-turn conversation with the user. We demonstrate our attack on several recent open-weight LVLMs. This article thereby shows that large-scale manipulation of users is feasible with perturbed images in multi-turn conversation settings, calling for better robustness of LVLMs against these attacks. We release the source code at https://github.com/chs20/visual-memory-injection
半/弱/无/有监督|不确定性|主动学习(2篇)
【1】Geometry-Aware Uncertainty Quantification via Conformal Prediction on Manifolds
标题:通过流形上的共形预测实现几何感知的不确定性量化
链接:https://arxiv.org/abs/2602.16015
作者:Marzieh Amiri Shahbazi,Ali Baheri
摘要:共形预测为回归提供了无分布假设的覆盖保证;然而,现有方法假设输出空间是欧几里得的,当响应位于黎曼流形上时,会产生校准不良的预测区域。我们提出了自适应测地线共形预测,该框架用测地线不一致性分数取代欧氏残差,并通过交叉验证得到的难度估计器对其进行归一化,以处理异方差噪声。由此产生的预测区域(球面上的测地线帽)具有与位置无关的面积,并能根据局部预测难度调整其大小,从而产生比非自适应替代方案更均匀的条件覆盖。在一个具有强异方差性的合成球面实验以及源自IGRF-14卫星数据的真实世界地磁场预测任务中,自适应方法显著降低了条件覆盖的变化,并将最坏情况覆盖率提升到更接近标称水平,而基于坐标的基线由于坐标卡(chart)失真而浪费了很大一部分覆盖面积。
摘要:Conformal prediction provides distribution-free coverage guarantees for regression; yet existing methods assume Euclidean output spaces and produce prediction regions that are poorly calibrated when responses lie on Riemannian manifolds. We propose adaptive geodesic conformal prediction, a framework that replaces Euclidean residuals with geodesic nonconformity scores and normalizes them by a cross-validated difficulty estimator to handle heteroscedastic noise. The resulting prediction regions, geodesic caps on the sphere, have position-independent area and adapt their size to local prediction difficulty, yielding substantially more uniform conditional coverage than non-adaptive alternatives. In a synthetic sphere experiment with strong heteroscedasticity and a real-world geomagnetic field forecasting task derived from IGRF-14 satellite data, the adaptive method markedly reduces conditional coverage variability and raises worst-case coverage much closer to the nominal level, while coordinate-based baselines waste a large fraction of coverage area due to chart distortion.
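该框架的核心思想可以用如下极简NumPy示意表达(省略了论文中的难度归一化,仅演示分裂共形校准;数据为合成假设):以测地线距离为不一致性分数,取保序分位数得到测地线帽半径,并在测试集上检查覆盖率。

```python
import numpy as np

rng = np.random.default_rng(0)

def random_sphere_points(n):
    x = rng.normal(size=(n, 3))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def noisy_copy(points, scale=0.1):
    # 在预测方向附近加噪并重新投影回球面,模拟真实响应
    y = points + rng.normal(scale=scale, size=points.shape)
    return y / np.linalg.norm(y, axis=1, keepdims=True)

def geodesic_distance(u, v):
    # 单位球面上的测地线距离(弧度)
    return np.arccos(np.clip(np.sum(u * v, axis=-1), -1.0, 1.0))

n_cal, n_test, alpha = 500, 500, 0.1
pred_cal = random_sphere_points(n_cal)
y_cal = noisy_copy(pred_cal)

# 分裂共形校准:测地线不一致性分数的保序分位数
scores = geodesic_distance(pred_cal, y_cal)
k = int(np.ceil((n_cal + 1) * (1 - alpha)))
q = np.sort(scores)[k - 1]   # 预测区域 = 以预测点为中心、半径为q的测地线帽

pred_test = random_sphere_points(n_test)
y_test = noisy_copy(pred_test)
coverage = float(np.mean(geodesic_distance(pred_test, y_test) <= q))
print(f"cap radius q={q:.3f} rad, empirical coverage={coverage:.3f} (nominal 0.9)")
```

论文的自适应版本会再用难度估计器对分数归一化,使帽的半径随局部预测难度变化。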
【2】Adaptive Semi-Supervised Training of P300 ERP-BCI Speller System with Minimum Calibration Effort
标题:以最小的校准工作量对P300 ERP-BCI拼写系统进行自适应半监督训练
链接:https://arxiv.org/abs/2602.15955
作者:Shumeng Chen,Jane E. Huggins,Tianwen Ma
备注:8 pages, 8 figures
摘要:基于P300 ERP的脑机接口(BCI)拼写器是一种辅助交流工具。它搜索由目标刺激引起的P300事件相关电位(ERP),将其与嵌入在脑电图(EEG)信号中的对非目标刺激的神经反应区分开来。传统方法需要冗长的校准过程来构建二元分类器,这降低了整体效率。因此,我们提出了一个校准工作量最小的统一框架:在给定少量标注校准数据的情况下,采用自适应半监督EM-GMM算法来更新二元分类器。我们基于字符级预测准确率、信息传输率(ITR)和BCI效用来评估我们的方法。我们在训练数据上进行校准,并报告测试数据上的结果。结果表明,在15名参与者中,无论使用我们的自适应方法还是基准方法,有9名参与者超过了0.7的最低字符级准确率,且这9名参与者中有7名显示我们的自适应方法优于基准方法。所提出的半监督学习框架提供了一种实用且高效的替代方案,可提高实时BCI拼写系统的整体拼写效率,尤其是在标注数据有限的情况下。
摘要:A P300 ERP-based Brain-Computer Interface (BCI) speller is an assistive communication tool. It searches for the P300 event-related potential (ERP) elicited by target stimuli, distinguishing it from the neural responses to non-target stimuli embedded in electroencephalogram (EEG) signals. Conventional methods require a lengthy calibration procedure to construct the binary classifier, which reduced overall efficiency. Thus, we proposed a unified framework with minimum calibration effort such that, given a small amount of labeled calibration data, we employed an adaptive semi-supervised EM-GMM algorithm to update the binary classifier. We evaluated our method based on character-level prediction accuracy, information transfer rate (ITR), and BCI utility. We applied calibration on training data and reported results on testing data. Our results indicate that, out of 15 participants, 9 participants exceed the minimum character-level accuracy of 0.7 using either our adaptive method or the benchmark, and 7 out of these 9 participants showed that our adaptive method performed better than the benchmark. The proposed semi-supervised learning framework provides a practical and efficient alternative to improve the overall spelling efficiency in the real-time BCI speller system, particularly in contexts with limited labeled data.
迁移|Zero/Few/One-Shot|自适应(7篇)
【1】A Contrastive Learning Framework Empowered by Attention-based Feature Adaptation for Street-View Image Classification
标题:基于注意力的特征自适应的对比学习框架用于街景图像分类
链接:https://arxiv.org/abs/2602.16590
作者:Qi You,Yitai Cheng,Zichao Zeng,James Haworth
摘要:街景图像属性分类是图像分类的一个重要下游任务,可用于自动驾驶、城市分析和高清地图构建等应用。无论是从头开始训练、从预训练权重初始化,还是微调大型模型,其计算开销都很大。虽然CLIP等预训练视觉语言模型提供了丰富的图像表示,但现有的适配或微调方法通常依赖于其全局图像嵌入,限制了它们捕获复杂、杂乱街景中至关重要的细粒度局部属性的能力。为了解决这个问题,我们提出了CLIP-MHAdapter,它是当前轻量级CLIP适配范式的一个变体,附加了一个配备多头自注意力的瓶颈MLP,对补丁标记(patch token)进行操作,以建模补丁间的依赖关系。CLIP-MHAdapter仅有约140万个可训练参数,在Global StreetScapes数据集上的八个属性分类任务中实现了卓越或有竞争力的准确率,在保持低计算成本的同时取得了新的最先进结果。代码可在https://github.com/SpaceTimeLab/CLIP-MHAdapter上获得。
摘要:Street-view image attribute classification is a vital downstream task of image classification, enabling applications such as autonomous driving, urban analytics, and high-definition map construction. It remains computationally demanding whether training from scratch, initialising from pre-trained weights, or fine-tuning large models. Although pre-trained vision-language models such as CLIP offer rich image representations, existing adaptation or fine-tuning methods often rely on their global image embeddings, limiting their ability to capture fine-grained, localised attributes essential in complex, cluttered street scenes. To address this, we propose CLIP-MHAdapter, a variant of the current lightweight CLIP adaptation paradigm that appends a bottleneck MLP equipped with multi-head self-attention operating on patch tokens to model inter-patch dependencies. With approximately 1.4 million trainable parameters, CLIP-MHAdapter achieves superior or competitive accuracy across eight attribute classification tasks on the Global StreetScapes dataset, attaining new state-of-the-art results while maintaining low computational cost. The code is available at https://github.com/SpaceTimeLab/CLIP-MHAdapter.
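适配器的前向结构可以用如下假设性的NumPy示意表达:在冻结的patch token上做多头自注意力以建模补丁间依赖,再接一个瓶颈MLP,均带残差连接。维度与初始化均为演示假设,并非该仓库的实现。

```python
import numpy as np

rng = np.random.default_rng(6)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

N, D, H, r = 49, 64, 4, 16            # patch数、特征维、头数、瓶颈维(演示假设)
x = rng.normal(size=(N, D))           # 冻结骨干输出的patch token(示意数据)

Wq, Wk, Wv, Wo = [rng.normal(size=(D, D)) * 0.05 for _ in range(4)]
W_down = rng.normal(size=(D, r)) * 0.05
W_up = rng.normal(size=(r, D)) * 0.05

def mhsa(x):
    # 多头自注意力:按头切分通道,逐头做缩放点积注意力
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    dh = D // H
    out = np.zeros_like(x)
    for i in range(H):
        s = slice(i * dh, (i + 1) * dh)
        attn = softmax(q[:, s] @ k[:, s].T / np.sqrt(dh))
        out[:, s] = attn @ v[:, s]
    return out @ Wo

h = x + mhsa(x)                                   # 建模patch间依赖(残差)
h = h + np.maximum(h @ W_down, 0.0) @ W_up        # 瓶颈MLP(残差)
pooled = h.mean(axis=0)                           # 池化后可送入线性分类头
print(h.shape, pooled.shape)
```

与只用全局图像嵌入相比,这种结构让分类头能利用空间局部的patch信息。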
【2】Let's Split Up: Zero-Shot Classifier Edits for Fine-Grained Video Understanding
标题:让我们分开:Zero-Shot分类器编辑以实现细粒度视频理解
链接:https://arxiv.org/abs/2602.16545
作者:Kaiting Liu,Hazel Doughty
备注:ICLR 2026
摘要:视频识别模型通常在固定的分类体系上训练,而这些分类体系往往过于粗糙,将对象、方式或结果上的区别压缩在单个标签之下。随着任务和定义的演变,这样的模型无法适应新出现的区别,而为适应这些变化收集新标注并重新训练的成本很高。为了应对这些挑战,我们引入了类别拆分这一新任务:对现有分类器进行编辑,将一个粗类别细化为更细的子类别,同时在其他类别上保持准确率。我们提出了一种zero-shot编辑方法,利用视频分类器的潜在组合结构来揭示细粒度区别,而无需额外数据。我们进一步表明,少样本(low-shot)微调虽然简单,却非常有效,并能从我们的zero-shot初始化中受益。在我们新的类别拆分视频基准上的实验表明,我们的方法大幅优于视觉语言基线,在不牺牲其余类别性能的情况下提高了新拆分类别的准确率。项目页面:https://kaitingliu.github.io/Category-Splitting/。
摘要:Video recognition models are typically trained on fixed taxonomies which are often too coarse, collapsing distinctions in object, manner or outcome under a single label. As tasks and definitions evolve, such models cannot accommodate emerging distinctions and collecting new annotations and retraining to accommodate such changes is costly. To address these challenges, we introduce category splitting, a new task where an existing classifier is edited to refine a coarse category into finer subcategories, while preserving accuracy elsewhere. We propose a zero-shot editing method that leverages the latent compositional structure of video classifiers to expose fine-grained distinctions without additional data. We further show that low-shot fine-tuning, while simple, is highly effective and benefits from our zero-shot initialization. Experiments on our new video benchmarks for category splitting demonstrate that our method substantially outperforms vision-language baselines, improving accuracy on the newly split categories without sacrificing performance on the rest. Project page: https://kaitingliu.github.io/Category-Splitting/.
【3】Transfer Learning of Linear Regression with Multiple Pretrained Models: Benefiting from More Pretrained Models via Overparameterization Debiasing
标题:具有多个预训练模型的线性回归迁移学习:通过过参数化去偏从更多预训练模型中受益
链接:https://arxiv.org/abs/2602.16531
作者:Daniel Boharon,Yehuda Dar
摘要:我们使用若干个可能过参数化的最小二乘预训练模型来研究线性回归任务的迁移学习。我们将目标学习任务表述为一个优化问题:最小化目标数据集上的平方误差,并对学习到的模型与预训练模型之间的距离施加惩罚。我们解析地刻画了学习到的目标模型的测试误差,并给出相应的经验评估。我们的结果阐明了何时使用更多预训练模型可以改善迁移学习。具体而言,如果预训练模型是过参数化的,那么使用足够多的预训练模型对于获得有益的迁移学习很重要。然而,学习可能会受到预训练模型过参数化偏差的影响,即最小$\ell_2$-范数解在高维参数空间中被限制在训练样本所张成的小子空间内。我们提出了一种通过乘法校正因子实现的简单去偏方法,可以减少过参数化偏差,并利用更多的预训练模型来学习目标预测器。
摘要:We study transfer learning for a linear regression task using several least-squares pretrained models that can be overparameterized. We formulate the target learning task as optimization that minimizes squared errors on the target dataset with penalty on the distance of the learned model from the pretrained models. We analytically formulate the test error of the learned target model and provide the corresponding empirical evaluations. Our results elucidate when using more pretrained models can improve transfer learning. Specifically, if the pretrained models are overparameterized, using sufficiently many of them is important for beneficial transfer learning. However, the learning may be compromised by overparameterization bias of pretrained models, i.e., the minimum $\ell_2$-norm solution's restriction to a small subspace spanned by the training examples in the high-dimensional parameter space. We propose a simple debiasing via multiplicative correction factor that can reduce the overparameterization bias and leverage more pretrained models to learn a target predictor.
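作为示意,下面用一个极简的数值草图演示"对到预训练模型距离加惩罚的最小二乘"这一表述。其中的闭式解形式与标量去偏因子均为便于演示的假设写法,并非论文的精确算法:

```python
import numpy as np

def transfer_ls(X, y, pretrained, lam=1.0, debias=1.0):
    """Least squares on target data, penalized toward the average of the
    pretrained models. `debias` is a hypothetical scalar multiplicative
    correction applied to the pretrained models (stand-in for the paper's
    correction factor, whose exact form is not reproduced here)."""
    d = X.shape[1]
    theta_bar = debias * np.mean(pretrained, axis=0)  # averaged (rescaled) prior
    # minimizer of ||X t - y||^2 + lam * ||t - theta_bar||^2
    A = X.T @ X + lam * np.eye(d)
    b = X.T @ y + lam * theta_bar
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
d, n = 20, 10                      # overparameterized target task: d > n
theta_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ theta_true
# pretrained models: noisy copies of the ground truth
pretrained = [theta_true + 0.1 * rng.normal(size=d) for _ in range(5)]
theta_hat = transfer_ls(X, y, pretrained, lam=1.0)
print(np.linalg.norm(theta_hat - theta_true))
```

由闭式解可见,估计误差满足 $\hat\theta-\theta^* = \lambda(X^\top X+\lambda I)^{-1}(\bar\theta-\theta^*)$,即不超过先验均值的误差,这正是"更多预训练模型降低先验误差"的直观来源。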
【4】Training-Free Adaptation of Diffusion Models via Doob's $h$-Transform
标题:通过Doob的$h$-变换对扩散模型进行免训练适应
链接:https://arxiv.org/abs/2602.16198
作者:Qijie Zhu,Zeqi Ye,Han Liu,Zhaoran Wang,Minshuo Chen
备注:36 pages, 3 figures
摘要:自适应方法一直是在各类应用中释放预训练扩散模型变革能力的主力。现有方法通常将自适应目标抽象为一个奖励函数,并引导扩散模型生成高回报样本。然而,这些方法或因额外训练而产生很高的计算开销,或依赖于对奖励的严格假设(例如可微性)。此外,尽管这些方法在经验上取得了成功,其理论依据与保证却很少被建立。在本文中,我们提出DOIT(Doob-Oriented Inference-time Transformation),这是一种免训练、计算高效的自适应方法,适用于一般的不可微奖励。我们方法的核心框架是一个测度传输(measure transport)表述,旨在将预训练的生成分布传输到高回报的目标分布。我们利用Doob的$h$-变换来实现这种传输:它对扩散采样过程引入动态校正,并在不修改预训练模型的情况下实现高效的基于模拟的计算。在理论上,我们通过刻画动态Doob校正中的近似误差,建立了向目标高回报分布高概率收敛的保证。在经验上,在D4RL离线RL基准测试中,我们的方法在保持采样效率的同时始终优于最先进的基线。
摘要:Adaptation methods have been a workhorse for unlocking the transformative power of pre-trained diffusion models in diverse applications. Existing approaches often abstract adaptation objectives as a reward function and steer diffusion models to generate high-reward samples. However, these approaches can incur high computational overhead due to additional training, or rely on stringent assumptions on the reward such as differentiability. Moreover, despite their empirical success, theoretical justification and guarantees are seldom established. In this paper, we propose DOIT (Doob-Oriented Inference-time Transformation), a training-free and computationally efficient adaptation method that applies to generic, non-differentiable rewards. The key framework underlying our method is a measure transport formulation that seeks to transport the pre-trained generative distribution to a high-reward target distribution. We leverage Doob's $h$-transform to realize this transport, which induces a dynamic correction to the diffusion sampling process and enables efficient simulation-based computation without modifying the pre-trained model. Theoretically, we establish a high probability convergence guarantee to the target high-reward distribution via characterizing the approximation error in the dynamic Doob's correction. Empirically, on D4RL offline RL benchmarks, our method consistently outperforms state-of-the-art baselines while preserving sampling efficiency.
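下面是一个玩具级的采样草图,用于直观展示"在采样过程中按奖励对轨迹进行动态校正"的思想:每一步让候选粒子按指数奖励权重重采样,相当于对 h 函数的一种粗糙蒙特卡洛替代。它只是示意性说明,并非论文的 DOIT 实现,其中的一维"扩散步"和奖励函数均为虚构:

```python
import numpy as np

def reward_guided_sampling(step, reward, x0, n_steps, n_particles, rng):
    """Monte-Carlo sketch of reward-guided sampling: at each step, particle
    continuations are resampled in proportion to an exponential reward tilt
    (a toy stand-in for an h-function estimate, not the paper's method)."""
    x = np.repeat(x0, n_particles)
    for _ in range(n_steps):
        x = step(x, rng)                      # base (pre-trained) dynamics
        w = np.exp(reward(x))                 # exponential reward weighting
        w = w / w.sum()
        idx = rng.choice(n_particles, size=n_particles, p=w)
        x = x[idx]                            # resample toward high reward
    return x

rng = np.random.default_rng(6)
step = lambda x, rng: x + 0.2 * rng.normal(size=x.shape)  # toy diffusion step
reward = lambda x: -((x - 1.0) ** 2)          # reward peaks at x = 1
samples = reward_guided_sampling(step, reward, np.array([0.0]), 100, 256, rng)
print(samples.mean())
```

无奖励引导时粒子只做均值为0的随机游走;加入重采样校正后,样本分布被逐步"传输"到奖励峰值附近,这正是测度传输表述的直观含义。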
【5】Collaborative Zone-Adaptive Zero-Day Intrusion Detection for IoBT
标题:面向IoBT的协作区域自适应零日入侵检测
链接:https://arxiv.org/abs/2602.16098
作者:Amirmohammad Pasdar,Shabnam Kasra Kermanshahi,Nour Moustafa,Van-Thuan Pham
摘要:战场物联网(IoBT)依赖于异构、带宽受限且间歇性连接的战术网络,这些网络面临快速演变的网络威胁。在此设定下,由于链路中断、时延、作战安全限制以及跨区域的非IID流量,入侵检测无法依赖对原始流量的持续集中收集。我们提出区域自适应入侵检测(ZAID),一个针对未见攻击类型的协作检测与模型改进框架;这里的"零日"指以前未观测到的攻击家族与行为(而非漏洞披露时间)。ZAID结合了用于可泛化流量表示的通用卷积模型、作为辅助异常分数的基于自动编码器的重建信号,以及用于参数高效区域自适应的轻量级适配器模块。为了支持受限连接下的跨区域泛化,ZAID使用联邦聚合和伪标签来利用本地观测到的弱标注行为。我们在ToN_IoT上使用零日协议评估ZAID:该协议将MITM、DDoS和DoS排除在监督训练之外,并在区域级部署与自适应期间引入它们。ZAID对未见攻击流量的准确率高达83.16%,并在相同流程下迁移到UNSW-NB15,最高准确率为71.64%。这些结果表明,参数高效、区域个性化的协作可以提升对竞争性IoBT环境中以前未见攻击的检测。
摘要:The Internet of Battlefield Things (IoBT) relies on heterogeneous, bandwidth-constrained, and intermittently connected tactical networks that face rapidly evolving cyber threats. In this setting, intrusion detection cannot depend on continuous central collection of raw traffic due to disrupted links, latency, operational security limits, and non-IID traffic across zones. We present Zone-Adaptive Intrusion Detection (ZAID), a collaborative detection and model-improvement framework for unseen attack types, where "zero-day" refers to previously unobserved attack families and behaviours (not vulnerability disclosure timing). ZAID combines a universal convolutional model for generalisable traffic representations, an autoencoder-based reconstruction signal as an auxiliary anomaly score, and lightweight adapter modules for parameter-efficient zone adaptation. To support cross-zone generalisation under constrained connectivity, ZAID uses federated aggregation and pseudo-labelling to leverage locally observed, weakly labelled behaviours. We evaluate ZAID on ToN_IoT using a zero-day protocol that excludes MITM, DDoS, and DoS from supervised training and introduces them during zone-level deployment and adaptation. ZAID achieves up to 83.16% accuracy on unseen attack traffic and transfers to UNSW-NB15 under the same procedure, with a best accuracy of 71.64%. These results indicate that parameter-efficient, zone-personalised collaboration can improve the detection of previously unseen attacks in contested IoBT environments.
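下面用一个线性自动编码器(以PCA实现)示意"重建误差作为辅助异常分数"的思路:良性流量近似落在低维子空间内,重建误差小;偏离该子空间的攻击流量重建误差大。特征维度与数据均为虚构的玩具设定,并非论文的网络结构:

```python
import numpy as np

def fit_linear_ae(X, k):
    # PCA as a linear autoencoder: encoder = top-k principal directions
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def anomaly_score(x, mu, W):
    # reconstruction error of x under the linear autoencoder
    z = W @ (x - mu)              # encode
    x_hat = mu + W.T @ z          # decode
    return float(np.sum((x - x_hat) ** 2))

rng = np.random.default_rng(1)
# "benign" traffic lives near a 2-D subspace of a 10-D feature space
Z = rng.normal(size=(200, 2))
B = rng.normal(size=(2, 10))
X_benign = Z @ B + 0.01 * rng.normal(size=(200, 10))
mu, W = fit_linear_ae(X_benign, k=2)
benign = anomaly_score(X_benign[0], mu, W)
attack = anomaly_score(rng.normal(size=10) * 3.0, mu, W)
print(benign, attack)
```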
【6】World Action Models are Zero-shot Policies
标题:世界动作模型是零样本策略
链接:https://arxiv.org/abs/2602.15922
作者:Seonghyeon Ye,Yunhao Ge,Kaiyuan Zheng,Shenyuan Gao,Sihyun Yu,George Kurian,Suneel Indupuru,You Liang Tan,Chuning Zhu,Jiannan Xiang,Ayaan Malik,Kyungmin Lee,William Liang,Nadun Ranawaka,Jiasheng Gu,Yinzhen Xu,Guanzhi Wang,Fengyuan Hu,Avnish Narayan,Johan Bjorck,Jing Wang,Gwanghyun Kim,Dantong Niu,Ruijie Zheng,Yuqi Xie,Jimmy Wu,Qi Wang,Ryan Julian,Danfei Xu,Yilun Du,Yevgen Chebotar,Scott Reed,Jan Kautz,Yuke Zhu,Linxi "Jim" Fan,Joel Jang
备注:Project page: https://dreamzero0.github.io/
摘要:最先进的视觉-语言-动作(VLA)模型擅长语义泛化,但很难泛化到新环境中未见过的物理运动。我们介绍DreamZero,一个建立在预训练视频扩散骨干上的世界动作模型(WAM)。与VLA不同,WAM通过预测未来的世界状态和动作来学习物理动力学,将视频用作世界如何演变的稠密表示。通过对视频和动作进行联合建模,DreamZero可以从异构的机器人数据中有效地学习各种技能,而无需依赖重复的演示。在真实机器人实验中,这使其在新任务和新环境上的泛化相比最先进的VLA提升超过2倍。关键是,通过模型和系统优化,我们使14B自回归视频扩散模型能够以7Hz执行实时闭环控制。最后,我们展示了两种形式的跨具身迁移:仅用10-20分钟来自其他机器人或人类的纯视频演示,即可在未见任务上获得超过42%的相对提升。更令人惊讶的是,DreamZero实现了少样本具身自适应,仅用30分钟的交互(play)数据即可迁移到新的具身,同时保留零样本泛化能力。
摘要:State-of-the-art Vision-Language-Action (VLA) models excel at semantic generalization but struggle to generalize to unseen physical motions in novel environments. We introduce DreamZero, a World Action Model (WAM) built upon a pretrained video diffusion backbone. Unlike VLAs, WAMs learn physical dynamics by predicting future world states and actions, using video as a dense representation of how the world evolves. By jointly modeling video and action, DreamZero learns diverse skills effectively from heterogeneous robot data without relying on repetitive demonstrations. This results in over 2x improvement in generalization to new tasks and environments compared to state-of-the-art VLAs in real robot experiments. Crucially, through model and system optimizations, we enable a 14B autoregressive video diffusion model to perform real-time closed-loop control at 7Hz. Finally, we demonstrate two forms of cross-embodiment transfer: video-only demonstrations from other robots or humans yield a relative improvement of over 42% on unseen task performance with just 10-20 minutes of data. More surprisingly, DreamZero enables few-shot embodiment adaptation, transferring to a new embodiment with only 30 minutes of play data while retaining zero-shot generalization.
【7】Separating Oblivious and Adaptive Models of Variable Selection
标题:分离变量选择的不经意和自适应模型
链接:https://arxiv.org/abs/2602.16568
作者:Ziyun Chen,Jerry Li,Kevin Tian,Yusong Zhu
备注:40 pages
摘要:稀疏恢复是学习理论和高维统计中研究最多的问题之一。在这项工作中,我们研究了具有$\ell_\infty$误差保证的稀疏恢复的统计与计算图景。该问题变体由变量选择(variable selection)任务所激发,其目标是估计$\mathbb{R}^d$中$k$-稀疏信号的支撑集。我们的主要贡献是在$\ell_\infty$稀疏恢复的不经意("for each")模型与自适应("for all")模型之间给出了可证明的分离。我们证明,在不经意模型下,最优的$\ell_\infty$误差可以用$\approx k\log d$个样本在近线性时间内达到;而在自适应模型下,任何算法要达到该界都需要$\gtrsim k^2$个样本。这与标准的$\ell_2$设定形成了惊人的对比:在那里,即使对于自适应稀疏恢复,$\approx k \log d$个样本也已足够。最后,我们对部分自适应模型进行了初步考察,证明用$\approx k\log d$次测量可以获得非平凡的变量选择保证。
摘要:Sparse recovery is among the most well-studied problems in learning theory and high-dimensional statistics. In this work, we investigate the statistical and computational landscapes of sparse recovery with $\ell_\infty$ error guarantees. This variant of the problem is motivated by variable selection tasks, where the goal is to estimate the support of a $k$-sparse signal in $\mathbb{R}^d$. Our main contribution is a provable separation between the oblivious ("for each") and adaptive ("for all") models of $\ell_\infty$ sparse recovery. We show that under an oblivious model, the optimal $\ell_\infty$ error is attainable in near-linear time with $\approx k\log d$ samples, whereas in an adaptive model, $\gtrsim k^2$ samples are necessary for any algorithm to achieve this bound. This establishes a surprising contrast with the standard $\ell_2$ setting, where $\approx k \log d$ samples suffice even for adaptive sparse recovery. We conclude with a preliminary examination of a partially-adaptive model, where we show nontrivial variable selection guarantees are possible with $\approx k\log d$ measurements.
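下面的小实验示意"用约 $k\log d$ 个高斯测量做支撑集估计(变量选择)"的基本设定。这里的相关性选择器只是教科书式的基线,并非论文的近线性时间算法,信号强度与常数均为便于演示的选取:

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 500, 5
n = int(10 * k * np.log(d))       # on the order of k log d measurements
support = rng.choice(d, size=k, replace=False)
theta = np.zeros(d)
theta[support] = 2.0              # k-sparse signal with strong entries
X = rng.normal(size=(n, d))       # Gaussian design
y = X @ theta + 0.1 * rng.normal(size=n)

# naive correlation-based selector: top-k coordinates of |X^T y| / n
scores = np.abs(X.T @ y) / n
est_support = set(np.argsort(scores)[-k:].tolist())
true_support = set(support.tolist())
print(len(est_support & true_support), "of", k, "support coordinates recovered")
```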
强化学习(8篇)
【1】RIDER: 3D RNA Inverse Design with Reinforcement Learning-Guided Diffusion
标题:RIDER:具有强化学习引导扩散的3D RNA反向设计
链接:https://arxiv.org/abs/2602.16548
作者:Tianmeng Hu,Yongzheng Cui,Biao Luo,Ke Li
备注:Accepted as a conference paper at ICLR 2026
摘要:RNA三维(3D)结构的逆向设计对于合成生物学和治疗学中的功能RNA工程至关重要。虽然最近的深度学习方法推进了这一领域,但它们通常使用天然序列恢复率进行优化和评估,而这只是结构保真度的有限替代指标:不同的序列可以折叠成相似的3D结构,高恢复率也不一定意味着正确折叠。为了解决这一局限,我们提出了RIDER,一个直接优化3D结构相似性的强化学习RNA逆向设计框架。首先,我们开发并预训练了一个以目标3D结构为条件的基于GNN的生成扩散模型,其天然序列恢复率比最先进方法提高了9%。然后,我们使用基于3D自一致性度量的四个特定任务奖励函数,通过改进的策略梯度算法对模型进行微调。实验结果表明,RIDER在所有指标上将结构相似性提高了100%以上,并发现了与天然序列不同的设计。
摘要:The inverse design of RNA three-dimensional (3D) structures is crucial for engineering functional RNAs in synthetic biology and therapeutics. While recent deep learning approaches have advanced this field, they are typically optimized and evaluated using native sequence recovery, which is a limited surrogate for structural fidelity, since different sequences can fold into similar 3D structures and high recovery does not necessarily indicate correct folding. To address this limitation, we propose RIDER, an RNA Inverse DEsign framework with Reinforcement learning that directly optimizes for 3D structural similarity. First, we develop and pre-train a GNN-based generative diffusion model conditioned on the target 3D structure, achieving a 9% improvement in native sequence recovery over state-of-the-art methods. Then, we fine-tune the model with an improved policy gradient algorithm using four task-specific reward functions based on 3D self-consistency metrics. Experimental results show that RIDER improves structural similarity by over 100% across all metrics and discovers designs that are distinct from native sequences.
【2】Vulnerability Analysis of Safe Reinforcement Learning via Inverse Constrained Reinforcement Learning
标题:基于逆约束强化学习的安全强化学习脆弱性分析
链接:https://arxiv.org/abs/2602.16543
作者:Jialiang Fan,Shixiong Jiang,Mengyu Liu,Fanxin Kong
备注:12 pages, 6 figures, supplementary material included
摘要:安全强化学习(Safe RL)旨在确保策略性能,同时满足安全约束。然而,大多数现有的安全强化学习方法都假设环境是良性的,这使得它们容易受到现实世界中常见的对抗性扰动的影响。此外,现有的基于梯度的对抗性攻击通常需要访问策略的梯度信息,这在现实场景中通常是不切实际的。为了应对这些挑战,我们提出了一个对抗性攻击框架来揭示安全RL策略的漏洞。使用专家演示和黑盒环境交互,我们的框架学习约束模型和代理(学习者)策略,实现基于梯度的攻击优化,而不需要受害者策略的内部梯度或地面实况安全约束。我们进一步提供理论分析,建立可行性和推导扰动界。多个安全RL基准测试的实验证明了我们的方法在有限的特权访问下的有效性。
摘要:Safe reinforcement learning (Safe RL) aims to ensure policy performance while satisfying safety constraints. However, most existing Safe RL methods assume benign environments, making them vulnerable to adversarial perturbations commonly encountered in real-world settings. In addition, existing gradient-based adversarial attacks typically require access to the policy's gradient information, which is often impractical in real-world scenarios. To address these challenges, we propose an adversarial attack framework to reveal vulnerabilities of Safe RL policies. Using expert demonstrations and black-box environment interaction, our framework learns a constraint model and a surrogate (learner) policy, enabling gradient-based attack optimization without requiring the victim policy's internal gradients or the ground-truth safety constraints. We further provide theoretical analysis establishing feasibility and deriving perturbation bounds. Experiments on multiple Safe RL benchmarks demonstrate the effectiveness of our approach under limited privileged access.
【3】Capacity-constrained demand response in smart grids using deep reinforcement learning
标题:使用深度强化学习的智能电网容量受限需求响应
链接:https://arxiv.org/abs/2602.16525
作者:Shafagh Abband Pashaki,Sepehr Maleki,Amir Badiee
摘要:本文提出了一种面向居民智能电网的、基于激励且考虑容量约束的需求响应方法。该方法旨在维持电网容量限制,并通过经济激励促使最终用户减少或转移其能源消耗来防止拥塞。所提出的框架采用分层架构:服务提供商根据批发电价和汇总的住宅负荷调整每小时的激励费率,并显式考虑服务提供商与最终用户双方的经济利益。我们采用深度强化学习方法来学习显式容量约束下的最优实时激励费率。异构的用户偏好通过家电级的家庭能源管理系统和不满意成本来建模。基于三个家庭的真实居民用电量和电价数据进行仿真,结果表明该方法有效地降低了峰值需求并平滑了总负荷曲线,与无需求响应情形相比,峰均比降低约22.82%。
摘要:This paper presents a capacity-constrained incentive-based demand response approach for residential smart grids. It aims to maintain electricity grid capacity limits and prevent congestion by financially incentivising end users to reduce or shift their energy consumption. The proposed framework adopts a hierarchical architecture in which a service provider adjusts hourly incentive rates based on wholesale electricity prices and aggregated residential load. The financial interests of both the service provider and end users are explicitly considered. A deep reinforcement learning approach is employed to learn optimal real-time incentive rates under explicit capacity constraints. Heterogeneous user preferences are modelled through appliance-level home energy management systems and dissatisfaction costs. Using real-world residential electricity consumption and price data from three households, simulation results show that the proposed approach effectively reduces peak demand and smooths the aggregated load profile. This leads to an approximately 22.82% reduction in the peak-to-average ratio compared to the no-demand-response case.
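摘要中的峰均比(PAR)即"峰值负荷除以平均负荷"。下面用一条虚构的24小时负荷曲线演示该指标及其降低幅度的计算方式(数字纯属示意,并非论文数据):

```python
import numpy as np

def peak_to_average(load):
    """Peak-to-average ratio of a load profile."""
    load = np.asarray(load, dtype=float)
    return float(load.max() / load.mean())

# toy 24-hour aggregated load before and after incentive-driven load shifting
before = np.array([3, 3, 3, 3, 4, 5, 7, 9, 10, 9, 8, 7,
                   7, 7, 8, 9, 11, 12, 11, 9, 7, 5, 4, 3], dtype=float)
# demand response flattens the evening peak by shifting flexible load
after = np.array([4, 4, 4, 4, 5, 6, 7, 8, 9, 8, 8, 7,
                  7, 7, 8, 8, 9, 9, 9, 8, 7, 6, 5, 4], dtype=float)

par_before = peak_to_average(before)
par_after = peak_to_average(after)
reduction = 100 * (par_before - par_after) / par_before
print(round(par_before, 3), round(par_after, 3), round(reduction, 1))
```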
【4】Reinforcement Learning for Parameterized Quantum State Preparation: A Comparative Study
标题:参数化量子状态准备的强化学习:比较研究
链接:https://arxiv.org/abs/2602.16523
作者:Gerhard Stenzel,Isabella Debelic,Michael Kölle,Tobias Rohe,Leo Sünkel,Julian Hager,Claudia Linnhoff-Popien
备注:Extended version of a short paper to be published at ICAART 2026
摘要:我们将基于强化学习的定向量子电路合成(DQCS)从纯离散的门选择扩展到带有连续单量子比特旋转\(R_x\)、\(R_y\)和\(R_z\)的参数化量子态制备。我们比较了两种训练机制:一阶段智能体联合选择门类型、受影响的量子比特和旋转角度;两阶段变体则先提出一个离散电路,随后使用参数偏移梯度通过Adam优化旋转角度。基于Gymnasium和PennyLane,我们在2到10个量子比特的系统以及复杂度\(λ\)从1到5递增的目标上评估了近端策略优化(PPO)和优势演员-评论家(A2C)。A2C在该设定下未能学到有效策略,而PPO在稳定的超参数下取得成功(一阶段:学习率约为\(5\times10^{-4}\),自保真度误差阈值为0.01;两阶段:学习率约为\(10^{-4}\))。两种方法都能可靠地重建计算基态(成功率在83%和99%之间)和Bell态(成功率在61%和77%之间)。然而,可扩展性在\(λ\)约为3到4时饱和,即使在\(λ=2\)时也无法扩展到10个量子比特的目标。两阶段方法仅带来边际的精度增益,却需要约三倍的运行时间。因此,为了在固定计算预算下的实用性,我们推荐一阶段PPO策略,给出显式的合成电路,并与经典变分基线进行对比,以勾勒提升可扩展性的途径。
摘要:We extend directed quantum circuit synthesis (DQCS) with reinforcement learning from purely discrete gate selection to parameterized quantum state preparation with continuous single-qubit rotations \(R_x\), \(R_y\), and \(R_z\). We compare two training regimes: a one-stage agent that jointly selects the gate type, the affected qubit(s), and the rotation angle; and a two-stage variant that first proposes a discrete circuit and subsequently optimizes the rotation angles with Adam using parameter-shift gradients. Using Gymnasium and PennyLane, we evaluate Proximal Policy Optimization (PPO) and Advantage Actor--Critic (A2C) on systems comprising two to ten qubits and on targets of increasing complexity with \(λ\) ranging from one to five. Whereas A2C does not learn effective policies in this setting, PPO succeeds under stable hyperparameters (one-stage: learning rate approximately \(5\times10^{-4}\) with a self-fidelity-error threshold of 0.01; two-stage: learning rate approximately \(10^{-4}\)). Both approaches reliably reconstruct computational basis states (between 83\% and 99\% success) and Bell states (between 61\% and 77\% success). However, scalability saturates for \(λ\) of approximately three to four and does not extend to ten-qubit targets even at \(λ=2\). The two-stage method offers only marginal accuracy gains while requiring around three times the runtime. For practicality under a fixed compute budget, we therefore recommend the one-stage PPO policy, provide explicit synthesized circuits, and contrast with a classical variational baseline to outline avenues for improved scalability.
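两阶段变体所用的参数偏移梯度有一个简单的闭式形式:对单个旋转角 θ,d⟨H⟩/dθ = (⟨H⟩(θ+π/2) − ⟨H⟩(θ−π/2)) / 2。下面以单量子比特旋转的期望 ⟨Z⟩ = cos θ 作为玩具函数数值验证该规则(纯示意,不依赖PennyLane):

```python
import math

def parameter_shift_grad(expect, theta, shift=math.pi / 2):
    """Parameter-shift rule for a single rotation angle:
    d<H>/dθ = ( <H>(θ + π/2) − <H>(θ − π/2) ) / 2."""
    return (expect(theta + shift) - expect(theta - shift)) / 2

# toy expectation value of a single-qubit rotation: <Z> = cos(θ)
expectation = math.cos
theta = 0.3
g = parameter_shift_grad(expectation, theta)
print(g)  # matches the analytic derivative -sin(θ)
```

与有限差分不同,参数偏移规则对这类三角型期望给出精确梯度,因此可直接交给Adam等优化器使用。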
【5】Causally-Guided Automated Feature Engineering with Multi-Agent Reinforcement Learning
标题:采用多智能体强化学习的因果引导自动特征工程
链接:https://arxiv.org/abs/2602.16435
作者:Arun Vignesh Malarkkan,Wangyang Ying,Yanjie Fu
备注:11 Pages, References and Appendix
摘要:自动特征工程(AFE)使AI系统能够从原始表格数据中自主构建高效用的表示。然而,现有的AFE方法依赖统计启发式,产生的特征较为脆弱,在分布偏移下会失效。我们提出CAFE,一个将AFE重新表述为因果引导的顺序决策过程的框架,把因果发现与强化学习驱动的特征构建衔接起来。阶段I在特征和目标上学习一个稀疏有向无环图以获得软因果先验,并根据特征对目标的因果影响将其分为直接、间接或其他组。阶段II使用级联多智能体深度Q学习架构来选择因果组和变换算子,并采用分层奖励塑形和因果组级探索策略,在控制特征复杂度的同时偏好因果上合理的变换。在15个公共基准上(分类使用宏F1;回归使用逆相对绝对误差),CAFE比强AFE基线最多提升7%,减少了收敛所需回合数,并且达到目标所需时间具有竞争力。在受控协变量偏移下,CAFE相对于非因果多智能体基线将性能下降减少了约4倍,并产生更紧凑的特征集与更稳定的事后归因。这些发现强调,将因果结构用作软归纳先验而非刚性约束,可以大幅提升自动特征工程的鲁棒性和效率。
摘要:Automated feature engineering (AFE) enables AI systems to autonomously construct high-utility representations from raw tabular data. However, existing AFE methods rely on statistical heuristics, yielding brittle features that fail under distribution shift. We introduce CAFE, a framework that reformulates AFE as a causally-guided sequential decision process, bridging causal discovery with reinforcement learning-driven feature construction. Phase I learns a sparse directed acyclic graph over features and the target to obtain soft causal priors, grouping features as direct, indirect, or other based on their causal influence with respect to the target. Phase II uses a cascading multi-agent deep Q-learning architecture to select causal groups and transformation operators, with hierarchical reward shaping and causal group-level exploration strategies that favor causally plausible transformations while controlling feature complexity. Across 15 public benchmarks (classification with macro-F1; regression with inverse relative absolute error), CAFE achieves up to 7% improvement over strong AFE baselines, reduces episodes-to-convergence, and delivers competitive time-to-target. Under controlled covariate shifts, CAFE reduces performance drop by ~4x relative to a non-causal multi-agent baseline, and produces more compact feature sets with more stable post-hoc attributions. These findings underscore that causal structure, used as a soft inductive prior rather than a rigid constraint, can substantially improve the robustness and efficiency of automated feature engineering.
【6】Graphon Mean-Field Subsampling for Cooperative Heterogeneous Multi-Agent Reinforcement Learning
标题:协作式异类多智能体强化学习的图形平均场子采样
链接:https://arxiv.org/abs/2602.16196
作者:Emile Anand,Richard Hoffmann,Sarah Liaw,Adam Wierman
备注:43 pages, 5 figures, 1 table
摘要:协调大规模交互智能体群体是多智能体强化学习(MARL)的核心挑战,其中联合状态-动作空间的大小随智能体数量呈指数增长。平均场方法通过聚合智能体交互来减轻这一负担,但这些方法假设交互是同质的。最近的基于图极限(graphon)的框架能够刻画异质性,但随着智能体数量增长计算代价高昂。因此,我们引入$\texttt{GMFS}$,一个用于具有异质智能体交互的可扩展合作MARL的图极限平均场子采样(Graphon Mean-Field Subsampling)框架。通过按交互强度对$κ$个智能体进行子采样,我们近似了图极限加权平均场,并学习到样本复杂度为$\mathrm{poly}(κ)$、最优性差距为$O(1/\sqrt{κ})$的策略。我们通过机器人协调的数值模拟验证了理论,表明$\texttt{GMFS}$达到接近最优的性能。
摘要:Coordinating large populations of interacting agents is a central challenge in multi-agent reinforcement learning (MARL), where the size of the joint state-action space scales exponentially with the number of agents. Mean-field methods alleviate this burden by aggregating agent interactions, but these approaches assume homogeneous interactions. Recent graphon-based frameworks capture heterogeneity, but are computationally expensive as the number of agents grows. Therefore, we introduce $\texttt{GMFS}$, a $\textbf{G}$raphon $\textbf{M}$ean-$\textbf{F}$ield $\textbf{S}$ubsampling framework for scalable cooperative MARL with heterogeneous agent interactions. By subsampling $κ$ agents according to interaction strength, we approximate the graphon-weighted mean-field and learn a policy with sample complexity $\mathrm{poly}(κ)$ and optimality gap $O(1/\sqrt{κ})$. We verify our theory with numerical simulations in robotic coordination, showing that $\texttt{GMFS}$ achieves near-optimal performance.
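下面示意"按交互强度对 κ 个智能体进行子采样"这一步。其中的 min 型图极限和按总交互强度加权的不放回采样均为便于演示的假设读法,并非论文的精确方案:

```python
import numpy as np

def subsample_agents(weights, kappa, rng):
    """Sample κ agents without replacement, with probability proportional
    to each agent's total interaction strength (a hedged illustrative
    reading of interaction-strength-based subsampling)."""
    p = weights / weights.sum()
    return rng.choice(len(weights), size=kappa, replace=False, p=p)

rng = np.random.default_rng(3)
n, kappa = 100, 10
# graphon-style interaction matrix: W[i, j] = interaction strength between i, j
U = rng.uniform(size=n)                 # latent positions of the agents
W = np.minimum.outer(U, U)              # toy "min" graphon evaluated at (U_i, U_j)
strength = W.sum(axis=1)                # each agent's total interaction strength
chosen = subsample_agents(strength, kappa, rng)
print(sorted(chosen.tolist()))
```

其后的平均场估计只需在被选中的 κ 个智能体上聚合,这正是样本复杂度从全体规模降到 $\mathrm{poly}(κ)$ 的来源。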
【7】MARLEM: A Multi-Agent Reinforcement Learning Simulation Framework for Implicit Cooperation in Decentralized Local Energy Markets
标题:MARLEM:用于分散本地能源市场隐性合作的多智能体强化学习模拟框架
链接:https://arxiv.org/abs/2602.16063
作者:Nelson Salazar-Pena,Alejandra Tabares,Andres Gonzalez-Mancera
备注:32 pages, 7 figures, 1 table, 1 algorithm
摘要:本文介绍了一个新颖的开源MARL仿真框架,用于研究本地能源市场(LEM)中的隐式合作;该问题被建模为去中心化部分可观测马尔可夫决策过程,并实现为面向MARL的Gymnasium环境。我们的框架包含一个具有即插即用出清机制的模块化市场平台、受物理约束的智能体模型(包括电池储能)、一个真实的电网网络,以及一个用于评估涌现协调的综合分析套件。主要贡献是一种促进隐式合作的新方法:通过将系统级关键性能指标加入智能体的观测与奖励,使其能够在没有显式通信的情况下独立学习有利于整个系统、并以集体有益结果为目标的策略。通过代表性案例研究(可在专用GitHub存储库https://github.com/salazarna/marlem获取),我们展示了该框架分析不同市场配置(例如不同的储能部署)如何影响系统性能的能力。这说明了它在促进涌现协调、提高市场效率和增强电网稳定性方面的潜力。所提出的仿真框架是一个灵活、可扩展且可复现的工具,供研究人员和从业者设计、测试和验证面向未来智能化、去中心化能源系统的策略。
摘要:This paper introduces a novel, open-source MARL simulation framework for studying implicit cooperation in LEMs, modeled as a decentralized partially observable Markov decision process and implemented as a Gymnasium environment for MARL. Our framework features a modular market platform with plug-and-play clearing mechanisms, physically constrained agent models (including battery storage), a realistic grid network, and a comprehensive analytics suite to evaluate emergent coordination. The main contribution is a novel method to foster implicit cooperation, where agents' observations and rewards are enhanced with system-level key performance indicators to enable them to independently learn strategies that benefit the entire system and aim for collectively beneficial outcomes without explicit communication. Through representative case studies (available in a dedicated GitHub repository at https://github.com/salazarna/marlem), we show the framework's ability to analyze how different market configurations (such as varying storage deployment) impact system performance. This illustrates its potential to facilitate emergent coordination, improve market efficiency, and strengthen grid stability. The proposed simulation framework is a flexible, extensible, and reproducible tool for researchers and practitioners to design, test, and validate strategies for future intelligent, decentralized energy systems.
【8】Harnessing Implicit Cooperation: A Multi-Agent Reinforcement Learning Approach Towards Decentralized Local Energy Markets
标题:利用隐性合作:面向分散本地能源市场的多主体强化学习方法
链接:https://arxiv.org/abs/2602.16062
作者:Nelson Salazar-Pena,Alejandra Tabares,Andres Gonzalez-Mancera
备注:42 pages, 7 figures, 10 tables
摘要:本文提出隐式合作框架,使去中心化智能体在本地能源市场中无需显式点对点通信即可逼近最优协调。我们将该问题表述为一个去中心化部分可观测马尔可夫决策问题,并通过多智能体强化学习任务求解:智能体利用stigmergic信号(系统级关键性能指标)来推断并响应全局状态。通过在IEEE 34节点拓扑上的3x3析因设计,我们评估了三种训练范式(CTCE、CTDE、DTDE)和三种算法(PPO、APPO、SAC)。结果表明APPO-DTDE是最优配置,相对于理论集中式基准(CTCE)达到91.7%的协调得分。然而,效率与稳定性之间出现了关键权衡:集中式基准以0.6的点对点交易比率最大化了分配效率,而完全去中心化方法(DTDE)表现出更优的物理稳定性。具体而言,与混合架构相比,DTDE将电网平衡的方差降低了31%,形成了高度可预测、偏向进口的负荷分布,从而简化了电网调节。此外,拓扑分析揭示了涌现的空间聚类:去中心化智能体自组织成稳定的交易社区,以最小化拥塞惩罚。虽然SAC在混合设定中表现出色,但由于熵驱动的不稳定性,它在去中心化环境中失败了。这项研究证明,stigmergic信令为复杂的电网协调提供了足够的上下文,为昂贵的集中式通信基础设施提供了一个鲁棒且保护隐私的替代方案。
摘要:This paper proposes implicit cooperation, a framework enabling decentralized agents to approximate optimal coordination in local energy markets without explicit peer-to-peer communication. We formulate the problem as a decentralized partially observable Markov decision problem that is solved through a multi-agent reinforcement learning task in which agents use stigmergic signals (key performance indicators at the system level) to infer and react to global states. Through a 3x3 factorial design on an IEEE 34-node topology, we evaluated three training paradigms (CTCE, CTDE, DTDE) and three algorithms (PPO, APPO, SAC). Results identify APPO-DTDE as the optimal configuration, achieving a coordination score of 91.7% relative to the theoretical centralized benchmark (CTCE). However, a critical trade-off emerges between efficiency and stability: while the centralized benchmark maximizes allocative efficiency with a peer-to-peer trade ratio of 0.6, the fully decentralized approach (DTDE) demonstrates superior physical stability. Specifically, DTDE reduces the variance of grid balance by 31% compared to hybrid architectures, establishing a highly predictable, import-biased load profile that simplifies grid regulation. Furthermore, topological analysis reveals emergent spatial clustering, where decentralized agents self-organize into stable trading communities to minimize congestion penalties. While SAC excelled in hybrid settings, it failed in decentralized environments due to entropy-driven instability. This research proves that stigmergic signaling provides sufficient context for complex grid coordination, offering a robust, privacy-preserving alternative to expensive centralized communication infrastructure.
分层学习(1篇)
【1】Fast Online Learning with Gaussian Prior-Driven Hierarchical Unimodal Thompson Sampling
标题:采用高斯先验驱动分层单峰Thompson抽样的快速在线学习
链接:https://arxiv.org/abs/2602.15972
作者:Tianchi Zhao,He Liu,Hongyin Shi,Jinliang Li
摘要:我们研究一类多臂老虎机(MAB)问题,其中具有高斯奖励反馈的臂被聚类。由于高斯分布的普适性,这种臂设定在许多现实问题中有应用,例如毫米波通信和包含风险资产的投资组合管理。基于用于选择最优臂的高斯先验汤普森采样(TSG)算法,我们提出了针对两层分层结构的高斯先验聚类臂汤普森采样(TSCG)算法。我们证明,通过利用两层结构,可以获得比普通TSG更低的遗憾界。此外,当奖励是单峰时,我们的高斯先验聚类臂单峰汤普森采样算法(UTSCG)可以达到更低的遗憾界。我们为所提出的每个算法给出了遗憾上界的理论评估,数值实验也证实了所提算法的优势。
摘要:We study a type of Multi-Armed Bandit (MAB) problems in which arms with a Gaussian reward feedback are clustered. Such an arm setting finds applications in many real-world problems, for example, mmWave communications and portfolio management with risky assets, as a result of the universality of the Gaussian distribution. Based on the Thompson Sampling algorithm with Gaussian prior (TSG) algorithm for the selection of the optimal arm, we propose our Thompson Sampling with Clustered arms under Gaussian prior (TSCG) specific to the 2-level hierarchical structure. We prove that by utilizing the 2-level structure, we can achieve a lower regret bound than we do with ordinary TSG. In addition, when the reward is Unimodal, we can reach an even lower bound on the regret by our Unimodal Thompson Sampling algorithm with Clustered Arms under Gaussian prior (UTSCG). Each of our proposed algorithms are accompanied by theoretical evaluation of the upper regret bound, and our numerical experiments confirm the advantage of our proposed algorithms.
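作为背景,标准的高斯先验汤普森采样(TSG)可以用几行代码写出。下面的草图采用共轭高斯后验(先验为标准正态、方差取噪声方差),不包含论文的聚类或单峰结构:

```python
import numpy as np

def thompson_gaussian(means, horizon, sigma=1.0, rng=None):
    """Generic Thompson sampling with a N(0, sigma^2) prior on each arm's
    mean and known Gaussian noise (a TSG sketch, not the paper's
    clustered/unimodal variants)."""
    rng = rng or np.random.default_rng()
    k = len(means)
    counts = np.zeros(k)
    sums = np.zeros(k)
    pulls = []
    for _ in range(horizon):
        # conjugate posterior for arm i: N( sums/(counts+1), sigma^2/(counts+1) )
        post_mean = sums / (counts + 1.0)
        post_std = sigma / np.sqrt(counts + 1.0)
        arm = int(np.argmax(rng.normal(post_mean, post_std)))
        reward = rng.normal(means[arm], sigma)  # Gaussian reward feedback
        counts[arm] += 1
        sums[arm] += reward
        pulls.append(arm)
    return pulls

rng = np.random.default_rng(4)
means = [0.0, 0.3, 1.5]                   # arm 2 is optimal
pulls = thompson_gaussian(means, horizon=2000, rng=rng)
best_frac = pulls.count(2) / len(pulls)
print(best_frac)
```

聚类结构的作用在于先在簇层面采样、再在簇内采样,从而把探索集中到有希望的簇上,这是论文所证明的更低遗憾界的来源。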
医学相关(2篇)
【1】P-RAG: Prompt-Enhanced Parametric RAG with LoRA and Selective CoT for Biomedical and Multi-Hop QA
标题:P-RAG:用于生物医学与多跳QA的提示增强参数化RAG(结合LoRA与选择性CoT)
链接:https://arxiv.org/abs/2602.15874
作者:Xingda Lyu,Gongfu Lyu,Zitai Yan,Yuxin Jiang
摘要:大型语言模型(LLM)表现出卓越的能力,但仍受限于其对静态训练数据的依赖。检索增强生成(RAG)通过在推理过程中检索外部知识来解决这一问题,但其效果仍在很大程度上依赖于知识库的质量。为了探索潜在的改进,我们在通用和生物医学数据集上评估了三种RAG变体:标准RAG、DA-RAG,以及我们提出的提示增强参数化RAG(P-RAG)。P-RAG是一种混合架构,在思维链(CoT)提示和低秩适应(LoRA)微调的引导下,将LLM内部的参数化知识与检索到的证据相结合。我们使用经LoRA微调的LLaMA-3.2-1B-Instruct,在PubMedQA和2WikiMultihopQA上进行评估。P-RAG在PubMedQA上的F1比标准RAG高10.47个百分点(93.33%对82.86%;相对提升12.64%)。在2WikiMultihopQA上,P-RAG的总得分几乎是标准RAG的两倍(33.44%对17.83%),并在Compare子集上达到44.03%(Bridge为42.74%,Inference为21.84%,Compose为8.60%)。CoT提示大幅改进了多跳推理,但对于更简单的单跳查询结果好坏参半。这些发现凸显了P-RAG在准确、可扩展且上下文自适应的生物医学问答方面的潜力。我们的贡献包括:(1)对LLaMA-3.2-1B-Instruct进行基于LoRA的生物医学QA微调;(2)引入带思维链提示的P-RAG;(3)在PubMedQA和2WikiMultihopQA上取得最新结果。
摘要:Large Language Models (LLMs) demonstrate remarkable capabilities but remain limited by their reliance on static training data. Retrieval-Augmented Generation (RAG) addresses this constraint by retrieving external knowledge during inference, though it still depends heavily on knowledge base quality. To explore potential improvements, we evaluated three RAG variants-Standard RAG, DA-RAG, and our proposed Prompt-Enhanced Parametric RAG (P-RAG), a hybrid architecture that integrates parametric knowledge within the LLM and retrieved evidence, guided by Chain-of-Thought (CoT) prompting and Low-Rank Adaptation (LoRA) fine-tuning-on both general and biomedical datasets. Using LLaMA-3.2-1B-Instruct fine-tuned via LoRA, we evaluate on PubMedQA and 2WikiMultihopQA. P-RAG outperforms Standard RAG on PubMedQA by 10.47 percentage points in F1 (93.33% vs. 82.86%; 12.64% relative). On 2WikiMultihopQA, P-RAG nearly doubles the overall score vs. Standard RAG (33.44% vs. 17.83%) and achieves 44.03% on the Compare subset (with 42.74% Bridge, 21.84% Inference, 8.60% Compose). CoT prompting substantially improves multi-hop reasoning but yields mixed results for simpler, single-hop queries. These findings underscore P-RAG's potential for accurate, scalable, and contextually adaptive biomedical question answering. Our contributions include: (1) LoRA-based fine-tuning of LLaMA-3.2-1B-Instruct for biomedical QA, (2) introduction of P-RAG with Chain-of-Thought prompting, and (3) state-of-the-art results on PubMedQA and 2WikiMultihopQA.
【2】Building Safe and Deployable Clinical Natural Language Processing under Temporal Leakage Constraints
标题:在时间泄漏约束下构建安全且可部署的临床自然语言处理
链接:https://arxiv.org/abs/2602.15852
作者:Ha Na Cho,Sairam Sutari,Alexander Lopez,Hansen Bow,Kai Zheng
摘要:临床自然语言处理(NLP)模型已经显示出通过利用叙述性临床文档来支持医院出院计划的前景。然而,基于笔记的模型特别容易受到时间和词汇泄漏的影响,其中文档工件编码未来的临床决策并夸大明显的预测性能。这种行为对现实世界的部署构成了巨大的风险,过度自信或暂时无效的预测可能会扰乱临床工作流程并损害患者安全。本研究的重点是在时间泄漏约束下构建安全和可部署的临床NLP所需的系统级设计选择。我们提出了一个轻量级的审计管道,将可解释性集成到模型开发过程中,以在最终训练之前识别和抑制易于泄漏的信号。使用择期脊柱手术后第二天出院预测作为案例研究,我们评估审计如何影响预测行为,校准和安全相关的权衡。结果表明,审计模型表现出更保守和更好的校准概率估计,减少依赖放电相关的词汇线索。这些发现强调,部署就绪的临床NLP系统应该优先考虑时间有效性,校准和行为鲁棒性,而不是乐观的性能。
摘要:Clinical natural language processing (NLP) models have shown promise for supporting hospital discharge planning by leveraging narrative clinical documentation. However, note-based models are particularly vulnerable to temporal and lexical leakage, where documentation artifacts encode future clinical decisions and inflate apparent predictive performance. Such behavior poses substantial risks for real-world deployment, where overconfident or temporally invalid predictions can disrupt clinical workflows and compromise patient safety. This study focuses on system-level design choices required to build safe and deployable clinical NLP under temporal leakage constraints. We present a lightweight auditing pipeline that integrates interpretability into the model development process to identify and suppress leakage-prone signals prior to final training. Using next-day discharge prediction after elective spine surgery as a case study, we evaluate how auditing affects predictive behavior, calibration, and safety-relevant trade-offs. Results show that audited models exhibit more conservative and better-calibrated probability estimates, with reduced reliance on discharge-related lexical cues. These findings emphasize that deployment-ready clinical NLP systems should prioritize temporal validity, calibration, and behavioral robustness over optimistic performance.
蒸馏|知识提取(1篇)
【1】Multi-Class Boundary Extraction from Implicit Representations
标题:从隐式表示中提取多类边界
链接:https://arxiv.org/abs/2602.16217
作者:Jash Vira,Andrew Myers,Simon Ratcliffe
摘要:从建模单类表面的隐式神经表示中提取表面是一个众所周知的任务。然而,目前尚不存在能从多类隐式表示中提取表面、并保证拓扑正确且无孔洞的方法。在这项工作中,我们通过引入一个面向多类情形的二维边界提取算法来奠定基础,该算法关注拓扑一致性和水密性,并允许对近似设置最小细节约束。最后,我们使用地质建模数据评估了该算法,展示了其自适应性以及尊重复杂拓扑的能力。
摘要:Surface extraction from implicit neural representations modelling a single class surface is a well-known task. However, there exist no surface extraction methods from an implicit representation of multiple classes that guarantee topological correctness and no holes. In this work, we lay the groundwork by introducing a 2D boundary extraction algorithm for the multi-class case focusing on topological consistency and water-tightness, which also allows for setting minimum detail restraint on the approximation. Finally, we evaluate our algorithm using geological modelling data, showcasing its adaptiveness and ability to honour complex topology.
推荐(3篇)
【1】Variable-Length Semantic IDs for Recommender Systems
标题:推荐系统的可变长度语义ID
链接:https://arxiv.org/abs/2602.16375
作者:Kirill Khrylchenko
摘要:生成模型越来越多地用于推荐系统中,既用于将用户行为建模为事件序列,又用于将大型语言模型集成到推荐管道中。这种设置的一个关键挑战是项目空间的基数非常大,这使得训练生成模型变得困难,并在自然语言和项目标识符之间引入了词汇差距。语义标识符(语义ID)将项目表示为低基数令牌序列,最近已成为该问题的有效解决方案。然而,现有的方法生成固定长度的语义标识符,为所有项目分配相同的描述长度。这是低效的,与自然语言不一致,并且忽略了现实世界目录高度倾斜的频率结构,其中流行项目和罕见的长尾项目表现出根本不同的信息需求。与此同时,涌现通信(emergent communication)文献研究了智能体如何发展离散通信协议,通常会产生可变长度的消息,其中频繁出现的概念获得较短的描述。尽管概念上相似,这些想法尚未被系统地应用于推荐系统。在这项工作中,我们通过为推荐引入可变长度的语义标识符,在推荐系统和涌现通信之间架起桥梁。我们提出了一个带有Gumbel-Softmax重参数化的离散变分自编码器,在有原则的概率框架下学习自适应长度的项目表示,避免了基于REINFORCE的训练的不稳定性和先前语义ID方法的固定长度限制。
摘要:Generative models are increasingly used in recommender systems, both for modeling user behavior as event sequences and for integrating large language models into recommendation pipelines. A key challenge in this setting is the extremely large cardinality of item spaces, which makes training generative models difficult and introduces a vocabulary gap between natural language and item identifiers. Semantic identifiers (semantic IDs), which represent items as sequences of low-cardinality tokens, have recently emerged as an effective solution to this problem. However, existing approaches generate semantic identifiers of fixed length, assigning the same description length to all items. This is inefficient, misaligned with natural language, and ignores the highly skewed frequency structure of real-world catalogs, where popular items and rare long-tail items exhibit fundamentally different information requirements. In parallel, the emergent communication literature studies how agents develop discrete communication protocols, often producing variable-length messages in which frequent concepts receive shorter descriptions. Despite the conceptual similarity, these ideas have not been systematically adopted in recommender systems. In this work, we bridge recommender systems and emergent communication by introducing variable-length semantic identifiers for recommendation. We propose a discrete variational autoencoder with Gumbel-Softmax reparameterization that learns item representations of adaptive length under a principled probabilistic framework, avoiding the instability of REINFORCE-based training and the fixed-length constraints of prior semantic ID methods.
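The Gumbel-Softmax reparameterization the abstract relies on can be sketched in a few lines of numpy (the three-way categorical and the temperature are illustrative assumptions): relaxed samples are differentiable, yet their argmax frequencies recover the underlying categorical distribution.

```python
import numpy as np

def gumbel_softmax(logits, tau, rng):
    """Draw one relaxed categorical sample via the Gumbel-Softmax trick."""
    u = rng.uniform(1e-9, 1.0, size=logits.shape)
    g = -np.log(-np.log(u))          # Gumbel(0, 1) noise
    z = (logits + g) / tau
    e = np.exp(z - z.max())          # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
logits = np.log(np.array([0.7, 0.2, 0.1]))
# At low temperature the relaxed samples are nearly one-hot, and their
# argmax frequencies recover the categorical probabilities.
samples = np.array([gumbel_softmax(logits, 0.1, rng) for _ in range(5000)])
freq = np.bincount(samples.argmax(axis=1), minlength=3) / 5000
print(freq.round(2))   # close to [0.7, 0.2, 0.1]
```

In the paper's setting the logits would come from an encoder and a learned stop/length mechanism would make the token sequence variable-length; the trick itself is what replaces high-variance REINFORCE gradients.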
【2】Rethinking ANN-based Retrieval: Multifaceted Learnable Index for Large-scale Recommendation System
标题:重新思考基于ANN的检索:大规模推荐系统的多面可学习索引
链接:https://arxiv.org/abs/2602.16124
作者:Jiang Zhang,Yubo Wang,Wei Chang,Lu Han,Xingying Cheng,Feng Zhang,Min Li,Songhao Jiang,Wei Zheng,Harry Tran,Zhen Wang,Lei Chen,Yueming Wang,Benyu Zhang,Xiangjun Fan,Bi Xue,Qifan Wang
摘要:近似最近邻(ANN)搜索广泛应用于大规模推荐系统的检索阶段。在这个阶段,候选项目使用其学习到的嵌入向量进行索引,并对每个用户(或项目)查询执行ANN搜索以检索一组相关项目。然而,基于ANN的检索有两个关键限制。首先,项目嵌入和它们的索引通常是在不同的阶段学习的:索引通常在嵌入训练之后离线执行,这可能产生次优的检索质量,特别是对于新创建的项目。其次,虽然ANN提供了次线性查询时间,但它仍然必须为每个请求运行,从而在工业规模上产生大量计算成本。在本文中,我们提出了多面可学习索引(MFLI),这是一种可扩展的实时检索范式,可以在统一框架内学习多面项目嵌入和索引,并在服务时消除ANN搜索。具体来说,我们通过项目嵌入的残差量化构建了一个多面分层码本,并将码本与嵌入共同训练。我们进一步引入了高效的多面索引结构和支持实时更新的机制。在服务时,学习到的分层索引直接用于识别相关项目,完全避免了ANN搜索。在拥有数十亿用户的真实数据上进行的大量实验表明,与现有的最先进方法相比,MFLI将参与任务的召回率提高了高达11.8%,冷内容投放提高了高达57.29%,语义相关性提高了13.5%。我们还在系统中部署了MFLI,并报告了在线实验结果,表明参与度提高、流行度偏差减少、服务效率更高。
摘要:Approximate nearest neighbor (ANN) search is widely used in the retrieval stage of large-scale recommendation systems. In this stage, candidate items are indexed using their learned embedding vectors, and ANN search is executed for each user (or item) query to retrieve a set of relevant items. However, ANN-based retrieval has two key limitations. First, item embeddings and their indices are typically learned in separate stages: indexing is often performed offline after embeddings are trained, which can yield suboptimal retrieval quality-especially for newly created items. Second, although ANN offers sublinear query time, it must still be run for every request, incurring substantial computation cost at industry scale. In this paper, we propose MultiFaceted Learnable Index (MFLI), a scalable, real-time retrieval paradigm that learns multifaceted item embeddings and indices within a unified framework and eliminates ANN search at serving time. Specifically, we construct a multifaceted hierarchical codebook via residual quantization of item embeddings and co-train the codebook with the embeddings. We further introduce an efficient multifaceted indexing structure and mechanisms that support real-time updates. At serving time, the learned hierarchical indices are used directly to identify relevant items, avoiding ANN search altogether. Extensive experiments on real-world data with billions of users show that MFLI improves recall on engagement tasks by up to 11.8\%, cold-content delivery by up to 57.29\%, and semantic relevance by 13.5\% compared with prior state-of-the-art methods. We also deploy MFLI in the system and report online experimental results demonstrating improved engagement, less popularity bias, and higher serving efficiency.
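The hierarchical codebook built via residual quantization can be sketched greedily: each level encodes what the previous levels left unexplained, so the code sequence acts as a coarse-to-fine index. Random codebooks stand in here for the learned, co-trained ones; dimensions and level counts are illustrative assumptions.

```python
import numpy as np

def residual_quantize(x, codebooks):
    """Greedy residual quantization: pick the nearest codeword at each
    level, then quantize the remaining residual at the next level."""
    codes, residual = [], x.astype(float).copy()
    for cb in codebooks:                              # one codebook per level
        idx = int(np.linalg.norm(cb - residual, axis=1).argmin())
        codes.append(idx)
        residual = residual - cb[idx]
    return codes, x - residual                        # codes, reconstruction

rng = np.random.default_rng(1)
item = rng.normal(size=8)                             # an item embedding
codebooks = [rng.normal(size=(16, 8)) for _ in range(3)]
codes, recon = residual_quantize(item, codebooks)
print(codes)   # a 3-level hierarchical index for the item
```

At serving time, walking such a code tree level by level is what replaces the per-request ANN search.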
【3】BamaER: A Behavior-Aware Memory-Augmented Model for Exercise Recommendation
标题:BamaER:用于锻炼推荐的行为感知记忆增强模型
链接:https://arxiv.org/abs/2602.15879
作者:Qing Yang,Yuhao Jiang,Rui Wang,Jipeng Guo,Yejiang Wang,Xinghe Cheng,Zezheng Wu,Jiapu Wang,Jingwei Zhang
摘要:习题推荐是根据学生的学习历史、兴趣爱好等个性化特征进行个性化的习题选择。尽管取得了显着的进步,大多数现有的方法表示学生学习仅仅作为练习序列,忽略了丰富的行为交互信息。这种有限的代表性往往导致对学习进展的估计有偏见和不可靠。此外,固定长度的序列分割限制了早期学习经验的结合,从而阻碍了长期依赖关系的建模和知识掌握的准确估计。为了解决这些局限性,我们提出了一个行为感知的记忆增强练习推荐框架BamaER,它包括三个核心模块:(i)学习进度预测模块,通过三向混合编码方案捕获异构学生交互行为;(ii)记忆-增强知识跟踪模块,其维持动态记忆矩阵以联合地对历史和当前知识状态进行建模以用于鲁棒掌握估计;以及(iii)练习过滤模块,其将候选者选择公式化为多样性感知优化问题,通过河马优化算法来解决,以减少冗余并提高推荐覆盖率。在五个真实世界的教育数据集上进行的实验表明,BamaER在一系列评估指标上始终优于最先进的基线。
摘要:Exercise recommendation focuses on personalized exercise selection conditioned on students' learning history, personal interests, and other individualized characteristics. Despite notable progress, most existing methods represent student learning solely as exercise sequences, overlooking rich behavioral interaction information. This limited representation often leads to biased and unreliable estimates of learning progress. Moreover, fixed-length sequence segmentation limits the incorporation of early learning experiences, thereby hindering the modeling of long-term dependencies and the accurate estimation of knowledge mastery. To address these limitations, we propose BamaER, a Behavior-aware memory-augmented Exercise Recommendation framework that comprises three core modules: (i) the learning progress prediction module that captures heterogeneous student interaction behaviors via a tri-directional hybrid encoding scheme; (ii) the memory-augmented knowledge tracing module that maintains a dynamic memory matrix to jointly model historical and current knowledge states for robust mastery estimation; and (iii) the exercise filtering module that formulates candidate selection as a diversity-aware optimization problem, solved via the Hippopotamus Optimization Algorithm to reduce redundancy and improve recommendation coverage. Experiments on five real-world educational datasets show that BamaER consistently outperforms state-of-the-art baselines across a range of evaluation metrics.
联邦学习|隐私保护|加密(2篇)
【1】Towards Secure and Scalable Energy Theft Detection: A Federated Learning Approach for Resource-Constrained Smart Meters
标题:迈向安全和可扩展的能源盗窃检测:资源受限智能电表的联邦学习方法
链接:https://arxiv.org/abs/2602.16181
作者:Diego Labate,Dipanwita Thakur,Giancarlo Fortino
摘要:能源盗窃对智能电网的稳定性和效率构成重大威胁,导致重大经济损失和运营挑战。传统的集中式机器学习盗窃检测方法需要聚合用户数据,这引发了对隐私和数据安全的严重担忧。这些问题在智能电表环境中进一步加剧,其中设备通常受到资源限制并且缺乏运行繁重模型的能力。在这项工作中,我们提出了一个保护隐私的联邦学习框架,用于能源盗窃检测,同时解决隐私和计算约束。我们的方法利用适合在低功耗智能电表上部署的轻量级多层感知器(MLP)模型,并通过在聚合之前将高斯噪声注入本地模型更新来集成基本差分隐私(DP)。这确保了正式的隐私保证,而不会影响学习性能。我们评估我们的框架下IID和非IID数据分布在现实世界的智能电表数据集。实验结果表明,我们的方法实现了竞争力的准确率,精度,召回率和AUC分数,同时保持隐私和效率。这使得所提出的解决方案实用且可扩展,可用于下一代智能电网基础设施中的安全能源盗窃检测。
摘要:Energy theft poses a significant threat to the stability and efficiency of smart grids, leading to substantial economic losses and operational challenges. Traditional centralized machine learning approaches for theft detection require aggregating user data, raising serious concerns about privacy and data security. These issues are further exacerbated in smart meter environments, where devices are often resource-constrained and lack the capacity to run heavy models. In this work, we propose a privacy-preserving federated learning framework for energy theft detection that addresses both privacy and computational constraints. Our approach leverages a lightweight multilayer perceptron (MLP) model, suitable for deployment on low-power smart meters, and integrates basic differential privacy (DP) by injecting Gaussian noise into local model updates before aggregation. This ensures formal privacy guarantees without compromising learning performance. We evaluate our framework on a real-world smart meter dataset under both IID and non-IID data distributions. Experimental results demonstrate that our method achieves competitive accuracy, precision, recall, and AUC scores while maintaining privacy and efficiency. This makes the proposed solution practical and scalable for secure energy theft detection in next-generation smart grid infrastructures.
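The clip-and-noise step described above can be sketched as follows; the clip norm, noise multiplier, and client count are placeholder values, not a calibrated privacy budget.

```python
import numpy as np

def dp_update(update, clip_norm, sigma, rng):
    """Clip a local model update to clip_norm, then add Gaussian noise
    scaled by sigma * clip_norm (the basic Gaussian mechanism)."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / norm)
    return clipped + rng.normal(0.0, sigma * clip_norm, size=update.shape)

rng = np.random.default_rng(0)
clients = [rng.normal(size=10) for _ in range(50)]        # local MLP updates
noisy = [dp_update(u, clip_norm=1.0, sigma=0.1, rng=rng) for u in clients]
aggregate = np.mean(noisy, axis=0)                        # server FedAvg step
print(aggregate.shape)
```

Clipping bounds each client's influence (the sensitivity), which is what makes the Gaussian noise yield a formal DP guarantee; averaging over many clients then washes most of the noise out of the aggregate.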
【2】Exploring New Frontiers in Vertical Federated Learning: the Role of Saddle Point Reformulation
标题:探索垂直联邦学习的新前沿:鞍点重组的作用
链接:https://arxiv.org/abs/2602.15996
作者:Aleksandr Beznosikov,Georgiy Kormakov,Alexander Grigorievskiy,Mikhail Rudakov,Ruslan Nazykov,Alexander Rogozin,Anton Vakhrushev,Andrey Savchenko,Martin Takáč,Alexander Gasnikov
备注:104 pages, 1 table, 9 figures, 10 theorems, 12 algorithms
摘要:垂直联合学习(Vertical Federated Learning,VFL)的目标是在共享相同用户的同时,使用不同设备上可用的功能来集体训练模型。本文主要研究了基于经典拉格朗日函数的VFL问题的鞍点重构。我们首先演示如何使用确定性方法来解决这个公式。更重要的是,我们探索各种随机修改,以适应实际情况,如采用压缩技术,有效的信息传输,使部分参与异步通信,并利用坐标选择更快的本地计算。我们表明,鞍点重新制定起着关键作用,并开辟了可能性,使用上述扩展,似乎是不可能的标准最小化制定。为每个算法提供了收敛估计,证明了它们在解决VFL问题方面的有效性。此外,替代的重新制定的调查,并进行了数值实验,以验证所提出的方法的性能和有效性。
摘要:The objective of Vertical Federated Learning (VFL) is to collectively train a model using features available on different devices while sharing the same users. This paper focuses on the saddle point reformulation of the VFL problem via the classical Lagrangian function. We first demonstrate how this formulation can be solved using deterministic methods. More importantly, we explore various stochastic modifications to adapt to practical scenarios, such as employing compression techniques for efficient information transmission, enabling partial participation for asynchronous communication, and utilizing coordinate selection for faster local computation. We show that the saddle point reformulation plays a key role and opens up possibilities to use the aforementioned extensions, which seem to be impossible in the standard minimization formulation. Convergence estimates are provided for each algorithm, demonstrating their effectiveness in addressing the VFL problem. Additionally, alternative reformulations are investigated, and numerical experiments are conducted to validate the performance and effectiveness of the proposed approach.
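As a toy illustration of solving a saddle-point formulation with a deterministic method, the sketch below runs simultaneous gradient descent-ascent on a simple strongly-convex-strongly-concave Lagrangian; the quadratic objective and step size are assumptions, not the paper's VFL problem.

```python
import numpy as np

# Toy Lagrangian L(x, y) = x^2/2 + x*y - y^2/2 with its unique saddle
# point at (0, 0): gradient descent in x, gradient ascent in y.
def gda(x, y, eta=0.1, steps=500):
    for _ in range(steps):
        gx = x + y                      # dL/dx
        gy = x - y                      # dL/dy
        x, y = x - eta * gx, y + eta * gy
    return x, y

x, y = gda(1.0, 1.0)
print(x, y)   # both converge to the saddle point at 0
```

The stochastic variants in the paper (compressed, partially participating, coordinate-wise) replace these exact gradients with cheaper estimates while keeping the same descent-ascent structure.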
推理|分析|理解|解释(5篇)
【1】Steering diffusion models with quadratic rewards: a fine-grained analysis
标题:具有二次回报的引导扩散模型:细粒度分析
链接:https://arxiv.org/abs/2602.16570
作者:Ankur Moitra,Andrej Risteski,Dhruv Rohatgi
摘要:推理时间算法是一种新兴的范式,其中预训练模型被用作子例程来解决下游任务。此类算法已被提出用于从逆问题、引导图像生成到推理等一系列任务。然而,目前在实践中部署的方法是具有各种故障模式的启发式方法,我们对何时可以有效地改进这些启发式方法知之甚少。 在本文中,我们考虑了从奖励倾斜扩散模型中采样的任务,即在给定奖励函数$r$和$p$的预训练扩散预言机的情况下,从$p^{\star}(x)\propto p(x)\exp(r(x))$中采样。我们对二次奖励$r(x)= x^\top A x + b^\top x$下该任务的计算易处理性给出了细粒度分析。我们表明,线性奖励倾斜总是可以被高效采样,这一简单结果似乎在文献中一直被忽视。我们将其用作构建块,并结合一个概念上全新的成分(哈伯德-斯特拉托诺维奇变换),给出一种从低秩正定二次倾斜中采样的高效算法,即$r(x)= x^\top A x$,其中$A$是正定的且秩为$O(1)$。对于负定倾斜,即$r(x)= - x^\top A x$,其中$A$是正定的,我们证明了即使$A$的秩为1(尽管其条目可能指数级大),该问题也是难解的。
摘要:Inference-time algorithms are an emerging paradigm in which pre-trained models are used as subroutines to solve downstream tasks. Such algorithms have been proposed for tasks ranging from inverse problems and guided image generation to reasoning. However, the methods currently deployed in practice are heuristics with a variety of failure modes -- and we have very little understanding of when these heuristics can be efficiently improved. In this paper, we consider the task of sampling from a reward-tilted diffusion model -- that is, sampling from $p^{\star}(x) \propto p(x) \exp(r(x))$ -- given a reward function $r$ and pre-trained diffusion oracle for $p$. We provide a fine-grained analysis of the computational tractability of this task for quadratic rewards $r(x) = x^\top A x + b^\top x$. We show that linear-reward tilts are always efficiently sampleable -- a simple result that seems to have gone unnoticed in the literature. We use this as a building block, along with a conceptually new ingredient -- the Hubbard-Stratonovich transform -- to provide an efficient algorithm for sampling from low-rank positive-definite quadratic tilts, i.e. $r(x) = x^\top A x$ where $A$ is positive-definite and of rank $O(1)$. For negative-definite tilts, i.e. $r(x) = - x^\top A x$ where $A$ is positive-definite, we prove that the problem is intractable even if $A$ is of rank 1 (albeit with exponentially-large entries).
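The claim that linear-reward tilts are efficiently sampleable can be checked in one dimension: tilting N(mu, var) by exp(b*x) gives exactly N(mu + var*b, var), i.e. a simple mean shift. The sketch below verifies the tilted mean with self-normalized importance weights (all numbers are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
mu, var, b = 0.0, 1.0, 0.8
# Samples from the base Gaussian, reweighted by the linear reward tilt:
# the tilted mean should match the closed form mu + var * b.
x = rng.normal(mu, np.sqrt(var), size=200_000)
w = np.exp(b * x)
tilted_mean = np.average(x, weights=w)
print(tilted_mean)   # close to mu + var*b = 0.8
```

For a diffusion model, the analogous fact is that a linear tilt only shifts the score function, which is why that case stays tractable while general quadratic tilts need the machinery developed in the paper.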
【2】Missing-by-Design: Certifiable Modality Deletion for Revocable Multimodal Sentiment Analysis
标题:设计缺失:可撤销多模式情绪分析的可认证模式删除
链接:https://arxiv.org/abs/2602.16144
作者:Rong Fu,Wenxin Zhang,Ziming Wang,Chunlei Meng,Jiaxuan Lu,Jiekai Wu,Kangan Qian,Hao Zhang,Simon Fong
备注:21 pages, 6 figures
摘要:随着多模态系统越来越多地处理敏感的个人数据,选择性地撤销特定数据模态的能力已经成为隐私合规性和用户自主性的关键要求。我们提出了Missing-by-Design(MBD),这是一个统一的多模态情感分析框架,它将结构化表示学习与可认证的参数修改管道相结合。可撤销性在隐私敏感应用中至关重要,在这些应用中,用户或监管机构可能会要求删除特定于模式的信息。MBD学习属性感知嵌入,并采用基于生成器的重建来恢复丢失的通道,同时保留任务相关的信号。对于删除请求,该框架应用显着性驱动的候选选择和校准的高斯更新来产生机器可验证的模态删除证书。在基准数据集上的实验表明,MBD在不完整的输入下实现了强大的预测性能,并提供了一个实用的隐私-效用权衡,将手术遗忘定位为完全再训练的有效替代方案。
摘要:As multimodal systems increasingly process sensitive personal data, the ability to selectively revoke specific data modalities has become a critical requirement for privacy compliance and user autonomy. We present Missing-by-Design (MBD), a unified framework for revocable multimodal sentiment analysis that combines structured representation learning with a certifiable parameter-modification pipeline. Revocability is critical in privacy-sensitive applications where users or regulators may request removal of modality-specific information. MBD learns property-aware embeddings and employs generator-based reconstruction to recover missing channels while preserving task-relevant signals. For deletion requests, the framework applies saliency-driven candidate selection and a calibrated Gaussian update to produce a machine-verifiable Modality Deletion Certificate. Experiments on benchmark datasets show that MBD achieves strong predictive performance under incomplete inputs and delivers a practical privacy-utility trade-off, positioning surgical unlearning as an efficient alternative to full retraining.
【3】CHAI: CacHe Attention Inference for text2video
标题:CHAI:面向文本到视频生成的缓存注意力推理
链接:https://arxiv.org/abs/2602.16132
作者:Joel Mathew Cherian,Ashutosh Muralidhara Bharadwaj,Vima Gupta,Anand Padmanabha Iyer
摘要:文本到视频扩散模型可以生成令人印象深刻的结果,但由于需要对3D潜变量进行顺序去噪,推理仍然缓慢。现有的加速推理的方法要么需要昂贵的模型再训练,要么使用基于启发式的跳步,而随着去噪步骤数的减少,后者很难保持视频质量。我们的工作CHAI旨在使用跨推理缓存来减少延迟,同时保持视频质量。我们引入缓存注意力(Cache Attention)作为一种在跨推理潜变量之间关注共享对象/场景的有效方法。这种选择性注意机制可以在语义相关的提示之间有效地重用缓存的潜变量,从而产生高缓存命中率。我们证明了使用缓存注意力只需8个去噪步骤即可生成高质量的视频。当集成到整个系统中时,CHAI比基线OpenSora 1.2快1.65倍-3.35倍,同时保持视频质量。
摘要:Text-to-video diffusion models deliver impressive results but remain slow because of the sequential denoising of 3D latents. Existing approaches to speed up inference either require expensive model retraining or use heuristic-based step skipping, which struggles to maintain video quality as the number of denoising steps decreases. Our work, CHAI, aims to use cross-inference caching to reduce latency while maintaining video quality. We introduce Cache Attention as an effective method for attending to shared objects/scenes across cross-inference latents. This selective attention mechanism enables effective reuse of cached latents across semantically related prompts, yielding high cache hit rates. We show that it is possible to generate high-quality videos using Cache Attention with as few as 8 denoising steps. When integrated into the overall system, CHAI is 1.65x - 3.35x faster than baseline OpenSora 1.2 while maintaining video quality.
【4】The Limits of Long-Context Reasoning in Automated Bug Fixing
标题:长上下文推理在自动漏洞修复中的局限性
链接:https://arxiv.org/abs/2602.16069
作者:Ravi Raju,Mengmeng Ji,Shubhangi Upasani,Bo Li,Urmish Thakker
备注:4 pages, under review
摘要:上下文长度的快速增长导致了一种假设:大型语言模型(LLM)可以直接在整个代码库上进行推理。与此同时,LLM的最新进展使其在软件工程基准测试上表现强劲,特别是与代理工作流搭配使用时。在这项工作中,我们系统地评估当前的LLM是否可以可靠地执行长上下文代码调试和补丁生成。使用SWE-bench Verified作为受控实验设置,我们首先评估了代理框架(mini-SWE-agent)中最先进的模型,其中性能大幅提高:GPT-5-nano在100个样本上实现了高达31%的解决率,而Deepseek-R1-0528等开源模型也获得了有竞争力的结果。然而,令牌级分析表明,成功的代理轨迹通常保持在20k令牌以内,且更长的累积上下文与更低的成功率相关,这表明代理的成功主要来自将任务分解为短上下文步骤,而不是有效的长上下文推理。为了直接测试长上下文能力,我们构建了一个数据管道,通过将相关文件放入上下文来人为地增加输入的上下文长度(确保完美的检索召回);然后我们研究了真正长上下文(64k-128k令牌)下的单次补丁生成。尽管如此,性能急剧下降:Qwen3-Coder-30B-A3B在64k上下文下仅达到7%的解决率,而GPT-5-nano无法解决任何任务。定性分析揭示了系统性的故障模式,包括幻觉生成的diff、不正确的文件目标和格式错误的补丁头。总的来说,我们的研究结果突出了当前LLM中标称上下文长度和可用上下文容量之间的显著差距,并表明现有的代理编码基准没有有意义地评估长上下文推理。
摘要:Rapidly increasing context lengths have led to the assumption that large language models (LLMs) can directly reason over entire codebases. Concurrently, recent advances in LLMs have enabled strong performance on software engineering benchmarks, particularly when paired with agentic workflows. In this work, we systematically evaluate whether current LLMs can reliably perform long-context code debugging and patch generation. Using SWE-bench Verified as a controlled experimental setting, we first evaluate state-of-the-art models within an agentic harness (mini-SWE-agent), where performance improves substantially: GPT-5-nano achieves up to a 31\% resolve rate on 100 samples, and open-source models such as Deepseek-R1-0528 obtain competitive results. However, token-level analysis shows that successful agentic trajectories typically remain under 20k tokens, and that longer accumulated contexts correlate with lower success rates, indicating that agentic success primarily arises from task decomposition into short-context steps rather than effective long-context reasoning. To directly test long-context capability, we construct a data pipeline where we artificially inflate the context length of the input by placing the relevant files into the context (ensuring perfect retrieval recall); we then study single-shot patch generation under genuinely long contexts (64k-128k tokens). Despite this setup, performance degrades sharply: Qwen3-Coder-30B-A3B achieves only a 7\% resolve rate at 64k context, while GPT-5-nano solves none of the tasks. Qualitative analysis reveals systematic failure modes, including hallucinated diffs, incorrect file targets, and malformed patch headers. Overall, our findings highlight a significant gap between nominal context length and usable context capacity in current LLMs, and suggest that existing agentic coding benchmarks do not meaningfully evaluate long-context reasoning.
【5】Kalman-Inspired Runtime Stability and Recovery in Hybrid Reasoning Systems
标题:混合推理系统中受卡尔曼启发的运行时稳定性与恢复
链接:https://arxiv.org/abs/2602.15855
作者:Barak Or
备注:Under review
摘要:将学习组件与基于模型的推理相结合的混合推理系统越来越多地部署在工具增强的决策循环中,但它们在部分可观察性和持续证据不匹配下的运行时行为仍然知之甚少。在实践中,故障往往表现为内部推理动态的逐渐发散,而不是孤立的预测错误。这项工作从卡尔曼启发的角度研究混合推理系统的运行时稳定性。我们将推理建模为一个由内部新息信号驱动的随机推理过程,并引入认知漂移作为一种可测量的运行时现象。稳定性是根据可检测性、有界发散性和可恢复性而不是任务级正确性来定义的。我们提出了一个运行时稳定性框架,该框架监测新息统计量,检测新出现的不稳定性,并触发恢复感知的控制机制。在多步、工具增强推理任务上的实验表明,该框架能够在任务失败之前可靠地检测不稳定性,并表明在可行的情况下,恢复能在有限时间内重新建立有界的内部行为。这些结果强调运行时稳定性是不确定性下可靠推理的系统级要求。
摘要:Hybrid reasoning systems that combine learned components with model-based inference are increasingly deployed in tool-augmented decision loops, yet their runtime behavior under partial observability and sustained evidence mismatch remains poorly understood. In practice, failures often arise as gradual divergence of internal reasoning dynamics rather than as isolated prediction errors. This work studies runtime stability in hybrid reasoning systems from a Kalman-inspired perspective. We model reasoning as a stochastic inference process driven by an internal innovation signal and introduce cognitive drift as a measurable runtime phenomenon. Stability is defined in terms of detectability, bounded divergence, and recoverability rather than task-level correctness. We propose a runtime stability framework that monitors innovation statistics, detects emerging instability, and triggers recovery-aware control mechanisms. Experiments on multi-step, tool-augmented reasoning tasks demonstrate reliable instability detection prior to task failure and show that recovery, when feasible, re-establishes bounded internal behavior within finite time. These results emphasize runtime stability as a system-level requirement for reliable reasoning under uncertainty.
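Innovation monitoring in the Kalman sense can be sketched with a scalar filter: the normalized innovation stays roughly standard normal while the model matches the evidence, and jumps once the evidence stream shifts. The filter parameters, shift time, and detection window below are illustrative assumptions, not the paper's framework.

```python
import numpy as np

rng = np.random.default_rng(0)
x_hat, P, Q, R = 0.0, 1.0, 1e-4, 1.0          # estimate, variance, noise levels
scores = []
for t in range(400):
    z = rng.normal() + (3.0 if t >= 200 else 0.0)   # evidence shifts at t=200
    P += Q                                    # predict
    S = P + R                                 # innovation variance
    nu = z - x_hat                            # innovation (residual)
    scores.append(nu / np.sqrt(S))            # normalized innovation score
    K = P / S                                 # Kalman gain, then update
    x_hat += K * nu
    P *= (1.0 - K)
mean_before = float(np.mean(np.abs(scores[:200])))
mean_after = float(np.mean(np.abs(scores[200:230])))
print(round(mean_before, 2), round(mean_after, 2))
```

Thresholding such a score is the kind of runtime check that can flag drift well before a task-level failure becomes visible.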
检测相关(5篇)
【1】How to Label Resynthesized Audio: The Dual Role of Neural Audio Codecs in Audio Deepfake Detection
标题:如何标记重新合成的音频:神经音频编解码器在音频Deepfake检测中的双重角色
链接:https://arxiv.org/abs/2602.16343
作者:Yixuan Xiao,Florian Lux,Alejandro Pérez-González-de-Martos,Ngoc Thang Vu
备注:Accepted to ICASSP 2026
摘要:由于文本到语音系统通常不直接产生波形,最近的欺骗检测研究使用来自声码器和神经音频编解码器的重新合成波形来模拟攻击者。与专为语音合成而设计的声码器不同,神经音频编解码器最初是为了压缩音频以进行存储和传输而开发的。然而,它们将语音离散化的能力也引发了人们对基于语言建模的语音合成的兴趣。由于这种双重功能,编解码器重新合成的数据既可以被标记为真实,也可以被标记为欺骗。到目前为止,很少有研究涉及这个问题。在这项研究中,我们提出了一个为此目的而构建的、具有挑战性的ASVspoof 5数据集扩展。我们研究不同的标签选择如何影响检测性能,并提供关于标签策略的见解。
摘要:Since Text-to-Speech systems typically don't produce waveforms directly, recent spoof detection studies use resynthesized waveforms from vocoders and neural audio codecs to simulate an attacker. Unlike vocoders, which are specifically designed for speech synthesis, neural audio codecs were originally developed for compressing audio for storage and transmission. However, their ability to discretize speech also sparked interest in language-modeling-based speech synthesis. Owing to this dual functionality, codec resynthesized data may be labeled as either bonafide or spoof. So far, very little research has addressed this issue. In this study, we present a challenging extension of the ASVspoof 5 dataset constructed for this purpose. We examine how different labeling choices affect detection performance and provide insights into labeling strategies.
【2】Explainability for Fault Detection System in Chemical Processes
标题:化学过程故障检测系统的可解释性
链接:https://arxiv.org/abs/2602.16341
作者:Georgios Gravanis,Dimitrios Kyriakou,Spyros Voutetakis,Simira Papadopoulou,Konstantinos Diamantaras
摘要:在这项工作中,我们应用并比较了两种最先进的可解释人工智能(XAI)方法,即积分梯度(IG)和SHapley加法解释(SHAP),用它们解释高度准确的长短期记忆(LSTM)分类器的故障诊断决策。该分类器被训练用于检测基准非线性化学过程田纳西伊士曼过程(TEP)中的故障。本文强调了XAI方法如何帮助确定过程中发生故障的子系统。利用我们对该过程的了解,我们注意到,在大多数情况下,相同的特征被指示为对决策最重要的特征,而在某些情况下,SHAP方法似乎信息量更大,更接近故障的根本原因。最后,由于所使用的XAI方法是模型无关的,所提出的方法不限于特定的过程,也可以用于类似的问题。
摘要:In this work, we apply and compare two state-of-the-art eXplainability Artificial Intelligence (XAI) methods, the Integrated Gradients (IG) and the SHapley Additive exPlanations (SHAP), that explain the fault diagnosis decisions of a highly accurate Long Short-Term Memory (LSTM) classifier. The classifier is trained to detect faults in a benchmark non-linear chemical process, the Tennessee Eastman Process (TEP). It is highlighted how XAI methods can help identify the subsystem of the process where the fault occurred. Using our knowledge of the process, we note that in most cases the same features are indicated as the most important for the decision, while in some cases the SHAP method seems to be more informative and closer to the root cause of the fault. Finally, since the used XAI methods are model-agnostic, the proposed approach is not limited to the specific process and can also be used in similar problems.
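For reference, Integrated Gradients itself reduces to a short Riemann sum along the path from a baseline to the input. The toy quadratic "fault score" below is an assumption standing in for the LSTM; the completeness axiom (attributions summing to the output difference) is checked numerically.

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=64):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([grad_f(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

w = np.array([2.0, -1.0, 0.5])                       # toy "fault score" model
f = lambda v: w @ v + v[0] ** 2
grad_f = lambda v: w + np.array([2 * v[0], 0.0, 0.0])
x, baseline = np.array([1.0, 1.0, 1.0]), np.zeros(3)
attr = integrated_gradients(grad_f, x, baseline)
# Completeness: attributions sum to f(x) - f(baseline).
print(attr, attr.sum(), f(x) - f(baseline))
```

With an LSTM, `grad_f` would be the gradient of the class score with respect to the input sensors, and per-sensor attributions are what point to the faulty subsystem.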
【3】Axle Sensor Fusion for Online Continual Wheel Fault Detection in Wayside Railway Monitoring
标题:轴传感器融合用于轨边铁路监测中在线连续车轮故障检测
链接:https://arxiv.org/abs/2602.16101
作者:Afonso Lourenço,Francisca Osório,Diogo Risca,Goreti Marreiros
摘要:可靠且具有成本效益的维护对铁路安全至关重要,特别是在容易磨损和失效的轮轨界面。预测性维护框架越来越多地利用传感器生成的时间序列数据,但传统方法需要手动特征工程,并且深度学习模型在运行模式不断演变的在线环境中往往会退化。这项工作提出了一个语义感知、标签高效的持续学习框架,用于铁路故障诊断。加速度计信号通过变分自编码器编码成以完全无监督方式捕获正常运行结构的潜在表示。重要的是,包括轴数、车轮索引和基于应变的变形在内的语义元数据是通过对光纤布拉格光栅传感器(抗电磁干扰)进行AI驱动的峰值检测来提取的,并与VAE嵌入融合,增强未知运行条件下的异常检测。轻量级的梯度提升监督分类器以最少的标签稳定异常评分,而基于重放的持续学习策略能够适应不断演变的领域,而不会发生灾难性遗忘。实验表明,该模型能够检测到由车轮扁疤和多边形化造成的轻微缺陷,同时适应不断变化的运行条件,例如通过路边监测中的单个加速度计和应变计捕获的列车类型、速度、载荷和轨道轮廓的变化。
摘要:Reliable and cost-effective maintenance is essential for railway safety, particularly at the wheel-rail interface, which is prone to wear and failure. Predictive maintenance frameworks increasingly leverage sensor-generated time-series data, yet traditional methods require manual feature engineering, and deep learning models often degrade in online settings with evolving operational patterns. This work presents a semantic-aware, label-efficient continual learning framework for railway fault diagnostics. Accelerometer signals are encoded via a Variational AutoEncoder into latent representations capturing the normal operational structure in a fully unsupervised manner. Importantly, semantic metadata, including axle counts, wheel indexes, and strain-based deformations, is extracted via AI-driven peak detection on fiber Bragg grating sensors (resistant to electromagnetic interference) and fused with the VAE embeddings, enhancing anomaly detection under unknown operational conditions. A lightweight gradient boosting supervised classifier stabilizes anomaly scoring with minimal labels, while a replay-based continual learning strategy enables adaptation to evolving domains without catastrophic forgetting. Experiments show the model detects minor imperfections due to flats and polygonization, while adapting to evolving operational conditions, such as changes in train type, speed, load, and track profiles, captured using a single accelerometer and strain gauge in wayside monitoring.
【4】A Study on Real-time Object Detection using Deep Learning
标题:利用深度学习进行实时对象检测的研究
链接:https://arxiv.org/abs/2602.15926
作者:Ankita Bose,Jayasravani Bhumireddy,Naveen N
备注:34 pages, 18 figures
摘要:目标检测在一系列领域具有引人注目的应用,包括人机界面、安全和视频监控、导航和道路交通监控、交通系统、工业自动化、医疗保健、增强现实(AR)和虚拟现实(VR)世界、环境监控和活动识别。实时目标检测在所有这些领域的应用提供了对视觉信息的动态分析,有助于立即做出决策。此外,先进的深度学习算法利用了目标检测领域的进展,提供了更准确、更高效的解决方案。用于目标检测的优秀深度学习算法包括Faster R-CNN(基于区域的卷积神经网络)、Mask R-CNN、Cascade R-CNN、YOLO(You Only Look Once)、SSD(Single Shot Multibox Detector)、RetinaNet等。本文详细介绍了如何利用深度学习算法来增强实时目标识别,提供了有关可用的不同目标检测模型、开放基准数据集的信息,以及关于在一系列应用中使用目标检测模型的研究。此外,还提供了对照研究以比较各种策略,并得出一些有启发性的发现。最后,提出了若干具有前景的挑战和方法,作为对相关深度学习方法和目标识别进一步研究的建议。
摘要:Object detection has compelling applications over a range of domains, including human-computer interfaces, security and video surveillance, navigation and road traffic monitoring, transportation systems, industrial automation, healthcare, the world of Augmented Reality (AR) and Virtual Reality (VR), environment monitoring, and activity identification. Applications of real-time object detection in all these areas provide dynamic analysis of the visual information that helps in immediate decision making. Furthermore, advanced deep learning algorithms leverage the progress in the field of object detection, providing more accurate and efficient solutions. There are some outstanding deep learning algorithms for object detection, including Faster R-CNN (Region-based Convolutional Neural Network), Mask R-CNN, Cascade R-CNN, YOLO (You Only Look Once), SSD (Single Shot Multibox Detector), and RetinaNet. This article goes into great detail on how deep learning algorithms are used to enhance real-time object recognition. It provides information on the different object detection models available, open benchmark datasets, and studies on the use of object detection models in a range of applications. Additionally, controlled studies are provided to compare various strategies and produce some illuminating findings. Last but not least, a number of encouraging challenges and approaches are offered as suggestions for further investigation in both relevant deep learning approaches and object recognition.
【5】Multi-Channel Replay Speech Detection using Acoustic Maps
标题:使用声学地图的多通道回放语音检测
链接:https://arxiv.org/abs/2602.16399
作者:Michael Neri,Tuomas Virtanen
备注:Submitted to EUSIPCO 2026
摘要:重放攻击仍然是自动说话人验证系统的一个关键漏洞,特别是在实时语音助理应用中。在这项工作中,我们提出声学地图作为一种新的空间特征表示,用于从多通道录音中进行重放语音检测。声学地图由离散方位角和仰角网格上的经典波束形成导出,编码了反映人类语音辐射与基于扬声器的重放之间物理差异的方向能量分布。我们设计了一个轻量级卷积神经网络来处理这种表示,在ReMASC数据集上以大约6k个可训练参数实现了有竞争力的性能。实验结果表明,声学地图为跨不同设备和声学环境的重放攻击检测提供了一个紧凑且物理可解释的特征空间。
摘要:Replay attacks remain a critical vulnerability for automatic speaker verification systems, particularly in real-time voice assistant applications. In this work, we propose acoustic maps as a novel spatial feature representation for replay speech detection from multi-channel recordings. Derived from classical beamforming over discrete azimuth and elevation grids, acoustic maps encode directional energy distributions that reflect physical differences between human speech radiation and loudspeaker-based replay. A lightweight convolutional neural network is designed to operate on this representation, achieving competitive performance on the ReMASC dataset with approximately 6k trainable parameters. Experimental results show that acoustic maps provide a compact and physically interpretable feature space for replay attack detection across different devices and acoustic environments.
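A narrowband delay-and-sum sketch of such an acoustic map over an azimuth grid is shown below; the array geometry, frequency, and noiseless single-source snapshot are illustrative assumptions (the real maps span azimuth and elevation over full-band recordings).

```python
import numpy as np

c, freq, n_mics, d = 343.0, 1000.0, 4, 0.05     # sound speed, Hz, mics, spacing
pos = np.arange(n_mics) * d                     # linear array geometry

def steering(theta_deg):
    """Narrowband steering vector for a plane wave from azimuth theta."""
    tau = pos * np.cos(np.radians(theta_deg)) / c
    return np.exp(-2j * np.pi * freq * tau)

src_theta = 60.0
snapshot = steering(src_theta)                  # noiseless single-source snapshot
grid = np.arange(0, 181)                        # 1-degree azimuth grid
amap = np.array([np.abs(np.conj(steering(t)) @ snapshot) ** 2 for t in grid])
est = int(grid[amap.argmax()])
print(est)   # the map peaks at the source azimuth, 60
```

The spatial shape of this energy map, rather than the peak location alone, is what distinguishes the broad radiation of a human talker from the point-like radiation of a replay loudspeaker.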
分类|识别(3篇)
【1】Amortized Predictability-aware Training Framework for Time Series Forecasting and Classification
标题:用于时间序列预测和分类的摊销可预测性感知训练框架
链接:https://arxiv.org/abs/2602.16224
作者:Xu Zhang,Peng Wang,Yichen Li,Wei Wang
备注:This work is accepted by the proceedings of the ACM Web Conference 2026 (WWW 2026). The code is available at the link https://github.com/Meteor-Stars/APTF
摘要:时间序列数据在各个领域都容易受到噪声的影响,训练样本可能包含偏离正常数据分布的低可预测性模式,导致训练不稳定或收敛到较差的局部最小值。因此,减轻低预测性样本的不利影响对于时间序列预测(TSF)和时间序列分类(TSC)等时间序列分析任务至关重要。虽然许多深度学习模型已经取得了令人鼓舞的性能,但很少有人考虑如何识别和惩罚低可预测性样本,以从训练的角度提高模型性能。为了填补这一空白,我们提出了一个通用的摊销可预测性感知训练框架(APTF)的TSF和TSC。APTF引入了两个关键设计,使模型能够专注于高可预测性样本,同时仍然从低可预测性样本中进行适当的学习:(i)分层可预测性感知损失(HPL),其动态地识别低可预测性样本并随着训练的发展逐渐扩大其损失惩罚,以及(ii)摊销模型,其减轻由模型偏差引起的可预测性估计误差,进一步提高HPL的效率。该代码可在https://github.com/Meteor-Stars/APTF上获得。
摘要:Time series data are prone to noise in various domains, and training samples may contain low-predictability patterns that deviate from the normal data distribution, leading to training instability or convergence to poor local minima. Therefore, mitigating the adverse effects of low-predictability samples is crucial for time series analysis tasks such as time series forecasting (TSF) and time series classification (TSC). While many deep learning models have achieved promising performance, few consider how to identify and penalize low-predictability samples to improve model performance from the training perspective. To fill this gap, we propose a general Amortized Predictability-aware Training Framework (APTF) for both TSF and TSC. APTF introduces two key designs that enable the model to focus on high-predictability samples while still learning appropriately from low-predictability ones: (i) a Hierarchical Predictability-aware Loss (HPL) that dynamically identifies low-predictability samples and progressively expands their loss penalty as training evolves, and (ii) an amortization model that mitigates predictability estimation errors caused by model bias, further enhancing HPL's effectiveness. The code is available at https://github.com/Meteor-Stars/APTF.
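One way to picture the progressive penalty is a loss-quantile down-weighting that ramps up over training. The quantile, the linear ramp, and the toy numbers below are assumptions for illustration, not the actual HPL.

```python
import numpy as np

def predictability_weights(losses, epoch, max_epochs, q=0.8):
    """Down-weight samples above the loss quantile, with the penalty
    growing as training progresses (a sketch, not the paper's HPL)."""
    thresh = np.quantile(losses, q)
    ramp = epoch / max_epochs                  # 0 -> 1 over training
    w = np.where(losses > thresh, 1.0 - ramp, 1.0)
    return w / w.mean()                        # keep the average loss scale

losses = np.array([0.1, 0.2, 0.15, 2.5, 0.12])  # one low-predictability sample
early = predictability_weights(losses, epoch=1, max_epochs=10)
late = predictability_weights(losses, epoch=9, max_epochs=10)
print(early.round(2), late.round(2))
```

Early in training the outlier still contributes almost fully; late in training its weight is nearly zero, which is the "expanding penalty" behavior the abstract describes.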
【2】Linked Data Classification using Neurochaos Learning
标题:使用Neuchaos学习的关联数据分类
链接:https://arxiv.org/abs/2602.16204
作者:Pooja Honna,Ayush Patravali,Nithin Nagaraj,Nanjangud C. Narendra
摘要:神经混沌学习(Neurochaos Learning, NL)近来因其两个关键特性而显示出优于传统深度学习的前景:能够从小规模训练样本中学习,以及较低的计算需求。在之前的工作中,NL已在可分数据和时间序列数据上实现并经过广泛测试,在分类和回归任务上均表现出卓越性能。在本文中,我们研究NL的下一步,即将NL应用于关联数据,特别是以知识图形式表示的数据。我们通过在知识图上实现节点聚合,再将聚合后的节点特征馈入最简单的NL架构ChaosNet,从而将关联数据集成到NL中。我们在同质性(homophilic)图数据集以及具有不同异质性程度的异质性(heterophilic)图数据集上展示了我们实现的结果,并表明我们的方法在同质性图上的效果优于异质性图。与此同时,我们还给出了对结果的分析以及对未来工作的建议。
摘要:Neurochaos Learning (NL) has shown promise in recent times over traditional deep learning due to its two key features: the ability to learn from small-sized training samples, and low compute requirements. In prior work, NL has been implemented and extensively tested on separable and time series data, and has demonstrated superior performance on both classification and regression tasks. In this paper, we investigate the next step in NL, viz., applying NL to linked data, in particular, data that is represented in the form of knowledge graphs. We integrate linked data into NL by implementing node aggregation on knowledge graphs, and then feeding the aggregated node features to the simplest NL architecture: ChaosNet. We demonstrate the results of our implementation on homophilic graph datasets as well as heterophilic graph datasets of varying heterophily. We show better efficacy of our approach on homophilic graphs than on heterophilic graphs. While doing so, we also present our analysis of the results, as well as suggestions for future work.
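The node-aggregation step the authors describe before ChaosNet can be sketched in its simplest mean-neighbor form; the paper's actual aggregation scheme may differ, and the self-loop choice here is an assumption:

```python
import numpy as np

def aggregate_nodes(adj, features):
    # One round of mean neighbor aggregation (a common GNN-style step);
    # the aggregated features would then be fed to ChaosNet in the paper.
    # This sketch assumes an unweighted adjacency matrix with self-loops added.
    a = adj + np.eye(adj.shape[0])
    deg = a.sum(axis=1, keepdims=True)
    return (a @ features) / deg

adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)   # a 3-node path graph
x = np.array([[1.0], [2.0], [3.0]])        # one scalar feature per node
agg = aggregate_nodes(adj, x)
```

On homophilic graphs, neighbor averaging of this kind smooths features toward class-consistent values, which is consistent with the better efficacy the authors report there.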
【3】Partial Identification under Missing Data Using Weak Shadow Variables from Pretrained Models
标题:使用预训练模型中的弱阴影变量在缺失数据下进行部分识别
链接:https://arxiv.org/abs/2602.16061
作者:Hongyu Chen,David Simchi-Levi,Ruoxuan Xiong
摘要:从用户反馈中估计总体量(如平均结果)是平台评估和社会科学的基础,但反馈通常是非随机缺失(MNAR)的:意见更强烈的用户更有可能做出回应,因此标准估计量存在偏差,且在没有额外假设的情况下被估计量不可识别。现有的方法通常依赖于强参数假设或定制的辅助变量,而这些在实践中可能无法获得。在本文中,我们开发了一个部分识别框架:通过求解一对约束条件编码了观测数据结构的线性规划,得到被估计量的尖锐界限。这一表述自然地将来自预训练模型(包括大型语言模型(LLM))的结果预测作为额外的线性约束纳入,从而收紧可行集。我们称这些预测为弱阴影变量:它们满足关于缺失的条件独立性假设,但无需满足经典阴影变量方法所要求的完备性条件。当预测信息足够充分时,界限会收缩为一个点,作为特例恢复标准识别。在有限样本中,为了对识别集提供有效的覆盖,我们提出了一种集合扩张估计量,它在集合识别情形下达到慢于$\sqrt{n}$的收敛速度,而在点识别情形下达到标准的$\sqrt{n}$速度。在客户服务对话的模拟和半合成实验中,我们发现LLM预测对于经典阴影变量方法通常是病态的,但在我们的框架中仍然非常有效:它们在真实的MNAR机制下保持有效覆盖的同时,将识别区间缩短了75-83%。
摘要:Estimating population quantities such as mean outcomes from user feedback is fundamental to platform evaluation and social science, yet feedback is often missing not at random (MNAR): users with stronger opinions are more likely to respond, so standard estimators are biased and the estimand is not identified without additional assumptions. Existing approaches typically rely on strong parametric assumptions or bespoke auxiliary variables that may be unavailable in practice. In this paper, we develop a partial identification framework in which sharp bounds on the estimand are obtained by solving a pair of linear programs whose constraints encode the observed data structure. This formulation naturally incorporates outcome predictions from pretrained models, including large language models (LLMs), as additional linear constraints that tighten the feasible set. We call these predictions weak shadow variables: they satisfy a conditional independence assumption with respect to missingness but need not meet the completeness conditions required by classical shadow-variable methods. When predictions are sufficiently informative, the bounds collapse to a point, recovering standard identification as a special case. In finite samples, to provide valid coverage of the identified set, we propose a set-expansion estimator that achieves slower-than-$\sqrt{n}$ convergence rate in the set-identified regime and the standard $\sqrt{n}$ rate under point identification. In simulations and semi-synthetic experiments on customer-service dialogues, we find that LLM predictions are often ill-conditioned for classical shadow-variable methods yet remain highly effective in our framework. They shrink identification intervals by 75--83\% while maintaining valid coverage under realistic MNAR mechanisms.
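In the simplest no-covariate case, LP-based bounds of this kind reduce to classical Manski-style worst-case bounds, with the shadow-variable prediction acting as one extra linear constraint. A minimal numerical sketch (the bracketing-interval form of the constraint is our assumption, not the paper's exact formulation):

```python
import numpy as np

def manski_bounds(y_obs, n_total, lo, hi):
    # Without further assumptions, the missing outcomes' mean can lie
    # anywhere in [lo, hi]; the sharp bounds on the overall mean follow.
    p_obs = len(y_obs) / n_total
    ybar = float(np.mean(y_obs))
    return (p_obs * ybar + (1 - p_obs) * lo,
            p_obs * ybar + (1 - p_obs) * hi)

def shadow_tightened(y_obs, n_total, lo, hi, pred_lo, pred_hi):
    # A weak shadow variable (e.g. an LLM prediction) assumed here to
    # bracket the missing outcomes' mean within [pred_lo, pred_hi] adds
    # one more linear constraint, shrinking the feasible set.
    p_obs = len(y_obs) / n_total
    ybar = float(np.mean(y_obs))
    m_lo, m_hi = max(lo, pred_lo), min(hi, pred_hi)
    return (p_obs * ybar + (1 - p_obs) * m_lo,
            p_obs * ybar + (1 - p_obs) * m_hi)

y_obs = [1.0, 0.0, 1.0, 1.0]          # 4 observed responses out of 8 users
wide = manski_bounds(y_obs, n_total=8, lo=0.0, hi=1.0)
tight = shadow_tightened(y_obs, n_total=8, lo=0.0, hi=1.0,
                         pred_lo=0.4, pred_hi=0.6)
```

When the prediction interval collapses to a point, the bounds collapse too, matching the abstract's observation that sufficiently informative predictions recover point identification.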
表征(5篇)
【1】Knowledge-Embedded Latent Projection for Robust Representation Learning
标题:用于鲁棒表示学习的嵌入知识的潜在投影
链接:https://arxiv.org/abs/2602.16709
作者:Weijing Tang,Ming Yuan,Zongqi Xia,Tianxi Cai
摘要:潜空间模型被广泛用于分析高维离散数据矩阵,例如电子健康记录(EHR)中的患者特征矩阵,通过低维嵌入捕获复杂的依赖结构。然而,估计变得具有挑战性的不平衡的制度,其中一个矩阵的维度是远远大于其他。在EHR应用中,队列大小通常受到疾病流行率或数据可用性的限制,而由于医学编码系统的广度,特征空间仍然非常大。由于外部语义嵌入的可用性不断增加,例如EHR中预训练的临床概念嵌入,我们提出了一种知识嵌入式潜在投影模型,该模型利用语义辅助信息来规范表示学习。具体来说,我们通过在再生核希尔伯特空间中的映射将列嵌入建模为语义嵌入的平滑函数。我们开发了一个计算效率高的两步估计过程,结合语义引导的子空间建设,通过核主成分分析与可扩展的投影梯度下降。我们建立估计误差界的特征之间的权衡统计误差和近似误差引起的核投影。此外,我们为我们的非凸优化过程提供了局部收敛保证。大量的仿真研究和实际的电子病历应用证明了该方法的有效性。
摘要:Latent space models are widely used for analyzing high-dimensional discrete data matrices, such as patient-feature matrices in electronic health records (EHRs), by capturing complex dependence structures through low-dimensional embeddings. However, estimation becomes challenging in the imbalanced regime, where one matrix dimension is much larger than the other. In EHR applications, cohort sizes are often limited by disease prevalence or data availability, whereas the feature space remains extremely large due to the breadth of medical coding system. Motivated by the increasing availability of external semantic embeddings, such as pre-trained embeddings of clinical concepts in EHRs, we propose a knowledge-embedded latent projection model that leverages semantic side information to regularize representation learning. Specifically, we model column embeddings as smooth functions of semantic embeddings via a mapping in a reproducing kernel Hilbert space. We develop a computationally efficient two-step estimation procedure that combines semantically guided subspace construction via kernel principal component analysis with scalable projected gradient descent. We establish estimation error bounds that characterize the trade-off between statistical error and approximation error induced by the kernel projection. Furthermore, we provide local convergence guarantees for our non-convex optimization procedure. Extensive simulation studies and a real-world EHR application demonstrate the effectiveness of the proposed method.
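The first step of the two-step procedure, semantically guided subspace construction via kernel PCA, can be sketched as follows; the RBF kernel and the hyperparameters are illustrative assumptions, not the paper's choices:

```python
import numpy as np

def kernel_pca_subspace(S, k, gamma=1.0):
    # Build a subspace from side-information (semantic) embeddings S via
    # kernel PCA with an RBF kernel; column embeddings would then be
    # fit within this subspace by projected gradient descent.
    sq = np.sum(S**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * S @ S.T))
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # center the kernel matrix
    Kc = H @ K @ H
    vals, vecs = np.linalg.eigh(Kc)          # ascending eigenvalues
    return vecs[:, ::-1][:, :k]              # top-k principal directions

S = np.random.default_rng(0).normal(size=(20, 5))  # 20 features, 5-dim semantics
U = kernel_pca_subspace(S, k=3)
```

The trade-off the paper quantifies is between the statistical error saved by restricting to this subspace and the approximation error the restriction introduces.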
【2】Are Object-Centric Representations Better At Compositional Generalization?
标题:以对象为中心的表示是否更擅长组合概括?
链接:https://arxiv.org/abs/2602.16689
作者:Ferdinand Kapl,Amir Mohammad Karimi Mamaghan,Maximilian Seitzer,Karl Henrik Johansson,Carsten Marr,Stefan Bauer,Andrea Dittadi
摘要:组合泛化,即对熟悉概念的新颖组合进行推理的能力,是人类认知的基础,也是机器学习的关键挑战。以对象为中心(OC)的表示将场景编码为一组对象,通常被认为能够支持这种泛化,但在视觉丰富的设置中,系统性的证据仍然有限。我们在三个受控视觉世界(CLEVRTex、Super-CLEVR和MOVi-C)上引入了一个视觉问答基准,用以测量有无以对象为中心偏置的视觉编码器对对象属性的未见组合的泛化程度。为了确保公平而全面的比较,我们仔细考虑了训练数据多样性、样本量、表示大小、下游模型容量和计算量。我们使用DINOv2和SigLIP2这两种广泛使用的视觉编码器作为基础模型,并与它们的OC对应物进行比较。我们的主要发现表明:(1)OC方法在较难的组合泛化设置中更优越;(2)原始密集表示仅在较容易的设置中超过OC,并且通常需要多得多的下游计算;(3)OC模型样本效率更高,用更少的图像实现更强的泛化,而密集编码器只有在数据量和多样性充足时才能赶上或超过它们。总的来说,当数据集大小、训练数据多样性或下游计算三者中任何一个受到约束时,以对象为中心的表示都能提供更强的组合泛化。
摘要:Compositional generalization, the ability to reason about novel combinations of familiar concepts, is fundamental to human cognition and a critical challenge for machine learning. Object-centric (OC) representations, which encode a scene as a set of objects, are often argued to support such generalization, but systematic evidence in visually rich settings is limited. We introduce a Visual Question Answering benchmark across three controlled visual worlds (CLEVRTex, Super-CLEVR, and MOVi-C) to measure how well vision encoders, with and without object-centric biases, generalize to unseen combinations of object properties. To ensure a fair and comprehensive comparison, we carefully account for training data diversity, sample size, representation size, downstream model capacity, and compute. We use DINOv2 and SigLIP2, two widely used vision encoders, as the foundation models and their OC counterparts. Our key findings reveal that (1) OC approaches are superior in harder compositional generalization settings; (2) original dense representations surpass OC only on easier settings and typically require substantially more downstream compute; and (3) OC models are more sample efficient, achieving stronger generalization with fewer images, whereas dense encoders catch up or surpass them only with sufficient data and diversity. Overall, object-centric representations offer stronger compositional generalization when any one of dataset size, training data diversity, or downstream compute is constrained.
【3】Anatomy of Capability Emergence: Scale-Invariant Representation Collapse and Top-Down Reorganization in Neural Networks
标题:能力涌现剖析:神经网络中的规模不变表示崩溃和自上而下的重组
链接:https://arxiv.org/abs/2602.15997
作者:Jayadev Billa
备注:19 pages, 6 figures, 12 appendix pages
摘要:神经网络训练过程中的能力涌现在机制上仍然不透明。我们在五个模型规模(405K-85M参数)、八个算法任务中的120多个涌现事件以及三个Pythia语言模型(160M-2.8B)上跟踪了五个几何度量。我们发现:(1)训练始于一种普遍的表示坍缩,坍缩到在210倍参数范围内尺度不变的任务特定下限(例如,无论模型大小,模块化算术都坍缩到RANKME约2.0);(2)坍缩自上而下地在层间传播(32/32任务与模型的一致性),与自下而上的特征构建直觉相矛盾;(3)存在一个几何层次结构:表示几何领先于涌现(困难任务的前兆率为75-100%),局部学习系数与之同步(0/24前兆),而Hessian度量则滞后。我们还界定了预测的局限:几何度量编码粗粒度的任务难度,但无法编码细粒度的时间(类内一致性为27%;当任务排序跨尺度反转时,预测以26%的水平失败)。在Pythia上,全局几何模式可以复现,但各任务的前兆信号不能:前兆关系需要任务与训练的对齐,而自然主义的预训练并不提供这种对齐。我们的贡献是涌现的几何解剖及其边界条件,而非一个预测工具。
摘要:Capability emergence during neural network training remains mechanistically opaque. We track five geometric measures across five model scales (405K-85M parameters), 120+ emergence events in eight algorithmic tasks, and three Pythia language models (160M-2.8B). We find: (1) training begins with a universal representation collapse to task-specific floors that are scale-invariant across a 210X parameter range (e.g., modular arithmetic collapses to RANKME ~ 2.0 regardless of model size); (2) collapse propagates top-down through layers (32/32 task X model consistency), contradicting bottom-up feature-building intuition; (3) a geometric hierarchy in which representation geometry leads emergence (75-100% precursor rate for hard tasks), while the local learning coefficient is synchronous (0/24 precursor) and Hessian measures lag. We also delineate prediction limits: geometric measures encode coarse task difficulty but not fine-grained timing (within-class concordance 27%; when task ordering reverses across scales, prediction fails at 26%). On Pythia, global geometric patterns replicate but per-task precursor signals do not -- the precursor relationship requires task-training alignment that naturalistic pre-training does not provide. Our contribution is the geometric anatomy of emergence and its boundary conditions, not a prediction tool.
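The RANKME quantity tracked above is, as we read it, the RankMe effective-rank measure (Garrido et al.): the exponential of the entropy of the normalized singular-value distribution of a representation matrix. A minimal sketch:

```python
import numpy as np

def rankme(Z, eps=1e-12):
    # RankMe: exp of the entropy of the normalized singular values of Z.
    # High values mean many directions carry variance; values near the
    # "floor" (e.g. ~2.0 for modular arithmetic in the paper) indicate
    # that the representation has collapsed onto very few directions.
    s = np.linalg.svd(Z, compute_uv=False)
    p = s / (s.sum() + eps) + eps
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)
Z_full = rng.normal(size=(100, 10))                                 # high effective rank
Z_collapsed = np.outer(rng.normal(size=100), rng.normal(size=10))   # rank 1
```

For a matrix whose singular values are all equal, RankMe equals the true rank; collapse drives it toward 1 regardless of ambient dimension.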
【4】Parameter-free representations outperform single-cell foundation models on downstream benchmarks
标题:无参数表示在下游基准上优于单单元基础模型
链接:https://arxiv.org/abs/2602.16696
作者:Huan Souza,Pankaj Mehta
摘要:单细胞RNA测序(scRNA-seq)数据显示出强大且可重复的统计结构。这推动了大规模基础模型(例如TranscriptFormer)的开发,这些模型使用基于转换器的架构通过将基因嵌入潜在向量空间来学习基因表达的生成模型。这些嵌入已被用于获得下游任务的最先进(SOTA)性能,如细胞类型分类,疾病状态预测和跨物种学习。在这里,我们要问的是,在不使用计算密集型的基于深度学习的表示的情况下,是否可以实现类似的性能。使用简单的,可解释的管道,依赖于仔细的归一化和线性方法,我们在通常用于评估单细胞基础模型的多个基准测试中获得SOTA或接近SOTA的性能,包括在涉及新细胞类型和训练数据中缺失的生物体的分布外任务中优于基础模型。我们的研究结果强调了严格的基准测试的必要性,并表明细胞身份的生物学可以通过单细胞基因表达数据的简单线性表示来捕获。
摘要:Single-cell RNA sequencing (scRNA-seq) data exhibit strong and reproducible statistical structure. This has motivated the development of large-scale foundation models, such as TranscriptFormer, that use transformer-based architectures to learn a generative model for gene expression by embedding genes into a latent vector space. These embeddings have been used to obtain state-of-the-art (SOTA) performance on downstream tasks such as cell-type classification, disease-state prediction, and cross-species learning. Here, we ask whether similar performance can be achieved without utilizing computationally intensive deep learning-based representations. Using simple, interpretable pipelines that rely on careful normalization and linear methods, we obtain SOTA or near SOTA performance across multiple benchmarks commonly used to evaluate single-cell foundation models, including outperforming foundation models on out-of-distribution tasks involving novel cell types and organisms absent from the training data. Our findings highlight the need for rigorous benchmarking and suggest that the biology of cell identity can be captured by simple linear representations of single cell gene expression data.
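A parameter-free pipeline of the kind the abstract describes (careful normalization followed by a linear projection) might look like the following sketch; the specific depth-normalization constant and component count are illustrative assumptions, not the paper's exact recipe:

```python
import numpy as np

def simple_pipeline(counts, k=10):
    # A minimal linear recipe: depth-normalize each cell, log-transform,
    # center, and project onto the top principal components. Downstream
    # tasks (e.g. cell-type classification) would then use plain linear
    # methods on the resulting embedding.
    cpm = counts / counts.sum(axis=1, keepdims=True) * 1e4
    X = np.log1p(cpm)
    X = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T            # k-dimensional linear cell embedding

rng = np.random.default_rng(1)
counts = rng.poisson(5.0, size=(50, 200)).astype(float)  # 50 cells x 200 genes
emb = simple_pipeline(counts, k=10)
```

Every step here is a fixed transform or an SVD, so the pipeline has no learned parameters, which is the point of the comparison with transformer-based foundation models.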
【5】Structured Unitary Tensor Network Representations for Circuit-Efficient Quantum Data Encoding
标题:用于电路高效量子数据编码的结构化酉张量网络表示
链接:https://arxiv.org/abs/2602.16266
作者:Guang Lin,Toshihisa Tanaka,Qibin Zhao
摘要:将经典数据编码为量子态是量子机器学习的核心瓶颈:许多广泛使用的编码电路效率低下,需要深度电路和大量量子资源,这限制了量子硬件上的可扩展性。在这项工作中,我们提出了TNQE,这是一种基于结构化酉张量网络(TN)表示的电路高效量子数据编码框架。TNQE首先通过TN分解表示每个经典输入,然后通过两个互补的核心到电路策略将所得的张量核心编译成编码电路。为了使这种编译在尊重量子运算的酉性质的同时保持可训练,我们引入了一个酉感知约束,将TN核参数化为可学习的块酉矩阵,使它们既能直接优化,又能直接编码为量子算符。所提出的TNQE框架能够显式控制电路深度和量子比特资源,从而允许构建浅的、资源高效的电路。在一系列基准测试中,TNQE实现了深度仅为幅度编码$0.04\times$的编码电路,同时自然扩展到高分辨率图像($256 \times 256$),并在真实量子硬件上展示了实际可行性。
摘要:Encoding classical data into quantum states is a central bottleneck in quantum machine learning: many widely used encodings are circuit-inefficient, requiring deep circuits and substantial quantum resources, which limits scalability on quantum hardware. In this work, we propose TNQE, a circuit-efficient quantum data encoding framework built on structured unitary tensor network (TN) representations. TNQE first represents each classical input via a TN decomposition and then compiles the resulting tensor cores into an encoding circuit through two complementary core-to-circuit strategies. To make this compilation trainable while respecting the unitary nature of quantum operations, we introduce a unitary-aware constraint that parameterizes TN cores as learnable block unitaries, enabling them to be directly optimized and directly encoded as quantum operators. The proposed TNQE framework enables explicit control over circuit depth and qubit resources, allowing the construction of shallow, resource-efficient circuits. Across a range of benchmarks, TNQE achieves encoding circuits as shallow as $0.04\times$ the depth of amplitude encoding, while naturally scaling to high-resolution images ($256 \times 256$) and demonstrating practical feasibility on real quantum hardware.
编码器(1篇)
【1】Factorization Machine with Quadratic-Optimization Annealing for RNA Inverse Folding and Evaluation of Binary-Integer Encoding and Nucleotide Assignment
标题:用于RNA反向折叠的二次优化退火因式分解机及其二进制整数编码与核苷酸分配的评估
链接:https://arxiv.org/abs/2602.16643
作者:Shuta Kikuchi,Shu Tanaka
备注:17 pages, 10 figures
摘要:RNA反向折叠问题的目的是识别优先采用给定目标二级结构的核苷酸序列。虽然已经提出了各种启发式和基于机器学习的方法,但许多方法需要大量的序列评估,这限制了它们在实验验证成本高昂时的适用性。我们提出了一种使用二次优化退火因式分解机(FMQA)来解决该问题的方法。FMQA是一种离散黑箱优化方法,据报道能以有限的评估次数获得高质量的解。将FMQA应用于该问题需要将核苷酸转换为二进制变量。然而,整数到核苷酸的分配和二进制整数编码对FMQA性能的影响尚未得到彻底研究,即使这些选择决定了代理模型和搜索空间的结构,从而直接影响解的质量。因此,本研究的目的是为RNA反向折叠建立一个新的FMQA框架,并分析这些分配和编码方法的影响。我们结合四种二进制整数编码方法,评估了四个核苷酸到有序整数(0-3)的所有24种可能的分配。我们的结果表明,独热(one-hot)和畴壁(domain-wall)编码在归一化系综缺陷值方面优于二进制编码和一元编码。在畴壁编码中,分配给边界整数(0和3)的核苷酸出现频率较高。在RNA反向折叠问题中,将鸟嘌呤和胞嘧啶分配给这些边界整数促进了它们在茎区域的富集,从而得到了比独热编码在热力学上更稳定的二级结构。
摘要:The RNA inverse folding problem aims to identify nucleotide sequences that preferentially adopt a given target secondary structure. While various heuristic and machine learning-based approaches have been proposed, many require a large number of sequence evaluations, which limits their applicability when experimental validation is costly. We propose a method to solve the problem using a factorization machine with quadratic-optimization annealing (FMQA). FMQA is a discrete black-box optimization method reported to obtain high-quality solutions with a limited number of evaluations. Applying FMQA to the problem requires converting nucleotides into binary variables. However, the influence of integer-to-nucleotide assignments and binary-integer encoding on the performance of FMQA has not been thoroughly investigated, even though such choices determine the structure of the surrogate model and the search landscape, and thus can directly affect solution quality. Therefore, this study aims both to establish a novel FMQA framework for RNA inverse folding and to analyze the effects of these assignments and encoding methods. We evaluated all 24 possible assignments of the four nucleotides to the ordered integers (0-3), in combination with four binary-integer encoding methods. Our results demonstrated that one-hot and domain-wall encodings outperform binary and unary encodings in terms of the normalized ensemble defect value. In domain-wall encoding, nucleotides assigned to the boundary integers (0 and 3) appeared with higher frequency. In the RNA inverse folding problem, assigning guanine and cytosine to these boundary integers promoted their enrichment in stem regions, which led to more thermodynamically stable secondary structures than those obtained with one-hot encoding.
优化|敛散性(10篇)
【1】Almost Sure Convergence of Differential Temporal Difference Learning for Average Reward Markov Decision Processes
标题:平均奖励Markov决策过程中差分时间差学习的几乎必然收敛
链接:https://arxiv.org/abs/2602.16629
作者:Ethan Blaser,Jiuqi Wang,Shangtong Zhang
摘要:平均奖励是强化学习(RL)中关注智能体长期性能的一个基本性能指标。差分时间差(TD)学习算法是平均奖励RL的一项重要进展,因为它们提供了一种高效的在线方法,可在同策略(on-policy)和异策略(off-policy)设置下学习与平均奖励相关联的值函数。然而,现有的收敛性保证需要学习率中与状态访问次数挂钩的局部时钟,实践者并不使用这种时钟,而且它也无法推广到表格设置之外。我们解决了这一限制,证明了对任意$n$,使用标准递减学习率且不依赖局部时钟的同策略$n$步差分TD的几乎必然收敛。随后,我们推导出三个充分条件,在这些条件下异策略$n$步差分TD同样无需局部时钟即可收敛。这些结果加强了差分TD的理论基础,并使其收敛性分析更贴近实际实现。
摘要:The average reward is a fundamental performance metric in reinforcement learning (RL) focusing on the long-run performance of an agent. Differential temporal difference (TD) learning algorithms are a major advance for average reward RL as they provide an efficient online method to learn the value functions associated with the average reward in both on-policy and off-policy settings. However, existing convergence guarantees require a local clock in learning rates tied to state visit counts, which practitioners do not use and does not extend beyond tabular settings. We address this limitation by proving the almost sure convergence of on-policy $n$-step differential TD for any $n$ using standard diminishing learning rates without a local clock. We then derive three sufficient conditions under which off-policy $n$-step differential TD also converges without a local clock. These results strengthen the theoretical foundations of differential TD and bring its convergence analysis closer to practical implementations.
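For readers unfamiliar with differential TD, the one-step tabular update (the $n{=}1$ case of the $n$-step algorithms analyzed here) maintains a running average-reward estimate alongside the value table:

```python
import numpy as np

def differential_td_step(V, r_bar, s, r, s_next, alpha=0.1, eta=1.0):
    # One tabular differential TD(0) update: the TD error subtracts the
    # current average-reward estimate r_bar instead of using a discount
    # factor, and both V and r_bar are updated from the same error.
    delta = r - r_bar + V[s_next] - V[s]
    V[s] += alpha * delta
    r_bar += eta * alpha * delta
    return V, r_bar

V = np.zeros(3)        # differential value estimates for 3 states
r_bar = 0.0            # average-reward estimate
V, r_bar = differential_td_step(V, r_bar, s=0, r=1.0, s_next=1)
```

The learning rate `alpha` here is a plain constant shared across states; the paper's contribution is showing that convergence holds with such standard (diminishing, state-independent) schedules, without a per-state local clock.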
【2】Muon with Spectral Guidance: Efficient Optimization for Scientific Machine Learning
标题:具有谱引导的Muon:面向科学机器学习的高效优化
链接:https://arxiv.org/abs/2602.16167
作者:Binghang Lu,Jiahao Zhang,Guang Lin
摘要:基于物理的神经网络和神经算子常常面临由病态梯度、多尺度谱行为以及物理约束引起的刚性所导致的严重优化困难。最近,Muon优化器通过在梯度的奇异向量基中执行正交化更新来改善几何条件,显示出了前景。然而,其单位奇异值更新可能导致过于激进的步长,并且在应用于物理信息学习时缺乏明确的稳定性保证。在这项工作中,我们提出了SpecMuon,一种谱感知优化器,将Muon的正交化几何与按模态的松弛标量辅助变量(RSAV)机制相结合。通过将矩阵值梯度分解为奇异模态,并沿主导谱方向逐一应用RSAV更新,SpecMuon根据全局损失能量自适应地调节步长,同时保留Muon的尺度平衡特性。该公式将优化解释为多模态梯度流,并实现对刚性谱分量的有原则的控制。我们建立了SpecMuon的严格理论性质,包括修正的能量耗散律、辅助变量的正性与有界性,以及Polyak-Lojasiewicz条件下具有线性速率的全局收敛。在物理信息神经网络、DeepONet和分数阶PINN-DeepONet上的数值实验表明,与Adam、AdamW和原始Muon优化器相比,SpecMuon在一维Burgers方程和分数阶偏微分方程等基准问题上实现了更快的收敛和更高的稳定性。
摘要:Physics-informed neural networks and neural operators often suffer from severe optimization difficulties caused by ill-conditioned gradients, multi-scale spectral behavior, and stiffness induced by physical constraints. Recently, the Muon optimizer has shown promise by performing orthogonalized updates in the singular-vector basis of the gradient, thereby improving geometric conditioning. However, its unit-singular-value updates may lead to overly aggressive steps and lack explicit stability guarantees when applied to physics-informed learning. In this work, we propose SpecMuon, a spectral-aware optimizer that integrates Muon's orthogonalized geometry with a mode-wise relaxed scalar auxiliary variable (RSAV) mechanism. By decomposing matrix-valued gradients into singular modes and applying RSAV updates individually along dominant spectral directions, SpecMuon adaptively regulates step sizes according to the global loss energy while preserving Muon's scale-balancing properties. This formulation interprets optimization as a multi-mode gradient flow and enables principled control of stiff spectral components. We establish rigorous theoretical properties of SpecMuon, including a modified energy dissipation law, positivity and boundedness of auxiliary variables, and global convergence with a linear rate under the Polyak-Lojasiewicz condition. Numerical experiments on physics-informed neural networks, DeepONets, and fractional PINN-DeepONets demonstrate that SpecMuon achieves faster convergence and improved stability compared with Adam, AdamW, and the original Muon optimizer on benchmark problems such as the one-dimensional Burgers equation and fractional partial differential equations.
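Muon's orthogonalized update, which SpecMuon builds on, replaces the gradient by its orthogonal polar factor so every retained singular direction gets a unit step. A minimal sketch using an exact SVD (the actual optimizer approximates this with Newton-Schulz iterations):

```python
import numpy as np

def muon_direction(G):
    # Orthogonalize a matrix-valued gradient: keep the singular vectors,
    # set all singular values to 1. This is the "unit-singular-value"
    # update the abstract describes; SpecMuon then rescales individual
    # spectral modes via RSAV rather than leaving them all at 1.
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

G = np.array([[3.0, 0.0],
              [0.0, 0.5]])     # ill-conditioned gradient (ratio 6:1)
D = muon_direction(G)          # perfectly conditioned update direction
```

The example shows the conditioning benefit: a 6:1 singular-value spread becomes 1:1, which is also why an uncontrolled unit step can be too aggressive along weak modes.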
【3】Differentially Private Non-convex Distributionally Robust Optimization
标题:差分隐私非凸分布鲁棒优化
链接:https://arxiv.org/abs/2602.16155
作者:Difei Xu,Meng Ding,Zebin Ma,Huanyi Xie,Youming Tao,Aicha Slaitane,Di Wang
摘要:Real-world deployments routinely face distribution shifts, group imbalances, and adversarial perturbations, under which the traditional Empirical Risk Minimization (ERM) framework can degrade severely. Distributionally Robust Optimization (DRO) addresses this issue by optimizing the worst-case expected loss over an uncertainty set of distributions, offering a principled approach to robustness. Meanwhile, as training data in DRO always involves sensitive information, safeguarding it against leakage under Differential Privacy (DP) is essential. In contrast to classical DP-ERM, DP-DRO has received much less attention due to its minimax optimization structure with uncertainty constraint. To bridge the gap, we provide a comprehensive study of DP-(finite-sum)-DRO with $ψ$-divergence and non-convex loss. First, we study DRO with general $ψ$-divergence by reformulating it as a minimization problem, and develop a novel $(\varepsilon, δ)$-DP optimization method, called DP Double-Spider, tailored to this structure. Under mild assumptions, we show that it achieves a utility bound of $\mathcal{O}(\frac{1}{\sqrt{n}}+ (\frac{\sqrt{d \log (1/δ)}}{n \varepsilon})^{2/3})$ in terms of the gradient norm, where $n$ denotes the data size and $d$ denotes the model dimension. We further improve the utility rate for specific divergences. In particular, for DP-DRO with KL-divergence, by transforming the problem into a compositional finite-sum optimization problem, we develop a DP Recursive-Spider method and show that it achieves a utility bound of $\mathcal{O}((\frac{\sqrt{d \log(1/δ)}}{n\varepsilon})^{2/3} )$, matching the best-known result for non-convex DP-ERM. Experimentally, we demonstrate that our proposed methods outperform existing approaches for DP minimax optimization.
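The paper's DP Double-Spider and DP Recursive-Spider estimators build on the standard clip-then-add-Gaussian-noise primitive of differentially private optimization; the sketch below shows only that primitive, not the Spider-style variance reduction:

```python
import numpy as np

def private_gradient(per_example_grads, clip=1.0, sigma=1.0, rng=None):
    # Gaussian-mechanism gradient step: clip each per-example gradient to
    # norm <= clip (bounding sensitivity), average, then add noise whose
    # scale is calibrated to clip / n. Constants here are illustrative.
    rng = rng or np.random.default_rng(0)
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    n = per_example_grads.shape[0]
    noise = rng.normal(0.0, sigma * clip / n, size=clipped.shape[1])
    return clipped.mean(axis=0) + noise

g = np.array([[3.0, 4.0],      # norm 5, gets clipped to norm 1
              [0.3, 0.4]])     # norm 0.5, passes through
priv = private_gradient(g, clip=1.0, sigma=0.0)   # sigma=0 isolates clipping
```

In the DP-DRO setting, the same sensitivity-bounding idea has to be applied within a minimax/compositional structure, which is what the paper's tailored estimators handle.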
【4】Heuristic Search as Language-Guided Program Optimization
标题:作为语言引导程序优化的启发式搜索
链接:https://arxiv.org/abs/2602.16038
作者:Mingxin Yu,Ruixiao Yang,Chuchu Fan
备注:8 pages, 3 figures, under review
摘要:在过去的几年中,大型语言模型(LLM)推动了组合优化(CO)中的自动启发式设计(AHD)。然而,现有的发现管道通常需要大量的手动试错或依赖领域专业知识来适应新的或复杂的问题。这源于紧密耦合的内部机制,限制了LLM驱动的设计过程的系统性改进。为了解决这一挑战,我们提出了一个结构化的LLM驱动AHD框架,将启发式发现过程显式分解为模块化阶段:用于评估的前向传递、用于分析反馈的反向传递,以及用于程序细化的更新步骤。这种分离为迭代细化提供了清晰的抽象,并支持对各个组件进行有原则的改进。我们在四个不同的现实世界CO领域验证了我们的框架,它始终优于基线,在未见测试集上实现了高达$0.17$的QYI改进。最后,我们证明了几种流行的AHD方法是我们框架的受限实例。通过将它们集成到我们的结构化管道中,我们可以模块化地升级组件并显著提高其性能。
摘要:Large Language Models (LLMs) have advanced Automated Heuristic Design (AHD) in combinatorial optimization (CO) in the past few years. However, existing discovery pipelines often require extensive manual trial-and-error or reliance on domain expertise to adapt to new or complex problems. This stems from tightly coupled internal mechanisms that limit systematic improvement of the LLM-driven design process. To address this challenge, we propose a structured framework for LLM-driven AHD that explicitly decomposes the heuristic discovery process into modular stages: a forward pass for evaluation, a backward pass for analytical feedback, and an update step for program refinement. This separation provides a clear abstraction for iterative refinement and enables principled improvements of individual components. We validate our framework across four diverse real-world CO domains, where it consistently outperforms baselines, achieving up to $0.17$ improvement in QYI on unseen test sets. Finally, we show that several popular AHD methods are restricted instantiations of our framework. By integrating them in our structured pipeline, we can upgrade the components modularly and significantly improve their performance.
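The forward/backward/update decomposition can be written as a small driver loop; `evaluate`, `analyze`, and `refine` are hypothetical stand-ins for the LLM-backed components, and the toy instantiation below replaces programs with integers purely to make the loop runnable:

```python
def heuristic_discovery(seed_program, evaluate, analyze, refine, iters=3):
    # The paper's decomposition as a driver loop: forward pass = evaluate,
    # backward pass = analytical feedback, update step = program refinement.
    # Each component can be swapped independently, which is the point of
    # the modular framing.
    best, best_score = seed_program, evaluate(seed_program)
    for _ in range(iters):
        feedback = analyze(best, best_score)   # backward pass
        candidate = refine(best, feedback)     # update step
        score = evaluate(candidate)            # forward pass
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

# Toy instantiation: "programs" are integers, the optimum is 3.
best, best_score = heuristic_discovery(
    seed_program=0,
    evaluate=lambda p: -abs(p - 3),
    analyze=lambda p, s: 3 - p,                    # signed improvement hint
    refine=lambda p, f: p + (1 if f > 0 else -1),  # move along the hint
)
```

Restricted instantiations of existing AHD methods correspond to fixing one or more of these three callables, which is how the paper frames them as special cases.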
【5】IT-OSE: Exploring Optimal Sample Size for Industrial Data Augmentation
标题:IT-OSE:探索工业数据增强的最佳样本量
链接:https://arxiv.org/abs/2602.15878
作者:Mingchun Sun,Rongqiang Zhao,Zhennan Huang,Songyu Ding,Jie Liu
摘要:在工业场景中,数据增强是提高模型性能的有效方法。然而,其好处并非总是正向的。对于增强中的最佳样本量(OSS),既没有理论研究或成熟的估计方法,也没有公认的指标来评估所估计OSS的准确性或其与真实值的偏差。为了解决这些问题,我们提出了一种信息论最佳样本量估计方法(IT-OSE),为工业数据增强提供可靠的OSS估计,并提出了区间覆盖与偏差(ICD)评分来直观地评估所估计的OSS。我们从理论上分析并公式化了OSS与主导因素之间的关系,从而增强了可解释性。实验表明,与经验估计相比,IT-OSE将各基线模型分类任务的准确率平均提高了4.38%,将回归任务的MAPE平均降低了18.80%,且下游模型性能的提升更加稳定;ICD评分中的ICDdev平均降低了49.30%,增强了OSS的确定性。与穷举搜索相比,IT-OSE得到相同的OSS,同时平均减少了83.97%的计算成本和93.46%的数据成本。此外,实用性实验表明,IT-OSE在具有代表性的基于传感器的工业场景中具有通用性。
摘要:In industrial scenarios, data augmentation is an effective approach to improve model performance. However, its benefits are not unconditional. There is no theoretical research or established estimation for the optimal sample size (OSS) in augmentation, nor is there an established metric to evaluate the accuracy of OSS or its deviation from the ground truth. To address these issues, we propose an information-theoretic optimal sample size estimation (IT-OSE) to provide reliable OSS estimation for industrial data augmentation. An interval coverage and deviation (ICD) score is proposed to evaluate the estimated OSS intuitively. The relationship between OSS and dominant factors is theoretically analyzed and formulated, thereby enhancing the interpretability. Experiments show that, compared to empirical estimation, the IT-OSE increases accuracy in classification tasks across baseline models by an average of 4.38%, and reduces MAPE in regression tasks across baseline models by an average of 18.80%. The improvements in downstream model performance are more stable. ICDdev in the ICD score is also reduced by an average of 49.30%. The determinism of OSS is enhanced. Compared to exhaustive search, the IT-OSE achieves the same OSS while reducing computational and data costs by an average of 83.97% and 93.46%. Furthermore, practicality experiments demonstrate that the IT-OSE exhibits generality across representative sensor-based industrial scenarios.
【6】A Koopman-Bayesian Framework for High-Fidelity, Perceptually Optimized Haptic Surgical Simulation
标题:用于高保真、感知优化的触觉手术模拟的Koopman-Bayesian框架
链接:https://arxiv.org/abs/2602.15834
作者:Rohit Kaushik,Eva Kaushik
备注:11 pages, 6 figures
摘要:我们引入了一个统一的框架,结合非线性动力学、感知心理物理学和高频触觉渲染,以提高手术模拟的真实感。手术器械与软组织的相互作用被提升到带有Koopman算子公式的增广状态空间,从而允许对本质上非线性的动力学进行线性预测和控制。为了使渲染的力与人类感知极限一致,我们提出了一个基于Weber-Fechner定律和Stevens幂律的贝叶斯校准模块,根据每个人的辨别阈值逐步塑造力信号。对于触诊、切口、骨磨等各种模拟手术任务,所提出的系统达到了4.3 ms的平均渲染延迟、小于2.8%的力误差和20%的感知辨别能力提升。多元统计分析(MANOVA和回归)表明,该系统的性能显著优于传统的弹簧-阻尼和基于能量的渲染方法。最后,我们讨论了对手术培训和基于VR的医学教育的潜在影响,并展望了面向触觉接口中闭环神经反馈的未来工作。
摘要:We introduce a unified framework that combines nonlinear dynamics, perceptual psychophysics and high frequency haptic rendering to enhance realism in surgical simulation. The interaction of the surgical device with soft tissue is elevated to an augmented state space with a Koopman operator formulation, allowing linear prediction and control of the dynamics that are nonlinear by nature. To make the rendered forces consistent with human perceptual limits, we put forward a Bayesian calibration module based on Weber-Fechner and Stevens scaling laws, which progressively shape force signals relative to each individual's discrimination thresholds. For various simulated surgical tasks such as palpation, incision, and bone milling, the proposed system attains an average rendering latency of 4.3 ms, a force error of less than 2.8% and a 20% improvement in perceptual discrimination. Multivariate statistical analyses (MANOVA and regression) reveal that the system's performance is significantly better than that of conventional spring-damper and energy-based rendering methods. We end by discussing the potential impact on surgical training and VR-based medical education, as well as sketching future work toward closed-loop neural feedback in haptic interfaces.
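Lifting nonlinear dynamics into an augmented state space for linear prediction, as the abstract describes, is commonly done with extended DMD; the polynomial dictionary below is an illustrative choice, not the paper's:

```python
import numpy as np

def edmd_koopman(X, Y, lift):
    # Extended DMD: lift state snapshots into an augmented (observable)
    # space and fit the best linear operator K via least squares, so
    # lift(x_next) ~ K @ lift(x). Prediction and control then operate on
    # the lifted, linear system.
    PX, PY = lift(X), lift(Y)
    K, *_ = np.linalg.lstsq(PX, PY, rcond=None)
    return K.T

lift = lambda X: np.column_stack([X, X**2])   # simple polynomial dictionary
x = np.linspace(0.1, 1.0, 50)[:, None]        # state snapshots
y = 0.5 * x                                   # toy dynamics: x_next = 0.5 x
K = edmd_koopman(x, y, lift)
```

For these linear toy dynamics the recovered operator is exactly diagonal (0.5 on the state observable, 0.25 on its square); for soft-tissue dynamics the dictionary would need to capture genuine nonlinearity.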
【7】Optimal training-conditional regret for online conformal prediction
标题:在线保形预测的最佳训练条件遗憾
链接:https://arxiv.org/abs/2602.16537
作者:Jiadong Liang,Zhimei Ren,Yuxin Chen
摘要:我们研究了受未知分布漂移影响的非平稳数据流的在线保形预测。先前的大多数工作在对抗性设置下研究该问题,和/或以时间平均边缘覆盖率的差距来评估性能,而我们则通过训练条件累积遗憾来评估性能。我们特别关注具有两类分布偏移的独立生成数据:突变点和平滑漂移。当非一致性得分函数在独立数据集上预训练时,我们提出了一种分裂保形风格的算法,利用漂移检测自适应地更新校准集,可证明地达到极小极大最优遗憾。当非一致性得分改为在线训练时,我们开发了一种全保形风格的算法,同样结合漂移检测来处理非平稳性;该方法依赖于模型拟合算法的稳定性而非置换对称性,这通常更适合在不断变化的环境下进行在线学习。我们为在线全保形算法建立了非渐近遗憾保证,在对预测集的适当限制下,该保证与极小极大下界相匹配。数值实验证实了我们的理论结果。
摘要:We study online conformal prediction for non-stationary data streams subject to unknown distribution drift. While most prior work studied this problem under adversarial settings and/or assessed performance in terms of gaps of time-averaged marginal coverage, we instead evaluate performance through training-conditional cumulative regret. We specifically focus on independently generated data with two types of distribution shift: abrupt change points and smooth drift. When non-conformity score functions are pretrained on an independent dataset, we propose a split-conformal style algorithm that leverages drift detection to adaptively update calibration sets, which provably achieves minimax-optimal regret. When non-conformity scores are instead trained online, we develop a full-conformal style algorithm that again incorporates drift detection to handle non-stationarity; this approach relies on stability - rather than permutation symmetry - of the model-fitting algorithm, which is often better suited to online learning under evolving environments. We establish non-asymptotic regret guarantees for our online full conformal algorithm, which match the minimax lower bound under appropriate restrictions on the prediction sets. Numerical experiments corroborate our theoretical findings.
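The calibration step underlying the split-conformal algorithm here is the adjusted empirical quantile of the calibration scores; the paper's contribution is deciding, under drift, which scores belong in that calibration set. A minimal sketch of the quantile step alone:

```python
import numpy as np

def conformal_quantile(scores, alpha=0.1):
    # Split-conformal calibration: the ceil((n+1)(1-alpha))-th order
    # statistic of the calibration non-conformity scores. Prediction sets
    # then include every candidate whose score is <= this threshold,
    # giving 1 - alpha marginal coverage for exchangeable data.
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.sort(scores)[min(k, n) - 1]

scores = np.arange(1, 101, dtype=float)   # toy residuals 1, 2, ..., 100
q = conformal_quantile(scores, alpha=0.1)
```

Under distribution shift the exchangeability premise breaks, which is why the paper's drift detection refreshes the score set feeding this quantile.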
【8】On sparsity, extremal structure, and monotonicity properties of Wasserstein and Gromov-Wasserstein optimal transport plans
标题:关于Wasserstein和Gromov-Wasserstein最优交通计划的稀疏性、极端结构和单调性
链接:https://arxiv.org/abs/2602.16265
作者:Titouan Vayer
摘要:与标准线性最优传输(OT)框架相比,本说明对Gromov-Wasserstein(GW)距离的一些重要性质进行了独立完整的概述。更具体地说,我探讨了以下问题:GW最优运输计划是否稀疏?在什么条件下它们支持在一个置换上?它们是否满足某种形式的循环单调性?特别是,我提出了条件负半定性质,并表明当它成立时,存在稀疏且支持在置换上的GW最优计划。
摘要:This note gives a self-contained overview of some important properties of the Gromov-Wasserstein (GW) distance, compared with the standard linear optimal transport (OT) framework. More specifically, I explore the following questions: are GW optimal transport plans sparse? Under what conditions are they supported on a permutation? Do they satisfy a form of cyclical monotonicity? In particular, I present the conditionally negative semi-definite property and show that, when it holds, there are GW optimal plans that are sparse and supported on a permutation.
【9】Local adapt-then-combine algorithms for distributed nonsmooth optimization: Achieving provable communication acceleration
标题:用于分布式非光滑优化的本地自适应然后组合算法:实现可证明的通信加速
链接:https://arxiv.org/abs/2602.16148
作者:Luyao Guo,Xinli Shi,Wenying Xu,Jinde Cao
摘要:本文研究了网络上的分布式复合优化问题,其中智能体的目标是最小化局部光滑分量和公共非光滑项之和。利用概率本地更新机制,我们提出了一个通信效率的适应然后结合(ATC)框架,FlexATC,统一了众多的基于ATC的分布式算法。在独立于网络拓扑结构和局部更新次数的步长下,我们分别在凸和强凸环境下建立了FlexATC的次线性和线性收敛速度。值得注意的是,在强凸设置中,线性速率与目标函数和网络拓扑解耦,并且FlexATC允许在大多数迭代中跳过通信,而不会使线性速率恶化。此外,所提出的统一理论首次证明,本地更新可证明导致基于ATC的分布式算法的通信加速。数值实验进一步验证了所提出的框架的有效性,并证实了理论结果。
摘要:This paper is concerned with the distributed composite optimization problem over networks, where agents aim to minimize a sum of local smooth components and a common nonsmooth term. Leveraging the probabilistic local updates mechanism, we propose a communication-efficient Adapt-Then-Combine (ATC) framework, FlexATC, unifying numerous ATC-based distributed algorithms. Under stepsizes independent of the network topology and the number of local updates, we establish sublinear and linear convergence rates for FlexATC in convex and strongly convex settings, respectively. Remarkably, in the strongly convex setting, the linear rate is decoupled from the objective functions and network topology, and FlexATC permits communication to be skipped in most iterations without any deterioration of the linear rate. In addition, the proposed unified theory demonstrates for the first time that local updates provably lead to communication acceleration for ATC-based distributed algorithms. Numerical experiments further validate the efficacy of the proposed framework and corroborate the theoretical results.
【10】Ratio Covers of Convex Sets and Optimal Mixture Density Estimation
标题:凸集的比率覆盖和最优混合密度估计
链接:https://arxiv.org/abs/2602.16142
作者:Spencer Compton,Gábor Lugosi,Jaouad Mourtada,Jian Qian,Nikita Zhivotovskiy
备注:45 pages
摘要:本文研究了Kullback-Leibler散度下的密度估计:给定来自未知密度$p$的独立同分布(i.i.d.)样本,目标是构造一个估计$\widehat p$,使得$\mathrm{KL}(p,\widehat p)$以高概率很小。我们考虑两个涉及包含$M$个密度的有限字典的设置:(i)模型聚合,其中$p$属于字典;(ii)凸聚合(混合密度估计),其中$p$是字典中密度的混合。重要的是,我们不对基密度做任何假设:它们的比率可能是无界的,它们的支撑可能不同。对于这两个问题,我们确定了关于字典大小、样本大小和置信水平的最好可能的高概率保证。这些最优速率高于密度比由绝对常数约束时可实现的速率;对于混合密度估计,它们在离散分布的特殊情况下匹配现有的下界。我们对混合情形的分析取决于两个新的覆盖结果。首先,我们给出了$M$个分布的混合类的局部Hellinger熵的一个尖锐的、与分布无关的上界。其次,我们证明了凸集的最优比率覆盖定理:对于每个凸紧集$K\subset \mathbb{R}_+^d$,存在一个至多包含$2^{8d}$个元素的子集$A\subset K$,使得$K$的每个元素在坐标方向上被$A$的一个元素支配(至多相差一个泛常数因子)。这个几何结果本身具有独立的意义;值得注意的是,当可达到的目标向量集是凸集时,它为多目标优化中的$\varepsilon$-近似帕累托集给出了新的基数估计。
摘要:We study density estimation in Kullback-Leibler divergence: given an i.i.d. sample from an unknown density $p$, the goal is to construct an estimator $\widehat p$ such that $\mathrm{KL}(p,\widehat p)$ is small with high probability. We consider two settings involving a finite dictionary of $M$ densities: (i) model aggregation, where $p$ belongs to the dictionary, and (ii) convex aggregation (mixture density estimation), where $p$ is a mixture of densities from the dictionary. Crucially, we make no assumption on the base densities: their ratios may be unbounded and their supports may differ. For both problems, we identify the best possible high-probability guarantees in terms of the dictionary size, sample size, and confidence level. These optimal rates are higher than those achievable when density ratios are bounded by absolute constants; for mixture density estimation, they match existing lower bounds in the special case of discrete distributions. Our analysis of the mixture case hinges on two new covering results. First, we provide a sharp, distribution-free upper bound on the local Hellinger entropy of the class of mixtures of $M$ distributions. Second, we prove an optimal ratio covering theorem for convex sets: for every convex compact set $K\subset \mathbb{R}_+^d$, there exists a subset $A\subset K$ with at most $2^{8d}$ elements such that each element of $K$ is coordinate-wise dominated by an element of $A$ up to a universal constant factor. This geometric result is of independent interest; notably, it yields new cardinality estimates for $\varepsilon$-approximate Pareto sets in multi-objective optimization when the attainable set of objective vectors is convex.
预测|估计(11篇)
【1】Predicting The Cop Number Using Machine Learning
标题:使用机器学习预测警察人数
链接:https://arxiv.org/abs/2602.16600
作者:Meagan Mann,Christian Muise,Erin Meger
备注:8 pages
摘要:《警察与强盗》是一款在图上进行的追逐-躲避游戏,四十多年前由Quilliot \cite{quilliot1978jeux}以及Nowakowski和Winkler \cite{NOWAKOWSKI1983235}独立提出。确定图族的警察数(cop number)是近年来文献中的一个研究热点。一个图的警察数$c(G)$被定义为保证抓住强盗所需的最少警察数。确定警察数在计算上是困难的,用于此的精确算法通常限于小的图族。本文研究了经典机器学习方法和图神经网络是否可以从图的结构属性中准确地预测图的警察数,并确定哪些属性对这种预测影响最大。在经典机器学习模型中,基于树的模型尽管存在类别不平衡仍实现了高预测精度,而图神经网络在没有显式特征工程的情况下取得了相当的结果。可解释性分析表明,最具预测性的特征与节点连通性、聚类、团结构和宽度参数有关,这与已知的理论结果一致。我们的研究结果表明,机器学习方法可以与现有的警察数算法互补,在计算不可行时提供可扩展的近似。
摘要:Cops and Robbers is a pursuit evasion game played on a graph, first introduced independently by Quilliot \cite{quilliot1978jeux} and Nowakowski and Winkler \cite{NOWAKOWSKI1983235} over four decades ago. A main interest in the recent literature is identifying the cop number of graph families. The cop number of a graph, $c(G)$, is defined as the minimum number of cops required to guarantee capture of the robber. Determining the cop number is computationally difficult and exact algorithms for this are typically restricted to small graph families. This paper investigates whether classical machine learning methods and graph neural networks can accurately predict a graph's cop number from its structural properties and identify which properties most strongly influence this prediction. Of the classical machine learning models, tree-based models achieve high accuracy in prediction despite class imbalance, whereas graph neural networks achieve comparable results without explicit feature engineering. The interpretability analysis shows that the most predictive features are related to node connectivity, clustering, clique structure, and width parameters, which aligns with known theoretical results. Our findings suggest that machine learning approaches can be used in complement with existing cop number algorithms by offering scalable approximations where computation is infeasible.
【2】AIFL: A Global Daily Streamflow Forecasting Model Using Deterministic LSTM Pre-trained on ERA5-Land and Fine-tuned on IFS
标题:AIFL:在ERA5-Land上预训练并在IFS上微调的确定性LSTM全球每日径流预测模型
链接:https://arxiv.org/abs/2602.16579
作者:Maria Luisa Taccari,Kenza Tazi,Oisín M. Morrison,Andreas Grafberger,Juan Colonese,Corentin Carton de Wiart,Christel Prudhomme,Cinzia Mazzetti,Matthew Chantry,Florian Pappenberger
摘要:可靠的全球径流预测对于洪水防备和水资源管理至关重要,但数据驱动的模型在从历史再分析过渡到业务预测产品时往往会遇到性能差距。本文介绍了AIFL(人工智能洪水),一个基于确定性LSTM的全球日径流预测模型。AIFL在CARAVAN数据集管理的18,588个盆地上进行训练,采用了一种新的两阶段训练策略来桥接再分析到预测的领域转移。该模型首先在40年的ERA5-Land再分析(1980-2019)上进行预训练,以捕获强大的水文过程,然后在业务综合预报系统(IFS)控制预报(2016-2019)上进行微调,以适应业务数值天气预报的特定误差结构和偏差。据我们所知,这是第一个在CARAVAN生态系统中进行端到端训练的全球模型。在一个独立的时间测试集(2021-2024)上,AIFL实现了较高的预测技能,中值修正克林-古普塔效率(KGE')为0.66,中值纳什-萨克利夫效率(NSE)为0.53。基准测试结果表明,AIFL与当前最先进的全球系统相比具有很强的竞争力,在保持透明且可重复的气象驱动(forcing)数据管道的同时实现了相当的精度。该模型在极端事件检测方面表现出异常的可靠性,为全球水文界提供了一个精简和业务上稳健的基线。
摘要:Reliable global streamflow forecasting is essential for flood preparedness and water resource management, yet data-driven models often suffer from a performance gap when transitioning from historical reanalysis to operational forecast products. This paper introduces AIFL (Artificial Intelligence for Floods), a deterministic LSTM-based model designed for global daily streamflow forecasting. Trained on 18,588 basins curated from the CARAVAN dataset, AIFL utilises a novel two-stage training strategy to bridge the reanalysis-to-forecast domain shift. The model is first pre-trained on 40 years of ERA5-Land reanalysis (1980-2019) to capture robust hydrological processes, then fine-tuned on operational Integrated Forecasting System (IFS) control forecasts (2016-2019) to adapt to the specific error structures and biases of operational numerical weather prediction. To our knowledge, this is the first global model trained end-to-end within the CARAVAN ecosystem. On an independent temporal test set (2021-2024), AIFL achieves high predictive skill with a median modified Kling-Gupta Efficiency (KGE') of 0.66 and a median Nash-Sutcliffe Efficiency (NSE) of 0.53. Benchmarking results show that AIFL is highly competitive with current state-of-the-art global systems, achieving comparable accuracy while maintaining a transparent and reproducible forcing pipeline. The model demonstrates exceptional reliability in extreme-event detection, providing a streamlined and operationally robust baseline for the global hydrological community.
【3】MoDE-Boost: Boosting Shared Mobility Demand with Edge-Ready Prediction Models
标题:MoDE-Boost:利用边缘就绪预测模型提升共享移动需求
链接:https://arxiv.org/abs/2602.16573
作者:Antonios Tziorvas,George S. Theodoropoulos,Yannis Theodoridis
备注:25 pages
摘要:城市需求预测在智能交通系统中优化路径选择、调度和拥堵管理等方面起着至关重要的作用。通过利用数据融合和分析技术,交通需求预测是识别新兴空间和时间需求模式的关键中间措施。在本文中,我们通过提出两种梯度提升模型变体来应对这一挑战,一种用于分类,一种用于回归,两者都能够在不同的时间范围内生成需求预测,从5分钟到1小时。我们的整体方法有效地集成了时间和上下文特征,实现了对提高共享(微)移动服务效率至关重要的准确预测。为了评估其有效性,我们利用了来自五个大都市地区的电动滑板车和电动自行车网络的开放共享移动数据。这些真实世界的数据集使我们能够将我们的方法与最先进的方法以及基于生成人工智能的模型进行比较,证明其在捕捉现代城市交通复杂性方面的有效性。最终,我们的方法为城市微观流动管理提供了新的见解,有助于应对快速城市化带来的挑战,从而为更可持续、更高效、更宜居的城市做出贡献。
摘要:Urban demand forecasting plays a critical role in optimizing routing, dispatching, and congestion management within Intelligent Transportation Systems. By leveraging data fusion and analytics techniques, traffic demand forecasting serves as a key intermediate measure for identifying emerging spatial and temporal demand patterns. In this paper, we tackle this challenge by proposing two gradient boosting model variations, one for classification and one for regression, both capable of generating demand forecasts at various temporal horizons, from 5 minutes up to one hour. Our overall approach effectively integrates temporal and contextual features, enabling accurate predictions that are essential for improving the efficiency of shared (micro-) mobility services. To evaluate its effectiveness, we utilize open shared mobility data derived from e-scooter and e-bike networks in five metropolitan areas. These real-world datasets allow us to compare our approach with state-of-the-art methods as well as a Generative AI-based model, demonstrating its effectiveness in capturing the complexities of modern urban mobility. Ultimately, our methodology offers novel insights on urban micro-mobility management, helping to tackle the challenges arising from rapid urbanization and thus, contributing to more sustainable, efficient, and livable cities.
【4】HPMixer: Hierarchical Patching for Multivariate Time Series Forecasting
标题:HPMixer:多元时间序列预测的分层修补
链接:https://arxiv.org/abs/2602.16468
作者:Jung Min Choi,Vijaya Krishna Yalavarthi,Lars Schmidt-Thieme
备注:18 pages, 5 figures, 5 tables, PAKDD 2026
摘要:在长期多变量时间序列预测中,有效地捕获周期模式和残差动态是必不可少的。为了在标准深度学习基准设置中解决这个问题,我们提出了分层修补混合器(HPMixer),它以解耦但互补的方式对周期性和残差进行建模。周期性组件利用可学习的循环模块[7],通过非线性通道MLP增强,以获得更好的表现力。残差分量通过可学习平稳小波变换(LSWT)进行处理,以提取稳定的、平移不变的频域表示。随后,通道混合编码器对显式通道间依赖性进行建模,而两级非重叠分层修补机制捕获粗尺度和细尺度残差变化。通过将解耦周期性建模与结构化、多尺度残差学习相结合,HPMixer提供了一个有效的框架。在标准多变量基准上进行的大量实验表明,与最近的基准相比,HPMixer实现了具有竞争力或最先进的性能。
摘要:In long-term multivariate time series forecasting, effectively capturing both periodic patterns and residual dynamics is essential. To address this within standard deep learning benchmark settings, we propose the Hierarchical Patching Mixer (HPMixer), which models periodicity and residuals in a decoupled yet complementary manner. The periodic component utilizes a learnable cycle module [7] enhanced with a nonlinear channel-wise MLP for greater expressiveness. The residual component is processed through a Learnable Stationary Wavelet Transform (LSWT) to extract stable, shift-invariant frequency-domain representations. Subsequently, a channel-mixing encoder models explicit inter-channel dependencies, while a two-level non-overlapping hierarchical patching mechanism captures coarse- and fine-scale residual variations. By integrating decoupled periodicity modeling with structured, multi-scale residual learning, HPMixer provides an effective framework. Extensive experiments on standard multivariate benchmarks demonstrate that HPMixer achieves competitive or state-of-the-art performance compared to recent baselines.
【5】Guide-Guard: Off-Target Predicting in CRISPR Applications
标题:Guide-Guard:CRISPR应用中的脱靶预测
链接:https://arxiv.org/abs/2602.16327
作者:Joseph Bingham,Netanel Arussy,Saman Zonouz
备注:10 pages, 11 figs, accepted to IDEAL 2022
摘要:随着CRISPR等网络物理基因组测序和编辑技术的引入,研究人员可以更容易地访问工具,以研究和创建遗传学和健康科学(例如农业和医学)中各种主题的补救措施。随着该领域的发展和增长,新的关注点出现在预测脱靶行为的能力上。在这项工作中,我们从数据驱动的角度探索了潜在的生物和化学模型。此外,我们提出了一种基于机器学习的解决方案,名为\textit{Guide-Guard},用于预测CRISPR基因编辑过程中给定gRNA的系统行为,准确率为84%。该解决方案能够同时在多个不同的基因上进行训练,同时保持准确性。
摘要:With the introduction of cyber-physical genome sequencing and editing technologies, such as CRISPR, researchers can more easily access tools to investigate and create remedies for a variety of topics in genetics and health science (e.g. agriculture and medicine). As the field advances and grows, new concerns present themselves in the ability to predict the off-target behavior. In this work, we explore the underlying biological and chemical model from a data driven perspective. Additionally, we present a machine learning based solution named \textit{Guide-Guard} to predict the behavior of the system given a gRNA in the CRISPR gene-editing process with 84\% accuracy. This solution is able to be trained on multiple different genes at the same time while retaining accuracy.
【6】Prediction of Major Solar Flares Using Interpretable Class-dependent Reward Framework with Active Region Magnetograms and Domain Knowledge
标题:使用可解释类别相关奖励框架、活动区磁图和领域知识预测主要太阳耀斑
链接:https://arxiv.org/abs/2602.16264
作者:Zixian Wu,Xuebao Li,Yanfang Zheng,Rui Wang,Shunhuang Zhang,Jinfang Wei,Yongshang Lv,Liang Dong,Zamri Zainal Abidin,Noraisyah Mohamed Shah,Hongwei Ye,Pengchao Yan,Xuefeng Li,Xiaojia Ji,Xusheng Huang,Xiaotian Wang,Honglei Jin
备注:24 pages,12 figures
摘要:在这项工作中,我们开发,第一次,监督分类框架与类相关的奖励(CDR)预测$\geq$MM耀斑在24小时内。我们构建多个数据集,涵盖知识知情的功能和视线(LOS)磁图。我们还应用了三种深度学习模型(CNN、CNN-BiLSTM和Transformer)和三种CDR对应模型(CDR-CNN、CDR-CNN-BiLSTM和CDR-Transformer)。首先,我们分析了LOS磁场参数与Transformer的重要性,然后比较了它的性能与LOS,只有矢量,和组合的磁场参数。其次,我们比较了基于CDR模型与深度学习模型的耀斑预测性能。第三,我们对CDR模型的报酬工程进行了敏感性分析。第四,我们使用SHAP方法来实现模型的可解释性。最后,我们进行了性能比较,我们的模型和NASA/CCMC。主要研究结果如下:(1)在LOS特征组合中,R_VALUE和AREA_ACR始终能得到最好的结果。(2)Transformer利用组合的LOS和矢量磁场数据比利用单独的任一数据获得更好的性能。(3)使用知识信息特征的模型优于使用磁图的模型。(4)虽然CNN和CNN-BiLSTM在磁图上优于CDR同行,但在使用知识信息特征时,CDR-Transformer略优于其深度学习同行。在所有型号中,CDR-Transformer实现了最佳性能。(5)CDR模型的预测性能对奖励选择不太敏感。(6)通过SHAP分析,CDR模型倾向于将TOTUSJH视为更重要,而Transformer倾向于将R_VALUE视为更重要。(7)在相同的预测时间和活动区数下,CDR-Transformer的预测能力优于NASA/CCMC。
摘要:In this work, we develop, for the first time, a supervised classification framework with class-dependent rewards (CDR) to predict $\geq$MM flares within 24 hr. We construct multiple datasets, covering knowledge-informed features and line-of-sight (LOS) magnetograms. We also apply three deep learning models (CNN, CNN-BiLSTM, and Transformer) and three CDR counterparts (CDR-CNN, CDR-CNN-BiLSTM, and CDR-Transformer). First, we analyze the importance of LOS magnetic field parameters with the Transformer, then compare its performance using LOS-only, vector-only, and combined magnetic field parameters. Second, we compare flare prediction performance based on CDR models versus deep learning counterparts. Third, we perform sensitivity analysis on reward engineering for CDR models. Fourth, we use the SHAP method for model interpretability. Finally, we conduct performance comparison between our models and NASA/CCMC. The main findings are: (1)Among LOS feature combinations, R_VALUE and AREA_ACR consistently yield the best results. (2)Transformer achieves better performance with combined LOS and vector magnetic field data than with either alone. (3)Models using knowledge-informed features outperform those using magnetograms. (4)While CNN and CNN-BiLSTM outperform their CDR counterparts on magnetograms, CDR-Transformer is slightly superior to its deep learning counterpart when using knowledge-informed features. Among all models, CDR-Transformer achieves the best performance. (5)The predictive performance of the CDR models is not overly sensitive to the reward choices. (6)Through SHAP analysis, the CDR model tends to regard TOTUSJH as more important, while the Transformer tends to prioritize R_VALUE more. (7)Under identical prediction time and active region (AR) number, the CDR-Transformer shows superior predictive capabilities compared to NASA/CCMC.
【7】Online Prediction of Stochastic Sequences with High Probability Regret Bounds
标题:具有高概率后悔界的随机序列在线预测
链接:https://arxiv.org/abs/2602.16236
作者:Matthias Frey,Jonathan H. Manton,Jingge Zhu
备注:Accepted for publication at The Fourteenth International Conference on Learning Representations (ICLR 2026)
摘要:我们重温了经典问题的普遍预测的随机序列与有限的时间范围$T$已知的学习者。我们调查的问题是,是否有可能得到消失的遗憾界限,持有高概率,补充现有的界限,从文献中,持有预期。我们提出了这样的高概率界有一个非常相似的形式作为先验期望界。对于可数字母表上的随机过程的普遍预测的情况下,我们的边界状态的收敛速度为$\mathcal{O}(T^{-1/2} δ^{-1/2})$,与之前已知的顺序为$\mathcal{O}(T^{-1/2})$的期望内边界相比,概率至少为$1-δ$。我们还提出了一个不可能的结果,证明了不可能在相同形式的界中改进$δ$的指数,而不做额外的假设。
摘要:We revisit the classical problem of universal prediction of stochastic sequences with a finite time horizon $T$ known to the learner. The question we investigate is whether it is possible to derive vanishing regret bounds that hold with high probability, complementing existing bounds from the literature that hold in expectation. We propose such high-probability bounds which have a very similar form as the prior expectation bounds. For the case of universal prediction of a stochastic process over a countable alphabet, our bound states a convergence rate of $\mathcal{O}(T^{-1/2} δ^{-1/2})$ with probability at least $1-δ$ compared to prior known in-expectation bounds of the order $\mathcal{O}(T^{-1/2})$. We also propose an impossibility result which proves that it is not possible to improve the exponent of $δ$ in a bound of the same form without making additional assumptions.
【8】SEMixer: Semantics Enhanced MLP-Mixer for Multiscale Mixing and Long-term Time Series Forecasting
标题:SEMixer:用于多尺度混合和长期时间序列预测的语义增强型MLP-Mixer
链接:https://arxiv.org/abs/2602.16220
作者:Xu Zhang,Qitong Wang,Peng Wang,Wei Wang
备注:This work is accepted by the proceedings of the ACM Web Conference 2026 (WWW 2026). The code is available at the link https://github.com/Meteor-Stars/SEMixer
摘要:多尺度模式建模是长期时间序列预测(TSF)的关键。然而,时间序列中的冗余和噪声,以及非相邻尺度之间的语义间隙,使得多尺度时间依赖的有效对齐和集成具有挑战性。为了解决这个问题,我们提出了SEMixer,一个轻量级的多尺度模型,专为长期TSF。SEMixer具有两个关键组件:随机注意机制(RAM)和多尺度渐进混合链(MPMC)。RAM在训练过程中捕获不同的时间-补丁交互,并在推理时通过dropout集成来聚合它们,增强补丁级别的语义,并使MLP-Mixer能够更好地对多尺度依赖关系进行建模。MPMC进一步以内存高效的方式堆叠RAM和MLP混合器,实现更有效的时间混合。它解决了跨尺度的语义差距,促进了更好的多尺度建模和预测性能。我们不仅在10个公共数据集上验证了SEMixer的有效性,而且在基于21 GB真实无线网络数据的\textit{2025 CCF AIOps Challenge}上,SEMixer获得了第三名。该代码可在链接https://github.com/Meteor-Stars/SEMixer上获得。
摘要:Modeling multiscale patterns is crucial for long-term time series forecasting (TSF). However, redundancy and noise in time series, together with semantic gaps between non-adjacent scales, make the efficient alignment and integration of multi-scale temporal dependencies challenging. To address this, we propose SEMixer, a lightweight multiscale model designed for long-term TSF. SEMixer features two key components: a Random Attention Mechanism (RAM) and a Multiscale Progressive Mixing Chain (MPMC). RAM captures diverse time-patch interactions during training and aggregates them via dropout ensemble at inference, enhancing patch-level semantics and enabling MLP-Mixer to better model multi-scale dependencies. MPMC further stacks RAM and MLP-Mixer in a memory-efficient manner, achieving more effective temporal mixing. It addresses semantic gaps across scales and facilitates better multiscale modeling and forecasting performance. We not only validate the effectiveness of SEMixer on 10 public datasets, but also on the \textit{2025 CCF AIOps Challenge} based on 21GB real wireless network data, where SEMixer achieves third place. The code is available at the link https://github.com/Meteor-Stars/SEMixer.
【9】Deep TPC: Temporal-Prior Conditioning for Time Series Forecasting
标题:Deep TPC:时间序列预测的时间先验条件
链接:https://arxiv.org/abs/2602.16188
作者:Filippos Bellos,NaveenJohn Premkumar,Yannis Avrithis,Nam H. Nguyen,Jason J. Corso
备注:Accepted to ICASSP 2026
摘要:时间序列(TS)的LLM方法通常对时间进行浅层处理,在很大程度上冻结的解码器的输入处注入位置或基于时间的线索,这限制了时间推理,因为这些信息通过层降级。我们引入了时间先验条件(TPC),它将时间提升到一级模态,在多个深度对模型进行条件化。TPC将一小组可学习的时间序列令牌附加到补丁流;在选定的层,这些令牌交叉参与由相同的冻结LLM编码的紧凑的、人类可读的时间描述符导出的时间嵌入,然后通过自我关注反馈时间上下文。这在保持低参数预算的同时解开时间序列信号和时间信息。我们表明,通过只训练交叉注意模块并明确地解开时间序列信号和时间信息,TPC始终优于完全微调和浅层条件反射策略,在不同数据集的长期预测中实现了最先进的性能。代码可在:https://github.com/fil-mp/Deep_tpc
摘要:LLM-for-time series (TS) methods typically treat time shallowly, injecting positional or prompt-based cues once at the input of a largely frozen decoder, which limits temporal reasoning as this information degrades through the layers. We introduce Temporal-Prior Conditioning (TPC), which elevates time to a first-class modality that conditions the model at multiple depths. TPC attaches a small set of learnable time series tokens to the patch stream; at selected layers these tokens cross-attend to temporal embeddings derived from compact, human-readable temporal descriptors encoded by the same frozen LLM, then feed temporal context back via self-attention. This disentangles time series signal and temporal information while maintaining a low parameter budget. We show that by training only the cross-attention modules and explicitly disentangling time series signal and temporal information, TPC consistently outperforms both full fine-tuning and shallow conditioning strategies, achieving state-of-the-art performance in long-term forecasting across diverse datasets. Code available at: https://github.com/fil-mp/Deep_tpc
【10】MolCrystalFlow: Molecular Crystal Structure Prediction via Flow Matching
标题:MolCrystalFlow:通过流量匹配预测分子晶体结构
链接:https://arxiv.org/abs/2602.16020
作者:Cheng Zeng,Harry W. Sullivan,Thomas Egg,Maya M. Martirossyan,Philipp Höllmer,Jirui Jin,Richard G. Hennig,Adrian Roitberg,Stefano Martiniani,Ellad B. Tadmor,Mingjie Liu
备注:20 pages, 4 figures
摘要:分子晶体结构预测是计算化学中的一个巨大挑战,因为组成分子的尺寸很大,分子内和分子间的相互作用也很复杂。虽然生成建模已经彻底改变了分子,无机固体和金属有机框架的结构发现,但将这种方法扩展到完全周期性的分子晶体仍然是难以捉摸的。在这里,我们提出了MolCrystalFlow,一个基于流的分子晶体结构预测生成模型。该框架通过嵌入分子作为刚体并共同学习晶格矩阵、分子取向和质心位置,将分子内复杂性从分子间堆积中解脱出来。质心和方向表示在它们的原生黎曼流形上,允许测地线流构造和尊重几何对称性的图神经网络操作。我们对我们的模型进行了基准测试,针对两个开源分子晶体数据集上的大尺寸周期晶体和基于规则的结构生成方法的最先进的生成模型。我们展示了MolCrystalFlow模型与通用机器学习潜力的集成,以加速分子晶体结构预测,为数据驱动的分子晶体生成发现铺平了道路。
摘要:Molecular crystal structure prediction represents a grand challenge in computational chemistry due to large sizes of constituent molecules and complex intra- and intermolecular interactions. While generative modeling has revolutionized structure discovery for molecules, inorganic solids, and metal-organic frameworks, extending such approaches to fully periodic molecular crystals is still elusive. Here, we present MolCrystalFlow, a flow-based generative model for molecular crystal structure prediction. The framework disentangles intramolecular complexity from intermolecular packing by embedding molecules as rigid bodies and jointly learning the lattice matrix, molecular orientations, and centroid positions. Centroids and orientations are represented on their native Riemannian manifolds, allowing geodesic flow construction and graph neural network operations that respect geometric symmetries. We benchmark our model against state-of-the-art generative models for large-size periodic crystals and rule-based structure generation methods on two open-source molecular crystal datasets. We demonstrate an integration of the MolCrystalFlow model with a universal machine learning potential to accelerate molecular crystal structure prediction, paving the way for data-driven generative discovery of molecular crystals.
【11】R$^2$Energy: A Large-Scale Benchmark for Robust Renewable Energy Forecasting under Diverse and Extreme Conditions
标题:R$^2$Energy:多样化和极端条件下稳健可再生能源预测的大规模基准
链接:https://arxiv.org/abs/2602.15961
作者:Zhi Sheng,Yuan Yuan,Guozhen Zhang,Yong Li
摘要:可再生能源的快速发展,特别是风能和太阳能,使得可靠的预测对电力系统的运行至关重要。虽然最近的深度学习模型已经达到了很高的平均精度,但气候驱动的极端天气事件的频率和强度不断增加,对电网稳定性和运营安全构成了严重威胁。因此,开发能够承受波动条件的强大预测模型已成为一项重大挑战。在本文中,我们提出了R$^2$Energy,这是一个用于NWP辅助可再生能源预测的大规模基准。它包括来自中国四个省902个风能和太阳能站的超过1070万个高保真小时记录,提供了捕捉可再生能源发电广泛变化所需的各种气象条件。我们进一步建立了一个标准化的、无泄漏的预报模式,该模式允许所有模型对未来的数值天气预报(NWP)信号进行相同的访问,从而能够在最先进的代表性预报架构中进行公平和可重复的比较。除了总体准确性,我们将政权明智的评估与专家一致的极端天气注释,揭示了一个关键的“鲁棒性差距”通常被平均指标掩盖。这一差距揭示了一个明显的鲁棒性-复杂性权衡:在极端条件下,模型的可靠性取决于其气象集成策略,而不是其架构复杂性。R$^2$Energy为评估和开发安全关键型电力系统应用的预测模型提供了原则性基础。
摘要:The rapid expansion of renewable energy, particularly wind and solar power, has made reliable forecasting critical for power system operations. While recent deep learning models have achieved strong average accuracy, the increasing frequency and intensity of climate-driven extreme weather events pose severe threats to grid stability and operational security. Consequently, developing robust forecasting models that can withstand volatile conditions has become a paramount challenge. In this paper, we present R$^2$Energy, a large-scale benchmark for NWP-assisted renewable energy forecasting. It comprises over 10.7 million high-fidelity hourly records from 902 wind and solar stations across four provinces in China, providing the diverse meteorological conditions necessary to capture the wide-ranging variability of renewable generation. We further establish a standardized, leakage-free forecasting paradigm that grants all models identical access to future Numerical Weather Prediction (NWP) signals, enabling fair and reproducible comparison across state-of-the-art representative forecasting architectures. Beyond aggregate accuracy, we incorporate regime-wise evaluation with expert-aligned extreme weather annotations, uncovering a critical ``robustness gap'' typically obscured by average metrics. This gap reveals a stark robustness-complexity trade-off: under extreme conditions, a model's reliability is driven by its meteorological integration strategy rather than its architectural complexity. R$^2$Energy provides a principled foundation for evaluating and developing forecasting models for safety-critical power system applications.
其他神经网络|深度学习|模型|建模(38篇)
【1】Retrieval-Augmented Foundation Models for Matched Molecular Pair Transformations to Recapitulate Medicinal Chemistry Intuition
标题:用于匹配分子对转化的检索增强基础模型以重现药物化学直觉
链接:https://arxiv.org/abs/2602.16684
作者:Bo Pan,Peter Zhiping Zhang,Hao-Wei Pang,Alex Zhu,Xiang Yu,Liying Zhang,Liang Zhao
摘要:匹配分子对(MMPs)捕获药物化学家常规用于设计类似物的局部化学编辑,但现有的机器学习方法要么在全分子水平上操作、编辑可控性有限,要么只能从受限设置和小模型中学习MMP式编辑。我们提出了类似物生成的变量到变量表述,并在大规模MMP变换(MMPTs)上训练基础模型,以在输入变量的条件下产生多样的变量。为了实现实际控制,我们开发了提示机制,让用户在生成过程中指定首选的变换模式。我们进一步介绍MMPT-RAG,一个检索增强框架,它使用外部参考类似物作为上下文指导来引导生成,并从项目特定系列中进行泛化。在一般化学语料库和专利特定数据集上的实验证明了改进的多样性、新颖性和可控性,并表明我们的方法在实际发现场景中恢复了真实的类似物结构。
摘要:Matched molecular pairs (MMPs) capture the local chemical edits that medicinal chemists routinely use to design analogs, but existing ML approaches either operate at the whole-molecule level with limited edit controllability or learn MMP-style edits from restricted settings and small models. We propose a variable-to-variable formulation of analog generation and train a foundation model on large-scale MMP transformations (MMPTs) to generate diverse variables conditioned on an input variable. To enable practical control, we develop prompting mechanisms that let the users specify preferred transformation patterns during generation. We further introduce MMPT-RAG, a retrieval-augmented framework that uses external reference analogs as contextual guidance to steer generation and generalize from project-specific series. Experiments on general chemical corpora and patent-specific datasets demonstrate improved diversity, novelty, and controllability, and show that our method recovers realistic analog structures in practical discovery scenarios.
【2】A Systematic Evaluation of Sample-Level Tokenization Strategies for MEG Foundation Models
标题:MEG基础模型样本级标记化策略的系统评估
链接:https://arxiv.org/abs/2602.16626
作者:SungJun Cho,Chetan Gohil,Rukuang Huang,Oiwi Parker Jones,Mark W. Woolrich
备注:15 pages, 10 figures, 1 table
摘要:最近在自然语言处理方面的成功激发了人们对神经成像数据的大规模基础模型的兴趣。这种模型通常需要对连续的神经时间序列数据进行离散化,这一过程被称为“标记化”。然而,目前对神经数据的不同标记化策略的影响知之甚少。在这项工作中,我们对应用于脑磁图(MEG)数据的基于Transformer的大型神经成像模型(LNM)的样本级标记化策略进行了系统评估。我们通过检查其信号重建保真度及其对后续基础建模性能(令牌预测、生成数据的生物合理性、特定于受试者信息的保存以及下游任务性能)的影响来比较可学习和不可学习的标记器。对于可学习的标记器,我们引入了一种基于自动编码器的新方法。实验在三个公开的MEG数据集上进行,涵盖不同的采集站点、扫描仪和实验范式。我们的研究结果表明,可学习和不可学习的离散化方案都实现了高重建精度,并在大多数评估标准中表现大致相当,这表明简单的固定样本级标记化策略可以用于神经基础模型的开发。该代码可在https://github.com/OHBA-analysis/Cho2026_Tokenizer上获得。
摘要:Recent success in natural language processing has motivated growing interest in large-scale foundation models for neuroimaging data. Such models often require discretization of continuous neural time series data, a process referred to as 'tokenization'. However, the impact of different tokenization strategies for neural data is currently poorly understood. In this work, we present a systematic evaluation of sample-level tokenization strategies for transformer-based large neuroimaging models (LNMs) applied to magnetoencephalography (MEG) data. We compare learnable and non-learnable tokenizers by examining their signal reconstruction fidelity and their impact on subsequent foundation modeling performance (token prediction, biological plausibility of generated data, preservation of subject-specific information, and performance on downstream tasks). For the learnable tokenizer, we introduce a novel approach based on an autoencoder. Experiments were conducted on three publicly available MEG datasets spanning different acquisition sites, scanners, and experimental paradigms. Our results show that both learnable and non-learnable discretization schemes achieve high reconstruction accuracy and broadly comparable performance across most evaluation criteria, suggesting that simple fixed sample-level tokenization strategies can be used in the development of neural foundation models. The code is available at https://github.com/OHBA-analysis/Cho2026_Tokenizer.
【3】A Scalable Approach to Solving Simulation-Based Network Security Games
标题:解决基于模拟的网络安全游戏的可扩展方法
链接:https://arxiv.org/abs/2602.16564
作者:Michael Lanier,Yevgeniy Vorobeychik
摘要:我们介绍MetaDOAR,一个轻量级的元控制器,它通过一个可学习的、分区感知的过滤层和Q值缓存来增强Double Oracle/PSRO范式,从而在超大规模网络环境中实现可扩展的多智能体强化学习。MetaDOAR从每个节点的结构嵌入中学习紧凑的状态投影,以快速评分并选择设备的一个小子集(top-k分区),传统的低层行动者(actor)在其上借助评论家(critic)智能体执行聚焦波束搜索。所选的候选动作通过批量评论家前向计算进行评估,并存储在以量化状态投影和局部动作标识符为键的LRU缓存中,从而大大减少冗余的评论家计算,同时通过保守的k跳缓存失效保持决策质量。实验表明,MetaDOAR在大型网络拓扑上获得了比SOTA基线更高的玩家收益,且在内存使用或训练时间方面没有明显的扩展问题。这一贡献为大规模网络化决策问题的高效分层策略学习提供了一条实际的、具有理论依据的路径。
摘要:We introduce MetaDOAR, a lightweight meta-controller that augments the Double Oracle / PSRO paradigm with a learned, partition-aware filtering layer and Q-value caching to enable scalable multi-agent reinforcement learning on very large cyber-network environments. MetaDOAR learns a compact state projection from per node structural embeddings to rapidly score and select a small subset of devices (a top-k partition) on which a conventional low-level actor performs focused beam search utilizing a critic agent. Selected candidate actions are evaluated with batched critic forwards and stored in an LRU cache keyed by a quantized state projection and local action identifiers, dramatically reducing redundant critic computation while preserving decision quality via conservative k-hop cache invalidation. Empirically, MetaDOAR attains higher player payoffs than SOTA baselines on large network topologies, without significant scaling issues in terms of memory usage or training time. This contribution provides a practical, theoretically motivated path to efficient hierarchical policy learning for large-scale networked decision problems.
【4】FEKAN: Feature-Enriched Kolmogorov-Arnold Networks
标题:FEKAN:特征增强的Kolmogorov-Arnold网络
链接:https://arxiv.org/abs/2602.16530
作者:Sidharth S. Menon,Ameya D. Jagtap
备注:45 pages, 45 figures
摘要:Kolmogorov-Arnold Networks (KANs) have recently emerged as a compelling alternative to multilayer perceptrons, offering enhanced interpretability via functional decomposition. However, existing KAN architectures, including spline-, wavelet-, radial-basis variants, etc., suffer from high computational cost and slow convergence, limiting scalability and practical applicability. Here, we introduce Feature-Enriched Kolmogorov-Arnold Networks (FEKAN), a simple yet effective extension that preserves all the advantages of KAN while improving computational efficiency and predictive accuracy through feature enrichment, without increasing the number of trainable parameters. By incorporating these additional features, FEKAN accelerates convergence, increases representation capacity, and substantially mitigates the computational overhead characteristic of state-of-the-art KAN architectures. We investigate FEKAN across a comprehensive set of benchmarks, including function-approximation tasks, physics-informed formulations for diverse partial differential equations (PDEs), and neural operator settings that map between input and output function spaces. For function approximation, we systematically compare FEKAN against a broad family of KAN variants, FastKAN, WavKAN, ReLUKAN, HRKAN, ChebyshevKAN, RBFKAN, and the original SplineKAN. Across all tasks, FEKAN demonstrates substantially faster convergence and consistently higher approximation accuracy than the underlying baseline architectures. We also establish the theoretical foundations for FEKAN, showing its superior representation capacity compared to KAN, which contributes to improved accuracy and efficiency.
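The feature-enrichment idea can be sketched as a fixed input augmentation applied before the network, which adds representational capacity without new trainable parameters; the concrete feature set below is an illustrative assumption, as the abstract does not specify it:

```python
import math

def enrich_features(x):
    """Hypothetical feature enrichment for a scalar input: augment x with
    fixed (non-trainable) nonlinear features before it enters the KAN.
    The specific feature choices here are illustrative assumptions."""
    return [x, x ** 2, math.sin(math.pi * x), math.cos(math.pi * x)]

def enrich_batch(xs):
    # Each input sample is replaced by its enriched feature vector;
    # no trainable parameters are introduced by this step.
    return [enrich_features(x) for x in xs]
```

The downstream network then sees a 4-dimensional input per original scalar, so high-frequency structure can be captured with fewer or simpler basis functions inside the KAN layers.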
【5】Interpretability-by-Design with Accurate Locally Additive Models and Conditional Feature Effects
标题:基于精确局部可加模型与条件特征效应的设计即可解释性
链接:https://arxiv.org/abs/2602.16503
作者:Vasilis Gkolemis,Loukas Kavouras,Dimitrios Kyriakopoulos,Konstantinos Tsopelas,Dimitrios Rontogiannis,Giuseppe Casalicchio,Theodore Dalamagas,Christos Diou
摘要:Generalized additive models (GAMs) offer interpretability through independent univariate feature effects but underfit when interactions are present in data. GA$^2$Ms add selected pairwise interactions which improves accuracy, but sacrifices interpretability and limits model auditing. We propose \emph{Conditionally Additive Local Models} (CALMs), a new model class, that balances the interpretability of GAMs with the accuracy of GA$^2$Ms. CALMs allow multiple univariate shape functions per feature, each active in different regions of the input space. These regions are defined independently for each feature as simple logical conditions (thresholds) on the features it interacts with. As a result, effects remain locally additive while varying across subregions to capture interactions. We further propose a principled distillation-based training pipeline that identifies homogeneous regions with limited interactions and fits interpretable shape functions via region-aware backfitting. Experiments on diverse classification and regression tasks show that CALMs consistently outperform GAMs and achieve accuracy comparable with GA$^2$Ms. Overall, CALMs offer a compelling trade-off between predictive accuracy and interpretability.
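The region-conditioned additive structure can be sketched directly: each feature gets several candidate shape functions, and a simple threshold condition on the features it interacts with selects which one is active. The shape functions and thresholds below are illustrative, not fitted by the paper's distillation pipeline:

```python
def calm_predict(x, shape_functions, conditions, intercept=0.0):
    """Illustrative CALM-style prediction. shape_functions[j] lists the
    candidate univariate functions for feature j; conditions[j](x) returns
    the index of the active region (a threshold rule on other features)."""
    y = intercept
    for j, funcs in enumerate(shape_functions):
        region = conditions[j](x)
        y += funcs[region](x[j])  # locally additive: one active shape per feature
    return y

# Example: feature 0's effect depends on whether feature 1 exceeds 0.5.
shapes = [
    [lambda v: 2.0 * v,     # region 0: active when x1 <= 0.5
     lambda v: -1.0 * v],   # region 1: active when x1 > 0.5
    [lambda v: v ** 2],     # feature 1 keeps a single global shape
]
conds = [
    lambda x: 0 if x[1] <= 0.5 else 1,
    lambda x: 0,
]
```

Within any region the model is an ordinary GAM, which is what preserves local additivity while still expressing the interaction between features 0 and 1.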
【6】GICDM: Mitigating Hubness for Reliable Distance-Based Generative Model Evaluation
标题:GICDM:缓解枢纽性以实现可靠的基于距离的生成模型评估
链接:https://arxiv.org/abs/2602.16449
作者:Nicolas Salvy,Hugues Talbot,Bertrand Thirion
摘要:Generative model evaluation commonly relies on high-dimensional embedding spaces to compute distances between samples. We show that dataset representations in these spaces are affected by the hubness phenomenon, which distorts nearest neighbor relationships and biases distance-based metrics. Building on the classical Iterative Contextual Dissimilarity Measure (ICDM), we introduce Generative ICDM (GICDM), a method to correct neighborhood estimation for both real and generated data. We introduce a multi-scale extension to improve empirical behavior. Extensive experiments on synthetic and real benchmarks demonstrate that GICDM resolves hubness-induced failures, restores reliable metric behavior, and improves alignment with human judgment.
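The classical ICDM idea that GICDM builds on can be sketched as one contextual rescaling iteration: each pairwise distance is normalized by the neighbourhood radii of its endpoints, counteracting hubs whose neighbourhoods are abnormally tight. This is a simplified sketch, not the paper's exact update:

```python
def icdm_iteration(D, k=3, alpha=1.0):
    """One illustrative contextual-dissimilarity iteration over a full
    distance matrix D (list of lists). Distances are rescaled by each
    endpoint's mean distance to its k nearest neighbours."""
    n = len(D)
    # Mean distance from each point to its k nearest neighbours.
    r = []
    for i in range(n):
        neigh = sorted(d for j, d in enumerate(D[i]) if j != i)[:k]
        r.append(sum(neigh) / len(neigh))
    r_bar = sum(r) / n  # global mean neighbourhood radius
    new = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Points with small radii (hub candidates) have their
                # distances inflated; points with large radii, deflated.
                scale = ((r_bar * r_bar) / (r[i] * r[j])) ** (alpha / 2)
                new[i][j] = D[i][j] * scale
    return new
```

Iterating this update equalizes neighbourhood scales across the dataset, which is the mechanism by which nearest-neighbour relationships stop collapsing onto a few hub points.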
【7】Learning with Locally Private Examples by Inverse Weierstrass Private Stochastic Gradient Descent
标题:通过逆维尔斯特拉斯私有随机梯度下降从本地隐私样本中学习
链接:https://arxiv.org/abs/2602.16436
作者:Jean Dufraiche,Paul Mangold,Michaël Perrot,Marc Tommasi
备注:30 pages, 8 figures
摘要:Releasing data once and for all under noninteractive Local Differential Privacy (LDP) enables complete data reusability, but the resulting noise may create bias in subsequent analyses. In this work, we leverage the Weierstrass transform to characterize this bias in binary classification. We prove that inverting this transform leads to a bias-correction method to compute unbiased estimates of nonlinear functions on examples released under LDP. We then build a novel stochastic gradient descent algorithm called Inverse Weierstrass Private SGD (IWP-SGD). It converges to the true population risk minimizer at a rate of $\mathcal{O}(1/n)$, with $n$ the number of examples. We empirically validate IWP-SGD on binary classification tasks using synthetic and real-world datasets.
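The bias the abstract describes can be made concrete for Gaussian local noise: releasing $\tilde{x} = x + Z$ turns any downstream function $f$ into its Weierstrass transform, and inverting the transform debiases simple nonlinearities. The worked example below (with $f(x) = x^2$) is illustrative and not taken from the paper:

```latex
% Releasing \tilde{x} = x + Z with Z \sim \mathcal{N}(0,\sigma^2) yields, for any f,
\mathbb{E}_Z\!\left[f(x+Z)\right] = (W_\sigma f)(x),
\qquad
(W_\sigma f)(x) = \int f(x+z)\,\frac{e^{-z^2/2\sigma^2}}{\sqrt{2\pi}\,\sigma}\,dz .
% Example: f(x) = x^2 gives (W_\sigma f)(x) = x^2 + \sigma^2, so the corrected
% statistic g(\tilde{x}) = \tilde{x}^2 - \sigma^2 satisfies
\mathbb{E}_Z\!\left[(x+Z)^2 - \sigma^2\right] = x^2,
% i.e. applying g = W_\sigma^{-1} f to the released value restores an unbiased
% estimate of the nonlinear function of the private example.
```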
【8】Optical Inversion and Spectral Unmixing of Spectroscopic Photoacoustic Images with Physics-Informed Neural Networks
标题:利用物理信息神经网络实现光谱光声图像的光学反演和光谱解混
链接:https://arxiv.org/abs/2602.16357
作者:Sarkis Ter Martirosyan,Xinyue Huang,David Qin,Anthony Yu,Stanislav Emelianov
摘要:Accurate estimation of the relative concentrations of chromophores in a spectroscopic photoacoustic (sPA) image can reveal immense structural, functional, and molecular information about physiological processes. However, due to nonlinearities and ill-posedness inherent to sPA imaging, concentration estimation is intractable. The Spectroscopic Photoacoustic Optical Inversion Autoencoder (SPOI-AE) aims to address the sPA optical inversion and spectral unmixing problems without assuming linearity. Herein, SPOI-AE was trained and tested on \textit{in vivo} mouse lymph node sPA images with unknown ground truth chromophore concentrations. SPOI-AE better reconstructs input sPA pixels than conventional algorithms while providing biologically coherent estimates for optical parameters, chromophore concentrations, and the percent oxygen saturation of tissue. SPOI-AE's unmixing accuracy was validated using a simulated mouse lymph node phantom ground truth.
【9】The Implicit Bias of Adam and Muon on Smooth Homogeneous Neural Networks
标题:光滑齐次神经网络上Adam和Muon的隐式偏差
链接:https://arxiv.org/abs/2602.16340
作者:Eitan Gronich,Gal Vardi
备注:11 pages, 1 figure (with appendix: 48 pages, 2 figures), under review for ICML 2026
摘要:We study the implicit bias of momentum-based optimizers on homogeneous models. We first extend existing results on the implicit bias of steepest descent in homogeneous models to normalized steepest descent with an optional learning rate schedule. We then show that for smooth homogeneous models, momentum steepest descent algorithms like Muon (spectral norm), MomentumGD ($\ell_2$ norm), and Signum ($\ell_\infty$ norm) are approximate steepest descent trajectories under a decaying learning rate schedule, proving that these algorithms too have a bias towards KKT points of the corresponding margin maximization problem. We extend the analysis to Adam (without the stability constant), which maximizes the $\ell_\infty$ margin, and to Muon-Signum and Muon-Adam, which maximize a hybrid norm. Our experiments corroborate the theory and show that the identity of the margin maximized depends on the choice of optimizer. Overall, our results extend earlier lines of work on steepest descent in homogeneous models and momentum-based optimizers in linear models.
【10】Subtractive Modulative Network with Learnable Periodic Activations
标题:具有可学习周期激活的减法调制网络
链接:https://arxiv.org/abs/2602.16337
作者:Tiou Wang,Zhuoqian Yang,Markus Flierl,Mathieu Salzmann,Sabine Süsstrunk
备注:4 pages, 3 figures, 3 tables
摘要:We propose the Subtractive Modulative Network (SMN), a novel, parameter-efficient Implicit Neural Representation (INR) architecture inspired by classical subtractive synthesis. The SMN is designed as a principled signal processing pipeline, featuring a learnable periodic activation layer (Oscillator) that generates a multi-frequency basis, and a series of modulative mask modules (Filters) that actively generate high-order harmonics. We provide both theoretical analysis and empirical validation for our design. Our SMN achieves a PSNR of $40+$ dB on two image datasets, comparing favorably against state-of-the-art methods in terms of both reconstruction accuracy and parameter efficiency. Furthermore, consistent advantage is observed on the challenging 3D NeRF novel view synthesis task. Supplementary materials are available at https://inrainbws.github.io/smn/.
【11】HAWX: A Hardware-Aware FrameWork for Fast and Scalable ApproXimation of DNNs
标题:HAWX:用于快速且可扩展地逼近DNN的硬件感知框架
链接:https://arxiv.org/abs/2602.16336
作者:Samira Nazari,Mohammad Saeed Almasi,Mahdi Taheri,Ali Azarpeyvand,Ali Mokhtari,Ali Mahani,Christian Herglotz
摘要:This work presents HAWX, a hardware-aware scalable exploration framework that employs multi-level sensitivity scoring at different DNN abstraction levels (operator, filter, layer, and model) to guide selective integration of heterogeneous AxC blocks. Supported by predictive models for accuracy, power, and area, HAWX accelerates the evaluation of candidate configurations, achieving over 23× speedup in a layer-level search with two candidate approximate blocks and more than (3×10^6)× speedup at the filter-level search only for LeNet-5, while maintaining accuracy comparable to exhaustive search. Experiments across state-of-the-art DNN benchmarks such as VGG-11, ResNet-18, and EfficientNetLite demonstrate that the efficiency benefits of HAWX scale exponentially with network size. The HAWX hardware-aware search algorithm supports both spatial and temporal accelerator architectures, leveraging either off-the-shelf approximate components or customized designs.
【12】Regret and Sample Complexity of Online Q-Learning via Concentration of Stochastic Approximation with Time-Inhomogeneous Markov Chains
标题:基于时间非齐次Markov链随机逼近集中性的在线Q学习的遗憾与样本复杂度
链接:https://arxiv.org/abs/2602.16274
作者:Rahul Singh,Siddharth Chandak,Eric Moulines,Vivek S. Borkar,Nicholas Bambos
摘要:We present the first high-probability regret bound for classical online Q-learning in infinite-horizon discounted Markov decision processes, without relying on optimism or bonus terms. We first analyze Boltzmann Q-learning with decaying temperature and show that its regret depends critically on the suboptimality gap of the MDP: for sufficiently large gaps, the regret is sublinear, while for small gaps it deteriorates and can approach linear growth. To address this limitation, we study a Smoothed $ε_n$-Greedy exploration scheme that combines $ε_n$-greedy and Boltzmann exploration, for which we prove a gap-robust regret bound of near-$\tilde{O}(N^{9/10})$. To analyze these algorithms, we develop a high-probability concentration bound for contractive Markovian stochastic approximation with iterate- and time-dependent transition dynamics. This bound may be of independent interest as the contraction factor in our bound is governed by the mixing time and is allowed to converge to one asymptotically.
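One plausible reading of the smoothed scheme is a mixture policy: act greedily with probability $1-ε$, otherwise sample from a Boltzmann distribution over the Q-values. The exact mixing rule in the paper may differ; the sketch below just illustrates how the two exploration mechanisms combine:

```python
import math

def smoothed_eps_greedy(q_values, eps, temperature):
    """Illustrative smoothed epsilon-greedy policy: returns action
    probabilities mixing a greedy point mass (weight 1 - eps) with a
    Boltzmann distribution over Q-values (weight eps)."""
    m = max(q_values)
    # Numerically stable softmax over Q / temperature.
    exps = [math.exp((q - m) / temperature) for q in q_values]
    z = sum(exps)
    boltzmann = [e / z for e in exps]
    greedy = q_values.index(m)
    probs = [eps * b for b in boltzmann]
    probs[greedy] += 1.0 - eps
    return probs
```

Decaying `eps` over time recovers pure greedy behaviour in the limit, while the Boltzmann component keeps every action's probability bounded away from zero at any finite step.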
【13】DistributedEstimator: Distributed Training of Quantum Neural Networks via Circuit Cutting
标题:DistributedEstimator:通过电路切割对量子神经网络进行分布式训练
链接:https://arxiv.org/abs/2602.16233
作者:Prabhjot Singh,Adel N. Toosi,Rajkumar Buyya
摘要:Circuit cutting decomposes a large quantum circuit into a collection of smaller subcircuits. The outputs of these subcircuits are then classically reconstructed to recover the original expectation values. While prior work characterises cutting overhead largely in terms of subcircuit counts and sampling complexity, its end-to-end impact on iterative, estimator-driven training pipelines remains insufficiently measured from a systems perspective. In this paper, we propose a cut-aware estimator execution pipeline that treats circuit cutting as a staged distributed workload and instruments each estimator query into partitioning, subexperiment generation, parallel execution, and classical reconstruction phases. Using logged runtime traces and learning outcomes on two binary classification workloads (Iris and MNIST), we quantify cutting overheads, scaling limits, and sensitivity to injected stragglers, and we evaluate whether accuracy and robustness are preserved under matched training budgets. Our measurements show that cutting introduces substantial end-to-end overheads that grow with the number of cuts, and that reconstruction constitutes a dominant fraction of per-query time, bounding achievable speed-up under increased parallelism. Despite these systems costs, test accuracy and robustness are preserved in the measured regimes, with configuration-dependent improvements observed in some cut settings. These results indicate that practical scaling of circuit cutting for learning workloads hinges on reducing and overlapping reconstruction and on scheduling policies that account for barrier-dominated critical paths.
【14】Factored Latent Action World Models
标题:因子化潜在动作世界模型
链接:https://arxiv.org/abs/2602.16229
作者:Zizhao Wang,Chang Shi,Jiaheng Hu,Kevin Rohling,Roberto Martín-Martín,Amy Zhang,Peter Stone
摘要:Learning latent actions from action-free video has emerged as a powerful paradigm for scaling up controllable world model learning. Latent actions provide a natural interface for users to iteratively generate and manipulate videos. However, most existing approaches rely on monolithic inverse and forward dynamics models that learn a single latent action to control the entire scene, and therefore struggle in complex environments where multiple entities act simultaneously. This paper introduces Factored Latent Action Model (FLAM), a factored dynamics framework that decomposes the scene into independent factors, each inferring its own latent action and predicting its own next-step factor value. This factorized structure enables more accurate modeling of complex multi-entity dynamics and improves video generation quality in action-free video settings compared to monolithic models. Based on experiments on both simulation and real-world multi-entity datasets, we find that FLAM outperforms prior work in prediction accuracy and representation quality, and facilitates downstream policy learning, demonstrating the benefits of factorized latent action models.
【15】Rethinking Input Domains in Physics-Informed Neural Networks via Geometric Compactification Mappings
标题:通过几何紧化映射重新思考物理信息神经网络中的输入域
链接:https://arxiv.org/abs/2602.16193
作者:Zhenzhen Huang,Haoyu Bian,Jiaquan Zhang,Yibei Liu,Kuien Liu,Caiyan Qin,Guoqing Wang,Yang Yang,Chaoning Zhang
摘要:Several complex physical systems are governed by multi-scale partial differential equations (PDEs) that exhibit both smooth low-frequency components and localized high-frequency structures. Existing physics-informed neural network (PINN) methods typically train with fixed coordinate system inputs, where geometric misalignment with these structures induces gradient stiffness and ill-conditioning that hinder convergence. To address this issue, we introduce a mapping paradigm that reshapes the input coordinates through differentiable geometric compactification mappings and couples the geometric structure of PDEs with the spectral properties of residual operators. Based on this paradigm, we propose Geometric Compactification (GC)-PINN, a framework that introduces three mapping strategies for periodic boundaries, far-field scale expansion, and localized singular structures in the input domain without modifying the underlying PINN architecture. Extensive empirical evaluation demonstrates that this approach yields more uniform residual distributions and higher solution accuracy on representative 1D and 2D PDEs, while improving training stability and convergence speed.
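The three mapping strategies can be illustrated with standard compactification choices for each case the abstract names (periodic boundaries, far fields, localized singular structures); the concrete formulas below are common defaults, not necessarily the paper's exact mappings:

```python
import math

def periodic_map(x, L):
    """Map a coordinate with period L onto the unit circle, so the network
    input is exactly periodic in x."""
    t = 2.0 * math.pi * x / L
    return (math.cos(t), math.sin(t))

def farfield_map(x, scale=1.0):
    """Compactify an unbounded coordinate into (-1, 1); distant points are
    compressed, concentrating resolution near the origin."""
    return math.tanh(x / scale)

def singular_map(x, x0, eps=1e-3):
    """Logarithmically stretch coordinates around a localized structure at
    x0, concentrating network capacity near the singular region."""
    return math.copysign(math.log1p(abs(x - x0) / eps), x - x0)
```

Each mapping is differentiable, so it composes with the PINN's automatic differentiation without modifying the network architecture, as described above.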
【16】Learning Personalized Agents from Human Feedback
标题:从人类反馈中学习个性化代理
链接:https://arxiv.org/abs/2602.16173
作者:Kaiqu Liang,Julia Kruk,Shengyi Qian,Xianjun Yang,Shengjie Bi,Yuanshun Yao,Shaoliang Nie,Mingyang Zhang,Lijuan Liu,Jaime Fernández Fisac,Shuyan Zhou,Saghar Hosseini
摘要
:Modern AI agents are powerful but often fail to align with the idiosyncratic, evolving preferences of individual users. Prior approaches typically rely on static datasets, either training implicit preference models on interaction history or encoding user profiles in external memory. However, these approaches struggle with new users and with preferences that change over time. We introduce Personalized Agents from Human Feedback (PAHF), a framework for continual personalization in which agents learn online from live interaction using explicit per-user memory. PAHF operationalizes a three-step loop: (1) seeking pre-action clarification to resolve ambiguity, (2) grounding actions in preferences retrieved from memory, and (3) integrating post-action feedback to update memory when preferences drift. To evaluate this capability, we develop a four-phase protocol and two benchmarks in embodied manipulation and online shopping. These benchmarks quantify an agent's ability to learn initial preferences from scratch and subsequently adapt to persona shifts. Our theoretical analysis and empirical results show that integrating explicit memory with dual feedback channels is critical: PAHF learns substantially faster and consistently outperforms both no-memory and single-channel baselines, reducing initial personalization error and enabling rapid adaptation to preference shifts.
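The three-step loop can be sketched as a minimal explicit per-user memory; the class and method names below are illustrative, not the authors' implementation:

```python
class PreferenceMemory:
    """Minimal sketch of the PAHF loop: clarify before acting, ground
    actions in retrieved preferences, update memory on feedback."""
    def __init__(self):
        self._prefs = {}  # explicit per-user memory: task -> preference

    def needs_clarification(self, task):
        # Step 1: seek pre-action clarification when no stored
        # preference resolves the ambiguity.
        return task not in self._prefs

    def ground(self, task):
        # Step 2: retrieve the stored preference to condition the action.
        return self._prefs.get(task)

    def integrate_feedback(self, task, preference):
        # Step 3: overwrite on post-action feedback, so the memory
        # tracks preference drift.
        self._prefs[task] = preference
```

The dual feedback channels correspond to the two entry points: clarification answers populate memory before acting, while post-action feedback corrects it afterwards.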
【17】On the Power of Source Screening for Learning Shared Feature Extractors
标题:论源筛选对学习共享特征提取器的作用
链接:https://arxiv.org/abs/2602.16125
作者:Leo Wang,Connor Mclaughlin,Lili Su
摘要:Learning with shared representation is widely recognized as an effective way to separate commonalities from heterogeneity across various heterogeneous sources. Most existing work includes all related data sources via simultaneously training a common feature extractor and source-specific heads. It is well understood that data sources with low relevance or poor quality may hinder representation learning. In this paper, we further dive into the question of which data sources should be learned jointly by focusing on the traditionally deemed ``good'' collection of sources, in which individual sources have similar relevance and qualities with respect to the true underlying common structure. Towards tractability, we focus on the linear setting where sources share a low-dimensional subspace. We find that source screening can play a central role in statistically optimal subspace estimation. We show that, for a broad class of problem instances, training on a carefully selected subset of sources suffices to achieve minimax optimality, even when a substantial portion of data is discarded. We formalize the notion of an informative subpopulation, develop algorithms and practical heuristics for identifying such subsets, and validate their effectiveness through both theoretical analysis and empirical evaluations on synthetic and real-world datasets.
【18】Why Any-Order Autoregressive Models Need Two-Stream Attention: A Structural-Semantic Tradeoff
标题:为什么任意阶自回归模型需要双流注意力:结构-语义权衡
链接:https://arxiv.org/abs/2602.16092
作者:Patrick Pynadath,Ruqi Zhang
摘要:Any-order autoregressive models (AO-ARMs) offer a promising path toward efficient masked diffusion by enabling native key-value caching, but competitive performance has so far required two-stream attention, typically motivated as a means of decoupling token content from position. In this work, we argue that two-stream attention may be serving a more subtle role. We identify a structural-semantic tradeoff in any-order generation: the hidden representation at each step must simultaneously attend to semantically informative tokens for prediction and structurally recent tokens for summarization, objectives that compete for attention capacity in a single stream but can specialize across two streams. To isolate this tradeoff from position-content separation, we propose Decoupled RoPE, a modification to rotary position embeddings that provides target position information without revealing target content. Decoupled RoPE performs competitively at short sequence lengths--where semantic and structural proximity coincide--but degrades as sequence length increases and the two orderings diverge. These results suggest that the success of two-stream attention stems not merely from separating position from content, but from circumventing the deeper structural-semantic tradeoff inherent to any-order generation.
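Rotary position embeddings, which Decoupled RoPE modifies, can be sketched as complex rotations of (even, odd) feature pairs by a position-dependent angle; this makes concrete how a *target* position can be injected into a vector without revealing any token content. The sketch below is standard RoPE, not the paper's decoupled variant:

```python
import cmath

def rope_rotate(pairs, position, base=10000.0):
    """Rotate each (even, odd) feature pair, viewed as a complex number,
    by angle position * base**(-2i/d). Content-independent: only the
    rotation angle carries position information."""
    d = 2 * len(pairs)
    out = []
    for i, (a, b) in enumerate(pairs):
        theta = position * base ** (-2.0 * i / d)
        z = complex(a, b) * cmath.exp(1j * theta)
        out.append((z.real, z.imag))
    return out
```

Because the rotation is norm-preserving and depends only on `position`, applying it to a fixed placeholder vector supplies position information while keeping the target's content hidden, which is the mechanism Decoupled RoPE exploits.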
【19】LGQ: Learning Discretization Geometry for Scalable and Stable Image Tokenization
标题:LGQ:学习离散化几何以实现可扩展和稳定的图像令牌化
链接:https://arxiv.org/abs/2602.16086
作者:Idil Bilge Altun,Mert Onur Cakiroglu,Elham Buxton,Mehmet Dalkilic,Hasan Kurban
摘要:Discrete image tokenization is a key bottleneck for scalable visual generation: a tokenizer must remain compact for efficient latent-space priors while preserving semantic structure and using discrete capacity effectively. Existing quantizers face a trade-off: vector-quantized tokenizers learn flexible geometries but often suffer from biased straight-through optimization, codebook under-utilization, and representation collapse at large vocabularies. Structured scalar or implicit tokenizers ensure stable, near-complete utilization by design, yet rely on fixed discretization geometries that may allocate capacity inefficiently under heterogeneous latent statistics. We introduce Learnable Geometric Quantization (LGQ), a discrete image tokenizer that learns discretization geometry end-to-end. LGQ replaces hard nearest-neighbor lookup with temperature-controlled soft assignments, enabling fully differentiable training while recovering hard assignments at inference. The assignments correspond to posterior responsibilities of an isotropic Gaussian mixture and minimize a variational free-energy objective, provably converging to nearest-neighbor quantization in the low-temperature limit. LGQ combines a token-level peakedness regularizer with a global usage regularizer to encourage confident yet balanced code utilization without imposing rigid grids. Under a controlled VQGAN-style backbone on ImageNet across multiple vocabulary sizes, LGQ achieves stable optimization and balanced utilization. At 16K codebook size, LGQ improves rFID by 11.88% over FSQ while using 49.96% fewer active codes, and improves rFID by 6.06% over SimVQ with 49.45% lower effective representation rate, achieving comparable fidelity with substantially fewer active entries. Our GitHub repository is available at: https://github.com/KurbanIntelligenceLab/LGQ
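The temperature-controlled soft assignment described above corresponds to softmax responsibilities over squared distances to the codebook (the posterior of an isotropic Gaussian mixture). A scalar sketch, with the understanding that the actual method operates on latent vectors:

```python
import math

def soft_assign(z, codes, temperature):
    """Responsibilities of an isotropic Gaussian mixture over the codebook:
    softmax of negative squared distances scaled by the temperature.
    As temperature -> 0 this recovers hard nearest-neighbour assignment."""
    logits = [-(z - c) ** 2 / temperature for c in codes]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]  # numerically stable softmax
    s = sum(exps)
    return [e / s for e in exps]

def soft_quantize(z, codes, temperature):
    # Differentiable quantization: responsibility-weighted mix of codes.
    r = soft_assign(z, codes, temperature)
    return sum(ri * ci for ri, ci in zip(r, codes))
```

Training uses a moderate temperature so gradients flow to all codes, while inference takes the low-temperature limit, i.e. plain nearest-neighbour lookup.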
【20】AI-CARE: Carbon-Aware Reporting Evaluation Metric for AI Models
标题:AI-CARE:人工智能模型的碳感知报告评估指标
链接:https://arxiv.org/abs/2602.16042
作者:KC Santosh,Srikanth Baride,Rodrigue Rizk
备注:7 pages, 3 figures
摘要:As machine learning (ML) continues its rapid expansion, the environmental cost of model training and inference has become a critical societal concern. Existing benchmarks overwhelmingly focus on standard performance metrics such as accuracy, BLEU, or mAP, while largely ignoring energy consumption and carbon emissions. This single-objective evaluation paradigm is increasingly misaligned with the practical requirements of large-scale deployment, particularly in energy-constrained environments such as mobile devices, developing regions, and climate-aware enterprises. In this paper, we propose AI-CARE, an evaluation tool for reporting energy consumption, and carbon emissions of ML models. In addition, we introduce the carbon-performance tradeoff curve, an interpretable tool that visualizes the Pareto frontier between performance and carbon cost. We demonstrate, through theoretical analysis and empirical validation on representative ML workloads, that carbon-aware benchmarking changes the relative ranking of models and encourages architectures that are simultaneously accurate and environmentally responsible. Our proposal aims to shift the research community toward transparent, multi-objective evaluation and align ML progress with global sustainability goals. The tool and documentation are available at https://github.com/USD-AI-ResearchLab/ai-care.
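The carbon-performance tradeoff curve reduces to a Pareto-dominance computation over (performance, carbon) pairs; a minimal sketch with illustrative field names, not the AI-CARE tool's API:

```python
def pareto_frontier(models):
    """Return names of models not dominated on the two objectives:
    higher performance is better, lower carbon is better. Each entry is
    a (name, performance, carbon_kg) tuple (illustrative schema)."""
    frontier = []
    for name, perf, carbon in models:
        dominated = any(
            p2 >= perf and c2 <= carbon and (p2 > perf or c2 < carbon)
            for _, p2, c2 in models
        )
        if not dominated:
            frontier.append(name)
    return frontier
```

Models off this frontier are strictly worse on one objective without compensating on the other, which is how carbon-aware benchmarking can change model rankings relative to accuracy-only evaluation.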
【21】B-DENSE: Branching For Dense Ensemble Network Learning
标题:B-DENSE:面向密集集成网络学习的分支方法
链接:https://arxiv.org/abs/2602.15971
作者:Cherish Puniani,Tushar Kumar,Arnav Bendre,Gaurav Kumar,Shree Singhi
备注:11 pages, 5 figures, 4 algorithms and 2 tables. Submitted to the ICLR 2026 delta workshop; still under review
摘要:Inspired by non-equilibrium thermodynamics, diffusion models have achieved state-of-the-art performance in generative modeling. However, their iterative sampling nature results in high inference latency. While recent distillation techniques accelerate sampling, they discard intermediate trajectory steps. This sparse supervision leads to a loss of structural information and introduces significant discretization errors. To mitigate this, we propose B-DENSE, a novel framework that leverages multi-branch trajectory alignment. We modify the student architecture to output $K$-fold expanded channels, where each subset corresponds to a specific branch representing a discrete intermediate step in the teacher's trajectory. By training these branches to simultaneously map to the entire sequence of the teacher's target timesteps, we enforce dense intermediate trajectory alignment. Consequently, the student model learns to navigate the solution space from the earliest stages of training, demonstrating superior image generation quality compared to baseline distillation frameworks.
【22】Learning to Drive in New Cities Without Human Demonstrations
标题:在没有人类演示的情况下在新城市学习驾驶
链接:https://arxiv.org/abs/2602.15891
作者:Zilin Wang,Saeed Rahmani,Daphne Cornelisse,Bidipta Sarkar,Alexander David Goldie,Jakob Nicolaus Foerster,Shimon Whiteson
备注:Autonomous Driving, Reinforcement Learning, Self-play, Simulation, Transfer Learning, Data-efficient Adaptation. Project Page: https://nomaddrive.github.io/
摘要:While autonomous vehicles have achieved reliable performance within specific operating regions, their deployment to new cities remains costly and slow. A key bottleneck is the need to collect many human demonstration trajectories when adapting driving policies to new cities that differ from those seen in training in terms of road geometry, traffic rules, and interaction patterns. In this paper, we show that self-play multi-agent reinforcement learning can adapt a driving policy to a substantially different target city using only the map and meta-information, without requiring any human demonstrations from that city. We introduce NO data Map-based self-play for Autonomous Driving (NOMAD), which enables policy adaptation in a simulator constructed based on the target-city map. Using a simple reward function, NOMAD substantially improves both task success rate and trajectory realism in target cities, demonstrating an effective and scalable alternative to data-intensive city-transfer methods. Project Page: https://nomaddrive.github.io/
【23】Distributed physics-informed neural networks via domain decomposition for fast flow reconstruction
标题:通过区域分解实现快速流重建的分布式物理信息神经网络
链接:https://arxiv.org/abs/2602.15883
作者:Yixiao Qian,Jiaxu Liu,Zewei Xia,Song Chen,Chao Xu,Shengze Cai
摘要:Physics-Informed Neural Networks (PINNs) offer a powerful paradigm for flow reconstruction, seamlessly integrating sparse velocity measurements with the governing Navier-Stokes equations to recover complete velocity and latent pressure fields. However, scaling such models to large spatiotemporal domains is hindered by computational bottlenecks and optimization instabilities. In this work, we propose a robust distributed PINNs framework designed for efficient flow reconstruction via spatiotemporal domain decomposition. A critical challenge in such distributed solvers is pressure indeterminacy, where independent sub-networks drift into inconsistent local pressure baselines. We address this issue through a reference anchor normalization strategy coupled with decoupled asymmetric weighting. By enforcing a unidirectional information flow from designated master ranks where the anchor point lies to neighboring ranks, our approach eliminates gauge freedom and guarantees global pressure uniqueness while preserving temporal continuity. Furthermore, to mitigate the Python interpreter overhead associated with computing high-order physics residuals, we implement a high-performance training pipeline accelerated by CUDA graphs and JIT compilation. Extensive validation on complex flow benchmarks demonstrates that our method achieves near-linear strong scaling and high-fidelity reconstruction, establishing a scalable and physically rigorous pathway for flow reconstruction and understanding of complex hydrodynamics.
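The reference-anchor idea can be sketched in scalar form: each rank gauge-fixes its pressure field by pinning its value at a shared anchor point, with the anchor value broadcast from the master rank. This is an illustrative sketch, not the paper's implementation:

```python
def anchor_normalize(p_local, anchor_value):
    """Gauge-fix one subdomain's pressure field: shift it so its value at
    the shared anchor point equals anchor_value (broadcast from the master
    rank). p_local maps sample-point labels to pressures; the label
    "anchor" marks this rank's prediction at the anchor point."""
    offset = p_local["anchor"]
    return {x: v - offset + anchor_value for x, v in p_local.items()}
```

Because pressure only appears through its gradient in the Navier-Stokes equations, each sub-network's baseline is arbitrary; pinning every rank to the same anchor value eliminates that gauge freedom and makes the global pressure field unique.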
【24】Genetic Generalized Additive Models
标题:遗传广义可加模型
链接:https://arxiv.org/abs/2602.15877
作者:Kaaustaaub Shankar,Kelly Cohen
备注:Accepted to NAFIPS 2026
摘要:Generalized Additive Models (GAMs) balance predictive accuracy and interpretability, but manually configuring their structure is challenging. We propose using the multi-objective genetic algorithm NSGA-II to automatically optimize GAMs, jointly minimizing prediction error (RMSE) and a Complexity Penalty that captures sparsity, smoothness, and uncertainty. Experiments on the California Housing dataset show that NSGA-II discovers GAMs that outperform baseline LinearGAMs in accuracy or match performance with substantially lower complexity. The resulting models are simpler, smoother, and exhibit narrower confidence intervals, enhancing interpretability. This framework provides a general approach for automated optimization of transparent, high-performing models. The code can be found at https://github.com/KaaustaaubShankar/GeneticAdditiveModels.
【25】Memes-as-Replies: Can Models Select Humorous Manga Panel Responses?
标题:表情包即回复:模型可以选择幽默漫画面板回应吗?
链接:https://arxiv.org/abs/2602.15842
作者:Ryosuke Kohita,Seiichiro Yoshioka
摘要:Memes are a popular element of modern web communication, used not only as static artifacts but also as interactive replies within conversations. While computational research has focused on analyzing the intrinsic properties of memes, the dynamic and contextual use of memes to create humor remains an understudied area of web science. To address this gap, we introduce the Meme Reply Selection task and present MaMe-Re (Manga Meme Reply Benchmark), a benchmark of 100,000 human-annotated pairs (500,000 total annotations from 2,325 unique annotators) consisting of openly licensed Japanese manga panels and social media posts. Our analysis reveals three key insights: (1) large language models (LLMs) show preliminary evidence of capturing complex social cues such as exaggeration, moving beyond surface-level semantic matching; (2) the inclusion of visual information does not improve performance, revealing a gap between understanding visual content and effectively using it for contextual humor; (3) while LLMs can match human judgments in controlled settings, they struggle to distinguish subtle differences in wit among semantically similar candidates. These findings suggest that selecting contextually humorous replies remains an open challenge for current models.
【26】Investigating Nonlinear Quenching Effects on Polar Field Buildup in the Sun Using Physics-Informed Neural Networks
标题:利用物理信息神经网络研究非线性淬灭对太阳极场建立的影响
链接:https://arxiv.org/abs/2602.16656
作者:Jithu J. Athalathil,Mohammed H. Talafha,Bhargav Vaidya
备注:Accepted for publication in The Astrophysical Journal
摘要:The solar dynamo relies on the regeneration of the poloidal magnetic field through processes strongly modulated by nonlinear feedbacks such as tilt quenching (TQ) and latitude quenching (LQ). These mechanisms play a decisive role in regulating the buildup of the Sun's polar field and, in turn, the amplitude of future solar cycles. In this work, we employ Physics-Informed Neural Networks (PINN) to solve the surface flux transport (SFT) equation, embedding physical constraints directly into the neural network framework. By systematically varying transport parameters, we isolate the relative contributions of TQ and LQ to polar dipole buildup. We use the residual dipole moment as a diagnostic for cycle-to-cycle amplification and show that TQ suppression strengthens with increasing diffusivity, while LQ dominates in advection-dominated regimes. The ratio $ΔD_{\mathrm{LQ}}/ΔD_{\mathrm{TQ}}$ exhibits a smooth inverse-square dependence on the dynamo effectivity range, refining previous empirical fits with improved accuracy and reduced scatter. The results further reveal that the need for a decay term is not essential for PINN set-up due to the training process. Compared with the traditional 1D SFT model, the PINN framework achieves significantly lower error metrics and more robust recovery of nonlinear trends. Our results suggest that the nonlinear interplay between LQ and TQ can naturally produce alternations between weak and strong cycles, providing a physical explanation for the observed even-odd cycle modulation. These findings demonstrate the potential of PINN as an accurate, efficient, and physically consistent tool for solar cycle prediction.
【27】Enhanced Diffusion Sampling: Efficient Rare Event Sampling and Free Energy Calculation with Diffusion Models
标题:增强扩散采样:基于扩散模型的高效稀有事件采样与自由能计算
链接:https://arxiv.org/abs/2602.16634
作者:Yu Xie,Ludwig Winkler,Lixin Sun,Sarah Lewis,Adam E. Foster,José Jiménez Luna,Tim Hempel,Michael Gastegger,Yaoyi Chen,Iryna Zaporozhets,Cecilia Clementi,Christopher M. Bishop,Frank Noé
摘要:The rare-event sampling problem has long been the central limiting factor in molecular dynamics (MD), especially in biomolecular simulation. Recently, diffusion models such as BioEmu have emerged as powerful equilibrium samplers that generate independent samples from complex molecular distributions, eliminating the cost of sampling rare transition events. However, a sampling problem remains when computing observables that rely on states which are rare in equilibrium, for example folding free energies. Here, we introduce enhanced diffusion sampling, enabling efficient exploration of rare-event regions while preserving unbiased thermodynamic estimators. The key idea is to perform quantitatively accurate steering protocols to generate biased ensembles and subsequently recover equilibrium statistics via exact reweighting. We instantiate our framework in three algorithms: UmbrellaDiff (umbrella sampling with diffusion models), $Δ$G-Diff (free-energy differences via tilted ensembles), and MetaDiff (a batchwise analogue for metadynamics). Across toy systems, protein folding landscapes and folding free energies, our methods achieve fast, accurate, and scalable estimation of equilibrium properties within GPU-minutes to hours per system -- closing the rare-event sampling gap that remained after the advent of diffusion-model equilibrium samplers.
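The "exact reweighting" step that recovers equilibrium statistics from a biased ensemble can be illustrated with a minimal self-normalized importance-sampling sketch (ours, not the paper's algorithm; the target, the tilted proposal, and all constants here are invented for illustration):

```python
import math
import random

random.seed(0)

# Target "equilibrium" density p = N(0, 1); biased sampler q = N(1.5, 1)
# steers sampling toward the otherwise rare region x > 1.5.
def log_p(x):
    return -0.5 * x * x

def log_q(x):
    return -0.5 * (x - 1.5) ** 2

n = 200_000
xs = [random.gauss(1.5, 1.0) for _ in range(n)]
log_w = [log_p(x) - log_q(x) for x in xs]
m = max(log_w)
w = [math.exp(lw - m) for lw in log_w]   # numerically stabilized weights
total = sum(w)
w = [wi / total for wi in w]             # self-normalized

# Reweighted estimate of P(X > 1.5) under the target (true value ~0.0668):
# the biased ensemble visits the region often, the weights undo the bias.
p_tail = sum(wi for xi, wi in zip(xs, w) if xi > 1.5)
```

In the paper's setting the biased ensemble comes from a steered diffusion sampler rather than a shifted Gaussian, but the reweighting identity that restores unbiased thermodynamic estimators is of this form.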
【28】Error Propagation and Model Collapse in Diffusion Models: A Theoretical Study
标题:扩散模型中的误差传播和模型崩溃:理论研究
链接:https://arxiv.org/abs/2602.16601
作者:Nail B. Khelifa,Richard E. Turner,Ramji Venkataramanan
摘要:Machine learning models are increasingly trained or fine-tuned on synthetic data. Recursively training on such data has been observed to significantly degrade performance in a wide range of tasks, often characterized by a progressive drift away from the target distribution. In this work, we theoretically analyze this phenomenon in the setting of score-based diffusion models. For a realistic pipeline where each training round uses a combination of synthetic data and fresh samples from the target distribution, we obtain upper and lower bounds on the accumulated divergence between the generated and target distributions. This allows us to characterize different regimes of drift, depending on the score estimation error and the proportion of fresh data used in each generation. We also provide empirical results on synthetic data and images to illustrate the theory.
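The drift regimes the paper analyzes can be mimicked by a much simpler toy (ours, not the paper's diffusion setting): recursively fit a Gaussian to its own samples, optionally mixing in fresh samples from the true distribution each round. All sizes and fractions below are arbitrary.

```python
import random
import statistics

random.seed(1)

# One training round: mix synthetic samples from the current model with
# fresh samples from the true distribution N(0, 1), then refit by moments.
def one_round(mu, sigma, n, frac_fresh):
    n_fresh = int(frac_fresh * n)
    fresh = [random.gauss(0.0, 1.0) for _ in range(n_fresh)]
    synthetic = [random.gauss(mu, sigma) for _ in range(n - n_fresh)]
    sample = fresh + synthetic
    return statistics.fmean(sample), statistics.pstdev(sample)

def train_generations(frac_fresh, rounds=1000, n=50):
    mu, sigma = 0.0, 1.0
    for _ in range(rounds):
        mu, sigma = one_round(mu, sigma, n, frac_fresh)
    return sigma

sigma_recursive = train_generations(0.0)   # synthetic data only: collapses
sigma_mixed = train_generations(0.5)       # half fresh data: drift is damped
```

With no fresh data, finite-sample shrinkage compounds across generations and the fitted variance collapses toward zero; a constant fraction of fresh data keeps the process anchored near the target, which is the qualitative regime distinction the paper makes rigorous for score-based diffusion models.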
【29】Learning Distributed Equilibria in Linear-Quadratic Stochastic Differential Games: An $α$-Potential Approach
标题:线性二次随机微分博弈中分布式均衡的学习:一种$α$-势方法
链接:https://arxiv.org/abs/2602.16555
作者:Philipp Plank,Yufei Zhang
摘要:We analyze independent policy-gradient (PG) learning in $N$-player linear-quadratic (LQ) stochastic differential games. Each player employs a distributed policy that depends only on its own state and updates the policy independently using the gradient of its own objective. We establish global linear convergence of these methods to an equilibrium by showing that the LQ game admits an $α$-potential structure, with $α$ determined by the degree of pairwise interaction asymmetry. For pairwise-symmetric interactions, we construct an affine distributed equilibrium by minimizing the potential function and show that independent PG methods converge globally to this equilibrium, with complexity scaling linearly in the population size and logarithmically in the desired accuracy. For asymmetric interactions, we prove that independent projected PG algorithms converge linearly to an approximate equilibrium, with suboptimality proportional to the degree of asymmetry. Numerical experiments confirm the theoretical results across both symmetric and asymmetric interaction networks.
【30】Functional Decomposition and Shapley Interactions for Interpreting Survival Models
标题:用于解释生存模型的函数分解与Shapley交互作用
链接:https://arxiv.org/abs/2602.16505
作者:Sophie Hanna Langbein,Hubert Baniecki,Fabian Fumagalli,Niklas Koenen,Marvin N. Wright,Julia Herbinger
摘要:Hazard and survival functions are natural, interpretable targets in time-to-event prediction, but their inherent non-additivity fundamentally limits standard additive explanation methods. We introduce Survival Functional Decomposition (SurvFD), a principled approach for analyzing feature interactions in machine learning survival models. By decomposing higher-order effects into time-dependent and time-independent components, SurvFD offers a previously unrecognized perspective on survival explanations, explicitly characterizing when and why additive explanations fail. Building on this theoretical decomposition, we propose SurvSHAP-IQ, which extends Shapley interactions to time-indexed functions, providing a practical estimator for higher-order, time-dependent interactions. Together, SurvFD and SurvSHAP-IQ establish an interaction- and time-aware interpretability approach for survival modeling, with broad applicability across time-to-event prediction tasks.
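For context, the additive baseline that SurvSHAP-IQ extends is the classical Shapley value, computable exactly for small player sets. A generic sketch (not the paper's time-indexed estimator), with a toy value function containing an interaction:

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values for a set function `value` over `players`."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                # Classic weight |S|! (n - |S| - 1)! / n!
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += weight * (value(frozenset(S) | {i}) - value(frozenset(S)))
        phi[i] = total
    return phi

# Toy non-additive value function: features 0 and 1 interact (+3 together).
def v(S):
    base = 1.0 * (0 in S) + 2.0 * (2 in S) * 0.0 + 2.0 * (1 in S)
    return base + (3.0 if {0, 1} <= S else 0.0)

phi = shapley_values([0, 1, 2], v)
# The pairwise interaction is split equally between features 0 and 1,
# which is exactly the attribution ambiguity that interaction indices resolve.
```

The efficiency property (values sum to `v(full) - v(empty)`) holds, but the +3 interaction is smeared across the two participating features; Shapley interaction indices of the kind the paper extends to time-indexed survival functions attribute it to the pair explicitly.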
【31】Learning Preference from Observed Rankings
标题:从观察到的排名中学习偏好
链接:https://arxiv.org/abs/2602.16476
作者:Yu-Chang Chen,Chen Chian Fuh,Shang En Tsai
摘要:Estimating consumer preferences is central to many problems in economics and marketing. This paper develops a flexible framework for learning individual preferences from partial ranking information by interpreting observed rankings as collections of pairwise comparisons with logistic choice probabilities. We model latent utility as the sum of interpretable product attributes, item fixed effects, and a low-rank user-item factor structure, enabling both interpretability and information sharing across consumers and items. We further correct for selection in which comparisons are observed: a comparison is recorded only if both items enter the consumer's consideration set, inducing exposure bias toward frequently encountered items. We model pair observability as the product of item-level observability propensities and estimate these propensities with a logistic model for the marginal probability that an item is observable. Preference parameters are then estimated by maximizing an inverse-probability-weighted (IPW), ridge-regularized log-likelihood that reweights observed comparisons toward a target comparison population. To scale computation, we propose a stochastic gradient descent (SGD) algorithm based on inverse-probability resampling, which draws comparisons in proportion to their IPW weights. In an application to transaction data from an online wine retailer, the method improves out-of-sample recommendation performance relative to a popularity-based benchmark, with particularly strong gains in predicting purchases of previously unconsumed products.
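A stripped-down sketch of the core estimator (ours; items, propensities, and sizes are made up, and the low-rank factor structure is omitted): pairwise comparisons with logistic choice probabilities, an exposure filter, and an IPW-weighted ridge-regularized logistic fit by gradient descent.

```python
import math
import random

random.seed(0)

true_u = [0.0, 1.0, 2.0]     # latent utilities; item 2 is truly best
prop = [0.2, 1.0, 1.0]       # item 0 rarely enters consideration sets

comparisons = []             # observed (winner, loser) pairs
for _ in range(10_000):
    i, j = random.sample(range(3), 2)
    if random.random() > prop[i] * prop[j]:
        continue             # pair unobserved: exposure bias against item 0
    p_i_wins = 1.0 / (1.0 + math.exp(-(true_u[i] - true_u[j])))
    comparisons.append((i, j) if random.random() < p_i_wins else (j, i))

# Maximize the IPW-weighted, ridge-regularized log-likelihood
# (here by gradient descent on its negative).
u = [0.0, 0.0, 0.0]
lam, lr, n = 1e-3, 0.5, len(comparisons)
for _ in range(200):
    grad = [lam * ui for ui in u]
    for win, lose in comparisons:
        w = 1.0 / (prop[win] * prop[lose])           # inverse-probability weight
        p = 1.0 / (1.0 + math.exp(-(u[win] - u[lose])))
        grad[win] -= w * (1.0 - p) / n
        grad[lose] += w * (1.0 - p) / n
    u = [ui - lr * gi for ui, gi in zip(u, grad)]
```

The IPW weights upweight the rarely observed comparisons involving item 0, so the recovered utilities reflect preferences rather than exposure frequency; the paper's SGD variant achieves the same reweighting by resampling comparisons in proportion to their weights.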
【32】Machine Learning in Epidemiology
标题:流行病学中的机器学习
链接:https://arxiv.org/abs/2602.16352
作者:Marvin N. Wright,Lukas Burk,Pegah Golchian,Jan Kapar,Niklas Koenen,Sophie Hanna Langbein
摘要:In the age of digital epidemiology, epidemiologists are faced by an increasing amount of data of growing complexity and dimensionality. Machine learning is a set of powerful tools that can help to analyze such enormous amounts of data. This chapter lays the methodological foundations for successfully applying machine learning in epidemiology. It covers the principles of supervised and unsupervised learning and discusses the most important machine learning methods. Strategies for model evaluation and hyperparameter optimization are developed and interpretable machine learning is introduced. All these theoretical parts are accompanied by code examples in R, where an example dataset on heart disease is used throughout the chapter.
【33】Conjugate Learning Theory: Uncovering the Mechanisms of Trainability and Generalization in Deep Neural Networks
标题:共轭学习理论:揭示深度神经网络可训练性与泛化性的机制
链接:https://arxiv.org/abs/2602.16177
作者:Binchuan Qi
摘要:In this work, we propose a notion of practical learnability grounded in finite sample settings, and develop a conjugate learning theoretical framework based on convex conjugate duality to characterize this learnability property. Building on this foundation, we demonstrate that training deep neural networks (DNNs) with mini-batch stochastic gradient descent (SGD) achieves global optima of empirical risk by jointly controlling the extreme eigenvalues of a structure matrix and the gradient energy, and we establish a corresponding convergence theorem. We further elucidate the impact of batch size and model architecture (including depth, parameter count, sparsity, skip connections, and other characteristics) on non-convex optimization. Additionally, we derive a model-agnostic lower bound for the achievable empirical risk, theoretically demonstrating that data determines the fundamental limit of trainability. On the generalization front, we derive deterministic and probabilistic bounds on generalization error based on generalized conditional entropy measures. The former explicitly delineates the range of generalization error, while the latter characterizes the distribution of generalization error relative to the deterministic bounds under independent and identically distributed (i.i.d.) sampling conditions. Furthermore, these bounds explicitly quantify the influence of three key factors: (i) information loss induced by irreversibility in the model, (ii) the maximum attainable loss value, and (iii) the generalized conditional entropy of features with respect to labels. Moreover, they offer a unified theoretical lens for understanding the roles of regularization, irreversible transformations, and network depth in shaping the generalization behavior of deep neural networks. Extensive experiments validate all theoretical predictions, confirming the framework's correctness and consistency.
【34】Examining Fast Radiative Feedbacks Using Machine-Learning Weather Emulators
标题:使用机器学习天气模拟器研究快速辐射反馈
链接:https://arxiv.org/abs/2602.16090
作者:Ankur Mahesh,William D. Collins,Travis A. O'Brien,Paul B. Goddard,Sinclaire Zebaze,Shashank Subramanian,James P. C. Duncan,Oliver Watt-Meyer,Boris Bonev,Thorsten Kurth,Karthik Kashinath,Michael S. Pritchard,Da Yang
摘要:The response of the climate system to increased greenhouse gases and other radiative perturbations is governed by a combination of fast and slow feedbacks. Slow feedbacks are typically activated in response to changes in ocean temperatures on decadal timescales and manifest as changes in climatic state with no recent historical analogue. However, fast feedbacks are activated in response to rapid atmospheric physical processes on weekly timescales, and they are already operative in the present-day climate. This distinction implies that the physics of fast radiative feedbacks is present in the historical meteorological reanalyses used to train many recent successful machine-learning-based (ML) emulators of weather and climate. In addition, these feedbacks are functional under the historical boundary conditions pertaining to the top-of-atmosphere radiative balance and sea-surface temperatures. Together, these factors imply that we can use historically trained ML weather emulators to study the response of radiative-convective equilibrium (RCE), and hence the global hydrological cycle, to perturbations in carbon dioxide and other well-mixed greenhouse gases. Without retraining on prospective Earth system conditions, we use ML weather emulators to quantify the fast precipitation response to reduced and elevated carbon dioxide concentrations with no recent historical precedent. We show that the responses from historically trained emulators agree with those produced by full-physics Earth System Models (ESMs). In conclusion, we discuss the prospects for and advantages of using ESMs and ML emulators to study fast processes in global climate.
【35】Imaging-Derived Coronary Fractional Flow Reserve: Advances in Physics-Based, Machine-Learning, and Physics-Informed Methods
标题:基于影像的冠状动脉血流储备分数:物理模型、机器学习与物理信息方法的进展
链接:https://arxiv.org/abs/2602.16000
作者:Tanxin Zhu,Emran Hossen,Chen Zhao,Michele Esposito,Jiguang Sun,Weihua Zhou
备注:26 pages, 4 tables
摘要:Purpose of Review: Imaging-derived fractional flow reserve (FFR) is rapidly evolving beyond conventional computational fluid dynamics (CFD) based pipelines toward machine learning (ML), deep learning (DL), and physics-informed approaches that enable fast, wire-free, and scalable functional assessment of coronary stenosis. This review synthesizes recent advances in CT- and angiography-based FFR, with particular emphasis on emerging physics-informed neural networks and neural operators (PINNs and PINOs) and key considerations for their clinical translation. Recent Findings: ML/DL approaches have markedly improved automation and computational speed, enabling prediction of pressure and FFR from anatomical descriptors or angiographic contrast dynamics. However, their real-world performance and generalizability can remain variable and sensitive to domain shift, due to multi-center heterogeneity, interpretability challenges, and differences in acquisition protocols and image quality. Physics-informed learning introduces conservation structure and boundary-condition consistency into model training, improving generalizability and reducing dependence on dense supervision while maintaining rapid inference. Recent evaluation trends increasingly highlight deployment-oriented metrics, including calibration, uncertainty quantification, and quality-control gatekeeping, as essential for safe clinical use. Summary: The field is converging toward imaging-derived FFR methods that are faster, more automated, and more reliable. While ML/DL offers substantial efficiency gains, physics-informed frameworks such as PINNs and PINOs may provide a more robust balance between speed and physical consistency. Prospective multi-center validation and standardized evaluation will be critical to support broad and safe clinical adoption.
【36】Including Node Textual Metadata in Laplacian-constrained Gaussian Graphical Models
标题:在拉普拉斯约束高斯图形模型中包含节点文本元数据
链接:https://arxiv.org/abs/2602.15920
作者:Jianhua Wang,Killian Cressant,Pedro Braconnot Velloso,Arnaud Breloy
备注:Submitted to EUSIPCO 2026
摘要:This paper addresses graph learning in Gaussian Graphical Models (GGMs). In this context, data matrices often come with auxiliary metadata (e.g., textual descriptions associated with each node) that is usually ignored in traditional graph estimation processes. To fill this gap, we propose a graph learning approach based on Laplacian-constrained GGMs that jointly leverages the node signals and such metadata. The resulting formulation yields an optimization problem, for which we develop an efficient majorization-minimization (MM) algorithm with closed-form updates at each iteration. Experimental results on a real-world financial dataset demonstrate that the proposed method significantly improves graph clustering performance compared to state-of-the-art approaches that use either signals or metadata alone, thus illustrating the interest of fusing both sources of information.
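The majorization-minimization idea behind the paper's solver (closed-form update per iteration) can be shown on a much simpler problem than graph learning. A hypothetical scalar toy, not the paper's objective: minimize a lasso-type function by repeatedly minimizing a quadratic upper bound.

```python
# MM toy: minimize f(x) = 0.5*(x - b)**2 + lam*|x|.
# At x_k != 0, |x| <= x**2 / (2*|x_k|) + |x_k|/2 majorizes |x| (touching at
# x_k), so each iteration minimizes a quadratic surrogate in closed form.
b, lam = 2.0, 0.5

def mm_update(x_k):
    # argmin_x 0.5*(x - b)**2 + lam * x**2 / (2*abs(x_k))  (up to a constant)
    return b / (1.0 + lam / abs(x_k))

x = b  # start at the unpenalized minimizer
for _ in range(100):
    x = mm_update(x)

# Known closed-form solution: soft-thresholding of b at lam.
soft_threshold = max(abs(b) - lam, 0.0) * (1 if b > 0 else -1)
```

Each surrogate touches the objective at the current iterate and upper-bounds it elsewhere, so the closed-form updates decrease f monotonically and converge to the soft-threshold solution; the paper applies the same template to the Laplacian-constrained likelihood with metadata terms.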
【37】Steering Dynamical Regimes of Diffusion Models by Breaking Detailed Balance
标题:通过打破细致平衡调控扩散模型的动力学状态
链接:https://arxiv.org/abs/2602.15914
作者:Haiqi Lu,Ying Tang
摘要:We show that deliberately breaking detailed balance in generative diffusion processes can accelerate the reverse process without changing the stationary distribution. Considering the Ornstein--Uhlenbeck process, we decompose the dynamics into a symmetric component and a non-reversible anti-symmetric component that generates rotational probability currents. We then construct an exponentially optimal non-reversible perturbation that improves the long-time relaxation rate while preserving the stationary target. We analyze how such non-reversible control reshapes the macroscopic dynamical regimes of the phase transitions recently identified in generative diffusion models. We derive a general criterion for the speciation time and show that suitable non-reversible perturbations can accelerate speciation. In contrast, the collapse transition is governed by a trace-controlled phase-space contraction mechanism that is fixed by the symmetric component, and the corresponding collapse time remains unchanged under anti-symmetric perturbations. Numerical experiments on Gaussian mixture models support these findings.
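The key structural fact the construction relies on is standard for non-reversible Ornstein-Uhlenbeck samplers: for drift matrix $A = I + J$ with $J$ antisymmetric and noise $\sqrt{2}\,dW$, the stationary covariance solves the Lyapunov equation $A\Sigma + \Sigma A^\top = 2I$, and $\Sigma = I$ remains a solution for any antisymmetric $J$. A small numeric check of that identity (our illustration; the rotation strength 2.5 is arbitrary):

```python
# Verify A Σ + Σ Aᵀ = 2I with Σ = I for A = I + J, J antisymmetric:
# the antisymmetric part cancels (J + Jᵀ = 0), so it only adds rotational
# probability currents without changing the stationary Gaussian target.

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def transpose(A):
    return [list(r) for r in zip(*A)]

I2 = [[1.0, 0.0], [0.0, 1.0]]
J = [[0.0, 2.5], [-2.5, 0.0]]   # antisymmetric: Jᵀ = -J
A = mat_add(I2, J)              # non-reversible drift matrix A = I + J

# With Σ = I the Lyapunov left-hand side reduces to A + Aᵀ.
lhs = mat_add(A, transpose(A))
```

What the perturbation does change is the relaxation rate toward that unchanged target, which is the lever the paper uses to accelerate speciation while leaving the collapse time fixed.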
【38】Surrogate Modeling for Neutron Transport: A Neural Operator Approach
标题:中子输运的代理模型:神经算子方法
链接:https://arxiv.org/abs/2602.15890
作者:Md Hossain Sahadath,Qiyun Cheng,Shaowu Pan,Wei Ji
摘要:This work introduces a neural operator based surrogate modeling framework for neutron transport computation. Two architectures, the Deep Operator Network (DeepONet) and the Fourier Neural Operator (FNO), were trained for fixed source problems to learn the mapping from anisotropic neutron sources, Q(x,μ), to the corresponding angular fluxes, ψ(x,μ), in a one-dimensional slab geometry. Three distinct models were trained for each neural operator, corresponding to different scattering ratios (c = 0.1, 0.5, & 1.0), providing insight into their performance across distinct transport regimes (absorption-dominated, moderate, and scattering-dominated). The models were subsequently evaluated on a wide range of previously unseen source configurations, demonstrating that FNO generally achieves higher predictive accuracy, while DeepONet offers greater computational efficiency. Both models offered significant speedups that become increasingly pronounced as the scattering ratio increases, requiring <0.3% of the runtime of a conventional S_N solver. The surrogate models were further incorporated into the S_N k-eigenvalue solver, replacing the computationally intensive transport sweep loop with a single forward pass. Across varying fission cross sections and spatial-angular grids, both neural operator solvers reproduced reference eigenvalues with deviations up to 135 pcm for DeepONet and 112 pcm for FNO, while reducing runtime to <0.1% of that of the S_N solver on relatively fine grids. These results demonstrate the strong potential of neural operator frameworks as accurate, efficient, and generalizable surrogates for neutron transport, paving the way for real-time digital twin applications and repeated evaluations, such as in design optimization.
其他(36篇)
【1】Causality is Key for Interpretability Claims to Generalise
标题:因果关系是可解释性结论得以泛化的关键
链接:https://arxiv.org/abs/2602.16698
作者:Shruti Joshi,Aaron Mueller,David Klindt,Wieland Brendel,Patrik Reizinger,Dhanya Sridhar
摘要:Interpretability research on large language models (LLMs) has yielded important insights into model behaviour, yet recurring pitfalls persist: findings that do not generalise, and causal interpretations that outrun the evidence. Our position is that causal inference specifies what constitutes a valid mapping from model activations to invariant high-level structures, the data or assumptions needed to achieve it, and the inferences it can support. Specifically, Pearl's causal hierarchy clarifies what an interpretability study can justify. Observations establish associations between model behaviour and internal components. Interventions (e.g., ablations or activation patching) support claims about how these edits affect a behavioural metric (e.g., average change in token probabilities) over a set of prompts. However, counterfactual claims -- i.e., asking what the model output would have been for the same prompt under an unobserved intervention -- remain largely unverifiable without controlled supervision. We show how causal representation learning (CRL) operationalises this hierarchy, specifying which variables are recoverable from activations and under what assumptions. Together, these motivate a diagnostic framework that helps practitioners select methods and evaluations matching claims to evidence such that findings generalise.
【2】Protecting the Undeleted in Machine Unlearning
标题:在机器遗忘中保护未被删除的数据
链接:https://arxiv.org/abs/2602.16697
作者:Aloni Cohen,Refael Kohen,Kobbi Nissim,Uri Stemmer
摘要:Machine unlearning aims to remove specific data points from a trained model, often striving to emulate "perfect retraining", i.e., producing the model that would have been obtained had the deleted data never been included. We demonstrate that this approach, and security definitions that enable it, carry significant privacy risks for the remaining (undeleted) data points. We present a reconstruction attack showing that for certain tasks, which can be computed securely without deletions, a mechanism adhering to perfect retraining allows an adversary controlling merely $ω(1)$ data points to reconstruct almost the entire dataset merely by issuing deletion requests. We survey existing definitions for machine unlearning, showing they are either susceptible to such attacks or too restrictive to support basic functionalities like exact summation. To address this problem, we propose a new security definition that specifically safeguards undeleted data against leakage caused by the deletion of other points. We show that our definition permits several essential functionalities, such as bulletin boards, summations, and statistical learning.
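A drastically simplified toy (ours, not the paper's construction) shows why perfect retraining can leak undeleted data: if a mechanism re-answers an exact sum query as though deleted points never existed, an adversary who deletes its own records leaves the honest participant's value fully exposed. All names and values below are invented.

```python
# Toy "perfect retraining" mechanism for exact sums: every answer is
# recomputed on the remaining data, as if deleted points had never existed.
class SumMechanism:
    def __init__(self, data):
        self._data = dict(data)

    def answer(self):
        return sum(self._data.values())   # perfectly retrained answer

    def delete(self, name):
        self._data.pop(name)

# One honest participant plus three records controlled by the adversary.
victim_value = 19
mech = SumMechanism({"victim": victim_value, "adv1": 5, "adv2": 8, "adv3": 2})

# The adversary deletes only its own records; the final perfectly
# retrained sum equals the undeleted victim's value exactly.
for name in ["adv1", "adv2", "adv3"]:
    mech.delete(name)
leaked = mech.answer()
```

The paper's attack is far more general (tasks that are securely computable without deletions, adversaries controlling only $ω(1)$ points, near-total reconstruction), but the mechanism of leakage is of this kind: exact post-deletion answers difference away everything except the remaining data.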
【3】On the Hardness of Approximation of the Fair k-Center Problem
标题:公平k中心问题逼近的难度
链接:https://arxiv.org/abs/2602.16688
作者:Suhas Thejaswi
摘要
:In this work, we study the hardness of approximation of the fair $k$-center problem. Here the data points are partitioned into groups and the task is to choose a prescribed number of data points from each group, called centers, while minimizing the maximum distance from any point to its closest center. Although a polynomial-time $3$-approximation is known for this problem in general metrics, it has remained open whether this approximation guarantee is tight or could be further improved, especially since the unconstrained $k$-center problem admits a polynomial-time factor-$2$ approximation. We resolve this open question by proving that, for every $ε>0$, achieving a $(3-ε)$-approximation is NP-hard, assuming $\text{P} \neq \text{NP}$. Our inapproximability results hold even when only two disjoint groups are present and at least one center must be chosen from each group. Further, it extends to the canonical one-per-group setting with $k$-groups (for arbitrary $k$), where exactly one center must be selected from each group. Consequently, the factor-$3$ barrier for fair $k$-center in general metric spaces is inherent, and existing $3$-approximation algorithms are optimal up to lower-order terms even in these restricted regimes. This result stands in sharp contrast to the $k$-supplier formulation, where both the unconstrained and fair variants admit factor-$3$ approximation in polynomial time.
【4】Neighborhood Stability as a Measure of Nearest Neighbor Searchability
标题:邻域稳定性作为最近邻可搜索性的度量
链接:https://arxiv.org/abs/2602.16673
作者:Thomas Vecchiato,Sebastian Bruch
摘要:Clustering-based Approximate Nearest Neighbor Search (ANNS) organizes a set of points into partitions, and searches only a few of them to find the nearest neighbors of a query. Despite its popularity, there are virtually no analytical tools to determine the suitability of clustering-based ANNS for a given dataset -- what we call "searchability." To address that gap, we present two measures for flat clusterings of high-dimensional points in Euclidean space. First is Clustering-Neighborhood Stability Measure (clustering-NSM), an internal measure of clustering quality -- a function of a clustering of a dataset -- that we show to be predictive of ANNS accuracy. The second, Point-Neighborhood Stability Measure (point-NSM), is a measure of clusterability -- a function of the dataset itself -- that is predictive of clustering-NSM. The two together allow us to determine whether a dataset is searchable by clustering-based ANNS given only the data points. Importantly, both are functions of nearest neighbor relationships between points, not distances, making them applicable to various distance functions including inner product.
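To make the flavor of such measures concrete, here is a hypothetical neighbor-relation score of our own devising (not the paper's clustering-NSM definition): the fraction of points whose nearest neighbor lies in the same partition. Like the paper's measures, it depends only on nearest-neighbor relationships, not on distance magnitudes.

```python
# Hypothetical illustration: score a flat clustering by the fraction of
# points whose nearest neighbor falls in the same partition.

def nearest_neighbor(points, i):
    best, best_d = None, float("inf")
    for j, q in enumerate(points):
        if j == i:
            continue
        d = sum((a - b) ** 2 for a, b in zip(points[i], q))
        if d < best_d:
            best, best_d = j, d
    return best

def neighbor_agreement(points, labels):
    same = sum(labels[i] == labels[nearest_neighbor(points, i)]
               for i in range(len(points)))
    return same / len(points)

# Two well-separated blobs, clustered correctly vs. split arbitrarily.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = [0, 0, 0, 1, 1, 1]
bad = [0, 1, 0, 1, 0, 1]
score_good = neighbor_agreement(points, good)
score_bad = neighbor_agreement(points, bad)
```

A partitioned index searching only the query's own partition misses every neighbor that the low-scoring clustering separates, which is the intuition connecting such neighbor-stability scores to ANNS recall.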
【5】Towards a Science of AI Agent Reliability
标题:迈向人工智能代理可靠性科学
链接:https://arxiv.org/abs/2602.16666
作者:Stephan Rabanser,Sayash Kapoor,Peter Kirgis,Kangheng Liu,Saiteja Utpala,Arvind Narayanan
摘要:AI agents are increasingly deployed to execute important tasks. While rising accuracy scores on standard benchmarks suggest rapid progress, many agents still continue to fail in practice. This discrepancy highlights a fundamental limitation of current evaluations: compressing agent behavior into a single success metric obscures critical operational flaws. Notably, it ignores whether agents behave consistently across runs, withstand perturbations, fail predictably, or have bounded error severity. Grounded in safety-critical engineering, we provide a holistic performance profile by proposing twelve concrete metrics that decompose agent reliability along four key dimensions: consistency, robustness, predictability, and safety. Evaluating 14 agentic models across two complementary benchmarks, we find that recent capability gains have only yielded small improvements in reliability. By exposing these persistent limitations, our metrics complement traditional evaluations while offering tools for reasoning about how agents perform, degrade, and fail.
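One of the four dimensions, run-to-run consistency, is easy to illustrate with a toy metric (ours, not necessarily the paper's exact definition): two agents can have identical average accuracy while differing sharply in how often repeated runs on the same task agree.

```python
# Hypothetical run matrices: rows = tasks, columns = independent runs
# (1 = success, 0 = failure). Both agents succeed on 9 of 12 runs.
agent_a = [[1, 1, 1], [1, 1, 1], [0, 0, 0], [1, 1, 1]]   # deterministic
agent_b = [[1, 0, 1], [1, 1, 0], [1, 0, 1], [1, 1, 1]]   # flaky

def accuracy(runs):
    return sum(sum(task) for task in runs) / (len(runs) * len(runs[0]))

def consistency(runs):
    """Fraction of tasks on which every run has the same outcome."""
    return sum(len(set(task)) == 1 for task in runs) / len(runs)

acc_a, acc_b = accuracy(agent_a), accuracy(agent_b)
con_a, con_b = consistency(agent_a), consistency(agent_b)
```

A single success-rate number reports the two agents as equivalent; the consistency metric separates them, which is the kind of operational distinction the paper's twelve metrics are designed to surface.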
【6】Optimizer choice matters for the emergence of Neural Collapse
标题:优化器的选择对于神经崩溃的出现很重要
链接:https://arxiv.org/abs/2602.16642
作者:Jim Zhao,Tin Sum Cheng,Wojciech Masarczyk,Aurelien Lucchi
备注:Published as a conference paper at ICLR 2026
摘要:Neural Collapse (NC) refers to the emergence of highly symmetric geometric structures in the representations of deep neural networks during the terminal phase of training. Despite its prevalence, the theoretical understanding of NC remains limited. Existing analyses largely ignore the role of the optimizer, thereby suggesting that NC is universal across optimization methods. In this work, we challenge this assumption and demonstrate that the choice of optimizer plays a critical role in the emergence of NC. The phenomenon is typically quantified through NC metrics, which, however, are difficult to track and analyze theoretically. To overcome this limitation, we introduce a novel diagnostic metric, NC0, whose convergence to zero is a necessary condition for NC. Using NC0, we provide theoretical evidence that NC cannot emerge under decoupled weight decay in adaptive optimizers, as implemented in AdamW. Concretely, we prove that SGD, SignGD with coupled weight decay (a special case of Adam), and SignGD with decoupled weight decay (a special case of AdamW) exhibit qualitatively different NC0 dynamics. Also, we show the accelerating effect of momentum on NC (beyond convergence of train loss) when trained with SGD, being the first result concerning momentum in the context of NC. Finally, we conduct extensive empirical experiments consisting of 3,900 training runs across various datasets, architectures, optimizers, and hyperparameters, confirming our theoretical results. This work provides the first theoretical explanation for optimizer-dependent emergence of NC and highlights the overlooked role of weight-decay coupling in shaping the implicit biases of optimizers.
【7】Illustration of Barren Plateaus in Quantum Computing
标题:图解量子计算中的贫瘠高原
链接:https://arxiv.org/abs/2602.16558
作者:Gerhard Stenzel,Tobias Rohe,Michael Kölle,Leo Sünkel,Jonas Stein,Claudia Linnhoff-Popien
备注:Extended version of a short paper to be published at ICAART-QAIO 2026
摘要:Variational Quantum Circuits (VQCs) have emerged as a promising paradigm for quantum machine learning in the NISQ era. While parameter sharing in VQCs can reduce the parameter space dimensionality and potentially mitigate the barren plateau phenomenon, it introduces a complex trade-off that has been largely overlooked. This paper investigates how parameter sharing, despite creating better global optima with fewer parameters, fundamentally alters the optimization landscape through deceptive gradients -- regions where gradient information exists but systematically misleads optimizers away from global optima. Through systematic experimental analysis, we demonstrate that increasing degrees of parameter sharing generate more complex solution landscapes with heightened gradient magnitudes and measurably higher deceptiveness ratios. Our findings reveal that traditional gradient-based optimizers (Adam, SGD) show progressively degraded convergence as parameter sharing increases, with performance heavily dependent on hyperparameter selection. We introduce a novel gradient deceptiveness detection algorithm and a quantitative framework for measuring optimization difficulty in quantum circuits, establishing that while parameter sharing can improve circuit expressivity by orders of magnitude, this comes at the cost of significantly increased landscape deceptiveness. These insights provide important considerations for quantum circuit design in practical applications, highlighting the fundamental mismatch between classical optimization strategies and quantum parameter landscapes shaped by parameter sharing.
【8】Small molecule retrieval from tandem mass spectrometry: what are we optimizing for?
标题:从串联质谱中检索小分子:我们在优化什么?
链接:https://arxiv.org/abs/2602.16507
作者:Gaetan De Waele,Marek Wydmuch,Krzysztof Dembczyński,Wojciech Kotłowski,Willem Waegeman
摘要:One of the central challenges in the computational analysis of liquid chromatography-tandem mass spectrometry (LC-MS/MS) data is to identify the compounds underlying the output spectra. In recent years, this problem is increasingly tackled using deep learning methods. A common strategy involves predicting a molecular fingerprint vector from an input mass spectrum, which is then used to search for matches in a chemical compound database. While various loss functions are employed in training these predictive models, their impact on model performance remains poorly understood. In this study, we investigate commonly used loss functions, deriving novel regret bounds that characterize when Bayes-optimal decisions for these objectives must diverge. Our results reveal a fundamental trade-off between the two objectives of (1) fingerprint similarity and (2) molecular retrieval. Optimizing for more accurate fingerprint predictions typically worsens retrieval results, and vice versa. Our theoretical analysis shows this trade-off depends on the similarity structure of candidate sets, providing guidance for loss function and fingerprint selection.
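The divergence of Bayes-optimal decisions can be seen in a three-candidate worked example (ours, with made-up fingerprints and posterior): the prediction that minimizes per-bit Hamming loss need not be the maximum-a-posteriori candidate that minimizes retrieval (0-1) loss.

```python
# Toy candidate set: 3-bit fingerprints with a posterior over candidates.
candidates = {
    "mol_A": (1, 1, 0),
    "mol_B": (0, 0, 1),
    "mol_C": (1, 0, 1),
}
posterior = {"mol_A": 0.4, "mol_B": 0.3, "mol_C": 0.3}

# Retrieval-optimal decision (0-1 loss): the MAP candidate.
map_candidate = max(posterior, key=posterior.get)

# Fingerprint-optimal decision (Hamming loss): threshold each bit's
# posterior marginal at 0.5.
n_bits = 3
marginals = [sum(p * candidates[m][b] for m, p in posterior.items())
             for b in range(n_bits)]
hamming_pred = tuple(int(p > 0.5) for p in marginals)

# Candidate closest (in Hamming distance) to the bitwise-optimal prediction.
best_hamming_match = min(
    posterior,
    key=lambda m: sum(x != y for x, y in zip(candidates[m], hamming_pred)))
```

Here the bitwise-optimal fingerprint coincides with a candidate holding only 0.3 posterior mass, while the MAP candidate has 0.4: training a model to predict accurate fingerprint bits and ranking candidates by fingerprint similarity can systematically retrieve a different molecule than the 0-1-optimal rule, which is the trade-off the paper formalizes with regret bounds.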
【9】Fast and Scalable Analytical Diffusion
标题:快速且可扩展的解析扩散
链接:https://arxiv.org/abs/2602.16498
作者:Xinyi Shang,Peng Sun,Jingyu Lin,Zhiqiang Shen
摘要:Analytical diffusion models offer a mathematically transparent path to generative modeling by formulating the denoising score as an empirical-Bayes posterior mean. However, this interpretability comes at a prohibitive cost: the standard formulation necessitates a full-dataset scan at every timestep, scaling linearly with dataset size. In this work, we present the first systematic study addressing this scalability bottleneck. We challenge the prevailing assumption that the entire training data is necessary, uncovering the phenomenon of Posterior Progressive Concentration: the effective golden support of the denoising score is not static but shrinks asymptotically from the global manifold to a local neighborhood as the signal-to-noise ratio increases. Capitalizing on this, we propose Dynamic Time-Aware Golden Subset Diffusion (GoldDiff), a training-free framework that decouples inference complexity from dataset size. Instead of static retrieval, GoldDiff uses a coarse-to-fine mechanism to dynamically pinpoint the ''Golden Subset'' for inference. Theoretically, we derive rigorous bounds guaranteeing that our sparse approximation converges to the exact score. Empirically, GoldDiff achieves a $\bf 71 \times$ speedup on AFHQ while matching or achieving even better performance than full-scan baselines. Most notably, we demonstrate the first successful scaling of analytical diffusion to ImageNet-1K, unlocking a scalable, training-free paradigm for large-scale generative modeling.
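The "posterior progressive concentration" phenomenon is visible in a one-dimensional toy (ours, not the paper's experiments): for a noisy observation the empirical-Bayes posterior weight of training point $y$ is proportional to $\exp(-\|x_t - y\|^2 / 2\sigma^2)$, and the effective number of contributing points (the perplexity of the weight distribution) shrinks as the noise level $\sigma$ falls, i.e. as SNR rises.

```python
import math

# Toy training "dataset": 100 points on a line.
data = [float(i) for i in range(100)]

def posterior_weights(x_t, sigma):
    """Empirical-Bayes posterior over training points (Gaussian kernel)."""
    logw = [-((x_t - y) ** 2) / (2.0 * sigma * sigma) for y in data]
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]
    total = sum(w)
    return [wi / total for wi in w]

def effective_support(w):
    """Perplexity exp(H(w)): effective number of contributing points."""
    H = -sum(wi * math.log(wi) for wi in w if wi > 0.0)
    return math.exp(H)

x_t = 42.3
high_noise = effective_support(posterior_weights(x_t, sigma=30.0))  # early step
low_noise = effective_support(posterior_weights(x_t, sigma=0.5))    # late step
```

Early in the reverse process nearly the whole dataset contributes to the score, while late steps depend on a handful of neighbors, which is exactly the structure GoldDiff exploits by restricting the scan to a dynamically chosen "golden subset".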
【10】Beyond SGD, Without SVD: Proximal Subspace Iteration LoRA with Diagonal Fractional K-FAC
标题:超越SGD,无需SVD:具有对角分数K-FAC的近端子空间迭代LoRA
链接:https://arxiv.org/abs/2602.16456
作者:Abdulla Jasem Almansoori,Maria Ivanova,Andrey Veprikov,Aleksandr Beznosikov,Samuel Horváth,Martin Takáč
备注:20 pages, 5 figures, 4 tables
摘要:Low-Rank Adaptation (LoRA) fine-tunes large models by learning low-rank updates on top of frozen weights, dramatically reducing trainable parameters and memory. In this work, we address the gap between training with full steps with low-rank projections (SVDLoRA) and LoRA fine-tuning. We propose LoRSum, a memory-efficient subroutine that closes this gap for gradient descent by casting LoRA optimization as a proximal sub-problem and solving it efficiently with alternating least squares updates, which we prove to be an implicit block power method. We recover several recently proposed preconditioning methods for LoRA as special cases, and show that LoRSum can also be used for updating a low-rank momentum. In order to address full steps with preconditioned gradient descent, we propose a scaled variant of LoRSum that uses structured metrics such as K-FAC and Shampoo, and we show that storing the diagonal of these metrics still allows them to perform well while remaining memory-efficient. Experiments on a synthetic task, CIFAR-100, and language-model fine-tuning on GLUE, SQuAD v2, and WikiText-103, show that our method can match or improve LoRA baselines given modest compute overhead, while avoiding full-matrix SVD projections and retaining LoRA-style parameter efficiency.
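A rank-1, pure-Python sketch of the alternating-least-squares idea: fitting factors b aᵀ to a target matrix G (a stand-in for projecting a full step onto LoRA factors). As the abstract notes, the alternation behaves like a power method; this is an illustration of the subroutine's flavor, not LoRSum itself.

```python
# Rank-1 alternating least squares: each step solves a closed-form
# least-squares problem for one factor while the other is fixed.

def als_rank1(G, iters=20):
    m, n = len(G), len(G[0])
    a = [1.0] * n                      # initialize the right factor
    for _ in range(iters):
        # b <- G a / (a^T a): least-squares update for the left factor
        aa = sum(x * x for x in a)
        b = [sum(G[i][j] * a[j] for j in range(n)) / aa for i in range(m)]
        # a <- G^T b / (b^T b): least-squares update for the right factor
        bb = sum(x * x for x in b)
        a = [sum(G[i][j] * b[i] for i in range(m)) / bb for j in range(n)]
    return b, a

G = [[2.0, 4.0], [1.0, 2.0]]           # exactly rank-1: (2, 1)^T (1, 2)
b, a = als_rank1(G)
approx = [[b[i] * a[j] for j in range(2)] for i in range(2)]
```

On a rank-1 target the iteration recovers the matrix exactly, mirroring the implicit-power-method interpretation proved in the paper.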
【11】Easy Data Unlearning Bench
标题:简单数据遗忘基准
链接:https://arxiv.org/abs/2602.16400
作者:Roy Rinberg,Pol Puigdemont,Martin Pawelczyk,Volkan Cevher
备注:ICML 2025 Workshop on Machine Unlearning for Generative AI
摘要:Evaluating machine unlearning methods remains technically challenging, with recent benchmarks requiring complex setups and significant engineering overhead. We introduce a unified and extensible benchmarking suite that simplifies the evaluation of unlearning algorithms using the KLoM (KL divergence of Margins) metric. Our framework provides precomputed model ensembles, oracle outputs, and streamlined infrastructure for running evaluations out of the box. By standardizing setup and metrics, it enables reproducible, scalable, and fair comparison across unlearning methods. We aim for this benchmark to serve as a practical foundation for accelerating research and promoting best practices in machine unlearning. Our code and data are publicly available.
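As a hedged illustration of a KL-of-margins style comparison: fit a Gaussian to the margin samples of an unlearned model and of retrain-from-scratch oracles, then take the closed-form Gaussian KL. The Gaussian fit is our simplification here; the benchmark's exact KLoM estimator may differ.

```python
import math

# Closed-form KL divergence between two 1-D Gaussians fitted by moments.

def gaussian_kl(mu0, var0, mu1, var1):
    return 0.5 * (var0 / var1 + (mu1 - mu0) ** 2 / var1 - 1
                  + math.log(var1 / var0))

def fit(samples):
    mu = sum(samples) / len(samples)
    var = sum((s - mu) ** 2 for s in samples) / len(samples)
    return mu, var

# hypothetical margin samples on a forget-set example
unlearned = [0.9, 1.1, 1.0, 0.8, 1.2]   # model still "remembers"
oracle = [0.1, -0.1, 0.0, 0.2, -0.2]    # retrained-without-the-data models
klom = gaussian_kl(*fit(unlearned), *fit(oracle))
```

A large divergence flags that the unlearned model's behavior on this example is still distinguishable from models that never saw it.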
【12】Improved Bounds for Reward-Agnostic and Reward-Free Exploration
标题:改进了奖励不可知和无奖励探索的界限
链接:https://arxiv.org/abs/2602.16363
作者:Oran Ridel,Alon Cohen
摘要:We study reward-free and reward-agnostic exploration in episodic finite-horizon Markov decision processes (MDPs), where an agent explores an unknown environment without observing external rewards. Reward-free exploration aims to enable $ε$-optimal policies for any reward revealed after exploration, while reward-agnostic exploration targets $ε$-optimality for rewards drawn from a small finite class. In the reward-agnostic setting, Li, Yan, Chen, and Fan achieve minimax sample complexity, but only for restrictively small accuracy parameter $ε$. We propose a new algorithm that significantly relaxes the requirement on $ε$. Our approach is novel and of technical interest by itself. Our algorithm employs an online learning procedure with carefully designed rewards to construct an exploration policy, which is used to gather data sufficient for accurate dynamics estimation and subsequent computation of an $ε$-optimal policy once the reward is revealed. Finally, we establish a tight lower bound for reward-free exploration, closing the gap between known upper and lower bounds.
【13】Fast KV Compaction via Attention Matching
标题:通过注意力匹配快速KV压缩
链接:https://arxiv.org/abs/2602.16284
作者:Adam Zweiger,Xinghong Fu,Han Guo,Yoon Kim
摘要:Scaling language models to long contexts is often bottlenecked by the size of the key-value (KV) cache. In deployed settings, long contexts are typically managed through compaction in token space via summarization. However, summarization can be highly lossy, substantially harming downstream performance. Recent work on Cartridges has shown that it is possible to train highly compact KV caches in latent space that closely match full-context performance, but at the cost of slow and expensive end-to-end optimization. This work describes an approach for fast context compaction in latent space through Attention Matching, which constructs compact keys and values to reproduce attention outputs and preserve attention mass at a per-KV-head level. We show that this formulation naturally decomposes into simple subproblems, some of which admit efficient closed-form solutions. Within this framework, we develop a family of methods that significantly push the Pareto frontier of compaction time versus quality, achieving up to 50x compaction in seconds on some datasets with little quality loss.
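A toy sketch of the attention-output-matching objective: keep the keys that receive the most attention mass for a probe query, then compare the compacted attention output to the full one. The paper solves a richer per-KV-head least-squares problem with constructed (not merely selected) keys and values; this only illustrates what is being preserved.

```python
import math

# Single-head dot-product attention over a tiny hypothetical cache.

def attend(q, keys, vals):
    logits = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(logits)
    w = [math.exp(l - m) for l in logits]
    z = sum(w)
    return [sum(w[i] * vals[i][d] for i in range(len(keys))) / z
            for d in range(len(vals[0]))]

q = [1.0, 0.0]
keys = [[5.0, 0.0], [4.0, 0.1], [-5.0, 0.0], [-6.0, 1.0]]
vals = [[1.0], [1.2], [9.0], [8.0]]
full = attend(q, keys, vals)

# keep the 2 keys carrying the most attention mass for this query
kept = sorted(range(4), key=lambda i: sum(q[d] * keys[i][d]
              for d in range(2)), reverse=True)[:2]
compact = attend(q, [keys[i] for i in kept], [vals[i] for i in kept])
```

Because attention mass is concentrated on two keys, the 2-entry cache reproduces the full output almost exactly; attention matching generalizes this to constructing compact KV pairs that do so for all queries.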
【14】Bayesian Quadrature: Gaussian Processes for Integration
标题:贝叶斯求积:用于积分的高斯过程
链接:https://arxiv.org/abs/2602.16218
作者:Maren Mahsereci,Toni Karvonen
摘要:Bayesian quadrature is a probabilistic, model-based approach to numerical integration, the estimation of intractable integrals, or expectations. Although Bayesian quadrature was popularised already in the 1980s, no systematic and comprehensive treatment has been published. The purpose of this survey is to fill this gap. We review the mathematical foundations of Bayesian quadrature from different points of view; present a systematic taxonomy for classifying different Bayesian quadrature methods along the three axes of modelling, inference, and sampling; collect general theoretical guarantees; and provide a controlled numerical study that explores and illustrates the effect of different choices along the axes of the taxonomy. We also provide a realistic assessment of practical challenges and limitations to application of Bayesian quadrature methods and include an up-to-date and nearly exhaustive bibliography that covers not only machine learning and statistics literature but all areas of mathematics and engineering in which Bayesian quadrature or equivalent methods have seen use.
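A minimal worked example of the core computation: with a Gaussian-process prior, the Bayesian-quadrature posterior mean of an integral is zᵀK⁻¹f, where K is the kernel Gram matrix and z collects the kernel mean embeddings. We use the Brownian-motion kernel k(x, y) = min(x, y) on [0, 1], for which z is available in closed form; a 2-node toy, not a production method.

```python
# Bayesian quadrature with k(x, y) = min(x, y) on [0, 1]:
# z_i = \int_0^1 min(x_i, y) dy = x_i - x_i^2 / 2.

def bq_mean(nodes, f):
    x1, x2 = nodes
    # 2x2 Gram matrix of the kernel and its explicit inverse
    a, b, d = min(x1, x1), min(x1, x2), min(x2, x2)
    det = a * d - b * b
    inv = [[d / det, -b / det], [-b / det, a / det]]
    z = [x - x * x / 2 for x in nodes]      # kernel mean embedding
    w = [inv[0][0] * z[0] + inv[0][1] * z[1],
         inv[1][0] * z[0] + inv[1][1] * z[1]]
    return w[0] * f(x1) + w[1] * f(x2)

# estimate \int_0^1 x dx = 1/2 from just two evaluations
est = bq_mean([0.3, 0.8], lambda x: x)
```

Even with two nodes the estimate lands at 0.48, close to the true value 1/2; the survey's taxonomy covers how node placement, kernel choice, and inference scheme refine this basic recipe.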
【15】Geometric Neural Operators via Lie Group-Constrained Latent Dynamics
标题:基于李群约束潜动力学的几何神经算子
链接:https://arxiv.org/abs/2602.16209
作者:Jiaquan Zhang,Fachrina Dewi Puspitasari,Songbo Zhang,Yibei Liu,Kuien Liu,Caiyan Qin,Fan Mo,Peng Wang,Yang Yang,Chaoning Zhang
摘要:Neural operators offer an effective framework for learning solutions of partial differential equations for many physical systems in a resolution-invariant and data-driven manner. Existing neural operators, however, often suffer from instability in multi-layer iteration and long-horizon rollout, which stems from the unconstrained Euclidean latent space updates that violate the geometric and conservation laws. To address this challenge, we propose to constrain manifolds with low-rank Lie algebra parameterization that performs group action updates on the latent representation. Our method, termed Manifold Constraining based on Lie group (MCL), acts as an efficient \emph{plug-and-play} module that enforces geometric inductive bias to existing neural operators. Extensive experiments on various partial differential equations, such as 1-D Burgers and 2-D Navier-Stokes, over a wide range of parameters and steps demonstrate that our method effectively lowers the relative prediction error by 30-50\% at the cost of 2.26\% of parameter increase. The results show that our approach provides a scalable solution for improving long-term prediction fidelity by addressing the principled geometric constraints absent in the neural operator updates.
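To see why group-action updates help stability, consider the simplest case: exponentiating a skew-symmetric generator (the 2-D rotation algebra so(2)) and acting on the latent state. Skew-symmetry makes the update exactly norm-preserving, the kind of conservation an unconstrained Euclidean update violates. This is a cartoon of the mechanism, not the MCL module itself.

```python
import math

# Lie-group latent update: exp(theta * J) with J = [[0, -1], [1, 0]]
# is a rotation by theta, so the latent norm is conserved.

def group_update(z, theta):
    c, s = math.cos(theta), math.sin(theta)
    return [c * z[0] - s * z[1], s * z[0] + c * z[1]]

z = [3.0, 4.0]
z_next = group_update(z, 0.37)
norm_before = math.hypot(*z)
norm_after = math.hypot(*z_next)
```

MCL's low-rank Lie-algebra parameterization plays the same role in high dimensions: the learned update is constrained to a group orbit rather than free Euclidean drift, which is what stabilizes long-horizon rollouts.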
【16】ModalImmune: Immunity Driven Unlearning via Self Destructive Training
标题:ModalImmune:通过自毁式训练实现免疫驱动的遗忘
链接:https://arxiv.org/abs/2602.16197
作者:Rong Fu,Jia Yee Tan,Wenxin Zhang,Zijian Zhang,Ziming Wang,Zhaolu Kang,Muge Qi,Shuning Zhang,Simon Fong
备注:23 pages, 8 figures
摘要:Multimodal systems are vulnerable to partial or complete loss of input channels at deployment, which undermines reliability in real-world settings. This paper presents ModalImmune, a training framework that enforces modality immunity by intentionally and controllably collapsing selected modality information during training so the model learns joint representations that are robust to destructive modality influence. The framework combines a spectrum-adaptive collapse regularizer, an information-gain guided controller for targeted interventions, curvature-aware gradient masking to stabilize destructive updates, and a certified Neumann-truncated hyper-gradient procedure for automatic meta-parameter adaptation. Empirical evaluation on standard multimodal benchmarks demonstrates that ModalImmune improves resilience to modality removal and corruption while retaining convergence stability and reconstruction capacity.
【17】Revolutionizing Long-Term Memory in AI: New Horizons with High-Capacity and High-Speed Storage
标题:人工智能中的长期记忆革命:具有高容量和高速存储的新视野
链接:https://arxiv.org/abs/2602.16192
作者:Hiroaki Yamanaka,Daisuke Miyashita,Takashi Toi,Asuka Maki,Taiga Ikeda,Jun Deguchi
备注:13 pages, 5 figures
摘要:Driven by our mission of "uplifting the world with memory," this paper explores the design concept of "memory" that is essential for achieving artificial superintelligence (ASI). Rather than proposing novel methods, we focus on several alternative approaches whose potential benefits are widely imaginable, yet have remained largely unexplored. The currently dominant paradigm, which can be termed "extract then store," involves extracting information judged to be useful from experiences and saving only the extracted content. However, this approach inherently risks the loss of information, as some valuable knowledge particularly for different tasks may be discarded in the extraction process. In contrast, we emphasize the "store then on-demand extract" approach, which seeks to retain raw experiences and flexibly apply them to various tasks as needed, thus avoiding such information loss. In addition, we highlight two further approaches: discovering deeper insights from large collections of probabilistic experiences, and improving experience collection efficiency by sharing stored experiences. While these approaches seem intuitively effective, our simple experiments demonstrate that this is indeed the case. Finally, we discuss major challenges that have limited investigation into these promising directions and propose research topics to address them.
【18】Multi-Agent Combinatorial-Multi-Armed-Bandit framework for the Submodular Welfare Problem under Bandit Feedback
标题:Bandit反馈下次模福利问题的多智能体组合多臂Bandit框架
链接:https://arxiv.org/abs/2602.16183
作者:Subham Pokhriyal,Shweta Jain,Vaneet Aggarwal
摘要:We study the \emph{Submodular Welfare Problem} (SWP), where items are partitioned among agents with monotone submodular utilities to maximize the total welfare under \emph{bandit feedback}. Classical SWP assumes full value-oracle access, achieving $(1-1/e)$ approximations via continuous-greedy algorithms. We extend this to a \emph{multi-agent combinatorial bandit} framework (\textsc{MA-CMAB}), where actions are partitions under full-bandit feedback with non-communicating agents. Unlike prior single-agent or separable multi-agent CMAB models, our setting couples agents through shared allocation constraints. We propose an explore-then-commit strategy with randomized assignments, achieving $\tilde{\mathcal{O}}(T^{2/3})$ regret against a $(1-1/e)$ benchmark, the first such guarantee for partition-based submodular welfare problem under bandit feedback.
【19】EnterpriseGym Corecraft: Training Generalizable Agents on High-Fidelity RL Environments
标题:EnterpriseGym Corecraft:在高保真强化学习环境上训练可泛化智能体
链接:https://arxiv.org/abs/2602.16179
作者:Sushant Mehta,Logan Ritchie,Suhaas Garre,Nick Heiner,Edwin Chen
摘要:We show that training AI agents on high-fidelity reinforcement learning environments produces capabilities that generalize beyond the training distribution. We introduce \corecraft{}, the first environment in \textsc{EnterpriseGym}, Surge AI's suite of agentic RL environments. \corecraft{} is a fully operational enterprise simulation of a customer support organization, comprising over 2,500 entities across 14 entity types with 23 unique tools, designed to measure whether AI agents can perform the multi-step, domain-specific work that real jobs demand. Frontier models such as GPT-5.2 and Claude Opus 4.6 solve fewer than 30\% of tasks when all expert-authored rubric criteria must be satisfied. Using this environment, we train GLM~4.6 with Group Relative Policy Optimization (GRPO) and adaptive clipping. After a single epoch of training, the model improves from 25.37\% to 36.76\% task pass rate on held-out evaluation tasks. More importantly, these gains transfer to out-of-distribution benchmarks: +4.5\% on BFCL Parallel, +7.4\% on $τ^2$-Bench Retail, and +6.8\% on Toolathlon (Pass@1). We believe three environment properties are consistent with the observed transfer: task-centric world building that optimizes for diverse, challenging tasks; expert-authored rubrics enabling reliable reward computation; and enterprise workflows that reflect realistic professional patterns. Our results suggest that environment quality, diversity, and realism are key factors enabling generalizable agent capabilities.
【20】Emotion Collider: Dual Hyperbolic Mirror Manifolds for Sentiment Recovery via Anti Emotion Reflection
标题:情感碰撞器:通过反情感反射实现情绪恢复的双双曲镜像流形
链接:https://arxiv.org/abs/2602.16161
作者:Rong Fu,Ziming Wang,Shuo Yin,Wenxin Zhang,Haiyun Wei,Kun Liu,Xianda Li,Zeli Su,Simon Fong
备注:25 pages, 14 figures
摘要:Emotional expression underpins natural communication and effective human-computer interaction. We present Emotion Collider (EC-Net), a hyperbolic hypergraph framework for multimodal emotion and sentiment modeling. EC-Net represents modality hierarchies using Poincare-ball embeddings and performs fusion through a hypergraph mechanism that passes messages bidirectionally between nodes and hyperedges. To sharpen class separation, contrastive learning is formulated in hyperbolic space with decoupled radial and angular objectives. High-order semantic relations across time steps and modalities are preserved via adaptive hyperedge construction. Empirical results on standard multimodal emotion benchmarks show that EC-Net produces robust, semantically coherent representations and consistently improves accuracy, particularly when modalities are partially available or contaminated by noise. These findings indicate that explicit hierarchical geometry combined with hypergraph fusion is effective for resilient multimodal affect understanding.
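The Poincaré-ball embeddings mentioned above carry a distance that grows rapidly toward the boundary, which is what lets a hyperbolic space encode hierarchy. A minimal sketch of that geodesic distance (curvature -1), with hypothetical points:

```python
import math

def poincare_dist(u, v):
    """Geodesic distance between two points inside the Poincare ball."""
    du = sum((a - b) ** 2 for a, b in zip(u, v))
    nu = sum(a * a for a in u)
    nv = sum(b * b for b in v)
    return math.acosh(1 + 2 * du / ((1 - nu) * (1 - nv)))

# two pairs with the same Euclidean shape, at different radii
d_near_boundary = poincare_dist([0.9, 0.0], [0.0, 0.9])
d_near_origin = poincare_dist([0.09, 0.0], [0.0, 0.09])
```

Points near the boundary are far more separated than the same Euclidean configuration near the origin, which is why EC-Net can decouple radial (hierarchy level) and angular (class) objectives in its contrastive loss.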
【21】ASPEN: Spectral-Temporal Fusion for Cross-Subject Brain Decoding
标题:ASPEN:用于跨被试大脑解码的谱-时间融合
链接:https://arxiv.org/abs/2602.16147
作者:Megan Lee,Seung Ha Hwang,Inhyeok Choi,Shreyas Darade,Mengchun Zhang,Kateryna Shapovalenko
摘要:Cross-subject generalization in EEG-based brain-computer interfaces (BCIs) remains challenging due to individual variability in neural signals. We investigate whether spectral representations offer more stable features for cross-subject transfer than temporal waveforms. Through correlation analyses across three EEG paradigms (SSVEP, P300, and Motor Imagery), we find that spectral features exhibit consistently higher cross-subject similarity than temporal signals. Motivated by this observation, we introduce ASPEN, a hybrid architecture that combines spectral and temporal feature streams via multiplicative fusion, requiring cross-modal agreement for features to propagate. Experiments across six benchmark datasets reveal that ASPEN is able to dynamically achieve the optimal spectral-temporal balance depending on the paradigm. ASPEN achieves the best unseen-subject accuracy on three of six datasets and competitive performance on others, demonstrating that multiplicative multimodal fusion enables effective cross-subject generalization.
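The multiplicative fusion idea can be sketched in a few lines: a feature propagates only if both the spectral and the temporal stream activate it, i.e. an elementwise product after a nonnegative gate. This is an illustration of the gating principle, not ASPEN's exact architecture.

```python
# Multiplicative fusion: cross-modal agreement is required for a feature
# to survive, since one stream at zero suppresses the product.

def relu(v):
    return [max(0.0, x) for x in v]

def fuse(spectral, temporal):
    s, t = relu(spectral), relu(temporal)
    return [si * ti for si, ti in zip(s, t)]

# hypothetical feature activations from the two streams
fused = fuse([0.9, -0.3, 0.5], [0.8, 0.7, -0.2])
# only the first feature is active in both streams
```

Contrast this with additive fusion, where a spurious activation in a single stream would still leak into the fused representation.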
【22】Evolutionary Context Search for Automated Skill Acquisition
标题:自动技能获取的进化上下文搜索
链接:https://arxiv.org/abs/2602.16113
作者:Qi Sun,Stefan Nielsen,Rio Yokota,Yujin Tang
摘要:Large Language Models cannot reliably acquire new knowledge post-deployment -- even when relevant text resources exist, models fail to transform them into actionable knowledge without retraining. Retrieval-Augmented Generation attempts to bridge this gap by surfacing relevant documents at inference time, yet similarity-based retrieval often fails to identify context that actually improves task performance. We introduce Evolutionary Context Search (ECS), an evolutionary method that searches context combinations using accuracy on a small development set, requiring only inference calls without weight updates. ECS moves beyond semantic similarity to discover non-obvious context pairings that significantly boost performance. Our empirical results show that ECS improves BackendBench by 27\% and $τ$-bench airline by 7\%. The evolved contexts are model-agnostic, as those evolved with Gemini-3-Flash transfer effectively to Claude Sonnet and DeepSeek. This suggests that ECS opens a path toward automated context discovery for skill acquisition -- an efficient alternative to manual prompt engineering or costly fine-tuning.
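A cartoon of the evolutionary search loop: evolve bitmask "genomes" selecting which context documents to include, scored by development-set accuracy. The fitness function below is a synthetic stand-in for the inference calls ECS would actually make; all names and numbers are hypothetical.

```python
import random

# Elitist (mu + lambda)-style evolution over context-selection bitmasks.
random.seed(1)
N_DOCS = 6
USEFUL = {1, 4}   # hidden ground truth: only these docs help the task

def fitness(mask):
    chosen = {i for i in range(N_DOCS) if mask[i]}
    # reward covering useful docs, lightly penalize context length
    return 2 * len(chosen & USEFUL) - 0.1 * len(chosen)

def mutate(mask):
    child = list(mask)
    i = random.randrange(N_DOCS)
    child[i] = 1 - child[i]   # flip one inclusion bit
    return child

pop = [[random.randint(0, 1) for _ in range(N_DOCS)] for _ in range(8)]
for _ in range(40):
    pop.sort(key=fitness, reverse=True)
    pop = pop[:4] + [mutate(p) for p in pop[:4]]   # keep elites, mutate them
best = max(pop, key=fitness)
```

Note the optimum here is not the largest context: the length penalty means the search must discover the small subset that actually improves accuracy, which is the behavior similarity-based retrieval misses.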
【23】Surgical Activation Steering via Generative Causal Mediation
标题:通过生成式因果中介实现精准激活引导
链接:https://arxiv.org/abs/2602.16080
作者:Aruna Sankaranarayanan,Amir Zur,Atticus Geiger,Dylan Hadfield-Menell
摘要:Where should we intervene in a language model (LM) to control behaviors that are diffused across many tokens of a long-form response? We introduce Generative Causal Mediation (GCM), a procedure for selecting model components, e.g., attention heads, to steer a binary concept (e.g., talk in verse vs. talk in prose) from contrastive long-form responses. In GCM, we first construct a dataset of contrasting inputs and responses. Then, we quantify how individual model components mediate the contrastive concept and select the strongest mediators for steering. We evaluate GCM on three tasks--refusal, sycophancy, and style transfer--across three language models. GCM successfully localizes concepts expressed in long-form responses and consistently outperforms correlational probe-based baselines when steering with a sparse set of attention heads. Together, these results demonstrate that GCM provides an effective approach for localizing and controlling the long-form responses of LMs.
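A toy version of the mediation-scoring step: for each component, patch in its activation from the contrastive prompt and measure the shift in a scalar behavior score; the components with the largest indirect effect are selected for steering. The readout and activations below are hypothetical, a cartoon of the selection logic rather than the paper's procedure on real attention heads.

```python
# Score each component by the indirect effect of patching its activation.

def behavior(acts):
    # hypothetical readout: a weighted sum over component activations
    weights = [0.1, 0.9, 0.05]
    return sum(w * a for w, a in zip(weights, acts))

base = [1.0, 0.2, 0.8]      # activations on a "prose" prompt
contrast = [1.1, 1.5, 0.7]  # activations on a "verse" prompt

effects = []
for i in range(3):
    patched = list(base)
    patched[i] = contrast[i]          # intervene on component i only
    effects.append(abs(behavior(patched) - behavior(base)))

top_mediator = max(range(3), key=effects.__getitem__)
```

Component 1 mediates nearly all of the contrast, so it alone would be chosen for steering; GCM does this at scale with long-form responses, where the concept is diffused over many tokens.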
【24】Omni-iEEG: A Large-Scale, Comprehensive iEEG Dataset and Benchmark for Epilepsy Research
标题:Omni-iEEG:大规模、全面的iEEG数据集和癫痫研究基准
链接:https://arxiv.org/abs/2602.16072
作者:Chenda Duan,Yipeng Zhang,Sotaro Kanai,Yuanyi Ding,Atsuro Daida,Pengyue Yu,Tiancheng Zheng,Naoto Kuroda,Shaun A. Hussain,Eishi Asano,Hiroki Nariai,Vwani Roychowdhury
备注:Published as a conference paper at ICLR 2026
摘要:Epilepsy affects over 50 million people worldwide, and one-third of patients suffer drug-resistant seizures where surgery offers the best chance of seizure freedom. Accurate localization of the epileptogenic zone (EZ) relies on intracranial EEG (iEEG). Clinical workflows, however, remain constrained by labor-intensive manual review. At the same time, existing data-driven approaches are typically developed on single-center datasets that are inconsistent in format and metadata, lack standardized benchmarks, and rarely release pathological event annotations, creating barriers to reproducibility, cross-center validation, and clinical relevance. With extensive efforts to reconcile heterogeneous iEEG formats, metadata, and recordings across publicly available sources, we present $\textbf{Omni-iEEG}$, a large-scale, pre-surgical iEEG resource comprising $\textbf{302 patients}$ and $\textbf{178 hours}$ of high-resolution recordings. The dataset includes harmonized clinical metadata such as seizure onset zones, resections, and surgical outcomes, all validated by board-certified epileptologists. In addition, Omni-iEEG provides over 36K expert-validated annotations of pathological events, enabling robust biomarker studies. Omni-iEEG serves as a bridge between machine learning and epilepsy research. It defines clinically meaningful tasks with unified evaluation metrics grounded in clinical priors, enabling systematic evaluation of models in clinically relevant settings. Beyond benchmarking, we demonstrate the potential of end-to-end modeling on long iEEG segments and highlight the transferability of representations pretrained on non-neurophysiological domains. Together, these contributions establish Omni-iEEG as a foundation for reproducible, generalizable, and clinically translatable epilepsy research. The project page with dataset and code links is available at omni-ieeg.github.io/omni-ieeg.
【25】Can Generative Artificial Intelligence Survive Data Contamination? Theoretical Guarantees under Contaminated Recursive Training
标题:生成式人工智能能否在数据污染中幸存?受污染递归训练下的理论保证
链接:https://arxiv.org/abs/2602.16065
作者:Kevin Wang,Hongqian Niu,Didong Li
摘要:Generative Artificial Intelligence (AI), such as large language models (LLMs), has become a transformative force across science, industry, and society. As these systems grow in popularity, web data becomes increasingly interwoven with this AI-generated material and it is increasingly difficult to separate them from naturally generated content. As generative models are updated regularly, later models will inevitably be trained on mixtures of human-generated data and AI-generated data from earlier versions, creating a recursive training process with data contamination. Existing theoretical work has examined only highly simplified settings, where both the real data and the generative model are discrete or Gaussian, where it has been shown that such recursive training leads to model collapse. However, real data distributions are far more complex, and modern generative models are far more flexible than Gaussian and linear mechanisms. To fill this gap, we study recursive training in a general framework with minimal assumptions on the real data distribution and allow the underlying generative model to be a general universal approximator. In this framework, we show that contaminated recursive training still converges, with a convergence rate equal to the minimum of the baseline model's convergence rate and the fraction of real data used in each iteration. To the best of our knowledge, this is the first (positive) theoretical result on recursive training without distributional assumptions on the data. We further extend the analysis to settings where sampling bias is present in data collection and support all theoretical results with empirical studies.
【26】Extracting and Analyzing Rail Crossing Behavior Signatures from Videos using Tensor Methods
标题:使用张量方法从视频中提取和分析铁路道口行为特征
链接:https://arxiv.org/abs/2602.16057
作者:Dawon Ahn,Het Patel,Aemal Khattak,Jia Chen,Evangelos E. Papalexakis
备注:6 pages, 10 figures. Accepted at InnovaRail 2026
摘要:Railway crossings present complex safety challenges where driver behavior varies by location, time, and conditions. Traditional approaches analyze crossings individually, limiting the ability to identify shared behavioral patterns across locations. We propose a multi-view tensor decomposition framework that captures behavioral similarities across three temporal phases: Approach (warning activation to gate lowering), Waiting (gates down to train passage), and Clearance (train passage to gate raising). We analyze railway crossing videos from multiple locations using TimeSformer embeddings to represent each phase. By constructing phase-specific similarity matrices and applying non-negative symmetric CP decomposition, we discover latent behavioral components with distinct temporal signatures. Our tensor analysis reveals that crossing location appears to be a stronger determinant of behavior patterns than time of day, and that approach-phase behavior provides particularly discriminative signatures. Visualization of the learned component space confirms location-based clustering, with certain crossings forming distinct behavioral clusters. This automated framework enables scalable pattern discovery across multiple crossings, providing a foundation for grouping locations by behavioral similarity to inform targeted safety interventions.
【27】MoE-Spec: Expert Budgeting for Efficient Speculative Decoding
标题:MoE-Spec:用于高效推测解码的专家预算分配
链接:https://arxiv.org/abs/2602.16052
作者:Bradley McDanel,Steven Li,Sruthikesh Surineni,Harshit Khaitan
备注:12 pages, 10 figures
摘要:Speculative decoding accelerates Large Language Model (LLM) inference by verifying multiple drafted tokens in parallel. However, for Mixture-of-Experts (MoE) models, this parallelism introduces a severe bottleneck: large draft trees activate many unique experts, significantly increasing memory pressure and diminishing speedups from speculative decoding relative to autoregressive decoding. Prior methods reduce speculation depth when MoE verification becomes expensive. We propose MoE-Spec, a training-free verification-time expert budgeting method that decouples speculation depth from memory cost by enforcing a fixed expert capacity limit at each layer, loading only the experts that contribute most to verification and dropping the long tail of rarely used experts that drive bandwidth overhead. Experiments across multiple model scales and datasets show that this method yields 10--30\% higher throughput than state-of-the-art speculative decoding baselines (EAGLE-3) at comparable quality, with flexibility to trade accuracy for further latency reductions through tighter budgets.
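The budgeting step can be sketched simply: pool router scores over all drafted tokens at one layer and keep only the top-B experts, dropping the long tail that drives memory bandwidth. Shapes and names below are illustrative, not the paper's implementation.

```python
# Verification-time expert budgeting for one MoE layer.

def budget_experts(router_scores, budget):
    """router_scores: one dict per drafted token mapping expert_id -> score.
    Returns the set of expert ids to load, capped at `budget`."""
    mass = {}
    for tok in router_scores:
        for e, s in tok.items():
            mass[e] = mass.get(e, 0.0) + s
    return set(sorted(mass, key=mass.get, reverse=True)[:budget])

# hypothetical routing for 3 drafted tokens: experts 0 and 3 dominate
drafts = [{0: 0.7, 3: 0.3}, {0: 0.5, 7: 0.5}, {3: 0.6, 9: 0.4}]
kept = budget_experts(drafts, budget=2)
```

Tokens routed to a dropped expert must then be handled approximately (or their drafts rejected), which is the accuracy-for-latency dial the tighter budgets expose.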
【28】Towards Efficient Constraint Handling in Neural Solvers for Routing Problems
标题:面向路由问题神经求解器的高效约束处理
链接:https://arxiv.org/abs/2602.16012
作者:Jieyi Bi,Zhiguang Cao,Jianan Zhou,Wen Song,Yaoxin Wu,Jie Zhang,Yining Ma,Cathy Wu
备注:Accepted by ICLR 2026
摘要:Neural solvers have achieved impressive progress in addressing simple routing problems, particularly excelling in computational efficiency. However, their advantages under complex constraints remain nascent, for which current constraint-handling schemes via feasibility masking or implicit feasibility awareness can be inefficient or inapplicable for hard constraints. In this paper, we present Construct-and-Refine (CaR), the first general and efficient constraint-handling framework for neural routing solvers based on explicit learning-based feasibility refinement. Unlike prior construction-search hybrids that target reducing optimality gaps through heavy improvements yet still struggle with hard constraints, CaR achieves efficient constraint handling by designing a joint training framework that guides the construction module to generate diverse and high-quality solutions well-suited for a lightweight improvement process, e.g., 10 steps versus 5k steps in prior work. Moreover, CaR presents the first use of construction-improvement-shared representation, enabling potential knowledge sharing across paradigms by unifying the encoder, especially in more complex constrained scenarios. We evaluate CaR on typical hard routing constraints to showcase its broader applicability. Results demonstrate that CaR achieves superior feasibility, solution quality, and efficiency compared to both classical and neural state-of-the-art solvers.
【29】MAEB: Massive Audio Embedding Benchmark
标题:MAEB:海量音频嵌入基准
链接:https://arxiv.org/abs/2602.16008
作者:Adnan El Assadi,Isaac Chung,Chenghao Xiao,Roman Solomatin,Animesh Jha,Rahul Chand,Silky Singh,Kaitlyn Wang,Ali Sartaz Khan,Marc Moussa Nasser,Sufen Fong,Pengfei He,Alan Xiao,Ayush Sunil Munot,Aditya Shrivastava,Artem Gazizov,Niklas Muennighoff,Kenneth Enevoldsen
摘要:We introduce the Massive Audio Embedding Benchmark (MAEB), a large-scale benchmark covering 30 tasks across speech, music, environmental sounds, and cross-modal audio-text reasoning in 100+ languages. We evaluate 50+ models and find that no single model dominates across all tasks: contrastive audio-text models excel at environmental sound classification (e.g., ESC50) but score near random on multilingual speech tasks (e.g., SIB-FLEURS), while speech-pretrained models show the opposite pattern. Clustering remains challenging for all models, with even the best-performing model achieving only modest results. We observe that models excelling on acoustic understanding often perform poorly on linguistic tasks, and vice versa. We also show that the performance of audio encoders on MAEB correlates highly with their performance when used in audio large language models. MAEB is derived from MAEB+, a collection of 98 tasks. MAEB is designed to maintain task diversity while reducing evaluation cost, and it integrates into the MTEB ecosystem for unified evaluation across text, image, and audio modalities. We release MAEB and all 98 tasks along with code and a leaderboard at https://github.com/embeddings-benchmark/mteb.
【30】Verifier-Constrained Flow Expansion for Discovery Beyond the Data
标题:验证器约束的流扩展,实现超越数据的发现
链接:https://arxiv.org/abs/2602.15984
作者:Riccardo De Santi,Kimon Protopapas,Ya-Ping Hsieh,Andreas Krause
备注:ICLR 2026
摘要:Flow and diffusion models are typically pre-trained on limited available data (e.g., molecular samples), covering only a fraction of the valid design space (e.g., the full molecular space). As a consequence, they tend to generate samples from only a narrow portion of the feasible domain. This is a fundamental limitation for scientific discovery applications, where one typically aims to sample valid designs beyond the available data distribution. To this end, we address the challenge of leveraging access to a verifier (e.g., an atomic bonds checker), to adapt a pre-trained flow model so that its induced density expands beyond regions of high data availability, while preserving samples validity. We introduce formal notions of strong and weak verifiers and propose algorithmic frameworks for global and local flow expansion via probability-space optimization. Then, we present Flow Expander (FE), a scalable mirror descent scheme that provably tackles both problems by verifier-constrained entropy maximization over the flow process noised state space. Next, we provide a thorough theoretical analysis of the proposed method, and state convergence guarantees under both idealized and general assumptions. Ultimately, we empirically evaluate our method on both illustrative, yet visually interpretable settings, and on a molecular design task showcasing the ability of FE to expand a pre-trained flow model increasing conformer diversity while preserving validity.
【31】Statistical-Geometric Degeneracy in UAV Search: A Physics-Aware Asymmetric Filtering Approach
标题:无人机搜索中的统计-几何退化:一种物理感知的非对称滤波方法
链接:https://arxiv.org/abs/2602.15893
作者:Zhiyuan Ren,Yudong Fang,Tao Zhang,Wenchi Cheng,Ben Lan
摘要:Post-disaster survivor localization using Unmanned Aerial Vehicles (UAVs) faces a fundamental physical challenge: the prevalence of Non-Line-of-Sight (NLOS) propagation in collapsed structures. Unlike standard Gaussian noise, signal reflection from debris introduces strictly non-negative ranging biases. Existing robust estimators, typically designed with symmetric loss functions (e.g., Huber or Tukey), implicitly rely on the assumption of error symmetry. Consequently, they experience a theoretical mismatch in this regime, leading to a phenomenon we formally identify as Statistical-Geometric Degeneracy (SGD)-a state where the estimator stagnates due to the coupling of persistent asymmetric bias and limited observation geometry. While emerging data-driven approaches offer alternatives, they often struggle with the scarcity of training data and the sim-to-real gap inherent in unstructured disaster zones. In this work, we propose a physically-grounded solution, the AsymmetricHuberEKF, which explicitly incorporates the non-negative physical prior of NLOS biases via a derived asymmetric loss function. Theoretically, we show that standard symmetric filters correspond to a degenerate case of our framework where the physical constraint is relaxed. Furthermore, we demonstrate that resolving SGD requires not just a robust filter, but specific bilateral information, which we achieve through a co-designed active sensing strategy. Validated in a 2D nadir-view scanning scenario, our approach significantly accelerates convergence compared to symmetric baselines, offering a resilient building block for search operations where data is scarce and geometry is constrained.
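The asymmetry can be illustrated with a simple loss: quadratic near zero, but with a much softer (earlier-saturating) penalty on positive residuals, encoding the prior that NLOS ranging errors are non-negative biases. The thresholds below are illustrative; the exact AsymmetricHuberEKF weighting in the paper may differ.

```python
# Asymmetric Huber-style loss: positive residuals (plausible NLOS bias)
# switch to the linear regime early, so they are down-weighted relative
# to equally large negative residuals, which the physics says are noise.

def asymmetric_huber(r, delta_pos=0.5, delta_neg=2.0):
    delta = delta_pos if r > 0 else delta_neg
    if abs(r) <= delta:
        return 0.5 * r * r
    return delta * (abs(r) - 0.5 * delta)

# a +3 m NLOS-style residual is penalized far less than a -3 m one
loss_pos = asymmetric_huber(3.0)
loss_neg = asymmetric_huber(-3.0)
```

A symmetric Huber loss is recovered by setting delta_pos = delta_neg, matching the paper's remark that symmetric filters are the degenerate case in which the physical constraint is relaxed.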
【32】Synthetic-Powered Multiple Testing with FDR Control
Title: Synthetic-Powered Multiple Testing with FDR Control
Link: https://arxiv.org/abs/2602.16690
Authors: Yonghoon Lee, Meshi Bashari, Edgar Dobriban, Yaniv Romano
Abstract: Multiple hypothesis testing with false discovery rate (FDR) control is a fundamental problem in statistical inference, with broad applications in genomics, drug screening, and outlier detection. In many such settings, researchers may have access not only to real experimental observations but also to auxiliary or synthetic data -- from past, related experiments or generated by generative models -- that can provide additional evidence about the hypotheses of interest. We introduce SynthBH, a synthetic-powered multiple testing procedure that safely leverages such synthetic data. We prove that SynthBH guarantees finite-sample, distribution-free FDR control under a mild PRDS-type positive dependence condition, without requiring the pooled-data p-values to be valid under the null. The proposed method adapts to the (unknown) quality of the synthetic data: it enhances sample efficiency and may boost power when synthetic data are of high quality, while controlling the FDR at a user-specified level regardless of their quality. We demonstrate the empirical performance of SynthBH on tabular outlier detection benchmarks and on genomic analyses of drug-cancer sensitivity associations, and further study its properties through controlled experiments on simulated data.
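For readers less familiar with FDR control, here is the classic Benjamini-Hochberg step-up procedure that SynthBH builds on. This is only the standard baseline; the synthetic-data weighting that defines SynthBH itself is not reproduced here.

```python
def benjamini_hochberg(pvals, alpha=0.1):
    """Benjamini-Hochberg step-up procedure.

    Sort the p-values, find the largest rank k such that
    p_(k) <= alpha * k / m, and reject the k smallest.
    Returns the indices (into `pvals`) of rejected hypotheses.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k = rank  # keep the largest rank passing the threshold
    return sorted(order[:k])

# e.g. benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6], alpha=0.1)
# rejects the four smallest p-values but not 0.6
```

Under the same PRDS-type positive dependence the abstract mentions, BH controls the FDR at level alpha; SynthBH's contribution is to keep this guarantee while folding in evidence from synthetic data of unknown quality.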
【33】Robust Stochastic Gradient Posterior Sampling with Lattice Based Discretisation
Title: Robust Stochastic Gradient Posterior Sampling with Lattice Based Discretisation
Link: https://arxiv.org/abs/2602.15925
Authors: Zier Mensch, Lars Holdijk, Samuel Duffield, Maxwell Aifer, Patrick J. Coles, Max Welling, Miranda C. N. Cheng
Abstract: Stochastic-gradient MCMC methods enable scalable Bayesian posterior sampling but often suffer from sensitivity to minibatch size and gradient noise. To address this, we propose Stochastic Gradient Lattice Random Walk (SGLRW), an extension of the Lattice Random Walk discretization. Unlike conventional Stochastic Gradient Langevin Dynamics (SGLD), SGLRW introduces stochastic noise only through the off-diagonal elements of the update covariance; this yields greater robustness to minibatch size while retaining asymptotic correctness. Furthermore, as a comparison, we analyze a natural analogue of SGLD that utilizes gradient clipping. Experimental validation on Bayesian regression and classification demonstrates that SGLRW remains stable in regimes where SGLD fails, including in the presence of heavy-tailed gradient noise, and matches or improves predictive performance.
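As background, the SGLD baseline that SGLRW is contrasted with is a one-line update: a half-step of (stochastic) gradient descent on the negative log-posterior plus Gaussian noise scaled to the step size. A minimal scalar sketch (the lattice-based SGLRW update itself is not reproduced here):

```python
import math
import random

def sgld_step(theta, stoch_grad, step_size, rng=random):
    """One Stochastic Gradient Langevin Dynamics update (scalar parameter):
    theta <- theta - (eps/2) * grad(-log p)(theta) + N(0, eps) noise.
    `stoch_grad` returns a (possibly minibatch) estimate of the gradient
    of the negative log-posterior.
    """
    noise = rng.gauss(0.0, math.sqrt(step_size))
    return theta - 0.5 * step_size * stoch_grad(theta) + noise

# Sampling a standard Gaussian posterior, where grad(-log p)(t) = t:
random.seed(0)
theta, chain = 0.0, []
for _ in range(50_000):
    theta = sgld_step(theta, lambda t: t, 1e-2)
    chain.append(theta)
# The chain's mean and variance approach 0 and 1 respectively.
```

The abstract's point is that when `stoch_grad` is noisy or heavy-tailed (small minibatches), this update can destabilize; SGLRW replaces the Gaussian perturbation with a lattice random walk that is reportedly more robust in those regimes.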
【34】A fully differentiable framework for training proxy Exchange Correlation Functionals for periodic systems
Title: A fully differentiable framework for training proxy Exchange Correlation Functionals for periodic systems
Link: https://arxiv.org/abs/2602.15923
Authors: Rakshit Kumar Singh, Aryan Amit Barsainyan, Bharath Ramsundar
Abstract: Density Functional Theory (DFT) is widely used for first-principles simulations in chemistry and materials science, but its computational cost remains a key limitation for large systems. Motivated by recent advances in ML-based exchange-correlation (XC) functionals, this paper introduces a differentiable framework that integrates machine learning models into DFT for solids and other periodic systems. The framework defines a clean API for neural network models that can act as drop-in replacements for conventional XC functionals and enables gradients to flow through the full self-consistent DFT workflow. The framework is implemented in Python using a PyTorch backend, making it fully differentiable and easy to use with standard deep learning tools. We integrate the implementation with the DeepChem library to promote the reuse of established models and to lower the barrier for experimentation. In initial benchmarks against established electronic structure packages (GPAW and PySCF), our models achieve relative errors on the order of 5-10%.
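To illustrate what a "drop-in XC functional" interface might look like, here is a plain-Python sketch: a callable mapping density to an exchange energy density, using the textbook LDA (Dirac/Slater) exchange. The interface is a hypothetical stand-in, not the framework's actual API; a neural model exposing the same `__call__` could replace it.

```python
class LDAExchange:
    """Local density approximation exchange functional.

    Implements the Dirac/Slater exchange energy density
        e_x(rho) = -(3/4) * (3/pi)^(1/3) * rho^(4/3),
    as a callable, mimicking a drop-in XC functional interface.
    """
    C = -0.75 * (3.0 / 3.141592653589793) ** (1.0 / 3.0)

    def __call__(self, rho):
        # Energy density per unit volume at density rho (rho >= 0).
        return self.C * rho ** (4.0 / 3.0)

xc = LDAExchange()
# A trained neural functional with the same signature could be swapped in,
# and gradients propagated through it during the self-consistent loop.
```

Because the functional is an ordinary differentiable function of the density, wrapping it in a PyTorch module (as the paper's framework does) lets autograd carry gradients through the full DFT workflow.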
【35】Generalized Leverage Score for Scalable Assessment of Privacy Vulnerability
Title: Generalized Leverage Score for Scalable Assessment of Privacy Vulnerability
Link: https://arxiv.org/abs/2602.15919
Authors: Valentin Dorseuil, Jamal Atif, Olivier Cappé
Abstract: Can the privacy vulnerability of individual data points be assessed without retraining models or explicitly simulating attacks? We answer affirmatively by showing that exposure to membership inference attack (MIA) is fundamentally governed by a data point's influence on the learned model. We formalize this in the linear setting by establishing a theoretical correspondence between individual MIA risk and the leverage score, identifying it as a principled metric for vulnerability. This characterization explains how data-dependent sensitivity translates into exposure, without the computational burden of training shadow models. Building on this, we propose a computationally efficient generalization of the leverage score for deep learning. Empirical evaluations confirm a strong correlation between the proposed score and MIA success, validating this metric as a practical surrogate for individual privacy risk assessment.
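The classical leverage score the abstract refers to is h_i = x_i^T (X^T X)^{-1} x_i for a design matrix X. A pure-Python sketch for a two-column design (the 2x2 case keeps the matrix inverse explicit); the deep-learning generalization proposed in the paper is not reproduced:

```python
def leverage_scores(X):
    """Leverage scores h_i = x_i^T (X^T X)^{-1} x_i for a list of
    two-dimensional rows X. Each h_i lies in [0, 1] and the scores sum
    to the number of columns; high-leverage rows are the influential
    points the abstract links to elevated membership-inference risk.
    """
    # Accumulate the 2x2 Gram matrix X^T X = [[a, b], [b, d]].
    a = sum(x[0] * x[0] for x in X)
    b = sum(x[0] * x[1] for x in X)
    d = sum(x[1] * x[1] for x in X)
    det = a * d - b * b
    inv = ((d / det, -b / det), (-b / det, a / det))  # (X^T X)^{-1}
    return [x[0] * (inv[0][0] * x[0] + inv[0][1] * x[1])
            + x[1] * (inv[1][0] * x[0] + inv[1][1] * x[1])
            for x in X]

# Intercept-plus-feature design: the outlying point x = 10 dominates.
X = [[1, 0], [1, 1], [1, 2], [1, 10]]
h = leverage_scores(X)
```

In this example the row `[1, 10]` gets by far the highest leverage, illustrating the paper's claim that influence on the fit, not the label, is what drives individual exposure.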
【36】NeuroSleep: Neuromorphic Event-Driven Single-Channel EEG Sleep Staging for Edge-Efficient Sensing
Title: NeuroSleep: Neuromorphic Event-Driven Single-Channel EEG Sleep Staging for Edge-Efficient Sensing
Link: https://arxiv.org/abs/2602.15888
Authors: Boyu Li, Xingchun Zhu, Yonghui Wu
Note: 14 pages, 5 figures, under review at Journal of Neural Engineering
Abstract: Reliable, continuous neural sensing on wearable edge platforms is fundamental to long-term health monitoring; however, for electroencephalography (EEG)-based sleep monitoring, dense high-frequency processing is often computationally prohibitive under tight energy budgets. To address this bottleneck, this paper proposes NeuroSleep, an integrated event-driven sensing and inference system for energy-efficient sleep staging. NeuroSleep first converts raw EEG into complementary multi-scale bipolar event streams using Residual Adaptive Multi-Scale Delta Modulation (R-AMSDM), enabling an explicit fidelity-sparsity trade-off at the sensing front end. Furthermore, NeuroSleep adopts a hierarchical inference architecture that comprises an Event-based Adaptive Multi-scale Response (EAMR) module for local feature extraction, a Local Temporal-Attention Module (LTAM) for context aggregation, and an Epoch-Leaky Integrate-and-Fire (ELIF) module to capture long-term state persistence. Experimental results using subject-independent 5-fold cross-validation on the Sleep-EDF Expanded dataset demonstrate that NeuroSleep achieves a mean accuracy of 74.2% with only 0.932 M parameters while reducing sparsity-adjusted effective operations by approximately 53.6% relative to dense processing. Compared with the representative dense Transformer baseline, NeuroSleep improves accuracy by 7.5% with a 45.8% reduction in computational load. By bridging neuromorphic encoding with state-aware modeling, NeuroSleep provides a scalable solution for always-on sleep analysis in resource-constrained wearable scenarios.
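The event-driven front end rests on delta modulation: emit a signed event only when the signal moves by more than a threshold from the last reconstructed level, so sparsity tracks signal activity. A toy single-scale version (the residual and adaptive multi-scale parts of R-AMSDM are not reproduced, and the names are illustrative):

```python
def delta_modulate(signal, threshold=0.1):
    """Single-scale bipolar delta modulator.

    Emits (time_index, +1) or (time_index, -1) events whenever the input
    crosses the next `threshold`-spaced level above or below the current
    reconstruction level. A flat signal produces no events at all, which
    is the source of the energy savings on edge hardware.
    """
    events, level = [], signal[0]
    for t, x in enumerate(signal):
        while x - level > threshold:    # upward crossings -> positive events
            level += threshold
            events.append((t, +1))
        while level - x > threshold:    # downward crossings -> negative events
            level -= threshold
            events.append((t, -1))
    return events

# A constant signal yields zero events; a 0.35 jump at threshold 0.1
# yields three positive events, one per crossed level.
```

The fidelity-sparsity trade-off the abstract describes is then just the choice of `threshold` (and, in R-AMSDM, of multiple thresholds at different scales): coarser thresholds mean fewer events but a blockier reconstruction.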