Click "Read the original" to visit arxivdaily.com, covering CS, Physics, Math, Economics, Statistics, Finance, Biology, and Electrical Engineering, with search, bookmarking, and more!
cs.LG: 175 papers today
LLM-related (19 papers)
【1】Biases in the Blind Spot: Detecting What LLMs Fail to Mention
Link: https://arxiv.org/abs/2602.10117
Authors: Iván Arcuschin, David Chanin, Adrià Garriga-Alonso, Oana-Maria Camburu
Note: 10 pages, under review at ICML 2026
Abstract: Large Language Models (LLMs) often provide chain-of-thought (CoT) reasoning traces that appear plausible, but may hide internal biases. We call these *unverbalized biases*. Monitoring models via their stated reasoning is therefore unreliable, and existing bias evaluations typically require predefined categories and hand-crafted datasets. In this work, we introduce a fully automated, black-box pipeline for detecting task-specific unverbalized biases. Given a task dataset, the pipeline uses LLM autoraters to generate candidate bias concepts. It then tests each concept on progressively larger input samples by generating positive and negative variations, and applies statistical techniques for multiple testing and early stopping. A concept is flagged as an unverbalized bias if it yields statistically significant performance differences while not being cited as justification in the model's CoTs. We evaluate our pipeline across six LLMs on three decision tasks (hiring, loan approval, and university admissions). Our technique automatically discovers previously unknown biases in these models (e.g., Spanish fluency, English proficiency, writing formality). In the same run, the pipeline also validates biases that were manually identified by prior work (gender, race, religion, ethnicity). More broadly, our proposed approach provides a practical, scalable path to automatic task-specific bias discovery.
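The flagging logic described above (a concept counts as an unverbalized bias when its positive and negative variations produce a statistically significant score gap that the model never cites in its CoT) can be sketched as follows. This is an illustrative stdlib-only reconstruction: the permutation test, the Bonferroni correction, and all concept names and scores are invented, not the paper's exact procedure.

```python
import random
import statistics

def permutation_pvalue(pos, neg, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of mean task scores."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(pos) - statistics.mean(neg))
    pooled = list(pos) + list(neg)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:len(pos)]) - statistics.mean(pooled[len(pos):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def flag_unverbalized(concepts, alpha=0.05):
    """concepts: name -> (pos_scores, neg_scores, cited_in_cot).
    Bonferroni-correct across all tested concepts, then flag significant
    gaps that the model never verbalized as a justification."""
    corrected_alpha = alpha / len(concepts)
    flagged = []
    for name, (pos, neg, cited_in_cot) in concepts.items():
        if permutation_pvalue(pos, neg) < corrected_alpha and not cited_in_cot:
            flagged.append(name)
    return flagged

concepts = {
    # clear performance gap, never mentioned in the CoT -> flagged
    "spanish_fluency": ([0.9] * 20, [0.4] * 20, False),
    # gap exists but the model states it as a reason -> verbalized, not flagged
    "gpa":             ([0.8] * 20, [0.3] * 20, True),
    # no gap -> not flagged
    "font_choice":     ([0.6] * 20, [0.6] * 20, False),
}
print(flag_unverbalized(concepts))  # → ['spanish_fluency']
```

Note that the two-stage criterion (significance and non-verbalization) is what separates an unverbalized bias from a bias the model openly reasons with.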
【2】LLMs Encode Their Failures: Predicting Success from Pre-Generation Activations
Link: https://arxiv.org/abs/2602.09924
Authors: William Lugoloobi, Thomas Foster, William Bankes, Chris Russell
Abstract: Running LLMs with extended reasoning on every problem is expensive, but determining which inputs actually require additional compute remains challenging. We investigate whether their own likelihood of success is recoverable from their internal representations before generation, and if this signal can guide more efficient inference. We train linear probes on pre-generation activations to predict policy-specific success on math and coding tasks, substantially outperforming surface features such as question length and TF-IDF. Using E2H-AMC, which provides both human and model performance on identical problems, we show that models encode a model-specific notion of difficulty that is distinct from human difficulty, and that this distinction increases with extended reasoning. Leveraging these probes, we demonstrate that routing queries across a pool of models can exceed the best-performing model whilst reducing inference cost by up to 70% on MATH, showing that internal representations enable practical efficiency gains even when they diverge from human intuitions about difficulty. Our code is available at: https://github.com/KabakaWilliam/llms_know_difficulty
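A minimal sketch of how such probes could drive routing, assuming each model's probe emits a success probability for the incoming query; the model names, costs, and threshold are illustrative, and the paper's actual routing rule may differ.

```python
def route(success_probs, costs, threshold=0.7):
    """Route a query to the cheapest model whose probe-predicted success
    probability clears the threshold; fall back to the model the probes
    rate highest if none qualifies.  All names and numbers are invented."""
    qualified = [m for m, p in success_probs.items() if p >= threshold]
    if qualified:
        return min(qualified, key=lambda m: costs[m])
    return max(success_probs, key=success_probs.get)

# Scores would come from linear probes on pre-generation activations.
probs = {"small-1.5b": 0.82, "medium-7b": 0.91, "large-70b": 0.97}
costs = {"small-1.5b": 1.0, "medium-7b": 4.0, "large-70b": 20.0}
print(route(probs, costs))  # easy query: the cheap model suffices
print(route({"small-1.5b": 0.2, "medium-7b": 0.5, "large-70b": 0.97}, costs))
```

Routing only the hard queries to the expensive model is what makes the reported cost reductions possible without sacrificing pool accuracy.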
【3】Routing, Cascades, and User Choice for LLMs
Link: https://arxiv.org/abs/2602.09902
Authors: Rafid Mahmood
Note: 23 pages, accepted at ICLR 2026
Abstract: To mitigate the trade-offs between performance and costs, LLM providers route user tasks to different models based on task difficulty and latency. We study the effect of LLM routing with respect to user behavior. We propose a game between an LLM provider with two models (standard and reasoning) and a user who can re-prompt or abandon tasks if the routed model cannot solve them. The user's goal is to maximize their utility minus the delay from using the model, while the provider minimizes the cost of servicing the user. We solve this Stackelberg game by fully characterizing the user best response and simplifying the provider problem. We observe that in nearly all cases, the optimal routing policy involves a static policy with no cascading that depends on the expected utility of the models to the user. Furthermore, we reveal a misalignment gap between the provider-optimal and user-preferred routes when the user's and provider's rankings of the models with respect to utility and cost differ. Finally, we demonstrate conditions for extreme misalignment where providers are incentivized to throttle the latency of the models to minimize their costs, consequently depressing user utility. The results yield simple threshold rules for single-provider, single-user interactions and clarify when routing, cascading, and throttling help or harm.
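The flavor of the threshold rules can be illustrated with a toy user best response, assuming scalar utilities and delays; the paper's full characterization (re-prompting, cascades, provider-side costs) is richer than this sketch.

```python
def user_best_response(u_std, u_rsn, delay_std, delay_rsn):
    """Toy threshold rule in the spirit of the paper's static policies:
    the user takes the model with the highest utility net of delay, and
    abandons the task if neither is worth the wait.  Utilities and
    delays here are illustrative scalars, not the paper's model."""
    net = {"standard": u_std - delay_std, "reasoning": u_rsn - delay_rsn}
    best = max(net, key=net.get)
    return best if net[best] > 0 else "abandon"

# The reasoning model is better (8 > 5) but its latency erases the gain.
print(user_best_response(u_std=5.0, u_rsn=8.0, delay_std=1.0, delay_rsn=6.0))
```

A misalignment gap appears as soon as the provider's cost ranking of the two models disagrees with the user's net-utility ranking above.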
【4】CoFEH: LLM-driven Feature Engineering Empowered by Collaborative Bayesian Hyperparameter Optimization
Link: https://arxiv.org/abs/2602.09851
Authors: Beicheng Xu, Keyao Ding, Wei Liu, Yupeng Lu, Bin Cui
Abstract: Feature Engineering (FE) is pivotal in automated machine learning (AutoML) but remains a bottleneck for traditional methods, which treat it as a black-box search, operating within rigid, predefined search spaces and lacking domain awareness. While Large Language Models (LLMs) offer a promising alternative by leveraging semantic reasoning to generate unbounded operators, existing methods fail to construct free-form FE pipelines, remaining confined to isolated subtasks such as feature generation. Most importantly, they are rarely optimized jointly with hyperparameter optimization (HPO) of the ML model, leading to greedy "FE-then-HPO" workflows that cannot capture strong FE-HPO interactions. In this paper, we present CoFEH, a collaborative framework that interleaves LLM-based FE and Bayesian HPO for robust end-to-end AutoML. CoFEH uses an LLM-driven FE optimizer powered by Tree of Thought (ToT) to explore flexible FE pipelines, a Bayesian optimization (BO) module to solve HPO, and a dynamic optimizer selector that realizes interleaved optimization by adaptively scheduling FE and HPO steps. Crucially, we introduce a mutual conditioning mechanism that shares context between LLM and BO, enabling mutually informed decisions. Experiments show that CoFEH not only outperforms traditional and LLM-based FE baselines, but also achieves superior end-to-end performance under joint optimization.
【5】Decomposing Reasoning Efficiency in Large Language Models
Link: https://arxiv.org/abs/2602.09805
Authors: Daniel Kaiser, Arnoldo Frigessi, Ali Ramezani-Kebrya, Benjamin Ricaud
Note: Preprint (under review). 29 pages, 4 figures
Abstract: Large language models trained for reasoning trade off inference tokens against accuracy, yet standard evaluations report only final accuracy, obscuring where tokens are spent or wasted. We introduce a trace-optional framework that decomposes token efficiency into interpretable factors: completion under a fixed token budget (avoiding truncation), conditional correctness given completion, and verbosity (token usage). When benchmark metadata provides per-instance workload proxies, we further factor verbosity into two components: mean verbalization overhead (tokens per work unit) and a coupling coefficient capturing how overhead scales with task workload. When reasoning traces are available, we add deterministic trace-quality measures (grounding, repetition, prompt copying) to separate degenerate looping from verbose-but-engaged reasoning, avoiding human labeling and LLM judges. Evaluating 25 models on CogniLoad, we find that accuracy and token-efficiency rankings diverge (Spearman $\rho = 0.63$), efficiency gaps are often driven by conditional correctness, and verbalization overhead varies by about 9 times (only weakly related to model scale). Our decomposition reveals distinct bottleneck profiles that suggest different efficiency interventions.
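The first two factors can be sketched on toy per-instance records; the multiplicative reading (accuracy = completion rate × conditional correctness) follows the abstract, while the field names and numbers below are assumed for illustration.

```python
def decompose_efficiency(records, budget):
    """Split accuracy into completion rate (finished within the token
    budget) and conditional correctness (correct given completion), plus
    a simple verbosity measure (tokens per work unit)."""
    n = len(records)
    completed = [r for r in records if r["tokens"] <= budget]
    completion_rate = len(completed) / n
    cond_correctness = sum(r["correct"] for r in completed) / max(1, len(completed))
    verbosity = sum(r["tokens"] for r in records) / sum(r["work_units"] for r in records)
    return {"completion": completion_rate,
            "cond_correctness": cond_correctness,
            "verbosity": verbosity}

records = [
    {"tokens": 800,  "correct": True,  "work_units": 4},
    {"tokens": 1200, "correct": False, "work_units": 4},  # truncated at budget 1000
    {"tokens": 600,  "correct": False, "work_units": 2},
]
stats = decompose_efficiency(records, budget=1000)
print(stats)
```

Here overall accuracy 1/3 factors as completion (2/3) times conditional correctness (1/2), which is exactly the kind of attribution a final-accuracy-only evaluation hides.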
【6】When Less is More: The LLM Scaling Paradox in Context Compression
Link: https://arxiv.org/abs/2602.09789
Authors: Ruishan Guo, Yibing Liu, Guoxin Ma, Yan Wang, Yueyang Zhang, Long Xia, Kecheng Chen, Zhiyuan Sun, Daiting Shi
Note: 10 pages, 4 figures, conference
Abstract: Scaling up model parameters has long been a prevalent training paradigm driven by the assumption that larger models yield superior generation capabilities. However, under lossy context compression in a compressor-decoder setup, we observe a Size-Fidelity Paradox: increasing the compressor size can lessen the faithfulness of reconstructed contexts even though training loss decreases. Through extensive experiments across models from 0.6B to 90B, we attribute this paradox to two dominant factors: 1) knowledge overwriting: larger models increasingly replace source facts with their own prior beliefs, e.g., "the white strawberry" → "the red strawberry"; and 2) semantic drift: larger models tend to paraphrase or restructure content instead of reproducing it verbatim, e.g., "Alice hit Bob" → "Bob hit Alice". By holding model size fixed, we reflect on the emergent properties of compressed context representations. We show that the culprit is not parameter count itself, but the excessive semantic capacity and amplified generative uncertainty that accompany scaling. Specifically, the increased rank of context embeddings facilitates prior knowledge intrusion, whereas higher entropy over token prediction distributions promotes rewriting. Our results complement existing evaluations of the context compression paradigm, evidencing a breakdown in scaling laws for faithful preservation in open-ended generation.
【7】LLM-FS: Zero-Shot Feature Selection for Effective and Interpretable Malware Detection
Link: https://arxiv.org/abs/2602.09634
Authors: Naveen Gill, Ajvad Haneef K, Madhu Kumar S D
Abstract: Feature selection (FS) remains essential for building accurate and interpretable detection models, particularly in high-dimensional malware datasets. Conventional FS methods such as Extra Trees, Variance Threshold, Tree-based models, Chi-Squared tests, ANOVA, Random Selection, and Sequential Attention rely primarily on statistical heuristics or model-driven importance scores, often overlooking the semantic context of features. Motivated by recent progress in LLM-driven FS, we investigate whether large language models (LLMs) can guide feature selection in a zero-shot setting, using only feature names and task descriptions, as a viable alternative to traditional approaches. We evaluate multiple LLMs (GPT-5.0, GPT-4.0, Gemini-2.5, etc.) on the EMBOD dataset (a fusion of the EMBER and BODMAS benchmark datasets), comparing them against established FS methods across several classifiers, including Random Forest, Extra Trees, MLP, and KNN. Performance is assessed using accuracy, precision, recall, F1, AUC, MCC, and runtime. Our results demonstrate that LLM-guided zero-shot feature selection achieves competitive performance with traditional FS methods while offering additional advantages in interpretability, stability, and reduced dependence on labeled data. These findings position zero-shot LLM-based FS as a promising alternative strategy for effective and interpretable malware detection, paving the way for knowledge-guided feature selection in security-critical applications.
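The zero-shot setup (the LLM sees only feature names and a task description, no data values) can be sketched as prompt construction plus defensive parsing of the reply. The feature names and prompt wording below are invented for illustration, not the EMBOD schema or the paper's prompts.

```python
def build_fs_prompt(task_description, feature_names, k):
    """Assemble a zero-shot feature-selection prompt from names alone."""
    return (
        f"Task: {task_description}\n"
        f"Features: {', '.join(feature_names)}\n"
        f"Select the {k} most useful features for this task. "
        "Answer with a comma-separated list of feature names only."
    )

def parse_selection(reply, feature_names, k):
    """Keep only names the dataset actually has, in the order the LLM
    ranked them, dropping duplicates and capping at k."""
    valid = set(feature_names)
    picked = []
    for name in (part.strip() for part in reply.split(",")):
        if name in valid and name not in picked:
            picked.append(name)
    return picked[:k]

features = ["entry_point", "num_sections", "file_size", "magic_bytes"]
print(build_fs_prompt("Detect malicious Windows binaries", features, k=2))

# A hypothetical LLM reply, with a hallucinated name and a duplicate.
reply = "num_sections, entry_point, made_up_feature, num_sections"
print(parse_selection(reply, features, k=2))  # → ['num_sections', 'entry_point']
```

Validating the reply against the real feature list matters in practice, since LLMs can return names that do not exist in the dataset.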
【8】AlignTune: Modular Toolkit for Post-Training Alignment of Large Language Models
Link: https://arxiv.org/abs/2602.09621
Authors: R E Zera Marveen Lyngkhoi, Chirag Chawla, Pratinav Seth, Utsav Avaiya, Soham Bhattacharjee, Mykola Khandoga, Rui Yuan, Vinay Kumar Sankarapu
Note: https://github.com/Lexsi-Labs/aligntune
Abstract: Post-training alignment is central to deploying large language models (LLMs), yet practical workflows remain split across backend-specific tools and ad-hoc glue code, making experiments hard to reproduce. We identify backend interference, reward fragmentation, and irreproducible pipelines as key obstacles in alignment research. We introduce AlignTune, a modular toolkit exposing a unified interface for supervised fine-tuning (SFT) and RLHF-style optimization with interchangeable TRL and Unsloth backends. AlignTune standardizes configuration, provides an extensible reward layer (rule-based and learned), and integrates evaluation over standard benchmarks and custom tasks. By isolating backend-specific logic behind a single factory boundary, AlignTune enables controlled comparisons and reproducible alignment experiments.
【9】On the Optimal Reasoning Length for RL-Trained Language Models
Link: https://arxiv.org/abs/2602.09591
Authors: Daisuke Nohara, Taishi Nakamura, Rio Yokota
Note: 15 pages, 10 figures. Submitted to the Workshop on Scaling Post-training for LLMs (SPOT) at ICLR 2026
Abstract: Reinforcement learning substantially improves reasoning in large language models, but it also tends to lengthen chain of thought outputs and increase computational cost during both training and inference. Though length control methods have been proposed, it remains unclear what the optimal output length is for balancing efficiency and performance. In this work, we compare several length control methods on two models, Qwen3-1.7B Base and DeepSeek-R1-Distill-Qwen-1.5B. Our results indicate that length penalties may hinder reasoning acquisition, while properly tuned length control can improve efficiency for models with strong prior reasoning. By extending prior work to RL-trained policies, we identify two failure modes: 1) long outputs increase dispersion, and 2) short outputs lead to under-thinking.
【10】Rollout-Training Co-Design for Efficient LLM-Based Multi-Agent Reinforcement Learning
Link: https://arxiv.org/abs/2602.09578
Authors: Zhida Jiang, Zhaolong Xing, Jiawei Lu, Yipei Niu, Qingyuan Sang, Liangxu Zhang, Wenquan Dai, Junhua Shu, Jiaxing Wang, Qiangyu Pei, Qiong Chen, Xinyu Liu, Fangming Liu, Ai Han, Zhen Chen, Ke Zhang
Abstract: Despite algorithm-level innovations for multi-agent reinforcement learning (MARL), the underlying networked infrastructure for large-scale MARL training remains underexplored. Existing training frameworks primarily optimize for single-agent scenarios and fail to address the unique system-level challenges of MARL, including rollout-training synchronization barriers, rollout load imbalance, and training resource underutilization. To bridge this gap, we propose FlexMARL, the first end-to-end training framework that holistically optimizes rollout, training, and their orchestration for large-scale LLM-based MARL. Specifically, FlexMARL introduces a joint orchestrator to manage data flow under the rollout-training disaggregated architecture. Building upon the experience store, a novel micro-batch driven asynchronous pipeline eliminates the synchronization barriers while providing strong consistency guarantees. The rollout engine adopts a parallel sampling scheme combined with hierarchical load balancing, which adapts to skewed inter/intra-agent request patterns. The training engine achieves on-demand hardware binding through agent-centric resource allocation. The training states of different agents are swapped via unified and location-agnostic communication. Empirical results on a large-scale production cluster demonstrate that FlexMARL achieves up to 7.3x speedup and improves hardware utilization by up to 5.6x compared to existing frameworks.
【11】Aligning Tree-Search Policies with Fixed Token Budgets in Test-Time Scaling of LLMs
Link: https://arxiv.org/abs/2602.09574
Authors: Sora Miyamoto, Daisuke Oba, Naoaki Okazaki
Abstract: Tree-search decoding is an effective form of test-time scaling for large language models (LLMs), but real-world deployment imposes a fixed per-query token budget that varies across settings. Existing tree-search policies are largely budget-agnostic, treating the budget as a termination condition, which can lead to late-stage over-branching or premature termination. We propose Budget-Guided MCTS (BG-MCTS), a tree-search decoding algorithm that aligns its search policy with the remaining token budget: it starts with broad exploration, then prioritizes refinement and answer completion as the budget depletes while reducing late-stage branching from shallow nodes. BG-MCTS consistently outperforms budget-agnostic tree-search baselines across different budgets on MATH500 and AIME24/25 with open-weight LLMs.
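One plausible reading of budget-guided selection is a UCT score whose exploration weight decays as the token budget is spent, so the search branches broadly early and refines late. This is a hedged sketch of that idea, not the paper's exact policy; all parameter values are illustrative.

```python
import math

def bg_uct_score(q_value, child_visits, parent_visits,
                 tokens_used, token_budget, c0=1.4):
    """UCT selection score with a budget-aware exploration term: the
    exploration constant shrinks linearly with the fraction of the
    token budget already spent (an assumption, not the paper's rule)."""
    remaining = max(0.0, 1.0 - tokens_used / token_budget)
    c = c0 * remaining
    return q_value + c * math.sqrt(math.log(parent_visits) / child_visits)

early = bg_uct_score(0.5, child_visits=2, parent_visits=50,
                     tokens_used=100, token_budget=4000)
late = bg_uct_score(0.5, child_visits=2, parent_visits=50,
                    tokens_used=3900, token_budget=4000)
print(early > late)  # exploration bonus decays as the budget depletes
```

With the bonus gone near exhaustion, selection reduces to picking the highest-value child, which matches the described shift from exploration to answer completion.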
【12】Bridging the Modality Gap in Roadside LiDAR: A Training-Free Vision-Language Model Framework for Vehicle Classification
Link: https://arxiv.org/abs/2602.09425
Authors: Yiqiao Li, Bo Shang, Jie Wei
Note: 12 pages, 10 figures, 4 tables
Abstract: Fine-grained truck classification is critical for intelligent transportation systems (ITS), yet current LiDAR-based methods face scalability challenges due to their reliance on supervised deep learning and labor-intensive manual annotation. Vision-Language Models (VLMs) offer promising few-shot generalization, but their application to roadside LiDAR is limited by a modality gap between sparse 3D point clouds and dense 2D imagery. We propose a framework that bridges this gap by adapting off-the-shelf VLMs for fine-grained truck classification without parameter fine-tuning. Our new depth-aware image generation pipeline applies noise removal, spatial and temporal registration, orientation rectification, morphological operations, and anisotropic smoothing to transform sparse, occluded LiDAR scans into depth-encoded 2D visual proxies. Validated on a real-world dataset of 20 vehicle classes, our approach achieves competitive classification accuracy with as few as 16-30 examples per class, offering a scalable alternative to data-intensive supervised baselines. We further observe a "Semantic Anchor" effect: text-based guidance regularizes performance in ultra-low-shot regimes ($k < 4$), but degrades accuracy in more-shot settings due to semantic mismatch. Furthermore, we demonstrate the efficacy of this framework as a cold-start strategy, using VLM-generated labels to bootstrap lightweight supervised models. Notably, the few-shot VLM-based model achieves a correct classification rate of over 75 percent for specific drayage categories (20ft, 40ft, and 53ft containers) entirely without costly training or fine-tuning, significantly reducing the intensive demands of initial manual labeling and thus making the approach practical for ITS applications.
【13】Large Language Models for Designing Participatory Budgeting Rules
Link: https://arxiv.org/abs/2602.09349
Authors: Nguyen Thach, Xingchen Sha, Hau Chan
Note: Accepted as a full paper at AAMAS 2026
Abstract: Participatory budgeting (PB) is a democratic paradigm for deciding the funding of public projects given the residents' preferences, which has been adopted in numerous cities across the world. The main focus of PB is designing rules: functions that return feasible budget allocations for a set of projects subject to some budget constraint. Designing PB rules that optimize both utility and fairness objectives based on agent preferences has been challenging due to the extensive domain knowledge required and the proven trade-off between the two notions. Recently, large language models (LLMs) have been increasingly employed for automated algorithmic design. Given the resemblance of PB rules to algorithms for classical knapsack problems, in this paper, we introduce a novel framework, named LLMRule, that addresses the limitations of existing works by incorporating LLMs into an evolutionary search procedure for automating the design of PB rules. Our experimental results, evaluated on more than 600 real-world PB instances obtained from the U.S., Canada, Poland, and the Netherlands with different representations of agent preferences, demonstrate that the LLM-generated rules generally outperform existing handcrafted rules in terms of overall utility while still maintaining a similar degree of fairness.
【14】Effective MoE-based LLM Compression by Exploiting Heterogeneous Inter-Group Experts Routing Frequency and Information Density
Link: https://arxiv.org/abs/2602.09316
Authors: Zhendong Mi, Yixiao Chen, Pu Zhao, Xiaodong Yu, Hao Wang, Yanzhi Wang, Shaoyi Huang
Abstract: Mixture-of-Experts (MoE) based Large Language Models (LLMs) have achieved superior performance, yet the massive memory overhead caused by storing multiple expert networks severely hinders their practical deployment. Singular Value Decomposition (SVD)-based compression has emerged as a promising post-training technique; however, most existing methods apply uniform rank allocation or rely solely on static weight properties. This overlooks the substantial heterogeneity in expert utilization observed in MoE models, where frequent routing patterns and intrinsic information density vary significantly across experts. In this work, we propose RFID-MoE, an effective framework for MoE compression by exploiting heterogeneous Routing Frequency and Information Density. We first introduce a fused metric that combines expert activation frequency with effective rank to measure expert importance, adaptively allocating higher ranks to critical expert groups under a fixed budget. Moreover, instead of discarding compression residuals, we reconstruct them via a parameter-efficient sparse projection mechanism to recover lost information with minimal parameter overhead. Extensive experiments on representative MoE LLMs (e.g., Qwen3, DeepSeekMoE) across multiple compression ratios demonstrate that RFID-MoE consistently outperforms state-of-the-art methods like MoBE and D2-MoE. Notably, RFID-MoE achieves a perplexity of 16.92 on PTB with the Qwen3-30B model at a 60% compression ratio, reducing perplexity by over 8.0 compared to baselines, and improves zero-shot accuracy on HellaSwag by approximately 8%.
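The budgeted rank-allocation step can be sketched with a fused importance score and largest-remainder rounding; the multiplicative fusion of frequency and effective rank, and all expert names and numbers, are assumptions for illustration, not the paper's exact metric.

```python
def fused_importance(routing_freq, effective_rank):
    """Fused expert-importance score combining routing frequency with
    effective rank (a multiplicative combination is assumed here)."""
    return {e: routing_freq[e] * effective_rank[e] for e in routing_freq}

def allocate_ranks(importance, rank_budget):
    """Largest-remainder allocation of SVD ranks: more important experts
    receive higher ranks while the total stays exactly on budget."""
    total = sum(importance.values())
    raw = {e: rank_budget * w / total for e, w in importance.items()}
    ranks = {e: int(raw[e]) for e in raw}          # floor of each share
    leftover = rank_budget - sum(ranks.values())
    # hand out the remaining rank units by largest fractional part
    for e in sorted(raw, key=lambda e: raw[e] - ranks[e], reverse=True)[:leftover]:
        ranks[e] += 1
    return ranks

imp = fused_importance({"e0": 0.50, "e1": 0.30, "e2": 0.20},
                       {"e0": 40, "e1": 35, "e2": 10})
print(allocate_ranks(imp, rank_budget=64))  # total stays exactly at 64
```

The point of the fused score is visible in the toy numbers: expert e2 is routed to often enough to matter, but its low effective rank still pushes its allocation down.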
【15】Empowering Contrastive Federated Sequential Recommendation with LLMs
Link: https://arxiv.org/abs/2602.09306
Authors: Thi Minh Chau Nguyen, Minh Hieu Nguyen, Duc Anh Nguyen, Xuan Huong Tran, Thanh Trung Huynh, Quoc Viet Hung Nguyen
Abstract: Federated sequential recommendation (FedSeqRec) aims to perform next-item prediction while keeping user data decentralised, yet model quality is frequently constrained by fragmented, noisy, and homogeneous interaction logs stored on individual devices. Many existing approaches attempt to compensate through manual data augmentation or additional server-side constraints, but these strategies either introduce limited semantic diversity or increase system overhead. To overcome these challenges, we propose LUMOS, a parameter-isolated FedSeqRec architecture that integrates large language models (LLMs) as local semantic generators. Instead of sharing gradients or auxiliary parameters, LUMOS privately invokes an on-device LLM to construct three complementary sequence variants from each user history: (i) future-oriented trajectories that infer plausible behavioural continuations, (ii) semantically equivalent rephrasings that retain user intent while diversifying interaction patterns, and (iii) preference-inconsistent counterfactuals that serve as informative negatives. These synthesized sequences are jointly encoded within the federated backbone through a tri-view contrastive optimisation scheme, enabling richer representation learning without exposing sensitive information. Experimental results across three public benchmarks show that LUMOS achieves consistent gains over competitive centralised and federated baselines on HR@20 and NDCG@20. In addition, the use of semantically grounded positive signals and counterfactual negatives improves robustness under noisy and adversarial environments, even without dedicated server-side protection modules. Overall, this work demonstrates the potential of LLM-driven semantic generation as a new paradigm for advancing privacy-preserving federated recommendation.
【16】Reward Modeling for Reinforcement Learning-Based LLM Reasoning: Design, Challenges, and Evaluation
Link: https://arxiv.org/abs/2602.09305
Authors: Pei-Chi Pan, Yingbin Liang, Sen Lin
Abstract: Large Language Models (LLMs) demonstrate transformative potential, yet their reasoning remains inconsistent and unreliable. Reinforcement learning (RL)-based fine-tuning is a key mechanism for improvement, but its effectiveness is fundamentally governed by reward design. Despite its importance, the relationship between reward modeling and core LLM challenges--such as evaluation bias, hallucination, distribution shift, and efficient learning--remains poorly understood. This work argues that reward modeling is not merely an implementation detail but a central architect of reasoning alignment, shaping what models learn, how they generalize, and whether their outputs can be trusted. We introduce Reasoning-Aligned Reinforcement Learning (RARL), a unifying framework that systematizes diverse reward paradigms for multi-step reasoning. Within this framework, we present a taxonomy of reward mechanisms, analyze reward hacking as a pervasive failure mode, and examine how reward signals unify challenges ranging from inference-time scaling to hallucination mitigation. We further critically evaluate existing benchmarks, highlighting vulnerabilities such as data contamination and reward misalignment, and outline directions for more robust evaluation. By integrating fragmented research threads and clarifying the interplay between reward design and fundamental reasoning capabilities, this work provides a foundational roadmap for building reasoning models that are robust, verifiable, and trustworthy.
【17】$n$-Musketeers: Reinforcement Learning Shapes Collaboration Among Language Models
标题:$n$-火枪手:强化学习塑造语言模型之间的协作
链接:https://arxiv.org/abs/2602.09173
作者:Ryozo Masukawa,Sanggeon Yun,Hyunwoo Oh,SuhgHeon Jeong,Raheeb Hassa,Hanning Chen,Wenjun Huang,Mahdi Imani,Pietro Mercati,Nathaniel D. Bastian,Mohsen Imani
摘要:可验证奖励强化学习(RLVR)的最新进展表明,小型专用语言模型(SLM)可以在不依赖大型单体LLM的情况下表现出结构化推理。我们引入软隐藏状态协作,通过一个可训练的注意力接口,将多个异构的冻结SLM专家经由其内部表示进行集成。在Reasoning Gym和GSM8K上的实验表明,这种潜在空间集成与强大的单模型RLVR基线相比具有竞争力。消融实验进一步揭示了专家利用的双重机制:对于较简单的算术领域,性能增益在很大程度上可由静态的专家偏好来解释;而更具挑战性的设置会在训练过程中诱导出日益集中且结构化的专家注意力,表明路由器在连接相关专家的方式上涌现出专门化。总体而言,隐藏状态协作为利用冻结专家提供了一种紧凑的机制,同时为观察专家利用模式及其在RLVR下的演化提供了一个窗口。
摘要:Recent progress in reinforcement learning with verifiable rewards (RLVR) shows that small, specialized language models (SLMs) can exhibit structured reasoning without relying on large monolithic LLMs. We introduce soft hidden-state collaboration, where multiple heterogeneous frozen SLM experts are integrated through their internal representations via a trainable attention interface. Experiments on Reasoning Gym and GSM8K show that this latent integration is competitive with strong single-model RLVR baselines. Ablations further reveal a dual mechanism of expert utilization: for simpler arithmetic domains, performance gains can largely be explained by static expert preferences, whereas more challenging settings induce increasingly concentrated and structured expert attention over training, indicating emergent specialization in how the router connects to relevant experts. Overall, hidden-state collaboration provides a compact mechanism for leveraging frozen experts, while offering an observational window into expert utilization patterns and their evolution under RLVR.
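As an illustration, the attention-weighted fusion of frozen expert hidden states described above can be sketched in a few lines. This is a minimal stand-in, not the paper's architecture: the real interface is a trainable attention module, and the `query` vector below merely plays the role of its learned parameters.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def collaborate(query, expert_states):
    # Score each frozen expert's hidden state against a query vector,
    # softmax the scores, and return the weighted mixture of states.
    scores = [sum(q * h for q, h in zip(query, st)) for st in expert_states]
    weights = softmax(scores)
    dim = len(expert_states[0])
    fused = [sum(w * st[i] for w, st in zip(weights, expert_states))
             for i in range(dim)]
    return fused, weights

# Two frozen "experts" with orthogonal hidden states; the query
# strongly prefers the first, so the fused state follows it.
experts = [[1.0, 0.0], [0.0, 1.0]]
fused, weights = collaborate([8.0, 0.0], experts)
```

The experts stay frozen throughout; only the scoring interface would be trained under RLVR.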
【18】UniComp: A Unified Evaluation of Large Language Model Compression via Pruning, Quantization and Distillation
标题:UniComp:通过修剪、量化和蒸馏对大型语言模型压缩进行统一评估
链接:https://arxiv.org/abs/2602.09130
作者:Jonathan von Rad,Yong Cao,Andreas Geiger
备注:18 pages, 6 figures. Submitted to ACL 2026
摘要:模型压缩对于部署大型语言模型(LLM)越来越重要,但现有的评估方法覆盖有限,且主要集中在以知识为中心的基准上。因此,我们引入UniComp,一个用于比较剪枝、量化和知识蒸馏的统一评估框架。UniComp从三个维度评估压缩模型:性能、可靠性和效率,使用一组多样的面向能力与安全的基准,并辅以硬件感知的效率分析。通过在40多个数据集上对现代LLM的六种压缩技术进行广泛评估,我们发现:(i)压缩表现出一致的知识偏向,知识密集型任务相对得到保留,而推理、多语言和指令遵循能力大幅下降;(ii)量化在保留性能与效率之间提供了最佳的总体折中,而蒸馏以高昂的计算成本换取强劲的运行时加速;(iii)任务特定的校准可将剪枝模型的推理能力显著提升多达50%。
摘要:Model compression is increasingly essential for deploying large language models (LLMs), yet existing evaluations are limited in method coverage and focus primarily on knowledge-centric benchmarks. Thus, we introduce UniComp, a unified evaluation framework for comparing pruning, quantization, and knowledge distillation. UniComp evaluates compressed models along three dimensions: performance, reliability, and efficiency, using a diverse set of capability- and safety-oriented benchmarks together with a hardware-aware efficiency analysis. Through extensive evaluation of six compression techniques on modern LLMs across more than 40 datasets, we find that (i) compression exhibits a consistent knowledge bias, where knowledge-intensive tasks are relatively preserved while reasoning, multilingual, and instruction-following capabilities degrade substantially; (ii) quantization provides the best overall trade-off between retained performance and efficiency, whereas distillation yields strong runtime acceleration gains at high computational cost; and (iii) task-specific calibration can significantly improve the reasoning ability of pruned models by up to 50%.
【19】Distributed Hybrid Parallelism for Large Language Models: Comparative Study and System Design Guide
标题:大型语言模型的分布式混合并行:比较研究与系统设计指南
链接:https://arxiv.org/abs/2602.09109
作者:Hossam Amer,Rezaul Karim,Ali Pourranjbar,Weiwei Zhang,Walid Ahmed,Boxing Chen
摘要:随着大型语言模型(LLM)的快速增长,已经开发了各种方法来跨硬件设备分配计算和存储器,以实现有效的训练和推理。虽然现有的调查提供了这些技术的描述性概述,但对其好处和权衡的系统分析以及这些见解如何为设计最佳分布式系统的原则性方法提供信息仍然有限。本文提供了一个全面的审查集体行动和分布式并行战略,辅以数学公式,以加深理论理解。我们进一步研究混合并行化设计,强调模型部署的不同阶段(包括训练和推理)的通信计算重叠。最新的进展,自动搜索最佳的混合并行化策略,使用成本模型进行了讨论。此外,我们提出的主流架构类别的案例研究,以揭示经验的见解,指导研究人员和从业人员在并行策略的选择。最后,我们强调了当前LLM培训模式的开放性挑战和局限性,并概述了下一代大规模模型开发的有希望的方向。
摘要:With the rapid growth of large language models (LLMs), a wide range of methods have been developed to distribute computation and memory across hardware devices for efficient training and inference. While existing surveys provide descriptive overviews of these techniques, systematic analysis of their benefits and trade offs and how such insights can inform principled methodology for designing optimal distributed systems remain limited. This paper offers a comprehensive review of collective operations and distributed parallel strategies, complemented by mathematical formulations to deepen theoretical understanding. We further examine hybrid parallelization designs, emphasizing communication computation overlap across different stages of model deployment, including both training and inference. Recent advances in automated search for optimal hybrid parallelization strategies using cost models are also discussed. Moreover, we present case studies with mainstream architecture categories to reveal empirical insights to guide researchers and practitioners in parallelism strategy selection. Finally, we highlight open challenges and limitations of current LLM training paradigms and outline promising directions for the next generation of large scale model development.
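The survey's discussion of collective operations and cost models can be made concrete with the standard bandwidth formulas. The sketch below uses the well-known per-device communication volume of ring all-reduce and all-gather; these are textbook quantities, not results specific to this paper.

```python
def ring_allreduce_bytes(n_bytes, p):
    # Ring all-reduce = reduce-scatter + all-gather; each phase moves
    # (p - 1)/p of the buffer per device, so 2*(p-1)/p * n_bytes total.
    return 2 * (p - 1) / p * n_bytes

def allgather_bytes(shard_bytes, p):
    # All-gather: each device receives the other p - 1 shards.
    return (p - 1) * shard_bytes

# 1 GB of gradients across 8 data-parallel GPUs: ~1.75 GB on the
# wire per device per step, nearly independent of p for large p.
per_device = ring_allreduce_bytes(10**9, 8)
```

Cost models like those discussed in the paper compare such per-device volumes (times link bandwidth, plus latency terms) across candidate hybrid parallelization layouts.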
Graph相关(图学习|图神经网络|图优化等)(6篇)
【1】Differentiable Tripartite Modularity for Clustering Heterogeneous Graphs
标题:用于异构图聚类的可微三方模块度
链接:https://arxiv.org/abs/2602.09864
作者:Benoît Hurpeau
备注:12 pages, 3 figures
摘要:异构关系数据的聚类仍然是图学习的核心挑战,尤其当交互涉及两种以上类型的实体时。虽然DMoN等可微模块度目标已在同构图和二部图上实现了端到端的社区检测,但将这些方法扩展到更高阶的关系结构仍非易事。在本工作中,我们为由三类节点通过中介交互相连的图引入了一种可微的三方模块度形式。社区结构由三方图上的加权共路径定义,并配以精确的因子化计算,避免了显式构建稠密的三阶张量。我们在枢轴节点处引入结构归一化,以控制极端的度异质性并保证优化的稳定性。所得目标可与图神经网络以端到端方式联合优化,同时保持对边数的线性复杂度。我们在大规模城市地籍数据上验证了所提框架,它表现出稳健的收敛行为,并产生空间上连贯的划分。这些结果凸显了可微三方模块度可作为异构图无监督聚类的通用方法构建模块。
摘要:Clustering heterogeneous relational data remains a central challenge in graph learning, particularly when interactions involve more than two types of entities. While differentiable modularity objectives such as DMoN have enabled end-to-end community detection on homogeneous and bipartite graphs, extending these approaches to higher-order relational structures remains non-trivial. In this work, we introduce a differentiable formulation of tripartite modularity for graphs composed of three node types connected through mediated interactions. Community structure is defined in terms of weighted co-paths across the tripartite graph, together with an exact factorized computation that avoids the explicit construction of dense third-order tensors. A structural normalization at pivot nodes is introduced to control extreme degree heterogeneity and ensure stable optimization. The resulting objective can be optimized jointly with a graph neural network in an end-to-end manner, while retaining linear complexity in the number of edges. We validate the proposed framework on large-scale urban cadastral data, where it exhibits robust convergence behavior and produces spatially coherent partitions. These results highlight differentiable tripartite modularity as a generic methodological building block for unsupervised clustering of heterogeneous graphs.
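The factorized co-path computation the abstract describes can be illustrated directly: counting weighted a-b-c paths reduces to the matrix product X @ Y, so the dense |A|x|B|x|C| tensor never needs to be built. The pivot normalization below (dividing by pivot degree) is an assumed stand-in, since the abstract does not give its exact form.

```python
def copath_weights(X, Y, normalize_pivot=True):
    # Weighted co-paths a-b-c in a tripartite graph, computed as a
    # factorized product X @ Y instead of materializing the dense
    # |A|x|B|x|C| tensor. X: |A|x|B| adjacency rows; Y: |B|x|C| rows.
    # normalize_pivot optionally damps hub pivots by their degree
    # (an illustrative assumption, not the paper's exact scheme).
    nb = len(Y)
    if normalize_pivot:
        deg = [sum(X[a][b] for a in range(len(X))) + sum(Y[b])
               for b in range(nb)]
        scale = [1.0 / d if d else 0.0 for d in deg]
    else:
        scale = [1.0] * nb
    nc = len(Y[0])
    W = [[0.0] * nc for _ in X]
    for a, xrow in enumerate(X):
        for b, xab in enumerate(xrow):
            if xab:
                s = xab * scale[b]
                for c, ybc in enumerate(Y[b]):
                    W[a][c] += s * ybc
    return W
```

Iterating only over nonzero entries keeps the cost linear in the number of edges, matching the complexity claim in the abstract.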
【2】PlugSI: Plug-and-Play Test-Time Graph Adaptation for Spatial Interpolation
标题:PlugSI:用于空间插值的即插即用测试时图自适应
链接:https://arxiv.org/abs/2602.09824
作者:Xuhang Wu,Zhuoxuan Liang,Wei Li,Xiaohua Jia,Sumi Helal
备注:Accepted at DASFAA 2026 (Full Research Paper)
摘要:随着物联网和边缘计算的快速发展,传感器网络已变得不可或缺,推动了大规模传感器部署的需求。然而,高昂的部署成本阻碍了它们的可扩展性。为了解决这些问题,空间插值(SI)引入虚拟传感器来利用图结构从观察到的传感器推断读数。然而,目前基于图的SI方法依赖于预先训练的模型,在测试时缺乏对更大和看不见的图的适应,并且忽略了测试数据的利用。为了解决这些问题,我们提出了PlugSI,即插即用框架,通过两个关键的创新改进测试时间图。首先,我们设计了一个未知拓扑适配器(UTA),它在测试时适应每个小批量的新图结构,增强了SI预训练模型的泛化能力。其次,我们引入了一个时间平衡适配器(TBA),它保持了一个稳定的历史共识,以指导UTA适应,并防止当前批次中的噪声造成的漂移。从经验上讲,大量的实验证明PlugSI可以无缝集成到现有的基于图形的SI方法中,并提供显著的改进(例如,MAE降低10.81%)。
摘要:With the rapid advancement of IoT and edge computing, sensor networks have become indispensable, driving the need for large-scale sensor deployment. However, the high deployment cost hinders their scalability. To tackle the issues, Spatial Interpolation (SI) introduces virtual sensors to infer readings from observed sensors, leveraging graph structure. However, current graph-based SI methods rely on pre-trained models, lack adaptation to larger and unseen graphs at test-time, and overlook test data utilization. To address these issues, we propose PlugSI, a plug-and-play framework that refines test-time graph through two key innovations. First, we design an Unknown Topology Adapter (UTA) that adapts to the new graph structure of each small-batch at test-time, enhancing the generalization of SI pre-trained models. Second, we introduce a Temporal Balance Adapter (TBA) that maintains a stable historical consensus to guide UTA adaptation and prevent drifting caused by noise in the current batch. Empirically, extensive experiments demonstrate PlugSI can be seamlessly integrated into existing graph-based SI methods and provide significant improvement (e.g., a 10.81% reduction in MAE).
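A minimal sketch of how a "stable historical consensus" can guide noisy per-batch adaptation, using a plain exponential moving average. The EMA form, `beta`, and the blend below are illustrative assumptions; the abstract does not specify the TBA at this level of detail.

```python
def ema_update(consensus, batch_stat, beta=0.9):
    # Exponential moving average over per-batch statistics: a simple
    # way to keep a stable historical consensus across test batches.
    if consensus is None:
        return list(batch_stat)
    return [beta * c + (1 - beta) * b
            for c, b in zip(consensus, batch_stat)]

def guided_stat(consensus, batch_stat, trust=0.5):
    # Blend the current batch's statistic toward the consensus so a
    # single noisy batch cannot dominate the adaptation signal.
    return [trust * c + (1 - trust) * b
            for c, b in zip(consensus, batch_stat)]
```

In a PlugSI-like loop, each mini-batch would first be blended against the consensus before the topology adapter updates, then folded back into the consensus via the EMA.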
【3】BRAVA-GNN: Betweenness Ranking Approximation Via Degree MAss Inspired Graph Neural Network
标题:BRAVA-GNN:基于度质量(Degree MAss)启发的图神经网络的介数排名近似
链接:https://arxiv.org/abs/2602.09716
作者:Justin Dachille,Aurora Rossi,Sunil Kumar Maurya,Frederik Mallmann-Trenn,Xin Liu,Frédéric Giroire,Tsuyoshi Murata,Emanuele Natale
备注:Submitted to KDD
摘要:计算网络中节点的重要性是一个长期存在的基本问题,推动了对各种中心性度量的广泛研究。介数中心性是其中特别著名的一种,但在大规模网络上的计算代价令人望而却步。因此,已有图神经网络(GNN)模型被提出,用于预测节点按相对介数中心性的排名。然而,最先进的方法无法泛化到道路网络等高直径图。我们提出BRAVA-GNN,一种轻量级GNN架构,它利用了经验观察到的将介数中心性与基于度的量(特别是多跳度质量)联系起来的相关性。这一相关性促使我们使用度质量作为尺寸不变的节点特征,并使用与真实网络度分布紧密匹配的合成训练图。此外,以往工作依赖无标度合成图,而我们利用双曲随机图模型,它能再现无标度范围之外的幂律指数,从而更好地捕捉道路网络等真实图的结构。这一设计使BRAVA-GNN能够在不同的图族间泛化,同时使用的参数比最轻量的现有GNN基线少54倍。在涵盖社交、网页、电子邮件和道路图的19个真实网络上的大量实验表明,BRAVA-GNN在Kendall-Tau相关性上实现了高达214%的改进,推理时间比最先进的基于GNN的方法快至70倍,在具有挑战性的道路网络上尤为明显。
摘要:Computing node importance in networks is a long-standing fundamental problem that has driven extensive study of various centrality measures. A particularly well-known centrality measure is betweenness centrality, which becomes computationally prohibitive on large-scale networks. Graph Neural Network (GNN) models have thus been proposed to predict node rankings according to their relative betweenness centrality. However, state-of-the-art methods fail to generalize to high-diameter graphs such as road networks. We propose BRAVA-GNN, a lightweight GNN architecture that leverages the empirically observed correlation linking betweenness centrality to degree-based quantities, in particular multi-hop degree mass. This correlation motivates the use of degree masses as size-invariant node features and synthetic training graphs that closely match the degree distributions of real networks. Furthermore, while previous work relies on scale-free synthetic graphs, we leverage the hyperbolic random graph model, which reproduces power-law exponents outside the scale-free regime, better capturing the structure of real-world graphs like road networks. This design enables BRAVA-GNN to generalize across diverse graph families while using 54x fewer parameters than the most lightweight existing GNN baseline. Extensive experiments on 19 real-world networks, spanning social, web, email, and road graphs, show that BRAVA-GNN achieves up to 214% improvement in Kendall-Tau correlation and up to 70x speedup in inference time over state-of-the-art GNN-based approaches, particularly on challenging road networks.
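The multi-hop degree mass feature named in the abstract can be computed with a simple breadth-first search: sum the degrees of all nodes within k hops. This sketch assumes that reading of the term; the paper may normalize or weight hops differently.

```python
from collections import deque

def khop_degree_mass(adj, node, k):
    # Sum of degrees of all nodes within k hops of `node` (the node
    # itself included), via breadth-first search. `adj` maps each
    # node to its neighbour list.
    seen = {node}
    queue = deque([(node, 0)])
    mass = 0
    while queue:
        u, dist = queue.popleft()
        mass += len(adj[u])
        if dist < k:
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append((v, dist + 1))
    return mass

# On a path 0-1-2-3, the 1-hop mass at node 0 is deg(0)+deg(1) = 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

Computed for several values of k, such masses form a small size-invariant feature vector per node, which is the kind of input the abstract motivates.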
【4】Diffusion-Guided Pretraining for Brain Graph Foundation Models
标题:脑图基础模型的扩散引导预训练
链接:https://arxiv.org/abs/2602.09437
作者:Xinxu Wei,Rong Zhou,Lifang He,Yu Zhang
备注:18 pages
摘要:随着人们对大脑信号基础模型的兴趣越来越大,基于图的预训练已经成为从连接体数据中学习可转移表示的一个有前途的范例。然而,现有的对比和掩蔽自动编码器方法通常依赖于朴素的随机丢弃或掩蔽来增强,这不适合脑图和超图,因为它破坏了语义上有意义的连接模式。此外,常用的图形级读出和重建方案无法捕获全局结构信息,限制了学习表示的鲁棒性。在这项工作中,我们提出了一个统一的基于扩散的预训练框架,解决了这两个限制。首先,扩散被设计用于指导结构感知的丢弃和掩蔽策略,在保持有效的预训练多样性的同时保留脑图语义。其次,扩散通过允许图嵌入和掩蔽节点从全局相关区域聚合信息来实现拓扑感知的图级读出和节点级全局重建。在多个神经成像数据集上进行的广泛实验,超过25,000名受试者和60,000次扫描,涉及各种精神疾病和大脑图谱,证明了一致的性能改善。
摘要:With the growing interest in foundation models for brain signals, graph-based pretraining has emerged as a promising paradigm for learning transferable representations from connectome data. However, existing contrastive and masked autoencoder methods typically rely on naive random dropping or masking for augmentation, which is ill-suited for brain graphs and hypergraphs as it disrupts semantically meaningful connectivity patterns. Moreover, commonly used graph-level readout and reconstruction schemes fail to capture global structural information, limiting the robustness of learned representations. In this work, we propose a unified diffusion-based pretraining framework that addresses both limitations. First, diffusion is designed to guide structure-aware dropping and masking strategies, preserving brain graph semantics while maintaining effective pretraining diversity. Second, diffusion enables topology-aware graph-level readout and node-level global reconstruction by allowing graph embeddings and masked nodes to aggregate information from globally related regions. Extensive experiments across multiple neuroimaging datasets with over 25,000 subjects and 60,000 scans involving various mental disorders and brain atlases demonstrate consistent performance improvements.
【5】Enhanced Graph Transformer with Serialized Graph Tokens
标题:具有序列化图形令牌的增强图形Transformer
链接:https://arxiv.org/abs/2602.09065
作者:Ruixiang Wang,Yuyang Hong,Shiming Xiang,Chunhong Pan
备注:ICASSP 2026
摘要:Transformers在图形学习方面取得了成功,特别是对于节点级任务。然而,现有的方法在生成图级表示时遇到信息瓶颈。流行的单令牌范式未能充分利用编码令牌序列中的自我注意的固有强度,并退化为节点信号的加权和。为了解决这个问题,我们设计了一种新的串行化令牌范式,以更有效地封装全局信号。具体来说,提出了一种图序列化方法,将节点信号聚合成序列化的图令牌,并自动涉及位置编码。然后,堆叠的自我注意层被应用于编码该令牌序列并捕获其内部依赖关系。我们的方法可以通过对多个图形标记之间的复杂交互进行建模来产生更具表现力的图形表示。实验结果表明,我们的方法在几个图级基准测试中取得了最先进的结果。消融研究验证了所提出的模块的有效性。
摘要:Transformers have demonstrated success in graph learning, particularly for node-level tasks. However, existing methods encounter an information bottleneck when generating graph-level representations. The prevalent single token paradigm fails to fully leverage the inherent strength of self-attention in encoding token sequences, and degenerates into a weighted sum of node signals. To address this issue, we design a novel serialized token paradigm to encapsulate global signals more effectively. Specifically, a graph serialization method is proposed to aggregate node signals into serialized graph tokens, with positional encoding being automatically involved. Then, stacked self-attention layers are applied to encode this token sequence and capture its internal dependencies. Our method can yield more expressive graph representations by modeling complex interactions among multiple graph tokens. Experimental results show that our method achieves state-of-the-art results on several graph-level benchmarks. Ablation studies verify the effectiveness of the proposed modules.
【6】How Far Can You Grow? Characterizing the Extrapolation Frontier of Graph Generative Models for Materials Science
标题:你能生长到多大?刻画材料科学图生成模型的外推前沿
链接:https://arxiv.org/abs/2602.09309
作者:Can Polat,Erchin Serpedin,Mustafa Kurban,Hasan Kurban
摘要:每一个晶体材料生成模型都存在一个临界结构尺寸,超过该尺寸其输出便悄然变得不可靠,我们称之为外推前沿。尽管它对纳米材料设计有直接影响,这一前沿从未被系统地测量过。我们介绍了RADII,一个按半径划分的基准,包含约75,000个纳米粒子结构(55-11,298个原子),将半径作为连续的缩放旋钮,在无数据泄漏的划分下追踪从分布内到分布外范围的生成质量。RADII提供了针对该前沿的诊断:按半径的误差曲线可精确定位每个架构的缩放上限,表面-内部分解可检验失效源于边界还是体相,跨指标失效排序可揭示结构保真度的哪一方面最先崩溃。对五种最先进的架构进行基准测试,我们发现:(i)所有模型在训练半径之外全局位置误差恶化约13%,但局部键保真度在不同架构之间差异巨大(从接近零到超过2倍的崩溃);(ii)没有两种架构具有相同的失效顺序,表明该前沿是一个由模型家族塑造的多维表面;(iii)表现良好的模型服从幂律标度指数$α\approx 1/3$,其分布内拟合能准确预测分布外误差,使其前沿可定量预测。这些发现将输出规模确立为几何生成模型评估中的一个一流坐标轴。数据集和代码可在https://github.com/KurbanIntelligenceLab/RADII上获得。
摘要:Every generative model for crystalline materials harbors a critical structure size beyond which its outputs quietly become unreliable -- we call this the extrapolation frontier. Despite its direct consequences for nanomaterial design, this frontier has never been systematically measured. We introduce RADII, a radius-resolved benchmark of ${\sim}$75,000 nanoparticle structures (55-11,298 atoms) that treats radius as a continuous scaling knob to trace generation quality from in-distribution to out-of-distribution regimes under leakage-free splits. RADII provides frontier-specific diagnostics: per-radius error profiles pinpoint each architecture's scaling ceiling, surface-interior decomposition tests whether failures originate at boundaries or in bulk, and cross-metric failure sequencing reveals which aspect of structural fidelity breaks first. Benchmarking five state-of-the-art architectures, we find that: (i) all models degrade by ${\sim}13\%$ in global positional error beyond training radii, yet local bond fidelity diverges wildly across architectures -- from near-zero to over $2\times$ collapse; (ii) no two architectures share the same failure sequence, revealing the frontier as a multi-dimensional surface shaped by model family; and (iii) well-behaved models obey a power-law scaling exponent $α\approx 1/3$ whose in-distribution fit accurately predicts out-of-distribution error, making their frontiers quantitatively forecastable. These findings establish output scale as a first-class evaluation axis for geometric generative models. The dataset and code are available at https://github.com/KurbanIntelligenceLab/RADII.
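The reported power-law behaviour (error roughly c*r^alpha with alpha near 1/3, forecastable from in-distribution fits) corresponds to ordinary least squares in log-log space, sketched below on synthetic data that follows the law exactly. The specific constants are illustrative, not the paper's measurements.

```python
import math

def fit_power_law(radii, errors):
    # Ordinary least squares on log(err) = log(c) + alpha * log(r);
    # returns (c, alpha).
    xs = [math.log(r) for r in radii]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    alpha = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    c = math.exp(my - alpha * mx)
    return c, alpha

# Synthetic data drawn exactly from err = 0.1 * r**(1/3): the
# in-distribution fit then forecasts an out-of-distribution radius.
radii = [2.0, 4.0, 8.0, 16.0]
errors = [0.1 * r ** (1 / 3) for r in radii]
c, alpha = fit_power_law(radii, errors)
pred_ood = c * 64.0 ** alpha  # extrapolation beyond the fitted radii
```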
Transformer(5篇)
【1】Learning on the Manifold: Unlocking Standard Diffusion Transformers with Representation Encoders
标题:流形上的学习:用表示编码器解锁标准扩散Transformer
链接:https://arxiv.org/abs/2602.10099
作者:Amandeep Kumar,Vishal M. Patel
备注:Technical Report
摘要:利用表示编码器进行生成建模为高效、高保真的合成提供了一条途径。然而,标准的扩散Transformer无法直接在这些表示上收敛。虽然最近的工作将此归因于容量瓶颈,并提出对扩散Transformer进行计算代价高昂的宽度扩展,但我们证明这种失败在本质上是几何性的。我们将几何干扰确定为根本原因:标准欧几里得流匹配迫使概率路径穿过表示编码器超球面特征空间的低密度内部,而不是沿流形表面行进。为了解决这个问题,我们提出了带雅可比正则化的黎曼流匹配(RJF)。通过将生成过程约束在流形测地线上并校正曲率引起的误差传播,RJF使标准扩散Transformer架构无需宽度扩展即可收敛。我们的方法RJF使标准DiT-B架构(131M参数)有效收敛,在先前方法无法收敛的情况下实现了3.37的FID。代码:https://github.com/amandpkr/RJF
摘要:Leveraging representation encoders for generative modeling offers a path for efficient, high-fidelity synthesis. However, standard diffusion transformers fail to converge on these representations directly. While recent work attributes this to a capacity bottleneck proposing computationally expensive width scaling of diffusion transformers we demonstrate that the failure is fundamentally geometric. We identify Geometric Interference as the root cause: standard Euclidean flow matching forces probability paths through the low-density interior of the hyperspherical feature space of representation encoders, rather than following the manifold surface. To resolve this, we propose Riemannian Flow Matching with Jacobi Regularization (RJF). By constraining the generative process to the manifold geodesics and correcting for curvature-induced error propagation, RJF enables standard Diffusion Transformer architectures to converge without width scaling. Our method RJF enables the standard DiT-B architecture (131M parameters) to converge effectively, achieving an FID of 3.37 where prior methods fail to converge. Code: https://github.com/amandpkr/RJF
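The geometric point is easy to demonstrate: linear (Euclidean) interpolation between unit vectors leaves the hypersphere, while spherical (geodesic) interpolation stays on it. The slerp below illustrates that contrast only; it is not the paper's flow-matching objective, and it assumes the endpoints are neither identical nor antipodal.

```python
import math

def norm(v):
    return sum(x * x for x in v) ** 0.5

def lerp(a, b, t):
    # Straight-line (Euclidean) interpolation.
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

def slerp(a, b, t):
    # Geodesic interpolation between unit vectors a, b on the sphere.
    # Assumes a and b are neither identical nor antipodal.
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    theta = math.acos(dot)
    s = math.sin(theta)
    w1 = math.sin((1 - t) * theta) / s
    w2 = math.sin(t * theta) / s
    return [w1 * x + w2 * y for x, y in zip(a, b)]

a, b = [1.0, 0.0], [0.0, 1.0]
mid_lin = lerp(a, b, 0.5)   # leaves the sphere: norm ~ 0.707
mid_geo = slerp(a, b, 0.5)  # stays on the sphere: norm 1.0
```

The shrunken norm of `mid_lin` is exactly the "low-density interior" a Euclidean probability path must cross on hyperspherical features.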
【2】Step-resolved data attribution for looped transformers
标题:循环Transformer的分步数据归因
链接:https://arxiv.org/abs/2602.10097
作者:Georgios Kaissis,David Mildenberger,Juan Felipe Gomez,Martin J. Menten,Eleni Triantafillou
摘要:我们研究单个训练样本如何塑造循环Transformer的内部计算,其中一个共享模块被重复应用$τ$次循环迭代以实现潜在推理。现有的训练数据影响估计器(如TracIn)只给出在所有循环迭代上聚合的单一标量分数,从而掩盖了训练样本在循环计算的哪个阶段发挥作用。我们引入\textit{分步分解影响(SDI)},它通过展开循环计算图并将影响归因到特定的循环迭代,把TracIn分解为长度为$τ$的影响轨迹。为了使SDI在Transformer规模上实用,我们提出了一种TensorSketch实现,无需显式构造逐样本梯度。在循环GPT风格模型和算法推理任务上的实验表明,SDI具有出色的可扩展性,以较低误差匹配全梯度基线,并以逐步骤的潜在推理过程洞察支持广泛的数据归因与可解释性任务。
摘要:We study how individual training examples shape the internal computation of looped transformers, where a shared block is applied for $τ$ recurrent iterations to enable latent reasoning. Existing training-data influence estimators such as TracIn yield a single scalar score that aggregates over all loop iterations, obscuring when during the recurrent computation a training example matters. We introduce \textit{Step-Decomposed Influence (SDI)}, which decomposes TracIn into a length-$τ$ influence trajectory by unrolling the recurrent computation graph and attributing influence to specific loop iterations. To make SDI practical at transformer scale, we propose a TensorSketch implementation that never materialises per-example gradients. Experiments on looped GPT-style models and algorithmic reasoning tasks show that SDI scales excellently, matches full-gradient baselines with low error and supports a broad range of data attribution and interpretability tasks with per-step insights into the latent reasoning process.
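The decomposition itself is straightforward once per-step gradients are available: a TracIn-style score is a sum of per-iteration inner products, and SDI simply reports the terms instead of the sum. The toy below takes explicit per-step gradient vectors as given, omitting the paper's graph unrolling and TensorSketch machinery.

```python
def tracin_influence(train_grads, test_grads, lr=1.0):
    # Scalar TracIn-style score: inner products of training- and
    # test-example gradients, summed over all loop iterations.
    return lr * sum(
        sum(g * h for g, h in zip(gt, ge))
        for gt, ge in zip(train_grads, test_grads)
    )

def step_decomposed_influence(train_grads, test_grads, lr=1.0):
    # SDI-style trajectory: one inner product per loop iteration, so
    # the per-step terms sum exactly to the scalar score above.
    return [
        lr * sum(g * h for g, h in zip(gt, ge))
        for gt, ge in zip(train_grads, test_grads)
    ]

# Two loop iterations, 2-d gradients: the second iteration carries
# twice the influence of the first.
train_g = [[1.0, 0.0], [0.0, 2.0]]
test_g = [[1.0, 1.0], [3.0, 1.0]]
trajectory = step_decomposed_influence(train_g, test_g)  # [1.0, 2.0]
```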
【3】Statistical benchmarking of transformer models in low signal-to-noise time-series forecasting
标题:低信噪比时间序列预测中Transformer模型的统计基准测试
链接:https://arxiv.org/abs/2602.09869
作者:Cyril Garcia,Guillaume Remy
备注:Submitted to ICML
摘要:我们研究Transformer架构在低数据情形下进行多变量时间序列预测的性能,这类情形仅包含几年的每日观测。我们使用具有已知时间与横截面依赖结构、信噪比各异的合成过程开展自举实验,从而可以通过与最优真值预测器的样本外相关性进行直接评估。我们表明,交替进行时间与横截面自注意力的双向注意力Transformer,可以在包括低信噪比情形在内的广泛设置中超越标准基线(Lasso、boosting方法和全连接多层感知器)。我们进一步引入了一种在训练过程中应用于注意力矩阵的动态稀疏化程序,并证明它在噪声环境中(目标变量与最优预测器之间的相关性仅为百分之几的量级)变得显著有效。对学习到的注意力模式的分析揭示了可解释的结构,并提示其与经典回归中诱导稀疏的正则化之间的联系,从而帮助理解这些模型为何能在噪声下有效泛化。
摘要:We study the performance of transformer architectures for multivariate time-series forecasting in low-data regimes consisting of only a few years of daily observations. Using synthetically generated processes with known temporal and cross-sectional dependency structures and varying signal-to-noise ratios, we conduct bootstrapped experiments that enable direct evaluation via out-of-sample correlations with the optimal ground-truth predictor. We show that two-way attention transformers, which alternate between temporal and cross-sectional self-attention, can outperform standard baselines-Lasso, boosting methods, and fully connected multilayer perceptrons-across a wide range of settings, including low signal-to-noise regimes. We further introduce a dynamic sparsification procedure for attention matrices applied during training, and demonstrate that it becomes significantly effective in noisy environments, where the correlation between the target variable and the optimal predictor is on the order of a few percent. Analysis of the learned attention patterns reveals interpretable structure and suggests connections to sparsity-inducing regularization in classical regression, providing insight into why these models generalize effectively under noise.
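One simple instantiation of dynamic attention sparsification is top-k thresholding with renormalization, sketched below. The abstract does not specify the paper's exact rule, so treat this as an illustrative stand-in (and note that ties at the threshold may keep more than k entries).

```python
def sparsify_attention(row, k):
    # Keep the k largest weights of a row-stochastic attention row,
    # zero the rest, and renormalize so the row sums to 1 again.
    # Ties at the k-th value may keep more than k entries.
    if k >= len(row):
        return list(row)
    thresh = sorted(row, reverse=True)[k - 1]
    kept = [w if w >= thresh else 0.0 for w in row]
    total = sum(kept)
    return [w / total for w in kept]

row = [0.5, 0.3, 0.1, 0.1]
sparse = sparsify_attention(row, 2)  # ~[0.625, 0.375, 0.0, 0.0]
```

Zeroing small attention weights in this way acts like the sparsity-inducing regularization the abstract connects to classical regression.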
【4】The Laplacian Mechanism Improves Transformers by Reshaping Token Geometry
标题:拉普拉斯机制通过重塑令牌几何改进Transformer
链接:https://arxiv.org/abs/2602.09297
作者:Yuchong Zhang,Vardan Papyan
摘要:Transformer利用注意力、残差连接和层归一化来控制令牌表示的方差。我们提出将注意力修改为拉普拉斯机制,使模型能够更直接地控制令牌方差。我们推测这有助于Transformer实现理想的令牌几何。为了检验这一推测,我们首先表明,将拉普拉斯机制纳入Transformer可在计算机视觉和语言的多个基准上带来一致的改进。接着,我们使用多种工具研究拉普拉斯机制如何影响令牌表示的几何:1)主成分分析,2)余弦相似度度量,3)方差分析,4)神经崩溃(Neural Collapse)度量。我们的研究表明,拉普拉斯机制将令牌嵌入重塑为具有最大可分性的几何:令牌按类别坍缩,而类均值表现出神经崩溃。
摘要:Transformers leverage attention, the residual connection, and layer normalization to control the variance of token representations. We propose to modify attention into a Laplacian mechanism that gives the model more direct control over token variance. We conjecture that this helps transformers achieve the ideal token geometry. To investigate our conjecture, we first show that incorporating the Laplacian mechanism into transformers induces consistent improvements across benchmarks in computer vision and language. Next, we study how the Laplacian mechanism impacts the geometry of token representations using various tools: 1) principal component analysis, 2) cosine similarity metric, 3) analysis of variance, and 4) Neural Collapse metrics. Our investigation shows that the Laplacian mechanism reshapes token embeddings toward a geometry of maximal separability: tokens collapse according to their classes, and the class means exhibit Neural Collapse.
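One way to read "modifying attention into a Laplacian mechanism" is to apply a random-walk-Laplacian-style operator (I - P) to the values instead of the plain attention average P*V: each token's output then measures its offset from its attention-weighted neighbourhood mean, which is exactly a variance-controlling quantity. The paper's concrete parameterization may differ; this is a sketch of the idea only.

```python
def laplacian_mechanism(P, V):
    # Apply (I - P) to value vectors V, where P is a row-stochastic
    # attention matrix: each token's output is its own value minus
    # the attention-weighted mean of all values, i.e. an offset from
    # its attention neighbourhood that the model can then rescale.
    # Illustrative reading, not the paper's exact formulation.
    n, d = len(V), len(V[0])
    out = [[0.0] * d for _ in range(n)]
    for i in range(n):
        for j in range(d):
            avg = sum(P[i][k] * V[k][j] for k in range(n))
            out[i][j] = V[i][j] - avg
    return out

# Uniform attention mean-centers the tokens' values.
P = [[0.5, 0.5], [0.5, 0.5]]
V = [[2.0], [0.0]]
centered = laplacian_mechanism(P, V)  # [[1.0], [-1.0]]
```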
【5】Looping Back to Move Forward: Recursive Transformers for Efficient and Flexible Large Multimodal Models
标题:循环以前进:面向高效灵活大型多模态模型的递归Transformer
链接:https://arxiv.org/abs/2602.09080
作者:Ruihan Xu,Yuting Gao,Lan Wang,Jianing Li,Weihao Chen,Qingpei Guo,Ming Yang,Shiliang Zhang
备注:This is a primary contribution in the Recursive Vision-Language Models
摘要:大型多模态模型(Large Multimodal Models,LMM)在视觉语言任务中取得了显著的成功,但其庞大的参数量在训练和推理过程中往往未得到充分利用。在本工作中,我们采纳"循环以前进"的思想:通过递归细化重用模型参数,在不增加模型规模的情况下提取更强的多模态表示。我们提出RecursiveVLM,一种为LMM量身定制的递归Transformer架构。两个关键创新使循环行之有效:(i)递归连接器,通过融合中间层隐藏状态并应用特定于模态的投影,在各递归步骤间对齐特征,尊重视觉与语言令牌不同的统计结构;(ii)单调递归损失,监督每一步并保证性能随递归深度单调提升。这一设计将递归转化为按需细化机制:在资源受限的设备上以很少的循环提供强大的结果,并在有更多计算资源时逐步改进输出。实验显示,相比标准Transformer获得一致的+3%增益,相比普通递归基线获得+7%增益,表明策略性循环是通向高效、可自适应部署的LMM的有力途径。
摘要:Large Multimodal Models (LMMs) have achieved remarkable success in vision-language tasks, yet their vast parameter counts are often underutilized during both training and inference. In this work, we embrace the idea of looping back to move forward: reusing model parameters through recursive refinement to extract stronger multimodal representations without increasing model size. We propose RecursiveVLM, a recursive Transformer architecture tailored for LMMs. Two key innovations enable effective looping: (i) a Recursive Connector that aligns features across recursion steps by fusing intermediate-layer hidden states and applying modality-specific projections, respecting the distinct statistical structures of vision and language tokens; (ii) a Monotonic Recursion Loss that supervises every step and guarantees performance improves monotonically with recursion depth. This design transforms recursion into an on-demand refinement mechanism: delivering strong results with few loops on resource-constrained devices and progressively improving outputs when more computation resources are available. Experiments show consistent gains of +3% over standard Transformers and +7% over vanilla recursive baselines, demonstrating that strategic looping is a powerful path toward efficient, deployment-adaptive LMMs.
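A minimal sketch of a loss that supervises every recursion step and penalizes non-monotone step losses. The hinge-with-margin form below is an assumption for illustration; the abstract does not give the paper's exact formulation.

```python
def monotonic_recursion_loss(step_losses, margin=0.0):
    # Supervise every recursion step (sum of per-step losses) and add
    # a hinge penalty whenever a step fails to improve on the previous
    # one by at least `margin`. Returns (supervision, penalty); a
    # training loop would minimize their (possibly weighted) sum.
    supervise = sum(step_losses)
    penalty = sum(
        max(0.0, step_losses[t + 1] - step_losses[t] + margin)
        for t in range(len(step_losses) - 1)
    )
    return supervise, penalty

# Monotonically improving steps incur no penalty...
s_ok, p_ok = monotonic_recursion_loss([3.0, 2.0, 1.0])
# ...while a worsening step is penalized by its regression.
s_bad, p_bad = monotonic_recursion_loss([1.0, 2.0])
```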
GAN|对抗|攻击|生成相关(6篇)
【1】Evaluating Disentangled Representations for Controllable Music Generation
标题:评估可控制音乐生成的分离表示
链接:https://arxiv.org/abs/2602.10058
作者:Laura Ibáñez-Martínez,Chukwuemeka Nkama,Andrea Poltronieri,Xavier Serra,Martín Rocamora
备注:Accepted at ICASSP 2026
摘要:最近的音乐生成方法依赖于分解的表示,通常被标记为结构和音色或局部和全局,以实现可控的合成。然而,这些嵌入的基本属性仍然没有得到充分的探索。在这项工作中,我们评估这样的解纠缠表示在一组音乐音频模型的可控生成使用基于探测的框架,超越标准的下游任务。所选模型反映了不同的无监督解纠缠策略,包括归纳偏见,数据增强,对抗性目标和阶段性训练程序。我们进一步分离出具体的策略来分析它们的效果。我们的分析跨越四个关键轴:信息性,等变性,不变性和解纠缠,这些都是跨数据集,任务和受控转换进行评估的。我们的研究结果揭示了嵌入的预期和实际语义之间的不一致,这表明目前的策略未能产生真正的解脱表示,并促使重新审视如何在音乐生成的可控性。
摘要:Recent approaches in music generation rely on disentangled representations, often labeled as structure and timbre or local and global, to enable controllable synthesis. Yet the underlying properties of these embeddings remain underexplored. In this work, we evaluate such disentangled representations in a set of music audio models for controllable generation using a probing-based framework that goes beyond standard downstream tasks. The selected models reflect diverse unsupervised disentanglement strategies, including inductive biases, data augmentations, adversarial objectives, and staged training procedures. We further isolate specific strategies to analyze their effect. Our analysis spans four key axes: informativeness, equivariance, invariance, and disentanglement, which are assessed across datasets, tasks, and controlled transformations. Our findings reveal inconsistencies between intended and actual semantics of the embeddings, suggesting that current strategies fall short of producing truly disentangled representations, and prompting a re-examination of how controllability is approached in music generation.
【2】Stemphonic: All-at-once Flexible Multi-stem Music Generation
标题:Stemphonic:一次性灵活生成多分轨音乐
链接:https://arxiv.org/abs/2602.09891
作者:Shih-Lun Wu,Ge Zhu,Juan-Pablo Caceres,Cheng-Zhi Anna Huang,Nicholas J. Bryan
备注:Accepted for publication at Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP) 2026
摘要:音乐分轨生成,即生成音乐上同步且相互隔离的乐器音频片段的任务,与传统的文本到音乐模型相比,有望提供更强的用户控制并更贴合音乐人的工作流程。然而,现有的分轨生成方法要么依赖并行输出预定义分轨集合的固定架构,要么一次只生成一条分轨,尽管分轨组合灵活,推理却很缓慢。我们提出Stemphonic,一个基于扩散/流的框架,它克服了这种权衡,在一次推理中生成可变数量的同步分轨。在训练过程中,我们将每条分轨视为一个批处理元素,将同步的分轨分为一组,并对每组应用共享的噪声潜变量。在推理时,我们使用共享的初始噪声潜变量和分轨特定的文本输入,在一次前向过程中生成同步的多分轨输出。我们进一步扩展该方法,以支持单次条件多分轨生成和逐分轨活动控制,使用户能够迭代地生成并编排混音的时间分层。我们在多个开源分轨评估集上对结果进行了基准测试,表明Stemphonic在产生更高质量输出的同时,将完整混音的生成过程加速了25%到50%。演示网址:https://stemphonic-demo.vercel.app。
摘要:Music stem generation, the task of producing musically-synchronized and isolated instrument audio clips, offers the potential of greater user control and better alignment with musician workflows compared to conventional text-to-music models. Existing stem generation approaches, however, either rely on fixed architectures that output a predefined set of stems in parallel, or generate only one stem at a time, resulting in slow inference despite flexibility in stem combination. We propose Stemphonic, a diffusion-/flow-based framework that overcomes this trade-off and generates a variable set of synchronized stems in one inference pass. During training, we treat each stem as a batch element, group synchronized stems in a batch, and apply a shared noise latent to each group. At inference-time, we use a shared initial noise latent and stem-specific text inputs to generate synchronized multi-stem outputs in one pass. We further expand our approach to enable one-pass conditional multi-stem generation and stem-wise activity controls to empower users to iteratively generate and orchestrate the temporal layering of a mix. We benchmark our results on multiple open-source stem evaluation sets and show that Stemphonic produces higher-quality outputs while accelerating the full mix generation process by 25 to 50%. Demos at: https://stemphonic-demo.vercel.app.
【3】Towards Poisoning Robustness Certification for Natural Language Generation
标题:迈向自然语言生成的中毒鲁棒性认证
链接:https://arxiv.org/abs/2602.09757
作者:Mihnea Ghitu,Matthew Wicker
摘要:了解自然语言生成的可靠性对于在安全敏感领域部署基础模型至关重要。虽然经认证的中毒防御可为分类任务提供可证明的鲁棒性界,但它们从根本上不适用于自回归生成:它们无法处理顺序预测,也无法应对语言模型指数级庞大的输出空间。为建立经认证的自然语言生成框架,我们形式化了两个安全属性:稳定性(对生成中任何改变的鲁棒性)和有效性(对生成中有针对性的有害改变的鲁棒性)。我们引入目标分区聚合(TPA),这是第一个通过计算诱导特定有害类别、令牌或短语所需的最小中毒预算来认证有效性/针对性攻击的算法。此外,我们使用混合整数线性规划(MILP)扩展TPA,为多轮生成提供更紧的保证。实验证明了TPA在不同设置下的有效性,包括:在对手修改至多0.5%数据集时认证智能体工具调用的有效性,以及在基于偏好的对齐中认证8个令牌的稳定窗口。虽然推理时延仍是有待解决的挑战,但我们的贡献使语言模型在安全关键应用中的经认证部署成为可能。
摘要:Understanding the reliability of natural language generation is critical for deploying foundation models in security-sensitive domains. While certified poisoning defenses provide provable robustness bounds for classification tasks, they are fundamentally ill-equipped for autoregressive generation: they cannot handle sequential predictions or the exponentially large output space of language models. To establish a framework for certified natural language generation, we formalize two security properties: stability (robustness to any change in generation) and validity (robustness to targeted, harmful changes in generation). We introduce Targeted Partition Aggregation (TPA), the first algorithm to certify validity/targeted attacks by computing the minimum poisoning budget needed to induce a specific harmful class, token, or phrase. Further, we extend TPA to provide tighter guarantees for multi-turn generations using mixed integer linear programming (MILP). Empirically, we demonstrate TPA's effectiveness across diverse settings including: certifying validity of agent tool-calling when adversaries modify up to 0.5% of the dataset and certifying 8-token stability horizons in preference-based alignment. Though inference-time latency remains an open challenge, our contributions enable certified deployment of language models in security-critical applications.
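The quantity TPA certifies (the minimum poisoning budget needed to induce a target output) can be illustrated in the classic partition-aggregation setting, where each poisoned sample corrupts at most one base model's vote and ties break against the attacker. This sketch computes that budget for a plurality vote; it is not the TPA algorithm itself.

```python
import math

def min_poison_budget(votes, target):
    # Minimum number of base-model votes an attacker must flip so
    # that `target` wins a plurality, assuming each poisoned sample
    # corrupts at most one partition, flips are taken from the
    # current winner (optimal: each flip closes the gap by 2), and
    # ties break against the attacker.
    winner = max(votes, key=votes.get)
    if winner == target:
        return 0
    gap = votes[winner] - votes[target]
    return math.floor(gap / 2) + 1

# 10 partitions vote "safe", 4 vote "harmful": the attacker must
# poison enough samples to flip 4 partitions.
budget = min_poison_budget({"safe": 10, "harmful": 4}, "harmful")
```

A larger budget means a larger certified margin: no attacker poisoning fewer than `budget` samples can force the harmful output.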
【4】Online Learning in MDPs with Partially Adversarial Transitions and Losses
标题:具有部分对抗性转移和损失的MDP的在线学习
链接:https://arxiv.org/abs/2602.09474
作者:Ofir Schlisselberg,Tal Lancewicki,Yishay Mansour
摘要:We study reinforcement learning in MDPs whose transition function is stochastic at most steps but may behave adversarially at a fixed subset of $Λ$ steps per episode. This model captures environments that are stable except at a few vulnerable points. We introduce \emph{conditioned occupancy measures}, which remain stable across episodes even with adversarial transitions, and use them to design two algorithms. The first handles arbitrary adversarial steps and achieves regret $\tilde{O}(H S^Λ\sqrt{K S A^{Λ+1}})$, where $K$ is the number of episodes, $S$ is the number of states, $A$ is the number of actions and $H$ is the episode's horizon. The second, assuming the adversarial steps are consecutive, improves the dependence on $S$ to $\tilde{O}(H\sqrt{K S^{3} A^{Λ+1}})$. We further give a $K^{2/3}$-regret reduction that removes the need to know which steps are the $Λ$ adversarial steps. We also characterize the regret of adversarial MDPs in the \emph{fully adversarial} setting ($Λ=H-1$) both for full-information and bandit feedback, and provide almost matching upper and lower bounds (slightly strengthening existing lower bounds and clarifying how different feedback structures affect the hardness of learning).
【5】Measuring Privacy Risks and Tradeoffs in Financial Synthetic Data Generation
标题:衡量金融合成数据生成中的隐私风险和权衡
链接:https://arxiv.org/abs/2602.09288
作者:Michael Zuo,Inwon Kang,Stacy Patterson,Oshani Seneviratne
摘要:我们探讨了表格金融数据集上合成数据生成方案的隐私-效用权衡,该领域的特点是高监管风险和严重的类别不平衡。我们考虑了具有代表性的表格数据生成器,包括自动编码器、生成对抗网络、扩散和copula合成器。为了应对金融领域的挑战,我们提供了GAN和自动编码器合成器的新颖隐私保护实现。我们评估了生成器是否以及如何同时实现数据质量、下游效用和隐私,并在平衡和不平衡的输入数据集之间进行比较。我们的研究结果为从表现出严重类别不平衡和混合类型属性的数据集生成合成数据的独特挑战提供了深入见解。
摘要:We explore the privacy-utility tradeoff of synthetic data generation schemes on tabular financial datasets, a domain characterized by high regulatory risk and severe class imbalance. We consider representative tabular data generators, including autoencoders, generative adversarial networks, diffusion, and copula synthesizers. To address the challenges of the financial domain, we provide novel privacy-preserving implementations of GAN and autoencoder synthesizers. We evaluate whether and how well the generators simultaneously achieve data quality, downstream utility, and privacy, with comparison across balanced and imbalanced input datasets. Our results offer insight into the distinct challenges of generating synthetic data from datasets that exhibit severe class imbalance and mixed-type attributes.
【6】One RNG to Rule Them All: How Randomness Becomes an Attack Vector in Machine Learning
标题:一个RNG来统治所有人:随机性如何成为机器学习中的攻击载体
链接:https://arxiv.org/abs/2602.09182
作者:Kotekar Annapoorna Prabhu,Andrew Gan,Zahra Ghodsi
备注:This work has been accepted for publication at the IEEE Conference on Secure and Trustworthy Machine Learning (SaTML). The final version will be available on IEEE Xplore
摘要:机器学习依赖于随机性作为各种步骤的基本组成部分,例如数据采样,数据增强,权重初始化和优化。大多数机器学习框架使用伪随机数生成器作为随机性的来源。然而,不同框架、软件依赖性和硬件后端之间的设计选择和实现的差异,以及缺乏统计验证,可能会导致机器学习系统上出现以前从未探索过的攻击向量。这种对随机性来源的攻击可能非常隐蔽,并且在现实世界的系统中有着利用的历史。在这项工作中,我们从对抗的角度研究了随机性在机器学习开发管道中的作用,并分析了PRNG在主要机器学习框架中的实现。我们推出RNGGuard来帮助机器学习工程师以较低的工作量保护他们的系统。RNGGuard静态分析目标库的源代码,并识别使用它们的随机函数和模块的实例。在运行时,RNGGuard通过将不安全的函数调用替换为满足安全规范的RNGGuard实现来强制随机函数的安全执行。我们的评估表明,RNGGuard提供了一种实用的方法,可以缩小机器学习系统中安全随机性来源的现有差距。
摘要:Machine learning relies on randomness as a fundamental component in various steps such as data sampling, data augmentation, weight initialization, and optimization. Most machine learning frameworks use pseudorandom number generators as the source of randomness. However, variations in design choices and implementations across different frameworks, software dependencies, and hardware backends along with the lack of statistical validation can lead to previously unexplored attack vectors on machine learning systems. Such attacks on randomness sources can be extremely covert, and have a history of exploitation in real-world systems. In this work, we examine the role of randomness in the machine learning development pipeline from an adversarial point of view, and analyze the implementations of PRNGs in major machine learning frameworks. We present RNGGuard to help machine learning engineers secure their systems with low effort. RNGGuard statically analyzes a target library's source code and identifies instances of random functions and modules that use them. At runtime, RNGGuard enforces secure execution of random functions by replacing insecure function calls with RNGGuard's implementations that meet security specifications. Our evaluations show that RNGGuard presents a practical approach to close existing gaps in securing randomness sources in machine learning systems.
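The runtime substitution RNGGuard performs, replacing an insecure seedable PRNG with a generator that meets a security specification while keeping the same interface, can be illustrated with Python's documented `random.Random` subclassing hooks. The class below is a hypothetical stand-in for such a replacement, not RNGGuard's actual implementation:

```python
import random
import secrets

class SecureRandom(random.Random):
    """Drop-in replacement for `random.Random` whose entropy comes from the
    OS CSPRNG (via `secrets`) instead of the seedable Mersenne Twister."""

    def random(self):
        # 53 random bits -> float in [0, 1), mirroring CPython's own formula.
        return secrets.randbits(53) / (1 << 53)

    def seed(self, *args, **kwargs):
        # A secure generator must ignore attacker-controllable seeding.
        pass

    def getstate(self):
        raise NotImplementedError("secure generator state is not observable")

    def setstate(self, state):
        raise NotImplementedError("secure generator state cannot be injected")

rng = SecureRandom()
sample = [rng.random() for _ in range(1000)]
```

Because `random.Random`'s derived methods (`uniform`, `choice`, `shuffle`, ...) route through the overridden primitives, rebinding a framework's module-level generator to such an instance would secure every downstream call site.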
半/弱/无/有监督|不确定性|主动学习(7篇)
【1】Features as Rewards: Scalable Supervision for Open-Ended Tasks via Interpretability
标题:特征即奖励:通过可解释性对开放式任务进行可扩展监督
链接:https://arxiv.org/abs/2602.10067
作者:Aaditya Vikram Prasad,Connor Watts,Jack Merullo,Dhruvil Gala,Owen Lewis,Thomas McGrath,Ekdeep Singh Lubana
摘要:在大规模数据集上训练的语言模型已被证明可以学习编码真实性或意图等抽象概念的特征。这些特征传统上用于测试时监控或引导。我们提出了一种替代用途:将特征作为开放式任务的可扩展监督。我们将减少幻觉视为一种理想但开放式的行为,并设计了一个名为RLFR(从特征奖励中强化学习)的强化学习(RL)管道,使用特征作为奖励函数。基于一个识别候选幻觉陈述的新颖探测框架,我们的管道教会模型在对其补全内容的事实性不确定时进行干预和纠正。此外,该管道还支持可扩展的测试时计算,同样由我们的奖励特征指导。在Gemma-3-12B-IT上运行的这一端到端过程产生的策略与原始模型相比,产生幻觉的可能性降低了58%,同时保持了标准基准的性能。总而言之,通过将监督建立在特征语言的基础上,本文为利用可解释性学习开放式任务引入了一种新范式。
摘要:Language models trained on large-scale datasets have been shown to learn features that encode abstract concepts such as factuality or intent. Such features are traditionally used for test-time monitoring or steering. We present an alternative affordance: features as scalable supervision for open-ended tasks. We consider the case of hallucination-reduction as a desirable, yet open-ended behavior and design a reinforcement learning (RL) pipeline, titled RLFR (Reinforcement Learning from Feature Rewards), that uses features as reward functions. Grounded in a novel probing framework that identifies candidate hallucinated claims, our pipeline teaches a model to intervene and correct its completions when it is uncertain of their factuality. Furthermore, the pipeline enables scalable test-time compute, guided once more by our reward features. This end-to-end process operationalized on Gemma-3-12B-IT results in a policy that is 58% less likely to hallucinate compared to the original model, while preserving performance on standard benchmarks. Taken together, by grounding supervision in the language of features, this paper introduces a novel paradigm in the use of interpretability for learning open-ended tasks.
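The reward construction RLFR describes, reading a factuality signal off hidden features and using it as a scalar RL reward, can be sketched with a linear probe. The probe weights, the min-pooling choice, and all shapes below are illustrative assumptions, not the paper's actual probing framework:

```python
import numpy as np

def feature_reward(hidden_states, probe_w, probe_b=0.0):
    """Score a completion by a factuality probe read off its hidden states.
    hidden_states: (T, d) array of per-token features; the probe is a linear
    direction whose sigmoid is interpreted as P(claim is factual)."""
    logits = hidden_states @ probe_w + probe_b
    p_factual = 1.0 / (1.0 + np.exp(-logits))
    # Reward the whole trace by its weakest claim, so one confident
    # hallucination cannot be averaged away by many factual tokens.
    return float(p_factual.min())

rng = np.random.default_rng(0)
probe = rng.normal(size=8)          # placeholder probe direction
trace = rng.normal(size=(5, 8))     # placeholder hidden states
r = feature_reward(trace, probe)
```

In an RL pipeline this scalar would replace (or augment) an external reward model when scoring rollouts.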
【2】Supervised Metric Regularization Through Alternating Optimization for Multi-Regime Physics-Informed Neural Networks
标题:基于交替优化的多状态物理信息神经网络监督度量正则化
链接:https://arxiv.org/abs/2602.09980
作者:Enzo Nicolas Spotorno,Josafat Ribeiro Leal,Antonio Augusto Frohlich
备注:5 pages, 1 figure
摘要:标准物理信息神经网络(PINN)在建模具有急剧状态转换(如分叉)的参数化动态系统时经常面临挑战。在这些情况下,从参数到解决方案的连续映射可能导致频谱偏差或“模式崩溃”,即网络对不同的物理行为取平均。我们提出了一个拓扑感知PINN(TAPINN),旨在通过监督度量正则化构建潜在空间来减轻这一挑战。与将物理参数直接映射到解决方案的标准参数PINN不同,我们的方法将求解器条件化于经过优化的潜在状态上,以反映区域之间基于度量的分离,物理残差降低约49%(0.082 vs. 0.160)。我们使用基于相位的交替优化(AO)计划训练该架构,以管理度量和物理目标之间的梯度冲突。在Duffing Oscillator上的初步实验表明,虽然标准基线受到光谱偏差的影响,高容量超网络出现过拟合(在违反物理的同时记忆数据),但我们的方法实现了稳定收敛,梯度方差比多输出Sobolev误差基线低2.18倍,参数比基于超网络的替代方案少5倍。
摘要:Standard Physics-Informed Neural Networks (PINNs) often face challenges when modeling parameterized dynamical systems with sharp regime transitions, such as bifurcations. In these scenarios, the continuous mapping from parameters to solutions can result in spectral bias or "mode collapse", where the network averages distinct physical behaviors. We propose a Topology-Aware PINN (TAPINN) that aims to mitigate this challenge by structuring the latent space via Supervised Metric Regularization. Unlike standard parametric PINNs that map physical parameters directly to solutions, our method conditions the solver on a latent state optimized to reflect the metric-based separation between regimes, showing ~49% lower physics residual (0.082 vs. 0.160). We train this architecture using a phase-based Alternating Optimization (AO) schedule to manage gradient conflicts between the metric and physics objectives. Preliminary experiments on the Duffing Oscillator demonstrate that while standard baselines suffer from spectral bias and high-capacity Hypernetworks overfit (memorizing data while violating physics), our approach achieves stable convergence with 2.18x lower gradient variance than a multi-output Sobolev Error baseline, and 5x fewer parameters than a hypernetwork-based alternative.
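The phase-based alternating optimization schedule can be sketched as a simple step-to-objective mapping; the phase lengths below are placeholder values, not the paper's settings:

```python
def ao_phase(step, metric_steps=100, physics_steps=400):
    """Phase-based alternating optimization: spend a short phase on the
    supervised metric objective, then a longer phase on the physics residual,
    and repeat. Returns which loss the current gradient step should use,
    so the two objectives never fight within a single update."""
    cycle = metric_steps + physics_steps
    return "metric" if (step % cycle) < metric_steps else "physics"

# Tiny example cycle: 2 metric steps followed by 3 physics steps.
schedule = [ao_phase(s, metric_steps=2, physics_steps=3) for s in range(10)]
```

In the training loop, the returned tag selects which loss is backpropagated at that step, avoiding gradient conflict between the metric and physics terms.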
【3】Self-Supervised Learning as Discrete Communication
标题:作为离散通信的自监督学习
链接:https://arxiv.org/abs/2602.09764
作者:Kawtar Zaher,Ilyass Moummad,Olivier Buisson,Alexis Joly
摘要:大多数自监督学习(SSL)方法通过对齐同一输入的不同视图来学习连续的视觉表示,对信息如何跨表示维度进行结构化提供有限的控制。在这项工作中,我们将视觉自监督学习框架为教师和学生网络之间的离散通信过程,其中语义信息通过固定容量的二进制通道传输。学生预测教师产生的多标签二进制消息,而不是对齐连续特征。离散协议通过逐元素的二进制交叉熵目标来执行,而编码率正则化项鼓励有效利用受约束的通道,促进结构化表示。我们进一步表明,定期重新初始化的投影头加强这种效果,鼓励嵌入,保持预测跨多个离散编码。大量的实验表明,在图像分类,检索和密集的视觉预测任务,以及通过自我监督适应域转移的连续协议基线的一致改进。除了主干表示,我们分析了学习的二进制代码,并表明它们形成了一个紧凑的和信息丰富的离散语言,捕捉跨类可重用的语义因素。
摘要:Most self-supervised learning (SSL) methods learn continuous visual representations by aligning different views of the same input, offering limited control over how information is structured across representation dimensions. In this work, we frame visual self-supervised learning as a discrete communication process between a teacher and a student network, where semantic information is transmitted through a fixed-capacity binary channel. Rather than aligning continuous features, the student predicts multi-label binary messages produced by the teacher. Discrete agreement is enforced through an element-wise binary cross-entropy objective, while a coding-rate regularization term encourages effective utilization of the constrained channel, promoting structured representations. We further show that periodically reinitializing the projection head strengthens this effect by encouraging embeddings that remain predictive across multiple discrete encodings. Extensive experiments demonstrate consistent improvements over continuous agreement baselines on image classification, retrieval, and dense visual prediction tasks, as well as under domain shift through self-supervised adaptation. Beyond backbone representations, we analyze the learned binary codes and show that they form a compact and informative discrete language, capturing semantic factors reusable across classes.
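A minimal sketch of the discrete-agreement objective: the student matches the teacher's binary message bit by bit via BCE, while a regularizer rewards using the full channel. The bit-entropy penalty here is a crude stand-in for the paper's coding-rate term, and all shapes are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discrete_agreement_loss(student_logits, teacher_logits, rate_weight=0.1):
    """Student predicts the teacher's binary message over a fixed-capacity
    channel. BCE enforces bitwise agreement; a bit-entropy bonus (a simple
    stand-in for the paper's coding-rate term) rewards using every channel
    dimension rather than collapsing bits to constants."""
    targets = (teacher_logits > 0).astype(float)      # teacher's binary message
    p = np.clip(sigmoid(student_logits), 1e-7, 1 - 1e-7)
    bce = -np.mean(targets * np.log(p) + (1 - targets) * np.log(1 - p))
    usage = np.clip(p.mean(axis=0), 1e-7, 1 - 1e-7)   # per-bit firing rate
    bit_entropy = -(usage * np.log(usage) + (1 - usage) * np.log(1 - usage))
    return bce - rate_weight * bit_entropy.mean()     # high entropy is rewarded

rng = np.random.default_rng(1)
loss = discrete_agreement_loss(rng.normal(size=(4, 16)), rng.normal(size=(4, 16)))
```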
【4】Positive-Unlabelled Active Learning to Curate a Dataset for Orca Resident Interpretation
标题:正例-未标注主动学习为定居型虎鲸解释策划数据集
链接:https://arxiv.org/abs/2602.09295
作者:Bret Nestor,Bohan Yao,Jasmine Moore,Jasper Kanes
摘要:这项工作展示了迄今为止最大的南方居民虎鲸(SRKW)声学数据,还包括其环境中的其他海洋哺乳动物。我们系统地搜索SRKW栖息地内所有可用的公共档案水听器数据(超过30年的音频数据)。该搜索包括弱监督,积极的非标记,主动学习策略,以识别海洋哺乳动物的所有实例。由此产生的基于transformer的检测器在精度、能效和速度方面优于DEEPAL、DCLDE-2026和两个新引入的专家注释数据集上的最先进检测器。检测模型的特异性为0-28.8%,灵敏度为95%。在DCLDE-2026数据集上,我们的多类物种分类器获得了42.1%的前1准确度(11个训练类,4个测试类),我们的生态型分类器获得了43.0%的前1准确度(4个训练类,5个测试类)。 我们产生了919小时的SRKW数据、230小时的Bigg逆戟鲸数据、1374小时的未标记生态型逆戟鲸数据、1501小时的座头鲸数据、88小时的海狮数据、246小时的太平洋白腹海豚数据以及超过784小时的未指定海洋哺乳动物数据。这个SRKW数据集比DCLDE-2026、Ocean Networks Canada和OrcaSound的总和还要大。精选的物种标签在CC-BY 4.0许可下可用,相应的音频数据在原始所有者的许可下可用。该数据集的全面性使其适合无监督的机器翻译、栖息地使用调查和这种极度濒危生态类型的保护工作。
摘要:This work presents the largest curation of Southern Resident Killer Whale (SRKW) acoustic data to date, also containing other marine mammals in their environment. We systematically search all available public archival hydrophone data within the SRKW habitat (over 30 years of audio data). The search consists of a weakly-supervised, positive-unlabelled, active learning strategy to identify all instances of marine mammals. The resulting transformer-based detectors outperform state-of-the-art detectors on the DEEPAL, DCLDE-2026, and two newly introduced expert-annotated datasets in terms of accuracy, energy efficiency, and speed. The detection model has a specificity of 0-28.8% at 95% sensitivity. Our multiclass species classifier obtains a top-1 accuracy of 42.1% (11 train classes, 4 test classes) and our ecotype classifier obtains a top-1 accuracy of 43.0% (4 train classes, 5 test classes) on the DCLDE-2026 dataset. We yield 919 hours of SRKW data, 230 hours of Bigg's orca data, 1374 hours of orca data from unlabelled ecotypes, 1501 hours of humpback data, 88 hours of sea lion data, 246 hours of pacific white-sided dolphin data, and over 784 hours of unspecified marine mammal data. This SRKW dataset is larger than DCLDE-2026, Ocean Networks Canada, and OrcaSound combined. The curated species labels are available under CC-BY 4.0 license, and the corresponding audio data are available under the licenses of the original owners. The comprehensive nature of this dataset makes it suitable for unsupervised machine translation, habitat usage surveys, and conservation endeavours for this critically endangered ecotype.
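One round of the pool-selection step in a positive-unlabelled active-learning loop can be sketched as uncertainty sampling around the detector's decision threshold. This is a generic illustration of that step, not the authors' pipeline:

```python
import numpy as np

def select_for_annotation(scores, budget, threshold=0.5):
    """One positive-unlabelled active-learning round: the detector scores the
    unlabelled pool, and the examples whose scores sit closest to the decision
    threshold (the most ambiguous clips) are sent to the expert annotator.
    Returns the indices of the `budget` most ambiguous pool items."""
    uncertainty = -np.abs(np.asarray(scores, dtype=float) - threshold)
    order = np.argsort(uncertainty)[::-1]        # most ambiguous first
    return order[:budget].tolist()

# Detector scores for five unlabelled clips; annotate the two most ambiguous.
pool_scores = [0.99, 0.52, 0.03, 0.47, 0.80]
picked = select_for_annotation(pool_scores, budget=2)
```

After the expert labels the selected clips, they join the positive (or confirmed-negative) set and the detector is retrained, closing the loop.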
【5】Quantifying Epistemic Uncertainty in Diffusion Models
标题:量化扩散模型中的认识不确定性
链接:https://arxiv.org/abs/2602.09170
作者:Aditi Gupta,Raphael A. Meyer,Yotam Yaniv,Elynn Chen,N. Benjamin Erichson
备注:Will appear in the Proceedings of the 29th International Conference on Artificial Intelligence and Statistics (AISTATS) 2026
摘要:为了保证高质量的输出,量化扩散模型的认知不确定性是很重要的,而现有的方法往往不可靠,因为它们混合了认知不确定性和偶然不确定性。我们引入了一种基于Fisher信息的方法,该方法明确分离出认知方差,为生成的数据产生更可靠的合理性分数。为了使这种方法可扩展,我们提出了FLARE(Fisher-Laplace随机估计器),它使用模型参数的均匀随机子集来近似Fisher信息。从经验上讲,FLARE改进了合成时间序列生成任务中的不确定性估计,实现了比其他方法更准确和可靠的过滤。从理论上讲,我们界定了随机逼近的收敛速度,并提供了分析和经验证据,证明最后一层拉普拉斯逼近不足以完成这项任务。
摘要:To ensure high quality outputs, it is important to quantify the epistemic uncertainty of diffusion models. Existing methods are often unreliable because they mix epistemic and aleatoric uncertainty. We introduce a method based on Fisher information that explicitly isolates epistemic variance, producing more reliable plausibility scores for generated data. To make this approach scalable, we propose FLARE (Fisher-Laplace Randomized Estimator), which approximates the Fisher information using a uniformly random subset of model parameters. Empirically, FLARE improves uncertainty estimation in synthetic time-series generation tasks, achieving more accurate and reliable filtering than other methods. Theoretically, we bound the convergence rate of our randomized approximation and provide analytic and empirical evidence that last-layer Laplace approximations are insufficient for this task.
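The randomized-subset idea behind FLARE can be sketched for a diagonal Fisher estimate, where the Fisher information is the expected squared score and only a uniformly random fraction of parameters is estimated. The per-sample score matrix and the subset rule below are illustrative assumptions, not the paper's estimator:

```python
import numpy as np

def randomized_fisher_diag(score_samples, subset_frac=0.25, rng=None):
    """Sketch of a FLARE-style estimator: the diagonal Fisher information is
    the expected squared score, E[(d log p / d theta)^2], computed only on a
    uniformly random subset of parameters; unestimated entries are NaN.
    score_samples: (n_samples, n_params) per-sample log-density gradients."""
    rng = rng or np.random.default_rng(0)
    n_params = score_samples.shape[1]
    k = max(1, int(subset_frac * n_params))
    idx = rng.choice(n_params, size=k, replace=False)
    fisher = np.full(n_params, np.nan)
    fisher[idx] = np.mean(score_samples[:, idx] ** 2, axis=0)
    return fisher, idx

scores = np.random.default_rng(2).normal(size=(100, 8))
fisher, idx = randomized_fisher_diag(scores, subset_frac=0.5)
```

The estimated entries feed an epistemic-variance score for each generated sample; the NaN entries mark parameters the randomized pass skipped.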
【6】Windowed SummaryMixing: An Efficient Fine-Tuning of Self-Supervised Learning Models for Low-resource Speech Recognition
标题:窗口摘要混合:用于低资源语音识别的自监督学习模型的有效微调
链接:https://arxiv.org/abs/2602.09043
作者:Aditya Srinivas Menon,Kumud Tripathi,Raj Gohil,Pankaj Wasnik
备注:The paper has been accepted at ICASSP 2026, Barcelona, Spain
摘要:自监督学习(SSL)推动了语音处理的发展,但由于自注意力机制而面临二次复杂度。为了解决这个问题,SummaryMixing(SM)被提出作为一种线性时间的替代方案,它使用均值池化总结整个话语,但缺乏足够的局部上下文。在这项工作中,我们介绍了窗口化SummaryMixing(WSM),它通过在全局摘要之外整合局部邻域摘要来增强SM,在改进时间依赖性建模的同时保持效率。此外,我们引入了一种选择性微调方法,用WSM块替换SSL模型中的自注意力层,并在低资源设置中仅微调这些块。我们的方法提高了ASR性能,同时将SSL模型的峰值VRAM使用量减少了40%。WSM块具有线性时间复杂度和增强的上下文感知能力。选择性地替换部分注意力层可以减少计算、内存和延迟,使其成为低资源语音识别的理想选择。
摘要:Self-supervised learning (SSL) has advanced speech processing but suffers from quadratic complexity due to self-attention. To address this, SummaryMixing (SM) has been proposed as a linear-time alternative that summarizes entire utterances using mean pooling but lacks sufficient local context. In this work, we introduce Windowed SummaryMixing (WSM), which enhances SM by integrating local neighborhood summaries alongside the global summary, maintaining efficiency while improving temporal dependencies. Additionally, we introduce a selective fine-tuning approach, replacing self-attention layers in SSL models with WSM blocks and fine-tuning only these blocks in low-resource settings. Our approach improves ASR performance while reducing peak VRAM usage by 40\% in the SSL models. WSM blocks have linear-time complexity with enhanced context awareness. Selectively replacing some attention layers reduces compute, memory, and latency, making it ideal for low-resource speech recognition.
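A minimal numpy sketch of the WSM idea, concatenating a local windowed mean summary with the global mean summary for each frame. The real blocks use learned transformations; the plain means and shapes here are simplifications:

```python
import numpy as np

def windowed_summary_mixing(x, window=2):
    """Linear-time sketch of WSM: every frame sees (a) a local mean over its
    +-`window` neighbours and (b) the global mean summary of the utterance,
    restoring the local context that a single global summary loses.
    x: (T, d) frame features; returns (T, 2d) mixed features."""
    T, d = x.shape
    global_summary = x.mean(axis=0, keepdims=True)          # (1, d)
    local = np.empty_like(x)
    for t in range(T):                                      # O(T * window * d)
        lo, hi = max(0, t - window), min(T, t + window + 1)
        local[t] = x[lo:hi].mean(axis=0)
    return np.concatenate([local, np.broadcast_to(global_summary, (T, d))], axis=1)

feats = np.arange(12, dtype=float).reshape(6, 2)   # 6 frames, 2 dims
mixed = windowed_summary_mixing(feats, window=1)
```

Unlike self-attention's O(T^2) cost, both summaries cost O(T) in sequence length, which is the efficiency argument for replacing attention layers with WSM blocks.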
【7】Soft Clustering Anchors for Self-Supervised Speech Representation Learning in Joint Embedding Prediction Architectures
标题:联合嵌入预测体系结构中自监督语音表示学习的软聚类锚点
链接:https://arxiv.org/abs/2602.09040
作者:Georgios Ioannides,Adrian Kieback,Judah Goldfeder,Linsey Pang,Aman Chadha,Aaron Elkins,Yann LeCun,Ravid Shwartz-Ziv
备注:15 pages, 5 figures. Code: github.com/gioannides/clustering-anchored-jepa
摘要:联合嵌入预测架构(JEPA)为自监督语音表示学习提供了一种很有前途的方法,但在没有明确基础的情况下会出现表示崩溃。我们提出了GMM锚定JEPA,它在对数梅尔频谱图上一次性拟合高斯混合模型,并在整个训练过程中使用其冻结的软后验作为辅助目标。一个衰减的监督时间表允许GMM正则化主导早期训练,然后逐渐让位于JEPA目标。与需要迭代重新聚类的HuBERT和WavLM不同,我们的方法使用软分配而不是硬分配对输入特征进行一次聚类。在约50k小时的语音上,与计算量匹配的WavLM风格基线相比,GMM锚定提高了ASR(28.68% vs. 33.22% WER)、情感识别(67.76% vs. 65.46%)和槽填充(64.7% vs. 59.1% F1)性能。聚类分析表明,GMM锚定表示实现高达98%的熵,而WavLM风格为31%,表明聚类利用率明显更均匀。代码可在https://github.com/gioannides/clustering-anchored-jepa上获得。
摘要:Joint Embedding Predictive Architectures (JEPA) offer a promising approach to self-supervised speech representation learning, but suffer from representation collapse without explicit grounding. We propose GMM-Anchored JEPA, which fits a Gaussian Mixture Model once on log-mel spectrograms and uses its frozen soft posteriors as auxiliary targets throughout training. A decaying supervision schedule allows GMM regularization to dominate early training before gradually yielding to the JEPA objective. Unlike HuBERT and WavLM, which require iterative re-clustering, our approach clusters input features once with soft rather than hard assignments. On ~50k hours of speech, GMM anchoring improves ASR (28.68% vs. 33.22% WER), emotion recognition (67.76% vs. 65.46%), and slot filling (64.7% vs. 59.1% F1) compared to a WavLM-style baseline with matched compute. Cluster analysis shows GMM-anchored representations achieve up to 98% entropy compared to 31% for WavLM-style, indicating substantially more uniform cluster utilization. Code is made available at https://github.com/gioannides/clustering-anchored-jepa.
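The decaying supervision schedule can be sketched as a linear hand-over from the frozen-GMM target loss to the JEPA objective; the linear form and constants are illustrative, not the paper's exact schedule:

```python
def anchored_loss(jepa_loss, gmm_loss, step, decay_steps=10_000, floor=0.0):
    """Decaying supervision schedule: the frozen-GMM posterior target dominates
    early training and linearly hands over to the JEPA objective, optionally
    keeping a small `floor` weight forever. Returns (total, gmm_weight)."""
    w = max(floor, 1.0 - step / decay_steps)   # 1 -> floor over decay_steps
    return (1.0 - w) * jepa_loss + w * gmm_loss, w

total0, w0 = anchored_loss(jepa_loss=2.0, gmm_loss=0.5, step=0, decay_steps=100)
total_end, w_end = anchored_loss(jepa_loss=2.0, gmm_loss=0.5, step=100, decay_steps=100)
```

Early on the anchor loss (here 0.5) dominates the total, preventing collapse; by the end the objective is purely JEPA's.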
迁移|Zero/Few/One-Shot|自适应(5篇)
【1】Grounding LTL Tasks in Sub-Symbolic RL Environments for Zero-Shot Generalization
标题:在子符号RL环境中接地LTL任务以实现Zero-Shot泛化
链接:https://arxiv.org/abs/2602.09761
作者:Matteo Pannacci,Andrea Fanti,Elena Umili,Roberto Capobianco
备注:Preprint currently under review
摘要:在这项工作中,我们解决了训练强化学习代理在子符号环境中遵循以线性时序逻辑表达的多个时间扩展指令的问题。以前的多任务工作主要依赖于原始观测与公式中出现的符号之间映射关系的先验知识。我们放弃了这一不切实际的假设,用相同的经验联合训练多任务策略和符号接地器。符号接地器仅通过神经奖励机,以半监督方式从原始观察和稀疏奖励中进行训练。基于视觉的环境中的实验表明,我们的方法取得了与使用真实符号接地相当的性能,并显著优于面向子符号环境的最先进方法。
摘要:In this work we address the problem of training a Reinforcement Learning agent to follow multiple temporally-extended instructions expressed in Linear Temporal Logic in sub-symbolic environments. Previous multi-task work has mostly relied on knowledge of the mapping between raw observations and symbols appearing in the formulae. We drop this unrealistic assumption by jointly training a multi-task policy and a symbol grounder with the same experience. The symbol grounder is trained only from raw observations and sparse rewards via Neural Reward Machines in a semi-supervised fashion. Experiments on vision-based environments show that our method achieves performance comparable to using the true symbol grounding and significantly outperforms state-of-the-art methods for sub-symbolic environments.
【2】Adaptive recurrent flow map operator learning for reaction diffusion dynamics
标题:反应扩散动力学的自适应循环流图算子学习
链接:https://arxiv.org/abs/2602.09487
作者:Huseyin Tunc
摘要:反应扩散(RD)方程是化学、生物学和物理学模式形成的基础,但学习能从数据中预测其长期动力学的稳定算子仍然具有挑战性。神经算子代理模型提供了分辨率鲁棒的预测,但自回归展开可能会由于误差积累而漂移,并且分布外(OOD)初始条件通常会降低精度。基于物理的数值残差目标可以正则化算子学习,尽管它们引入了额外的假设、对离散化和损失设计的敏感性以及更高的训练成本。在这里,我们开发了一个具有自适应循环训练的纯数据驱动算子学习器(DDOL-ART),它使用一种鲁棒的循环策略,配合轻量级验证里程碑,可以提前退出无效的展开片段并重定向优化。DDOL-ART仅在短时域内对单个分布内环形高斯族进行训练,学习到在长时间展开下保持稳定的单步算子,并能zero-shot泛化到FitzHugh-Nagumo(FN)、Gray-Scott(GS)和Lambda-Omega(LO)系统的强形态变化。在这些基准测试中,DDOL-ART提供了出色的精度与成本权衡。在匹配设置下,它比基于物理的数值损失算子学习器(NLOL)快数倍,并且在分布内稳定性和OOD鲁棒性方面仍然具有竞争力。训练动态诊断表明,自适应性加强了验证误差与OOD测试误差之间的相关性,如同一个限制优化漂移的反馈控制器。我们的结果表明,DDOL-ART的反馈控制循环训练无需PDE残差即可产生鲁棒的流图代理模型,同时在显著降低训练成本的情况下保持与NLOL的竞争力。
摘要:Reaction-diffusion (RD) equations underpin pattern formation across chemistry, biology, and physics, yet learning stable operators that forecast their long-term dynamics from data remains challenging. Neural-operator surrogates provide resolution-robust prediction, but autoregressive rollouts can drift due to the accumulation of error, and out-of-distribution (OOD) initial conditions often degrade accuracy. Physics-based numerical residual objectives can regularize operator learning, although they introduce additional assumptions, sensitivity to discretization and loss design, and higher training cost. Here we develop a purely data-driven operator learner with adaptive recurrent training (DDOL-ART) using a robust recurrent strategy with lightweight validation milestones that early-exit unproductive rollout segments and redirect optimization. Trained only on a single in-distribution toroidal Gaussian family over short horizons, DDOL-ART learns one-step operators that remain stable under long rollouts and generalize zero-shot to strong morphology shifts across FitzHugh-Nagumo (FN), Gray-Scott (GS), and Lambda-Omega (LO) systems. Across these benchmarks, DDOL-ART delivers a strong accuracy and cost trade-off. It is several-fold faster than a physics-based numerical-loss operator learner (NLOL) under matched settings, and it remains competitive on both in-distribution stability and OOD robustness. Training-dynamics diagnostics show that adaptivity strengthens the correlation between validation error and OOD test error performance, acting as a feedback controller that limits optimization drift. Our results indicate that feedback-controlled recurrent training of DDOL-ART generates robust flow-map surrogates without PDE residuals, while simultaneously maintaining competitiveness with NLOL at significantly reduced training costs.
【3】LARV: Data-Free Layer-wise Adaptive Rescaling Veneer for Model Merging
标题:LARV:用于模型合并的无数据分层自适应重缩放层(Veneer)
链接:https://arxiv.org/abs/2602.09413
作者:Xinyu Wang,Ke Deng,Fei Dou,Jinbo Bi,Jin Lu
备注:14 pages, 9 figures, 6 tables
摘要:模型合并旨在将多个微调模型合并为单个多任务模型,而无需访问训练数据。现有的任务向量合并方法(如TIES、TSV-M和Iso-C/CTS)在聚合规则上有所不同,但对所有层的处理几乎是一致的。这种假设忽略了大型Vision Transformer中强烈的层间异质性:浅层对干扰敏感,而深层编码稳定的特定任务特征。我们介绍了LARV,一种免训练、无数据、与合并器无关的分层自适应重缩放层(Veneer),它可以插入任何任务向量合并器,在聚合之前为每个任务向量分配逐层尺度,并且我们展示了它能持续增强各种合并规则。LARV使用一个简单的确定性时间表自适应地抑制浅层干扰并放大深层对齐,不需要重新训练或修改现有合并器。据我们所知,这是第一个为任务向量合并执行层感知缩放的工作。LARV计算简单的无数据层代理量,并通过轻量级规则将其转换为尺度;我们在一个框架内研究了几种实例化方式(例如,具有固定值的分层两级/三级缩放,或连续映射),并表明分层选择提供最佳鲁棒性,而连续映射作为消融保留。LARV与基础合并器正交,增加的成本可以忽略不计。在带有Vision Transformer的FusionBench上,LARV在8/14/20任务设置中持续改进所有任务向量基线;例如,Iso-C + LARV在ViT-B/32上达到85.9%,在ViT-B/16上达到89.2%,在ViT-L/14上达到92.6%。分层分析和损坏测试进一步表明,LARV在适度放大更深层、任务稳定特征的同时抑制浅层干扰,将模型合并变成一个鲁棒的、层感知的过程,而不是统一化的过程。
摘要:Model merging aims to combine multiple fine-tuned models into a single multi-task model without access to training data. Existing task-vector merging methods such as TIES, TSV-M, and Iso-C/CTS differ in their aggregation rules but treat all layers nearly uniformly. This assumption overlooks the strong layer-wise heterogeneity in large vision transformers, where shallow layers are sensitive to interference while deeper layers encode stable task-specific features. We introduce LARV, a training-free, data-free, merger-agnostic Layer-wise Adaptive Rescaling Veneer that plugs into any task-vector merger and assigns a per-layer scale to each task vector before aggregation, and show it consistently boosts diverse merging rules. LARV adaptively suppresses shallow-layer interference and amplifies deeper-layer alignment using a simple deterministic schedule, requiring no retraining or modification to existing mergers. To our knowledge, this is the first work to perform layer-aware scaling for task-vector merging. LARV computes simple data-free layer proxies and turns them into scales through a lightweight rule; we study several instantiations within one framework (e.g., tiered two/three-level scaling with fixed values, or continuous mappings) and show that tiered choices offer the best robustness, while continuous mappings remain an ablation. LARV is orthogonal to the base merger and adds negligible cost. On FusionBench with Vision Transformers, LARV consistently improves all task-vector baselines across 8/14/20-task settings; for example, Iso-C + LARV reaches 85.9% on ViT-B/32, 89.2% on ViT-B/16, and 92.6% on ViT-L/14. Layerwise analysis and corruption tests further indicate that LARV suppresses shallow-layer interference while modestly amplifying deeper, task-stable features, turning model merging into a robust, layer-aware procedure rather than a uniform one.
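The veneer's tiered, data-free rescaling can be sketched as a per-layer scale applied to each task vector before a base merger aggregates them. The two-tier split, the scale values, and the mean-based merger below are illustrative placeholders for LARV's actual proxies and for base mergers such as TIES or Iso-C:

```python
import numpy as np

def tiered_scales(task_vector, shallow_scale=0.5, deep_scale=1.2, split=0.5):
    """Data-free tiered rescaling sketch: layers in the first `split` fraction
    of the network are damped (interference-prone shallow layers), the rest
    mildly amplified (task-stable deep layers). `task_vector` maps layer
    name -> parameter delta; depth is taken from insertion order."""
    names = list(task_vector)
    cut = int(len(names) * split)
    return {
        name: (shallow_scale if i < cut else deep_scale) * task_vector[name]
        for i, name in enumerate(names)
    }

def merge(task_vectors):
    """Average the rescaled task vectors layer by layer (a stand-in for any
    base merger the veneer would plug into)."""
    scaled = [tiered_scales(tv) for tv in task_vectors]
    return {name: np.mean([s[name] for s in scaled], axis=0) for name in scaled[0]}

tv_a = {"layer0": np.ones(3), "layer1": np.ones(3)}
tv_b = {"layer0": np.ones(3), "layer1": -np.ones(3)}
merged = merge([tv_a, tv_b])
```

Because the veneer only pre-scales each task vector, any existing aggregation rule can be substituted for the `merge` stand-in unchanged.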
【4】Importance inversion transfer identifies shared principles for cross-domain learning
标题:重要性倒置转移确定了跨领域学习的共同原则
链接:https://arxiv.org/abs/2602.09116
作者:Daniele Caligiore
摘要:跨科学领域转移知识的能力依赖于共同的组织原则。然而,现有的迁移学习方法往往无法弥合根本异构的系统,特别是在严重的数据稀缺或随机噪声下。这项研究形式化了可解释跨域迁移学习(X-CDTL),一个统一网络科学和可解释人工智能的框架,用于识别可在生物、语言、分子和社交网络中泛化的结构不变量。通过引入重要性反转迁移(IIT)机制,该框架优先考虑域不变的结构锚点,而不是特殊的、高度判别性的特征。在异常检测任务中,由这些原则指导的模型实现了显著的性能增益(在极端噪声下决策稳定性相对提高了56%),超过传统基线。这些结果为跨异构域的共享组织特征提供了证据,为跨学科知识传播建立了一个有原则的范式。通过从不透明的潜在表示转向明确的结构规律,这项工作推动机器学习成为科学发现的强大引擎。
摘要:The capacity to transfer knowledge across scientific domains relies on shared organizational principles. However, existing transfer-learning methodologies often fail to bridge radically heterogeneous systems, particularly under severe data scarcity or stochastic noise. This study formalizes Explainable Cross-Domain Transfer Learning (X-CDTL), a framework unifying network science and explainable artificial intelligence to identify structural invariants that generalize across biological, linguistic, molecular, and social networks. By introducing the Importance Inversion Transfer (IIT) mechanism, the framework prioritizes domain-invariant structural anchors over idiosyncratic, highly discriminative features. In anomaly detection tasks, models guided by these principles achieve significant performance gains - exhibiting a 56\% relative improvement in decision stability under extreme noise - over traditional baselines. These results provide evidence for a shared organizational signature across heterogeneous domains, establishing a principled paradigm for cross-disciplinary knowledge propagation. By shifting from opaque latent representations to explicit structural laws, this work advances machine learning as a robust engine for scientific discovery.
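The Importance Inversion Transfer mechanism, down-weighting highly discriminative source-domain features in favor of broadly shared structural anchors, can be sketched as a simple inverse-importance weighting. This is an illustrative reduction of the framework, not its implementation:

```python
import numpy as np

def importance_inversion_weights(importances, eps=1e-6):
    """IIT sketch: reweight features inversely to their source-domain
    discriminative importance, so idiosyncratic high-importance features are
    damped and domain-invariant structural anchors dominate the transferred
    representation. Weights are normalized to sum to 1."""
    imp = np.asarray(importances, dtype=float)
    w = 1.0 / (imp + eps)
    return w / w.sum()

# A feature that dominates the source task (0.8) gets the smallest weight.
w = importance_inversion_weights([0.8, 0.1, 0.1])
```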
【5】SAQNN: Spectral Adaptive Quantum Neural Network as a Universal Approximator
标题:SAQNN:作为通用逼近器的谱自适应量子神经网络
链接:https://arxiv.org/abs/2602.09718
作者:Jialiang Tang,Jialin Zhang,Xiaoming Sun
摘要:量子机器学习(QML)作为连接量子计算和机器学习的跨学科领域,近年来受到了极大的关注。目前,由于量子神经网络(QNN)表达能力的理论基础不完整,整个领域面临挑战。在本文中,我们提出了一个构造性的QNN模型,并证明了它具有通用逼近性质(UAP),这意味着它可以以任意精度逼近任何平方可积函数。此外,它还支持切换函数基,从而适应数值逼近和机器学习中的各种场景。我们的模型在电路规模方面相对于最好的经典前馈神经网络具有渐近优势,并且在$L_2$范数下逼近Sobolev函数时达到最优参数复杂度。
摘要:Quantum machine learning (QML), as an interdisciplinary field bridging quantum computing and machine learning, has garnered significant attention in recent years. Currently, the field as a whole faces challenges due to incomplete theoretical foundations for the expressivity of quantum neural networks (QNNs). In this paper we propose a constructive QNN model and demonstrate that it possesses the universal approximation property (UAP), which means it can approximate any square-integrable function up to arbitrary accuracy. Furthermore, it supports switching function bases, thus adaptable to various scenarios in numerical approximation and machine learning. Our model has asymptotic advantages over the best classical feed-forward neural networks in terms of circuit size and achieves optimal parameter complexity when approximating Sobolev functions under $L_2$ norm.
强化学习(10篇)
【1】Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning
标题:Agent World Model:用于智能体强化学习的无限合成环境
链接:https://arxiv.org/abs/2602.10090
作者:Zhaoyang Wang,Canwen Xu,Boyi Liu,Yite Wang,Siwei Han,Zhewei Yao,Huaxiu Yao,Yuxiong He
备注:41 pages
摘要:大型语言模型(LLM)的最新进展使自主智能体能够执行需要与工具和环境进行多轮交互的复杂任务。然而,由于缺乏多样化和可靠的环境,这种代理培训的规模受到限制。在本文中,我们提出了代理世界模型(AWM),一个完全合成的环境生成管道。使用这个管道,我们可以扩展到1,000个覆盖日常场景的环境,在这些环境中,代理可以与丰富的工具集(平均每个环境35个工具)进行交互,并获得高质量的观察结果。值得注意的是,这些环境是代码驱动的,并由数据库支持,比LLM模拟的环境提供更可靠和一致的状态转换。此外,与从现实环境中收集轨迹相比,它们能够实现更有效的代理交互。为了证明这一资源的有效性,我们对多转向工具使用代理进行了大规模强化学习。由于完全可执行的环境和可访问的数据库状态,我们还可以设计可靠的奖励函数。在三个基准测试上的实验表明,只在合成环境中进行训练,而不是在特定于基准测试的环境中进行训练,会产生很强的分布外泛化能力。该代码可在https://github.com/Snowflake-Labs/agent-world-model上获得。
摘要:Recent advances in large language model (LLM) have empowered autonomous agents to perform complex tasks that require multi-turn interactions with tools and environments. However, scaling such agent training is limited by the lack of diverse and reliable environments. In this paper, we propose Agent World Model (AWM), a fully synthetic environment generation pipeline. Using this pipeline, we scale to 1,000 environments covering everyday scenarios, in which agents can interact with rich toolsets (35 tools per environment on average) and obtain high-quality observations. Notably, these environments are code-driven and backed by databases, providing more reliable and consistent state transitions than environments simulated by LLMs. Moreover, they enable more efficient agent interaction compared with collecting trajectories from realistic environments. To demonstrate the effectiveness of this resource, we perform large-scale reinforcement learning for multi-turn tool-use agents. Thanks to the fully executable environments and accessible database states, we can also design reliable reward functions. Experiments on three benchmarks show that training exclusively in synthetic environments, rather than benchmark-specific ones, yields strong out-of-distribution generalization. The code is available at https://github.com/Snowflake-Labs/agent-world-model.
【2】Optimistic World Models: Efficient Exploration in Model-Based Deep Reinforcement Learning
标题:乐观世界模型:基于模型的深度强化学习的有效探索
链接:https://arxiv.org/abs/2602.10044
作者:Akshay Mete,Shahid Aamir Sheikh,Tzu-Hsiang Lin,Dileep Kalathil,P. R. Kumar
摘要:有效的探索仍然是强化学习(RL)的核心挑战,特别是在稀疏奖励环境中。我们引入了乐观世界模型(OWM),这是一个用于乐观探索的原则性和可扩展的框架,它将自适应控制中的经典奖励偏置最大似然估计(RBMLE)引入深度RL。与上置信限(UCB)风格的探索方法相比,OWM通过增加乐观动态损失将乐观直接纳入模型学习,使想象的过渡偏向更高回报的结果。这种完全基于梯度的损失既不需要不确定性估计,也不需要约束优化。我们的方法是即插即用与现有的世界模型框架,保持可扩展性,同时只需要最小的修改,标准的训练过程。我们在两个最先进的世界模型架构中实例化了OWM,从而产生了Optimistic DreamerV3和Optimistic STORM,与基线模型相比,它们在样本效率和累积回报方面都有显着提高。
摘要:Efficient exploration remains a central challenge in reinforcement learning (RL), particularly in sparse-reward environments. We introduce Optimistic World Models (OWMs), a principled and scalable framework for optimistic exploration that brings classical reward-biased maximum likelihood estimation (RBMLE) from adaptive control into deep RL. In contrast to upper confidence bound (UCB)-style exploration methods, OWMs incorporate optimism directly into model learning by augmentation with an optimistic dynamics loss that biases imagined transitions toward higher-reward outcomes. This fully gradient-based loss requires neither uncertainty estimates nor constrained optimization. Our approach is plug-and-play with existing world model frameworks, preserving scalability while requiring only minimal modifications to standard training procedures. We instantiate OWMs within two state-of-the-art world model architectures, leading to Optimistic DreamerV3 and Optimistic STORM, which demonstrate significant improvements in sample efficiency and cumulative return compared to their baseline counterparts.
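The reward-biased objective OWMs build on can be sketched in one line: the usual model-fitting loss minus an optimism bonus proportional to the reward the model predicts for its imagined transitions. MSE stands in for the likelihood term here, and `eta` is an illustrative optimism coefficient, not the paper's setting:

```python
import numpy as np

def optimistic_dynamics_loss(pred_next, true_next, pred_reward, eta=0.1):
    """RBMLE-style objective sketch: ordinary one-step model error (MSE in
    place of a negative log-likelihood) minus an optimism bonus proportional
    to the reward predicted for the imagined transition. Minimizing it trades
    fidelity against a bias toward higher-reward dynamics, controlled by eta."""
    model_error = float(np.mean((np.asarray(pred_next) - np.asarray(true_next)) ** 2))
    return model_error - eta * float(np.mean(pred_reward))

# Same prediction error; predicting higher reward lowers the loss.
neutral = optimistic_dynamics_loss([0.0, 0.0], [0.1, -0.1], pred_reward=[0.0])
optimist = optimistic_dynamics_loss([0.0, 0.0], [0.1, -0.1], pred_reward=[1.0])
```

Because the bonus is differentiable, the bias is applied with plain gradient descent, with no uncertainty estimates or constrained optimization, which is the contrast the abstract draws against UCB-style exploration.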
【3】ADORA: Training Reasoning Models with Dynamic Advantage Estimation on Reinforcement Learning
Link: https://arxiv.org/abs/2602.10019
Authors: Qingnan Ren, Shiting Huang, Zhen Fang, Zehui Chen, Lin Chen, Lijun Li, Feng Zhao
Abstract: Reinforcement learning has become a cornerstone technique for developing reasoning models in complex tasks, ranging from mathematical problem-solving to imaginary reasoning. The optimization of these models typically relies on policy gradient methods, whose efficacy hinges on the accurate estimation of an advantage function. However, prevailing methods typically employ static advantage estimation, a practice that leads to inefficient credit assignment by neglecting the dynamic utility of training samples over time. This limitation results in suboptimal policy updates, which in turn manifest as slower convergence rates and increased learning instability, as models fail to adapt to evolving sample utilities effectively. To address this problem, we introduce ADORA (Advantage Dynamics via Online Rollout Adaptation), a novel framework for policy optimization. ADORA dynamically adjusts the advantage function's weighting by adaptively categorizing training data into temporarily advantageous and disadvantageous samples, based on their evolving utility during online model rollouts. This tailored data differentiation strategy allows ADORA to be seamlessly integrated into existing policy optimization algorithms without significant architectural modifications, enabling the policy to prioritize learning from more informative experiences and thereby achieve more efficient policy updates. Extensive evaluations across diverse model families and varying data scales demonstrate that ADORA is a robust and efficient framework. It significantly enhances long reasoning in both geometric and mathematical tasks, consistently achieving notable performance gains without requiring sensitive hyperparameter tuning.
【4】Answer First, Reason Later: Aligning Search Relevance via Mode-Balanced Reinforcement Learning
Link: https://arxiv.org/abs/2602.10006
Authors: Shijie Zhang, Xiang Guo, Rujun Guo, Shaoyu Liu, Xiaozhao Wang, Guanjun Jiang, Kevin Zhang
Abstract: Building a search relevance model that achieves both low latency and high performance is a long-standing challenge in the search industry. To satisfy the millisecond-level response requirements of online systems while retaining the interpretable reasoning traces of Large Language Models (LLMs), we propose a novel Answer-First, Reason Later (AFRL) paradigm. This paradigm requires the model to output the definitive relevance score in the very first token, followed by a structured logical explanation. Inspired by the success of reasoning models, we adopt a "Supervised Fine-Tuning (SFT) + Reinforcement Learning (RL)" pipeline to achieve AFRL. However, directly applying existing RL training often leads to mode collapse in the search relevance task, where the model forgets complex long-tail rules in pursuit of high rewards. From an information-theory perspective, RL inherently minimizes the reverse KL divergence, which tends to seek probability peaks (mode-seeking) and is prone to "reward hacking." SFT, on the other hand, minimizes the forward KL divergence, forcing the model to cover the data distribution (mode-covering) and effectively anchoring expert rules. Based on this insight, we propose a Mode-Balanced Optimization strategy, incorporating an SFT auxiliary loss into Stepwise-GRPO training to balance these two properties. Furthermore, we construct an automated instruction evolution system and a multi-stage curriculum to ensure expert-level data quality. Extensive experiments demonstrate that our 32B teacher model achieves state-of-the-art performance. Moreover, the AFRL architecture enables efficient knowledge distillation, successfully transferring expert-level logic to a 0.6B model, thereby reconciling reasoning depth with deployment latency.
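The mode-seeking vs. mode-covering contrast invoked in this abstract can be demonstrated numerically: fitting a single Gaussian to a bimodal target by minimizing reverse KL locks onto one mode, while forward KL spreads over both. The toy grid search below (all distributions invented) only illustrates that information-theoretic argument, not the Stepwise-GRPO training itself:

```python
import numpy as np

x = np.linspace(-6, 6, 601)
dx = x[1] - x[0]

def gauss(x, mu, s):
    g = np.exp(-0.5 * ((x - mu) / s) ** 2)
    return g / (g.sum() * dx)      # normalize on the grid

# Bimodal "data" distribution with modes at -2 and +2.
p = 0.5 * gauss(x, -2, 0.4) + 0.5 * gauss(x, 2, 0.4)

def kl(a, b):
    eps = 1e-12
    return float(np.sum(a * np.log((a + eps) / (b + eps))) * dx)

# Grid-search a single Gaussian q under each divergence.
grid = [(mu, s) for mu in np.linspace(-3, 3, 61) for s in np.linspace(0.2, 3, 57)]
fwd_mu, fwd_s = min(grid, key=lambda ms: kl(p, gauss(x, *ms)))   # forward KL: mode-covering
rev_mu, rev_s = min(grid, key=lambda ms: kl(gauss(x, *ms), p))   # reverse KL: mode-seeking

print(fwd_mu, fwd_s)  # mean near 0, large sigma: q covers both modes
print(rev_mu, rev_s)  # mean near +/-2, small sigma: q locks onto one mode
```

The forward-KL fit matches the moments of the whole mixture, while the reverse-KL fit avoids the low-probability valley between the modes, which is the behavior the paper's SFT auxiliary loss is meant to counterbalance.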
【5】Squeezing More from the Stream : Learning Representation Online for Streaming Reinforcement Learning
Link: https://arxiv.org/abs/2602.09396
Authors: Nilaksh, Antoine Clavaud, Mathieu Reymond, François Rivest, Sarath Chandar
Note: 8 pages, 4 figures
Abstract: In streaming Reinforcement Learning (RL), transitions are observed and discarded immediately after a single update. While this minimizes resource usage for on-device applications, it makes agents notoriously sample-inefficient, since value-based losses alone struggle to extract meaningful representations from transient data. We propose extending Self-Predictive Representations (SPR) to the streaming pipeline to maximize the utility of every observed frame. However, due to the highly correlated samples induced by the streaming regime, naively applying this auxiliary loss results in training instabilities. Thus, we introduce orthogonal gradient updates relative to the momentum target and resolve gradient conflicts arising from streaming-specific optimizers. Validated across the Atari, MinAtar, and Octax suites, our approach systematically outperforms existing streaming baselines. Latent-space analysis, including t-SNE visualizations and effective-rank measurements, confirms that our method learns significantly richer representations, bridging the performance gap caused by the absence of a replay buffer, while remaining efficient enough to train on just a few CPU cores.
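A minimal sketch of resolving a conflict between an auxiliary representation gradient and the main value-based gradient. The PCGrad-style projection below is an assumed concrete instantiation for illustration, not necessarily the paper's exact rule:

```python
import numpy as np

def resolve_conflict(g_aux, g_main):
    """If the auxiliary gradient conflicts with the main gradient
    (negative inner product), drop its component along the main
    direction so the combined update never opposes the main objective.
    (A PCGrad-style rule, assumed here for illustration.)"""
    dot = float(g_aux @ g_main)
    if dot < 0:
        g_aux = g_aux - dot / float(g_main @ g_main) * g_main
    return g_aux

g_main = np.array([1.0, 0.0])          # value-loss gradient
g_aux = np.array([-1.0, 2.0])          # SPR-style auxiliary gradient, dot = -1 < 0
g_proj = resolve_conflict(g_aux, g_main)
print(g_proj)                          # [0., 2.]: conflicting component removed
```

After projection the auxiliary gradient is orthogonal to the main one, so adding it can no longer undo progress on the value loss.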
【6】Latent Poincaré Shaping for Agentic Reinforcement Learning
Link: https://arxiv.org/abs/2602.09375
Authors: Hanchen Xia, Baoyou Chen, Zelin Zang, Yutang Ge, Guojiang Zhao, Siyu Zhu
Abstract: We propose LaPha, a method for training AlphaZero-like LLM agents in a Poincaré latent space. Under LaPha, the search process can be visualized as a tree rooted at the prompt and growing outward from the origin toward the boundary of the Poincaré ball, where negative curvature provides exponentially increasing capacity with radius. Using hyperbolic geodesic distance to rule-verified correctness, we define a node potential and assign dense process rewards by potential differences. We further attach a lightweight value head on the same shared latent space, enabling self-guided test-time scaling with almost no additional overhead. On MATH-500, LaPha improves Qwen2.5-Math-1.5B from 66.0% to 88.2%. With value-head-guided search, LaPha-1.5B reaches 56.7% accuracy on AIME'24, and LaPha-7B further achieves 60.0% on AIME'24 and 53.3% on AIME'25.
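The hyperbolic geodesic distance underlying the node potential is the standard Poincaré-ball metric and can be written down directly (the test points below are arbitrary):

```python
import numpy as np

def poincare_dist(u, v, eps=1e-9):
    """Geodesic distance in the Poincaré ball:
    d(u, v) = arccosh(1 + 2*||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    num = 2.0 * np.sum((u - v) ** 2)
    den = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return float(np.arccosh(1.0 + num / max(den, eps)))

origin = np.zeros(2)
print(poincare_dist(origin, [0.5, 0.0]))   # ln(3) ~ 1.0986
print(poincare_dist(origin, [0.99, 0.0]))  # blows up near the boundary
```

Distances from the origin grow without bound as a point approaches the unit sphere, which is exactly the exponentially increasing capacity with radius that the abstract refers to.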
【7】Risk-sensitive reinforcement learning using expectiles, shortfall risk and optimized certainty equivalent risk
Link: https://arxiv.org/abs/2602.09300
Authors: Sumedh Gupte, Shrey Rakeshkumar Patel, Soumen Pachal, Prashanth L. A., Sanjay P. Bhat
Abstract: We propose risk-sensitive reinforcement learning algorithms catering to three families of risk measures, namely expectiles, utility-based shortfall risk and optimized certainty equivalent risk. For each risk measure, in the context of a finite-horizon Markov decision process, we first derive a policy gradient theorem. Second, we propose estimators of the risk-sensitive policy gradient for each of the aforementioned risk measures, and establish O(1/m) mean-squared error bounds for our estimators, where m is the number of trajectories. Further, under standard assumptions for policy gradient-type algorithms, we establish smoothness of the risk-sensitive objective, in turn leading to stationary convergence rate bounds for the overall risk-sensitive policy gradient algorithm that we propose. Finally, we conduct numerical experiments to validate the theoretical findings on popular RL benchmarks.
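For the first of the three risk-measure families: the tau-expectile of a sample is the minimizer of an asymmetric squared loss, and a tiny fixed-point iteration suffices to compute it. This is an illustrative numerical scheme, not the paper's estimator; the sample values are invented:

```python
import numpy as np

def expectile(x, tau, iters=100):
    """tau-expectile of a sample: the minimizer q of
    sum_i |tau - 1[x_i < q]| * (x_i - q)^2,
    found by iterating the weighted-mean fixed point.
    tau = 0.5 recovers the ordinary mean."""
    x = np.asarray(x, float)
    q = x.mean()
    for _ in range(iters):
        w = np.where(x > q, tau, 1.0 - tau)   # asymmetric weights
        q = float(np.sum(w * x) / np.sum(w))  # weighted-mean fixed point
    return q

x = np.array([0.0, 1.0, 2.0, 10.0])
print(expectile(x, 0.5))   # 3.25, the sample mean
print(expectile(x, 0.9))   # 7.75, pulled toward the upper tail
```

Setting tau > 0.5 up-weights outcomes above the current estimate, tilting the statistic toward the favorable tail, which is what makes expectiles a useful risk-sensitive alternative to the mean.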
【8】CausalGDP: Causality-Guided Diffusion Policies for Reinforcement Learning
Link: https://arxiv.org/abs/2602.09207
Authors: Xiaofeng Xiao, Xiao Hu, Yang Ye, Xubo Yue
Abstract: Reinforcement learning (RL) has achieved remarkable success in a wide range of sequential decision-making problems. Recent diffusion-based policies further improve RL by modeling complex, high-dimensional action distributions. However, existing diffusion policies primarily rely on statistical associations and fail to explicitly account for causal relationships among states, actions, and rewards, limiting their ability to identify which action components truly cause high returns. In this paper, we propose Causality-guided Diffusion Policy (CausalGDP), a unified framework that integrates causal reasoning into diffusion-based RL. CausalGDP first learns a base diffusion policy and an initial causal dynamical model from offline data, capturing causal dependencies among states, actions, and rewards. During real-time interaction, the causal information is continuously updated and incorporated as a guidance signal to steer the diffusion process toward actions that causally influence future states and rewards. By explicitly considering causality beyond association, CausalGDP focuses policy optimization on action components that genuinely drive performance improvements. Experimental results demonstrate that CausalGDP consistently achieves competitive or superior performance over state-of-the-art diffusion-based and offline RL methods, especially in complex, high-dimensional control tasks.
【9】EExApp: GNN-Based Reinforcement Learning for Radio Unit Energy Optimization in 5G O-RAN
Link: https://arxiv.org/abs/2602.09206
Authors: Jie Lu, Peihao Yan, Huacheng Zeng
Note: Accepted by IEEE INFOCOM 2026
Abstract: With over 3.5 million 5G base stations deployed globally, their collective energy consumption (projected to exceed 131 TWh annually) raises significant concerns over both operational costs and environmental impacts. In this paper, we present EExAPP, a deep reinforcement learning (DRL)-based xApp for 5G Open Radio Access Network (O-RAN) that jointly optimizes radio unit (RU) sleep scheduling and distributed unit (DU) resource slicing. EExAPP uses a dual-actor-dual-critic Proximal Policy Optimization (PPO) architecture, with dedicated actor-critic pairs targeting energy efficiency and quality-of-service (QoS) compliance. A transformer-based encoder enables scalable handling of variable user equipment (UE) populations by encoding all-UE observations into fixed-dimensional representations. To coordinate the two optimization objectives, a bipartite Graph Attention Network (GAT) is used to modulate actor updates based on both critic outputs, enabling adaptive tradeoffs between power savings and QoS. We have implemented EExAPP and deployed it on a real-world 5G O-RAN testbed with live traffic, a commercial RU, and smartphones. Extensive over-the-air experiments and ablation studies confirm that EExAPP significantly outperforms existing methods in reducing the energy consumption of the RU while maintaining QoS.
【10】Boltzmann Reinforcement Learning for Noise resilience in Analog Ising Machines
Link: https://arxiv.org/abs/2602.09162
Authors: Aditya Choudhary, Saaketh Desai, Prasad Iyer
Abstract: Analog Ising machines (AIMs) have emerged as a promising paradigm for combinatorial optimization, utilizing physical dynamics to solve Ising problems with high energy efficiency. However, the performance of traditional optimization and sampling algorithms on these platforms is often limited by inherent measurement noise. We introduce BRAIN (Boltzmann Reinforcement for Analog Ising Networks), a distribution learning framework that utilizes variational reinforcement learning to approximate the Boltzmann distribution. By shifting from state-by-state sampling to aggregating information across multiple noisy measurements, BRAIN is resilient to the Gaussian noise characteristic of AIMs. We evaluate BRAIN across diverse combinatorial topologies, including the Curie-Weiss and 2D nearest-neighbor Ising systems. We find that under realistic 3% Gaussian measurement noise, BRAIN maintains 98% ground-state fidelity, whereas Markov chain Monte Carlo (MCMC) methods degrade to 51% fidelity. Furthermore, BRAIN reaches the MCMC-equivalent solution up to 192x faster under these conditions. BRAIN exhibits O(N^1.55) scaling up to 65,536 spins and maintains robustness against severe measurement uncertainty of up to 40%. Beyond ground-state optimization, BRAIN accurately captures thermodynamic phase transitions and metastable states, providing a scalable and noise-resilient method for utilizing analog computing architectures in complex optimizations.
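The benefit of aggregating noisy measurements before forming a Boltzmann distribution can be sketched in a few lines. Energies, temperature, and noise level are invented, and this illustrates only the noise-resilience argument, not BRAIN's variational learner:

```python
import numpy as np

rng = np.random.default_rng(0)

# Estimating Boltzmann probabilities p(s) ~ exp(-E(s)/T) when each energy
# readout carries Gaussian measurement noise: building the distribution
# from aggregated readouts beats trusting any single noisy one.
E = np.array([0.0, 1.0, 2.0])        # true energies of three states (toy)
T, sigma, K = 1.0, 0.5, 2000         # temperature, noise std, #measurements

def boltzmann(e, temp=T):
    w = np.exp(-e / temp)
    return w / w.sum()

p_true = boltzmann(E)
# Average L1 error of distributions built from single noisy readouts:
single_errs = [np.abs(boltzmann(E + rng.normal(0, sigma, 3)) - p_true).sum()
               for _ in range(200)]
# Error after averaging K noisy readouts per state first:
p_agg = boltzmann((E + rng.normal(0, sigma, (K, 3))).mean(axis=0))
agg_err = float(np.abs(p_agg - p_true).sum())
print(np.mean(single_errs), agg_err)   # aggregation is far more accurate
```

Averaging K readouts shrinks the effective noise on each energy by roughly 1/sqrt(K), which is the intuition behind shifting from state-by-state sampling to aggregation.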
Medical (6 papers)
【1】Drug Release Modeling using Physics-Informed Neural Networks
Link: https://arxiv.org/abs/2602.09963
Authors: Daanish Aleem Qureshi, Khemraj Shukla, Vikas Srivastava
Abstract: Accurate modeling of drug release is essential for designing and developing controlled-release systems. Classical models (Fick, Higuchi, Peppas) rely on simplifying assumptions that limit their accuracy in complex geometries and release mechanisms. Here, we propose a novel approach using Physics-Informed Neural Networks (PINNs) and Bayesian PINNs (BPINNs) for predicting release from planar, 1D-wrinkled, and 2D-crumpled films. This approach uniquely integrates Fick's diffusion law with limited experimental data to enable accurate long-term predictions from short-term measurements, and is systematically benchmarked against classical drug release models. We embedded Fick's second law into the PINN loss with 10,000 Latin-hypercube collocation points and utilized previously published experimental datasets to assess drug release performance through mean absolute error (MAE) and root mean square error (RMSE), considering noisy conditions and limited-data scenarios. Our approach reduced mean error by up to 40% relative to classical baselines across all film types. The PINN formulation achieved RMSE < 0.05 utilizing only the first 6% of the release time data (reducing by 94% the release time required for the experiments) for the planar film. For wrinkled and crumpled films, the PINN reached RMSE < 0.05 with 33% of the release time data. BPINNs provide tighter and more reliable uncertainty quantification under noise. By combining physical laws with experimental data, the proposed framework yields highly accurate long-term release predictions from short-term measurements, offering a practical route for accelerated characterization and more efficient early-stage drug release system formulation.
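The physics term of such a PINN loss is the residual of Fick's second law, dc/dt = D * d²c/dx², evaluated at collocation points. The sketch below checks that residual by finite differences on a known analytic solution instead of automatic differentiation; the diffusivity and wavenumber are arbitrary:

```python
import numpy as np

# Known analytic solution of Fick's second law:
# c(x, t) = exp(-D * k^2 * t) * sin(k * x).
D, k = 0.1, 2.0
c = lambda x, t: np.exp(-D * k**2 * t) * np.sin(k * x)

def fick_residual(x, t, h=1e-4):
    """Finite-difference residual dc/dt - D * d2c/dx2; a PINN would compute
    the same residual with automatic differentiation and average its square
    over collocation points as the physics part of the training loss."""
    dc_dt = (c(x, t + h) - c(x, t - h)) / (2 * h)
    d2c_dx2 = (c(x + h, t) - 2 * c(x, t) + c(x - h, t)) / h**2
    return dc_dt - D * d2c_dx2

xs = np.linspace(0.1, 3.0, 50)        # collocation points at t = 0.7
res = np.abs(fick_residual(xs, 0.7))
print(res.max())                      # ~0: the physics loss vanishes on an exact solution
```

In training, this residual is driven to zero jointly with a data-fit term on the (short-term) release measurements, which is what lets the network extrapolate the long-term release curve.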
【2】Fully-automated sleep staging: multicenter validation of a generalizable deep neural network for Parkinson's disease and isolated REM sleep behavior disorder
Link: https://arxiv.org/abs/2602.09793
Authors: Jesper Strøm, Casper Skjærbæk, Natasha Becker Bertelsen, Steffen Torpe Simonsen, Niels Okkels, David Bertram, Sinah Röttgen, Konstantin Kufer, Kaare B. Mikkelsen, Marit Otto, Poul Jørgen Jennum, Per Borghammer, Michael Sommerauer, Preben Kidmose
Note: 21 pages excluding supplementary, 9 figures
Abstract: Isolated REM sleep behavior disorder (iRBD) is a key prodromal marker of Parkinson's disease (PD), and video-polysomnography (vPSG) remains the diagnostic gold standard. However, manual sleep staging is particularly challenging in neurodegenerative diseases due to EEG abnormalities and fragmented sleep, making PSG assessments a bottleneck for deploying new RBD screening technologies at scale. We adapted U-Sleep, a deep neural network, for generalizable sleep staging in PD and iRBD. A pretrained U-Sleep model, based on a large publicly available, multisite non-neurodegenerative dataset (PUB; 19,236 PSGs across 12 sites), was fine-tuned on research datasets from two centers (Lundbeck Foundation Parkinson's Disease Research Center (PACE) and the Cologne-Bonn Cohort (CBC); 112 PD, 138 iRBD, 89 age-matched controls). The resulting model was evaluated on an independent dataset from the Danish Center for Sleep Medicine (DCSM; 81 PD, 36 iRBD, 87 sleep-clinic controls). A subset of PSGs with low agreement between the human rater and the model (κ < 0.6) was re-scored by a second blinded human rater to identify sources of disagreement. Finally, we applied confidence-based thresholds to optimize REM sleep staging. The pretrained model achieved mean κ = 0.81 in PUB, but κ = 0.66 when applied directly to PACE/CBC. By fine-tuning the model, we developed a generalized model with κ = 0.74 on PACE/CBC (p < 0.001 vs. the pretrained model). In DCSM, mean and median κ increased from 0.60 to 0.64 (p < 0.001) and 0.64 to 0.69 (p < 0.001), respectively. In the interrater study, PSGs with low agreement between the model and the initial scorer showed similarly low agreement between human scorers. Applying a confidence threshold increased the proportion of correctly identified REM sleep epochs from 85% to 95.5%, while preserving sufficient (> 5 min) REM sleep for 95% of subjects.
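The agreement statistic reported throughout this abstract is Cohen's kappa, which is straightforward to compute from two label sequences (the toy hypnogram labels below are invented):

```python
import numpy as np

def cohen_kappa(a, b):
    """Cohen's kappa between two label sequences:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is chance agreement under independent marginals."""
    a, b = np.asarray(a), np.asarray(b)
    labels = np.union1d(a, b)
    p_o = np.mean(a == b)
    p_e = sum(np.mean(a == l) * np.mean(b == l) for l in labels)
    return float((p_o - p_e) / (1.0 - p_e))

# Toy 8-epoch hypnograms (0 = wake, 1 = NREM, 2 = REM), invented:
rater = [0, 0, 1, 1, 2, 2, 2, 2]
model = [0, 0, 1, 1, 2, 2, 2, 1]
print(cohen_kappa(rater, model))   # ~0.81: one disagreement out of eight epochs
```

Unlike raw accuracy, kappa discounts the agreement expected by chance from the class marginals, which matters for hypnograms where one stage can dominate the night.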
【3】Explainability in Generative Medical Diffusion Models: A Faithfulness-Based Analysis on MRI Synthesis
Link: https://arxiv.org/abs/2602.09781
Authors: Surjo Dey, Pallabi Saikia
Note: Accepted at 3rd World Congress on Smart Computing (WCSC2026) conference
Abstract: This study investigates the explainability of generative diffusion models in the context of medical imaging, focusing on Magnetic resonance imaging (MRI) synthesis. Although diffusion models have shown strong performance in generating realistic medical images, their internal decision-making process remains largely opaque. We present a faithfulness-based explainability framework that analyzes how prototype-based explainability methods like ProtoPNet (PPNet), Enhanced ProtoPNet (EPPNet), and ProtoPool can link the relationship between generated and training features. Our study focuses on understanding the reasoning behind image formation through the denoising trajectory of the diffusion model, and subsequently on prototype explainability with faithfulness analysis. Experimental analysis shows that EPPNet achieves the highest faithfulness (with score 0.1534), offering more reliable insights and explainability into the generative process. The results highlight that diffusion models can be made more transparent and trustworthy through faithfulness-based explanations, contributing to safer and more interpretable applications of generative AI in healthcare.
【4】ECG-IMN: Interpretable Mesomorphic Neural Networks for 12-Lead Electrocardiogram Interpretation
Link: https://arxiv.org/abs/2602.09566
Authors: Vajira Thambawita, Jonas L. Isaksen, Jørgen K. Kanters, Hugo L. Hammer, Pål Halvorsen
Abstract: Deep learning has achieved expert-level performance in automated electrocardiogram (ECG) diagnosis, yet the "black-box" nature of these models hinders their clinical deployment. Trust in medical AI requires not just high accuracy but also transparency regarding the specific physiological features driving predictions. Existing explainability methods for ECGs typically rely on post-hoc approximations (e.g., Grad-CAM and SHAP), which can be unstable, computationally expensive, and unfaithful to the model's actual decision-making process. In this work, we propose the ECG-IMN, an Interpretable Mesomorphic Neural Network tailored for high-resolution 12-lead ECG classification. Unlike standard classifiers, the ECG-IMN functions as a hypernetwork: a deep convolutional backbone generates the parameters of a strictly linear model specific to each input sample. This architecture enforces intrinsic interpretability, as the decision logic is mathematically transparent and the generated weights (W) serve as exact, high-resolution feature attribution maps. We introduce a transition decoder that effectively maps latent features to sample-wise weights, enabling precise localization of pathological evidence (e.g., ST-elevation, T-wave inversion) in both time and lead dimensions. We evaluate our approach on the PTB-XL dataset for classification tasks, demonstrating that the ECG-IMN achieves competitive predictive performance (AUROC comparable to black-box baselines) while providing faithful, instance-specific explanations. By explicitly decoupling parameter generation from prediction execution, our framework bridges the gap between deep learning capability and clinical trustworthiness, offering a principled path toward "white-box" cardiac diagnostics.
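The hypernetwork idea, a backbone that emits per-sample linear weights whose inner product with the input is the prediction, can be sketched with a tiny tanh backbone standing in for the paper's convolutional network (dimensions and random weights are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

d, h = 16, 8                           # toy "ECG" input dim, hidden width
W1, W2 = rng.normal(size=(h, d)), rng.normal(size=(d, h))

def per_sample_weights(x):
    """Backbone output = the weights w(x) of a per-sample linear model."""
    return W2 @ np.tanh(W1 @ x)

def predict(x):
    """Prediction is strictly linear in x given w(x), so the elementwise
    products w(x) * x are an exact attribution map for the logit."""
    w = per_sample_weights(x)
    return float(w @ x), w

x = rng.normal(size=d)
logit, w = predict(x)
# Completeness check: attributions w * x sum to the logit exactly
# (up to floating point), with no post-hoc approximation involved.
print(abs((w * x).sum() - logit))      # ~0
```

This is the sense in which the generated weights are "exact" attributions: because the decision is linear in the input given w(x), the attribution map is the decision, not an approximation of it.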
【5】X-Mark: Saliency-Guided Robust Dataset Ownership Verification for Medical Imaging
Link: https://arxiv.org/abs/2602.09284
Authors: Pranav Kulkarni, Junfeng Guo, Heng Huang
Abstract: High-quality medical imaging datasets are essential for training deep learning models, but their unauthorized use raises serious copyright and ethical concerns. Medical imaging presents a unique challenge for existing dataset ownership verification methods designed for natural images: static watermark patterns generated in fixed-scale images scale poorly to dynamic, high-resolution scans with limited visual diversity and subtle anatomical structures, while diagnostic quality must be preserved. In this paper, we propose X-Mark, a sample-specific clean-label watermarking method for chest x-ray copyright protection. Specifically, X-Mark uses a conditional U-Net to generate unique perturbations within salient regions of each sample. We design a multi-component training objective to ensure watermark efficacy and robustness against dynamic scaling processes while preserving diagnostic quality and visual distinguishability. We incorporate Laplacian regularization into our training objective to penalize high-frequency perturbations and achieve watermark scale-invariance. Ownership verification is performed in a black-box setting to detect characteristic behaviors in suspicious models. Extensive experiments on CheXpert verify the effectiveness of X-Mark, achieving a WSR of 100% and reducing the probability of false positives in the Ind-M scenario by 12%, while demonstrating resistance to potential adaptive attacks.
【6】Predicting Gene Disease Associations in Type 2 Diabetes Using Machine Learning on Single-Cell RNA-Seq Data
Link: https://arxiv.org/abs/2602.09036
Authors: Maria De La Luz Lomboy Toledo, Daniel Onah
Note: 11 pages, 7 figures. Preprint
Abstract: Diabetes is a chronic metabolic disorder characterized by elevated blood glucose levels due to impaired insulin production or function. Two main forms are recognized: type 1 diabetes (T1D), which involves autoimmune destruction of insulin-producing β-cells, and type 2 diabetes (T2D), which arises from insulin resistance and progressive β-cell dysfunction. Understanding the molecular mechanisms underlying these diseases is essential for the development of improved therapeutic strategies, particularly those targeting β-cell dysfunction. To investigate these mechanisms in a controlled and biologically interpretable setting, mouse models have played a central role in diabetes research. Owing to their genetic and physiological similarity to humans, together with the ability to precisely manipulate their genome, mice enable detailed investigation of disease progression and gene function. In particular, mouse models have provided critical insights into β-cell development, cellular heterogeneity, and functional failure under diabetic conditions. Building on these experimental advances, this study applies machine learning methods to single-cell transcriptomic data from mouse pancreatic islets. Specifically, we evaluate two supervised approaches identified in the literature, the Extra Trees Classifier (ETC) and Partial Least Squares Discriminant Analysis (PLS-DA), to assess their ability to identify T2D-associated gene expression signatures at single-cell resolution. Model performance is evaluated using standard classification metrics, with an emphasis on interpretability and biological relevance.
Distillation | Knowledge Extraction (2 papers)
【1】Life Cycle-Aware Evaluation of Knowledge Distillation for Machine Translation: Environmental Impact and Translation Quality Trade-offs
Link: https://arxiv.org/abs/2602.09691
Authors: Joseph Attieh, Timothee Mickus, Anne-Laure Ligozat, Aurélie Névéol, Jörg Tiedemann
Abstract: Knowledge distillation (KD) is a tool to compress a larger system (teacher) into a smaller one (student). In machine translation, studies typically report only the translation quality of the student and omit the computational complexity of performing KD, making it difficult to select among the many available KD choices under compute-induced constraints. In this study, we evaluate representative KD methods by considering both translation quality and computational cost. We express computational cost as a carbon footprint using the machine learning life cycle assessment (MLCA) tool. This assessment accounts for runtime operational emissions and amortized hardware production costs throughout the KD model life cycle (teacher training, distillation, and inference). We find that (i) distillation overhead dominates the total footprint at small deployment volumes, (ii) inference dominates at scale, making KD beneficial only beyond a task-dependent usage threshold, and (iii) word-level distillation typically offers more favorable footprint-quality trade-offs than sequence-level distillation. Our protocol provides reproducible guidance for selecting KD methods under explicit quality and compute-induced constraints.
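Word-level distillation, the variant found here to offer the better footprint-quality trade-off, reduces to a per-token divergence between temperature-softened teacher and student next-token distributions. A minimal sketch with toy logits (scaling conventions, such as an extra T² factor, vary across implementations):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def word_level_kd(teacher_logits, student_logits, T=2.0):
    """Word-level KD loss: per-token KL(teacher || student) on
    temperature-softened distributions, averaged over positions."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(kl.mean())

t = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]])   # (positions, vocab), toy
s = np.array([[1.5, 0.7, -0.8], [0.1, 0.2, 0.3]])
print(word_level_kd(t, t))   # 0.0 for identical logits
print(word_level_kd(t, s))   # > 0 when the student diverges
```

Sequence-level distillation instead trains the student on full teacher-decoded translations, which requires running teacher decoding over the training corpus, one source of the extra distillation footprint the study measures.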
【2】Linear Model Extraction via Factual and Counterfactual Queries
Link: https://arxiv.org/abs/2602.09748
Authors: Daan Otto, Jannis Kurtz, Dick den Hertog, Ilker Birbil
Abstract: In model extraction attacks, the goal is to reveal the parameters of a black-box machine learning model by querying the model for a selected set of data points. Due to an increasing demand for explanations, this may involve counterfactual queries besides the typically considered factual queries. In this work, we consider linear models and three types of queries: factual, counterfactual, and robust counterfactual. First, for an arbitrary set of queries, we derive novel mathematical formulations for the classification regions for which the decision of the unknown model is known, without recovering any of the model parameters. Second, we derive bounds on the number of queries needed to extract the model's parameters for (robust) counterfactual queries under arbitrary norm-based distances. We show that the full model can be recovered using just a single counterfactual query when differentiable distance measures are employed. In contrast, when polyhedral distances are used, for instance, the number of required queries grows linearly with the dimension of the data space. For robust counterfactuals, the latter number of queries doubles. Consequently, the applied distance function and robustness of counterfactuals have a significant impact on the model's security.
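The single-query recovery result can be illustrated for the Euclidean case, where the nearest counterfactual moves perpendicular to the decision boundary, so one factual/counterfactual pair reveals the boundary normal and offset up to scale. This is a minimal sketch of that geometric idea, not the paper's general construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden linear model (the "black box"): f(x) = sign(w.x + b).
w_true = rng.normal(size=5)
b_true = 0.7

def predict(x, w=w_true, b=b_true):
    return np.sign(w @ x + b)

def l2_counterfactual(x):
    """Exact nearest point on the decision boundary under the L2 norm."""
    return x - ((w_true @ x + b_true) / (w_true @ w_true)) * w_true

# One factual query plus its counterfactual explanation.
x = rng.normal(size=5)
x_cf = l2_counterfactual(x)

# The boundary normal is parallel to x - x_cf (pointing toward the positive
# side when f(x) = +1), and the counterfactual itself lies on the boundary.
w_hat = (x - x_cf) * predict(x)
b_hat = -w_hat @ x_cf

# The recovered model matches the hidden one up to a positive scaling,
# so it makes identical predictions everywhere.
tests = rng.normal(size=(200, 5))
agreement = np.mean([predict(t) == np.sign(w_hat @ t + b_hat) for t in tests])
print(agreement)  # 1.0
```

Under a non-differentiable polyhedral distance (e.g., L1), the counterfactual generally snaps to a facet rather than the normal direction, which is the intuition behind the linear growth in required queries.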
Super-Resolution | Denoising | Deblurring | Dehazing (2 papers)
【1】Causality in Video Diffusers is Separable from Denoising
Link: https://arxiv.org/abs/2602.10095
Authors: Xingjian Bai, Guande He, Zhengqi Li, Eli Shechtman, Xun Huang, Zongze Wu
Abstract: Causality -- referring to temporal, uni-directional cause-effect relationships between components -- underlies many complex generative processes, including videos, language, and robot trajectories. Current causal diffusion models entangle temporal reasoning with iterative denoising, applying causal attention across all layers, at every denoising step, and over the entire context. In this paper, we show that the causal reasoning in these models is separable from the multi-step denoising process. Through systematic probing of autoregressive video diffusers, we uncover two key regularities: (1) early layers produce highly similar features across denoising steps, indicating redundant computation along the diffusion trajectory; and (2) deeper layers exhibit sparse cross-frame attention and primarily perform intra-frame rendering. Motivated by these findings, we introduce Separable Causal Diffusion (SCD), a new architecture that explicitly decouples once-per-frame temporal reasoning, via a causal transformer encoder, from multi-step frame-wise rendering, via a lightweight diffusion decoder. Extensive experiments on both pretraining and post-training tasks across synthetic and real benchmarks show that SCD significantly improves throughput and per-frame latency while matching or surpassing the generation quality of strong causal diffusion baselines.
【2】Blind denoising diffusion models and the blessings of dimensionality
Link: https://arxiv.org/abs/2602.09639
Authors: Zahra Kadkhodaie, Aram-Alexandre Pooladian, Sinho Chewi, Eero Simoncelli
Note: 40 pages, 12 figures
Abstract: We analyze, theoretically and empirically, the performance of generative diffusion models based on \emph{blind denoisers}, in which the denoiser is not given the noise amplitude in either the training or sampling processes. Assuming that the data distribution has low intrinsic dimensionality, we prove that blind denoising diffusion models (BDDMs), despite not having access to the noise amplitude, \emph{automatically} track a particular \emph{implicit} noise schedule along the reverse process. Our analysis shows that BDDMs can accurately sample from the data distribution in polynomially many steps as a function of the intrinsic dimension. Empirical results corroborate these mathematical findings on both synthetic and image data, demonstrating that the noise variance is accurately estimated from the noisy image. Remarkably, we observe that schedule-free BDDMs produce samples of higher quality compared to their non-blind counterparts. We provide evidence that this performance gain arises because BDDMs correct the mismatch between the true residual noise (of the image) and the noise assumed by the schedule used in non-blind diffusion models.
Federated Learning | Privacy Preservation | Encryption (4 papers)
【1】Towards Explainable Federated Learning: Understanding the Impact of Differential Privacy
Link: https://arxiv.org/abs/2602.10100
Authors: Júlio Oliveira, Rodrigo Ferreira, André Riker, Glaucio H. S. Carvalho, Eirini Eleni Tsilopoulou
Abstract: Data privacy and eXplainable Artificial Intelligence (XAI) are two important aspects of modern machine learning systems. To enhance data privacy, recent machine learning models have been designed as Federated Learning (FL) systems. On top of that, additional privacy layers can be added via Differential Privacy (DP). On the other hand, to improve explainability, ML must consider more interpretable approaches with a reduced number of features and a less complex internal architecture. In this context, this paper aims to achieve a machine learning (ML) model that combines enhanced data privacy with explainability. We therefore propose an FL solution, called Federated EXplainable Trees with Differential Privacy (FEXT-DP), that: (i) is based on Decision Trees, since they are lightweight and have superior explainability compared to neural-network-based FL systems; and (ii) provides an additional layer of data privacy protection by applying Differential Privacy (DP) to the tree-based model. However, adding DP has a side effect: it harms the explainability of the system. This paper therefore also presents the impact of DP protection on the explainability of the ML model. The performance assessment carried out shows improvements of FEXT-DP in terms of faster training (i.e., number of rounds), Mean Squared Error, and explainability.
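The abstract does not specify how DP is injected into the tree model. A standard mechanism for tree-based learners, shown here purely as an illustrative assumption and not as FEXT-DP's actual method, is adding Laplace noise to leaf class counts:

```python
import numpy as np

def dp_leaf_counts(counts, epsilon, rng):
    """Add Laplace noise to per-leaf class counts.

    One record changes a single count by at most 1 (sensitivity = 1), so
    Laplace noise with scale 1/epsilon makes this per-leaf query epsilon-DP.
    """
    noise = rng.laplace(scale=1.0 / epsilon, size=len(counts))
    return np.maximum(counts + noise, 0.0)  # clip to keep counts non-negative

rng = np.random.default_rng(0)
raw = np.array([40.0, 3.0])           # class counts in one decision-tree leaf
noisy = dp_leaf_counts(raw, epsilon=1.0, rng=rng)
prediction = int(np.argmax(noisy))    # the leaf still predicts a majority class
```

The trade-off the paper highlights is visible here: the noisy counts protect individual records, but the reported counts (part of what makes trees interpretable) no longer exactly describe the training data.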
【2】Safeguarding Privacy: Privacy-Preserving Detection of Mind Wandering and Disengagement Using Federated Learning in Online Education
Link: https://arxiv.org/abs/2602.09904
Authors: Anna Bodonhelyi, Mengdi Wang, Efe Bozkir, Babette Bühler, Enkelejda Kasneci
Abstract: Since the COVID-19 pandemic, online courses have expanded access to education, yet the absence of direct instructor support challenges learners' ability to self-regulate attention and engagement. Mind wandering and disengagement can be detrimental to learning outcomes, making their automated detection via video-based indicators a promising approach for real-time learner support. However, machine learning-based approaches often require sharing sensitive data, raising privacy concerns. Federated learning offers a privacy-preserving alternative by enabling decentralized model training while also distributing computational load. We propose a framework exploiting cross-device federated learning to address different manifestations of behavioral and cognitive disengagement during remote learning, specifically behavioral disengagement, mind wandering, and boredom. We fit video-based cognitive disengagement detection models using facial expressions and gaze features. By adopting federated learning, we safeguard users' data privacy through privacy-by-design and introduce a novel solution with the potential for real-time learner support. We further address challenges posed by eyeglasses by incorporating related features, enhancing overall model performance. To validate the performance of our approach, we conduct extensive experiments on five datasets and benchmark multiple federated learning algorithms. Our results show great promise for privacy-preserving educational technologies promoting learner engagement.
【3】Rashomon Sets and Model Multiplicity in Federated Learning
Link: https://arxiv.org/abs/2602.09520
Authors: Xenia Heilmann, Luca Corbucci, Mattia Cerrato
Abstract: The Rashomon set captures the collection of models that achieve near-identical empirical performance yet may differ substantially in their decision boundaries. Understanding the differences among these models, i.e., their multiplicity, is recognized as a crucial step toward model transparency, fairness, and robustness, as it reveals decision-boundary instabilities that standard metrics obscure. However, the existing definitions of the Rashomon set and multiplicity metrics assume centralized learning and do not extend naturally to decentralized, multi-party settings like Federated Learning (FL). In FL, multiple clients collaboratively train models under a central server's coordination without sharing raw data, which preserves privacy but introduces challenges from heterogeneous client data distributions and communication constraints. In this setting, the choice of a single best model may homogenize predictive behavior across diverse clients, amplify biases, or undermine fairness guarantees. In this work, we provide the first formalization of Rashomon sets in FL. First, we adapt the Rashomon set definition to FL, distinguishing among three perspectives: (I) a global Rashomon set defined over aggregated statistics across all clients, (II) a t-agreement Rashomon set representing the intersection of local Rashomon sets across a fraction t of clients, and (III) individual Rashomon sets specific to each client's local distribution. Second, we show how standard multiplicity metrics can be estimated under FL's privacy constraints. Finally, we introduce a multiplicity-aware FL pipeline and conduct an empirical study on standard FL benchmark datasets. Our results demonstrate that all three proposed federated Rashomon set definitions offer valuable insights, enabling clients to deploy models that better align with their local data, fairness considerations, and practical requirements.
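The three federated Rashomon-set perspectives can be sketched numerically. The per-client losses and the tolerance eps below are made-up toy values, and the aggregation (a plain client average for the global set) is one simple choice among those the paper could use:

```python
import numpy as np

# Per-client empirical losses for 5 candidate models on 3 FL clients (toy values).
losses = np.array([
    [0.10, 0.30, 0.12],   # model 0
    [0.11, 0.12, 0.13],   # model 1
    [0.12, 0.11, 0.30],   # model 2
    [0.30, 0.10, 0.20],   # model 3
    [0.40, 0.40, 0.40],   # model 4
])
eps = 0.05

def rashomon(vec, eps):
    """Indices of models within eps of the best loss for this loss vector."""
    return set(np.flatnonzero(vec <= vec.min() + eps))

# (I) Global Rashomon set: defined on the client-averaged loss.
global_set = rashomon(losses.mean(axis=1), eps)

# (III) Individual Rashomon sets: one per client's local distribution.
local_sets = [rashomon(losses[:, c], eps) for c in range(losses.shape[1])]

# (II) t-agreement set: models in the local Rashomon set of >= t of the clients.
def t_agreement(local_sets, t):
    need = int(np.ceil(t * len(local_sets)))
    votes = {}
    for s in local_sets:
        for m in s:
            votes[m] = votes.get(m, 0) + 1
    return {m for m, v in votes.items() if v >= need}

print(global_set, t_agreement(local_sets, t=2/3))
```

The toy values already show why the distinction matters: the globally near-optimal model need not sit in every client's local set, while the t-agreement set surfaces models that most clients would individually accept.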
【4】Federated Learning for Surgical Vision in Appendicitis Classification: Results of the FedSurg EndoVis 2024 Challenge
Link: https://arxiv.org/abs/2510.04772
Authors: Max Kirchner, Hanna Hoffmann, Alexander C. Jenke, Oliver L. Saldanha, Kevin Pfeiffer, Weam Kanjo, Julia Alekseenko, Claas de Boer, Santhi Raj Kolamuri, Lorenzo Mazza, Nicolas Padoy, Sophia Bano, Annika Reinke, Lena Maier-Hein, Danail Stoyanov, Jakob N. Kather, Fiona R. Kolbinger, Sebastian Bodenstedt, Stefanie Speidel
Note: A challenge report pre-print (31 pages), including 7 tables and 8 figures
Abstract: Purpose: The FedSurg challenge was designed to benchmark the state of the art in federated learning for surgical video classification. Its goal was to assess how well current methods generalize to unseen clinical centers and adapt through local fine-tuning while enabling collaborative model development without sharing patient data. Methods: Participants developed strategies to classify inflammation stages in appendicitis using a preliminary version of the multi-center Appendix300 video dataset. The challenge evaluated two tasks: generalization to an unseen center and center-specific adaptation after fine-tuning. Submitted approaches included foundation models with linear probing, metric learning with triplet loss, and various FL aggregation schemes (FedAvg, FedMedian, FedSAM). Performance was assessed using F1-score and Expected Cost, with ranking robustness evaluated via bootstrapping and statistical testing. Results: In the generalization task, performance across centers was limited. In the adaptation task, all teams improved after fine-tuning, though ranking stability was low. The ViViT-based submission achieved the strongest overall performance. The challenge highlighted limitations in generalization, sensitivity to class imbalance, and difficulties in hyperparameter tuning in decentralized training, while spatiotemporal modeling and context-aware preprocessing emerged as promising strategies. Conclusion: The FedSurg Challenge establishes the first benchmark for evaluating FL strategies in surgical video classification. Findings highlight the trade-off between local personalization and global robustness, and underscore the importance of architecture choice, preprocessing, and loss design. This benchmarking offers a reference point for future development of imbalance-aware, adaptive, and robust FL methods in clinical surgical AI.
Reasoning | Analysis | Understanding | Explanation (10 papers)
【1】Empirical Stability Analysis of Kolmogorov-Arnold Networks in Hard-Constrained Recurrent Physics-Informed Discovery
Link: https://arxiv.org/abs/2602.09988
Authors: Enzo Nicolas Spotorno, Josafat Leal Filho, Antonio Augusto Medeiros Frohlich
Note: 5 pages
Abstract: We investigate the integration of Kolmogorov-Arnold Networks (KANs) into hard-constrained recurrent physics-informed architectures (HRPINN) to evaluate the fidelity of learned residual manifolds in oscillatory systems. Motivated by the Kolmogorov-Arnold representation theorem and preliminary gray-box results, we hypothesized that KANs would enable efficient recovery of unknown terms compared to MLPs. Through an initial sensitivity analysis covering configuration, parameter scale, and training paradigm, we found that while small KANs are competitive on univariate polynomial residuals (Duffing), they exhibit severe hyperparameter fragility, instability in deeper configurations, and consistent failure on multiplicative terms (Van der Pol), being generally outperformed by standard MLPs. These empirical challenges highlight limitations of the additive inductive bias in the original KAN formulation for state coupling and provide preliminary empirical evidence of inductive bias limitations for future hybrid modeling.
【2】Coupled Inference in Diffusion Models for Semantic Decomposition
Link: https://arxiv.org/abs/2602.09983
Authors: Calvin Yeung, Ali Zakeri, Zhuowen Zou, Mohsen Imani
Note: 15 pages
Abstract: Many visual scenes can be described as compositions of latent factors. Effective recognition, reasoning, and editing often require not only forming such compositional representations, but also solving the decomposition problem. One popular choice for constructing these representations is through the binding operation. Resonator networks, which can be understood as coupled Hopfield networks, were proposed as a way to perform decomposition on such bound representations. Recent works have shown notable similarities between Hopfield networks and diffusion models. Motivated by these observations, we introduce a framework for semantic decomposition using coupled inference in diffusion models. Our method frames semantic decomposition as an inverse problem and couples the diffusion processes using a reconstruction-driven guidance term that encourages the composition of factor estimates to match the bound vector. We also introduce a novel iterative sampling scheme that improves the performance of our model. Finally, we show that attention-based resonator networks are a special case of our framework. Empirically, we demonstrate that our coupled inference framework outperforms resonator networks across a range of synthetic semantic decomposition tasks.
【3】Scalable and Reliable State-Aware Inference of High-Impact N-k Contingencies
Link: https://arxiv.org/abs/2602.09461
Authors: Lihao Mai, Chenhan Xiao, Yang Weng
Abstract: Increasing penetration of inverter-based resources, flexible loads, and rapidly changing operating conditions make higher-order $N\!-\!k$ contingency assessment increasingly important but computationally prohibitive. Exhaustive evaluation of all outage combinations using AC power-flow or ACOPF is infeasible in routine operation. This fact forces operators to rely on heuristic screening methods whose ability to consistently retain all critical contingencies is not formally established. This paper proposes a scalable, state-aware contingency inference framework designed to directly generate high-impact $N\!-\!k$ outage scenarios without enumerating the combinatorial contingency space. The framework employs a conditional diffusion model to produce candidate contingencies tailored to the current operating state, while a topology-aware graph neural network trained only on base and $N\!-\!1$ cases efficiently constructs high-risk training samples offline. Finally, the framework is developed to provide controllable coverage guarantees for severe contingencies, allowing operators to explicitly manage the risk of missing critical events under limited AC power-flow evaluation budgets. Experiments on IEEE benchmark systems show that, for a given evaluation budget, the proposed approach consistently evaluates higher-severity contingencies than uniform sampling. This allows critical outages to be identified more reliably with reduced computational effort.
【4】Effective Reasoning Chains Reduce Intrinsic Dimensionality
Link: https://arxiv.org/abs/2602.09276
Authors: Archiki Prasad, Mandar Joshi, Kenton Lee, Mohit Bansal, Peter Shaw
Note: 20 pages, 3 figures
Abstract: Chain-of-thought (CoT) reasoning and its variants have substantially improved the performance of language models on complex reasoning tasks, yet the precise mechanisms by which different strategies facilitate generalization remain poorly understood. While current explanations often point to increased test-time computation or structural guidance, establishing a consistent, quantifiable link between these factors and generalization remains challenging. In this work, we identify intrinsic dimensionality as a quantitative measure for characterizing the effectiveness of reasoning chains. Intrinsic dimensionality quantifies the minimum number of model dimensions needed to reach a given accuracy threshold on a given task. By keeping the model architecture fixed and varying the task formulation through different reasoning strategies, we demonstrate that effective reasoning strategies consistently reduce the intrinsic dimensionality of the task. Validating this on GSM8K with Gemma-3 1B and 4B, we observe a strong inverse correlation between the intrinsic dimensionality of a reasoning strategy and its generalization performance on both in-distribution and out-of-distribution data. Our findings suggest that effective reasoning chains facilitate learning by better compressing the task using fewer parameters, offering a new quantitative metric for analyzing reasoning processes.
【5】Feature salience -- not task-informativeness -- drives machine learning model explanations
Link: https://arxiv.org/abs/2602.09238
Authors: Benedict Clark, Marta Oliveira, Rick Wilming, Stefan Haufe
Abstract: Explainable AI (XAI) promises to provide insight into machine learning models' decision processes, where one goal is to identify failures such as shortcut learning. This promise relies on the field's assumption that input features marked as important by an XAI must contain information about the target variable. However, it is unclear whether informativeness is indeed the main driver of importance attribution in practice, or if other data properties such as statistical suppression, novelty at test-time, or high feature salience substantially contribute. To clarify this, we trained deep learning models on three variants of a binary image classification task, in which translucent watermarks are either absent, act as class-dependent confounds, or represent class-independent noise. Results for five popular attribution methods show substantially elevated relative importance in watermarked areas (RIW) for all models regardless of the training setting ($R^2 \geq .45$). By contrast, whether the presence of watermarks is class-dependent or not only has a marginal effect on RIW ($R^2 \leq .03$), despite a clear impact on model performance and generalisation ability. XAI methods show similar behaviour to model-agnostic edge detection filters and attribute substantially less importance to watermarks when bright image intensities are encoded by smaller instead of larger feature values. These results indicate that importance attribution is most strongly driven by the salience of image structures at test time rather than statistical associations learned by machine learning models. Previous studies demonstrating successful XAI application should be reevaluated with respect to a possibly spurious co-occurrence of feature salience and informativeness, and workflows using feature attribution methods as building blocks should be scrutinised.
【6】Decoding Future Risk: Deep Learning Analysis of Tubular Adenoma Whole-Slide Images
Link: https://arxiv.org/abs/2602.09155
Authors: Ahmed Rahu, Brian Shula, Brandon Combs, Aqsa Sultana, Surendra P. Singh, Vijayan K. Asari, Derrick Forchetti
Note: 20 pages, 5 figures
Abstract: Colorectal cancer (CRC) remains a significant cause of cancer-related mortality, despite the widespread implementation of prophylactic initiatives aimed at detecting and removing precancerous polyps. Although screening effectively reduces incidence, a notable portion of patients initially diagnosed with low-grade adenomatous polyps will still develop CRC later in life, even without the presence of known high-risk syndromes. Identifying which low-risk patients are at higher risk of progression is a critical unmet need for tailored surveillance and preventative therapeutic strategies. Traditional histological assessment of adenomas, while fundamental, may not fully capture subtle architectural or cytological features indicative of malignant potential. Advancements in digital pathology and machine learning provide an opportunity to analyze whole-slide images (WSIs) comprehensively and objectively. This study investigates whether machine learning algorithms, specifically convolutional neural networks (CNNs), can detect subtle histological features in WSIs of low-grade tubular adenomas that are predictive of a patient's long-term risk of developing colorectal cancer.
【7】Epistemic Throughput: Fundamental Limits of Attention-Constrained Inference
Link: https://arxiv.org/abs/2602.09127
Authors: Lei You
Abstract: Recent generative and tool-using AI systems can surface a large volume of candidates at low marginal cost, yet only a small fraction can be checked carefully. This creates a decoder-side bottleneck: downstream decision-makers must form reliable posteriors from many public records under scarce attention. We formalize this regime via Attention-Constrained Inference (ACI), in which a cheap screening stage processes $K$ records and an expensive verification stage can follow up on at most $B$ of them. Under Bayes log-loss, we study the maximum achievable reduction in posterior uncertainty per window, which we call \emph{epistemic throughput}. Our main result is a ``JaKoB'' scaling law showing that epistemic throughput has a baseline term that grows linearly with verification and prevalence, and an additional \emph{information-leverage} term that scales as $\sqrt{JKB}$, where $J$ summarizes screening quality. Thus, expanding cheap screening can nonlinearly amplify scarce verification, even when informative records are rare. We further show that this scaling is tight in a weak-screening limit, and that in the sparse-verification regime ($B \ll K$), substantial leverage requires heavy-tailed score distributions; for light-tailed scores the amplification is only logarithmic.
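The shape of the "JaKoB" law can be sketched numerically. The functional form below follows the abstract (a baseline linear in verification and prevalence plus a $\sqrt{JKB}$ leverage term), but the constants and the particular parameter values are illustrative placeholders, not the paper's:

```python
import math

def epistemic_throughput(K, B, J, prevalence, c=1.0):
    """Illustrative 'JaKoB' scaling law: a verification-driven baseline that
    is linear in the budget B and prevalence, plus an information-leverage
    term growing as sqrt(J*K*B). The constants here are placeholders."""
    baseline = prevalence * B
    leverage = c * math.sqrt(J * K * B)
    return baseline + leverage

# With the verification budget B fixed, enlarging cheap screening K still
# grows throughput -- sublinearly in K, via the sqrt(JKB) leverage term.
small = epistemic_throughput(K=1_000, B=10, J=0.01, prevalence=0.05)
large = epistemic_throughput(K=100_000, B=10, J=0.01, prevalence=0.05)
print(small, large)  # 10.5 100.5
```

A 100x increase in screening volume multiplies the leverage term by 10 here, illustrating how cheap screening nonlinearly amplifies scarce verification without touching B.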
【8】Predicting Open Source Software Sustainability with Deep Temporal Neural Hierarchical Architectures and Explainable AI
Link: https://arxiv.org/abs/2602.09064
Authors: S M Rakib Ul Karim, Wenyi Lu, Enock Kasaadha, Sean Goggins
Abstract: Open Source Software (OSS) projects follow diverse lifecycle trajectories shaped by evolving patterns of contribution, coordination, and community engagement. Understanding these trajectories is essential for stakeholders seeking to assess project organization and health at scale. However, prior work has largely relied on static or aggregated metrics, such as project age or cumulative activity, providing limited insight into how OSS sustainability unfolds over time. In this paper, we propose a hierarchical predictive framework that models OSS projects as belonging to distinct lifecycle stages grounded in established socio-technical categorizations of OSS development. Rather than treating sustainability solely as project longevity, these lifecycle stages operationalize sustainability as a multidimensional construct integrating contribution activity, community participation, and maintenance dynamics. The framework combines engineered tabular indicators with 24-month temporal activity sequences and employs a multi-stage classification pipeline to distinguish lifecycle stages associated with different coordination and participation regimes. To support transparency, we incorporate explainable AI techniques to examine the relative contribution of feature categories to model predictions. Evaluated on a large corpus of OSS repositories, the proposed approach achieves over 94% overall accuracy in lifecycle stage classification. Attribution analyses consistently identify contribution activity and community-related features as dominant signals, highlighting the central role of collective participation dynamics.
【9】Statistical-Computational Trade-offs in Learning Multi-Index Models via Harmonic Analysis
标题:通过调和分析学习多指标模型的统计计算权衡
链接:https://arxiv.org/abs/2602.09959
作者
:Hugo Latourelle-Vigeant,Theodor Misiakiewicz
备注:91 pages
摘要:We study the problem of learning multi-index models (MIMs), where the label depends on the input $\boldsymbol{x} \in \mathbb{R}^d$ only through an unknown $\mathsf{s}$-dimensional projection $\boldsymbol{W}_*^\mathsf{T} \boldsymbol{x} \in \mathbb{R}^\mathsf{s}$. Exploiting the equivariance of this problem under the orthogonal group $\mathcal{O}_d$, we obtain a sharp harmonic-analytic characterization of the learning complexity for MIMs with spherically symmetric inputs -- which refines and generalizes previous Gaussian-specific analyses. Specifically, we derive statistical and computational complexity lower bounds within the Statistical Query (SQ) and Low-Degree Polynomial (LDP) frameworks. These bounds decompose naturally across spherical harmonic subspaces. Guided by this decomposition, we construct a family of spectral algorithms based on harmonic tensor unfolding that sequentially recover the latent directions and (nearly) achieve these SQ and LDP lower bounds. Depending on the choice of harmonic degree sequence, these estimators can realize a broad range of trade-offs between sample and runtime complexity. From a technical standpoint, our results build on the semisimple decomposition of the $\mathcal{O}_d$-action on $L^2 (\mathbb{S}^{d-1})$ and the intertwining isomorphism between spherical harmonics and traceless symmetric tensors.
【10】The Critical Horizon: Inspection Design Principles for Multi-Stage Operations and Deep Reasoning
标题:关键视野:多阶段操作和深度推理的检查设计原则
链接:https://arxiv.org/abs/2602.09394
作者:Seyed Morteza Emadi
备注:49 pages, 5 figures
摘要:生产线、服务流程、供应链和人工智能推理链都面临着一个共同的挑战:将终端结果归因于导致它的中间阶段。我们为这一信用分配问题建立了一个信息论障碍:连接早期步骤和最终结果的信号随深度呈指数衰减,从而形成一个临界视界,超出该视界,任何算法都无法仅从终端数据中学习。我们证明了四个结果。第一,信号衰减界:将结果归因于早期阶段的样本复杂度随中间步骤数呈指数增长。第二,宽度限制:并行rollout仅提供对数级缓解,相关性限制了有效独立样本的数量。第三,目标错配:当序列有效性要求所有步骤都正确时,加性奖励聚合优化的是错误的目标量。第四,最优检查设计:在均匀信号衰减下,均匀的检查点间距是极小极大最优的;在非均匀衰减下,贪婪算法产生最优的非均匀检查时间表。总之,这些结果为运营中的检查设计和人工智能中的监督设计提供了共同的分析基础。
摘要:Manufacturing lines, service journeys, supply chains, and AI reasoning chains share a common challenge: attributing a terminal outcome to the intermediate stage that caused it. We establish an information-theoretic barrier to this credit assignment problem: the signal connecting early steps to final outcomes decays exponentially with depth, creating a critical horizon beyond which no algorithm can learn from endpoint data alone. We prove four results. First, a Signal Decay Bound: sample complexity for attributing outcomes to early stages grows exponentially in the number of intervening steps. Second, Width Limits: parallel rollouts provide only logarithmic relief, with correlation capping the effective number of independent samples. Third, an Objective Mismatch: additive reward aggregation optimizes the wrong quantity when sequential validity requires all steps to be correct. Fourth, Optimal Inspection Design: uniform checkpoint spacing is minimax-optimal under homogeneous signal attenuation, while a greedy algorithm yields optimal non-uniform schedules under heterogeneous attenuation. Together, these results provide a common analytical foundation for inspection design in operations and supervision design in AI.
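As a toy illustration of the Signal Decay Bound (a simulation sketch under an assumed AR(1)-style chain, not the paper's model), the correlation between the first stage and the terminal outcome shrinks geometrically with depth:

```python
import numpy as np

# Each stage keeps a fraction `lam` of the signal and adds fresh noise,
# so corr(stage-1 state, terminal outcome) ~ lam ** depth.
rng = np.random.default_rng(0)
lam, depth, n = 0.7, 8, 200_000
x = rng.normal(size=n)            # stage-1 state across n parallel rollouts
first = x.copy()
for _ in range(depth):            # push the state through `depth` noisy stages
    x = lam * x + np.sqrt(1 - lam**2) * rng.normal(size=n)
corr = float(np.corrcoef(first, x)[0, 1])   # close to 0.7**8 ~ 0.058
```

Recovering the early-stage effect from endpoint data alone then requires roughly `corr**-2` samples, which grows exponentially in depth.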
检测相关(9篇)
【1】Vendi Novelty Scores for Out-of-Distribution Detection
标题:分布外检测的Vendi新颖性分数
链接:https://arxiv.org/abs/2602.10062
作者:Amey P. Pasarkar,Adji Bousso Dieng
摘要:分布外(OOD)检测对于机器学习系统的安全部署至关重要。现有的事后检测器通常依赖于模型的置信度得分或特征空间中的似然估计,且往往基于限制性的分布假设。在这项工作中,我们引入了第三种范式,从多样性的角度来刻画OOD检测。我们提出了Vendi新颖性分数(VNS),一个基于Vendi分数(VS,一族基于相似度的多样性指标)的OOD检测器。VNS量化了测试样本使分布内特征集的VS增加的程度,提供了一种不需要密度建模的原则性新颖性概念。VNS是线性时间、非参数的,并且自然地结合了类条件(局部)和数据集级(全局)新颖性信号。在多个图像分类基准和网络架构中,VNS实现了最先进的OOD检测性能。值得注意的是,VNS在仅使用1%的训练数据进行计算时保持了这种性能,从而可以在内存或访问受限的设置中进行部署。
摘要:Out-of-distribution (OOD) detection is critical for the safe deployment of machine learning systems. Existing post-hoc detectors typically rely on model confidence scores or likelihood estimates in feature space, often under restrictive distributional assumptions. In this work, we introduce a third paradigm and formulate OOD detection from a diversity perspective. We propose the Vendi Novelty Score (VNS), an OOD detector based on the Vendi Scores (VS), a family of similarity-based diversity metrics. VNS quantifies how much a test sample increases the VS of the in-distribution feature set, providing a principled notion of novelty that does not require density modeling. VNS is linear-time, non-parametric, and naturally combines class-conditional (local) and dataset-level (global) novelty signals. Across multiple image classification benchmarks and network architectures, VNS achieves state-of-the-art OOD detection performance. Remarkably, VNS retains this performance when computed using only 1% of the training data, enabling deployment in memory- or access-constrained settings.
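A minimal sketch of the scoring idea, assuming an RBF similarity kernel (the paper's kernel choice and feature space may differ): the Vendi Score is the exponential entropy of the eigenvalues of the normalized similarity matrix, and the novelty of a test point is the score's increase when the point joins the in-distribution set.

```python
import numpy as np

def vendi_score(X, kernel=lambda a, b: np.exp(-np.linalg.norm(a - b) ** 2)):
    """Vendi Score: exp of the Shannon entropy of the eigenvalues of K / n,
    where K is the pairwise similarity matrix with unit diagonal."""
    n = len(X)
    K = np.array([[kernel(a, b) for b in X] for a in X])
    lam = np.linalg.eigvalsh(K / n)
    lam = lam[lam > 1e-12]
    return float(np.exp(-np.sum(lam * np.log(lam))))

def vendi_novelty(x, X_in):
    """Novelty of x = increase in Vendi Score when x joins the ID set."""
    return vendi_score(np.vstack([X_in, x])) - vendi_score(X_in)
```

A point far from the in-distribution cluster adds a nearly orthogonal direction to the similarity matrix and raises the score far more than a near-duplicate does.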
【2】Allure of Craquelure: A Variational-Generative Approach to Crack Detection in Paintings
标题:Craquelure的诱惑:绘画裂缝检测的变分生成方法
链接:https://arxiv.org/abs/2602.09730
作者:Laura Paul,Holger Rauhut,Martin Burger,Samira Kabri,Tim Roith
摘要:成像技术、深度学习和数值性能的最新进展使艺术品的非侵入性详细分析成为可能,支持其文档记录和保护。特别是,数字化绘画中裂纹的自动检测对于评估退化和指导修复至关重要,但由于可能复杂的画面场景以及裂纹与笔触或毛发等类裂纹艺术特征之间的视觉相似性,仍然具有挑战性。我们提出了一种混合方法,将裂纹检测建模为逆问题,把观察到的图像分解为无裂纹的绘画和裂纹分量。一个深度生成模型被用作底层艺术品的强大先验,而裂纹结构则通过Mumford--Shah型变分泛函结合裂纹先验来捕获。联合优化产生绘画中裂纹定位的像素级地图。
摘要:Recent advances in imaging technologies, deep learning and numerical performance have enabled non-invasive detailed analysis of artworks, supporting their documentation and conservation. In particular, automated detection of craquelure in digitized paintings is crucial for assessing degradation and guiding restoration, yet remains challenging due to the possibly complex scenery and the visual similarity between cracks and crack-like artistic features such as brush strokes or hair. We propose a hybrid approach that models crack detection as an inverse problem, decomposing an observed image into a crack-free painting and a crack component. A deep generative model is employed as powerful prior for the underlying artwork, while crack structures are captured using a Mumford--Shah-type variational functional together with a crack prior. Joint optimization yields a pixel-level map of crack localizations in the painting.
【3】Contextual and Seasonal LSTMs for Time Series Anomaly Detection
标题:用于时间序列异常检测的上下文和季节LSTM
链接:https://arxiv.org/abs/2602.09690
作者:Lingpei Zhang,Qingming Li,Yong Yang,Jiahao Chen,Rui Zeng,Chenyang Lyu,Shouling Ji
备注:Published as a conference paper at ICLR 2026
摘要:单变量时间序列(UTS),其中每个时间戳记录一个变量,作为Web系统和云服务器中的关键指标。UTS中的异常检测在数据挖掘和系统可靠性管理中起着至关重要的作用。然而,现有的基于重建和基于预测的方法难以捕获某些细微的异常,特别是小点异常和缓慢上升的异常。为了解决这些挑战,我们提出了一种新的基于预测的框架,命名为上下文和季节性LSTMs(CS-LSTMs)。CS-LSTM建立在噪声分解策略的基础上,并联合利用上下文依赖性和季节性模式,从而加强对细微异常的检测。通过集成时域和频域表示,CS-LSTM实现了对周期趋势和异常定位的更准确建模。对公共基准数据集的广泛评估表明,CS-LSTM始终优于最先进的方法,突出了它们在鲁棒时间序列异常检测中的有效性和实用价值。
摘要:Univariate time series (UTS), where each timestamp records a single variable, serve as crucial indicators in web systems and cloud servers. Anomaly detection in UTS plays an essential role in both data mining and system reliability management. However, existing reconstruction-based and prediction-based methods struggle to capture certain subtle anomalies, particularly small point anomalies and slowly rising anomalies. To address these challenges, we propose a novel prediction-based framework named Contextual and Seasonal LSTMs (CS-LSTMs). CS-LSTMs are built upon a noise decomposition strategy and jointly leverage contextual dependencies and seasonal patterns, thereby strengthening the detection of subtle anomalies. By integrating both time-domain and frequency-domain representations, CS-LSTMs achieve more accurate modeling of periodic trends and anomaly localization. Extensive evaluations on public benchmark datasets demonstrate that CS-LSTMs consistently outperform state-of-the-art methods, highlighting their effectiveness and practical value in robust time series anomaly detection.
【4】Why the Counterintuitive Phenomenon of Likelihood Rarely Appears in Tabular Anomaly Detection with Deep Generative Models?
标题:为什么在使用深度生成模型的表格异常检测中很少出现似然的反直觉现象?
链接:https://arxiv.org/abs/2602.09593
作者:Donghwan Kim,Junghun Phee,Hyunsoo Yoon
备注:47 pages, 11 figures
摘要:具有易处理、可解析计算的似然的深度生成模型(以归一化流为例)通过基于似然的评分为异常检测提供了有效的基础。我们证明,与深度生成模型经常为异常数据分配更高似然的图像域不同,这种反直觉行为在表格设置中发生的频率要低得多。我们首先引入了一个领域无关的形式化定义,使反直觉现象能够被一致地检测和评估,解决了其缺乏精确定义的问题。通过对ADBench中的47个表格数据集和10个CV/NLP嵌入数据集进行广泛实验,并以13个基线模型为基准,我们证明了该现象在一般表格数据中始终是罕见的。我们进一步从理论和实证两个角度研究了这一现象,重点是数据维度和特征相关性差异的作用。我们的结果表明,仅基于似然的归一化流检测为表格领域的异常检测提供了一种实用且可靠的方法。
摘要:Deep generative models with tractable and analytically computable likelihoods, exemplified by normalizing flows, offer an effective basis for anomaly detection through likelihood-based scoring. We demonstrate that, unlike in the image domain where deep generative models frequently assign higher likelihoods to anomalous data, such counterintuitive behavior occurs far less often in tabular settings. We first introduce a domain-agnostic formulation that enables consistent detection and evaluation of the counterintuitive phenomenon, addressing the absence of precise definition. Through extensive experiments on 47 tabular datasets and 10 CV/NLP embedding datasets in ADBench, benchmarked against 13 baseline models, we demonstrate that the phenomenon, as defined, is consistently rare in general tabular data. We further investigate this phenomenon from both theoretical and empirical perspectives, focusing on the roles of data dimensionality and difference in feature correlation. Our results suggest that likelihood-only detection with normalizing flows offers a practical and reliable approach for anomaly detection in tabular domains.
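A minimal sketch of likelihood-only scoring, with a fitted full-covariance Gaussian standing in for the normalizing flow (any density model with a tractable `log_prob` fits the same template):

```python
import numpy as np

# Stand-in for a flow: a model with a tractable, analytically computable
# log-density. Inputs with low log-likelihood are flagged as anomalous.
class GaussianDensity:
    def fit(self, X):
        self.mu = X.mean(axis=0)
        self.cov = np.cov(X.T) + 1e-6 * np.eye(X.shape[1])
        self.inv = np.linalg.inv(self.cov)
        self.logdet = np.linalg.slogdet(self.cov)[1]
        return self

    def log_prob(self, X):
        d = X - self.mu
        maha = np.einsum('ij,jk,ik->i', d, self.inv, d)  # Mahalanobis term
        return -0.5 * (maha + self.logdet + X.shape[1] * np.log(2 * np.pi))

rng = np.random.default_rng(0)
X_in = rng.normal(0, 1, (500, 5))    # in-distribution (normal) data
X_ood = rng.normal(4, 1, (500, 5))   # shifted, anomalous data
model = GaussianDensity().fit(X_in)
gap = model.log_prob(X_in).mean() - model.log_prob(X_ood).mean()
```

The paper's finding is that in tabular data this gap rarely inverts, unlike the image-domain counterintuitive phenomenon.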
【5】Mitigating the Likelihood Paradox in Flow-based OOD Detection via Entropy Manipulation
标题:通过熵操纵缓解基于流的OOD检测中的似然悖论
链接:https://arxiv.org/abs/2602.09581
作者:Donghwan Kim,Hyunsoo Yoon
备注:28 pages, 4 figures
摘要:能够易处理地计算输入似然的深度生成模型(包括归一化流)通常会将意外高的似然分配给分布外(OOD)输入。我们通过基于语义相似度操纵输入熵来缓解这种似然悖论:对与分布内记忆库相似度较低的输入施加更强的扰动。我们提供了一个理论分析,表明熵控制增大了分布内样本和OOD样本之间的预期对数似然差距(有利于分布内样本),并解释了为什么该过程不需要对密度模型进行任何额外的训练。然后,我们在标准基准上将我们的方法与基于似然的OOD检测器进行比较,发现相对于基线有一致的AUROC改善,支持了我们的解释。
摘要:Deep generative models that can tractably compute input likelihoods, including normalizing flows, often assign unexpectedly high likelihoods to out-of-distribution (OOD) inputs. We mitigate this likelihood paradox by manipulating input entropy based on semantic similarity, applying stronger perturbations to inputs that are less similar to an in-distribution memory bank. We provide a theoretical analysis showing that entropy control increases the expected log-likelihood gap between in-distribution and OOD samples in favor of the in-distribution, and we explain why the procedure works without any additional training of the density model. We then evaluate our method against likelihood-based OOD detectors on standard benchmarks and find consistent AUROC improvements over baselines, supporting our explanation.
【6】ArtifactLens: Hundreds of Labels Are Enough for Artifact Detection with VLMs
标题:ArtifactLens:数百个标签足以使用VLM进行伪影检测
链接:https://arxiv.org/abs/2602.09475
作者:James Burgess,Rameen Abdal,Dan Stoddart,Sergey Tulyakov,Serena Yeung-Levy,Kuan-Chieh Jackson Wang
备注:https://jmhb0.github.io/ArtifactLens/
摘要:现代的图像生成器可以生成非常逼真的图像,只有像扭曲的手或变形的物体这样的伪影才能显示它们的合成来源。检测这些伪影是必不可少的:如果没有检测,我们就不能对生成器进行基准测试或训练奖励模型来改进它们。当前的检测器在数万张标记图像上微调VLM,但每当生成器演变或出现新的伪影类型时,重复这一过程的成本很高。我们表明,预训练的VLM已经编码了检测伪影所需的知识:通过正确的脚手架,这种能力可以仅用每个伪影类别几百个标记样本来解锁。我们的系统ArtifactLens在五个人体伪影基准上达到了最先进的水平(首次跨多个数据集的评估),同时需要的标记数据少几个数量级。脚手架包括一个具有上下文学习和文本指令优化的多组件架构,每个组件都有新的改进。我们的方法可推广到其他伪影类型(物体形态、动物解剖和实体交互)以及AIGC检测这一不同任务。
摘要:Modern image generators produce strikingly realistic images, where only artifacts like distorted hands or warped objects reveal their synthetic origin. Detecting these artifacts is essential: without detection, we cannot benchmark generators or train reward models to improve them. Current detectors fine-tune VLMs on tens of thousands of labeled images, but this is expensive to repeat whenever generators evolve or new artifact types emerge. We show that pretrained VLMs already encode the knowledge needed to detect artifacts - with the right scaffolding, this capability can be unlocked using only a few hundred labeled examples per artifact category. Our system, ArtifactLens, achieves state-of-the-art on five human artifact benchmarks (the first evaluation across multiple datasets) while requiring orders of magnitude less labeled data. The scaffolding consists of a multi-component architecture with in-context learning and text instruction optimization, with novel improvements to each. Our methods generalize to other artifact types - object morphology, animal anatomy, and entity interactions - and to the distinct task of AIGC detection.
【7】MacrOData: New Benchmarks of Thousands of Datasets for Tabular Outlier Detection
标题:MacrOData:用于表格离群值检测的数千个数据集的新基准
链接:https://arxiv.org/abs/2602.09329
作者:Xueying Ding,Simon Klüttermann,Haomin Wen,Yilong Chen,Leman Akoglu
备注:28 pages
摘要:质量基准对于公平和准确地跟踪科学进展和使从业人员能够作出知情的方法选择至关重要。表格数据上的离群值检测(OD)支持许多现实世界的应用,但现有的OD基准仍然有限。著名的OD基准ADBench是文献中事实上的标准,但仅包含57个数据集。除了在这项工作中讨论的其他缺点外,它的小规模严重限制了多样性和统计能力。我们介绍MacrOData,一个大规模的基准测试套件,包括三个精心策划的组件:OddBench,790个数据集包含真实世界的语义异常;OvrBench,856个数据集具有真实世界的统计离群值;和SynBench,800个综合生成的数据集跨越不同的数据先验和离群值原型。由于其规模和多样性,MacrOData可以对表格OD方法进行全面和统计上可靠的评估。我们的基准测试进一步满足了几个关键的需求:我们为所有数据集提供标准化的训练/测试分割,提供公共/私有基准测试分区(后者的测试标签被保留,用于在线排行榜),并用语义元数据注释我们的数据集。我们在所有的基准上进行了广泛的实验,评估了广泛的OD方法,包括经典、深度和基础模型,覆盖不同的超参数配置。我们报告了详细的实证研究结果、实用的指导方针以及各方法的表现,作为未来研究的参考。所有包含2,446个数据集的基准都是开源的,并在https://huggingface.co/MacrOData-CMU上托管了公开访问的排行榜。
摘要:Quality benchmarks are essential for fairly and accurately tracking scientific progress and enabling practitioners to make informed methodological choices. Outlier detection (OD) on tabular data underpins numerous real-world applications, yet existing OD benchmarks remain limited. The prominent OD benchmark ADBench is the de facto standard in the literature, yet comprises only 57 datasets. In addition to other shortcomings discussed in this work, its small scale severely restricts diversity and statistical power. We introduce MacrOData, a large-scale benchmark suite for tabular OD comprising three carefully curated components: OddBench, with 790 datasets containing real-world semantic anomalies; OvrBench, with 856 datasets featuring real-world statistical outliers; and SynBench, with 800 synthetically generated datasets spanning diverse data priors and outlier archetypes. Owing to its scale and diversity, MacrOData enables comprehensive and statistically robust evaluation of tabular OD methods. Our benchmarks further satisfy several key desiderata: We provide standardized train/test splits for all datasets, public/private benchmark partitions with held-out test labels for the latter reserved toward an online leaderboard, and annotate our datasets with semantic metadata. We conduct extensive experiments across all benchmarks, evaluating a broad range of OD methods comprising classical, deep, and foundation models, over diverse hyperparameter configurations. We report detailed empirical findings, practical guidelines, as well as individual performances as references for future research. All benchmarks containing 2,446 datasets combined are open-sourced, along with a publicly accessible leaderboard hosted at https://huggingface.co/MacrOData-CMU.
【8】What do Geometric Hallucination Detection Metrics Actually Measure?
标题:几何幻觉检测指标实际测量什么?
链接:https://arxiv.org/abs/2602.09158
作者:Eric Yeats,John Buckheit,Sarah Scullen,Brendan Kennedy,Loc Truong,Davis Brown,Bill Kay,Cliff Joslyn,Tegan Emerson,Michael J. Henry,John Emanuello,Henry Kvinge
备注:Published at the 2025 ICML Workshop on Reliable and Responsible Foundation Models
摘要:幻觉仍然是在高后果应用中部署生成模型的障碍。这在外部地面实况不容易用于验证模型输出的情况下尤其如此。这种情况激发了对LLM内部状态中的几何信号的研究,这些信号预测幻觉并且需要有限的外部知识。鉴于有一系列因素可能导致模型输出被称为幻觉(例如,不相关性与不连贯性),在本文中,我们问这些几何统计实际上捕捉到了幻觉的哪些特定属性。为了评估这一点,我们生成了一个合成数据集,该数据集改变了与幻觉相关的输出的不同属性。这包括输出正确性、置信度、相关性、一致性和完整性。我们发现,不同的几何统计捕捉不同类型的幻觉。一路上,我们表明,许多现有的几何检测方法具有相当大的敏感性,在任务域的变化(例如,数学问题vs.历史问题)。受此启发,我们引入了一种简单的归一化方法来减轻域偏移对几何统计的影响,从而在多域设置中获得+34点的AUROC增益。
摘要:Hallucination remains a barrier to deploying generative models in high-consequence applications. This is especially true in cases where external ground truth is not readily available to validate model outputs. This situation has motivated the study of geometric signals in the internal state of an LLM that are predictive of hallucination and require limited external knowledge. Given that there are a range of factors that can lead model output to be called a hallucination (e.g., irrelevance vs incoherence), in this paper we ask what specific properties of a hallucination these geometric statistics actually capture. To assess this, we generate a synthetic dataset which varies distinct properties of output associated with hallucination. This includes output correctness, confidence, relevance, coherence, and completeness. We find that different geometric statistics capture different types of hallucinations. Along the way we show that many existing geometric detection methods have substantial sensitivity to shifts in task domain (e.g., math questions vs. history questions). Motivated by this, we introduce a simple normalization method to mitigate the effect of domain shift on geometric statistics, leading to AUROC gains of +34 points in multi-domain settings.
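The abstract does not specify the normalization; one simple instance of mitigating domain shift in a geometric statistic is a per-domain z-score (an illustrative choice, not necessarily the authors' exact method):

```python
import numpy as np

def normalize_per_domain(scores, domains):
    """Z-score each detector statistic within its task domain, so that
    domain-level shifts in scale do not swamp the hallucination signal."""
    scores = np.asarray(scores, dtype=float)
    domains = np.asarray(domains)
    out = np.empty_like(scores)
    for d in np.unique(domains):
        m = domains == d
        out[m] = (scores[m] - scores[m].mean()) / (scores[m].std() + 1e-9)
    return out

# Math-domain scores sit on a higher raw scale than history-domain scores;
# after per-domain normalization both share mean 0 and unit scale.
raw = [10.0, 12.0, 11.0, 1.0, 3.0, 2.0]
dom = ['math', 'math', 'math', 'hist', 'hist', 'hist']
z = normalize_per_domain(raw, dom)
```

Ranking hallucination scores within (rather than across) domains is what removes the spurious domain signal.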
【9】Persistent Entropy as a Detector of Phase Transitions
标题:作为相变检测器的持久熵
链接:https://arxiv.org/abs/2602.09058
作者:Matteo Rucco
摘要:持久熵(PE)是持久性条形码的信息论概括统计量,已被广泛用于检测复杂系统中的状态变化。尽管它在经验上取得了成功,但对持久熵何时以及为什么可靠地检测相变的一般理论理解仍然有限,特别是在随机和数据驱动的环境中。在这项工作中,我们建立了一个一般的,独立于模型的定理提供了充分条件下,持久熵可证明分离两个阶段。我们表明,持久熵表现出渐近非零的差距跨越阶段。该结果仅依赖于沿着收敛图序列的持续熵的连续性,或在温和的正则化下,因此广泛适用于数据模态,过滤和同调度。为了连接渐近理论与有限时间计算,我们引入了一个操作框架的基础上拓扑稳定,定义一个拓扑过渡时间,通过稳定一个选定的拓扑统计滑动窗口,和一个基于概率的估计在有限的观察范围内的关键参数。我们验证了Kuramoto同步转换,集体运动中Vicsek有序到无序转换以及多个数据集和架构的神经网络训练动态的框架。在所有的实验中,持续熵的稳定和跨实现的可变性的崩溃提供了与理论机制一致的鲁棒的数字签名。
摘要:Persistent entropy (PE) is an information-theoretic summary statistic of persistence barcodes that has been widely used to detect regime changes in complex systems. Despite its empirical success, a general theoretical understanding of when and why persistent entropy reliably detects phase transitions has remained limited, particularly in stochastic and data-driven settings. In this work, we establish a general, model-independent theorem providing sufficient conditions under which persistent entropy provably separates two phases. We show that persistent entropy exhibits an asymptotically non-vanishing gap across phases. The result relies only on continuity of persistent entropy along the convergent diagram sequence, or under mild regularization, and is therefore broadly applicable across data modalities, filtrations, and homological degrees. To connect asymptotic theory with finite-time computations, we introduce an operational framework based on topological stabilization, defining a topological transition time by stabilizing a chosen topological statistic over sliding windows, and a probability-based estimator of critical parameters within a finite observation horizon. We validate the framework on the Kuramoto synchronization transition, the Vicsek order-to-disorder transition in collective motion, and neural network training dynamics across multiple datasets and architectures. Across all experiments, stabilization of persistent entropy and collapse of variability across realizations provide robust numerical signatures consistent with the theoretical mechanism.
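Persistent entropy itself has a standard closed form: the Shannon entropy of barcode bar lengths normalized into a probability vector. A minimal sketch:

```python
import numpy as np

def persistent_entropy(barcode):
    """PE of a persistence barcode: with bar lengths l_i = death_i - birth_i
    and L = sum_i l_i, set p_i = l_i / L and return -sum_i p_i * log(p_i)."""
    lengths = np.array([death - birth for birth, death in barcode], dtype=float)
    p = lengths / lengths.sum()
    return float(-(p * np.log(p)).sum())

# Equal bars maximize entropy (log n); one dominant bar lowers it,
# which is why PE reacts when a phase transition reorganizes the barcode.
uniform = [(0, 1), (0, 1), (0, 1), (0, 1)]
skewed = [(0, 10), (0, 1), (0, 1), (0, 1)]
```

Tracking this scalar over sliding windows is the statistic that the paper's stabilization framework monitors.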
分类|识别(3篇)
【1】Causal Identification in Multi-Task Demand Learning with Confounding
标题:带混杂的多任务需求学习中的因果识别
链接:https://arxiv.org/abs/2602.09969
作者:Varun Gupta,Vijay Kamble
摘要:我们研究了一个由零售定价驱动的典型多任务需求学习问题,其中一家公司寻求在一大批决策情境中估计异构的线性价格响应函数。每种情境的特征是丰富的可观察协变量,但通常只表现出有限的历史价格变化,这促使人们使用多任务学习来在任务之间借力。在这种情况下,一个核心挑战是内生性:历史价格是由管理者或算法选择的,可能与未观察到的任务级需求决定因素任意相关。在潜在基本面的这种混杂下,常用的方法(如汇总回归和元学习)无法识别因果价格效应。我们提出了一个新的估计框架,尽管价格与潜在任务结构之间存在任意依赖,仍能实现因果识别。我们的方法,决策条件掩蔽结果元学习(DCMOML),通过精心设计元学习器的信息集,在考虑内生决策历史的同时利用跨任务的异质性。在对每个任务价格自适应性的温和限制下,我们证明该方法可以识别给定所设计信息集下任务特定因果参数的条件均值。我们的结果为使用内生价格和小任务样本进行大规模需求估计提供了保证,为在运营环境中部署因果的、数据驱动的定价模型提供了原则性基础。
摘要:We study a canonical multi-task demand learning problem motivated by retail pricing, in which a firm seeks to estimate heterogeneous linear price-response functions across a large collection of decision contexts. Each context is characterized by rich observable covariates yet typically exhibits only limited historical price variation, motivating the use of multi-task learning to borrow strength across tasks. A central challenge in this setting is endogeneity: historical prices are chosen by managers or algorithms and may be arbitrarily correlated with unobserved, task-level demand determinants. Under such confounding by latent fundamentals, commonly used approaches, such as pooled regression and meta-learning, fail to identify causal price effects. We propose a new estimation framework that achieves causal identification despite arbitrary dependence between prices and latent task structure. Our approach, Decision-Conditioned Masked-Outcome Meta-Learning (DCMOML), involves carefully designing the information set of a meta-learner to leverage cross-task heterogeneity while accounting for endogenous decision histories. Under a mild restriction on price adaptivity in each task, we establish that this method identifies the conditional mean of the task-specific causal parameters given the designed information set. Our results provide guarantees for large-scale demand estimation with endogenous prices and small per-task samples, offering a principled foundation for deploying causal, data-driven pricing models in operational environments.
【2】Differentiable Modeling for Low-Inertia Grids: Benchmarking PINNs, NODEs, and DP for Identification and Control of SMIB System
标题:低惯性电网的可微建模:对PINN、NODE和DP在SMIB系统辨识与控制中进行基准测试
链接:https://arxiv.org/abs/2602.09667
作者:Shinhoo Kang,Sangwook Kim,Sehyun Yun
备注:9 pages, 7 figures, 4 tables
摘要:向低惯性电力系统的过渡需要这样的建模框架:不仅提供准确的状态预测,而且提供物理上一致的控制灵敏度。虽然科学机器学习提供了强大的非线性建模工具,但不同可微范式对控制的意义仍然没有得到充分理解。本文对物理信息神经网络(PINN)、神经常微分方程(NODE)和可微编程(DP)在电力系统动态的建模、辨识与控制中的表现进行了比较研究。使用单机无限大母线(SMIB)系统作为基准,我们评估了它们在轨迹外推、参数估计和线性二次型调节器(LQR)综合方面的性能。我们的结果凸显了数据驱动的灵活性与物理结构之间的基本权衡。NODE通过捕捉底层向量场表现出优越的外推能力,而PINN由于依赖随时间变化的解映射,泛化能力有限。在参数辨识这一逆问题中,虽然DP和PINN都成功恢复了未知参数,但DP通过将控制方程作为硬约束实现了明显更快的收敛。最重要的是,对于控制综合,DP框架产生了与理论最优相当的闭环稳定性。此外,我们证明,当控制方程不可用时,NODE可作为可行的数据驱动替代模型。
摘要:The transition toward low-inertia power systems demands modeling frameworks that provide not only accurate state predictions but also physically consistent sensitivities for control. While scientific machine learning offers powerful nonlinear modeling tools, the control-oriented implications of different differentiable paradigms remain insufficiently understood. This paper presents a comparative study of Physics-Informed Neural Networks (PINNs), Neural Ordinary Differential Equations (NODEs), and Differentiable Programming (DP) for modeling, identification, and control of power system dynamics. Using the Single Machine Infinite Bus (SMIB) system as a benchmark, we evaluate their performance in trajectory extrapolation, parameter estimation, and Linear Quadratic Regulator (LQR) synthesis. Our results highlight a fundamental trade-off between data-driven flexibility and physical structure. NODE exhibits superior extrapolation by capturing the underlying vector field, whereas PINN shows limited generalization due to its reliance on a time-dependent solution map. In the inverse problem of parameter identification, while both DP and PINN successfully recover the unknown parameters, DP achieves significantly faster convergence by enforcing governing equations as hard constraints. Most importantly, for control synthesis, the DP framework yields closed-loop stability comparable to the theoretical optimum. Furthermore, we demonstrate that NODE serves as a viable data-driven surrogate when governing equations are unavailable.
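A minimal sketch of the LQR-synthesis step on linearized SMIB swing dynamics (the parameter values below are illustrative, not the paper's):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Linearized swing equation around an operating point: states x = [dδ, dω].
M, D, Ksync = 5.0, 1.0, 8.0            # inertia, damping, synchronizing torque
A = np.array([[0.0, 1.0],
              [-Ksync / M, -D / M]])
B = np.array([[0.0], [1.0 / M]])
Q, R = np.eye(2), np.array([[1.0]])    # state and input weights

P = solve_continuous_are(A, B, Q, R)   # algebraic Riccati solution
K = np.linalg.solve(R, B.T @ P)        # LQR gain, control law u = -K x
closed_loop_eigs = np.linalg.eigvals(A - B @ K)
```

A differentiable-programming model exposes exact sensitivities of `A` and `B` to physical parameters, which is what makes this synthesis step compatible with learned dynamics.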
【3】DRAGON: Robust Classification for Very Large Collections of Software Repositories
标题:DRAGON:针对超大规模软件库集合的稳健分类
链接:https://arxiv.org/abs/2602.09071
作者:Stefano Balla,Stefano Zacchiroli,Thomas Degueule,Jean-Rémy Falleri,Romain Robbes
摘要:使用反映其内容和目的的"主题"对源代码存储库进行自动分类的能力非常有用,特别是在导航或搜索大型软件集合时。然而,现有的方法通常严重依赖于README文件和其他元数据,而这些内容经常缺失,限制了它们在现实世界大规模设置中的适用性。我们提出了DRAGON,一个专为超大规模、多样化软件集合设计的存储库分类器。它完全基于通常存储在版本控制系统中的轻量级信号:文件和目录名,以及可选的README(如果可用)。在大规模存储库分类中,DRAGON将F1@5从54.8%提高到60.8%,超过了最先进水平。即使没有README文件,DRAGON仍然有效,与README存在时相比性能仅下降6%。这种鲁棒性使其适用于文档稀疏或不一致的真实环境。此外,许多剩余的分类错误是"接近正确"的错误,即预测的标签在语义上接近正确的主题。这一特性增加了预测在现实世界软件集合中的实用价值,因为建议一些相关的主题仍然可以指导搜索和发现。作为开发DRAGON的副产品,我们还发布了迄今为止最大的用于存储库分类的开放数据集,包括82.5万个存储库及其相关的真值主题,数据来自Software Heritage归档,为未来关于软件存储库理解的大规模、语言无关的研究提供了基础。
摘要:The ability to automatically classify source code repositories with ''topics'' that reflect their content and purpose is very useful, especially when navigating or searching through large software collections. However, existing approaches often rely heavily on README files and other metadata, which are frequently missing, limiting their applicability in real-world large-scale settings. We present DRAGON, a repository classifier designed for very large and diverse software collections. It operates entirely on lightweight signals commonly stored in version control systems: file and directory names, and optionally the README when available. In repository classification at scale, DRAGON improves F1@5 from 54.8% to 60.8%, surpassing the state of the art. DRAGON remains effective even when README files are absent, with performance degrading by only 6% w.r.t. when they are present. This robustness makes it practical for real-world settings where documentation is sparse or inconsistent. Furthermore, many of the remaining classification errors are near misses, where predicted labels are semantically close to the correct topics. This property increases the practical value of the predictions in real-world software collections, where suggesting a few related topics can still guide search and discovery. As a byproduct of developing DRAGON, we also release the largest open dataset to date for repository classification, consisting of 825 thousand repositories with associated ground-truth topics, sourced from the Software Heritage archive, providing a foundation for future large-scale and language-agnostic research on software repository understanding.
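A minimal sketch of classifying from lightweight VCS signals alone, with a hypothetical hand-written topic vocabulary where DRAGON instead learns the mapping from data:

```python
from collections import Counter

def topic_scores(paths, topic_tokens):
    """Score repository topics from file/directory-name tokens only;
    no README or other metadata is required."""
    tokens = Counter()
    for p in paths:
        for part in p.lower().replace('.', '/').split('/'):
            if part:
                tokens[part] += 1
    return {topic: sum(tokens[t] for t in toks)
            for topic, toks in topic_tokens.items()}

# Hypothetical vocabulary for illustration only.
vocab = {'python': {'py', 'setup', 'requirements'},
         'web': {'html', 'css', 'js'}}
repo = ['setup.py', 'src/model.py', 'docs/index.html']
scores = topic_scores(repo, vocab)
```

Even this crude token count ranks the right topic first for the toy repository, illustrating why path names alone carry usable signal.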
表征(3篇)
【1】Towards Uniformity and Alignment for Multimodal Representation Learning
标题:实现多模式表示学习的一致性和一致性
链接:https://arxiv.org/abs/2602.09507
作者:Wenzhe Yin,Pan Zhou,Zehao Xiao,Jie Liu,Shujian Yu,Jan-Jakob Sonke,Efstratios Gavves
摘要:多模态表征学习旨在构建一个共享的嵌入空间,在该空间中,异构模态在语义上是对齐的。尽管有强有力的实证结果,基于InfoNCE的目标引入了内在的冲突,产生跨模态的分布差距。在这项工作中,我们确定了多模态场景中的两种冲突,且两者都随模态数量的增加而加剧:(i)对齐-均匀性冲突,即均匀性的排斥作用破坏成对对齐;(ii)对齐内部冲突,即对齐多个模态会引起相互竞争的对齐方向。为了解决这些问题,我们提出了对多模态表示的对齐和均匀性进行原则性解耦,为多模态学习提供了一个无冲突的方案,同时支持判别式和生成式用例,而无需特定于任务的模块。然后,我们提供了一个理论保证:我们的方法是多个模态分布上全局Hölder散度的有效代理,从而减少了模态之间的分布差距。在检索和UnCLIP风格生成上的大量实验表现出一致的增益。
摘要:Multimodal representation learning aims to construct a shared embedding space in which heterogeneous modalities are semantically aligned. Despite strong empirical results, InfoNCE-based objectives introduce inherent conflicts that yield distribution gaps across modalities. In this work, we identify two conflicts in the multimodal regime, both exacerbated as the number of modalities increases: (i) an alignment-uniformity conflict, whereby the repulsion of uniformity undermines pairwise alignment, and (ii) an intra-alignment conflict, where aligning multiple modalities induces competing alignment directions. To address these issues, we propose a principled decoupling of alignment and uniformity for multimodal representations, providing a conflict-free recipe for multimodal learning that simultaneously supports discriminative and generative use cases without task-specific modules. We then provide a theoretical guarantee that our method acts as an efficient proxy for a global Hölder divergence over multiple modality distributions, and thus reduces the distribution gap among modalities. Extensive experiments on retrieval and UnCLIP-style generation demonstrate consistent gains.
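The decoupling builds on separate alignment and uniformity terms; a sketch of the classic Wang-and-Isola-style losses for a single pair of modalities, which the paper generalizes (our reading, not the paper's exact multimodal objective):

```python
import numpy as np

def align_loss(x, y):
    """Alignment: mean squared distance between matched embedding pairs."""
    return float((np.linalg.norm(x - y, axis=1) ** 2).mean())

def uniform_loss(x, t=2.0):
    """Uniformity: log of the mean pairwise Gaussian potential;
    lower values mean embeddings spread more evenly on the sphere."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    iu = np.triu_indices(len(x), k=1)
    return float(np.log(np.exp(-t * d2[iu]).mean()))

spread = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
collapsed = np.tile([[1.0, 0.0]], (4, 1))
```

Optimizing the two terms separately, rather than coupling them inside one InfoNCE denominator, is what removes the repulsion-versus-alignment conflict the abstract describes.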
【2】Barycentric alignment for instance-level comparison of neural representations
标题:用于神经表示的实例级比较的重心对齐
链接:https://arxiv.org/abs/2602.09225
作者:Shreya Saha,Zoe Wanying He,Meenakshi Khosla
摘要:比较神经网络中的表示是一项挑战,因为表示允许对称性,例如单元的任意重新排序或激活空间的旋转,这会模糊模型之间的潜在等价性。我们引入了一个重心对齐框架,该框架排除了这些讨厌的对称性,以构建一个跨许多模型的通用嵌入空间。与现有的相似性措施,总结整个刺激集的关系,这个框架使相似性被定义在单个刺激的水平,揭示输入,引起收敛与发散的表示跨模型。使用这种实例级的相似性概念,我们确定了系统的输入属性,预测代表性的收敛与分歧的视觉和语言模型的家庭。我们还构建了跨个人和皮层区域的大脑表征的通用嵌入空间,使人类视觉层次结构的各个阶段的表征协议的实例级比较。最后,我们将相同的重心对齐框架应用于纯单峰视觉和语言模型,并发现事后对齐到共享空间中会产生图像文本相似性分数,这些分数密切跟踪人类的跨模态判断,并接近对比训练的视觉语言模型的性能。这明显表明,独立学习的表征已经共享了足够的几何结构,可以进行与人类对齐的跨模态比较。总之,这些结果表明,解决代表性的相似性,在个人的刺激水平揭示了无法检测到的现象,设置水平的比较指标。
摘要:Comparing representations across neural networks is challenging because representations admit symmetries, such as arbitrary reordering of units or rotations of activation space, that obscure underlying equivalence between models. We introduce a barycentric alignment framework that quotients out these nuisance symmetries to construct a universal embedding space across many models. Unlike existing similarity measures, which summarize relationships over entire stimulus sets, this framework enables similarity to be defined at the level of individual stimuli, revealing inputs that elicit convergent versus divergent representations across models. Using this instance-level notion of similarity, we identify systematic input properties that predict representational convergence versus divergence across vision and language model families. We also construct universal embedding spaces for brain representations across individuals and cortical regions, enabling instance-level comparison of representational agreement across stages of the human visual hierarchy. Finally, we apply the same barycentric alignment framework to purely unimodal vision and language models and find that post-hoc alignment into a shared space yields image text similarity scores that closely track human cross-modal judgments and approach the performance of contrastively trained vision-language models. This strikingly suggests that independently learned representations already share sufficient geometric structure for human-aligned cross-modal comparison. Together, these results show that resolving representational similarity at the level of individual stimuli reveals phenomena that cannot be detected by set-level comparison metrics.
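One ingredient of quotienting out nuisance symmetries is orthogonal alignment; a minimal Procrustes sketch (a simplification of the full barycentric construction) with two synthetic "models" whose embeddings differ only by a rotation:

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

# Two models see the same 50 stimuli; model B's activation space is an
# arbitrarily rotated copy of model A's, a nuisance symmetry to remove.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))                 # model A's stimulus embeddings
Rtrue, _ = np.linalg.qr(rng.normal(size=(8, 8)))
Y = X @ Rtrue                                # model B: same geometry, rotated

R, _ = orthogonal_procrustes(Y, X)           # best orthogonal map Y -> X
per_stimulus = np.linalg.norm(Y @ R - X, axis=1)  # instance-level distances
```

After alignment, similarity can be read off per stimulus (the vector `per_stimulus`) rather than as one aggregate score over the whole stimulus set.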
【3】Spectral Disentanglement and Enhancement: A Dual-domain Contrastive Framework for Representation Learning
标题:光谱解纠缠和增强:表示学习的双域对比框架
链接:https://arxiv.org/abs/2602.09066
作者:Jinjin Guo,Yexin Li,Zhichao Huang,Jun Fang,Zhiyuan Liu,Chao Liu,Pengzhang Liu,Qixia Jiang
摘要:大规模多模态对比学习最近在学习丰富和可迁移的表示方面取得了令人印象深刻的成功,但它仍然从根本上受到对特征维度的统一处理以及对所学特征内在谱结构的忽视的限制。经验证据表明,高维嵌入往往会塌缩成窄锥,将任务相关的语义集中在一个小子空间中,而大多数维度仍然被噪声和虚假相关性占据。这种谱不平衡和纠缠破坏了模型的泛化。我们提出了谱解纠缠和增强(Spectral Disentanglement and Enhancement,简称SDE),这是一个新的框架,它弥合了嵌入空间的几何结构与其谱特性之间的差距。我们的方法利用奇异值分解来自适应地将特征维度划分为捕获任务关键语义的强信号、反映辅助相关性的弱信号和表示不相关扰动的噪声。然后,应用一种基于课程学习的谱增强策略,选择性地放大信息分量,并在理论上保证训练稳定性。在增强特征的基础上,我们进一步引入了双域对比损失,共同优化特征空间和谱空间的对齐,有效地将谱正则化集成到训练过程中,并鼓励更丰富、更鲁棒的表示。在大规模多模态基准测试上的大量实验表明,SDE始终提高了表示的鲁棒性和泛化能力,优于最先进的方法。SDE与现有的对比管道无缝集成,为多模态表示学习提供了有效的解决方案。
摘要:Large-scale multimodal contrastive learning has recently achieved impressive success in learning rich and transferable representations, yet it remains fundamentally limited by the uniform treatment of feature dimensions and the neglect of the intrinsic spectral structure of the learned features. Empirical evidence indicates that high-dimensional embeddings tend to collapse into narrow cones, concentrating task-relevant semantics in a small subspace, while the majority of dimensions remain occupied by noise and spurious correlations. Such spectral imbalance and entanglement undermine model generalization. We propose Spectral Disentanglement and Enhancement (SDE), a novel framework that bridges the gap between the geometry of the embedded spaces and their spectral properties. Our approach leverages singular value decomposition to adaptively partition feature dimensions into strong signals that capture task-critical semantics, weak signals that reflect ancillary correlations, and noise representing irrelevant perturbations. A curriculum-based spectral enhancement strategy is then applied, selectively amplifying informative components with theoretical guarantees on training stability. Building upon the enhanced features, we further introduce a dual-domain contrastive loss that jointly optimizes alignment in both the feature and spectral spaces, effectively integrating spectral regularization into the training process and encouraging richer, more robust representations. Extensive experiments on large-scale multimodal benchmarks demonstrate that SDE consistently improves representation robustness and generalization, outperforming state-of-the-art methods. SDE integrates seamlessly with existing contrastive pipelines, offering an effective solution for multimodal representation learning.
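A minimal sketch of the SVD-based partition into strong/weak/noise directions (the fixed energy thresholds here are illustrative assumptions; the paper partitions adaptively):

```python
import numpy as np

def spectral_partition(E, strong=0.9, noise=0.99):
    """Split embedding directions by cumulative spectral energy:
    strong signal / weak signal / noise components of a batch E."""
    U, s, Vt = np.linalg.svd(E - E.mean(0), full_matrices=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    k1 = int(np.searchsorted(energy, strong)) + 1   # strong-signal cutoff
    k2 = int(np.searchsorted(energy, noise)) + 1    # noise cutoff
    return Vt[:k1], Vt[k1:k2], Vt[k2:]

# Nearly rank-2 embeddings plus small noise: almost all energy should
# land in the strong block, and most directions in the noise block.
rng = np.random.default_rng(0)
E = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 16)) \
    + 0.01 * rng.normal(size=(200, 16))
S, W, N = spectral_partition(E)
```

Amplifying `S` while damping `N` is the kind of selective enhancement the abstract's curriculum strategy then schedules over training.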
Encoders (1 paper)
【1】Circuit Fingerprints: How Answer Tokens Encode Their Geometrical Path
Link: https://arxiv.org/abs/2602.09784
Authors: Andres Saurez, Neha Sengar, Dongsoo Har
Note: Submitted to ICML 2026. 15 pages, 11 figures
Abstract: Circuit discovery and activation steering in transformers have developed as separate research threads, yet both operate on the same representational space. Are they two views of the same underlying structure? We show they follow a single geometric principle: answer tokens, processed in isolation, encode the directions that would produce them. This Circuit Fingerprint hypothesis enables circuit discovery without gradients or causal intervention -- recovering comparable structure to gradient-based methods through geometric alignment alone. We validate this on standard benchmarks (IOI, SVA, MCQA) across four model families, achieving circuit discovery performance comparable to gradient-based methods. The same directions that identify circuit components also enable controlled steering -- achieving 69.8% emotion classification accuracy versus 53.1% for instruction prompting while preserving factual accuracy. Beyond method development, this read-write duality reveals that transformer circuits are fundamentally geometric structures: interpretability and controllability are two facets of the same object.
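Scoring candidate components by geometric alignment alone, as the fingerprint hypothesis suggests, might be sketched as a cosine-similarity ranking. The scoring function and toy vectors here are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def fingerprint_scores(answer_direction, component_outputs):
    """Rank candidate circuit components by geometric alignment with
    the answer token's direction (cosine similarity), using no
    gradients or causal interventions -- a sketch of the fingerprint
    idea; the paper's scoring details may differ."""
    a = answer_direction / np.linalg.norm(answer_direction)
    scores = {}
    for name, out in component_outputs.items():
        v = out / np.linalg.norm(out)
        scores[name] = float(a @ v)
    # Highest-aligned components first.
    return dict(sorted(scores.items(), key=lambda kv: -kv[1]))

scores = fingerprint_scores(
    np.array([1.0, 0.0]),
    {"head_a": np.array([2.0, 0.1]), "head_b": np.array([0.0, 1.0])},
)
```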
Optimization | Convergence (8 papers)
【1】Long Chain-of-Thought Compression via Fine-Grained Group Policy Optimization
Link: https://arxiv.org/abs/2602.10048
Authors: Xinchen Han, Hossam Afifi, Michel Marot, Xilu Wang, Lu Yin
Note: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2026
Abstract: Large Language Models (LLMs) often generate unnecessarily verbose Chain-of-Thought (CoT) reasoning that increases computational costs and latency without proportional performance gains. In this paper, we propose Fine-grained Group policy Optimization (FGO), a Reinforcement Learning (RL) algorithm that refines group responses by subdividing them and assigning appropriate weights based on length and entropy, thereby enabling effective CoT compression. Meanwhile, as an enhanced variant of Group Relative Policy Optimization (GRPO), FGO successfully addresses two major limitations of GRPO: inefficient data utilization and entropy collapse. We evaluate FGO on multiple reasoning LLMs and benchmarks, including MATH500, AIME24, AMC23, and Minerva. Experimental results show that FGO achieves efficient CoT compression without degrading performance, and simultaneously resolves the key limitations of GRPO.
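A minimal sketch of a GRPO-style group-relative advantage with an added length/entropy reweighting in the spirit of FGO. The weighting form and the alpha/beta hyperparameters are assumptions for illustration; the paper's exact scheme may differ:

```python
import numpy as np

def weighted_group_advantages(rewards, lengths, entropies, alpha=0.5, beta=0.5):
    """Group-relative advantages (as in GRPO) with an illustrative
    length/entropy reweighting. alpha and beta are assumed
    hyperparameters, not values from the paper."""
    r = np.asarray(rewards, dtype=float)
    # Standard group normalization of rewards into advantages.
    adv = (r - r.mean()) / (r.std() + 1e-8)
    # Downweight overly long responses (encouraging CoT compression)...
    ln = np.asarray(lengths, dtype=float)
    w_len = 1.0 / (1.0 + alpha * (ln - ln.min()) / (ln.max() - ln.min() + 1e-8))
    # ...and upweight higher-entropy responses to counteract entropy collapse.
    ent = np.asarray(entropies, dtype=float)
    w_ent = 1.0 + beta * (ent - ent.mean())
    return adv * w_len * w_ent

a = weighted_group_advantages(
    rewards=[1, 1, 0, 0], lengths=[100, 400, 200, 300],
    entropies=[0.5, 0.5, 0.5, 0.5],
)
```

Under this weighting, a short correct response receives a larger positive update than an equally correct but verbose one.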
【2】Effectiveness of Binary Autoencoders for QUBO-Based Optimization Problems
Link: https://arxiv.org/abs/2602.10037
Authors: Tetsuro Abe, Masashi Yamashita, Shu Tanaka
Note: 14 pages, 5 figures
Abstract: In black-box combinatorial optimization, objective evaluations are often expensive, so high quality solutions must be found under a limited budget. Factorization machine with quantum annealing (FMQA) builds a quadratic surrogate model from evaluated samples and optimizes it on an Ising machine. However, FMQA requires binary decision variables, and for nonbinary structures such as integer permutations, the choice of binary encoding strongly affects search efficiency. If the encoding fails to reflect the original neighborhood structure, small Hamming moves may not correspond to meaningful modifications in the original solution space, and constrained problems can yield many infeasible candidates that waste evaluations. Recent work combines FMQA with a binary autoencoder (bAE) that learns a compact binary latent code from feasible solutions, yet the mechanism behind its performance gains is unclear. Using a small traveling salesman problem as an interpretable testbed, we show that the bAE reconstructs feasible tours accurately and, compared with manually designed encodings at similar compression, better aligns tour distances with latent Hamming distances, yields smoother neighborhoods under small bit flips, and produces fewer local optima. These geometric properties explain why bAE+FMQA improves the approximation ratio faster while maintaining feasibility throughout optimization, and they provide guidance for designing latent representations for black-box optimization.
【3】ExO-PPO: an Extended Off-policy Proximal Policy Optimization Algorithm
Link: https://arxiv.org/abs/2602.09726
Authors: Hanyong Wang, Menglong Yang
Abstract: Deep reinforcement learning has been able to solve various tasks successfully; however, due to the construction of the policy gradient and training dynamics, tuning deep reinforcement learning models remains challenging. As one of the most successful deep reinforcement-learning algorithms, the Proximal Policy Optimization algorithm (PPO) clips the policy gradient within conservative on-policy updates, which ensures reliable and stable policy improvement. However, this training pattern may sacrifice sample efficiency. On the other hand, off-policy methods make more adequate use of data through sample reuse, though at the cost of increased estimation variance and bias. To leverage the advantages of both, in this paper we propose a new PPO variant that combines the stability guarantee of conservative on-policy iteration with more efficient off-policy data utilization. Specifically, we first derive an extended off-policy improvement from an expectation form of a generalized policy improvement lower bound. Then, we extend the clipping mechanism with segmented exponential functions to obtain a suitable surrogate objective function. Third, the trajectories generated by the past $M$ policies are organized in the replay buffer for off-policy training. We refer to this method as Extended Off-policy Proximal Policy Optimization (ExO-PPO). Compared with PPO and other state-of-the-art variants, ExO-PPO demonstrates improved performance with balanced sample efficiency and stability on varied tasks in our empirical experiments.
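For reference, the standard PPO clipped surrogate that ExO-PPO extends can be written in a few lines. This shows only the baseline clipping mechanism; the segmented-exponential extension is defined in the paper:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate (shown for reference; ExO-PPO
    replaces the hard clip with segmented exponential functions whose
    exact form is given in the paper)."""
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    # Clipping the importance ratio keeps updates conservatively on-policy.
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantage
    return np.minimum(unclipped, clipped)  # elementwise pessimistic bound

vals = ppo_clip_objective(ratio=[1.5, 0.5], advantage=[1.0, -1.0])
```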
【4】Beyond Uniform Credit: Causal Credit Assignment for Policy Optimization
Link: https://arxiv.org/abs/2602.09331
Authors: Mykola Khandoga, Rui Yuan, Vinay Kumar Sankarapu
Note: 12 pages, 1 figure
Abstract: Policy gradient methods for language model reasoning, such as GRPO and DAPO, assign uniform credit to all generated tokens - the filler phrase "Let me think" receives the same gradient update as the critical calculation "23 + 45 = 68." We propose counterfactual importance weighting: mask reasoning spans, measure the drop in answer probability, and upweight tokens accordingly during policy gradient updates. Our method requires no auxiliary models or external annotation, instead importance is estimated directly from the policy model's own probability shifts. Experiments on GSM8K across three models spanning the Qwen and Llama families demonstrate consistent improvements over uniform baselines and faster convergence to equivalent accuracy. Inverting the importance signal hurts performance, confirming we capture genuine causal structure rather than noise. Analysis shows the method correctly prioritizes calculation steps over scaffolding text. We view these findings as establishing counterfactual importance weighting as a foundation for further research rather than a complete solution.
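The counterfactual masking step can be sketched as: mask each reasoning span, record the drop in answer probability, and normalize the drops into span weights. The normalization below is an illustrative assumption, not necessarily the paper's scheme:

```python
def counterfactual_weights(p_answer_full, p_answer_masked):
    """Span importance from counterfactual masking: the drop in answer
    probability when a span is masked, normalized over spans.
    (Illustrative normalization; the paper's exact scheme may differ.)"""
    drops = [max(p_answer_full - p, 0.0) for p in p_answer_masked]
    total = sum(drops)
    if total == 0.0:
        return [1.0 / len(drops)] * len(drops)  # fall back to uniform credit
    return [d / total for d in drops]

# Masking the calculation span hurts the answer far more than filler text.
# Spans: filler, calculation, filler (probabilities are made up).
w = counterfactual_weights(0.9, [0.88, 0.30, 0.85])
```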
【5】Weighted Wasserstein Barycenter of Gaussian Processes for exotic Bayesian Optimization tasks
Link: https://arxiv.org/abs/2602.09181
Authors: Antonio Candelieri, Francesco Archetti
Abstract: Exploiting the analogy between Gaussian Distributions and Gaussian Processes' posterior, we present how the weighted Wasserstein Barycenter of Gaussian Processes (W2BGP) can be used to unify, under a common framework, different exotic Bayesian Optimization (BO) tasks. Specifically, collaborative/federated BO, (synchronous) batch BO, and multi-fidelity BO are considered in this paper. Our empirical analysis proves that each one of these tasks requires just an appropriate weighting schema for the W2BGP, while the entire framework remains untouched. Moreover, we demonstrate that the most well-known BO acquisition functions can be easily re-interpreted under the proposed framework and also enable a more computationally efficient way to deal with the computation of the Wasserstein Barycenter, compared with state-of-the-art methods from the Machine Learning literature. Finally, research perspectives branching from the proposed approach are presented.
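In the univariate case, the weighted 2-Wasserstein barycenter of Gaussians has a simple closed form, which a short sketch makes concrete (the multivariate case instead requires a fixed-point iteration on the covariances):

```python
import numpy as np

def w2_barycenter_gaussian_1d(means, stds, weights):
    """Weighted 2-Wasserstein barycenter of univariate Gaussians.
    In 1-D the barycenter is itself Gaussian, with the weighted mean
    of the means and the weighted mean of the standard deviations."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize the weighting schema
    mu = float(np.dot(w, means))
    sigma = float(np.dot(w, stds))
    return mu, sigma

# Barycenter of N(0, 1) and N(2, 9) with equal weights.
mu, sigma = w2_barycenter_gaussian_1d([0.0, 2.0], [1.0, 3.0], [0.5, 0.5])
```

Changing only the weights here is what, per the abstract, switches the framework between the different exotic BO tasks.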
【6】SpinCastML an Open Decision-Making Application for Inverse Design of Electrospinning Manufacturing: A Machine Learning, Optimal Sampling and Inverse Monte Carlo Approach
Link: https://arxiv.org/abs/2602.09120
Authors: Elisa Roldan, Tasneem Sabir
Abstract: Electrospinning is a powerful technique for producing micro to nanoscale fibers with application specific architectures. Small variations in solution or operating conditions can shift the jet regime, generating non Gaussian fiber diameter distributions. Despite substantial progress, no existing framework enables inverse design toward desired fiber outcomes while integrating polymer solvent chemical constraints or predicting full distributions. SpinCastML is an open source, distribution aware, chemically informed machine learning and Inverse Monte Carlo (IMC) software for inverse electrospinning design. Built on a rigorously curated dataset of 68,480 fiber diameters from 1,778 datasets across 16 polymers, SpinCastML integrates three structured sampling methods, a suite of 11 high-performance learners, and chemistry aware constraints to predict not only mean diameter but the entire distribution. A Cubist model with polymer-balanced Sobol D-optimal sampling provides the highest global performance (R2 > 0.92). IMC accurately captures the fiber distributions, achieving R2 > 0.90 and <1% error between predicted and experimental success rates. The IMC engine supports both retrospective analysis and forward-looking inverse design, generating physically and chemically feasible polymer solvent parameter combinations with quantified success probabilities for user-defined targets. SpinCastML reframes electrospinning from trial and error to a reproducible, data driven design process. As an open source executable, it enables laboratories to analyze their own datasets and co create an expanding community software. SpinCastML reduces experimental waste, accelerates discovery, and democratizes access to advanced modeling, establishing distribution aware inverse design as a new standard for sustainable nanofiber manufacturing across biomedical, filtration, and energy applications.
【7】Step-Size Stability in Stochastic Optimization: A Theoretical Perspective
Link: https://arxiv.org/abs/2602.09842
Authors: Fabian Schaipp, Robert M. Gower, Adrien Taylor
Abstract: We present a theoretical analysis of stochastic optimization methods in terms of their sensitivity with respect to the step size. We identify a key quantity that, for each method, describes how the performance degrades as the step size becomes too large. For convex problems, we show that this quantity directly impacts the suboptimality bound of the method. Most importantly, our analysis provides direct theoretical evidence that adaptive step-size methods, such as SPS or NGN, are more robust than SGD. This allows us to quantify the advantage of these adaptive methods beyond empirical evaluation. Finally, we show through experiments that our theoretical bound qualitatively mirrors the actual performance as a function of the step size, even for nonconvex problems.
【8】Optimal Estimation in Orthogonally Invariant Generalized Linear Models: Spectral Initialization and Approximate Message Passing
Link: https://arxiv.org/abs/2602.09240
Authors: Yihan Zhang, Hong Chang Ji, Ramji Venkataramanan, Marco Mondelli
Abstract: We consider the problem of parameter estimation from a generalized linear model with a random design matrix that is orthogonally invariant in law. Such a model allows the design to have an arbitrary distribution of singular values and only assumes that its singular vectors are generic. It is a vast generalization of the i.i.d. Gaussian design typically considered in the theoretical literature, and is motivated by the fact that real data often have a complex correlation structure, so that methods relying on i.i.d. assumptions can be highly suboptimal. Building on the paradigm of spectrally-initialized iterative optimization, this paper proposes optimal spectral estimators and combines them with an approximate message passing (AMP) algorithm, establishing rigorous performance guarantees for these two algorithmic steps. Both the spectral initialization and the subsequent AMP meet existing conjectures on the fundamental limits to estimation -- the former on the optimal sample complexity for efficient weak recovery, and the latter on the optimal errors. Numerical experiments suggest the effectiveness of our methods and the accuracy of our theory beyond orthogonally invariant data.
Prediction | Estimation (10 papers)
【1】Conformal Prediction Sets for Instance Segmentation
Link: https://arxiv.org/abs/2602.10045
Authors: Kerri Lu, Dan M. Kluger, Stephen Bates, Sherrie Wang
Abstract: Current instance segmentation models achieve high performance on average predictions, but lack principled uncertainty quantification: their outputs are not calibrated, and there is no guarantee that a predicted mask is close to the ground truth. To address this limitation, we introduce a conformal prediction algorithm to generate adaptive confidence sets for instance segmentation. Given an image and a pixel coordinate query, our algorithm generates a confidence set of instance predictions for that pixel, with a provable guarantee for the probability that at least one of the predictions has high Intersection-Over-Union (IoU) with the true object instance mask. We apply our algorithm to instance segmentation examples in agricultural field delineation, cell segmentation, and vehicle detection. Empirically, we find that our prediction sets vary in size based on query difficulty and attain the target coverage, outperforming existing baselines such as Learn Then Test, Conformal Risk Control, and morphological dilation-based methods. We provide versions of the algorithm with asymptotic and finite sample guarantees.
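Coverage guarantees of this kind typically rest on the split-conformal quantile construction, sketched below. This is the generic recipe, not the paper's full set-construction algorithm; using 1 - IoU as the nonconformity score is an assumption for illustration:

```python
import math

def conformal_threshold(cal_scores, alpha=0.1):
    """Split-conformal quantile: given nonconformity scores from a
    calibration set (here, 1 - IoU of the best prediction per example),
    the ceil((n+1)(1-alpha))/n empirical quantile yields a threshold
    with marginal coverage >= 1 - alpha."""
    scores = sorted(cal_scores)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    k = min(k, n)  # guard against alpha close to 0
    return scores[k - 1]

# Candidate masks with nonconformity score <= threshold enter the set.
threshold = conformal_threshold([0.1 * i for i in range(1, 11)], alpha=0.2)
```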
【2】Predictive Query Language: A Domain-Specific Language for Predictive Modeling on Relational Databases
Link: https://arxiv.org/abs/2602.09572
Authors: Vid Kocijan, Jinu Sunil, Jan Eric Lenssen, Viman Deb, Xinwei Xe, Federco Reyes Gomez, Matthias Fey, Jure Leskovec
Abstract: The purpose of predictive modeling on relational data is to predict future or missing values in a relational database, for example, future purchases of a user, risk of readmission of the patient, or the likelihood that a financial transaction is fraudulent. Typically powered by machine learning methods, predictive models are used in recommendations, financial fraud detection, supply chain optimization, and other systems, providing billions of predictions every day. However, training a machine learning model requires manual work to extract the required training examples - prediction entities and target labels - from the database, which is slow, laborious, and prone to mistakes. Here, we present the Predictive Query Language (PQL), a SQL-inspired declarative language for defining predictive tasks on relational databases. PQL allows specifying a predictive task in a single declarative query, enabling the automatic computation of training labels for a large variety of machine learning tasks, such as regression, classification, time-series forecasting, and recommender systems. PQL is already successfully integrated and used in a collection of use cases as part of a predictive AI platform. The versatility of the language can be demonstrated through its many ongoing use cases, including financial fraud, item recommendations, and workload prediction. We demonstrate its versatile design through two implementations; one for small-scale, low-latency use and one that can handle large-scale databases.
【3】In-Hospital Stroke Prediction from PPG-Derived Hemodynamic Features
Link: https://arxiv.org/abs/2602.09328
Authors: Jiaming Liu, Cheng Ding, Daoqiang Zhang
Note: 11 pages, 6 figures, 3 tables. To appear in Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '26)
Abstract: The absence of pre-hospital physiological data in standard clinical datasets fundamentally constrains the early prediction of stroke, as patients typically present only after stroke has occurred, leaving the predictive value of continuous monitoring signals such as photoplethysmography (PPG) unvalidated. In this work, we overcome this limitation by focusing on a rare but clinically critical cohort - patients who suffered stroke during hospitalization while already under continuous monitoring - thereby enabling the first large-scale analysis of pre-stroke PPG waveforms aligned to verified onset times. Using MIMIC-III and MC-MED, we develop an LLM-assisted data mining pipeline to extract precise in-hospital stroke onset timestamps from unstructured clinical notes, followed by physician validation, identifying 176 patients (MIMIC) and 158 patients (MC-MED) with high-quality synchronized pre-onset PPG data, respectively. We then extract hemodynamic features from PPG and employ a ResNet-1D model to predict impending stroke across multiple early-warning horizons. The model achieves F1-scores of 0.7956, 0.8759, and 0.9406 at 4, 5, and 6 hours prior to onset on MIMIC-III, and, without re-tuning, reaches 0.9256, 0.9595, and 0.9888 on MC-MED for the same horizons. These results provide the first empirical evidence from real-world clinical data that PPG contains predictive signatures of stroke several hours before onset, demonstrating that passively acquired physiological signals can support reliable early warning, supporting a shift from post-event stroke recognition to proactive, physiology-based surveillance that may materially improve patient outcomes in routine clinical care.
【4】RAPID: Risk of Attribute Prediction-Induced Disclosure in Synthetic Microdata
Link: https://arxiv.org/abs/2602.09235
Authors: Matthias Templ, Oscar Thees, Roman Müller
Note: 29 pages, 5 figures
Abstract: Statistical data anonymization increasingly relies on fully synthetic microdata, for which classical identity disclosure measures are less informative than an adversary's ability to infer sensitive attributes from released data. We introduce RAPID (Risk of Attribute Prediction--Induced Disclosure), a disclosure risk measure that directly quantifies inferential vulnerability under a realistic attack model. An adversary trains a predictive model solely on the released synthetic data and applies it to real individuals' quasi-identifiers. For continuous sensitive attributes, RAPID reports the proportion of records whose predicted values fall within a specified relative error tolerance. For categorical attributes, we propose a baseline-normalized confidence score that measures how much more confident the attacker is about the true class than would be expected from class prevalence alone, and we summarize risk as the fraction of records exceeding a policy-defined threshold. This construction yields an interpretable, bounded risk metric that is robust to class imbalance, independent of any specific synthesizer, and applicable with arbitrary learning algorithms. We illustrate threshold calibration, uncertainty quantification, and comparative evaluation of synthetic data generators using simulations and real data. Our results show that RAPID provides a practical, attacker-realistic upper bound on attribute-inference disclosure risk that complements existing utility diagnostics and disclosure control frameworks.
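For a continuous sensitive attribute, the reported risk can be sketched directly from its definition, the share of records predicted within a relative error tolerance (variable names are illustrative):

```python
import numpy as np

def rapid_continuous(y_true, y_pred, tol=0.1):
    """RAPID-style risk for a continuous sensitive attribute: the
    fraction of records whose attacker prediction falls within a
    relative error tolerance of the true value (sketch of the
    measure described in the abstract)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Relative error, guarding against division by zero.
    rel_err = np.abs(y_pred - y_true) / np.maximum(np.abs(y_true), 1e-12)
    return float(np.mean(rel_err <= tol))

# Attacker predictions for four records; three land within 10%.
risk = rapid_continuous([100, 100, 100, 100], [105, 120, 95, 100])
```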
【5】A Lightweight Multi-View Approach to Short-Term Load Forecasting
Link: https://arxiv.org/abs/2602.09220
Authors: Julien Guité-Vinet, Alexandre Blondin Massé, Éric Beaudry
Abstract: Time series forecasting is a critical task across domains such as energy, finance, and meteorology, where accurate predictions enable informed decision-making. While transformer-based and large-parameter models have recently achieved state-of-the-art results, their complexity can lead to overfitting and unstable forecasts, especially when older data points become less relevant. In this paper, we propose a lightweight multi-view approach to short-term load forecasting that leverages single-value embeddings and a scaled time-range input to capture temporally relevant features efficiently. We introduce an embedding dropout mechanism to prevent over-reliance on specific features and enhance interpretability. Our method achieves competitive performance with significantly fewer parameters, demonstrating robustness across multiple datasets, including scenarios with noisy or sparse data, and provides insights into the contributions of individual features to the forecast.
【6】ML-DCN: Masked Low-Rank Deep Crossing Network Towards Scalable Ads Click-through Rate Prediction at Pinterest
Link: https://arxiv.org/abs/2602.09194
Authors: Jiacheng Li, Yixiong Meng, Yi Wu, Yun Zhao, Sharare Zehtabian, Jiayin Jin, Degao Peng, Jinfeng Zhuang, Qifei Shen, Kungang Li
Abstract: Deep learning recommendation systems rely on feature interaction modules to model complex user-item relationships across sparse categorical and dense features. In large-scale ad ranking, increasing model capacity is a promising path to improving both predictive performance and business outcomes, yet production serving budgets impose strict constraints on latency and FLOPs. This creates a central tension: we want interaction modules that both scale effectively with additional compute and remain compute-efficient at serving time. In this work, we study how to scale feature interaction modules under a fixed serving budget. We find that naively scaling DCNv2 and MaskNet, despite their widespread adoption in industry, yields rapidly diminishing offline gains in the Pinterest ads ranking system. To overcome aforementioned limitations, we propose ML-DCN, an interaction module that integrates an instance-conditioned mask into a low-rank crossing layer, enabling per-example selection and amplification of salient interaction directions while maintaining efficient computation. This novel architecture combines the strengths of DCNv2 and MaskNet, scales efficiently with increased compute, and achieves state-of-the-art performance. Experiments on a large internal Pinterest ads dataset show that ML-DCN achieves higher AUC than DCNv2, MaskNet, and recent scaling-oriented alternatives at matched FLOPs, and it scales more favorably overall as compute increases, exhibiting a stronger AUC-FLOPs trade-off. Finally, online A/B tests demonstrate statistically significant improvements in key ads metrics (including CTR and click-quality measures) and ML-DCN has been deployed in the production system with neutral serving cost.
【7】DMamba: Decomposition-enhanced Mamba for Time Series Forecasting
Link: https://arxiv.org/abs/2602.09081
Authors: Ruxuan Chen, Fang Sun
Note: 9 pages, 3 figures, 4 tables
Abstract: State Space Models (SSMs), particularly Mamba, have shown potential in long-term time series forecasting. However, existing Mamba-based architectures often struggle with datasets characterized by non-stationary patterns. A key observation from time series theory is that the statistical nature of inter-variable relationships differs fundamentally between the trend and seasonal components of a decomposed series. Trend relationships are often driven by a few common stochastic factors or long-run equilibria, suggesting that they reside on a lower-dimensional manifold. In contrast, seasonal relationships involve dynamic, high-dimensional interactions like phase shifts and amplitude co-movements, requiring more expressive modeling. In this paper, we propose DMamba, a novel forecasting model that explicitly aligns architectural complexity with this component-specific characteristic. DMamba employs seasonal-trend decomposition and processes the components with specialized, differentially complex modules: a variable-direction Mamba encoder captures the rich, cross-variable dynamics within the seasonal component, while a simple Multi-Layer Perceptron (MLP) suffices to learn from the lower-dimensional inter-variable relationships in the trend component. Extensive experiments on diverse datasets demonstrate that DMamba sets a new state-of-the-art (SOTA), consistently outperforming both recent Mamba-based architectures and leading decomposition-based models.
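The seasonal-trend split that decomposition-based forecasters build on is commonly implemented as a moving-average decomposition; a sketch under that assumption follows (kernel size and edge padding are illustrative choices, not the paper's):

```python
import numpy as np

def seasonal_trend_decompose(x, kernel_size=25):
    """Moving-average seasonal-trend decomposition, the standard block
    in decomposition-based forecasters; DMamba then routes the two
    components to modules of different capacity."""
    x = np.asarray(x, dtype=float)
    pad = kernel_size // 2
    # Replicate the endpoints so the smoothed trend keeps the input length.
    xp = np.concatenate([np.repeat(x[0], pad), x, np.repeat(x[-1], pad)])
    kernel = np.ones(kernel_size) / kernel_size
    trend = np.convolve(xp, kernel, mode="valid")
    seasonal = x - trend  # residual after removing the smooth trend
    return trend, seasonal

trend, seasonal = seasonal_trend_decompose(np.ones(100))
```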
【8】Stabilized Maximum-Likelihood Iterative Quantum Amplitude Estimation for Structural CVaR under Correlated Random Fields
Link: https://arxiv.org/abs/2602.09847
Authors: Alireza Tabarraei
Abstract: Conditional Value-at-Risk (CVaR) is a central tail-risk measure in stochastic structural mechanics, yet its accurate evaluation under high-dimensional, spatially correlated material uncertainty remains computationally prohibitive for classical Monte Carlo methods. Leveraging bounded-expectation reformulations of CVaR compatible with quantum amplitude estimation, we develop a quantum-enhanced inference framework that casts CVaR evaluation as a statistically consistent, confidence-constrained maximum-likelihood amplitude estimation problem. The proposed method extends iterative quantum amplitude estimation (IQAE) by embedding explicit maximum-likelihood inference within a rigorously controlled interval-tracking architecture. To ensure global correctness under finite-shot noise and the non-injective oscillatory response induced by Grover amplification, we introduce a stabilized inference scheme incorporating multi-hypothesis feasibility tracking, periodic low-depth disambiguation, and a bounded restart mechanism governed by an explicit failure-probability budget. This formulation preserves the quadratic oracle-complexity advantage of amplitude estimation while providing finite-sample confidence guarantees and reduced estimator variance. The framework is demonstrated on benchmark problems with spatially correlated lognormal Young's modulus fields generated using a Nyström low-rank Gaussian kernel model. Numerical results show that the proposed estimator achieves substantially lower oracle complexity than classical Monte Carlo CVaR estimation at comparable confidence levels, while maintaining rigorous statistical reliability. This work establishes a practically robust and theoretically grounded quantum-enhanced methodology for tail-risk quantification in stochastic continuum mechanics.
【9】Minimum Distance Summaries for Robust Neural Posterior Estimation
Link: https://arxiv.org/abs/2602.09161
Authors: Sherman Khoo, Dennis Prangle, Song Liu, Mark Beaumont
Abstract: Simulation-based inference (SBI) enables amortized Bayesian inference by first training a neural posterior estimator (NPE) on prior-simulator pairs, typically through low-dimensional summary statistics, which can then be cheaply reused for fast inference by querying it on new test observations. Because NPE is estimated under the training data distribution, it is susceptible to misspecification when observations deviate from the training distribution. Many robust SBI approaches address this by modifying NPE training or introducing error models, coupling robustness to the inference network and compromising amortization and modularity. We introduce minimum-distance summaries, a plug-in robust NPE method that adapts queried test-time summaries independently of the pretrained NPE. Leveraging the maximum mean discrepancy (MMD) as a distance between observed data and a summary-conditional predictive distribution, the adapted summary inherits strong robustness properties from the MMD. We demonstrate that the algorithm can be implemented efficiently with random Fourier feature approximations, yielding a lightweight, model-free test-time adaptation procedure. We provide theoretical guarantees for the robustness of our algorithm and empirically evaluate it on a range of synthetic and real-world tasks, demonstrating substantial robustness gains with minimal additional overhead.
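The random-Fourier-feature MMD approximation mentioned in the abstract can be sketched in a few lines: embed both samples and compare mean embeddings. Bandwidth and feature count are illustrative assumptions:

```python
import numpy as np

def rff_mmd(x, y, n_features=256, gamma=1.0, seed=0):
    """Squared MMD between samples x and y under an RBF kernel,
    approximated with random Fourier features: embed both samples,
    then take the squared distance between mean embeddings."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Random frequencies and phases for the RBF kernel approximation.
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    phi = lambda z: np.sqrt(2.0 / n_features) * np.cos(z @ W + b)
    diff = phi(x).mean(axis=0) - phi(y).mean(axis=0)
    return float(diff @ diff)
```

Because the features are precomputable, each MMD query is linear in sample size, which is what makes this a lightweight test-time procedure.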
【10】Predicting magnetism with first-principles AI
Link: https://arxiv.org/abs/2602.09093
Authors: Max Geier, Liang Fu
Note: 6+3 pages, 3+4 figures
Abstract: Computational discovery of magnetic materials remains challenging because magnetism arises from the competition between kinetic energy and Coulomb interaction that is often beyond the reach of standard electronic-structure methods. Here we tackle this challenge by directly solving the many-electron Schrödinger equation with neural-network variational Monte Carlo, which provides a highly expressive variational wavefunction for strongly correlated systems. Applying this technique to transition metal dichalcogenide moiré semiconductors, we predict itinerant ferromagnetism in WSe$_2$/WS$_2$ and an antiferromagnetic insulator in a twisted $Γ$-valley homobilayer, using the same neural network without any physics input beyond the microscopic Hamiltonian. Crucially, both types of magnetic states are obtained from a single calculation within the $S_z=0$ sector, removing the need to compute and compare multiple $S_z$ sectors. This significantly reduces computational cost and paves the way for faster and more reliable magnetic material design.
Other Neural Networks | Deep Learning | Models | Modeling (25 papers)
【1】Olaf-World: Orienting Latent Actions for Video World Modeling
Link: https://arxiv.org/abs/2602.10104
Authors: Yuxin Jiang, Yuchao Gu, Ivor W. Tsang, Mike Zheng Shou
Note: Project page: https://showlab.github.io/Olaf-World/ Code: https://github.com/showlab/Olaf-World
Abstract: Scaling action-controllable world models is limited by the scarcity of action labels. While latent action learning promises to extract control interfaces from unlabeled video, learned latents often fail to transfer across contexts: they entangle scene-specific cues and lack a shared coordinate system. This occurs because standard objectives operate only within each clip, providing no mechanism to align action semantics across contexts. Our key insight is that although actions are unobserved, their semantic effects are observable and can serve as a shared reference. We introduce Seq$Δ$-REPA, a sequence-level control-effect alignment objective that anchors integrated latent action to temporal feature differences from a frozen, self-supervised video encoder. Building on this, we present Olaf-World, a pipeline that pretrains action-conditioned video world models from large-scale passive video. Extensive experiments demonstrate that our method learns a more structured latent action space, leading to stronger zero-shot action transfer and more data-efficient adaptation to new control interfaces than state-of-the-art baselines.
【2】Infusion: Shaping Model Behavior by Editing Training Data via Influence Functions
Link: https://arxiv.org/abs/2602.09987
Authors: J Rosser, Robert Kirk, Edward Grefenstette, Jakob Foerster, Laura Ruis
Note: 10 pages, 14 figures
Abstract: Influence functions are commonly used to attribute model behavior to training documents. We explore the reverse: crafting training data that induces model behavior. Our framework, Infusion, uses scalable influence-function approximations to compute small perturbations to training documents that induce targeted changes in model behavior through parameter shifts. We evaluate Infusion on data poisoning tasks across vision and language domains. On CIFAR-10, we show that making subtle edits via Infusion to just 0.2% (100/45,000) of the training documents can be competitive with the baseline of inserting a small number of explicit behavior examples. We also find that Infusion transfers across architectures (ResNet $\leftrightarrow$ CNN), suggesting a single poisoned corpus can affect multiple independently trained models. In preliminary language experiments, we characterize when our approach increases the probability of target behaviors and when it fails, finding it most effective at amplifying behaviors the model has already learned. Taken together, these results show that small, subtle edits to training data can systematically shape model behavior, underscoring the importance of training data interpretability for adversaries and defenders alike. We provide the code here: https://github.com/jrosseruk/infusion.
【3】Hybrid Responsible AI-Stochastic Approach for SLA Compliance in Multivendor 6G Networks
标题:面向多供应商6G网络SLA合规的负责任AI与随机方法混合框架
链接:https://arxiv.org/abs/2602.09841
作者:Emanuel Figetakis,Ahmed Refaey Hussein
备注:6 pages, 4 figures
摘要:The convergence of AI and 6G network automation introduces new challenges in maintaining transparency, fairness, and accountability across multivendor management systems. Although closed-loop AI orchestration improves adaptability and self-optimization, it also creates a responsibility gap, where violations of SLAs cannot be causally attributed to specific agents or vendors. This paper presents a hybrid responsible AI-stochastic learning framework that embeds fairness, robustness, and auditability directly into the network control loop. The framework integrates RAI games with stochastic optimization, enabling dynamic adversarial reweighting and probabilistic exploration across heterogeneous vendor domains. An RAAP continuously records AI-driven decision trajectories and produces dual accountability reports: user-level SLA summaries and operator-level responsibility analytics. Experimental evaluations on synthetic two-class multigroup datasets demonstrate that the proposed hybrid model improves the accuracy of the worst group by up to 10.5%. Specifically, hybrid RAI achieved a worst-group accuracy (WGAcc) of 60.5% and an average accuracy (AvgAcc) of 72.7%, outperforming traditional RAI-GA (50.0%) and ERM (21.5%). The audit mechanism successfully traced 99% of simulated SLA violations to the AI entities responsible, producing both vendor- and agent-level accountability indices. These results confirm that the proposed hybrid approach enhances fairness and robustness and establishes a concrete accountability framework for autonomous SLA assurance in multivendor 6G networks.
【4】Physics-informed diffusion models in spectral space
标题:光谱空间中的物理信息扩散模型
链接:https://arxiv.org/abs/2602.09708
作者:Davide Gallon,Philippe von Wurstemberger,Patrick Cheridito,Arnulf Jentzen
备注:24 pages, 9 figures
摘要:We propose a methodology that combines generative latent diffusion models with physics-informed machine learning to generate solutions of parametric partial differential equations (PDEs) conditioned on partial observations, which includes, in particular, forward and inverse PDE problems. We learn the joint distribution of PDE parameters and solutions via a diffusion process in a latent space of scaled spectral representations, where Gaussian noise corresponds to functions with controlled regularity. This spectral formulation enables significant dimensionality reduction compared to grid-based diffusion models and ensures that the induced process in function space remains within a class of functions for which the PDE operators are well defined. Building on diffusion posterior sampling, we enforce physics-informed constraints and measurement conditions during inference, applying Adam-based updates at each diffusion step. We evaluate the proposed approach on Poisson, Helmholtz, and incompressible Navier--Stokes equations, demonstrating improved accuracy and computational efficiency compared with existing diffusion-based PDE solvers, which are state of the art for sparse observations. Code is available at https://github.com/deeplearningmethods/PISD.
【5】Model soups need only one ingredient
标题:模型汤只需要一种成分
链接:https://arxiv.org/abs/2602.09689
作者:Alireza Abdollahpoorrostam,Nikolaos Dimitriadis,Adam Hazimeh,Pascal Frossard
摘要:Fine-tuning large pre-trained models on a target distribution often improves in-distribution (ID) accuracy, but at the cost of out-of-distribution (OOD) robustness as representations specialize to the fine-tuning data. Weight-space ensembling methods, such as Model Soups, mitigate this effect by averaging multiple checkpoints, but they are computationally prohibitive, requiring the training and storage of dozens of fine-tuned models. In this paper, we introduce MonoSoup, a simple, data-free, hyperparameter-free, post-hoc method that achieves a strong ID-OOD balance using only a single checkpoint. Our method applies Singular Value Decomposition (SVD) to each layer's update and decomposes it into high-energy directions that capture task-specific adaptation and low-energy directions that introduce noise but may still encode residual signals useful for robustness. MonoSoup then uses entropy-based effective rank to automatically re-weight these components with layer-wise coefficients that account for the spectral and geometric structure of the model. Experiments on CLIP models fine-tuned on ImageNet and evaluated under natural distribution shifts, as well as on Qwen language models tested on mathematical reasoning and multiple-choice benchmarks, show that this plug-and-play approach is a practical and effective alternative to multi-checkpoint methods, retaining much of their benefits without their computational overhead.
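The two spectral ingredients the abstract names, an SVD split of the fine-tuning update and an entropy-based effective rank, can be sketched in a few lines. The cut point at the rounded effective rank and the two scalar coefficients are placeholder choices; the actual method derives layer-wise coefficients from each layer's spectral and geometric structure:

```python
import numpy as np

def effective_rank(s):
    """Entropy-based effective rank of a singular-value spectrum s."""
    p = s / s.sum()
    h = -(p * np.log(p + 1e-12)).sum()
    return float(np.exp(h))

def reweight_update(delta_w, alpha_high=1.0, alpha_low=0.5):
    """Split a fine-tuning update into high-/low-energy SVD directions and
    recombine them with separate coefficients (illustrative sketch only)."""
    u, s, vt = np.linalg.svd(delta_w, full_matrices=False)
    k = max(1, int(round(effective_rank(s))))
    high = (u[:, :k] * s[:k]) @ vt[:k]   # task-specific adaptation directions
    low = (u[:, k:] * s[k:]) @ vt[k:]    # low-energy "noise + residual" part
    return alpha_high * high + alpha_low * low
```

With `alpha_high = alpha_low = 1` the update is reconstructed exactly; shrinking `alpha_low` damps the low-energy directions, which is the kind of knob the paper tunes automatically.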
【6】Resilient Class-Incremental Learning: on the Interplay of Drifting, Unlabelled and Imbalanced Data Streams
标题:弹性类增量学习:论漂移、无标签与不平衡数据流的相互作用
链接:https://arxiv.org/abs/2602.09681
作者:Jin Li,Kleanthis Malialis,Marios Polycarpou
备注:Accepted by Artificial Intelligence Science and Engineering
摘要:In today's connected world, the generation of massive streaming data across diverse domains has become commonplace. Concept drift, class imbalance, label scarcity, and new class emergence jointly degrade representation stability, bias learning toward outdated distributions, and reduce the resilience and reliability of detection in dynamic environments. This paper proposes SCIL (Streaming Class-Incremental Learning) to address these challenges. The SCIL framework integrates an autoencoder (AE) with a multi-layer perceptron for multi-class prediction, uses a dual-loss strategy (classification and reconstruction) for prediction and new class detection, employs corrected pseudo-labels for online training, manages classes with queues, and applies oversampling to handle imbalance. The rationale behind the method's structure is elucidated through ablation studies and a comprehensive experimental evaluation is performed using both real-world and synthetic datasets that feature class imbalance, incremental classes, and concept drifts. Our results demonstrate that SCIL outperforms strong baselines and state-of-the-art methods. Based on our commitment to Open Science, we make our code and datasets available to the community.
【7】Training deep physical neural networks with local physical information bottleneck
标题:训练具有局部物理信息瓶颈的深度物理神经网络
链接:https://arxiv.org/abs/2602.09569
作者:Hao Wang,Ziao Wang,Xiangpeng Liang,Han Zhao,Jianqi Hu,Junjie Jiang,Xing Fu,Jianshi Tang,Huaqiang Wu,Sylvain Gigan,Qiang Liu
备注:9 pages, 4 figures
摘要:Deep learning has revolutionized modern society but faces growing energy and latency constraints. Deep physical neural networks (PNNs) are interconnected computing systems that directly exploit analog dynamics for energy-efficient, ultrafast AI execution. Realizing this potential, however, requires universal training methods tailored to physical intricacies. Here, we present the Physical Information Bottleneck (PIB), a general and efficient framework that integrates information theory and local learning, enabling deep PNNs to learn under arbitrary physical dynamics. By allocating matrix-based information bottlenecks to each unit, we demonstrate supervised, unsupervised, and reinforcement learning across electronic memristive chips and optical computing platforms. PIB also adapts to severe hardware faults and allows for parallel training via geographically distributed resources. Bypassing auxiliary digital models and contrastive measurements, PIB recasts PNN training as an intrinsic, scalable information-theoretic process compatible with diverse physical substrates.
【8】Learning to Discover Iterative Spectral Algorithms
标题:学习发现迭代谱算法
链接:https://arxiv.org/abs/2602.09530
作者:Zihang Liu,Oleg Balabanov,Yaoqing Yang,Michael W. Mahoney
摘要:We introduce AutoSpec, a neural network framework for discovering iterative spectral algorithms for large-scale numerical linear algebra and numerical optimization. Our self-supervised models adapt to input operators using coarse spectral information (e.g., eigenvalue estimates and residual norms), and they predict recurrence coefficients for computing or applying a matrix polynomial tailored to a downstream task. The effectiveness of AutoSpec relies on three ingredients: an architecture whose inference pass implements short, executable numerical linear algebra recurrences; efficient training on small synthetic problems with transfer to large-scale real-world operators; and task-defined objectives that enforce the desired approximation or preconditioning behavior across the range of spectral profiles represented in the training set. We apply AutoSpec to discovering algorithms for representative numerical linear algebra tasks: accelerating matrix-function approximation; accelerating sparse linear solvers; and spectral filtering/preconditioning for eigenvalue computations. On real-world matrices, the learned procedures deliver orders-of-magnitude improvements in accuracy and/or reductions in iteration count, relative to basic baselines. We also find clear connections to classical theory: the induced polynomials often exhibit near-equiripple, near-minimax behavior characteristic of Chebyshev polynomials.
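As a classical reference point for the learned recurrences, here is the three-term Chebyshev recurrence for applying a matrix polynomial $p(A)b$, the family whose near-equiripple, near-minimax behavior the learned polynomials reportedly approach. The spectral interval `[lo, hi]` is assumed known, and the coefficients below are arbitrary; this is the textbook recurrence, not AutoSpec's learned one:

```python
import numpy as np

def chebyshev_apply(A, b, coeffs, lo, hi):
    """Evaluate sum_k coeffs[k] * T_k(As) @ b, where As is A affinely mapped
    from [lo, hi] onto [-1, 1], via the three-term Chebyshev recurrence
    T_{k+1}(x) = 2 x T_k(x) - T_{k-1}(x). Requires len(coeffs) >= 2."""
    alpha, beta = 2.0 / (hi - lo), -(hi + lo) / (hi - lo)
    t_prev = b                                  # T_0(As) b
    t_curr = alpha * (A @ b) + beta * b         # T_1(As) b
    out = coeffs[0] * t_prev + coeffs[1] * t_curr
    for c in coeffs[2:]:
        t_next = 2.0 * (alpha * (A @ t_curr) + beta * t_curr) - t_prev
        out = out + c * t_next
        t_prev, t_curr = t_curr, t_next
    return out
```

Each step costs one matrix-vector product, which is why short executable recurrences of this shape are a natural inference target for the architecture described above.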
【9】Beyond Student: An Asymmetric Network for Neural Network Inheritance
标题:超越学生:神经网络继承的不对称网络
链接:https://arxiv.org/abs/2602.09509
作者:Yiyun Zhou,Jingwei Shi,Mingjing Xu,Zhonghua Jiang,Jingyuan Chen
摘要:Knowledge Distillation (KD) has emerged as a powerful technique for model compression, enabling lightweight student networks to benefit from the performance of redundant teacher networks. However, the inherent capacity gap often limits the performance of student networks. Inspired by the expressiveness of pretrained teacher networks, a compelling research question arises: is there a type of network that can not only inherit the teacher's structure but also maximize the inheritance of its knowledge? Furthermore, how does the performance of such an inheriting network compare to that of student networks, all benefiting from the same teacher network? To further explore this question, we propose InherNet, a neural network inheritance method that performs asymmetric low-rank decomposition on the teacher's weights and reconstructs a lightweight yet expressive network without significant architectural disruption. By leveraging Singular Value Decomposition (SVD) for initialization to ensure the inheritance of principal knowledge, InherNet effectively balances depth, width, and compression efficiency. Experimental results across unimodal and multimodal tasks demonstrate that InherNet achieves higher performance compared to student networks of similar parameter sizes. Our findings reveal a promising direction for future research in efficient model compression beyond traditional distillation.
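The SVD-initialized inheritance step can be sketched as a truncated factorization of a teacher weight matrix. The symmetric square-root split of the singular values below is one simple choice; InherNet's exact asymmetric decomposition and its depth/width balancing are not specified here:

```python
import numpy as np

def inherit_low_rank(w_teacher, rank):
    """Initialize a lightweight "inheriting" layer W ~= A @ B from a teacher
    weight via truncated SVD, so the principal directions carry over."""
    u, s, vt = np.linalg.svd(w_teacher, full_matrices=False)
    root = np.sqrt(s[:rank])
    a = u[:, :rank] * root            # shape (out_dim, rank)
    b = root[:, None] * vt[:rank]     # shape (rank, in_dim)
    return a, b
```

Replacing a dense layer `W` (out_dim x in_dim) with the pair `A, B` cuts parameters from `out_dim * in_dim` to `rank * (out_dim + in_dim)` while keeping the teacher's dominant singular directions exactly.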
【10】Computationally Efficient Replicable Learning of Parities
标题:计算高效的可复制奇偶函数学习
链接:https://arxiv.org/abs/2602.09499
作者:Moshe Noivirt,Jessica Sorrell,Eliad Tsfadia
摘要:We study the computational relationship between replicability (Impagliazzo et al. [STOC `22], Ghazi et al. [NeurIPS `21]) and other stability notions. Specifically, we focus on replicable PAC learning and its connections to differential privacy (Dwork et al. [TCC 2006]) and to the statistical query (SQ) model (Kearns [JACM `98]). Statistically, it was known that differentially private learning and replicable learning are equivalent and strictly more powerful than SQ-learning. Yet, computationally, all previously known efficient (i.e., polynomial-time) replicable learning algorithms were confined to SQ-learnable tasks or restricted distributions, in contrast to differentially private learning. Our main contribution is the first computationally efficient replicable algorithm for realizable learning of parities over arbitrary distributions, a task that is known to be hard in the SQ-model, but possible under differential privacy. This result provides the first evidence that efficient replicable learning over general distributions strictly extends efficient SQ-learning, and is closer in power to efficient differentially private learning, despite computational separations between replicability and privacy. Our main building block is a new, efficient, and replicable algorithm that, given a set of vectors, outputs a subspace of their linear span that covers most of them.
【11】A Scoping Review of Deep Learning for Urban Visual Pollution and Proposal of a Real-Time Monitoring Framework with a Visual Pollution Index
标题:城市视觉污染深度学习的范围审查以及具有视觉污染指数的实时监测框架的提议
链接:https://arxiv.org/abs/2602.09446
作者:Mohammad Masudur Rahman,Md. Rashedur Rahman,Ashraful Islam,Saadia B Alam,M Ashraful Amin
摘要:Urban Visual Pollution (UVP) has emerged as a critical concern, yet research on automatic detection and application remains fragmented. This scoping review maps the existing deep learning-based approaches for detecting, classifying, and designing a comprehensive application framework for visual pollution management. Following the PRISMA-ScR guidelines, seven academic databases (Scopus, Web of Science, IEEE Xplore, ACM DL, ScienceDirect, SpringerNatureLink, and Wiley) were systematically searched, and 26 articles were identified for review. Most research focuses on specific pollutant categories and employs variations of YOLO, Faster R-CNN, and EfficientDet architectures. Although several datasets exist, they are limited to specific areas and lack standardized taxonomies. Few studies integrate detection into real-time application systems, and those that do tend to be geographically skewed. We propose a framework for monitoring visual pollution that integrates a visual pollution index to assess the severity of visual pollution for a certain area. This review highlights the need for a unified UVP management system that incorporates pollutant taxonomy, a cross-city benchmark dataset, a generalized deep learning model, and an assessment index that supports sustainable urban aesthetics and enhances the well-being of urban dwellers.
【12】Learning with Multiple Correct Answers -- A Trichotomy of Regret Bounds under Different Feedback Models
标题:带多个正确答案的学习——不同反馈模型下遗憾界的三分法
链接:https://arxiv.org/abs/2602.09402
作者:Alireza F. Pour,Farnam Mansouri,Shai Ben-David
摘要:We study an online learning problem with multiple correct answers, where each instance admits a set of valid labels, and in each round the learner must output a valid label for the queried example. This setting is motivated by language generation tasks, in which a prompt may admit many acceptable completions, but not every completion is acceptable. We study this problem under three feedback models. For each model, we characterize the optimal mistake bound in the realizable setting using an appropriate combinatorial dimension. We then establish a trichotomy of regret bounds across the three models in the agnostic setting. Our results also imply sample complexity bounds for the batch setup that depend on the respective combinatorial dimensions.
【13】SnareNet: Flexible Repair Layers for Neural Networks with Hard Constraints
标题:SnareNet:具有硬约束的神经网络的灵活修复层
链接:https://arxiv.org/abs/2602.09317
作者:Ya-Chi Chu,Alkiviades Boukas,Madeleine Udell
摘要:Neural networks are increasingly used as surrogate solvers and control policies, but unconstrained predictions can violate physical, operational, or safety requirements. We propose SnareNet, a feasibility-controlled architecture for learning mappings whose outputs must satisfy input-dependent nonlinear constraints. SnareNet appends a differentiable repair layer that navigates in the constraint map's range space, steering iterates toward feasibility and producing a repaired output that satisfies constraints to a user-specified tolerance. To stabilize end-to-end training, we introduce adaptive relaxation, which designs a relaxed feasible set that snares the neural network at initialization and shrinks it into the feasible set, enabling early exploration and strict feasibility later in training. On optimization-learning and trajectory planning benchmarks, SnareNet consistently attains improved objective quality while satisfying constraints more reliably than prior work.
【14】Don't Shoot The Breeze: Topic Continuity Model Using Nonlinear Naive Bayes With Attention
标题:不要闲聊:使用带注意力机制的非线性朴素贝叶斯的话题连续性模型
链接:https://arxiv.org/abs/2602.09312
作者:Shu-Ting Pi,Pradeep Bagavan,Yejia Li,Disha,Qun Liu
备注:EMNLP 2024: Industry Track; 8 pages, 2 figures, 1 table
摘要:Utilizing Large Language Models (LLMs) as chatbots in diverse business scenarios often presents the challenge of maintaining topic continuity. Abrupt shifts in topics can lead to poor user experiences and inefficient utilization of computational resources. In this paper, we present a topic continuity model aimed at assessing whether a response aligns with the initial conversation topic. Our model is built upon the expansion of the corresponding natural language understanding (NLU) model into quantifiable terms using a Naive Bayes approach. Subsequently, we have introduced an attention mechanism and logarithmic nonlinearity to enhance its capability to capture topic continuity. This approach allows us to convert the NLU model into an interpretable analytical formula. In contrast to many NLU models constrained by token limits, our proposed model can seamlessly handle conversations of any length with linear time complexity. Furthermore, the attention mechanism significantly improves the model's ability to identify topic continuity in complex conversations. According to our experiments, our model consistently outperforms traditional methods, particularly in handling lengthy and intricate conversations. This unique capability offers us an opportunity to ensure the responsible and interpretable use of LLMs.
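The analytical form described, a Naive Bayes log-likelihood ratio with per-token attention weights and a logarithmic nonlinearity, admits a minimal sketch. The token probability tables and attention values below are invented for illustration; the paper's exact attention parameterization is not given in the abstract:

```python
import math

def topic_continuity_score(tokens, attention, p_topic, p_background):
    """Attention-weighted Naive Bayes log-likelihood ratio. Positive scores
    favor "the response stays on the original topic". Runs in time linear
    in the number of tokens, with no context-length limit."""
    score = 0.0
    for tok, att in zip(tokens, attention):
        on = p_topic.get(tok, 1e-6)        # P(token | on-topic), smoothed
        off = p_background.get(tok, 1e-6)  # P(token | off-topic), smoothed
        score += att * (math.log(on) - math.log(off))
    return score

# Toy vocabulary for a customer-support topic (hypothetical numbers):
p_topic = {"refund": 0.30, "order": 0.20}
p_background = {"refund": 0.01, "order": 0.05, "weather": 0.10}
on_score = topic_continuity_score(["refund", "order"], [1.0, 0.5], p_topic, p_background)
off_score = topic_continuity_score(["weather"], [1.0], p_topic, p_background)
```

The single pass over tokens mirrors the linear time complexity claimed above, in contrast to attention over full transformer contexts.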
【15】Stabilizing Physics-Informed Consistency Models via Structure-Preserving Training
标题:基于结构保持训练的物理信息一致性模型稳定化
链接:https://arxiv.org/abs/2602.09303
作者:Che-Chia Chang,Chen-Yang Dai,Te-Sheng Lin,Ming-Chih Lai,Chieh-Hsin Lai
摘要:We propose a physics-informed consistency modeling framework for solving partial differential equations (PDEs) via fast, few-step generative inference. We identify a key stability challenge in physics-constrained consistency training, where PDE residuals can drive the model toward trivial or degenerate solutions, degrading the learned data distribution. To address this, we introduce a structure-preserving two-stage training strategy that decouples distribution learning from physics enforcement by freezing the coefficient decoder during physics-informed fine-tuning. We further propose a two-step residual objective that enforces physical consistency on refined, structurally valid generative trajectories rather than noisy single-step predictions. The resulting framework enables stable, high-fidelity inference for both unconditional generation and forward problems. We demonstrate that forward solutions can be obtained via a projection-based zero-shot inpainting procedure, matching the accuracy of diffusion baselines at orders-of-magnitude lower computational cost.
【16】Do Neural Networks Lose Plasticity in a Gradually Changing World?
标题:神经网络在逐渐变化的世界中会失去可塑性吗?
链接:https://arxiv.org/abs/2602.09234
作者:Tianhui Liu,Lili Mou
摘要:Continual learning has become a trending topic in machine learning. Recent studies have discovered an interesting phenomenon called loss of plasticity, referring to neural networks gradually losing the ability to learn new tasks. However, existing plasticity research largely relies on contrived settings with abrupt task transitions, which often do not reflect real-world environments. In this paper, we propose to investigate a gradually changing environment, and we simulate this by input/output interpolation and task sampling. We perform theoretical and empirical analysis, showing that the loss of plasticity is an artifact of abrupt tasks changes in the environment and can be largely mitigated if the world changes gradually.
【17】Beyond the Unit Hypersphere: Embedding Magnitude in Contrastive Learning
标题:超越单位超球体:在对比学习中嵌入幅度
链接:https://arxiv.org/abs/2602.09229
作者:Xincan Feng,Taro Watanabe
备注:Preliminary work. Under review
摘要:Cosine similarity is prevalent in contrastive learning, yet it makes an implicit assumption: embedding magnitude is noise. Prior work occasionally found dot product and cosine similarity comparable, but left unanswered WHAT information magnitude carries, WHEN it helps, and HOW to leverage it. We conduct a systematic study through a $2 \times 2$ ablation that independently controls input-side and output-side normalization across text and vision models. Our findings reveal three key insights. First, in text retrieval, output (document) magnitude strongly correlates with relevance (Cohen's $d$ up to 1.80), yielding the largest gains on reasoning-intensive tasks. Second, input and output magnitudes serve asymmetric roles: output magnitude directly scales similarity scores while input magnitude modulates training dynamics. Third, magnitude learning benefits asymmetric tasks (text retrieval, RAG) but harms symmetric tasks (STS, text-image alignment). These findings establish a task symmetry principle: the choice between cosine and dot product depends on whether the task has distinct input roles, enabling cost-free improvements by simply removing an unnecessary constraint.
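The asymmetry the abstract describes follows from a basic identity: the dot product factorizes as cosine similarity times the two norms, so output magnitude directly scales scores while cosine discards it. A quick numeric check (embeddings here are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.normal(size=8)   # e.g. a query embedding
v = rng.normal(size=8)   # e.g. a document embedding

dot = u @ v
cos = dot / (np.linalg.norm(u) * np.linalg.norm(v))
# dot product = cosine * |u| * |v|; doubling |v| doubles dot but leaves
# cosine unchanged, which is the magnitude signal cosine throws away.
```

Under the paper's finding that document magnitude correlates with relevance, this is exactly the information that L2-normalizing outputs removes.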
【18】Train Less, Infer Faster: Efficient Model Finetuning and Compression via Structured Sparsity
标题:减少训练,加快推理:通过结构化稀疏性进行高效模型微调和压缩
链接:https://arxiv.org/abs/2602.09169
作者:Jonathan Svirsky,Yehonathan Refael,Ofir Lindenbaum
摘要:Fully finetuning foundation language models (LMs) with billions of parameters is often impractical due to high computational costs, memory requirements, and the risk of overfitting. Although methods like low-rank adapters help address these challenges by adding small trainable modules to the frozen LM, they also increase memory usage and do not reduce inference latency. We uncover an intriguing phenomenon: sparsifying specific model rows and columns enables efficient task adaptation without requiring weight tuning. We propose a scheme for effective finetuning via sparsification with trainable stochastic gates, which requires minimal trainable parameters, reduces inference time, and removes 20-40% of model parameters without significant accuracy loss. Empirical results show it outperforms recent finetuning baselines in efficiency and performance. Additionally, we provide theoretical guarantees for the convergence of this stochastic gating process, and show that our method admits a simpler and better-conditioned optimization landscape compared to LoRA. Our results highlight sparsity as a compelling mechanism for task-specific adaptation in LMs.
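A minimal sketch of the gating idea, assuming a clipped-Gaussian gate in the style of stochastic-gate/L0-regularization methods (the paper's exact parameterization is not given in the abstract): during training, injected noise makes the expected gate value differentiable in its parameter, and at inference rows whose gate reaches exactly zero can be deleted outright, which is what cuts latency.

```python
import numpy as np

def stochastic_gate(mu, sigma=0.5, rng=None, train=True):
    """Clipped-Gaussian stochastic gate: noisy and stochastic in training,
    deterministic hard-clipped at inference."""
    if train:
        eps = (rng or np.random.default_rng()).normal(size=mu.shape)
        z = mu + sigma * eps
    else:
        z = mu
    return np.clip(z, 0.0, 1.0)

# Prune rows of a frozen weight matrix with inference-time gates:
mu = np.array([1.2, 0.9, -0.3, 0.7])      # learned gate parameters (toy values)
gates = stochastic_gate(mu, train=False)  # -> [1.0, 0.9, 0.0, 0.7]
w = np.ones((4, 3))                       # frozen weights, one gate per row
w_sparse = gates[:, None] * w             # row 2 is exactly zero: removable
```

Only the gate vector `mu` is trained, which matches the "minimal trainable parameters" claim; the underlying weights stay frozen.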
【19】Patient foundation model for risk stratification in low-risk overweight patients
标题:低风险超重患者风险分层的患者基础模型
链接:https://arxiv.org/abs/2602.09079
作者:Zachary N. Flamholz,Dillon Tracy,Ripple Khera,Jordan Wolinsky,Nicholas Lee,Nathaniel Tann,Xiao Yin Zhu,Harry Phillips,Jeffrey Sherman
摘要:Accurate risk stratification in patients with overweight or obesity is critical for guiding preventive care and allocating high-cost therapies such as GLP-1 receptor agonists. We present PatientTPP, a neural temporal point process (TPP) model trained on over 500,000 real-world clinical trajectories to learn patient representations from sequences of diagnoses, labs, and medications. We extend existing TPP modeling approaches to include static and numeric features and incorporate clinical knowledge for event encoding. PatientTPP representations support downstream prediction tasks, including classification of obesity-associated outcomes in low-risk individuals, even for events not explicitly modeled during training. In health economic evaluation, PatientTPP outperformed body mass index in stratifying patients by future cardiovascular-related healthcare costs, identifying higher-risk patients more efficiently. By modeling both the type and timing of clinical events, PatientTPP offers an interpretable, general-purpose foundation for patient risk modeling with direct applications to obesity-related care and cost targeting.
【20】Learning to Remember, Learn, and Forget in Attention-Based Models
标题:在基于注意力的模型中学会记住、学习和忘记
链接:https://arxiv.org/abs/2602.09075
作者:Djohan Bonnet,Jamie Lohoff,Jan Finkbeiner,Elidona Skhikerujah,Emre Neftci
摘要:In-Context Learning (ICL) in transformers acts as an online associative memory and is believed to underpin their high performance on complex sequence processing tasks. However, in gated linear attention models, this memory has a fixed capacity and is prone to interference, especially for long sequences. We propose Palimpsa, a self-attention model that views ICL as a continual learning problem that must address a stability-plasticity dilemma. Palimpsa uses Bayesian metaplasticity, where the plasticity of each attention state is tied to an importance state grounded by a prior distribution that captures accumulated knowledge. We demonstrate that various gated linear attention models emerge as specific architecture choices and posterior approximations, and that Mamba2 is a special case of Palimpsa where forgetting dominates. This theoretical link enables the transformation of any non-metaplastic model into a metaplastic one, significantly expanding its memory capacity. Our experiments show that Palimpsa consistently outperforms baselines on the Multi-Query Associative Recall (MQAR) benchmark and on Commonsense Reasoning tasks.
【21】Dynamic Load Model for Data Centers with Pattern-Consistent Calibration
标题:基于模式一致性校正的数据中心动态负载模型
链接:https://arxiv.org/abs/2602.07859
作者:Siyu Lu,Chenhan Xiao,Yang Weng
备注:10 pages, 13 figures
摘要:The rapid growth of data centers has made large electronic load (LEL) modeling increasingly important for power system analysis. Such loads are characterized by fast workload-driven variability and protection-driven disconnection and reconnection behavior that are not captured by conventional load models. Existing data center load modeling includes physics-based approaches, which provide interpretable structure for grid simulation, and data-driven approaches, which capture empirical workload variability from data. However, physics-based models are typically uncalibrated to facility-level operation, while trajectory alignment in data-driven methods often leads to overfitting and unrealistic dynamic behavior. To resolve these limitations, we design the framework to leverage both physics-based structure and data-driven adaptability. The physics-based structure is parameterized to enable data-driven pattern-consistent calibration from real operational data, supporting facility-level grid planning. We further show that trajectory-level alignment is limited for inherently stochastic data center loads. Therefore, we design the calibration to align temporal and statistical patterns using temporal contrastive learning (TCL). This calibration is performed locally at the facility, and only calibrated parameters are shared with utilities, preserving data privacy. The proposed load model is calibrated by real-world operational load data from the MIT Supercloud, ASU Sol, Blue Waters, and ASHRAE datasets. Then it is integrated into the ANDES platform and evaluated on the IEEE 39-bus, NPCC 140-bus, and WECC 179-bus systems. We find that interactions among LELs can fundamentally alter post-disturbance recovery behavior, producing compound disconnection-reconnection dynamics and delayed stabilization that are not captured by uncalibrated load models.
【22】Robust Processing and Learning: Principles, Methods, and Wireless Applications
标题:稳健的处理和学习:原理、方法和无线应用
链接:https://arxiv.org/abs/2602.09848
作者:Shixiong Wang,Wei Dai,Li-Chun Wang,Geoffrey Ye Li
摘要:This tutorial-style overview article examines the fundamental principles and methods of robustness, using wireless sensing and communication (WSC) as the narrative and exemplifying framework. First, we formalize the conceptual and mathematical foundations of robustness, highlighting the interpretations and relations across robust statistics, optimization, and machine learning. Key techniques, such as robust estimation and testing, distributionally robust optimization, and regularized and adversarial training, are investigated. The costs of robustness in system design, for example compromised nominal performance and extra computational burden, are also discussed. Second, we review recent robust signal processing solutions for WSC that address model mismatch, data scarcity, adversarial perturbation, and distributional shift. Specific applications include robust ranging-based localization, modality sensing, channel estimation, receive combining, waveform design, and federated learning. Through this effort, we aim to introduce the classical developments and recent advances in robustness theory to the general signal processing community, exemplifying how robust statistical, optimization, and machine learning approaches can address the uncertainties inherent in WSC systems.
【23】Continual Learning for non-stationary regression via Memory-Efficient Replay
标题:通过内存高效重播进行非平稳回归的持续学习
链接:https://arxiv.org/abs/2602.09720
作者:Pablo García-Santaclara,Bruno Fernández-Castro,Rebeca P. Díaz-Redondo,Martín Alonso-Gamarra
摘要:Data streams are rarely static in dynamic environments like Industry 4.0. Instead, they constantly change, making traditional offline models outdated unless they can quickly adjust to the new data. This need can be adequately addressed by continual learning (CL), which allows systems to gradually acquire knowledge without incurring the prohibitive costs of retraining them from scratch. Most research on continual learning focuses on classification problems, while very few studies address regression tasks. We propose the first prototype-based generative replay framework designed for online task-free continual regression. Our approach defines an adaptive output-space discretization model, enabling prototype-based generative replay for continual regression without storing raw data. Evidence obtained from several benchmark datasets shows that our framework reduces forgetting and provides more stable performance than other state-of-the-art solutions.
【24】The Entropic Signature of Class Speciation in Diffusion Models
标题:扩散模型中类物种形成的熵特征
链接:https://arxiv.org/abs/2602.09651
作者:Florian Handke,Dejan Stančević,Felix Koulischer,Thomas Demeester,Luca Ambrogioni
备注:21 pages
摘要:Diffusion models do not recover semantic structure uniformly over time. Instead, samples transition from semantic ambiguity to class commitment within a narrow regime. Recent theoretical work attributes this transition to dynamical instabilities along class-separating directions, but practical methods to detect and exploit these windows in trained models are still limited. We show that tracking the class-conditional entropy of a latent semantic variable given the noisy state provides a reliable signature of these transition regimes. By restricting the entropy to semantic partitions, the entropy can furthermore resolve semantic decisions at different levels of abstraction. We analyze this behavior in high-dimensional Gaussian mixture models and show that the entropy rate concentrates on the same logarithmic time scale as the speciation symmetry-breaking instability previously identified in variance-preserving diffusion. We validate our method on EDM2-XS and Stable Diffusion 1.5, where class-conditional entropy consistently isolates the noise regimes critical for semantic structure formation. Finally, we use our framework to quantify how guidance redistributes semantic information over time. Together, these results connect information-theoretic and statistical physics perspectives on diffusion and provide a principled basis for time-localized control.
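The entropy signature can be reproduced in closed form for a toy two-class Gaussian mixture: the class posterior given the noised sample is a logistic function of it, and its average entropy moves from near 0 (class committed) to near ln 2 (class ambiguous) as the noise scale grows. Means, variances, and noise scales below are arbitrary illustrative choices, not the paper's setup:

```python
import numpy as np

def mean_class_entropy(sigma, m=3.0, n=20000, seed=0):
    """Average entropy of p(class | x_t) for a 1-D two-class mixture with
    component means +/-m and unit variance, after adding noise of scale sigma."""
    rng = np.random.default_rng(seed)
    c = rng.integers(0, 2, n) * 2 - 1          # class label in {-1, +1}
    x = c * m + rng.normal(size=n)             # clean samples
    xt = x + sigma * rng.normal(size=n)        # noised samples
    var = 1.0 + sigma ** 2                     # variance of x_t given class
    logit = 2.0 * m * xt / var                 # log p(+|x_t) - log p(-|x_t)
    p = np.clip(1.0 / (1.0 + np.exp(-logit)), 1e-12, 1 - 1e-12)
    h = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    return float(h.mean())
```

Sweeping `sigma` traces the transition regime the abstract describes: the entropy rate concentrates where the posterior crosses from committed to ambiguous.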
【25】From Average Sensitivity to Small-Loss Regret Bounds under Random-Order Model
标题:从平均敏感度到随机顺序模型下的小损失后悔界
链接:https://arxiv.org/abs/2602.09457
作者:Shinsaku Sakaue,Yuichi Yoshida
摘要:We study online learning in the random-order model, where the multiset of loss functions is chosen adversarially but revealed in a uniformly random order. Building on the batch-to-online conversion by Dong and Yoshida (2023), we show that if an offline algorithm admits a $(1+\varepsilon)$-approximation guarantee and the effect of $\varepsilon$ on its average sensitivity is characterized by a function $\varphi(\varepsilon)$, then an adaptive choice of $\varepsilon$ yields a small-loss regret bound of $\tilde O(\varphi^{\star}(\mathrm{OPT}_T))$, where $\varphi^{\star}$ is the concave conjugate of $\varphi$, $\mathrm{OPT}_T$ is the offline optimum over $T$ rounds, and $\tilde O$ hides polylogarithmic factors in $T$. Our method requires no regularity assumptions on loss functions, such as smoothness, and can be viewed as a generalization of the AdaGrad-style tuning applied to the approximation parameter $\varepsilon$. Our result recovers and strengthens the $(1+\varepsilon)$-approximate regret bounds of Dong and Yoshida (2023) and yields small-loss regret bounds for online $k$-means clustering, low-rank approximation, and regression. We further apply our framework to online submodular function minimization using $(1\pm\varepsilon)$-cut sparsifiers of submodular hypergraphs, obtaining a small-loss regret bound of $\tilde O(n^{3/4}(1 + \mathrm{OPT}_T^{3/4}))$, where $n$ is the ground-set size. Our approach sheds light on the power of sparsification and related techniques in establishing small-loss regret bounds in the random-order model.
其他(34篇)
【1】WildCat: Near-Linear Attention in Theory and Practice
标题:WildCat:理论与实践中的近线性注意力
链接:https://arxiv.org/abs/2602.10056
作者:Tobias Schröder,Lester Mackey
摘要:We introduce WildCat, a high-accuracy, low-cost approach to compressing the attention mechanism in neural networks. While attention is a staple of modern network architectures, it is also notoriously expensive to deploy due to resource requirements that scale quadratically with the input sequence length $n$. WildCat avoids these quadratic costs by only attending over a small weighted coreset. Crucially, we select the coreset using a fast but spectrally-accurate subsampling algorithm -- randomly pivoted Cholesky -- and weight the elements optimally to minimise reconstruction error. Remarkably, given bounded inputs, WildCat approximates exact attention with super-polynomial $O(n^{-\sqrt{\log(\log(n))}})$ error decay while running in near-linear $O(n^{1+o(1)})$ time. In contrast, prior practical approximations either lack error guarantees or require quadratic runtime to guarantee such high fidelity. We couple this advance with a GPU-optimized PyTorch implementation and a suite of benchmark experiments demonstrating the benefits of WildCat for image generation, image classification, and language model KV cache compression.
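The coreset-selection step can be illustrated with a standalone randomly pivoted Cholesky sketch. Here a Gaussian (RBF) kernel stands in for the attention kernel, and the weighting of coreset elements is folded into the Cholesky factor; this sketches the sampling algorithm only, not WildCat's GPU-optimized implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def rp_cholesky(K, k):
    """Randomly pivoted Cholesky: rank-k factor F with K ~= F @ F.T,
    sampling each pivot with probability proportional to the residual diagonal."""
    n = K.shape[0]
    F = np.zeros((n, k))
    d = np.diag(K).astype(float).copy()
    for j in range(k):
        i = rng.choice(n, p=d / d.sum())            # pivot ~ residual diagonal
        g = K[:, i] - F[:, :j] @ F[i, :j]
        F[:, j] = g / np.sqrt(g[i])
        d = np.maximum(d - F[:, j] ** 2, 0.0)       # update residual diagonal
    return F

# A smooth similarity kernel over n "tokens" stands in for the attention kernel.
n, dim = 300, 16
X = rng.normal(size=(n, dim))
sq = ((X[:, None] - X[None]) ** 2).sum(-1)
K = np.exp(-sq / (2 * dim))                         # RBF kernel (PSD)

F = rp_cholesky(K, k=40)
rel_err = np.linalg.norm(K - F @ F.T) / np.linalg.norm(K)
```

Because pivots are drawn in proportion to the residual diagonal, the sampler adapts to the kernel's spectrum, which is what underlies the spectral accuracy guarantee cited above.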
【2】Position: Message-passing and spectral GNNs are two sides of the same coin
标题:位置:消息传递和光谱GNN是同一枚硬币的两面
链接:https://arxiv.org/abs/2602.10031
作者:Antonis Vasileiou,Juan Cervino,Pascal Frossard,Charilaos I. Kanatsoulis,Christopher Morris,Michael T. Schaub,Pierre Vandergheynst,Zhiyang Wang,Guy Wolf,Ron Levie
摘要:Graph neural networks (GNNs) are commonly divided into message-passing neural networks (MPNNs) and spectral graph neural networks, reflecting two largely separate research traditions in machine learning and signal processing. This paper argues that this divide is mostly artificial, hindering progress in the field. We propose a viewpoint in which both MPNNs and spectral GNNs are understood as different parametrizations of permutation-equivariant operators acting on graph signals. From this perspective, many popular architectures are equivalent in expressive power, while genuine gaps arise only in specific regimes. We further argue that MPNNs and spectral GNNs offer complementary strengths. That is, MPNNs provide a natural language for discrete structure and expressivity analysis using tools from logic and graph isomorphism research, while the spectral perspective provides principled tools for understanding smoothing, bottlenecks, stability, and community structure. Overall, we posit that progress in graph learning will be accelerated by clearly understanding the key similarities and differences between these two types of GNNs, and by working towards unifying these perspectives within a common theoretical and conceptual framework rather than treating them as competing paradigms.
【3】A Task-Centric Theory for Iterative Self-Improvement with Easy-to-Hard Curricula
标题:以任务为中心的理论:通过由易到难的课程实现迭代自我提升
链接:https://arxiv.org/abs/2602.10014
作者:Chenruo Liu,Yijun Dong,Yiqiu Shen,Qi Lei
摘要:Iterative self-improvement fine-tunes an autoregressive large language model (LLM) on reward-verified outputs generated by the LLM itself. In contrast to the empirical success of self-improvement, the theoretical foundation of this generative, iterative procedure in a practical, finite-sample setting remains limited. We make progress toward this goal by modeling each round of self-improvement as maximum-likelihood fine-tuning on a reward-filtered distribution and deriving finite-sample guarantees for the expected reward. Our analysis reveals an explicit feedback loop where better models accept more data per iteration, supporting sustained self-improvement while explaining eventual saturation of such improvement. Adopting a task-centric view by considering reasoning tasks with multiple difficulty levels, we further prove quantifiable conditions on model initialization, task difficulty, and sample budget where easy-to-hard curricula provably achieve better guarantees than training on fixed mixtures of tasks. Our analyses are validated via Monte-Carlo simulations and controlled experiments on graph-based reasoning tasks.
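The modeling assumption above — each round is maximum-likelihood refitting on a reward-filtered distribution — can be simulated with a toy categorical "model". The feedback loop (better models pass the filter more often) and eventual saturation are both visible; the numbers and smoothing constant are our own:

```python
import numpy as np

rng = np.random.default_rng(0)

K = 10
probs = np.full(K, 1.0 / K)                  # "model": a categorical over K outputs
reward = (np.arange(K) >= 7).astype(float)   # verifier: outputs 7..9 are correct

history = []
for _ in range(5):
    history.append(float(probs @ reward))    # expected reward entering this round
    samples = rng.choice(K, size=500, p=probs)
    accepted = samples[reward[samples] == 1] # keep only reward-verified outputs
    if len(accepted) == 0:
        continue                             # weak models accept little data
    counts = np.bincount(accepted, minlength=K) + 0.5
    probs = counts / counts.sum()            # smoothed MLE refit on filtered data

final_reward = float(probs @ reward)
# Reward jumps early, then saturates: the smoothing floor bounds further gains.
```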
【4】Online Monitoring Framework for Automotive Time Series Data using JEPA Embeddings
标题:使用JEPA嵌入式的汽车时间序列数据在线监控框架
链接:https://arxiv.org/abs/2602.09985
作者:Alexander Fertig,Karthikeyan Chandra Sekaran,Lakshman Balasubramanian,Michael Botsch
备注:Accepted at the 2026 IEEE Intelligent Vehicles Symposium. Copyright 2026 IEEE. Permission from IEEE must be obtained for use in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
摘要:As autonomous vehicles are rolled out, measures must be taken to ensure their safe operation. In order to supervise a system that is already in operation, monitoring frameworks are frequently employed. These run continuously online in the background, supervising the system status and recording anomalies. This work proposes an online monitoring framework to detect anomalies in object state representations. Thereby, a key challenge is creating a framework for anomaly detection without anomaly labels, which are usually unavailable for unknown anomalies. To address this issue, this work applies a self-supervised embedding method to translate object data into a latent representation space. For this, a JEPA-based self-supervised prediction task is constructed, allowing training without anomaly labels and the creation of rich object embeddings. The resulting expressive JEPA embeddings serve as input for established anomaly detection methods, in order to identify anomalies within object state representations. This framework is particularly useful for applications in real-world environments, where new or unknown anomalies may occur during operation for which there are no labels available. Experiments performed on the publicly available, real-world nuScenes dataset illustrate the framework's capabilities.
【5】A Controlled Study of Double DQN and Dueling DQN Under Cross-Environment Transfer
标题:跨环境转移下双DQN和决斗DQN的对照研究
链接:https://arxiv.org/abs/2602.09810
作者:Azka Nasir,Fatima Dossa,Muhammad Ahmed Atif,Mohammad Ahmed Atif
摘要:Transfer learning in deep reinforcement learning is often motivated by improved stability and reduced training cost, but it can also fail under substantial domain shift. This paper presents a controlled empirical study examining how architectural differences between Double Deep Q-Networks (DDQN) and Dueling DQN influence transfer behavior across environments. Using CartPole as a source task and LunarLander as a structurally distinct target task, we evaluate a fixed layer-wise representation transfer protocol under identical hyperparameters and training conditions, with baseline agents trained from scratch used to contextualize transfer effects. Empirical results show that DDQN consistently avoids negative transfer under the examined setup and maintains learning dynamics comparable to baseline performance in the target environment. In contrast, Dueling DQN consistently exhibits negative transfer under identical conditions, characterized by degraded rewards and unstable optimization behavior. Statistical analysis across multiple random seeds confirms a significant performance gap under transfer. These findings suggest that architectural inductive bias is strongly associated with robustness to cross-environment transfer in value-based deep reinforcement learning under the examined transfer protocol.
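For reference, the architectural mechanism that distinguishes Double DQN — decoupling action selection (online net) from evaluation (target net) — reduces to a few lines. This sketches the standard DDQN bootstrap target, not the paper's layer-wise transfer protocol:

```python
import numpy as np

rng = np.random.default_rng(0)

def ddqn_targets(q_online, q_target, rewards, dones, gamma=0.99):
    """Double DQN target: the online network selects the next action, the
    target network evaluates it, reducing overestimation bias."""
    next_actions = q_online.argmax(axis=1)                           # selection
    next_values = q_target[np.arange(len(q_target)), next_actions]   # evaluation
    return rewards + gamma * (1.0 - dones) * next_values

# Toy batch of 4 transitions with 3 actions (Q-values at the next states).
q_online = rng.normal(size=(4, 3))
q_target = rng.normal(size=(4, 3))
rewards = np.array([1.0, 0.0, -1.0, 0.5])
dones = np.array([0.0, 0.0, 1.0, 0.0])

y = ddqn_targets(q_online, q_target, rewards, dones)
```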
【6】Why Linear Interpretability Works: Invariant Subspaces as a Result of Architectural Constraints
标题:线性可解释性为何有效:架构约束导致的不变子空间
链接:https://arxiv.org/abs/2602.09783
作者:Andres Saurez,Yousung Lee,Dongsoo Har
备注:Submitted to ICML 2026. 19 pages, 13 figures
摘要:Linear probes and sparse autoencoders consistently recover meaningful structure from transformer representations -- yet why should such simple methods succeed in deep, nonlinear systems? We show this is not merely an empirical regularity but a consequence of architectural necessity: transformers communicate information through linear interfaces (attention OV circuits, unembedding matrices), and any semantic feature decoded through such an interface must occupy a context-invariant linear subspace. We formalize this as the \emph{Invariant Subspace Necessity} theorem and derive the \emph{Self-Reference Property}: tokens directly provide the geometric direction for their associated features, enabling zero-shot identification of semantic structure without labeled data or learned probes. Empirical validation in eight classification tasks and four model families confirms the alignment between class tokens and semantically related instances. Our framework provides \textbf{a principled architectural explanation} for why linear interpretability methods work, unifying linear probes and sparse autoencoders.
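The Self-Reference Property can be illustrated with a toy representation space in which each class token supplies a direction and instances cluster around it. Zero-shot identification then needs no labels and no learned probe — just projection onto the token directions. The geometry here is synthetic, standing in for a transformer's residual stream:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32

# Toy representation space: each class token provides a unit direction, and
# instances of that class lie near a context-invariant subspace around it.
class_tokens = rng.normal(size=(3, d))
class_tokens /= np.linalg.norm(class_tokens, axis=1, keepdims=True)

n = 600
labels = rng.integers(0, 3, size=n)
reps = 3.0 * class_tokens[labels] + rng.normal(size=(n, d))  # "contextual" noise

# Zero-shot identification: project each instance onto the class tokens' own
# directions (the Self-Reference Property) and take the argmax.
scores = reps @ class_tokens.T
pred = scores.argmax(axis=1)
accuracy = float((pred == labels).mean())
```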
【7】Flexible Entropy Control in RLVR with Gradient-Preserving Perspective
标题:梯度保持视角下RLVR中的灵活熵控制
链接:https://arxiv.org/abs/2602.09782
作者:Kun Chen,Peng Shi,Fanfan Liu,Haibo Qiu,Zhixiong Zeng,Siqi Yang,Wenji Mao
备注:https://github.com/Kwen-Chen/Flexible-Entropy-Control
摘要:Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a critical method for enhancing the reasoning capabilities of Large Language Models (LLMs). However, continuous training often leads to policy entropy collapse, characterized by a rapid decay in entropy that results in premature overconfidence, reduced output diversity, and vanishing gradient norms that inhibit learning. Gradient-Preserving Clipping is a primary factor influencing these dynamics, but existing mitigation strategies are largely static and lack a framework connecting clipping mechanisms to precise entropy control. This paper proposes reshaping entropy control in RL from the perspective of Gradient-Preserving Clipping. We first theoretically and empirically verify the contributions of specific importance sampling ratio regions to entropy growth and reduction. Leveraging these findings, we introduce a novel regulation mechanism using dynamic clipping threshold to precisely manage entropy. Furthermore, we design and evaluate dynamic entropy control strategies, including increase-then-decrease, decrease-increase-decrease, and oscillatory decay. Experimental results demonstrate that these strategies effectively mitigate entropy collapse, and achieve superior performance across multiple benchmarks.
【8】Sample-Efficient Real-World Dexterous Policy Fine-Tuning via Action-Chunked Critics and Normalizing Flows
标题:通过动作分块评论家与归一化流实现样本高效的真实世界灵巧策略微调
链接:https://arxiv.org/abs/2602.09580
作者:Chenyu Yang,Denis Tarasov,Davide Liconti,Hehui Zheng,Robert K. Katzschmann
摘要:Real-world fine-tuning of dexterous manipulation policies remains challenging due to limited real-world interaction budgets and highly multimodal action distributions. Diffusion-based policies, while expressive, do not permit conservative likelihood-based updates during fine-tuning because action probabilities are intractable. In contrast, conventional Gaussian policies collapse under multimodality, particularly when actions are executed in chunks, and standard per-step critics fail to align with chunked execution, leading to poor credit assignment. We present SOFT-FLOW, a sample-efficient off-policy fine-tuning framework with normalizing flow (NF) to address these challenges. The normalizing flow policy yields exact likelihoods for multimodal action chunks, allowing conservative, stable policy updates through likelihood regularization and thereby improving sample efficiency. An action-chunked critic evaluates entire action sequences, aligning value estimation with the policy's temporal structure and improving long-horizon credit assignment. To our knowledge, this is the first demonstration of a likelihood-based, multimodal generative policy combined with chunk-level value learning on real robotic hardware. We evaluate SOFT-FLOW on two challenging dexterous manipulation tasks in the real world: cutting tape with scissors retrieved from a case, and in-hand cube rotation with a palm-down grasp -- both of which require precise, dexterous control over long horizons. On these tasks, SOFT-FLOW achieves stable, sample-efficient adaptation where standard methods struggle.
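The key property exploited above — normalizing flows give exact likelihoods for multimodal action chunks — comes from invertible maps with tractable Jacobians. A minimal affine-coupling sketch (one layer with linear "s" and "t" maps; a real policy would stack observation-conditioned layers):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4  # dimension of an action chunk (toy)

# "Networks" s(.) and t(.) of one affine coupling layer; linear maps here.
Ws = 0.1 * rng.normal(size=(D // 2, D // 2))
Wt = 0.1 * rng.normal(size=(D // 2, D // 2))

def forward(x):
    """x -> z: map actions to the base space; returns exact log|det J|."""
    x1, x2 = x[:, : D // 2], x[:, D // 2 :]
    s, t = x1 @ Ws, x1 @ Wt
    z2 = (x2 - t) * np.exp(-s)
    return np.concatenate([x1, z2], axis=1), -s.sum(axis=1)

def inverse(z):
    """z -> x: exact inverse, used for sampling action chunks."""
    z1, z2 = z[:, : D // 2], z[:, D // 2 :]
    s, t = z1 @ Ws, z1 @ Wt
    return np.concatenate([z1, z2 * np.exp(s) + t], axis=1)

def log_prob(x):
    """Exact log-likelihood under a standard-normal base distribution,
    the quantity enabling conservative likelihood-based regularization."""
    z, logdet = forward(x)
    return -0.5 * (z ** 2).sum(axis=1) - 0.5 * D * np.log(2 * np.pi) + logdet

actions = inverse(rng.normal(size=(1000, D)))  # sample 1000 action chunks
ll = log_prob(actions)
```

Diffusion policies cannot evaluate `log_prob` in closed form, which is exactly the gap the flow policy fills.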
【9】Improved Approximate Regret for Decentralized Online Continuous Submodular Maximization via Reductions
标题:通过约简改进分散在线连续子模最大化的近似遗憾
链接:https://arxiv.org/abs/2602.09502
作者:Yuanyu Wan,Yu Shen,Dingzhi Yu,Bo Xue,Mingli Song
摘要:To expand the applicability of decentralized online learning, previous studies have proposed several algorithms for decentralized online continuous submodular maximization (D-OCSM) -- a non-convex/non-concave setting with continuous DR-submodular reward functions. However, there exist large gaps between their approximate regret bounds and the regret bounds achieved in the convex setting. Moreover, if focusing on projection-free algorithms, which can efficiently handle complex decision sets, they cannot even recover the approximate regret bounds achieved in the centralized setting. In this paper, we first demonstrate that for D-OCSM over general convex decision sets, these two issues can be addressed simultaneously. Furthermore, for D-OCSM over downward-closed decision sets, we show that the second issue can be addressed while significantly alleviating the first issue. Our key techniques are two reductions from D-OCSM to decentralized online convex optimization (D-OCO), which can exploit D-OCO algorithms to improve the approximate regret of D-OCSM in these two cases, respectively.
【10】Beware of the Batch Size: Hyperparameter Bias in Evaluating LoRA
标题:小心批量大小:评估LoRA时的超参数偏差
链接:https://arxiv.org/abs/2602.09492
作者:Sangyoon Lee,Jaeho Lee
摘要:Low-rank adaptation (LoRA) is a standard approach for fine-tuning large language models, yet its many variants report conflicting empirical gains, often on the same benchmarks. We show that these contradictions arise from a single overlooked factor: the batch size. When properly tuned, vanilla LoRA often matches the performance of more complex variants. We further propose a proxy-based, cost-efficient strategy for batch size tuning, revealing the impact of rank, dataset size, and model capacity on the optimal batch size. Our findings elevate batch size from a minor implementation detail to a first-order design parameter, reconciling prior inconsistencies and enabling more reliable evaluations of LoRA variants.
【11】Taming the Monster Every Context: Complexity Measure and Unified Framework for Offline-Oracle Efficient Contextual Bandits
标题:在每个上下文中驯服怪物:离线Oracle高效上下文老虎机的复杂度度量与统一框架
链接:https://arxiv.org/abs/2602.09456
作者:Hao Qin,Chicheng Zhang
备注:40 pages (13 pages main body, 24 pages supplementary materials)
摘要:We propose an algorithmic framework, Offline Estimation to Decisions (OE2D), that reduces contextual bandit learning with general reward function approximation to offline regression. The framework allows near-optimal regret for contextual bandits with large action spaces with $O(\log(T))$ calls to an offline regression oracle over $T$ rounds, and makes $O(\log\log(T))$ calls when $T$ is known. The design of OE2D algorithm generalizes Falcon~\citep{simchi2022bypassing} and its linear reward version~\citep[][Section 4]{xu2020upper} in that it chooses an action distribution that we term ``exploitative F-design'' that simultaneously guarantees low regret and good coverage that trades off exploration and exploitation. Central to our regret analysis is a new complexity measure, the Decision-Offline Estimation Coefficient (DOEC), which we show is bounded in bounded Eluder dimension per-context and smoothed regret settings. We also establish a relationship between DOEC and Decision Estimation Coefficient (DEC)~\citep{foster2021statistical}, bridging the design principles of offline- and online-oracle efficient contextual bandit algorithms for the first time.
【12】Enhancing Affine Maximizer Auctions with Correlation-Aware Payment
标题:通过相关性感知支付增强仿射最大化拍卖
链接:https://arxiv.org/abs/2602.09455
作者:Haoran Sun,Xuanzhi Xia,Xu Chu,Xiaotie Deng
备注:22 pages. Work in progress
摘要:Affine Maximizer Auctions (AMAs), a generalized mechanism family from VCG, are widely used in automated mechanism design due to their inherent dominant-strategy incentive compatibility (DSIC) and individual rationality (IR). However, as the payment form is fixed, AMA's expressiveness is restricted, especially in distributions where bidders' valuations are correlated. In this paper, we propose Correlation-Aware AMA (CA-AMA), a novel framework that augments AMA with a new correlation-aware payment. We show that any CA-AMA preserves the DSIC property and formalize finding optimal CA-AMA as a constraint optimization problem subject to the IR constraint. Then, we theoretically characterize scenarios where classic AMAs can perform arbitrarily poorly compared to the optimal revenue, while the CA-AMA can reach the optimal revenue. For optimizing CA-AMA, we design a practical two-stage training algorithm. We derive that the target function's continuity and the generalization bound on the degree of deviation from strict IR. Finally, extensive experiments showcase that our algorithm can find an approximate optimal CA-AMA in various distributions with improved revenue and a low degree of violation of IR.
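For context, a classic (non-correlation-aware) AMA in the single-item case picks the outcome maximizing the affine welfare Σ_i w_i v_i + boost and charges a rescaled VCG-style payment; CA-AMA augments exactly this payment term. A minimal sketch, simplified by us to one item:

```python
import numpy as np

def ama_single_item(bids, w, boost):
    """Single-item AMA: allocate to argmax_i of w_i * v_i + boost_i; the winner
    pays the affine externality it imposes, rescaled by 1 / w_winner, which
    yields DSIC and IR."""
    bids, w, boost = (np.asarray(a, dtype=float) for a in (bids, w, boost))
    scores = w * bids + boost
    winner = int(np.argmax(scores))
    # Best affine welfare attainable if the winner's value were zero.
    scores_wo = scores.copy()
    scores_wo[winner] = boost[winner]
    payment = (scores_wo.max() - boost[winner]) / w[winner]
    return winner, float(payment)

# With unit weights and zero boosts this reduces to a second-price auction;
# a boost shifts both the allocation and the payment.
winner, payment = ama_single_item(bids=[5.0, 3.0], w=[1.0, 1.0], boost=[0.0, 4.0])
```

In this example the boosted bidder wins despite the lower bid and pays 1; with `w=[1,1]` and `boost=[0,0]` the highest bidder would win and pay the second-highest bid.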
【13】The Wisdom of Many Queries: Complexity-Diversity Principle for Dense Retriever Training
标题:众多查询的智慧:密集检索器训练的复杂性-多样性原则
链接:https://arxiv.org/abs/2602.09448
作者:Xincan Feng,Noriki Nishida,Yusuke Sakai,Yuji Matsumoto
备注:Under review
摘要:Prior work reports conflicting results on query diversity in synthetic data generation for dense retrieval. We identify this conflict and design Q-D metrics to quantify diversity's impact, making the problem measurable. Through experiments on 4 benchmark types (31 datasets), we find query diversity especially benefits multi-hop retrieval. Deep analysis on multi-hop data reveals that diversity benefit correlates strongly with query complexity ($r \geq 0.95$), yielding a simple complexity-weight rule for when to use diversity (CW $>$ 10: use diversity).
【14】Reward-Guided Discrete Diffusion via Clean-Sample Markov Chain for Molecule and Biological Sequence Design
标题:用于分子和生物序列设计的基于清洁样本Markov链的奖励引导离散扩散
链接:https://arxiv.org/abs/2602.09424
作者:Prin Phunyaphibarn,Minhyuk Sung
摘要:Discrete diffusion models have recently emerged as a powerful class of generative models for chemistry and biology data. In these fields, the goal is to generate various samples with high rewards (e.g., drug-likeness in molecules), making reward-based guidance crucial. Most existing methods are based on guiding the diffusion model using intermediate rewards but tend to underperform since intermediate rewards are noisy due to the non-smooth nature of reward functions used in scientific domains. To address this, we propose Clean-Sample Markov Chain (CSMC) Sampler, a method that performs effective test-time reward-guided sampling for discrete diffusion models, enabling local search without relying on intermediate rewards. CSMC constructs a Markov chain of clean samples using the Metropolis-Hastings algorithm such that its stationary distribution is the target distribution. We design a proposal distribution by sequentially applying the forward and backward diffusion processes, making the acceptance probability tractable. Experiments on molecule and biological sequence generation with various reward functions demonstrate that our method consistently outperforms prior approaches that rely on intermediate rewards.
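The Metropolis–Hastings backbone of a clean-sample chain can be sketched on binary sequences with a reward-tilted target ∝ exp(r(x)/τ). For simplicity, a symmetric single-site flip replaces the paper's forward-then-backward diffusion proposal (the construction that makes the acceptance ratio tractable for a learned model); everything here is a toy:

```python
import numpy as np

rng = np.random.default_rng(0)
L = 12
motif = rng.integers(0, 2, size=L)        # toy target pattern

def reward(x):
    """Toy sequence reward: number of positions matching the motif."""
    return int((x == motif).sum())

def mh_chain(steps=3000, tau=0.3):
    """Metropolis-Hastings over *clean* binary sequences whose stationary
    distribution is proportional to exp(reward(x) / tau)."""
    x = rng.integers(0, 2, size=L)
    r = reward(x)
    best = r
    for _ in range(steps):
        x_prop = x.copy()
        x_prop[rng.integers(L)] ^= 1      # symmetric single-site flip proposal
        r_prop = reward(x_prop)
        # Symmetric proposal: acceptance depends only on the reward ratio.
        if np.log(rng.random()) < (r_prop - r) / tau:
            x, r = x_prop, r_prop
        best = max(best, r)
    return x, r, best

x_final, r_final, r_best = mh_chain()
```

Because acceptance is evaluated on clean samples, no noisy intermediate reward is ever needed — the property the paper exploits.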
【15】Sparse Layer Sharpness-Aware Minimization for Efficient Fine-Tuning
标题:用于高效微调的稀疏层锐度感知最小化
链接:https://arxiv.org/abs/2602.09395
作者:Yifei Cheng,Xianglin Yang,Guoxia Wang,Chao Huang,Fei Ma,Dianhai Yu,Xiaochun Cao,Li Shen
摘要:Sharpness-aware minimization (SAM) seeks the minima with a flat loss landscape to improve the generalization performance in machine learning tasks, including fine-tuning. However, its extra parameter perturbation step doubles the computation cost, which becomes the bottleneck of SAM in the practical implementation. In this work, we propose an approach SL-SAM to break this bottleneck by introducing the sparse technique to layers. Our key innovation is to frame the dynamic selection of layers for both the gradient ascent (perturbation) and descent (update) steps as a multi-armed bandit problem. At the beginning of each iteration, SL-SAM samples a part of the layers of the model according to the gradient norm to participate in the backpropagation of the following parameter perturbation and update steps, thereby reducing the computation complexity. We then provide the analysis to guarantee the convergence of SL-SAM. In the experiments of fine-tuning models in several tasks, SL-SAM achieves the performances comparable to the state-of-the-art baselines, including a \#1 rank on LLM fine-tuning. Meanwhile, SL-SAM significantly reduces the ratio of active parameters in backpropagation compared to vanilla SAM (SL-SAM activates 47\%, 22\% and 21\% parameters on the vision, moderate and large language model respectively while vanilla SAM always activates 100\%), verifying the efficiency of our proposed algorithm.
【16】Priority-Aware Shapley Value
标题:优先级感知的Shapley值
链接:https://arxiv.org/abs/2602.09326
作者:Kiljae Lee,Ziqi Liu,Weijing Tang,Yuan Zhang
摘要:Shapley values are widely used for model-agnostic data valuation and feature attribution, yet they implicitly assume contributors are interchangeable. This can be problematic when contributors are dependent (e.g., reused/augmented data or causal feature orderings) or when contributions should be adjusted by factors such as trust or risk. We propose Priority-Aware Shapley Value (PASV), which incorporates both hard precedence constraints and soft, contributor-specific priority weights. PASV is applicable to general precedence structures, recovers precedence-only and weight-only Shapley variants as special cases, and is uniquely characterized by natural axioms. We develop an efficient adjacent-swap Metropolis-Hastings sampler for scalable Monte Carlo estimation and analyze limiting regimes induced by extreme priority weights. Experiments on data valuation (MNIST/CIFAR10) and feature attribution (Census Income) demonstrate more structure-faithful allocations and a practical sensitivity analysis via our proposed "priority sweeping".
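The two ingredients — hard precedence constraints and soft priority weights over orderings — can be combined in a plain Monte Carlo Shapley estimator. Here priority-biased permutations are drawn with a Gumbel/Plackett–Luce trick and rejected if they violate precedence; the paper's adjacent-swap Metropolis–Hastings sampler is a more scalable replacement for this rejection loop, and the coalition value v is a toy:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
priority = np.array([2.0, 1.0, 1.0])   # soft weights: contributor 0 tends to go first
precede = [(0, 1)]                     # hard constraint: 0 must appear before 1

def v(S):
    """Toy coalition value with diminishing returns."""
    return np.sqrt(len(S))

def sample_permutation():
    """Priority-weighted permutation (Plackett-Luce via Gumbel keys),
    resampled until the precedence constraint holds."""
    while True:
        keys = np.log(priority) + rng.gumbel(size=n)
        perm = list(np.argsort(-keys))
        if all(perm.index(a) < perm.index(b) for a, b in precede):
            return perm

def pasv_estimate(m=20000):
    phi = np.zeros(n)
    for _ in range(m):
        S = []
        for i in sample_permutation():
            before = v(S)
            S.append(i)
            phi[i] += v(S) - before    # marginal contribution in this ordering
    return phi / m

phi = pasv_estimate()
```

With uniform priorities and no constraints this reduces to ordinary Monte Carlo Shapley; efficiency (shares summing to the grand-coalition value) is preserved by the telescoping marginals.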
【17】Clarifying Shampoo: Adapting Spectral Descent to Stochasticity and the Parameter Trajectory
标题:澄清Shampoo:使谱下降适应随机性与参数轨迹
链接:https://arxiv.org/abs/2602.09314
作者:Runa Eschenhagen,Anna Cai,Tsung-Hsien Lee,Hao-Jun Michael Shi
摘要:Optimizers leveraging the matrix structure in neural networks, such as Shampoo and Muon, are more data-efficient than element-wise algorithms like Adam and Signum. While in specific settings, Shampoo and Muon reduce to spectral descent analogous to how Adam and Signum reduce to sign descent, their general relationship and relative data efficiency under controlled settings remain unclear. Through extensive experiments on language models, we demonstrate that Shampoo achieves higher token efficiency than Muon, mirroring Adam's advantage over Signum. We show that Shampoo's update applied to weight matrices can be decomposed into an adapted Muon update. Consistent with this, Shampoo's benefits can be exclusively attributed to its application to weight matrices, challenging interpretations agnostic to parameter shapes. This admits a new perspective that also avoids shortcomings of related interpretations based on variance adaptation and whitening: rather than enforcing semi-orthogonality as in spectral descent, Shampoo's updates are time-averaged semi-orthogonal in expectation.
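The decomposition discussed above is easiest to see from Shampoo's update itself: accumulate Kronecker-factored preconditioners L and R and precondition the gradient on both sides with inverse fourth roots, which drives the update toward (time-averaged) semi-orthogonality. A minimal single-step sketch, not a full optimizer:

```python
import numpy as np

rng = np.random.default_rng(0)

def matrix_power(M, p, eps=1e-8):
    """Power of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    vals = np.maximum(vals, 0.0) + eps
    return (vecs * vals ** p) @ vecs.T

def shampoo_update(G, L, R, beta=0.9):
    """One Shampoo step for an m x n weight matrix: accumulate the factored
    preconditioners and return L^{-1/4} @ G @ R^{-1/4}. For a single fresh
    gradient this is (up to scale) the semi-orthogonal map U @ V.T from G's SVD."""
    L = beta * L + (1 - beta) * G @ G.T
    R = beta * R + (1 - beta) * G.T @ G
    return matrix_power(L, -0.25) @ G @ matrix_power(R, -0.25), L, R

m, n = 6, 4
G = rng.normal(size=(m, n))
L, R = 1e-4 * np.eye(m), 1e-4 * np.eye(n)   # small initialization
upd, L, R = shampoo_update(G, L, R)
```

The preconditioned update is much better conditioned than the raw gradient, which is the sense in which Shampoo behaves like an adapted Muon (spectral-descent) step.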
【18】Statistical Roughness-Informed Machine Unlearning
标题:基于统计粗糙度的机器遗忘
链接:https://arxiv.org/abs/2602.09304
作者:Mohammad Partohaghighi,Roummel Marcia,Bruce J. West,YangQuan Chen
摘要:Machine unlearning aims to remove the influence of a designated forget set from a trained model while preserving utility on the retained data. In modern deep networks, approximate unlearning frequently fails under large or adversarial deletions due to pronounced layer-wise heterogeneity: some layers exhibit stable, well-regularized representations while others are brittle, undertrained, or overfit, so naive update allocation can trigger catastrophic forgetting or unstable dynamics. We propose Statistical-Roughness Adaptive Gradient Unlearning (SRAGU), a mechanism-first unlearning algorithm that reallocates unlearning updates using layer-wise statistical roughness operationalized via heavy-tailed spectral diagnostics of layer weight matrices. Starting from an Adaptive Gradient Unlearning (AGU) sensitivity signal computed on the forget set, SRAGU estimates a WeightWatcher-style heavy-tailed exponent for each layer, maps it to a bounded spectral stability weight, and uses this stability signal to spectrally reweight the AGU sensitivities before applying the same minibatch update form. This concentrates unlearning motion in spectrally stable layers while damping updates in unstable or overfit layers, improving stability under hard deletions. We evaluate unlearning via behavioral alignment to a gold retrained reference model trained from scratch on the retained data, using empirical prediction-divergence and KL-to-gold proxies on a forget-focused query set; we additionally report membership inference auditing as a complementary leakage signal, treating forget-set points as should-be-forgotten members during evaluation.
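The layer diagnostic can be sketched as a Hill-style fit of the heavy-tailed exponent of a layer's eigenvalue spectrum, mapped to a bounded weight. Both the tail fraction and the mapping to [0, 1] below are illustrative choices of ours, not the paper's calibration:

```python
import numpy as np

rng = np.random.default_rng(0)

def tail_exponent(W, tail_frac=0.5):
    """Hill-style MLE of the power-law exponent of the eigenvalue spectrum
    of W.T @ W (the layer's empirical spectral density)."""
    lam = np.sort(np.linalg.svd(W, compute_uv=False) ** 2)[::-1]
    k = max(2, int(len(lam) * tail_frac))
    tail = lam[:k]
    return 1.0 + k / np.sum(np.log(tail / tail[-1]))

# Illustrative bounded map from exponent to a spectral stability weight.
weight = lambda a: float(np.clip((a - 2.0) / 4.0, 0.0, 1.0))

# A light-tailed (Gaussian) "layer" vs. a heavy-tailed one (t-distributed entries).
W_gauss = rng.normal(size=(200, 100))
W_heavy = rng.standard_t(df=2, size=(200, 100))
a_g, a_h = tail_exponent(W_gauss), tail_exponent(W_heavy)
```

Heavier spectral tails yield a smaller exponent, so the weight can damp unlearning updates in the corresponding layers while concentrating them in spectrally stable ones.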
【19】The effect of whitening on explanation performance
标题:白化对解释性能的影响
链接:https://arxiv.org/abs/2602.09278
作者:Benedict Clark,Stoyan Karastoyanov,Rick Wilming,Stefan Haufe
备注:Presented at the NeurIPS 2024 workshop on Interpretable AI: Past, Present and Future
摘要:Explainable Artificial Intelligence (XAI) aims to provide transparent insights into machine learning models, yet the reliability of many feature attribution methods remains a critical challenge. Prior research (Haufe et al., 2014; Wilming et al., 2022, 2023) has demonstrated that these methods often erroneously assign significant importance to non-informative variables, such as suppressor variables, leading to fundamental misinterpretations. Since statistical suppression is induced by feature dependencies, this study investigates whether data whitening, a common preprocessing technique for decorrelation, can mitigate such errors. Using the established XAI-TRIS benchmark (Clark et al., 2024b), which offers synthetic ground-truth data and quantitative measures of explanation correctness, we empirically evaluate 16 popular feature attribution methods applied in combination with 5 distinct whitening transforms. Additionally, we analyze a minimal linear two-dimensional classification problem (Wilming et al., 2023) to theoretically assess whether whitening can remove the impact of suppressor features from Bayes-optimal models. Our results indicate that, while specific whitening techniques can improve explanation performance, the degree of improvement varies substantially across XAI methods and model architectures. These findings highlight the complex relationship between data non-linearities, preprocessing quality, and attribution fidelity, underscoring the vital role of pre-processing techniques in enhancing model interpretability.
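As a reference point for the preprocessing step, ZCA whitening decorrelates features (identity covariance) while rotating the data as little as possible — one of several transform families such a study compares. A minimal sketch on correlated toy data:

```python
import numpy as np

rng = np.random.default_rng(0)

def zca_whiten(X, eps=1e-5):
    """ZCA whitening: identity feature covariance while staying as close as
    possible to the original coordinate axes."""
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / (len(X) - 1)
    vals, vecs = np.linalg.eigh(C)
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return Xc @ W

# Two strongly correlated features, as induced by a shared (suppressor-like) source.
n = 5000
z = rng.normal(size=n)
X = np.stack([z + 0.1 * rng.normal(size=n), z + 0.1 * rng.normal(size=n)], axis=1)
Xw = zca_whiten(X)
cov = np.cov(Xw.T)   # ~ identity after whitening
```

Removing the correlation is exactly what is hypothesized to suppress spurious attributions to suppressor variables.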
【20】Generalizing GNNs with Tokenized Mixture of Experts
标题:用令牌化专家混合泛化GNN
链接:https://arxiv.org/abs/2602.09258
作者:Xiaoguang Guo,Zehong Wang,Jiazheng Li,Shawn Spitzel,Qi Yang,Kaize Ding,Jundong Li,Chuxu Zhang
备注:Graph Neural Networks, Generalization, Mixture of Experts
摘要:Deployed graph neural networks (GNNs) are frozen at deployment yet must fit clean data, generalize under distribution shifts, and remain stable to perturbations. We show that static inference induces a fundamental tradeoff: improving stability requires reducing reliance on shift-sensitive features, leaving an irreducible worst-case generalization floor. Instance-conditional routing can break this ceiling, but is fragile because shifts can mislead routing and perturbations can make routing fluctuate. We capture these effects via two decompositions separating coverage vs selection, and base sensitivity vs fluctuation amplification. Based on these insights, we propose STEM-GNN, a pretrain-then-finetune framework with a mixture-of-experts encoder for diverse computation paths, a vector-quantized token interface to stabilize encoder-to-head signals, and a Lipschitz-regularized head to bound output amplification. Across nine node, link, and graph benchmarks, STEM-GNN achieves a stronger three-way balance, improving robustness to degree/homophily shifts and to feature/edge corruptions while remaining competitive on clean graphs.
【21】Fair Feature Importance Scores via Feature Occlusion and Permutation
标题:通过特征遮挡和排列获得公平的特征重要性分数
链接:https://arxiv.org/abs/2602.09196
作者:Camille Little,Madeline Navarro,Santiago Segarra,Genevera Allen
摘要:As machine learning models increasingly impact society, their opaque nature poses challenges to trust and accountability, particularly in fairness contexts. Understanding how individual features influence model outcomes is crucial for building interpretable and equitable models. While feature importance metrics for accuracy are well-established, methods for assessing feature contributions to fairness remain underexplored. We propose two model-agnostic approaches to measure fair feature importance. First, we propose to compare model fairness before and after permuting feature values. This simple intervention-based approach decouples a feature and model predictions to measure its contribution to training. Second, we evaluate the fairness of models trained with and without a given feature. This occlusion-based score enjoys dramatic computational simplification via minipatch learning. Our empirical results reflect the simplicity and effectiveness of our proposed metrics for multiple predictive tasks. Both methods offer simple, scalable, and interpretable solutions to quantify the influence of features on fairness, providing new tools for responsible machine learning development.
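The permutation-based score can be sketched directly: measure a fairness metric (here the demographic-parity gap), permute one feature's column to break its link with the predictions, and report the change. The toy model and proxy feature below are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_gap(pred, group):
    """Demographic-parity gap: |P(pred=1 | g=0) - P(pred=1 | g=1)|."""
    return abs(pred[group == 0].mean() - pred[group == 1].mean())

def fair_permutation_importance(model, X, group, j, n_rep=20):
    """Fairness importance of feature j: drop in the DP gap after permuting
    column j, which decouples the feature from the model's predictions."""
    base = dp_gap(model(X), group)
    gaps = []
    for _ in range(n_rep):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        gaps.append(dp_gap(model(Xp), group))
    return base - float(np.mean(gaps))   # > 0: feature j contributes to unfairness

# Toy model: a thresholded linear score where feature 0 is a proxy for the group.
n = 4000
group = rng.integers(0, 2, size=n)
X = np.stack([group + 0.3 * rng.normal(size=n), rng.normal(size=n)], axis=1)
model = lambda Z: (Z @ np.array([1.0, 1.0]) > 0.5).astype(float)

imp_proxy = fair_permutation_importance(model, X, group, j=0)
imp_neutral = fair_permutation_importance(model, X, group, j=1)
```

The group-proxy feature receives a large fairness importance while the group-independent feature scores near zero, mirroring the intended interpretation of the metric.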
【22】Gradient Residual Connections
Link: https://arxiv.org/abs/2602.09190
Authors: Yangchen Pan, Qizhen Ying, Philip Torr, Bo Liu
Note: Preprint
Abstract: Existing work has linked properties of a function's gradient to the difficulty of function approximation. Motivated by these insights, we study how gradient information can be leveraged to improve neural networks' ability to approximate high-frequency functions, and we propose a gradient-based residual connection as a complement to the standard identity skip connection used in residual networks. We provide simple theoretical intuition for why gradient information can help distinguish inputs and improve the approximation of functions with rapidly varying behaviour. On a synthetic regression task with a high-frequency sinusoidal ground truth, we show that conventional residual connections struggle to capture high-frequency patterns. In contrast, our gradient residual substantially improves approximation quality. We then introduce a convex combination of the standard and gradient residuals, allowing the network to flexibly control how strongly it relies on gradient information. After validating the design choices of our proposed method through an ablation study, we further validate our approach's utility on the single-image super-resolution task, where the underlying function may be high-frequency. Finally, on standard tasks such as image classification and segmentation, our method achieves performance comparable to standard residual networks, suggesting its broad utility.
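A minimal sketch of the convex combination of the standard identity skip and a gradient residual, assuming access to a gradient signal g(x) (here a finite-difference estimate of the target's derivative); the block shape, the mixing weight alpha, and the stand-in transformation f are our illustrative choices, not the paper's architecture.

```python
import numpy as np

def grad_residual_block(x, f, g, alpha=0.5):
    """y = f(x) + alpha * x + (1 - alpha) * g(x).

    alpha = 1 recovers the standard identity skip y = f(x) + x;
    alpha = 0 relies entirely on the gradient residual.
    """
    return f(x) + alpha * x + (1.0 - alpha) * g(x)

# Toy high-frequency target sin(20x); g estimates its derivative.
target = lambda x: np.sin(20.0 * x)
g = lambda x, h=1e-4: (target(x + h) - target(x - h)) / (2 * h)
f = lambda x: 0.1 * x  # stand-in for a learned transformation
x = np.linspace(0.0, 1.0, 5)
y_identity = grad_residual_block(x, f, g, alpha=1.0)  # plain residual block
y_gradient = grad_residual_block(x, f, g, alpha=0.0)  # pure gradient residual
```

The gradient residual injects a signal that oscillates as fast as the target, which is the intuition the abstract gives for why it helps with high-frequency functions.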
【23】Faster Rates For Federated Variational Inequalities
Link: https://arxiv.org/abs/2602.09164
Authors: Guanghui Wang, Satyen Kale
Abstract: In this paper, we study federated optimization for solving stochastic variational inequalities (VIs), a problem that has attracted growing attention in recent years. Despite substantial progress, a significant gap remains between existing convergence rates and the state-of-the-art bounds known for federated convex optimization. In this work, we address this limitation by establishing a series of improved convergence rates. First, we show that, for general smooth and monotone variational inequalities, the classical Local Extra SGD algorithm admits tighter guarantees under a refined analysis. Next, we identify an inherent limitation of Local Extra SGD, which can lead to excessive client drift. Motivated by this observation, we propose a new algorithm, the Local Inexact Proximal Point Algorithm with Extra Step (LIPPAX), and show that it mitigates client drift and achieves improved guarantees in several regimes, including bounded Hessian, bounded operator, and low-variance settings. Finally, we extend our results to federated composite variational inequalities and establish improved convergence guarantees.
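For context, the extra-step idea underlying Local Extra SGD reduces, in the deterministic single-client case, to the classical extragradient method. A minimal sketch on a bilinear saddle point, where plain simultaneous gradient descent-ascent cycles or diverges; the toy problem and step size are our choices.

```python
import numpy as np

def extragradient(F, z0, eta=0.1, steps=1000):
    """Extragradient method for a monotone operator F:
    z_half = z - eta * F(z); then z <- z - eta * F(z_half).

    The extrapolation step is what Local Extra SGD runs locally
    on each client between communication rounds.
    """
    z = np.asarray(z0, dtype=float)
    for _ in range(steps):
        z_half = z - eta * F(z)
        z = z - eta * F(z_half)
    return z

# Bilinear saddle point min_x max_y x*y; its VI operator is F(x, y) = (y, -x).
F = lambda z: np.array([z[1], -z[0]])
z_star = extragradient(F, z0=[1.0, 1.0])  # converges to the solution (0, 0)
```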
【24】Counterfactual Maps: What They Are and How to Find Them
Link: https://arxiv.org/abs/2602.09128
Authors: Awa Khouna, Julien Ferry, Thibaut Vidal
Abstract: Counterfactual explanations are a central tool in interpretable machine learning, yet computing them exactly for complex models remains challenging. For tree ensembles, predictions are piecewise constant over a large collection of axis-aligned hyperrectangles, implying that an optimal counterfactual for a point corresponds to its projection onto the nearest rectangle with an alternative label under a chosen metric. Existing methods largely overlook this geometric structure, relying either on heuristics with no optimality guarantees or on mixed-integer programming formulations that do not scale to interactive use. In this work, we revisit counterfactual generation through the lens of nearest-region search and introduce counterfactual maps, a global representation of recourse for tree ensembles. Leveraging the fact that any tree ensemble can be compressed into an equivalent partition of labeled hyperrectangles, we cast counterfactual search as the problem of identifying the generalized Voronoi cell associated with the nearest rectangle of an alternative label. This leads to an exact, amortized algorithm based on volumetric k-dimensional (KD) trees, which performs branch-and-bound nearest-region queries with explicit optimality certificates and sublinear average query time after a one-time preprocessing phase. Our experimental analyses on several real datasets drawn from high-stakes application domains show that this approach delivers globally optimal counterfactual explanations with millisecond-level latency, achieving query times that are orders of magnitude faster than existing exact, cold-start optimization methods.
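The geometric picture, that an optimal counterfactual is the projection of the query onto the nearest hyperrectangle with a different label, can be sketched with a brute-force search; the paper's contribution is accelerating exactly this query with volumetric KD-trees. The names and the toy partition below are ours.

```python
import numpy as np

def project_onto_box(x, lo, hi):
    """Closest point to x inside the axis-aligned box [lo, hi]."""
    return np.clip(x, lo, hi)

def nearest_counterfactual(x, boxes, current_label):
    """Exact counterfactual by brute force: project x onto every
    hyperrectangle carrying a different label and keep the closest
    projection under the Euclidean metric."""
    best, best_d = None, np.inf
    for lo, hi, label in boxes:
        if label == current_label:
            continue
        p = project_onto_box(x, np.asarray(lo, float), np.asarray(hi, float))
        d = np.linalg.norm(p - x)
        if d < best_d:
            best, best_d = p, d
    return best, best_d

# Toy 2-D partition: label 0 left of x0 = 1, label 1 to its right.
boxes = [((-5.0, -5.0), (1.0, 5.0), 0),
         ((1.0, -5.0), (5.0, 5.0), 1)]
cf, dist = nearest_counterfactual(np.array([0.0, 0.0]), boxes, current_label=0)
```

The brute-force scan is linear in the number of rectangles; replacing it with a branch-and-bound KD-tree query is what yields the sublinear amortized times reported in the abstract.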
【25】Benchmarking the Energy Savings with Speculative Decoding Strategies
Link: https://arxiv.org/abs/2602.09113
Authors: Rohit Dutta, Paramita Koley, Soham Poddar, Janardan Misra, Sanjay Podder, Naveen Balani, Saptarshi Ghosh, Niloy Ganguly
Note: Accepted at EACL Findings 2026
Abstract: Speculative decoding has emerged as an effective method to reduce latency and inference cost of LLM inferences. However, there has been inadequate attention towards the energy requirements of these models. To address this gap, this paper presents a comprehensive survey of energy requirements of speculative decoding strategies, with detailed analysis on how various factors -- model size and family, speculative decoding strategies, and dataset characteristics -- influence the energy optimizations.
【26】From Adam to Adam-Like Lagrangians: Second-Order Nonlocal Dynamics
Link: https://arxiv.org/abs/2602.09101
Authors: Carlos Heredia
Note: 42 pages, 10 figures
Abstract: In this paper, we derive an accelerated continuous-time formulation of Adam by modeling it as a second-order integro-differential dynamical system. We relate this inertial nonlocal model to an existing first-order nonlocal Adam flow through an $α$-refinement limit, and we provide Lyapunov-based stability and convergence analyses. We also introduce an Adam-inspired nonlocal Lagrangian formulation, offering a variational viewpoint. Numerical simulations on Rosenbrock-type examples show agreement between the proposed dynamics and discrete Adam.
【27】UI-Venus-1.5 Technical Report
Link: https://arxiv.org/abs/2602.09082
Authors: Veuns-Team, Changlong Gao, Zhangxuan Gu, Yulin Liu, Xinyu Qiu, Shuheng Shen, Yue Wen, Tianyu Xia, Zhenyu Xu, Zhengwen Zeng, Beitong Zhou, Xingran Zhou, Weizhi Chen, Sunhao Dai, Jingya Dou, Yichen Gong, Yuan Guo, Zhenlin Guo, Feng Li, Qian Li, Jinzhen Lin, Yuqi Zhou, Linchao Zhu, Liang Chen, Zhenyu Guo, Changhua Meng, Weiqiang Wang
Abstract: GUI agents have emerged as a powerful paradigm for automating interactions in digital environments, yet achieving both broad generality and consistently strong task performance remains challenging. In this report, we present UI-Venus-1.5, a unified, end-to-end GUI Agent designed for robust real-world applications. The proposed model family comprises two dense variants (2B and 8B) and one mixture-of-experts variant (30B-A3B) to meet various downstream application scenarios. Compared to our previous version, UI-Venus-1.5 introduces three key technical advances: (1) a comprehensive Mid-Training stage leveraging 10 billion tokens across 30+ datasets to establish foundational GUI semantics; (2) Online Reinforcement Learning with full-trajectory rollouts, aligning training objectives with long-horizon, dynamic navigation in large-scale environments; and (3) a single unified GUI Agent constructed via Model Merging, which synthesizes domain-specific models (grounding, web, and mobile) into one cohesive checkpoint. Extensive evaluations demonstrate that UI-Venus-1.5 establishes new state-of-the-art performance on benchmarks such as ScreenSpot-Pro (69.6%), VenusBench-GD (75.0%), and AndroidWorld (77.6%), significantly outperforming previous strong baselines. In addition, UI-Venus-1.5 demonstrates robust navigation capabilities across a variety of Chinese mobile apps, effectively executing user instructions in real-world scenarios. Code: https://github.com/inclusionAI/UI-Venus; Model: https://huggingface.co/collections/inclusionAI/ui-venus
【28】SVD-Preconditioned Gradient Descent Method for Solving Nonlinear Least Squares Problems
Link: https://arxiv.org/abs/2602.09057
Authors: Zhipeng Chang, Wenrui Hao, Nian Liu
Abstract: This paper introduces a novel optimization algorithm designed for nonlinear least-squares problems. The method is derived by preconditioning the gradient descent direction using the Singular Value Decomposition (SVD) of the Jacobian. This SVD-based preconditioner is then integrated with the first- and second-moment adaptive learning rate mechanism of the Adam optimizer. We establish the local linear convergence of the proposed method under standard regularity assumptions and prove global convergence for a modified version of the algorithm under suitable conditions. The effectiveness of the approach is demonstrated experimentally across a range of tasks, including function approximation, partial differential equation (PDE) solving, and image classification on the CIFAR-10 dataset. Results show that the proposed method consistently outperforms standard Adam, achieving faster convergence and lower error in both regression and classification settings.
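A minimal sketch of the core update, reconstructed from the abstract: precondition the residual through the Jacobian's SVD (yielding a regularized Gauss-Newton-type direction), omitting the Adam moment coupling the paper adds on top. The regularization constant and learning rate below are our assumptions.

```python
import numpy as np

def svd_preconditioned_step(residual_fn, jac_fn, x, lr=1.0, eps=1e-8):
    """One SVD-preconditioned step for min_x ||r(x)||^2.

    With J = U S V^T, the raw gradient J^T r is replaced by the
    preconditioned direction V S^{-1} U^T r; eps guards against
    tiny singular values.
    """
    r = residual_fn(x)
    U, S, Vt = np.linalg.svd(jac_fn(x), full_matrices=False)
    direction = Vt.T @ ((U.T @ r) / (S + eps))
    return x - lr * direction

# Toy nonlinear least squares: r(x) = [x0^2 - 1, x1 - 2], solution (1, 2).
residual_fn = lambda x: np.array([x[0] ** 2 - 1.0, x[1] - 2.0])
jac_fn = lambda x: np.array([[2.0 * x[0], 0.0], [0.0, 1.0]])
x = np.array([2.0, 0.0])
for _ in range(20):
    x = svd_preconditioned_step(residual_fn, jac_fn, x)
```

On this toy problem the step reduces to a Newton iteration and converges in a handful of iterations; the paper's full method additionally adapts step sizes with Adam-style first and second moments.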
【29】The Catastrophic Failure of The k-Means Algorithm in High Dimensions, and How Hartigan's Algorithm Avoids It
Link: https://arxiv.org/abs/2602.09936
Authors: Roy R. Lederman, David Silva-Sánchez, Ziling Chen, Gilles Mordant, Amnon Balanov, Tamir Bendory
Abstract: Lloyd's k-means algorithm is one of the most widely used clustering methods. We prove that in high-dimensional, high-noise settings, the algorithm exhibits catastrophic failure: with high probability, essentially every partition of the data is a fixed point. Consequently, Lloyd's algorithm simply returns its initial partition - even when the underlying clusters are trivially recoverable by other methods. In contrast, we prove that Hartigan's k-means algorithm does not exhibit this pathology. Our results show the stark difference between these algorithms and offer a theoretical explanation for the empirical difficulties often observed with k-means in high dimensions.
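The difference between the two algorithms shows up in Hartigan's move criterion, which accounts for the centroid shift caused by reassigning a point. In the sketch below (the standard textbook gain formula, not code from the paper), a point exactly on Lloyd's decision boundary still yields a strictly positive Hartigan gain, so Hartigan's algorithm moves it while Lloyd's algorithm treats the partition as a fixed point.

```python
import numpy as np

def hartigan_gain(x, mu_s, n_s, mu_t, n_t):
    """Decrease in total within-cluster SSE from moving point x out of
    cluster s (mean mu_s, size n_s) into cluster t (mean mu_t, size n_t).

    The n/(n-1) and n/(n+1) factors account for the centroid shifts;
    Lloyd's algorithm, which compares raw distances only, ignores them.
    """
    loss_removed = n_s / (n_s - 1) * np.sum((x - mu_s) ** 2)
    loss_added = n_t / (n_t + 1) * np.sum((x - mu_t) ** 2)
    return loss_removed - loss_added

# A point equidistant from both centroids: Lloyd sees a tie and never moves
# it, but Hartigan's gain is strictly positive.
x_boundary = np.array([0.5, 0.0])
gain = hartigan_gain(x_boundary, mu_s=np.array([0.0, 0.0]), n_s=10,
                     mu_t=np.array([1.0, 0.0]), n_t=10)
```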
【30】Toeplitz Based Spectral Methods for Data-driven Dynamical Systems
Link: https://arxiv.org/abs/2602.09791
Authors: Vladimir R. Kostic, Karim Lounici, Massimiliano Pontil
Note: 18 pages, 3 figures
Abstract: We introduce a Toeplitz-based framework for data-driven spectral estimation of linear evolution operators in dynamical systems. Focusing on transfer and Koopman operators from equilibrium trajectories without access to the underlying equations of motion, our method applies Toeplitz filters to the infinitesimal generator to extract eigenvalues, eigenfunctions, and spectral measures. Structural prior knowledge, such as self-adjointness or skew-symmetry, can be incorporated by design. The approach is statistically consistent and computationally efficient, leveraging both primal and dual algorithms commonly used in statistical learning. Numerical experiments on deterministic and chaotic systems demonstrate that the framework can recover spectral properties beyond the reach of standard data-driven methods.
【31】Tracking Finite-Time Lyapunov Exponents to Robustify Neural ODEs
Link: https://arxiv.org/abs/2602.09613
Authors: Tobias Wöhrer, Christian Kuehn
Note: Lyapunov exponents, neural ODEs, deep learning, adversarial robustness, Lagrangian coherent structures
Abstract: We investigate finite-time Lyapunov exponents (FTLEs), a measure for exponential separation of input perturbations, of deep neural networks within the framework of continuous-depth neural ODEs. We demonstrate that FTLEs are powerful organizers for input-output dynamics, allowing for better interpretability and the comparison of distinct model architectures. We establish a direct connection between Lyapunov exponents and adversarial vulnerability, and propose a novel training algorithm that improves robustness by FTLE regularization. The key idea is to suppress exponents far from zero in the early stage of the input dynamics. This approach enhances robustness and reduces computational cost compared to full-interval regularization, as it avoids a full "double" backpropagation.
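FTLEs themselves are straightforward to estimate once the input-output (flow) map is available: take the largest singular value of the flow-map Jacobian and rescale by the time horizon. A minimal sketch on a linear saddle flow, where the FTLE equals the largest growth rate (2.0 here); the finite-difference Jacobian estimator is our choice, not the paper's.

```python
import numpy as np

def ftle(flow_map, x0, T, eps=1e-5):
    """Finite-time Lyapunov exponent at x0 over horizon T:
    (1/T) * log of the largest singular value of the flow-map
    Jacobian, estimated by central finite differences."""
    x0 = np.asarray(x0, dtype=float)
    d = x0.size
    J = np.empty((d, d))
    for j in range(d):
        e = np.zeros(d)
        e[j] = eps
        J[:, j] = (flow_map(x0 + e) - flow_map(x0 - e)) / (2 * eps)
    return np.log(np.linalg.svd(J, compute_uv=False)[0]) / T

# Linear ODE x' = diag(2, -1) x has the explicit flow below; its FTLE is
# the largest growth rate, 2.0, regardless of the base point.
T = 1.0
flow_map = lambda x: x * np.exp(np.array([2.0, -1.0]) * T)
lam = ftle(flow_map, x0=[1.0, 1.0], T=T)
```

For a neural ODE, `flow_map` would be the network's input-to-output map, and the paper regularizes exponents like `lam` toward zero on an early sub-interval of the dynamics.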
【32】Is Memorization Helpful or Harmful? Prior Information Sets the Threshold
Link: https://arxiv.org/abs/2602.09405
Authors: Chen Cheng, Rina Foygel Barber
Note: 33 pages, 3 figures
Abstract: We examine the connection between training error and generalization error for arbitrary estimating procedures, working in an overparameterized linear model under general priors in a Bayesian setup. We find determining factors inherent to the prior distribution $π$, giving explicit conditions under which optimal generalization necessitates that the training error be (i) near interpolating relative to the noise size (i.e., memorization is necessary), or (ii) close to the noise level (i.e., overfitting is harmful). Remarkably, these phenomena occur when the noise reaches thresholds determined by the Fisher information and the variance parameters of the prior $π$.
【33】TVTSyn: Content-Synchronous Time-Varying Timbre for Streaming Voice Conversion and Anonymization
Link: https://arxiv.org/abs/2602.09389
Authors: Waris Quamer, Mu-Ruei Tseng, Ghady Nasrallah, Ricardo Gutierrez-Osuna
Abstract: Real-time voice conversion and speaker anonymization require causal, low-latency synthesis without sacrificing intelligibility or naturalness. Current systems have a core representational mismatch: content is time-varying, while speaker identity is injected as a static global embedding. We introduce a streamable speech synthesizer that aligns the temporal granularity of identity and content via a content-synchronous, time-varying timbre (TVT) representation. A Global Timbre Memory expands a global timbre instance into multiple compact facets; frame-level content attends to this memory, a gate regulates variation, and spherical interpolation preserves identity geometry while enabling smooth local changes. In addition, a factorized vector-quantized bottleneck regularizes content to reduce residual speaker leakage. The resulting system is streamable end-to-end, with <80 ms GPU latency. Experiments show improvements in naturalness, speaker transfer, and anonymization compared to SOTA streaming baselines, establishing TVT as a scalable approach for privacy-preserving and expressive speech synthesis under strict latency budgets.
【34】Mutual Information Collapse Explains Disentanglement Failure in $β$-VAEs
Link: https://arxiv.org/abs/2602.09277
Authors: Minh Vu, Xiaoliang Wan, Shuangqing Wei
Abstract: The $β$-VAE is a foundational framework for unsupervised disentanglement, using $β$ to regulate the trade-off between latent factorization and reconstruction fidelity. Empirically, however, disentanglement performance exhibits a pervasive non-monotonic trend: benchmarks such as MIG and SAP typically peak at intermediate $β$ and collapse as regularization increases. We demonstrate that this collapse is a fundamental information-theoretic failure, where strong Kullback-Leibler pressure promotes marginal independence at the expense of the latent channel's semantic informativeness. By formalizing this mechanism in a linear-Gaussian setting, we prove that for $β> 1$, stationarity-induced dynamics trigger a spectral contraction of the encoder gain, driving latent-factor mutual information to zero. To resolve this, we introduce the $λβ$-VAE, which decouples regularization pressure from informational collapse via an auxiliary $L_2$ reconstruction penalty $λ$. Extensive experiments on dSprites, Shapes3D, and MPI3D-real confirm that $λ> 0$ stabilizes disentanglement and restores latent informativeness over a significantly broader range of $β$, providing a principled theoretical justification for dual-parameter regularization in variational inference backbones.
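Our reading of the proposed objective, as a sketch: the usual β-VAE loss plus an auxiliary L2 reconstruction penalty weighted by λ, so that reconstruction pressure survives even when the KL term dominates. The exact parameterization below (diagonal-Gaussian encoder KL, squared-error reconstruction) is assumed, not taken from the paper.

```python
import numpy as np

def lambda_beta_vae_loss(x, x_hat, mu, logvar, beta, lam):
    """Batch-mean lambda-beta-VAE objective:
    reconstruction + beta * KL + lam * auxiliary L2 reconstruction.

    lam = 0 recovers the plain beta-VAE; lam > 0 keeps the latent code
    informative even under strong KL pressure."""
    recon = np.sum((x - x_hat) ** 2, axis=1)
    kl = 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=1)
    return float(np.mean(recon + beta * kl + lam * recon))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
x_hat = x + 0.1                       # imperfect reconstruction
mu = np.zeros((4, 2))                 # encoder already matches the prior,
logvar = np.zeros((4, 2))             # so the KL term is zero here
base = lambda_beta_vae_loss(x, x_hat, mu, logvar, beta=4.0, lam=0.0)
aux = lambda_beta_vae_loss(x, x_hat, mu, logvar, beta=4.0, lam=1.0)
```

With the KL at zero, the λ term simply doubles the reconstruction pressure, which is the mechanism the paper credits with preventing informational collapse at large β.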
Machine translations provided by Tencent TranSmart; for reference only.