Machine Learning Academic Digest [6.3]

arXiv Daily Academic Digest • 1 week ago • 117 views

cs.LG: 249 papers today


Large-model related (36 papers)

【1】Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories
Link: https://arxiv.org/abs/2606.03979

Authors: Ali Behrouz, Farnoosh Hashemi, Vahab Mirrokni
Comments: A version of this work has been publicly available from September 2025 on OpenReview
Abstract: The past few decades have witnessed significant advances in the design of machine learning algorithms, from early studies on task-specific shallow models to more general deep Large Language Models (LLMs). Despite showing promising results in tasks that require instant prediction or in-context learning, existing models lack the ability to continually learn and effectively transfer their temporal in-context knowledge to their long-term parameters. Inspired by the human learning process, we introduce a "Sleep" paradigm that allows the models to continually learn, distill their short-term fragile memories into stable long-term knowledge with replay, and recursively improve themselves through a "Dreaming" process. In more detail, sleep consists of two stages: (1) Memory Consolidation: an upward distillation process, called Knowledge Seeding, where the memories of a smaller-self are distilled into a larger network to provide more capacity while preserving the knowledge. As a proof of concept, we present a new Generalized Distillation process for Knowledge Seeding (i.e., the combination of on-policy distillation with Reinforcement Learning (RL)-based imitation learning); (2) Dreaming: a self-improvement phase, where the model uses RL to generate a curriculum of synthetic data to rehearse new knowledge and refine existing capabilities without human supervision. Our experiments on long-horizon, continual learning, knowledge incorporation, and few-shot generalization tasks support the importance of the sleep stage.

【2】Reasoning Structure of Large Language Models
Link: https://arxiv.org/abs/2606.03883

Authors: Frédéric Berdoz, Luca A. Lanzendörfer, Fabian Farestam, Roger Wattenhofer
Comments: Accepted at ICML 2026 and presented at the ICLR 2026 workshop on LLM reasoning
Abstract: Large reasoning models (LRMs) are often evaluated using metrics such as final-answer accuracy or token count. However, identical scores on these metrics can hide fundamentally different reasoning structures. To address this limitation, we introduce a scalable LRM benchmark of logic puzzles and a pipeline that converts unstructured traces into verifiable reasoning graphs of claims and dependencies. This turns reasoning into a structured, measurable object whose topology can be quantitatively analyzed. Building on this, we define a reasoning efficiency metric that quantifies how concentrated the model's logical flow is. Our analysis on open-source reasoning models shows that structural measurements separate behaviors that token count and accuracy conflate, providing a practical tool for diagnosing failure modes and comparing how reasoning scales with puzzle difficulty.

【3】Clustered Self-Assessment: A Simple yet Effective Method for Uncertainty Quantification in Large Language Models
Link: https://arxiv.org/abs/2606.03846

Authors: Qi Cao, Takeshi Kojima, Andrew Gambardella, Helinyi Peng, Yutaka Matsuo, Yusuke Iwasawa
Comments: Findings of ACL 2026
Abstract: Large language models (LLMs) demonstrate remarkable performance across diverse tasks, but they often generate responses that appear plausible while being factually incorrect. This problem is compounded by the lack of explicit uncertainty estimates, which makes it difficult for users to judge the reliability of model outputs. Existing uncertainty quantification methods typically rely on indirect signals, such as entropy across sampled generations. These signals can be difficult to interpret and do not fully leverage the model's ability to assess its own uncertainty. We propose a simple yet effective self-assessment method for uncertainty quantification in LLMs. Our approach groups sampled generations into semantically distinct clusters, converts them into answer options in a structured multiple-choice question, and uses the probability assigned by the LLM to each option as a confidence estimate. Experiments across multiple models and datasets show that our method consistently outperforms baseline approaches. Notably, it achieves competitive performance with as few as two additional samples, demonstrating both its effectiveness and efficiency.
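The cluster-then-ask pipeline described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: exact-match grouping on normalized text stands in for semantic clustering, the function names are hypothetical, and the option logits are made-up numbers that a real run would read from the LLM's next-token distribution.

```python
import math
from collections import OrderedDict

def cluster_answers(samples):
    """Group sampled answers. Exact match on normalized text is a
    stand-in (assumption) for the semantic clustering in the abstract."""
    clusters = OrderedDict()
    for s in samples:
        key = " ".join(s.lower().split())
        clusters.setdefault(key, []).append(s)
    return clusters

def build_question(question, clusters):
    """Use one representative per cluster as a lettered answer option.
    Supports at most 26 clusters in this sketch."""
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    options = {letters[i]: members[0]
               for i, members in enumerate(clusters.values())}
    lines = [question] + [f"{k}. {v}" for k, v in options.items()]
    return "\n".join(lines), options

def option_confidences(option_logits):
    """Softmax over the logits assigned to each option letter."""
    m = max(option_logits.values())
    exps = {k: math.exp(v - m) for k, v in option_logits.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

samples = ["Paris", "paris ", "Lyon"]
clusters = cluster_answers(samples)
prompt, options = build_question("What is the capital of France?", clusters)
# Hypothetical logits for the option letters "A" and "B".
conf = option_confidences({"A": 2.0, "B": 0.0})
```

The confidence attached to the model's answer is then the probability of the option whose cluster contains that answer.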

【4】Expert-Aware Causal Tracing of Factual Recall in Sparse MoE Language Models
Link: https://arxiv.org/abs/2606.03780

Authors: Yuetian Lu, Ali Modarressi, Yihong Liu, Hinrich Schütze
Comments: Preprint
Abstract: Causal tracing of factual recall has been studied predominantly in dense transformer language models, where interventions localize information flow to layers or feed-forward modules. Sparse mixture-of-experts (MoE) language models introduce a sharper question: when a factual prediction is mediated by a routed MoE block, which routed expert contributions matter? We formulate expert-aware causal tracing for sparse MoE language models. Using CounterFact facts, we first corrupt the model's factual preference by adding noise to subject-token embeddings, and then test whether clean MoE-block outputs or clean expert-level updates restore the true-vs-foil logit contrast. For Qwen3-30B-A3B-Base, a layer sweep selects and validates layer 44, and expert-level tracing identifies L44E069 as an expert repeatedly selected in the clean run whose held-out patch outperforms other active same-layer expert patches. For Mixtral-8x7B-v0.1, layer-level tracing validates a mid-layer signal, but the signal is not localized to the selected singleton expert; a coalition check instead recovers it with routed multi-expert updates. These results suggest that MoE factual tracing can be made expert-aware, while also showing that expert-level localization is model- and protocol-dependent rather than universal.

【5】When Graph Tokens Sink: A Mechanistic Analysis of Graph Language Models
Link: https://arxiv.org/abs/2606.03712

Authors: Ding Zhang, Runtao Zhou, Wenqing Zheng, Rizal Fathony, Bayan Bruss, Chirag Agarwal

【6】Multi$^2$: Hierarchical Multi-Agent Decision-Making with LLM-Based Agents in Interactive Environments
Link: https://arxiv.org/abs/2606.03698

Authors: Sangeun Park, Minhae Kwon
Comments: Accepted at ICML 2026

【7】A Close Look At World Model Recovery In Supervised Fine-Tuned LLM Planners
Link: https://arxiv.org/abs/2606.03685

Authors: Patrick Emami, Nan Qiang, Peter Graf
Comments: 17 pages. Under review at TMLR

【8】Black-box, Adaptive, Efficient, Transferable, Harmful, Applicable... Attacks Are All You Need to Break LLMs
Link: https://arxiv.org/abs/2606.03647

Authors: Vincent Limbach, Jonas Dornbusch, David Lüdke, Stephan Günnemann, Leo Schwinn

【9】The Shape of Addition: Geometric Structures of Arithmetic in Large Language Models
Link: https://arxiv.org/abs/2606.03645

Authors: Liuyuan Wen, Xun Zhu, Lihao Huang, Wenbin Li, Yang Gao
Comments: Accepted by ICML 2026

【10】CauTion: Knowing When to Trust LLMs for Ensemble Causal Discovery
Link: https://arxiv.org/abs/2606.03602

Authors: Bo Peng, Kaiwen Wu, Sirui Chen, Zhiheng Wang, Yu Qiao, Chaochao Lu

【11】Rethinking the Role of Tensor Decompositions in Post-Training LLM Compression
Link: https://arxiv.org/abs/2606.03465

Authors: Artur Zagitov, Alexander Miasnikov, Maxim Krutikov, Vladimir Aletov, Gleb Molodtsov, Nail Bashirov, Artem Tsedenov, Aleksandr Beznosikov

【12】RogueMerge: Robust and Unified Attacks against LLM Model Merging
Link: https://arxiv.org/abs/2606.03344

Authors: Jinghuai Zhang, Yetian He, Kunlin Cai, Han Zhao, Fnu Suya, Yuan Tian

【13】FLIPS: Instance-Fingerprinting for LLMs via Pseudo-random Sequences
Link: https://arxiv.org/abs/2606.03330

Authors: Gurvan Richardeau, Gohar Dashyan, Erwan Le Merrer, Gilles Tredan
Comments: 20 pages, 20 figures, 3 tables. 43rd International Conference on Machine Learning (ICML 2026)

【14】Calibration Data Trade-offs Across Capability Dimensions: Why Multi-Source Mixing Matters for High-Sparsity LLM Pruning
Link: https://arxiv.org/abs/2606.03328

Authors: Hu Xu, Zhaolong Xing, Congcong Liu, Jiaxing Wang, Zhida Jiang, Junshi Huang, Zhen Chen, Jianfeng Xu

【15】DECA: Decentralizing Block-Wise Adam for Efficient LLM Full-Parameter Fine-Tuning on Non-IID Data
Link: https://arxiv.org/abs/2606.03209

Authors: Yunsheng Yuan, Shaowei Li, Kai Wang, Zhongyuan Sun, Zheng Zhang, Kai Han, Jun Luo, Feng Li

【16】Decoupled Smart Contract Audits: Lightweight LLM Framework via Distillation and Aggregation
Link: https://arxiv.org/abs/2606.03128

Authors: Bagus Rakadyanto Oktavianto Putra, Muhamad Risqi Utama Saputra, Widyawan, Guntur Dharma Putra
Comments: 12 pages, 4 figures, 5 tables. Accepted to IEEE ICWS 2026

【17】Multi-component Causal Tracing in Large Language Models
Link: https://arxiv.org/abs/2606.03085

Authors: Zirui Yan, Dennis Wei, Dmitriy A. Katz, Prasanna Sattigeri, Ali Tajer
Comments: Accepted to ACL 2026 main conference

【18】Efficient Hyperparameter Optimization for LLM Reinforcement Learning
Link: https://arxiv.org/abs/2606.03073

Authors: Minping Chen, Bowen Xiao, Du Liang, Chuxuan Zeng, Zeyi Wen
Comments: 12 pages, 6 figures, accepted at ACL 2026

【19】ASymPO: Asymmetric-Scale Policy Optimization for Asynchronous LLM Post-Training Without Behavior Information
Link: https://arxiv.org/abs/2606.03070

Authors: Zehua Liu, Yuxuan Yao, Xiaojin Fu, Tao Zhong, Mingxuan Yuan

【20】Rethinking Molecular Text Representations for LLMs: An Empirical Study
Link: https://arxiv.org/abs/2606.03057

Authors: Arun Raja, Garrett M. Morris, Kian Ming A. Chai
Comments: 25 pages, 11 figures, 20 tables

【21】Spike-Aware C++ INT8 Inference for Sparse Spiking Language Models on Commodity CPUs
Link: https://arxiv.org/abs/2606.03026

Authors: Ting Liu
Comments: 11 pages, 7 tables
Abstract: Spiking language models expose activation sparsity that dense Transformer runtimes do not directly exploit. This paper studies that property from a systems perspective. Building on the SymbolicLight V1 spike-gated language model family, we implement a C++ CPU inference runtime that treats sparse binary spike states as an execution primitive rather than only applying post-hoc weight compression. The runtime combines a manifest-driven weight loader, mixed row/column memory layout, AVX2/FMA kernels, per-channel symmetric INT8 quantization, and integer-domain accumulation for spike-conditioned sparse paths. On an AMD Ryzen 7 5800X, an early scalar FP32 baseline decodes at 9.5 tokens/s. Mixed-layout AVX2 FP32 raises this to 14.7 tokens/s, and AVX2 INT8 reaches 19.9 tokens/s on the same step-30k export while reducing the weight footprint from 3.49 GB to 1.06 GB. For the available 186k-step 874M-parameter INT8 export, the C++ runtime decodes at 22.63 tokens/s in a single-thread CPU benchmark, compared with 16.31 tokens/s for TinyLlama-1.1B Q8_0, 11.26 tokens/s for Falcon3-1B Q8_0, and 9.70 tokens/s for Qwen2.5-1.5B Q8_0 under llama.cpp. Thread scaling reaches 47.90 tokens/s at four CPU threads, and 512-token prefill improves from 29.86 to 94.68 tokens/s from one to eight threads. The throughput result comes with a quality cost: the SNN reports WikiText-2 perplexity 24.80, worse than the dense baselines in the same benchmark. We frame the result as an inference-systems study for sparse language runtimes, with longer-term motivation in embodied and edge agents that may benefit from local, low-core inference near sensors and actuators. Spike-aware execution can improve CPU throughput and memory behavior for sparse spiking language models, while model quality, controlled dense training baselines, embodied-task evaluation, and measured CPU energy remain open problems.
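To make "per-channel symmetric INT8 quantization" and "integer-domain accumulation for spike-conditioned sparse paths" concrete, here is a small Python sketch. It only illustrates the general idea under simple assumptions (one scale per output row, zero-point 0, a binary spike vector as input); the function names are hypothetical and it says nothing about the paper's C++/AVX2 kernels or memory layout.

```python
def quantize_rows_int8(weights):
    """Per-output-channel symmetric INT8: one scale per row, zero-point 0."""
    q_rows, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales

def spike_matvec(q_rows, scales, spikes):
    """Compute y = W @ s for a binary spike vector s.

    Only columns with an active spike are visited, and the running sum
    stays an integer until the single per-row rescale at the end."""
    active = [j for j, s in enumerate(spikes) if s]
    out = []
    for q_row, scale in zip(q_rows, scales):
        acc = 0  # integer accumulator
        for j in active:
            acc += q_row[j]
        out.append(acc * scale)
    return out

# Toy weights and spikes (made-up values for illustration).
weights = [[0.5, -1.0, 0.25, 0.0], [2.0, 0.1, -0.4, 1.0]]
spikes = [1, 0, 0, 1]
q_rows, scales = quantize_rows_int8(weights)
y = spike_matvec(q_rows, scales, spikes)
```

Because the spikes are binary, the inner loop needs no multiplications: the work is proportional to the number of active spikes rather than to the full input width.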

【22】How Quantization Changes Interpretable Features: A Sparse Autoencoder Analysis of Language Models
Link: https://arxiv.org/abs/2606.03002

Authors: Evan Duan
Abstract: Quantization is a standard path to deploying large language models, and a quantized model is typically judged acceptable when its perplexity or downstream accuracy stays close to the full-precision original. Whether the model still computes in the same way, or whether the interpretable features identified in the full-precision model survive weight rounding, is rarely tested, even as safety audits and steering interventions increasingly rely on those features. We ask whether sparse autoencoder (SAE) features extracted from a dense full-precision model remain faithful once that model is quantized. Using a frozen SAE as a fixed measurement basis, we encode full-precision and round-to-nearest (RTN) quantized activations on identical tokens and quantify per-feature survival by Pearson correlation, sweeping bit-widths from INT8 to INT4 on Pythia-70M and Gemma-2-2B. We find that feature survival is graded: features degrade systematically rather than failing all at once, with 62.4 percent of active features surviving at INT6 on Pythia-70M and 51.3 percent surviving at INT6 on Gemma-2-2B, and with most non-survivors blurred rather than destroyed. Survival is predictable from full-precision statistics alone, with cross-validated AUCs of 0.92 to 0.97 and peak activation as the strongest marginal predictor. Critically, task metrics can miss this damage: on Gemma-2-2B, INT7 improves perplexity while degrading 18.7 percent of features. Finally, quantization and matched-perplexity magnitude pruning damage strongly overlapping feature sets, with Jaccard overlap of 0.79 to 0.86 and damage-score Spearman correlation of 0.98, suggesting a shared mode of compression-induced vulnerability. These results show that behavioral parity is insufficient evidence that interpretability findings transfer to quantized deployments, motivating feature-level audits of compression.
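The measurement described here (a frozen feature read out on full-precision and on quantized activations, compared per feature by Pearson correlation) can be sketched in a few lines of Python. This is a toy under stated assumptions: a single hand-picked ReLU feature replaces a trained SAE, and quantizing the activation vectors directly stands in for running an RTN-quantized model. All names and numbers are illustrative.

```python
import math

def rtn_quantize(values, bits):
    """Symmetric round-to-nearest with an AbsMax scale, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def sae_feature(activation, direction, bias):
    """One frozen SAE-style feature: ReLU of an affine readout."""
    return max(0.0, sum(a * d for a, d in zip(activation, direction)) + bias)

# Toy per-token activations and a fixed feature direction (assumptions).
tokens = [[0.9, -0.2, 0.1], [0.1, 0.8, -0.5], [-0.3, 0.4, 0.7],
          [0.6, 0.6, -0.1], [0.2, -0.7, 0.3]]
direction, bias = [1.0, 0.5, -0.25], 0.0

full = [sae_feature(a, direction, bias) for a in tokens]
survival = {}
for bits in (8, 4):
    quant = [sae_feature(rtn_quantize(a, bits), direction, bias) for a in tokens]
    survival[bits] = pearson(full, quant)
```

A feature would then be counted as surviving at a given bit-width if its correlation stays above some threshold; the threshold used in the paper is not stated in the abstract.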

【23】Patcher: Post-Hoc Patching of Backdoored Large Language Models
Link: https://arxiv.org/abs/2606.02995

Authors: Anjun Gao, Yueyang Quan, Yufei Xia, Zhuqing Liu, Minghong Fang
Comments: To appear in the USENIX Security Symposium, 2026
Abstract: Large language models remain vulnerable to jailbreak backdoor attacks, where adversaries poison safety alignment data to embed hidden triggers that bypass safety mechanisms. Existing defenses often require comprehensive attack information or multiple triggered examples, making them impractical when defenders only observe a single reported failure case without knowing whether it stems from a backdoor attack or a natural alignment bug. This paper presents Patcher, a post-hoc defense framework that repairs backdoored language models using only a single reported failure case and the model parameters. Patcher operates in two stages. First, it localizes backdoor triggers by computing response-conditioned gradient-based saliency scores and applying adaptive clustering to separate triggers from benign context. Second, it patches the model through a constrained fine-tuning objective that breaks the trigger-response association while preserving benign-task utility and robustness to non-triggered jailbreak attacks through KL-divergence constraints. We conduct extensive evaluations across multiple backdoor attack strategies and demonstrate that Patcher successfully localizes triggers and neutralizes backdoors while maintaining model utility. We further show robustness against adaptive attacks designed to evade our defense. This work represents a significant step toward practical defenses against training-time attacks in deployed language models.

【24】Multi-Segment Attention: Enabling Efficient KV-Cache Management for Faster Large Language Model Serving
Link: https://arxiv.org/abs/2606.02964

Authors: Chunan Shi, Yilei Chen, Yilin Chen, Xupeng Miao, Bin Cui

【25】KForge: LLM-Driven Cross-Platform Kernel Generation for AI Accelerators
Link: https://arxiv.org/abs/2606.02963

Authors: Taras Sereda, Burak Bartan, Ankita Nayak, Tom St.John, Natalie Serrino, Zain Asgar
Comments: Accepted at ISCA 2026 Workshop MLArchSys

【26】Gate AI: LLM Security Benchmark Evaluation Methodology and Results
Link: https://arxiv.org/abs/2606.02959

Authors: Ryle Goehausen, Marcus Sousa
Comments: 17 pages, 23 figures, 2 tables. Working preprint; subsequent versions may update benchmark numbers as the framework evolves

【27】Fast-dLLM++: Fréchet Profile Decoding for Faster Diffusion LLM Inference
Link: https://arxiv.org/abs/2606.02955

Authors: Siva Rajesh Kasa, Yasong Dai, Sumit Negi, Hongdong Li
Comments: Initial version accepted at Workshop on Structured Probabilistic Inference & Generative Modeling, ICML 2026

【28】BYORn: Bootstrap Your Own Responses to Defend Large Vision-Language Models Against Backdoor Attacks
Link: https://arxiv.org/abs/2606.02947

Authors: Ivan Sabolić, Marin Oršić, Josip Šarić, Sven Lončarić
Comments: Accepted to ICML 2026

【29】GRZO: Group-Relative Zeroth-Order Optimization for Large Language Model Fine-Tuning
Link: https://arxiv.org/abs/2606.02857

Authors: Liyan Tan, Yequan Zhao, Yifan Yang, Ruijie Zhang, Xinling Yu, Zheng Zhang
Comments: Preprint. Under review

【30】Qift: Shift-Friendly No-Zero W2 Post-Training Quantization for Rotated W2A4/KV4 LLM Inference
Link: https://arxiv.org/abs/2606.02823

Authors: Chi-Wei Huang, Chia-Chi Tsai
Comments: 23 pages, 8 figures

【31】Representational Capacity: Geometric Limits on Feature Representation in Transformer Language Models
Link: https://arxiv.org/abs/2606.02765

Authors: Alexander Guha
Comments: 22 pages, 10 figures. Submitted to NeurIPS 2026. This is a condensed version of thesis: this https URL

【32】Visual Graph Scaffolds for Structural Reasoning in Large Language Models
Link: https://arxiv.org/abs/2606.02673

Authors: Runlin Lei, Xiaokui Xiao, Zhewei Wei
Abstract: Graphs have been used to enhance large language models (LLMs) for structured reasoning, mostly as external knowledge sources provided to models at test time. In this paper, we take a different view: the value of graphs for LLMs lies not only in supplying information, but also in organizing reasoning. Inspired by how humans use graph-structured mind maps to organize branching and converging thoughts, we ask whether graphs can serve as an internal form of reasoning assistance. We study this question on multi-hop question answering tasks, where teacher-provided reasoning traces are rewritten as graph mind maps and used to guide a student model. Our experiments reveal a clear modality gap. When graph structures are flattened into text, their benefits become limited once direct answer hints are removed. Under this abstract guidance setting, both reasoning efficiency and answer quality degrade substantially. In contrast, visual graph guidance remains effective without direct answer clues, and its advantage persists after supervised fine-tuning and KL-based distillation. These findings support the claim that graphs should be studied not only as external knowledge structures for LLMs, but also as visual scaffolds for organizing reasoning.

【33】Hallucination Is Linearly Decodable from Mid-Layer Hidden States in Quantized LLMs
Link: https://arxiv.org/abs/2606.02628

Authors: Aizierjiang Aiersilan
Abstract: We investigate whether open-source LLMs encode a linearly separable truthfulness signal in their hidden states, and at which network depth this signal is strongest. Across three 7B-8B instruction-tuned models (Llama-3.1-8B, Mistral-7B, Qwen2.5-7B) loaded in 4-bit NF4 quantization, we extract per-layer hidden states on four hallucination benchmarks (TruthfulQA, HaluEval-QA, FEVER, and a controlled synthetic set) and compare four detection approaches: linear and MLP probes, INSIDE EigenScore, self-consistency, and attention entropy. A linear probe on a single mid-network layer achieves 0.904-1.000 AUROC on held-out splits, while sampling-based detectors do not exceed 0.541 AUROC under the same protocol. The truthfulness signal is approximately linear: MLP probes rarely surpass linear probes by more than 0.01 AUROC. Peak probing layers fall in a consistent band across model families on natural-language benchmarks: blocks 13-18 of 32 for Llama and Mistral, and blocks 19-25 of 28 for Qwen. First-block attention entropy provides a complementary signal in knowledge-grounded settings (0.866-0.941 AUROC on HaluEval-QA) at no additional inference cost. The low discriminability of sampling methods under this protocol reflects a structural mismatch between paired-label evaluation and the information these methods access, rather than an inherent limitation of those methods. Code and data are released for full reproducibility on a single 8 GB GPU.
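For readers unfamiliar with how a linear probe is scored by AUROC, the following Python sketch shows the two pieces on toy data. The probe weights, the two-dimensional "hidden states", and the labels are all made up for illustration; a real probe would be fit on held-out hidden states extracted from one layer of the model.

```python
def probe_score(hidden, weights, bias):
    """Linear probe: a single affine readout of one hidden state."""
    return sum(h * w for h, w in zip(hidden, weights)) + bias

def auroc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative
    (ties count as half), which equals the area under the ROC curve."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical probe parameters and hidden states (assumptions).
weights, bias = [1.0, -1.0], 0.0
truthful = [[0.9, 0.0], [1.0, 0.2], [0.5, 0.1]]
hallucinated = [[0.6, 0.1], [0.4, 0.1], [0.2, 0.1]]

pos = [probe_score(h, weights, bias) for h in truthful]
neg = [probe_score(h, weights, bias) for h in hallucinated]
score = auroc(pos, neg)
```

In this toy set, eight of the nine truthful/hallucinated pairs are ranked correctly, so the AUROC is 8/9. A value of 0.5 would correspond to a probe no better than chance.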

【34】ReLoRA: Knowledge-Reusing Adaptation for Fast Rollout of Evolving LLM Services
Link: https://arxiv.org/abs/2606.02606

Authors: Yang Xu, Zihuai Xu, Hongli Xu, Yunming Liao, Zhiwei Yao, Xitong Fu
Abstract: Large Language Models (LLMs) are increasingly deployed as continuously evolving services, where frequent base-model updates may invalidate previously deployed task-specific Low-Rank Adaptation (LoRA) adapters. For service providers managing numerous downstream model services, retraining each LoRA adapter from scratch for every updated base model is computationally prohibitive and delays service rollout. Meanwhile, the simpler alternative, i.e., naively applying the original LoRA adapter to the updated base model, often leads to degraded service quality due to adapter-backbone incompatibility. To address this problem, we propose ReLoRA, a knowledge-reusing re-adaptation framework that efficiently restores service-ready LoRA adapters for evolving LLM services while preserving or improving task performance. Specifically, ReLoRA comprises two key optimization steps: 1) Adaptive LoRA initialization leverages Bayesian optimization to construct a compatibility-aware starting point by fusing information from both the previously deployed task adapter and the base model's evolution; 2) Fine-tuning with scheduled regularization first rapidly steers the adapter to a high-quality region via strong regularization, followed by relaxed regularization for task-specific refinement. This design enables rapid service-quality recovery with reduced re-adaptation overhead. Extensive experiments demonstrate that ReLoRA reduces time-to-readiness by up to 8.9× and improves accuracy by up to 4.6% compared to baselines.

【35】WUSH: Near-Optimal Adaptive Transforms for LLM Quantization
Link: https://arxiv.org/abs/2512.00956

Authors: Jiale Chen, Vage Egiazarian, Roberto L. Castro, Torsten Hoefler, Dan Alistarh
Comments: Published as a conference paper at the 43rd International Conference on Machine Learning (ICML 2026): https://openreview.net/forum?id=ZsECxUkbKB
Abstract: Quantizing LLM weights and activations is a standard approach for efficient deployment, but a few extreme outliers can stretch the dynamic range and amplify low-bit quantization errors. Prior transform-based mitigations (e.g., Hadamard rotations) are fixed and data-agnostic, and their optimality for quantization has remained unclear. We derive closed-form optimal linear blockwise transforms for joint weight-activation quantization under standard RTN AbsMax-scaled block quantizers, covering both integer and floating-point formats. The resulting construction, WUSH, combines a Hadamard backbone with a data-dependent second-moment component to form a non-orthogonal transform that is provably near-optimal for FP and INT quantizers under mild assumptions while admitting an efficient fused GPU implementation. Empirically, WUSH improves W4A4 accuracy over the strongest Hadamard-based baselines (e.g., on Llama-3.1-8B-Instruct in MXFP4, it gains +2.8 average points with RTN and +0.7 with GPTQ) while delivering up to 5.8× per-layer throughput over BF16 via FP4 MatMul. Source code is available at https://github.com/IST-DASLab/WUSH.
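The outlier problem and the fixed Hadamard baseline mentioned in this abstract can be seen on a single four-element block. The Python sketch below is not WUSH (which adds a data-dependent, non-orthogonal component); it only shows, on a made-up block with one outlier, how an orthonormal Hadamard rotation spreads the outlier before RTN AbsMax quantization. Function names are hypothetical.

```python
import math

def hadamard(x):
    """Orthonormal fast Walsh-Hadamard transform (self-inverse).
    len(x) must be a power of two."""
    y = list(x)
    h = 1
    while h < len(y):
        for i in range(0, len(y), 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(len(y))
    return [v / norm for v in y]

def rtn_absmax(block, bits):
    """Round-to-nearest with one AbsMax scale per block, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in block)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in block]

def sq_err(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

block = [8.0, 0.1, -0.2, 0.3]  # one outlier dominates the AbsMax scale

# Quantize directly: the small entries all round to zero.
err_plain = sq_err(block, rtn_absmax(block, 4))

# Rotate, quantize, rotate back: the outlier's energy is spread out.
rotated = hadamard(block)
err_rot = sq_err(block, hadamard(rtn_absmax(rotated, 4)))
```

On this block the rotated path has the lower squared error, because after rotation all four entries have similar magnitude and share the INT4 grid more evenly. That outcome is specific to this example; the abstract's point is that such fixed rotations are not generally optimal.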

【36】SVHalluc: Benchmarking Speech-Vision Hallucination in Audio-Visual Large Language Models
Link: https://arxiv.org/abs/2606.02642

Authors: Chenshuang Zhang, Kyeong Seon Kim, Chengxin Liu, Tae-Hyun Oh
Comments: Accepted at CVPR 2026
Abstract: Despite the success of audio-visual large language models (LLMs), they can produce plausible but ungrounded outputs, termed hallucination. Existing benchmarks focus on environmental sounds (e.g., dog barking) to indicate event occurrence. In contrast, human speech carries fundamentally different, rich semantics and temporal structures, yet it remains unexplored whether current models can accurately align speech content with corresponding visual signals. In this work, we show that speech content can induce hallucinations in audio-visual LLMs. To systematically study this, we introduce SVHalluc, the first comprehensive benchmark for evaluating speech-vision hallucination in audio-visual LLMs. Our benchmark diagnoses speech-vision hallucinations from two critical and complementary aspects: semantic and temporal. Experimental results demonstrate that state-of-the-art open-source audio-visual LLMs struggle with aligning speech content with corresponding visual signals, with near-random accuracy on multiple tasks. In contrast, Gemini 2.5 Pro significantly outperforms the open-source models. Our analysis suggests that their failures stem from limited ability in cross-modality understanding, despite strong performance in single-modality perception. Our work uncovers a new and fundamental limitation of current audio-visual LLMs and highlights the need for speech-grounded video comprehension. Project page: https://chenshuang-zhang.github.io/projects/svhalluc/.

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (15 papers)

【1】Contrastive Neural Algorithmic Reasoning for Graph Coloring
Link: https://arxiv.org/abs/2606.03923

Authors: Thien Le, Tianyu Zhao, Melanie Weber
Comments: 52 pages, 5 figures, 45 tables
Abstract: Graph coloring seeks to assign colors to a graph's nodes so that adjacent nodes receive different colors, using as few colors as possible. Here, we study approximate $k$-coloring, where the goal is to use at most $k$ colors while minimizing the number of monochromatic edges. This problem is central to graph theory and has applications in areas such as scheduling and resource allocation. Recent unsupervised GNN approaches optimize each instance directly, precluding generalization across graph sizes and distributions. We instead propose a contrastive learning framework that learns a transferable coloring geometry in which the embeddings of same-color nodes align, while adjacent nodes' representations are pushed toward distinct directions. We analyze the resulting population objective over bounded-size graphs. For unit-norm embeddings, we show that its optima have a line-prototype structure: representations of nodes of the same color collapse to a shared one-dimensional subspace, and edges connect orthogonal subspaces. This geometry yields stationarity conditions in the supervised setting and is preserved by projected subgradient dynamics under a balanced-coloring assumption. In an unnormalized variant, gradient descent has a max-margin bias governed by a quotient-graph hard-margin problem. Experiments on synthetic and real-world graphs show that contrastive GNN encoders generalize effectively and produce low-conflict colorings, matching and sometimes improving on greedy approaches.
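
The approximate $k$-coloring objective in this abstract can be made concrete with a small sketch. The code below is not the paper's contrastive method: it only counts monochromatic edges for a given assignment and runs a simple greedy baseline capped at $k$ colors. The function names, the example graph, and the tie-breaking rule are illustrative assumptions.

```python
from collections import defaultdict


def monochromatic_edges(edges, colors):
    """Count edges whose two endpoints share a color (the quantity to minimize)."""
    return sum(1 for u, v in edges if colors[u] == colors[v])


def greedy_k_coloring(num_nodes, edges, k):
    """Visit nodes in index order and give each the color (out of k)
    that conflicts with the fewest already-colored neighbors."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    colors = {}
    for node in range(num_nodes):
        conflicts = [0] * k
        for neighbor in adjacency[node]:
            if neighbor in colors:
                conflicts[colors[neighbor]] += 1
        # Ties are broken toward the smallest color index (an arbitrary choice).
        colors[node] = min(range(k), key=lambda c: (conflicts[c], c))
    return colors


# A 5-cycle: properly colorable with 3 colors, but not with 2.
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
coloring_3 = greedy_k_coloring(5, cycle_edges, k=3)
coloring_2 = greedy_k_coloring(5, cycle_edges, k=2)
print(monochromatic_edges(cycle_edges, coloring_3))
print(monochromatic_edges(cycle_edges, coloring_2))
```

On an odd cycle, any 2-coloring leaves at least one monochromatic edge, which is why the objective counts conflicts rather than requiring a proper coloring.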

【2】Text-attributed Graph Condensation via Text Selection and Attribute Matching
Link: https://arxiv.org/abs/2606.03839

Authors: Haowei Han, Yuxiang Wang, Guojia Wan, Hao Wang, Shanshan Feng, Hao Huang, Jiawei Jiang, Xiao Yan
Abstract: A Text-Attributed Graph (TAG) is an important type of graph-structured data in which each node has a text description. TAG models usually train a Graph Neural Network (GNN) and a language model jointly, which leads to high space and time consumption, especially on large datasets. To mitigate this, we propose TAGSAM, a condensation method that compresses TAGs while preserving training accuracy. TAGSAM comes with two key designs, subgraph text Selection and Attribute similarity Matching, which compress the text descriptions and the graph topology of a TAG, respectively. For the texts, subgraph text selection selects and merges representative text chunks from multiple related text descriptions by maximizing mutual information. For the graph topology, popular condensation methods based on Matching Training Trajectories (MTT) suffer from high variance, which hinders accuracy. Our attribute similarity matching mitigates this issue by aligning stable similarity matrices. We evaluate TAGSAM against six state-of-the-art baselines, where it shows superior performance. For the same compressed size, TAGSAM improves upon the best-performing baseline by an average of 4.9% in accuracy. Furthermore, it maintains competitive training accuracy even when the TAG is condensed to just 1% of its size. Our code is available at https://github.com/SundayVHan/TAGSAM

【3】Limit Analysis of Graph Neural Networks with Wireless Conflict Graphs
Link: https://arxiv.org/abs/2606.03794

Authors: Romina Garcia Camargo, Zhiyang Wang, Alejandro Ribeiro
Abstract: Graph Neural Networks (GNNs) have emerged as a powerful tool for wireless resource allocation that leverages the underlying graph structure of communication networks. Their transferability property enables models trained on small-scale graphs to generalize to large-scale deployments with little performance deterioration, a desirable property for currently growing networks. Wireless networks operate in sparse regimes, where each node is connected to only a small number of other users. This work establishes theoretical results for the transferability of GNNs over graphs derived from sparse Random Geometric Graphs (RGGs). In particular, we focus on conflict graphs of RGGs, which are used to model interference among links. Our approach considers the closeness between RGGs and Deterministic Grid Graphs (DGGs) to establish bounds on the performance loss when a model is transferred across scales. We validate our theoretical findings through the problem of link scheduling, demonstrating that our learned policies consistently outperform existing benchmarks at scale. Finally, we examine the impact of our theoretical assumptions on empirical performance.

【4】HiSE: A Lightweight Hierarchical Semantic Explainer for Heterogeneous Graph Neural Networks
Link: https://arxiv.org/abs/2606.03495

Authors: Zongrui Li, Yuhang Zhao, Ying Zhao, Yuanzhao Guo, Qiang Huang, Yuan Tian

【5】Topology-Aware Gaussian Graph Repair for Robust Graph Neural Networks
Link: https://arxiv.org/abs/2606.03462

Authors: Anubha Goel, Juho Kanniainen

【6】Link Prediction or Perdition: the Seeds of Instability in Knowledge Graph Embeddings
Link: https://arxiv.org/abs/2606.03365

Authors: Guillaume Méroué, Fabien Gandon, Pierre Monnin
Comments: Paper accepted at ESWC 2026

【7】Multi-Modal Graph Neural Network with Transformer-Guided Adaptive Diffusion for Preclinical Alzheimer Classification
Link: https://arxiv.org/abs/2606.03322

Authors: Jaeyoon Sim, Minjae Lee, Guorong Wu, Won Hwa Kim
Comments: 10 pages, Accepted to MICCAI 2024

【8】A Graph Foundation Model with Spectral Parsing and Prototype-Guided Spatial Propagation
Link: https://arxiv.org/abs/2606.03315

Authors: Ankang Yang, Jitao Zhao, Dongxiao He, Liang Yang, Di Jin, Weixiong Zhang

【9】Message Tuning Outshines Graph Prompt Tuning: A Prismatic Space Perspective
Link: https://arxiv.org/abs/2606.03290

Authors: Yancheng Chen, Dun Ma, Shuai Zhang, Yang Liu, Xixun Lin, Xiangyu Zhao, Wenguo Yang, Wei Chen, Chuan Zhou
Comments: Accepted by ICML 2026

【10】Are Common Substructures Transferable? Riemannian Graph Foundation Model with Neural Vector Bundles
Link: https://arxiv.org/abs/2606.03270

Authors: Li Sun, Zhenhao Huang, Yiding Wang, Qin Chen, Pietro Lio, Philip S. Yu
Comments: Accepted by ICML 2026

【11】GFFMERGE: Efficient Merging of Graph Neural Force Fields and Beyond
Link: https://arxiv.org/abs/2606.03232

Authors: Parth Verma, Parv P. Singh, Vipul Garg, Ishita Thakre, N. M. Anoop Krishnan, Sayan Ranu

【12】Learn When and Where to Connect: Adaptive Virtual Nodes for Dynamic Message Passing on Graphs
Link: https://arxiv.org/abs/2606.03068

Authors: Jaejun Lee, Joyce Jiyoung Whang
Comments: 12 pages, 6 figures, 10 tables, 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026)

【13】RelGT-AC: A Relational Graph Transformer for Autocomplete Tasks in Relational Databases
Link: https://arxiv.org/abs/2606.03040

Authors: Phillip Jiang
Comments: 12 pages, 6 figures. Code and model checkpoints available at https://github.com/jiangdmv/graph-transformer
Abstract: Relational databases underpin modern enterprise, scientific, and healthcare systems, yet predictive machine learning on such data remains challenging due to their multi-table, heterogeneous, and temporal structure. Relational Deep Learning (RDL) addresses this by representing databases as heterogeneous graphs and applying graph neural networks (GNNs) directly. RelBench v2 recently introduced autocomplete tasks, a practically motivated task type in which the goal is to predict an existing column value from relational context, analogous to an intelligent form-filling assistant. We propose RelGT-AC (Relational Graph Transformer for Autocomplete), which extends the RelGT architecture with three targeted contributions: (1) a column masking strategy that prevents trivial solutions by masking the target column during subgraph encoding; (2) a unified task head supporting binary classification, multiclass classification, and regression autocomplete tasks within a single model; and (3) a TF-IDF text encoder that automatically detects and encodes free-text columns, recovering strong lexical signal that categorical encoders discard. Across 7 tasks spanning 3 RelBench v2 datasets (rel-trial, rel-f1, rel-stack), RelGT-AC outperforms the GraphSAGE baseline on all 3 regression autocomplete tasks and achieves up to +10 AUROC points on text-heavy eligibility tasks via the TF-IDF encoder.
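
To illustrate why a TF-IDF encoder can recover lexical signal from free-text columns, here is a minimal sketch of TF-IDF weighting. It is not the RelGT-AC encoder: the tokenizer (lowercased whitespace split), the smoothed IDF formula, and the example eligibility strings are assumptions chosen for brevity.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document.

    Weight = term frequency * smoothed inverse document frequency,
    with idf(t) = ln((1 + N) / (1 + df(t))) + 1 (an assumed variant).
    """
    tokenized = [doc.lower().split() for doc in documents]
    num_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + num_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


# Hypothetical free-text values from an eligibility-style column.
texts = [
    "adults with diabetes",
    "adults with asthma",
    "adults with diabetes and hypertension",
]
vectors = tfidf_vectors(texts)
print(vectors[1])
```

Terms shared by every row ("adults", "with") receive lower weights than row-specific terms ("asthma"), which is the signal a purely categorical encoding of the whole string would discard.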

【14】A Nonmonotone Gradient-Based Algorithm for Symmetric Nonnegative Matrix Factorization and Graph Clustering
Link: https://arxiv.org/abs/2606.02887

Authors: Ryan Swart, Johannes Brust

【15】Graph Mamba Survival Analysis Based on Topology-Aware ordering
Link: https://arxiv.org/abs/2606.02602

Authors: Yuanfang Chen, Peiqiang Yan, Yuntao Shou, Qian Zhao, Xiangyong Cao
Abstract: In computational pathology, survival analysis on Whole Slide Images (WSIs) is crucial for patient prognosis assessment, but it faces multiple technical challenges. Although the Transformer captures long-range dependencies through its self-attention mechanism, its $O(N^2)$ time complexity causes a severe computational bottleneck in large-scale WSI graph structures. The Mamba model removes this bottleneck with linear complexity. However, because Mamba is highly sensitive to the order of its input data, traditional node sorting methods in Graph Mamba, such as those based on node degree or subgraph size, fail to adequately account for the topological connectivity of graph data. This inadequacy restricts the performance of Mamba's sequential modeling. Moreover, its unidirectional architecture cannot leverage the bidirectional spatial structure of images. To address these challenges, this paper proposes a novel Graph Mamba survival analysis framework based on topology-aware ordering (TopoMamSurv) to adapt to the sequential sensitivity of Mamba. Our visualization experiments further confirm that the nodes extracted through the topology-aware ordering (TAO) strategy indeed exhibit higher similarity. Furthermore, we design a bidirectional Mamba module and integrate a Graph Convolutional Network (GCN) to achieve bidirectional spatial context modeling of images, forming a hierarchical feature learning architecture for "local aggregation - global capture." Through its systematic design of TAO, bidirectional semantic modeling, and hierarchical feature fusion, the framework reconciles the tension among long-range dependency modeling, computational efficiency, and spatial structure utilization in WSI analysis. Its overall performance advantage has been validated on five TCGA datasets.

Transformer (4 papers)

【1】Dynamic Short Convolutions Improve Transformers
Link: https://arxiv.org/abs/2606.03825

Authors: Oliver Sieberling, Bharat Runwal, Rameswar Panda, Yoon Kim
Abstract: Transformers have become the dominant architecture for large language models, largely due to the scalability and flexibility of attention, feed-forward layers, residual connections, and normalization. This paper introduces dynamic short convolutions as an additional neural network primitive for improving Transformers. Unlike static short convolutions, dynamic convolutions use input-dependent filters, preserving the locality bias of convolution while increasing expressivity. Motivating experiments show that applying dynamic short convolutions to key, query, and value representations improves performance on challenging associative recall tasks compared with static convolutional variants. Across language-modeling experiments ranging from 150M to 2B parameters, dynamic convolutions consistently outperform standard Transformers and Transformers augmented with static short convolutions. Fitting scaling laws indicates a $1.33\times$ compute advantage over compute-matched Transformers when dynamic convolutions are applied to the key, query, and value vectors, and a $1.60\times$ advantage when dynamic convolutions are added after every linear layer. Dynamic convolutions also offer improvements on linear RNNs (Mamba-2/Gated DeltaNet) and mixture-of-experts architectures. We make these gains practical with custom Triton kernels that enable efficient training with a manageable end-to-end slowdown. These results suggest that dynamic short convolutions are a scalable, hardware-efficient, and expressive primitive for advancing Transformer-based language models.
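
The difference between a static and a dynamic short convolution can be sketched in a few lines. The abstract does not give the paper's parameterization, so the filter generator below (a linear map from the current token to K filter taps, shared across channels) is a hypothetical choice, as are all names; it only illustrates a causal convolution whose taps depend on the input.

```python
def dynamic_short_conv(x, gen_weight, gen_bias):
    """Causal short convolution with input-dependent taps.

    x:          list of T token vectors, each of length d
    gen_weight: K rows of length d (hypothetical filter generator)
    gen_bias:   K biases; with gen_weight all zero this reduces to a
                static convolution whose taps are gen_bias
    """
    num_taps = len(gen_bias)
    dim = len(x[0])
    outputs = []
    for t, x_t in enumerate(x):
        # Taps for position t are computed from the token at position t.
        taps = [
            gen_bias[j] + sum(w * v for w, v in zip(gen_weight[j], x_t))
            for j in range(num_taps)
        ]
        y_t = [0.0] * dim
        for j in range(num_taps):
            if t - j < 0:
                break  # causal: never look before the start of the sequence
            for c in range(dim):
                y_t[c] += taps[j] * x[t - j][c]
        outputs.append(y_t)
    return outputs


tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
zero_weight = [[0.0, 0.0], [0.0, 0.0]]

# Static case: taps (0, 1) shift the sequence back by one step.
static_out = dynamic_short_conv(tokens, zero_weight, [0.0, 1.0])

# Dynamic case: the second tap equals the first channel of the current token.
dynamic_out = dynamic_short_conv(tokens, [[0.0, 0.0], [1.0, 0.0]], [1.0, 0.0])
print(static_out)
print(dynamic_out)
```

Because each output only mixes the current token with the previous K-1 tokens, the locality bias of a short convolution is kept while the mixing weights vary per position.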

【2】PrimeSVT: An Automated Memory-aware Pruning Framework with Prioritized Compression Policy for Spiking Vision Transformers
Link: https://arxiv.org/abs/2606.03428

Authors: Rachmad Vidya Wicaksana Putra, Achyuta Muthuvelan, Alberto Marchisio, Muhammad Shafique
Comments: 8 pages, 8 figures, 3 tables

【3】PSViT: A Methodology for Structurally Pruning Spiking Vision Transformers
Link: https://arxiv.org/abs/2606.03257

Authors: Rachmad Vidya Wicaksana Putra, Achyuta Muthuvelan, Alberto Marchisio, Muhammad Shafique
Comments: 8 pages, 7 figures, 3 tables

【4】Data-Driven Forecasting of three-Component Seismograms Using Transformer Architectures
Link: https://arxiv.org/abs/2606.02912

Authors: Waleed Esmail, Stuart Russell, Jana Klinge, Alexander Kappes, Christine Thomas
Comments: 35 pages, 13 figures and 4 tables
Abstract: Forecasting seismic waveforms beyond observed data remains challenging due to the nonlinear, dispersive, and multi-scale nature of seismic wave propagation. In this work, we introduce SeismoGPT, a transformer-based autoregressive model designed to forecast three-component seismic waveforms directly in the time domain. Forecasting is formulated as a physically constrained continuation problem in which the model receives waveform context beginning at the P-wave arrival and extending a defined time beyond the S-wave arrival, after which future motion is generated recursively without access to ground-truth samples. Evaluation is performed on synthetic seismograms spanning source depths of 5-100 km, epicentral distances of 10-90°, and magnitudes $3 \leq M_w \leq 7$. To disentangle the effects of context length and prediction horizon, we define three evaluation configurations using a distance-normalized context ratio and fixed prediction horizons of 120 and 240 s. Across all configurations, the model achieves a median normalized cross-correlation above 0.93. Analysis of representative forecasts shows that successful predictions preserve both phase coherence and spectral energy distribution. Where failures arise, they are primarily due to gradual phase drift during autoregressive rollout rather than unphysical signal generation. These results demonstrate that transformer-based sequence models can learn stable dynamical continuation of seismic wavefields, highlighting the potential of foundation-model approaches for physics-driven time-series forecasting. The methodology has potential applications in seismic warning and hazard mitigation, particularly for next-generation gravitational-wave observatories such as the Einstein Telescope.
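
The evaluation metric named in this abstract, normalized cross-correlation, can be sketched as follows. The abstract does not say whether the score is taken at zero lag or maximized over lags, so the zero-lag form below is an assumption, and the sine waves stand in for real seismogram traces.

```python
import math


def normalized_cross_correlation(predicted, target):
    """Zero-lag normalized cross-correlation of two equal-length traces.

    Returns a value in [-1, 1]; 1 means identical waveform shape,
    regardless of overall amplitude.
    """
    if len(predicted) != len(target):
        raise ValueError("traces must have the same length")
    dot = sum(p * t for p, t in zip(predicted, target))
    norm = math.sqrt(sum(p * p for p in predicted) * sum(t * t for t in target))
    if norm == 0.0:
        return 0.0  # convention chosen here for an all-zero trace
    return dot / norm


# Synthetic stand-ins: a reference trace and a phase-drifted forecast.
reference = [math.sin(0.1 * i) for i in range(200)]
drifted = [math.sin(0.1 * i + 0.5) for i in range(200)]

print(normalized_cross_correlation(reference, reference))
print(normalized_cross_correlation(drifted, reference))
```

A forecast with the right shape but a phase offset scores below 1 under this definition, which is consistent with the abstract attributing failures to gradual phase drift.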

GAN | Adversarial | Attack | Generation (5 papers)

【1】DiffUNet^2: Bidirectional Prediction, Probabilistic Generation and Collaborative Visual Discovery for Scientific Data
Link: https://arxiv.org/abs/2606.03926

Authors: Mengdi Chu, Jiaxin Yang, Angus G. Forbes, Nathan Debardeleben, Earl Lawrence, Ayan Biswas, Han-Wei Shen
Comments: 12 pages, 20 figures
Abstract: Modeling temporal evolution is important for analyzing and reasoning about scientific phenomena, yet most machine learning methods provide deterministic forward predictions that overlook multiple plausible outcomes and rarely support backward reasoning, limiting their usefulness in practical scientific workflows. We present a framework that integrates diffusion-based generative modeling with interactive visual analytics for scientific exploration. We introduce DiffUNet^2, a conditional diffusion model that enables bidirectional, any-to-any generation across time and captures distributions of plausible system evolutions. Built upon the model, our interactive system supports branching timeline exploration, user-guided state editing, and probability-space navigation, enabling scientists to actively explore alternative hypotheses rather than passively observe predictions. We evaluate the model on 5 datasets across different scientific domains to validate its predictive accuracy and probability-space ensemble quality. In collaboration with domain experts, we demonstrate the effectiveness of our approach in supporting practical scientific temporal data analysis workflows. By integrating modeling and visual interaction, our approach enables scientists to interactively explore system dynamics, transforming generative models into tools for hypothesis-driven scientific analysis.

【2】Building Reliable Long-Form Generation via Hallucination Rejection Sampling
Link: https://arxiv.org/abs/2606.03628

Authors: Lin Li, Georgia Channing, Suhaas M Bhat, Gabriel Davis Jones, Yarin Gal
Comments: Accepted by ICML 2026

【3】Exploiting Verification-Generation Gap: Test-Time Reinforcement Learning with Confidence-Conditioned Verification
Link: https://arxiv.org/abs/2606.03608

Authors: Jiahui Li, Jianfeng Shan, Wenpei Chen, Shunyu Wu, Jian Lou, Wenjie Feng, Dan Li, See-Kiong Ng

【4】SketchSong: Hierarchical Song Generation with Sketch Planning and Fine-Grained Multi-Track Modeling
Link: https://arxiv.org/abs/2606.03169

Authors: Xiaoyue Duan, Nanxing Hu, Yutang Feng, Xudong Yan, Jiatao Chen, Jinchao Zhang, Jie Zhou

【5】Trans GAN-WT: A Feature Extraction and Interactive Learning-Based Anomaly Detection Model for Wind Turbine Time Series Data
Link: https://arxiv.org/abs/2606.03112

Authors: Jingzhe Kang
Abstract: As wind farms grow in scale and number, the daily operation and maintenance costs of wind turbines are increasing. To reduce these costs and improve the reliability of wind turbines and their operating data before catastrophic failures occur, it is crucial to monitor the operating status of the equipment and detect failures at an early stage. Using working-condition data to assess anomalies in the operating status of wind turbines is therefore of great practical significance. However, existing anomaly detection methods can neither perform effective relational modeling on data filled with a large amount of redundant information nor make reasonable use of valuable anomaly data. For this reason, this paper proposes an anomaly detection model that fuses a Transformer and a generative adversarial network. First, it reduces the missed-detection rate of minor-deviation anomalies by amplifying the reconstruction error. Second, it uses autoregressive inference to extract multimodal features, enhancing the stability and generalization ability of training. Finally, a temporal feature extraction module is constructed to promote interactive learning between features at different time scales and to effectively reduce temporal redundancy. The results of multiple sets of experiments conducted on real WTG datasets show that TransGAN-WT achieves an average F1 score of 96.10% across multiple wind turbine datasets, which is 5.84% and 2.89% higher than several other state-of-the-art baseline methods. It also achieves a false positive rate (FPR) of 0.06%, and a Wilcoxon signed-rank test verifies that its performance gain over the state-of-the-art baseline methods is statistically significant, effectively ensuring the stable operation of wind turbines.
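
For readers comparing the reported F1 score and false positive rate, the sketch below shows how both are derived from confusion-matrix counts. The counts are made-up illustrative numbers, not results from the paper, and the function name is an assumption.

```python
def f1_and_fpr(true_pos, false_pos, false_neg, true_neg):
    """Compute the F1 score and false positive rate from confusion counts."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    # FPR: fraction of normal samples that were flagged as anomalous.
    fpr = false_pos / (false_pos + true_neg)
    return f1, fpr


# Illustrative counts: 10 true anomalies among 100 time windows.
f1, fpr = f1_and_fpr(true_pos=8, false_pos=2, false_neg=2, true_neg=88)
print(f"F1 = {f1:.3f}, FPR = {fpr:.4f}")
```

F1 depends only on how anomalies are handled (precision and recall), while FPR is normalized by the number of normal samples, so the two numbers can move independently when anomalies are rare.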

Semi-/Weakly-/Un-/Supervised | Uncertainty | Active Learning (7 papers)

【1】Using Reward Uncertainty to Induce Diverse Behaviour in Reinforcement Learning
Link: https://arxiv.org/abs/2606.03962

Authors: Anthony GX-Chen, Ankit Anand, Gheorghe Comanici, Zaheer Abbas, Eser Aygün, David Smalling, Shibl Mourad, Doina Precup, André Barreto, Mark Rowland
Comments: Core contributors: Anthony GX-Chen, Ankit Anand, Gheorghe Comanici, André Barreto, Mark Rowland
Abstract: Classical reinforcement learning (RL) typically seeks a deterministic policy that maximizes the expected sum of a scalar reward. Yet modern applications such as language model fine-tuning or scientific discovery demand diversity. Existing remedies such as entropy regularization or diversity bonuses often require fragile trade-offs that sacrifice performance for stochasticity or rely on heuristic metrics that can misalign policy rankings. We argue that diversity is more naturally understood as the rational response to uncertainty in the reward. When the reward function is not perfectly known, as is the case with ambiguous preferences or imperfect reward models, committing to a single action can be sub-optimal. Building on this, we propose a fundamental reformulation of the RL objective by replacing the scalar reward with a distribution over reward functions and applying a non-linear objective over sets of actions. The result is a framework in which calibrated behavioural diversity emerges naturally, remains controllable through the reward function distribution, and is obtained without sacrificing expected reward. Focusing on the contextual bandit setting, we derive a principled gradient estimator for this objective and prove that our formulation naturally generalizes both vanilla policy gradient and more recently developed action-set approaches. Our empirical results demonstrate that this framework offers a robust and theoretically grounded alternative for complex RL tasks where the traditional formulation of the problem fails to induce the desired breadth of agent behaviour.
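
The claim that committing to a single action can be sub-optimal under reward uncertainty can be checked on a toy example. The abstract does not specify the non-linear set objective, so the "expected best reward within the set" used below is an assumed stand-in, and the reward tables are invented for illustration.

```python
from itertools import combinations

# Two equally likely reward hypotheses over three actions (illustrative numbers).
reward_hypotheses = [
    {"a": 1.0, "b": 0.0, "c": 0.6},
    {"a": 0.0, "b": 1.0, "c": 0.6},
]


def expected_best_in_set(action_set):
    """Average over reward hypotheses of the best reward available in the set."""
    total = sum(max(reward[a] for a in action_set) for reward in reward_hypotheses)
    return total / len(reward_hypotheses)


def best_set(size):
    """Exhaustively pick the action set of the given size with the highest score."""
    actions = sorted(reward_hypotheses[0])
    return max(combinations(actions, size), key=expected_best_in_set)


print(best_set(1), expected_best_in_set(best_set(1)))
print(best_set(2), expected_best_in_set(best_set(2)))
```

With one action, the safe compromise "c" is the best choice; with two, the pair of specialized actions "a" and "b" covers both hypotheses and scores higher than any set containing "c". Under this assumed objective, diversity appears as a consequence of uncertainty about the reward rather than as an added bonus term.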

【2】CoralBay: A Self-Supervised CT Foundation Model
Link: https://arxiv.org/abs/2606.03888

Authors: Ioannis Gatopoulos, Nicolas Känzig, Sebastian Otálora, Fei Tang
Abstract: Self-supervised learning has enabled large-scale pre-training on 2D natural images, producing general-purpose visual representations that transfer effectively across tasks. However, many medical imaging modalities, such as CT scans, are inherently three-dimensional and differ fundamentally from natural images in both structure and semantics. Volumetric modalities capture spatial continuity, organ anatomy, and intensity-based tissue properties (e.g., Hounsfield Units), which are not adequately modeled by 2D pre-training. To bridge this gap, we introduce CoralBay, a self-distillation framework that extends DINO by using a hierarchical 3D Swin backbone and applying self-distillation to concatenated multi-scale features, enabling data-efficient self-supervised learning of rich spatial representations that encode both global semantics and fine-grained local structure. As a result, CoralBay transfers effectively to a wide range of downstream radiological tasks, demonstrating strong and consistent performance across diverse anatomical targets. In addition, we contribute to the open-source eva framework by introducing a public, reproducible 3D radiology leaderboard that unifies multiple datasets and establishes a standardized benchmark for evaluating volumetric representation learning methods.

【3】IdEst: Assessing Self-Supervised Learning Representations via Intrinsic Dimension
Link: https://arxiv.org/abs/2606.03338

Authors: Julie Mordacq, Vicky Kalogeiton, Steve Oudot
Comments: ICML 2026

【4】Auditing Engagement Incentives in the Kidfluencer Ecosystem: A Multimodal Weak Supervision Approach
Link: https://arxiv.org/abs/2606.03173

Authors: Zijing Wei, Chao Peter Yang, Xuanjie Chen

【5】ROBUST-WT: Robust Uncertainty-aware Segmentation Transform via Whitening and Training Enhancements
Link: https://arxiv.org/abs/2606.03069

Authors: Aqsa Naseer, Maryam Bibi, Syeda Samiya Urooj, Muhammad Khurram Shahzad
Comments: 8 pages, 6 figures; code available at this https URL

【6】Scalable Uncertainty Quantification for Extreme Weather Forecasting via Empirical Neural Tangent Kernels
Link: https://arxiv.org/abs/2606.02886

Authors: Jose Marie Antonio Miñoza, Rex Gregor Laylo, Sebastian C. Ibañez
Comments: Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD '26)

【7】Regime-Arrival Uncertainty in Generalization Bounds under Distribution Shift
Link: https://arxiv.org/abs/2606.02657

Authors: Prince Poudel
Comments: 23 pages, 4 tables, 3 figures
Abstract: Standard generalization bounds assume that the training and deployment distributions are the same, or are static, and do not consider regime-switching environments in which the ratio of calm to crisis states differs. This paper proposes a framework that generalizes regime-aware models by quantifying the extra risk due to regime composition mismatch when distribution shifts are Markov-switching. We obtain an exact decomposition, separating regime mismatch from regime sensitivity; we extend the bound to beta-mixing data using the effective sample size corrected for the spectral gap; and we show a minimax lower bound for synthetic data and on 25 years of global equity indices. The proposed penalty is an ex post realized generalization gap, whereas the training-only estimator does not show significant correlation: the feature geometry of crises can be detected, but not their temporal arrival. Thus, the framework is not a forecasting machine. Forecasting the composition of the future regime is an open question in the rare cases of regime change.

Transfer | Zero/Few/One-Shot | Adaptation (13 papers)

【1】Re-Evaluating Continual Learning with Few-Shot Adaptation
Link: https://arxiv.org/abs/2606.03843

Authors: Amogh Inamdar, Matthew So, Vici Milenia, Richard Zemel
Comments: 21 pages, 16 figures
Abstract: Continual learning methods aim to maximize the stability and plasticity of machine learning models that are trained on a sequence of tasks. The standard measure of stability (i.e., forgetting) is the 0-shot performance of a model on previously learned tasks, and the standard measure of plasticity is its performance on the most recently learned task. However, 0-shot evaluation does not fully measure the ability of a model or method to retain learned information or adapt quickly to new information, as it requires perfect recall across multiple tasks. In this paper, we propose few-shot evaluation as a more comprehensive assessment of the stability and plasticity of a continual learning system. We conduct a fine-grained assessment on task sequences for continual image classification and find that this paradigm produces novel insights into the performance of popular continual learning strategies. Through few-shot evaluation with a novel metric, per-shot plasticity, we show that adding 'foresight' to continual learning methods via the meta-learning of a short sequence of future tasks induces learning-to-learn behavior over the task sequence.

【2】AI Agents Enable Adaptive Computer Worms
标题:人工智能代理启用自适应计算机蠕虫
链接:https://arxiv.org/abs/2606.03811

作者:Jonas Guan,Tom Blanchard,Hanna Foerster,Hengrui Jia,Gabriel Huang,Nicolas Papernot
摘要:计算机蠕虫病毒是一种恶意软件,它通过在一台机器到另一台机器上复制自己来在网络上传播。传统的蠕虫,如WannaCry,利用预先确定的漏洞,通过修补这些漏洞可以阻止它们的传播。在这里,我们展示了人工智能(AI)代理实现了一种全新的威胁:一种蠕虫,它会为遇到的每个目标生成量身定制的攻击策略。该蠕虫寄生地使用受损的机器来运行开放权重的大型语言模型(LLM),以维持其推理,或扩展其进一步攻击的范围。该蠕虫部署在Linux、Windows和IoT(物联网)设备的机器网络上,通过利用常见的真实企业网络漏洞进行传播。由于蠕虫是由被盗的计算提供动力的,因此攻击者每次新感染的边际成本为零。这在攻击者和防御者之间造成了不稳定的经济不对称。此外,由于该蠕虫不需要商业AI平台,因此集中式安全控制(如服务拒绝或速率限制)在结构上无关紧要。我们的研究结果表明,自我维持的人工智能驱动的网络威胁不再是理论上的。我们必须为自主生成的对手做好准备:恶意软件系统在没有人类操作员的情况下传播,并且不是由固定的漏洞代码定义的,而是由对目标进行推理,适应观察并实时合成攻击逻辑的能力定义的。
摘要:A computer worm is malware that spreads on a network by replicating itself from one machine to another. Traditional worms, like WannaCry, exploited predetermined vulnerabilities, and their spread can be halted by patching those vulnerabilities. Here we show that artificial intelligence (AI) agents enable a fundamentally new threat: a worm that generates tailored attack strategies to each target it encounters. The worm parasitically uses compromised machines to run open-weight large language models (LLMs) to sustain its reasoning, or extend its reach for further attacks. Deployed on a network of machines spanning Linux, Windows, and IoT (Internet of Things) devices, the worm propagated by exploiting common, real-world corporate network vulnerabilities. Since the worm is powered by stolen compute, the attacker's marginal cost per new infection is zero. This creates a destabilizing economic asymmetry between attackers and defenders. Moreover, because the worm requires no commercial AI platform, centralized safety controls, such as service refusals or rate limiting, are structurally irrelevant. Our results demonstrate that self-sustaining AI-driven cyber-threats are no longer theoretical. We must prepare for autonomous generative adversaries: malware systems that propagate without human operators and are defined not by fixed exploit code, but by the capacity to reason about targets, adapt to observations, and synthesize attack logic in real time.

【3】Neural Navigation Functions for Zero-Shot Generalizable Motion Planning
标题:用于Zero-Shot可泛化运动规划的神经导航函数
链接:https://arxiv.org/abs/2606.03756

作者:Benjamin D. Shaffer,Pei-An Hsieh,Brooks Kinch,Nathaniel Trask,M. Ani Hsieh
备注:17 pages, 10 figures
摘要:我们介绍神经导航功能(神经NF),学习反应导航功能,能够在看不见的环境几何形状的zero-shot传输。Neural-NF将数据驱动的自适应放置在结构化的椭圆规划器中,其中学习导航目标,同时通过构造保留规划器结构。具体来说,内在拉普拉斯衍生的功能被映射到本地PDE系数,并解决由此产生的边界值问题产生一个全球一致的价值函数的每个目标域。对于每个可接受的学习模型,所得到的策略是无冲突的,通过构造提供单调下降和在目标处的全局最小值。这承认任何参数设置的线性可解的最优控制解释。从经验上讲,Neural-NF在不同的几何形状之间实现了强大的zero-shot传输,并且比直接预测值函数的学习规划器的性能提高了5倍。
摘要:We introduce Neural Navigation Functions (Neural-NF), a learned reactive navigation function capable of zero-shot transfer across unseen environment geometries. Neural-NF places data-driven adaptation within a structured elliptic planner, where the navigation objective is learned while planner structure is preserved by construction. Specifically, intrinsic Laplacian-derived features are mapped to local PDE coefficients, and solving the resulting boundary value problem produces a globally consistent value function on each target domain. For every admissible learned model, the resulting policy is collision-free, provides monotonic descent and a global minimum at the goal by construction. This admits a linearly-solvable optimal-control interpretation for any parameter setting. Empirically, Neural-NF achieves strong zero-shot transfer across diverse geometries and outperforms learned planners that directly predict the value function by up to a $5\times$ improvement.

【4】Validation-Gated Multi-Agent Governance for Online Adaptation of Thermal-Hydraulic Surrogate Models under Operating-Regime Shift
标题:操作制度转变下在线调整热力-水力代理模型的验证门控多主体治理
链接:https://arxiv.org/abs/2606.03321

作者:Doyeong Lim, Seungyoon Lee, In Cheol Bang

【5】Zero-Shot 3D Question Answering via Hierarchical View-to-Token Transportation
标题:通过分层视图到令牌传输实现Zero-Shot3D问题解答
链接:https://arxiv.org/abs/2606.03100

作者:Dongsheng Wang, Dawei Su, Hui Huang
备注:19 pages, 6 figures

【6】FGRPO: Federated GRPO with Adaptive Aggregation on Non-IID Data
标题:FGRPO:对非IID数据进行自适应聚合的联合GRPO
链接:https://arxiv.org/abs/2606.03094

作者:Pengyu Chen, Shaowei Li, Kai Wang, Yunsheng Yuan, Kai Han, Jun Luo, Feng Li

【7】MOSAIC: Efficient Mixture-of-Agent Scheduling via Adaptive Aggregation and Inference Concurrency
标题:MOSAIC:通过自适应聚合和推理并发的高效混合代理调度
链接:https://arxiv.org/abs/2606.03014

作者:Saptarshi Mitra,Yifan Zhang,Rachid Karami,Phyo Pyae Moe Aung,Nazmul Takbir,Sreetama Sarkar,Souvik Kundu,Sitao Huang
备注:13 pages, 8 main pages
摘要:混合代理(MoA)系统通过将每个查询路由到多个专家LLM并聚合其输出来提高推理准确性。在有限的GPU资源上高效执行此工作负载存在瓶颈。基于技能的路由创建倾斜的专家需求,并结合预防调整的LLM与长推理模型的结果,在极端的变化,代长度。因此,传统的调度策略由于负载不平衡而遭受显著的GPU空闲和吞吐量崩溃。我们提出了MOSAIC,一个调度框架,以加速MoA工作负载。首先,我们制定了一个基于线性规划(ILP)的调度程序,共同优化专家的位置和每个工人的提示分配从离线配置的成本,复制推理专家在工人,而钉轻量级的。其次,MOSAIC使用置信度感知的自适应聚合,利用专家间的协议绕过繁重的最终聚合器LLM进行共识查询。在我们的4-GPU系统中,MOSAIC在基线调度器上实现了高达2.5倍的专家级,4.23倍的聚合器级和1.7~ 2.3倍的端到端加速,同时匹配精度在0.1pp以内。
摘要:Mixture-of-Agents (MoA) systems improve reasoning accuracy by routing each query to multiple expert LLMs and aggregating their outputs. Efficiently executing this workload on limited GPU resources has bottlenecks. Skill-based routing creates skewed expert demand, and combining instruction-tuned LLMs with long-reasoning models results in extreme variability in generation lengths. Consequently, traditional scheduling strategies suffer from significant GPU idling and throughput collapse due to load imbalances. We present MOSAIC, a scheduling framework to accelerate MoA workloads. First, we formulate an Integer Linear Program (ILP) based scheduler that jointly optimizes expert placement and per-worker prompt assignment from offline-profiled costs, replicating reasoning experts across workers while pinning lightweight ones. Second, MOSAIC uses confidence-aware adaptive aggregation, leveraging inter-expert agreement to bypass the heavy final aggregator LLM for consensus queries. In our 4-GPU system, MOSAIC achieves up to 2.5x expert-stage, 4.23x aggregator-stage and 1.7~2.3x end-to-end speedups over the baseline scheduler, while matching accuracy within 0.1pp.
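
The agreement-based bypass described in the MOSAIC abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the exact-match vote, the `agreement_threshold` value, and the names `aggregate` and `aggregator` are assumptions made purely for illustration.

```python
from collections import Counter


def aggregate(expert_answers, aggregator, agreement_threshold=0.75):
    """Return (answer, used_aggregator).

    Hypothetical rule: if the most common expert answer reaches the
    agreement threshold, return it directly and skip the heavy
    aggregator LLM; otherwise fall back to the aggregator.
    """
    if not expert_answers:
        raise ValueError("need at least one expert answer")
    top_answer, votes = Counter(expert_answers).most_common(1)[0]
    if votes / len(expert_answers) >= agreement_threshold:
        return top_answer, False  # consensus query: aggregator bypassed
    return aggregator(expert_answers), True  # no consensus: aggregate
```

Under these assumptions, a query on which three of four experts agree never reaches the aggregator, which is where the aggregator-stage saving would come from.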

【8】Exact equivariance, kept through training, buys zero-shot generalisation across the symmetry group
标题:通过训练保持的精确等方差可以购买整个对称群的Zero-Shot概括
链接:https://arxiv.org/abs/2606.03003

作者:Hongbo Wang
备注:92 pages, 11 figures. Core paper plus an extended results-log appendix and a forward-looking theory supplement. All experiments are laptop-scale (CPU/MPS), fully seeded and deterministic
摘要:由等变编码器$E$和等变预测器$f$构建的潜在世界模型继承了其训练损失的可证明对称性:当世界的动力学真的携带通过正交表示$ρ(g)$作用于潜变量的群$G$时,一步预测relMSE在整个群中完全不变,因此,将动力学拟合在有限的方向切片上,在数学上确定了整个轨道上的动力学。我们在笔记本电脑规模(CPU/MPS,完全种子)上验证了这一点。[A]对称性在一个真正的Muon/AdamW + EMA + VICReg运行中幸存下来--在优化后组成编码然后预测残差$\sim 10^{-6}$,而不仅仅是在初始化时,并且在任何优化器下。[B]整个组的一步误差持平至五位数,而同一假设类的非等变基线符合切片,但超出了分布(VN $\times 1.00$ vs基线$\times 13.8$(2D),$\times 17.2$(3D),$\times 157$(整个$\mathrm{SE}(3)$阶梯),等变模型小4.5 - 7.4倍。[C]同样的等距论证提升到闭环:在匹配的等变规划器下,方向$g$上的控制轨迹正好应用于所见的控制轨迹,因此闭环误差在整个组中是不变的--在2D/$\mathrm{SO}(2)$中是浮点地板精确的,在3D/$\mathrm{SE}(3)$中是统计平坦的(不相交的95% CI)。我们针对萨顿的痛苦教训对先验进行压力测试:增强,蛮力规模和软等方差每一个都最接近跨组任务度量,而不是浮动地板精确度。由于等方差在复合下是闭合的,因此$H$-fold卷展栏在每个水平线处保持平坦($\times 1.00$,$\le 2\times 10^{-7}$),而基线的残差与$H$复合。超出范围:任务成功扫描,无计划器不变性和扩展。
摘要:A latent world model built from an equivariant encoder $E$ and an equivariant predictor $f$ inherits a provable symmetry of its training loss: when the world's dynamics genuinely carries a group $G$ acting on latents by an orthogonal representation $ρ(g)$, the one-step prediction relMSE is exactly invariant across the whole group, so fitting the dynamics on a restricted slice of orientations mathematically determines it on the entire orbit (jǔ yī fǎn sān). We verify this end-to-end at laptop scale (CPU/MPS, fully seeded). [A] The symmetry survives a real Muon/AdamW + EMA + VICReg run -- composed encode-then-predict residual $\sim 10^{-6}$ after optimisation, not just at initialisation, and under any optimiser. [B] One-step error is flat to five digits across the group, while a same-hypothesis-class non-equivariant baseline fits the slice but breaks out-of-distribution (VN $\times 1.00$ vs baseline $\times 13.8$ in 2D, $\times 17.2$ in 3D, $\times 157$ over the full $\mathrm{SE}(3)$ ladder), with the equivariant model $4.5$-$7.4\times$ smaller. [C] The same isometry argument lifts to closed loop: under a matching equivariant planner the control trajectory at orientation $g$ is exactly $ρ(g)$ applied to the seen one, so closed-loop error is invariant across the group -- float-floor-exact in 2D/$\mathrm{SO}(2)$ on real PushT and statistically flat in 3D/$\mathrm{SE}(3)$ (disjoint 95% CIs). We stress-test the prior against Sutton's Bitter Lesson: augmentation, brute-force scale, and soft-equivariance each close at most the across-group task metric, never the float-floor exactness. Because equivariance is closed under composition, the $H$-fold rollout stays flat ($\times 1.00$, $\le 2\times 10^{-7}$) at every horizon, while the baseline's residual compounds with $H$. Out of scope: task-success sweeps, planner-free invariance, and scaling.

【9】DriftSched: Adaptive QoS-Aware Scheduling under Runtime Token Drift for Multi-Tenant GPU Inference
标题:DriftSched:运行时令牌漂移下面向多租户GPU推理的自适应QoS感知调度
链接:https://arxiv.org/abs/2606.02982

作者:Kathiravan Palaniappan
备注:17 pages, 22 figures, 7 tables
摘要:大型语言模型(LLM)推理服务的快速增长增加了对高效多租户GPU调度的需求。虽然现代推理运行时(如vLLM)通过持续的优化和优化的内存管理提高了吞吐量,但准确估计异构推理请求的运行时成本仍然是一个重大挑战。在实践中,观察到的输出长度通常偏离接纳时间估计,从而产生运行时令牌漂移,这可能导致工作负载错误分类、队列不平衡、尾部延迟增加和服务质量(QoS)下降。 本文介绍了DriftSched,一个自适应QoS感知调度框架,用于在NVIDIA L4 GPU上提供多租户LLM推理服务。DriftSched结合了工作负载分类、令牌预算估计、租户感知队列管理和运行时反馈驱动的漂移补偿,以改进准入时间调度决策。该框架评估FIFO,优先级,加权,最短作业优先(SJF),老化优先级调度策略下异构多租户工作负载。 实验结果证明了跨工作负载类别的可测量的运行时令牌漂移。自适应偏差校正将工作负载估计误差平均降低了38.8%(MAE)和40.5%(RMSE),提高了工作负载分类稳定性和调度准确性。在所有评估的FIFO中,SJF实现了最佳的整体性能,在持续的GPU争用下,相对于FIFO,端到端延迟的中位数减少了约42%,P99延迟减少了约16%。 这项工作有助于自适应漂移感知调度架构,运行时令牌漂移补偿机制,和一个可再生的基准框架,用于评估QoS感知LLM推理调度共享GPU基础设施。
摘要:The rapid growth of large language model (LLM) inference services has increased the demand for efficient multi-tenant GPU scheduling. While modern inference runtimes such as vLLM improve throughput through continuous batching and optimized memory management, accurately estimating the runtime cost of heterogeneous inference requests remains a significant challenge. In practice, observed output lengths often deviate from admission-time estimates, creating runtime token drift that can lead to workload misclassification, queue imbalance, increased tail latency, and degraded Quality-of-Service (QoS). This paper presents DriftSched, an adaptive QoS-aware scheduling framework for multi-tenant LLM inference serving on NVIDIA L4 GPUs. DriftSched combines workload classification, token-budget estimation, tenant-aware queue management, and runtime feedback-driven drift compensation to improve admission-time scheduling decisions. The framework evaluates FIFO, Priority, Weighted, Shortest-Job-First (SJF), and Aging Priority scheduling policies under heterogeneous multi-tenant workloads. Experimental results demonstrate measurable runtime token drift across workload categories. Adaptive bias correction reduces workload estimation error by an average of 38.8% (MAE) and 40.5% (RMSE), improving workload classification stability and scheduling accuracy. Among all evaluated schedulers, SJF achieves the best overall performance, reducing median end-to-end latency by approximately 42% and P99 latency by approximately 16% relative to FIFO under sustained GPU contention. The work contributes an adaptive drift-aware scheduling architecture, a runtime token-drift compensation mechanism, and a reproducible benchmarking framework for evaluating QoS-aware LLM inference scheduling on shared GPU infrastructure.
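
As a rough illustration of runtime feedback correcting admission-time estimates before Shortest-Job-First ordering, consider the sketch below. The abstract does not specify the correction rule, so the additive exponential-moving-average bias, the `alpha` parameter, and the names `DriftCorrector` and `sjf_order` are all hypothetical.

```python
import heapq
from collections import defaultdict


class DriftCorrector:
    """Tracks a per-class additive bias between estimated and observed tokens."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha                # EMA smoothing factor (assumed)
        self.bias = defaultdict(float)    # workload class -> token bias

    def correct(self, workload_class, estimated_tokens):
        # Apply the learned bias; never return less than one token.
        return max(1.0, estimated_tokens + self.bias[workload_class])

    def observe(self, workload_class, estimated_tokens, observed_tokens):
        # Update the bias with the drift seen on a finished request.
        drift = observed_tokens - estimated_tokens
        old = self.bias[workload_class]
        self.bias[workload_class] = (1 - self.alpha) * old + self.alpha * drift


def sjf_order(requests, corrector):
    """Order (name, workload_class, estimated_tokens) tuples shortest-first."""
    heap = [
        (corrector.correct(cls, est), idx, name)
        for idx, (name, cls, est) in enumerate(requests)
    ]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

In this sketch, a class whose outputs consistently run longer than estimated is pushed back in the SJF queue once feedback arrives, which is one plausible way drift compensation could reduce misordering.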

【10】From Non-Convex to Strongly Convex: Curvature-Adaptive FTPL for Online Optimization
标题:从非凸到强凸:用于在线优化的曲线自适应FTPL
链接:https://arxiv.org/abs/2606.02948

作者:Moses Charikar, Chirag Pabbaraju, Ambuj Tewari

【11】Hybrid Adaptive Kalman Filtering for Data-Efficient Joint Tracking and Classification
标题:用于数据高效联合跟踪和分类的混合自适应卡尔曼过滤
链接:https://arxiv.org/abs/2606.02767

作者:Jiho Lee, Nisar R. Ahmed, Rebecca Russell
备注:8 pages, 4 figures

【12】Resource-Constrained Adaptive Inference for Sequential Pricing
标题:资源约束的顺序定价自适应推理
链接:https://arxiv.org/abs/2606.03736

作者:Ruicheng Ao,Jiashuo Jiang,David Simchi-Levi
摘要:资源约束定价控制器可以使固定价格推理不可能:控制器的资源状态可能会从可行集中删除目标价格邻域,即使每个已实现的动作都有一个已知的正密度。我们通过一个局部的非识别结果和一个已实现的信息时钟来形式化这种支持-排除故障。然后,我们设计了一个目标感知定价控制器,证明可行的目标波段和日志连续的局部密度。局部去偏给出了学生化的间隔,其宽度由该时钟控制。由此产生的遗憾-信息会计,说明了试点重新解决错误,表明廉价的探索可能是不够的推理:多项式目标质量给出多项式率,而一个纯粹的1美元/吨$目标分支不产生收缩固定目标的间隔,没有额外的本地移动。实验表明,在认证频带和诊断的警告时,资源状态崩溃的目标支持校准。
摘要:Resource-constrained pricing controllers can make fixed-price inference impossible: the controller's resource state may remove the target price neighborhood from the feasible set, even when every realized action has a known positive density. We formalize this support-exclusion failure through a local non-identification result and a realized information clock. We then design a target-aware pricing controller that certifies feasible target bands and logs continuous local densities. Localized debiasing gives studentized intervals whose width is governed by this clock. The resulting regret--information accounting, stated up to pilot re-solving error, shows that cheap exploration can be insufficient for inference: polynomial target mass gives polynomial rates, while a pure $1/t$ target branch does not yield shrinking fixed-target intervals without additional local movement. Experiments show calibration in certified bands and diagnostic abstention when the resource state collapses target support.

【13】Few-Shot Prediction for Pulsar Noise with Long Short-Term Memory Network
标题:基于长短期记忆网络的脉冲星噪声Few-Shot预测
链接:https://arxiv.org/abs/2606.03574

作者:Qingye Tang,Dechao An,Haoran Peng,Yuqi Ouyang
摘要:这项工作提出了一种新的解决方案来预测脉冲星定时残差有限的数据,解决了PTA数据集中毫秒脉冲星自旋频率子组数据稀缺的关键挑战。所提出的解决方案应用了使用模型不可知元学习算法优化的长短期记忆(LSTM)网络,通过仅使用Few-Shot的地面真值定时残差微调LSTM网络来快速适应新的频域。粒子群优化算法也用于自动超参数优化,从而提高预测精度。我们的解决方案,在国际脉冲星定时阵列(IPTA)的第二次数据发布的评估,展示了强大的泛化与准确的预测在三个指标在高频测试频域,而只需要10%的定时残差从这些域的模型微调。此外,我们的轻量级结构只需花费16.86 MB CPU内存和18毫秒即可进行单步残差预测。所有这些特征使我们的解决方案非常适合现实世界的应用,其中脉冲星计时残差的有效和实时预测至关重要,特别是在计算能力、内存或能源可用性有限的资源受限环境中。
摘要:This work proposes a novel solution to predict pulsar timing residuals with limited data, addressing the critical challenge of data scarcity across spin-frequency subgroups of millisecond pulsars in PTA datasets. The proposed solution applies a Long Short-Term Memory (LSTM) network optimized using the model-agnostic meta-learning algorithm, enabling rapid adaptation to a new frequency domain by fine-tuning the LSTM network with only a few shots of ground-truth timing residuals. A particle swarm optimization algorithm is also used for automatic hyperparameter optimization, leading to improved prediction accuracy. Our solution, evaluated on the second data release of the International Pulsar Timing Array (IPTA), demonstrates robust generalization with accurate predictions in three metrics across high-frequency test frequency domains, while requiring only 10% of the timing residuals from these domains for model fine-tuning. Furthermore, our lightweight structure only costs 16.86 MB of CPU memory and 18 milliseconds for single-step residual prediction. All these characteristics make our solution highly suitable for real-world applications, where effective and real-time predictions of pulsar timing residuals are essential, particularly in resource-constrained environments with limited computational power, memory, or energy availability.

强化学习(7篇)

【1】Synthesize and Reward -- Reinforcement Learning for Multi-Step Tool Use in Live Environments
标题:合成和奖励--在实时环境中多步骤工具使用的强化学习
链接:https://arxiv.org/abs/2606.03892

作者:Ibrahim Abdelaziz,Asim Munawar,Kinjal Basu,Maxwell Crouse,Chulaka Gunasekara,Suneet Katrekar,Pavan Kapanipathi
摘要:训练LLM来编排多步工具调用受到三个耦合障碍的阻碍:现实的有状态执行环境构建成本高昂,合成训练查询通常与服务器的实际状态脱节(因此生成的工具调用无法执行),以及基于召回的RL奖励会激励冗长的工具调用模式。我们提出PROVE(Programmatic Rewards On Verified Environments),一个有三个贡献的框架:(1)一个包含20个有状态MCP(Model Context Protocol,模型上下文协议)服务器的库,公开了343个工具,使实时执行的RL训练具有会话范围的状态隔离;(2)自动化数据合成管道,通过依赖图引导的、以实时采样的服务器状态为基础的会话模拟,生成针对这些服务器的经验证的多轮工具调用轨迹,因此每个生成的查询都引用实际存在的实体;以及(3)多成分程序化奖励(分级有效性评分、依赖性感知覆盖、具有复杂性缩放的调用预算的自适应效率惩罚、工具名称信号和参数值匹配奖励),不需要外部评判模型。我们用GRPO训练了四个模型(Qwen3-4B、Qwen3-8B、Qwen2.5-7B、Granite-4.1-8B),使用相同的奖励超参数和大约13K个训练样本;只有学习率按模型系列从三点扫描中调整。在BFCL Multi-Turn、tau2-bench和T-Eval上,PROVE分别获得了高达+10.2、+6.8和+6.5分的改进,表明紧凑的程序化奖励在两个模型系列的多步工具编排上获得了一致的收益。
摘要:Training LLMs to orchestrate multi-step tool calls is held back by three coupled obstacles: realistic stateful execution environments are costly to build, synthetic training queries are often detached from the server's actual state (so the generated tool calls fail to execute), and recall-based RL rewards incentivize verbose tool-calling patterns. We present PROVE (Programmatic Rewards On Verified Environments), a framework with three contributions: (1) a library of 20 stateful MCP (Model Context Protocol) servers exposing 343 tools, enabling live-execution RL training with session-scoped state isolation; (2) an automated data synthesis pipeline that generates validated multi-turn tool-call trajectories against these servers via dependency-graph-guided conversation simulation grounded in live-sampled server state, so every generated query references entities that actually exist; and (3) a multi-component programmatic reward (graduated validity scoring, dependency-aware coverage, an adaptive efficiency penalty with a complexity-scaled call budget, a tool-name signal, and an argument-value matching bonus) that requires no external judge model. We train four models (Qwen3-4B, Qwen3-8B, Qwen2.5-7B, Granite-4.1-8B) with GRPO using identical reward hyperparameters and ~13K training examples; only the learning rate is tuned per model family from a three-point sweep. On BFCL Multi-Turn, tau2-bench, and T-Eval, PROVE yields improvements of up to +10.2, +6.8, and +6.5 points respectively, demonstrating that a compact programmatic reward yields consistent gains on multi-step tool orchestration across two model families.

【2】Easy-to-Use Shielding for Reinforcement Learning
标题:用于强化学习的易于使用的屏蔽
链接:https://arxiv.org/abs/2606.03804

作者:Stefan Pranger,Bettina Könighofer
摘要:安全探索是强化学习(RL)中的一个关键挑战,旨在防止代理在探索环境时做出有害的决策。屏蔽就是这样一种技术,它以环境模型的形式假设领域知识来决定动作的安全性。尽管屏蔽技术已较为成熟,但由于缺乏连接形式化屏蔽合成与标准RL框架的易用的端到端基础设施,屏蔽在RL中的应用有限。应用屏蔽通常需要形式化方法的专业知识和大量的工程工作,使其游离于典型的RL工作流程之外。我们通过将我们的屏蔽合成工具Tempest扩展为安全RL的实用后端来解决这个问题。我们的核心贡献是tempestpy,这是一个Python库,它将基于Tempest的屏蔽合成直接集成到Gymnasium API中,允许在现有的RL管道中合成和部署屏蔽。这降低了屏蔽的进入门槛,并将形式化的安全探索方法转变为RL从业者可用的组件。我们还扩展了Tempest的算法支持,以计算随机多人博弈的可靠(sound)屏蔽,从而保留形式化的安全保证。我们展示了端到端的工作流程,并在多个环境中评估屏蔽和非屏蔽RL。为了便于建模,我们为MiniGrid提供了符号模型,并引入MiniGridSafe,这是一组演练环境,旨在使屏蔽易于上手、实验透明。MiniGridSafe通过面向安全的场景扩展了MiniGrid,这些场景具有概率转换和额外的代理,从而能够在简单直观的设置中研究具有挑战性的安全方面。
摘要:Safe exploration is a key challenge in Reinforcement Learning (RL) that aims to prevent agents from making harmful decisions while exploring their environment. Shielding is one such technique that assumes domain knowledge in the form of an environment model to decide upon action safety. Although well-established, shielding has seen limited adoption in RL due to the lack of accessible end-to-end infrastructure connecting formal shield synthesis with standard RL frameworks. Applying shielding typically requires expertise in formal methods and substantial engineering effort, keeping it outside the typical RL workflow. We address this by extending our shield synthesis tool Tempest into a practical backend for safe RL. Our core contribution is tempestpy, a Python library that integrates Tempest-based shield synthesis directly into the Gymnasium API, allowing shields to be synthesized and deployed within existing RL pipelines. This lowers the barrier to entry for shielding and turns formal safe-exploration methods into a usable component for RL practitioners. We also extend Tempest's algorithmic support to compute sound shields for stochastic multiplayer games, preserving formal safety guarantees. We demonstrate the resulting workflow end to end and evaluate shielded and unshielded RL across multiple environments. To facilitate modeling, we provide symbolic models for MiniGrid and introduce MiniGridSafe, a collection of playground environments designed to make shielding easily accessible and experimentally transparent. MiniGridSafe extends MiniGrid with safety-oriented scenarios featuring probabilistic transitions and additional agents, enabling the study of challenging safety aspects in a simple and intuitive setting.

【3】Tool-Aware Optimization with Entropy Guidance for Efficient Agentic Reinforcement Learning
标题:基于熵引导的工具感知优化,实现高效的智能体强化学习
链接:https://arxiv.org/abs/2606.03762

作者:Hongye Cao,Nuo Yan,Haoyuan Deng,Ziwei Wang,Tianpei Yang,Jing Huo,Yuyao Zhang,Yang Gao
摘要:强化学习(RL)为大型语言模型(LLM)提供了工具使用能力,大大提高了复杂任务的推理能力。然而,集成外部工具往往会破坏培训的稳定性:过度依赖工具可能会导致输入分布变化,而过于保守的工具使用则限制了有效的探索。为了解决这个问题,我们提出了一个统一的框架TAO-RL,它将工具感知的轨迹过滤与熵引导的探索相结合,以实现有效的策略优化。具体来说,在数据级别,TAO-RL根据两个标准过滤卷展轨迹:丢弃所有工具调用都无法执行的轨迹,并删除所有卷展正确或不正确的轨迹,因为这两种情况都会产生退化的优势估计,不会产生有区别的学习信号。这种联合过滤保留了具有工具能力和信息量的数据,建立了高质量的培训分布。在算法层面,我们引入了一个工具感知的熵引导奖金,重塑了工具调用后令牌的优势函数,鼓励政策在关键决策点探索更多样化的推理路径。这两个组成部分是相辅相成的:轨迹过滤建立了一个干净和信息丰富的训练基础,而熵引导的探索在关键的工具交互结合点驱动更强的推理行为。在3个模型尺度上对7个具有挑战性的推理基准进行了大量实验,证明了TAO-RL优于现有方法。
摘要:Agentic reinforcement learning (RL) equips large language models (LLMs) with tool-use capabilities that substantially improve reasoning on complex tasks. However, integrating external tools often destabilizes training: over-reliance on tools can induce input distribution shift, while overly conservative tool use limits effective exploration. To address this issue, we propose a unified framework TAO-RL that couples tool-aware trajectory filtering with entropy-guided exploration for efficient policy optimization. Specifically, at the data level, TAO-RL filters rollout trajectories along two criteria: discarding those where all tool invocations fail to execute, and removing those where all rollouts are either correct or incorrect, as both cases yield degenerate advantage estimates that contribute no discriminative learning signal. This joint filtering retains data that are both tool-capable and informative, establishing a high-quality training distribution. At the algorithmic level, we introduce a tool-aware entropy-guided bonus that reshapes the advantage function at post-tool-call tokens, encouraging the policy to explore more diverse reasoning paths at critical decision points. These two components are mutually reinforcing: trajectory filtering establishes a clean and informative training foundation, while entropy-guided exploration drives stronger reasoning behaviors at critical tool-interaction junctures. Extensive experiments on 7 challenging reasoning benchmarks across 3 model scales demonstrate the superiority of TAO-RL over existing methods.
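
The two data-level filtering criteria in the TAO-RL abstract can be sketched as follows. This is a hedged illustration only: the GRPO-style group normalization, the binary rewards, the order in which the two filters are applied, and the names `group_advantages`, `keep_trajectory` and `filter_groups` are assumptions, not details taken from the paper.

```python
from statistics import mean, pstdev


def group_advantages(rewards):
    """Group-normalized advantages: (r - mean) / std, zeros if std is 0."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:
        # All rollouts scored the same: no discriminative signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def keep_trajectory(tool_ok):
    """Drop a rollout only if it made tool calls and every call failed.

    Rollouts without any tool call are kept (an assumption).
    """
    return not tool_ok or any(tool_ok)


def filter_groups(groups):
    """groups: {prompt_id: [{"reward": float, "tool_ok": [bool, ...]}, ...]}"""
    kept = {}
    for prompt_id, rollouts in groups.items():
        usable = [r for r in rollouts if keep_trajectory(r["tool_ok"])]
        rewards = {r["reward"] for r in usable}
        if len(rewards) > 1:  # skip all-correct and all-incorrect groups
            kept[prompt_id] = usable
    return kept
```

The `group_advantages` helper shows why uniform groups are discarded under this assumption: when every rollout receives the same reward, every normalized advantage is zero and the group contributes no gradient.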

【4】Post-Hoc Robustness for Model-Based Reinforcement Learning
标题:基于模型的强化学习的事后鲁棒性
链接:https://arxiv.org/abs/2606.03521

作者:Siemen Herremans, Ali Anwar, Siegfried Mercelis

【5】Mitigating False Credit Propagation: Probabilistic Graphical Reward Aggregation for Rubric-Based Reinforcement Learning
标题:缓解虚假信用传播:基于条目的强化学习的概率图形奖励聚合
链接:https://arxiv.org/abs/2606.03361

作者:Can Lv, Mingju Chen, Heng Chang, Shiji Zhou

【6】Learning to See via Epiretinal Implant Stimulation in silico with Model-Based Deep Reinforcement Learning
标题:通过基于模型的深度强化学习在电子计算机中通过视网膜前植入刺激来学习看东西
链接:https://arxiv.org/abs/2606.03118

作者:Jacob Lavoie, Marwan Besrour, William Lemaire, Jean Rouat, Réjean Fontaine, Eric Plourde
备注:18 pages, 6 figures. Published version: Biomed. Phys. Eng. Express 10, 025006 (2024)

【7】Fairness Definitions and Metrics in Deep Reinforcement Learning for Drug Discovery in Healthcare: A Rapid Evidence Review
标题:医疗保健药物发现的深度强化学习中的公平性定义和指标:快速证据综述
链接:https://arxiv.org/abs/2606.02902

作者:Esmaeil Shakeri, Ronnie de Souza Santos, Behrouz Far
备注:10 pages, 6 figures, 3 tables. Accepted as a full paper at a symposium of IEEE COMPSAC 2026

分层学习(1篇)

【1】Hierarchical RBF-KAN and RBF-SKAN Architectures for Multidimensional Function Approximation and Random Field Learning
标题:用于多维函数逼近和随机场学习的分层RBF-KAN和RBF-SKAN架构
链接:https://arxiv.org/abs/2606.02936

作者:Mingtao Xia, Qijing Shen

医学相关(4篇)

【1】CoughSense: Five-Class Respiratory Disease Classification via Whisper Encoder Fine-Tuning and Dual-Encoder Cross-Attention Fusion with Balanced Contrastive Learning
标题:CoughSense:通过Whisper编码器微调和双编码器交叉注意融合以及平衡对比学习进行五级呼吸道疾病分类
链接:https://arxiv.org/abs/2606.02998

作者:Nikhil Vincent
备注:26 pages, 3 figures
摘要:自动咳嗽分析为低成本呼吸道筛查提供了一条途径,但大多数现有工作都停留在二元COVID-19检测上。一个实用的工具需要从消费者智能手机上的一个咳嗽记录中区分出几种呼吸状况。我们提出CoughSense,一个将咳嗽记录分为五类的系统。这些是健康的,COVID-19,哮喘或呼吸道疾病,支气管炎和肺炎。我们汇总了来自四个公共数据集(Coswara,CoughVID,Virufy和华西医院儿科咳嗽数据集)的18,301个记录,并使用OpenAI Whisper编码器作为咳嗽疾病分类的预训练骨干。主要贡献是活动帧QKV注意力池,它将注意力限制在1500个编码器令牌中的前200个。这避免了沉默稀释问题,因为3秒的咳嗽只填充了Whisper 30秒输入窗口的150个标记。其他训练部分处理19比1的类不平衡和四个数据集的域转移。这些包括WeightedRandomSampler,SpecAugment,带有强制少数配对的平衡混合,监督对比辅助损失,Film症状调节和梯度反转域适应。双编码器模型通过交叉注意将Whisper与OPERA-CT呼吸基础模型融合。CoughSense(Whisper-tiny,8.6M参数)在五重交叉验证中达到82.3%的平衡准确度(macro-F1为0.817,AUC为0.941)。它以11.1分的优势击败了ImageNet预训练的EfficientNet-B2,以29.6分的优势击败了从头开始训练的ViT。所有五个班都通过了74%的回忆,五个班中有四个通过了80%。双编码器模型达到了85.4%的平衡精度。活动帧合并是所有消融组件中最大的单一贡献者,为5.1分,这应该有助于使用Whisper作为主干的任何短音频任务。
摘要:Automated cough analysis offers a path to low-cost respiratory screening, but most existing work stops at binary COVID-19 detection. A practical tool needs to tell apart several respiratory conditions from one cough recording on a consumer smartphone. We present CoughSense, a system that sorts cough recordings into five classes. These are healthy, COVID-19, asthma or respiratory condition, bronchitis, and pneumonia. We aggregated 18,301 recordings from four public datasets (Coswara, CoughVID, Virufy, and the West China Hospital Pediatric Cough Dataset) and used the OpenAI Whisper encoder as a pretrained backbone for cough disease classification. The main contribution is active-frame QKV attention pooling, which restricts attention to the first 200 of 1500 encoder tokens. This avoids the silence-dilution problem that arises because a 3-second cough fills only 150 tokens of Whisper's 30-second input window. Other training parts handle the 19 to 1 class imbalance and the four-dataset domain shift. These include WeightedRandomSampler, SpecAugment, Balanced Mixup with forced minority pairing, a supervised contrastive auxiliary loss, FiLM symptom conditioning, and gradient-reversal domain adaptation. A dual-encoder model fuses Whisper with the OPERA-CT respiratory foundation model through cross-attention. CoughSense (Whisper-tiny, 8.6M parameters) reached 82.3 percent balanced accuracy on five-fold cross-validation (macro-F1 of 0.817, AUC of 0.941). It beat an ImageNet-pretrained EfficientNet-B2 by 11.1 points and a ViT trained from scratch by 29.6 points. All five classes passed 74 percent recall and four of five passed 80 percent. The dual-encoder model reached 85.4 percent balanced accuracy. Active-frame pooling is the largest single contributor across all ablation components at 5.1 points, which should help any short-audio task using Whisper as a backbone.
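
The silence-dilution argument in the CoughSense abstract follows from simple arithmetic: 1500 encoder tokens over a 30-second window is 50 tokens per second, so a 3-second cough occupies about 150 tokens. The sketch below illustrates the effect with a simplified single-query attention pooling. It omits the learned Q/K/V projections of the paper's pooling layer, and the names `active_tokens` and `attention_pool` are hypothetical.

```python
import math

TOKENS_PER_WINDOW = 1500  # encoder tokens per input window (from the abstract)
WINDOW_SECONDS = 30       # Whisper input window length in seconds


def active_tokens(duration_s):
    """Number of encoder tokens covered by audio of the given duration."""
    return math.ceil(duration_s * TOKENS_PER_WINDOW / WINDOW_SECONDS)


def attention_pool(frames, query, k_active=200):
    """Softmax attention pooling restricted to the first k_active frames.

    frames: list of equal-length feature vectors; query: one vector.
    Returns (pooled_vector, attention_weights).
    """
    active = frames[:k_active]
    dim = len(query)
    scores = [
        sum(q * x for q, x in zip(query, frame)) / math.sqrt(dim)
        for frame in active
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [
        sum(w * frame[j] for w, frame in zip(weights, active))
        for j in range(dim)
    ]
    return pooled, weights
```

With a non-informative query (uniform weights), pooling over all 1500 frames averages 150 cough frames with 1350 silent ones, while restricting to the first 200 frames keeps the cough frames at three quarters of the pooled mass.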

【2】Multi-Modal Machine Learning for Breast Cancer Recurrence Prediction
标题:多模式机器学习用于乳腺癌复发预测
链接:https://arxiv.org/abs/2606.02892

作者:Jiahao Shao, Xudong Wang, Anam Nawaz Khan, Christopher Brett, Xueping Li, Bing Yao
备注:33 pages, 10 figures

【3】Cross-Modal Contrastive Learning of ECG and Angiography Representations for Severe Stenosis Classification
标题:用于严重狭窄分类的心电图和血管造影表示的跨模式对比学习
链接:https://arxiv.org/abs/2606.02605

作者:Nikola Cenikj,Özgün Turgut,Alexander Müller,Alexander Steger,Jan Kehrer,Marcus Brugger,Daniel Rueckert,Philip Müller
摘要:冠状动脉狭窄是一种常见的心血管疾病,严重的,未经治疗的情况下,心脏病发作的风险很大。尽管冠状动脉(X射线)血管造影仍然是狭窄诊断的标准,但它们是侵入性的,时间和资源密集型的,因此仅对基于症状和先前临床测试的疾病概率高的患者进行。然而,一部分患者,特别是那些没有症状的患者,可能仍然无法诊断。从ECG中检测狭窄的指征,这是快速,廉价,非侵入性的,因此即使在无症状的患者中也可以常规获得,这将支持早期诊断。然而,由于在ECG中没有识别出可靠的狭窄特异性信号,因此目前无法将其用于狭窄风险分层。为了解决这个问题,我们引入了StenCE,这是一个预训练框架,允许根据直接来自ECG的特征对患者进行分层。不同狭窄严重度阈值和额外ECG疾病分类任务的评估表明,不同ECG编码器的性能得到一致改善,优于先前的工作。所获得的模型成功地检测ECG中的狭窄诊断信号,并且是第一个在严重狭窄分类中实现高性能的模型。源代码可在https://github.com/NikolaCenic/ecg-stenosis-cls上获得。
摘要:Coronary artery stenosis is a common cardiovascular disease, with severe, untreated cases posing significant risks of heart attack. Although coronary (X-ray) angiograms remain the standard for stenosis diagnosis, they are invasive, time- and resource-intensive, and therefore only performed on patients with a high probability of disease based on symptoms and prior clinical tests. However, a subset of patients, especially those without symptoms, may remain undiagnosed. Detecting indications of stenosis from ECGs, which are fast, cheap, non-invasive, and thus routinely acquired even in asymptomatic patients, would support early diagnosis. However, as no reliable stenosis-specific signal has been identified in ECGs, they can not currently be used for stenosis risk stratification. To address this, we introduce StenCE, a pretraining framework, allowing stratification of patients based on features derived directly from ECGs. Evaluations across varying stenosis severity thresholds and additional ECG disease classification tasks demonstrate consistent performance improvements across different ECG encoders, outperforming previous work. The obtained models successfully detect signals for stenosis diagnosis in ECGs and are the first to achieve high performance in severe stenosis classification. The source code is available at https://github.com/NikolaCenic/ecg-stenosis-cls.

【4】Scalable On-Hardware Training of Quantum Neural Networks and Application to Clinical Data Imputation
标题:量子神经网络的可扩展硬件训练及其在临床数据插补中的应用
链接:https://arxiv.org/abs/2606.03517

作者:Natansh Mathur,Panagiotis Kl. Barkoutsos,Masako Yamada,Martin Roetteler,Iordanis Kerenidis
备注:13 pages, 9 figures
摘要:在量子硬件上训练量子神经网络(QNN)目前受到梯度估计成本的影响:标准参数偏移方法需要大量的电路评估,这些评估随着可训练参数的数量呈二次方增长,使得基于硬件的优化在小系统尺寸之外变得不切实际。在这项工作中,我们引入了一个训练框架,将这种成本降低到量子位数的对数,使得基于梯度的QNN优化在短期硬件上以越来越大的规模变得可行。 我们的框架结合了三个共同设计的成分:(i)一个结构化的,子空间保持蝴蝶电路架构与$O(n \log n)$参数和对数深度;(ii)逐层训练策略,每次将硬件优化限制在一个小的、结构良好的层;及(iii)并行参数-移位规则,利用每个Butterfly层内的交换结构,在恒定数量的电路执行中提取所有梯度。这使得每个优化步骤的不同电路评估的数量从$O(n^2)$减少到$O(\log n)$。 我们使用MIMIC-III电子健康记录数据集验证了临床数据插补框架,这是一个对优化不稳定性和模型方差敏感的苛刻基准。混合经典量子模型直接在IonQ Forte Enterprise捕获离子硬件上以16量子位进行训练,相对于理想或噪声模拟,性能不会下降,并通过32量子位的张量网络模拟进行训练,在硬件上执行32量子位推理。由此产生的模型在下游患者生存预测中匹配或超过强经典神经基线,同时在运行中表现出较小的方差,这表明所提出的框架能够在现实的硬件限制下实现实用的、可扩展的QNN训练。
摘要:Training quantum neural networks (QNNs) on quantum hardware is currently bottlenecked by the cost of gradient estimation: standard parameter-shift methods require a number of circuit evaluations that grows quadratically with the number of trainable parameters, making hardware-based optimisation impractical beyond small system sizes. In this work, we introduce a training framework that reduces this cost to logarithmic in the number of qubits, making gradient-based QNN optimisation feasible on near-term hardware at increasing scales. Our framework combines three co-designed ingredients: (i) a structured, subspace-preserving Butterfly circuit architecture with $O(n \log n)$ parameters and logarithmic depth; (ii) a layer-wise training strategy that confines on-hardware optimisation to one small, well-structured layer at a time; and (iii) a parallelised parameter-shift rule that exploits the commuting structure within each Butterfly layer to extract all gradients in a constant number of circuit executions. Together these reduce the number of distinct circuit evaluations per optimisation step from $O(n^2)$ to $O(\log n)$. We validate the framework on clinical data imputation using the MIMIC-III electronic health record dataset, a demanding benchmark sensitive to optimisation instability and model variance. Hybrid classical-quantum models are trained directly on IonQ Forte Enterprise trapped-ion hardware at 16 qubits without performance degradation relative to ideal or noisy simulation and via tensor-network simulation at 32 qubits, with 32-qubit inference executed on hardware. The resulting models match or exceed strong classical neural baselines in downstream patient survival prediction while exhibiting reduced variance across runs, demonstrating that the proposed framework enables practical, scalable QNN training under realistic hardware constraints.
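
The $O(n \log n)$ parameter count and logarithmic depth quoted above are consistent with an FFT-style butterfly wiring, sketched below. This is an assumption about the layout rather than the authors' circuit: it supposes one single-parameter two-qubit gate per pair, $n$ a power of two, and the hypothetical helper name `butterfly_layers`.

```python
def butterfly_layers(n):
    """Return the qubit pairs of each layer in an FFT-style butterfly.

    Layer strides are n/2, n/4, ..., 1, giving log2(n) layers with n/2
    disjoint pairs each, so (n/2) * log2(n) gates in total.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two and at least 2")
    layers = []
    stride = n // 2
    while stride >= 1:
        # Pair qubit i with i + stride inside each block of size 2 * stride.
        pairs = [(i, i + stride) for i in range(n) if (i // stride) % 2 == 0]
        layers.append(pairs)
        stride //= 2
    return layers
```

Under this layout, 16 qubits give 4 layers of 8 gates, or 32 parameters. Because the gates inside one layer act on disjoint qubit pairs, they commute with each other, which is the kind of structure a parallelised parameter-shift rule could exploit.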

Distillation | Knowledge Extraction (5 papers)

【1】Physics-Guided Policy Optimization with Self-Distillation
Link: https://arxiv.org/abs/2606.03620

Authors: Ke Wang, Yuning Wu, Haoran Liu, Chaoqun Jia, Devin Chen, Kai Wei

【2】When Should the Teacher Move? Temporal Coupling and Stability in Self On-Policy Distillation
Link: https://arxiv.org/abs/2606.03532

Authors: Haowei Guo, Baolong Bi, Ruicheng Zhang, Bingqian Sun, Wentao Zhang

【3】Constitutional On-Policy Safe Distillation
Link: https://arxiv.org/abs/2606.03089

Authors: Ming Wen, Yuxuan Liu, Kun Yang, Yunhao Feng, Zhuoer Xu, Yuhao Sun, Shiwen Cui, Xiang Zheng, Xingjun Ma, Yu-Gang Jiang

【4】Filter, Then Reweight: Rethinking Optimization Granularity in On-Policy Distillation
Link: https://arxiv.org/abs/2606.02684

Authors: Yuying Li, Leqi Zheng, Yongzi Yu, Wenrui Zhou, Xuchang Zhong, Xing Hu, Jing Jin, Huangjie Yuan, Tao Feng
Abstract: On-policy distillation (OPD) in large language models is shifting from full-trace KL supervision toward more selective training paradigms. Recent OPD methods increasingly focus on selecting which trajectories to learn from, which tokens are most informative, and which supervision signals are most reliable. Motivated by this trend, we rethink the optimization granularity of OPD and propose FiRe-OPD (Filter, then Reweight), which jointly adjusts supervision signals at both the trajectory and token levels. In detail, FiRe-OPD first filters trajectories to remove low-quality rollout samples, and then applies soft reweighting within the retained trajectories to emphasize informative tokens. Compared with hard token selection, FiRe-OPD leverages a soft-weighting mechanism to effectively mitigate information loss and enhance optimization stability, thereby achieving finer-grained OPD optimization. We validate the effectiveness of FiRe-OPD across strong-to-weak, single-teacher, and multi-teacher settings, and demonstrate its superiority over recent token-level OPD methods (e.g., +6.25 on AIME 2024 in strong-to-weak, +18.81 on Miner in multi-teacher). Our code is available at https://github.com/YuYingLi0/FiRe-OPD.
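
The two-level idea in the abstract (drop whole trajectories first, then softly reweight tokens inside the survivors) can be sketched as follows. This is an illustration only: the abstract does not say how trajectories are scored or how token weights are computed, so the quality scores, the threshold, the per-token signal, and the softmax weighting are all assumptions, and `filter_then_reweight` is a hypothetical name rather than anything from the FiRe-OPD repository.

```python
import numpy as np

def filter_then_reweight(traj_scores, token_signals, keep_threshold=0.5, temperature=1.0):
    """Return {trajectory index: soft token weights} for retained trajectories.

    traj_scores:   one hypothetical quality score per trajectory.
    token_signals: list of 1-D arrays, one hypothetical per-token
                   informativeness signal per trajectory.
    """
    weights = {}
    for i, (score, signal) in enumerate(zip(traj_scores, token_signals)):
        if score < keep_threshold:              # trajectory-level filter
            continue
        logits = np.asarray(signal, dtype=float) / temperature
        logits = logits - logits.max()          # numerical stability
        w = np.exp(logits)
        weights[i] = w / w.sum()                # token-level soft reweighting
    return weights

traj_scores = [0.9, 0.2, 0.7]
token_signals = [
    np.array([0.1, 2.0, 0.5]),
    np.array([1.0, 1.0]),
    np.array([0.3, 0.3, 0.3, 0.3]),
]
weights = filter_then_reweight(traj_scores, token_signals)
```

Because the weights are soft, every token in a retained trajectory still contributes something, in contrast to a hard selection that zeroes out all but a chosen subset.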

【5】A Quantitative Approximation Framework for Flow Distillation in Diffusion Models
Link: https://arxiv.org/abs/2606.03820

Authors: Weiguo Gao, Ming Li, Lei Shi, Hanfei Zhou
Abstract: We develop a quantitative approximation framework for diffusion distillation, viewing few-step sampling as error propagation under compositions of learned flow maps. Focusing on trajectory distillation for the probability-flow ODE, we show that local approximation errors can be strongly amplified in low-noise multimodal regimes, where the underlying dynamics become stiff. In an analytically tractable Gaussian-mixture Ornstein-Uhlenbeck setting, we separate two core difficulties: approximating the time-dependent score field and controlling the dynamical amplification governed by the time-integrated Jacobian bound of the probability-flow ODE. On the approximation side, we prove constructive $L^p(p_t)$ guarantees showing that ReLU-ReQU networks approximate the Gaussian-mixture score uniformly over time, with depth and width scaling polylogarithmically in the target accuracy and explicitly with the mixture geometry. On the stability side, we derive an explicit bound $L(t)$ for the spatial Lipschitz constant of the probability-flow velocity and convert it into a flow map stability estimate governed by $\int_s^t L(u)\,du$, making late-time amplification in stiff regimes computable. Building on these estimates, we prove that deep residual compositions efficiently approximate the long-horizon transport, with global error controlled by the stability amplification factor, and identify a Lipschitz-mismatch regime in which one-step distillation is structurally unfavorable. The resulting theory yields a stability-balanced non-uniform time grid obtained by uniform partitioning in the cumulative stability coordinate. Experiments support the prediction and reduce end-to-end relative MSE by up to 51.9% with 8 segments compared with uniform grids.
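
The grid construction in the last part of the abstract can be sketched numerically: integrate a Lipschitz bound $L(t)$ into a cumulative coordinate and place grid points uniformly in that coordinate. The specific $L(t)$ below is a made-up example that is large near $t = 0$, and the function name and quadrature resolution are assumptions, not the paper's implementation.

```python
import numpy as np

def stability_balanced_grid(lipschitz, t_start, t_end, num_segments, num_quad=2001):
    """Place grid points uniformly in Lambda(t) = int_{t_start}^{t} L(u) du."""
    t = np.linspace(t_start, t_end, num_quad)
    L = lipschitz(t)
    # Cumulative trapezoidal integral of L over the fine quadrature mesh.
    increments = 0.5 * (L[1:] + L[:-1]) * np.diff(t)
    cumulative = np.concatenate([[0.0], np.cumsum(increments)])
    targets = np.linspace(0.0, cumulative[-1], num_segments + 1)
    # Invert the monotone map t -> Lambda(t) by linear interpolation.
    return np.interp(targets, cumulative, t)

# Hypothetical bound that is large near t = 0 (standing in for the stiff end).
grid = stability_balanced_grid(lambda t: 1.0 / (t + 0.05), 0.0, 1.0, num_segments=8)
```

With this choice of $L(t)$ the segments are short where the bound is large and long where it is small, so each segment carries the same share of the integrated stability bound.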

Super-Resolution | Denoising | Deblurring | Dehazing (2 papers)

【1】Denoise First, Orthogonalize Later: Understanding Momentum in Muon via Spectral Filtering
Link: https://arxiv.org/abs/2606.03899

Authors: Xianliang Li, Zihan Zhang, Weiyang Liu, Han Bao
Abstract: Muon has recently demonstrated strong empirical performance in large language model training, but the theoretical role of momentum in Muon remains unclear. Existing analyses of Muon either remove momentum to study spectral updates in isolation, or retain momentum without explaining why it improves empirical performance. Our work bridges this gap by showing that momentum in Muon acts as a spectral filter. Under a structured signal-plus-perturbation gradient model, we prove that momentum suppresses perturbations while preserving the dominant signal, thereby enlarging the spectral gap between them. This enlarged gap stabilizes the singular subspaces of the matrix passed to Muon's orthogonalization step, making the resulting update more reliable. We further show that applying momentum before orthogonalization achieves provably stronger alignment with the signal component of the gradient than either reversing this order or simply removing momentum. Experiments across diverse tasks, including LLM pretraining, support our theoretical analysis. More broadly, our theory offers a starting point for understanding the benefits of momentum in other matrix-based optimizers.
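
The ordering discussed in the abstract (accumulate momentum first, orthogonalize afterwards) can be sketched as below. Two simplifications are assumed: the orthogonalization uses an exact SVD in place of the Newton-Schulz iteration usually associated with Muon, and the gradients are synthetic draws from a rank-one signal plus Gaussian perturbations. All function names are hypothetical.

```python
import numpy as np

def orthogonalize(matrix):
    """Replace all singular values by 1 via SVD: M = U S V^T  ->  U V^T."""
    u, _, vt = np.linalg.svd(matrix, full_matrices=False)
    return u @ vt

def momentum_then_orthogonalize(grads, beta=0.9):
    """Accumulate momentum over a sequence of gradient matrices, then orthogonalize."""
    buffer = np.zeros_like(grads[0])
    for g in grads:
        buffer = beta * buffer + g
    return orthogonalize(buffer)

rng = np.random.default_rng(0)
signal = np.outer(rng.standard_normal(6), rng.standard_normal(4))   # rank-one "signal"
grads = [signal + 0.5 * rng.standard_normal((6, 4)) for _ in range(50)]
update = momentum_then_orthogonalize(grads)
```

Averaging many noisy gradients in the buffer shrinks the perturbation relative to the shared signal before the singular vectors are extracted, which is the filtering intuition the abstract describes.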

【2】Flicker-DDPM: Accelerating Denoising Diffusion via 1/f Colored Noise Injection
Link: https://arxiv.org/abs/2606.03393

Authors: Kexiang Mao
Comments: 16 pages, 8 figures. Code available at this https URL

Autonomous Driving | Vehicles | Lane Detection, etc. (2 papers)

【1】The Road Ahead in Autonomous Driving: The KITScenes Multimodal Dataset
Link: https://arxiv.org/abs/2606.02956

Authors: Richard Schwarzkopf, Fabian Immel, Alexander Blumberg, Jonas Merkert, Nils Rack, Kaiwen Wang, Fabian Konstantinidis, Julian Truetsch, Carlos Fernandez, Annika Bätz, Kevin Rösch, Marlon Steiner, Willi Poh, Yinzhe Shen, Royden Wagner, Felix Hauser, Dominik Strutz, Jaime Villa, Gleb Stepanov, Holger Caesar, Ömer Şahin Taş, Frank Bieder, Jan-Hendrik Pauls, Christoph Stiller
Comments: 28 pages, 21 figures

【2】Binary Road Surface Classification Using Machine Learning on Production Vehicle Signals During Cruising
Link: https://arxiv.org/abs/2606.02762

Authors: Vishal Hariharan, Salar Basiri, Kanwar Bharat Singh
Abstract: Knowledge of real-time road slipperiness, or even better, a refined estimate of peak grip potential, is a critical input for vehicle warning and intervention control systems. Typically, friction is estimated through dynamics-based recursive estimators by calculating the slip slope; however, its efficacy is heavily constrained by the vehicle dynamic scenario. When the vehicle is cruising and there is little to no slip, these methods become ineffective because present-day production-grade sensors, such as wheel speed sensors, and methods can neither measure nor accurately estimate micro slip, which is crucial for distinguishing different surfaces. To address this challenge, the correlation between vehicle signals and road surface condition during cruising needs to be uncovered using machine learning. In this paper, a feature-based framework and an end-to-end data-driven framework are used to correlate the statistics of vehicle dynamics behavior with the condition of the road surface and perform binary classification into grip (dry or damp) and slip (snow or ice) conditions. A sliding-window approach is adopted to batch a short buffered window of wheel speeds, wheel torques, longitudinal acceleration, steering angle, and yaw rate, which are fed into a machine learning module for predicting the road state. Validation results on public-road data show scenarios where the data-driven method identifies the road surface correctly even during cruising, showing promise for accurate data-driven friction-related state estimators in the field of tire and vehicle dynamics.
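
The sliding-window batching step can be sketched as follows, in the spirit of the feature-based framework. The window length, stride, channel count, and the choice of mean and standard deviation as features are assumptions for illustration; the abstract does not specify them, and the input here is random data rather than vehicle signals.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def window_features(signals, window, step):
    """signals: array of shape (time, channels). Returns (num_windows, 2 * channels)."""
    windows = sliding_window_view(signals, window_shape=window, axis=0)[::step]
    # windows has shape (num_windows, channels, window)
    return np.concatenate([windows.mean(axis=-1), windows.std(axis=-1)], axis=1)

rng = np.random.default_rng(0)
signals = rng.standard_normal((1000, 5))   # 5 hypothetical signal channels
features = window_features(signals, window=100, step=50)
```

Each row of `features` summarises one buffered window and could be passed to any binary classifier.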

Federated Learning | Privacy Protection | Encryption (2 papers)

【1】FlashbackCL: Mitigating Temporal Forgetting in Federated Learning
Link: https://arxiv.org/abs/2606.03939

Authors: Mubarak A. Ojewale, Adriana E. Chis, Jorge M. Cortes-Mendoza, Bernardo Pulido-Gaytan, Horacio Gonzalez-Velez
Abstract: Federated Learning (FL) of foundation and edge models increasingly targets deployments where client data distributions drift over time, yet existing forgetting-mitigation methods assume each client's distribution is stationary. Flashback, the strongest recent FL method against cross-client (spatial) forgetting, uses monotonically accumulating per-class label counts as a knowledge proxy; this proxy becomes miscalibrated under temporal distribution shift and anchors the global model to an outdated class balance. We formalise temporal forgetting in FL with a per-phase metric isolated from protocol-level fluctuations and propose Flashback Continual Learning (FlashbackCL), a drop-in extension of Flashback with (i) temporally-decayed label counts; (ii) a device-aware replay buffer with Class-Balanced Reservoir Sampling (CBRS); and (iii) server-side active coreset curation on the public distillation set. The results show that FlashbackCL achieves a 6.9% to 10.0% relative improvement over Flashback on CIFAR-10 with 50 clients and three controlled temporal shift modes, while simultaneously reducing temporal forgetting by up to 68%. A 5-variant ablation identifies CBRS replay as the critical component. FlashbackCL also improves Flashback by 3.5 points on stationary CIFAR-100, suggesting that class-balanced replay regularises spatial heterogeneity as well as temporal shift.
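
Ingredient (i), temporally-decayed label counts, can be illustrated with a short sketch. It assumes a simple exponential decay applied once per round; the decay factor and the function name are hypothetical, and the paper's exact update may differ.

```python
import numpy as np

def update_decayed_counts(counts, new_labels, num_classes, decay=0.5):
    """Exponentially decay old per-class counts, then add this round's counts."""
    round_counts = np.bincount(new_labels, minlength=num_classes)
    return decay * counts + round_counts

counts = np.zeros(3)
counts = update_decayed_counts(counts, np.array([0, 0, 0, 1]), num_classes=3)   # early phase
counts = update_decayed_counts(counts, np.array([2, 2, 2, 2]), num_classes=3)   # after a shift
```

After the shift the proxy is dominated by class 2, whereas purely accumulating counts would still give class 0 a weight comparable to the new class.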

【2】FederatedSkill: Federated Learning for Agentic Skill Evolution
Link: https://arxiv.org/abs/2606.03143

Authors: Jingbo Yang, Guanyu Yao, Yang Zhang, Ramana Rao Kompella, Gaowen Liu, Shiyu Chang

Reasoning | Analysis | Understanding | Explanation (15 papers)

【1】Value-Aware Stochastic KV Cache Eviction for Reasoning Models
Link: https://arxiv.org/abs/2606.03928

Authors: Ting-Yun Chang, Harvey Yiyun Fu, Deqing Fu, Chenghao Yang, Jesse Thomason, Robin Jia
Comments: Code: https://github.com/terarachang/VaSE
Abstract: Reasoning models improve accuracy through extended chains of thought, but their long outputs create a memory and compute bottleneck. KV cache eviction methods reduce this cost by evicting unimportant key-value pairs from the cache, yet they often yield worse accuracy than selection-based sparse attention alternatives, which keep the full KV cache. We identify key factors crucial to KV cache eviction accuracy. First, a small fraction of value states have abnormally large magnitudes, and evicting them causes catastrophic failure where models enter repetitive reasoning loops. Second, introducing stochasticity during eviction improves accuracy by increasing cache diversity. Based on these findings, we propose Value-aware Stochastic KV Cache Eviction (VaSE), a training-free recipe that protects large-magnitude value states and promotes diverse eviction decisions. Across six reasoning tasks, Qwen3 models using VaSE with 4x KV cache compression yield higher average accuracies than the SOTA selection method at the same sparsity, while outperforming the strongest eviction method by more than 4%. Overall, VaSE bridges the gap between efficiency and accuracy, supporting FlashAttention2 and enabling a static memory footprint for reasoning models.
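
The two findings in the abstract (protect large-magnitude value states, and make eviction stochastic) can be combined in a toy sketch. It is not the VaSE implementation from the linked repository: the importance scores, the sampling distribution, and the function name are assumptions, and real KV caches are per-head tensors rather than flat arrays.

```python
import numpy as np

def stochastic_evict(importance, value_norms, budget, num_protected, rng):
    """Return sorted indices of cache entries to keep.

    importance:  hypothetical nonnegative importance score per cached token.
    value_norms: magnitude of each token's value state.
    """
    n = len(importance)
    protected = np.argsort(value_norms)[-num_protected:]    # largest-magnitude values
    rest = np.setdiff1d(np.arange(n), protected)
    probs = importance[rest] + 1e-12                        # keep every probability positive
    probs = probs / probs.sum()
    sampled = rng.choice(rest, size=budget - num_protected, replace=False, p=probs)
    return np.sort(np.concatenate([protected, sampled]))

rng = np.random.default_rng(0)
importance = rng.random(32)
value_norms = rng.random(32)
value_norms[[3, 17]] = 50.0            # two abnormally large value states
kept = stochastic_evict(importance, value_norms, budget=8, num_protected=2, rng=rng)
```

Keeping 8 of 32 entries corresponds to the 4x compression setting mentioned in the abstract; sampling instead of taking a deterministic top-k is what introduces diversity into the retained set.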

【2】Explainable Forecasting of Scientific Breakthroughs from Concept Network Dynamics
Link: https://arxiv.org/abs/2606.03864

Authors: Thomas Maillart, Thibaut Chataing, Ntorina Antoni, David Dosu, Paul Bagourd, Julian Jang-Jaccard, Alain Mermoud
Comments: 18 pages, 10 figures, 4 tables. An earlier version was presented at Global Tech Mining Conference 2026. Code and data: https://github.com/wazaahhh/breakthroughs-forecasting
Abstract: We introduce an explainable machine-learning approach that forecasts the structural precursors of scientific breakthroughs -- the emergence and intensification of links between research concepts -- by modelling how OpenAlex concept networks evolve over time. Using 59 semantic and topological features, a two-stage LightGBM model jointly predicts the formation and the future weight of concept pairs, adding a regression stage that quantifies expected intensity to prior link-existence forecasts. Relative to the state of the art, the approach improves accuracy and explainability at once: comparative validation across four technology and biomedical domains yields ROC-AUC in [0.954, 0.967] at all horizons without re-tuning, exceeding the roughly 0.90 of prior models, while every forecast rests on structural, auditable features rather than opaque embeddings. Classification performance is high (AUC about 0.95) and regression remains stable (RMSLE 0.45 to 0.6 over one to five years). Feature attribution shows that structural factors -- particularly Adamic-Adar similarity and degree-based Hadamard measures -- consistently drive accuracy, suggesting that breakthrough-relevant recombinations emerge in tightly connected sub-networks. Two expert-anchored cases, quantum annealing and AI-enabled quantum architectures, show the model surfacing technological convergence consistent with expert expectations. We then outline a three-layer decision architecture -- detection, expert translation, institutional integration -- that turns these forecasts into evidence-based research strategy and policy, anchored in open data and explainable features.
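
Adamic-Adar similarity, one of the structural features the abstract highlights, sums 1 / log(degree) over the common neighbours of two nodes. The sketch below computes it on a toy undirected graph; the concept names and edges are invented for illustration and are not taken from OpenAlex.

```python
import math

def adamic_adar(adjacency, u, v):
    """Sum of 1 / log(degree) over the common neighbours of u and v."""
    common = adjacency[u] & adjacency[v]
    return sum(1.0 / math.log(len(adjacency[w])) for w in common)

# Toy undirected concept graph (hypothetical concepts).
adjacency = {
    "quantum": {"annealing", "optimization", "hardware"},
    "ml": {"optimization", "hardware"},
    "annealing": {"quantum", "optimization"},
    "optimization": {"quantum", "ml", "annealing"},
    "hardware": {"quantum", "ml"},
}
score = adamic_adar(adjacency, "quantum", "ml")
```

Here the two shared neighbours have degrees 3 and 2, so the score is 1/ln 3 + 1/ln 2; rarely connected shared neighbours count for more than hubs.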

【3】Staying Alive: Uncensored Survival Analysis with Tabular Foundation Models
Link: https://arxiv.org/abs/2606.03689

Authors: Mariana Vargas Vieyra

【4】KVarN: Variance-Normalized KV-Cache Quantization Mitigates Error Accumulation in Reasoning Tasks
Link: https://arxiv.org/abs/2606.03458

Authors: Lorenz K. Muller, Philippe Bich, Chiara Boretti, Hyun-Min Chang, Jiawei Zhuang, Lukas Cavigelli

【5】The Impact of Temporal Granularity on Socio-Demographic Inference from Household Load Profiles
Link: https://arxiv.org/abs/2606.03358

Authors: Dejan Radovanovic, Maximilian Schirl, Andreas Unterweger, Günther Eibl
Comments: 30 pages, 10 figures, book chapter

【6】Tailoring Strictly Proper Scoring Rules for Downstream Tasks: An Application to Causal Inference
Link: https://arxiv.org/abs/2606.03332

Authors: Roman Plaud, Alexandre Perez-Lebel, Antoine Saillenfest, Thomas Bonald, Marine Le Morvan, Gaël Varoquaux, Matthieu Labeau
Comments: Accepted to ICML 2026

【7】Learning Multi-Scale Hypergraph for High-Order Brain Connectivity Analysis
Link: https://arxiv.org/abs/2606.03310

Authors: Jaeyoon Sim, Soojin Hwang, Seunghun Baek, Guorong Wu, Won Hwa Kim
Comments: 24 pages, Accepted to ICML 2026

【8】Right Makes Might: Aligning Verified Hidden States Empowers RL Reasoning
Link: https://arxiv.org/abs/2606.03234

Authors: Ziyue Wang, Aomufei Yuan, Yongfu Zhu, Shuai Dong, Wenpu Liu, Yiran Yao, Weichu Xie, Yuqi Xu, Caoyuan Ma, Wenqi Shao, Xiaoying Zhang, Nan Duan, Jiaqi Wang
Comments: 16 pages, 7 figures

【9】Critical evaluation of PINN for FWD inverse analysis and differentiable FEM as an alternative
Link: https://arxiv.org/abs/2606.03210

Authors: Yongjin Choi, Hyeonbin Moon, Seunghwa Ryu

【10】What Do Students Learn? A Feature-Level Analysis of Dark Knowledge
Link: https://arxiv.org/abs/2606.03052

Authors: Seungu Kang, Songkuk Kim
Comments: Accepted at ICPR 2026

【11】RRISE: Robust Radius Inference via a Surrogate Estimator
Link: https://arxiv.org/abs/2606.02876

Authors: Jong-Ik Park, Shreyas Chaudhari, Carlee Joe-Wong, José M. F. Moura

【12】Spectral-Progressive Thought Flow for Lightweight Multimodal Reasoning
Link: https://arxiv.org/abs/2606.02842

Authors: Yixian Shen, Zhiheng Yang, Qi Bi, Changshuo Wang, Shuai Wang, Jia-Hong Huang, George Floros, Prayag Tiwari, Anuj Pathania
Comments: Accepted at ICML 2026

【13】Aligning Data-Driven Predictors with Allocation: A Decision-Focused Approach to Survival Analysis
Link: https://arxiv.org/abs/2606.02671

Authors: Itai Zilberstein, Ioannis Anagnostides, Tuomas Sandholm
Abstract: Machine learning predictors have become essential tools for guiding automated decision making. However, a major misalignment persists: predictive models are typically optimized in terms of standard statistical metrics in isolation from the algorithmic tasks they inform. We highlight this incongruity in the high-stakes domain of organ allocation by demonstrating that any algorithm relying on (even highly accurate) survival predictors optimized for standard metrics -- such as the Concordance index (C-index) -- can yield arbitrarily poor outcomes when used for allocation, failing to guarantee utility better than a uniform random selection. To bridge the gap between survival analysis and policy optimization, we introduce a decision-focused learning approach based on optimizing normalized discounted cumulative gain (NDCG), a mainstay metric in information retrieval. We establish the utility of NDCG in survival analysis by proving that it translates to guarantees on the performance of allocation. Empirically, we propose a bootstrapping approach to optimize the NDCG of existing survival models. Unlike prior work, we also address the challenge of right censorship when evaluating ranking. On historical heart transplant data from the US, our method dramatically boosts the NDCG of baseline models by 50-100%, which translates to tens of thousands of additional life years gained annually when deployed for transplant allocation. We anticipate that our framework will find broader applications in decision making with predictions.
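
For readers less familiar with the metric, here is a minimal sketch of NDCG using the common log2(rank + 1) discount. It ignores censoring, which the abstract says the paper handles separately, and the gain values are invented; it is not the paper's bootstrapping procedure.

```python
import numpy as np

def dcg(relevances):
    """Discounted cumulative gain with the common log2(rank + 1) discount."""
    relevances = np.asarray(relevances, dtype=float)
    ranks = np.arange(1, len(relevances) + 1)
    return float(np.sum(relevances / np.log2(ranks + 1)))

def ndcg(predicted_scores, true_gains):
    """NDCG of the ranking induced by sorting predicted_scores in descending order."""
    order = np.argsort(predicted_scores)[::-1]
    ideal = np.sort(true_gains)[::-1]
    return dcg(np.asarray(true_gains)[order]) / dcg(ideal)

true_gains = np.array([5.0, 1.0, 3.0])          # hypothetical benefit per candidate
perfect = ndcg(np.array([0.9, 0.1, 0.5]), true_gains)
reversed_rank = ndcg(np.array([0.1, 0.9, 0.5]), true_gains)
```

Unlike a pairwise concordance measure, NDCG weights mistakes near the top of the ranking more heavily, which is where allocation decisions are made.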

【14】A Robust Optimization Approach to Sparse Principal Component Analysis
Link: https://arxiv.org/abs/2606.03553

Authors: David Vävinggren, Francis Bach, André M. H. Teixeira, Dave Zachariah, Antônio H. Ribeiro
Abstract: While principal component analysis (PCA) is a fundamental tool for dimensionality reduction, its dense representations make it ill-suited for high-dimensional data. Existing methods address this by promoting sparsity through explicit $\ell_1$-penalties, but these are not obvious to tune due to the unsupervised nature of the task. In contrast, we propose Adversarial PCA (AdvPCA), which leverages robust optimization to achieve sparsity by optimizing the reconstruction objective against bounded, worst-case latent space perturbations. We show that this formulation admits a closed-form reduction, leading to a practical iterative algorithm that alternates between adversarial linear regression-style updates for the sparse encoder and orthogonal updates for the decoder. By theoretically characterizing the solution, we derive a data-adaptive parameterization that allows the algorithm to perform effectively out of the box. We validate these claims through numerical experiments on synthetic and real-world genomics data.

【15】DXA-Derived Skeletal Phenotypes and Hip Fracture Risk: A Backdoor-Adjusted Causal Analysis
Link: https://arxiv.org/abs/2606.02625

Authors: Zixin Shi, Chen Zhao, Meiling Zhou, Kevin A. Maupin, Joyce H. Keyak, Nancy E. Lane, Kuan-Jui Su, Hui Shen, Hong-Wen Deng, Kui Zhang, Weihua Zhou
Comments: 35 pages; main manuscript includes 4 figures and 3 tables; supplementary material includes 13 figures and 3 tables
Abstract: Purpose: To compare dual-energy X-ray absorptiometry (DXA)-derived hip skeletal phenotypes in relation to hip fracture risk using prespecified confounder adjustment and to assess whether phenotypes ranked by their backdoor-adjusted average treatment effects (ATEs) improve risk stratification. Methods: We analyzed 21,098 UK Biobank participants with linked health records, hip DXA-derived skeletal measures, and prespecified covariates. Sixteen phenotypes spanning bone mineral content (BMC), bone mineral density (BMD), and T-score across hip-related regions were evaluated. Confounder selection was guided by a prespecified directed acyclic graph (DAG). Backdoor-adjusted ATEs were estimated on the absolute risk-difference scale per standard deviation (SD) increase. Effect heterogeneity was evaluated for total femur BMD, and downstream prediction was assessed using clinical variables combined with phenotypes ranked by ATE magnitude. Results: Among 21,098 participants, 115 had hip fractures. All 16 phenotypes showed negative backdoor-adjusted ATEs per SD increase. The largest ATEs were observed for total femur BMC and total femur BMD, each with a risk difference of -0.0047, corresponding to approximately 4.7 fewer hip fractures per 1,000 participants per SD higher phenotype value. Conditional effects of total femur BMD were stronger among older participants and those with lower BMI. In prediction, clinical variables plus the top 11 ATE-ranked phenotypes achieved higher AUC than FRAX with femoral neck BMD (0.842 vs. 0.709), with higher sensitivity (0.748 vs. 0.443) and similar specificity (0.793 vs. 0.777). Conclusion: DXA-derived hip skeletal phenotypes differed in their backdoor-adjusted ATEs. Phenotype-level causal evaluation may help identify informative DXA measures for risk stratification.

Detection (6 papers)

【1】Learned Non-Maximum Suppression for 3D Object Detection
Link: https://arxiv.org/abs/2606.03568

Authors: Timo Osterburg, Stefan Schütte, Torsten Bertram
Comments: 6 pages, accepted at IEEE Intelligent Vehicles Symposium (IV) 2026

【2】Lingo_Research_Group at SemEval-2026 Task 9: Evaluating Prompt Variants for Polarization Detection
Link: https://arxiv.org/abs/2606.03334

Authors: Pritam Kadasi, Anuj Tiwari, Mayank Singh
Comments: Accepted at the SemEval Workshop, ACL 2026

【3】How Visible Are Silent Manipulation Failures? An Observability Study of False-Success Detection in Simulated Robot Episodes
Link: https://arxiv.org/abs/2606.03134

Authors: Aarav Bedi (University of California, Berkeley)
Comments: 4 pages, 3 figures

【4】COD10K-C: Benchmarking Robustness of Camouflaged Object Detection Under Natural Image Corruptions
Link: https://arxiv.org/abs/2606.02603

Authors: Arafat Hossain Sayem
Comments: 7 pages, 1 figure
Abstract: Camouflaged object detection has improved substantially, but most standard benchmarks evaluate models only on clean images. This is not realistic because real cameras often capture blur, sensor noise, weather effects, and compression artifacts. We present COD10K-C, a corruption robustness benchmark based on COD10K. It includes 8 corruption types and 5 severity levels, giving 40 conditions and 81,040 evaluation pairs in total. We evaluate three popular camouflaged object detection models, SINet-v2, PFNet, and ZoomNet, as well as a lightweight model called RobustCODLite. All models show clear performance drops on corrupted images. Motion blur and Gaussian blur cause the largest drops, with SINet-v2 losing 18.5 Dice points under motion blur. Brightness and fog are less harmful. RobustCODLite uses corruption augmentation, a frequency-prior branch, and an uncertainty-consistency loss. It retains 92.3% of its clean Dice score under corruption, compared with 87.7% for SINet-v2, 84.8% for ZoomNet, and 84.1% for PFNet. On the hardest corruptions, RobustCODLite matches or outperforms models that perform better on clean data. We will release the COD10K-C GitHub repository to support future research in robust camouflaged object detection.

【5】Testing the Test: Score-Direction Instability in Class-Split Anomaly Detection
Link: https://arxiv.org/abs/2606.02601

Authors: Alejandro Ascarate, Leo Lebrat, Rodrigo Santa Cruz, Clinton Fookes, Olivier Salvado
Comments: 4+1 pages, 1 figure, accepted at ICML 2026 Workshop on Hypothesis Testing
Abstract: Within-dataset class-split evaluation is widely used as a proxy for fully unconditional out-of-distribution anomaly detection. We show that this protocol can become ill-posed when the held-out anomaly class overlaps the normal mixture in representation space. In this regime, anomaly scores may collapse toward chance or even invert, and the preferred score direction can depend on the unknown anomaly class. We introduce a simple training-free diagnostic, neighborhood class leakage, and show that it predicts score-direction instability across Fashion-MNIST, CIFAR-10, and Imagenette, in both pixel and VAE latent spaces. Our results suggest that class-split AD benchmarks should be treated as geometry-dependent stress tests rather than unconditional evidence of anomaly-detection ability.

【6】One Transit Is All You Need: Detecting Exoplanets Through Learned Stellar Behaviour with EXOVEIL
Link: https://arxiv.org/abs/2606.02778

Authors: Pratik Priyanshu
Comments: 7 pages, 7 figures, 3 tables. Code and pretrained model: pip install exoveil. Candidate catalogue included as supplementary material
Abstract: I present EXOVEIL, a transit detection system that learns what a star's brightness should look like and flags when reality disagrees. Unlike existing systems that require phase-folded input, EXOVEIL operates on raw flux time series and can detect planets that transit only once. A Transformer world model, trained on 16,499 Kepler light curves with transit-masked self-supervised learning, predicts expected stellar flux. A matched-filter detector with variance weighting extracts transit signals from the prediction residuals. A learned classifier (XGBoost) separates planets from false positives, achieving AUC 0.938 on Kepler DR25. Applied to single-transit injection-recovery, EXOVEIL recovers 32% of transits at 1000 ppm depth, a task where all classification-based systems score 0% by construction. A blind search of 3,737 Kepler stars yields 179 new transit-like signals not present in the DR25 TCE catalogue, including 46 monotransit candidates. Applied without retraining to 47 confirmed TESS planets in the PLATO LOPS2 field, EXOVEIL achieves 100% recovery, demonstrating zero-shot cross-mission transfer. At PLATO's 25-second cadence, detection reaches 100 ppm -- approaching the Earth-analog regime. I provide the first application of conformal prediction to transit detection (95.9% empirical coverage) and release the system as pip install exoveil with pretrained weights and a candidate catalogue.
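
The matched-filter step with variance weighting can be sketched on synthetic residuals. This is not code from the exoveil package: it assumes a box-shaped transit template, independent Gaussian noise with known variance, and a fixed trial duration, and the function name is hypothetical.

```python
import numpy as np

def matched_filter_snr(residuals, variances, duration):
    """Inverse-variance-weighted box matched filter evaluated at every start index."""
    w = 1.0 / variances
    box = np.ones(duration)
    # Weighted sums over each candidate in-transit window.
    num = np.convolve(residuals * w, box, mode="valid")
    den = np.convolve(w, box, mode="valid")
    depth = -num / den                # estimated dip depth (positive for a dip)
    return depth * np.sqrt(den)       # depth divided by its standard error

rng = np.random.default_rng(0)
sigma = 1e-4
residuals = rng.normal(0.0, sigma, 2000)
residuals[900:930] -= 1e-3            # injected 1000 ppm box-shaped dip
snr = matched_filter_snr(residuals, np.full(2000, sigma**2), duration=30)
best_start = int(np.argmax(snr))
```

Because the filter works on residuals from a single pass over the light curve, it needs no phase folding, which is why a single transit can in principle be detected.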

Classification | Recognition (8 papers)

【1】AnchorMoE: Interpretable Time Series Classification via Anchor-Routed MoE
Link: https://arxiv.org/abs/2606.03631

Authors: Tao Xie, Zexi Tan, Haoyi Xiao, Mengke Li, Yiqun Zhang, Yang Lu, Cuie Yang, Yiu-ming Cheung
Comments: Accepted by KDD 2026, 12 pages

【2】A Hybrid Approach For Malware Classification Using Secondary Features Fusion
Link: https://arxiv.org/abs/2606.03432

Authors: Raja Khurram Shahzad, Muhammad Mustaqeem, Haroon Elahi

【3】Speech Emotion Recognition using Attention-based LSTM-Network with Residual Connection
Link: https://arxiv.org/abs/2606.03359

Authors: Daniil Krasnoproshin, Maxim Vashkevich
Comments: 6 pages, 5 figures, DSPA 2026

【4】Effect of Demographic Bias on Skin Lesion Classification
Link: https://arxiv.org/abs/2606.03214

Authors: Ralf Raumanns, Gerard Schouten, Veronika Cheplygina, Josien P.W. Pluim
Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA), 26 pages, 12 figures

【5】WISE-HAR: A Generalizable Ensemble Deep Learning Framework for WiFi-Based Human Activity Recognition
Link: https://arxiv.org/abs/2606.02974

Authors: Maheen Arshad, Qindeel E Zahra, Muhammad Khuram Shahzad
Comments: 8 pages, 5 figures
Abstract: Human Activity Recognition (HAR) using WiFi signals has emerged as a transformative technology for smart homes, healthcare monitoring, security systems, and ambient assisted living. Unlike traditional camera-based systems that raise significant privacy concerns and fail in low-light conditions, or wearable sensors that require user compliance, WiFi-based HAR is non-intrusive, privacy-preserving, cost-effective, and works seamlessly in any lighting condition. This paper presents a comprehensive approach to recognize three distinct human activities: "No Presence" (empty room), "Walking", and "Walking + Arm-waving" using the Wallhack1.8k WiFi spectrogram dataset. We propose three key improvements to address the main challenges in WiFi-based HAR. First, to address high performance variance, we implement ensemble learning with five different CNN architectures (Deep CNN, Wide CNN, MobileNetV2, ResNet50V2, and EfficientNetB0). Second, to address the small dataset size limitation, we apply aggressive data augmentation techniques including time-warping, frequency masking, and noise addition. Third, to evaluate real-world generalization capability, we perform cross-scenario evaluation (training on Line-of-Sight and testing on Non-Line-of-Sight) and cross-antenna evaluation (training on Biquad antenna and testing on PIFA antenna). Our ensemble model achieved a test accuracy of 94.87% on the LOS scenario with Biquad antenna, outperforming the best individual model by 0.66%. Data augmentation improved Random Forest performance from 60% to 95%. Cross-scenario evaluation showed minimal accuracy drops of only 1.37% and 2.07%, demonstrating strong generalization capabilities. The results indicate that the proposed approach is robust, reliable, and suitable for real-world deployment in diverse environments with different hardware configurations.

【6】ERP-XTTN: Interpretable Prototype-Guided Cross-Attention for Cross-Subject ERP Classification
Link: https://arxiv.org/abs/2606.02939

Authors: Charlotte Genevier Wyman, Leanne Hirshfield

【7】Combining Statistical Features and Deep Encodings for Rehearsal-Based Class-Incremental Time Series Classification
Link: https://arxiv.org/abs/2606.03292

Authors: Pablo García-Santaclara, Bruno Fernández-Castro, Rebeca Pilar Díaz-Redondo
Abstract: Many systems used in real-world environments require adding new categories and incorporating new information without forgetting what was previously learnt by the classification model. This is known as class-incremental continual learning, and in the case of multivariate time series, is further complicated by the temporal structure of the data. In this paper, we present a novel approach to class-incremental continual learning for multivariate time series classification, based on a dual-stream feature extraction pipeline that combines deep temporal embeddings from a pre-trained, frozen foundation model with statistical features. Evaluated on five benchmark datasets, the proposed system achieves competitive average accuracy across all datasets while maintaining low forgetting rates across all experimental configurations.

【8】Hierarchies of Calibration: Classification meets Regression
标题:校准的层次结构:当分类遇上回归
链接:https://arxiv.org/abs/2606.03245

作者:Johannes Resin,Lu Yang,Tilmann Gneiting
摘要:Concepts of calibration formalize the compatibility between probabilistic predictions and the respective outcomes. In a nutshell, the outcomes ought to be indistinguishable from random draws from the predictive distributions. In this paper, we review, extend, and bridge notions of calibration that have been proposed for classification and regression tasks. Particular emphasis is given to hierarchical relations between the various notions, as they apply to general real-valued data, continuous outcomes, count data, nominal classes, and binary outcomes. To highlight a number of contributions, we introduce the notion of modal calibration for nominal outcomes, we distinguish full, partial, and average calibration in this setting, and we show that double probability integral transform (PIT) calibration is logically independent of previously proposed concepts of calibration for discrete outcomes. Furthermore, we generalize extant results on concepts of calibration that are expressed in terms of properties or functionals of the predictive distributions, such as means, quantiles, or event probabilities. Throughout the paper, we illustrate the concepts and their hierarchical relations in worked examples, and we provide algorithmic tools that support the construction of instructive examples and counterexamples.
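A minimal sketch of the probability integral transform (PIT) idea mentioned in the abstract above, assuming Gaussian predictive distributions and simulated data; it is not taken from the paper. If the outcomes really are draws from the predictive distributions, the PIT values are uniform on [0, 1], so their variance is close to 1/12. The function names and the overdispersed comparison forecast are illustrative assumptions.

```python
import math
import numpy as np

def normal_cdf(x, mean, std):
    """CDF of N(mean, std^2), evaluated elementwise."""
    z = (np.asarray(x, dtype=float) - mean) / (std * math.sqrt(2.0))
    return 0.5 * (1.0 + np.vectorize(math.erf)(z))

def pit_values(y, mean, std):
    """PIT of each outcome under its Gaussian predictive distribution."""
    return normal_cdf(y, mean, std)

rng = np.random.default_rng(0)
mu = rng.normal(size=20000)          # per-case predictive means (simulated)
y = mu + rng.normal(size=20000)      # outcomes drawn from N(mu, 1)

pit_good = pit_values(y, mu, 1.0)    # forecast matches the data-generating law
pit_wide = pit_values(y, mu, 2.0)    # overdispersed forecast (assumed example)

# Uniform PIT has variance 1/12; an overdispersed forecast piles PIT mass
# near 0.5 and therefore shows a smaller variance.
print(pit_good.var(), pit_wide.var())
```

A histogram of `pit_good` would look flat, while `pit_wide` would look hump-shaped; the variance comparison is only a one-number summary of that picture.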

表征(4篇)

【1】GLINT: Sparsely Gated Vision-Language Alignment for Fine-Grained Radiology Representations
标题:GLINT:用于细粒度放射学表示的稀疏门控视觉语言对齐
链接:https://arxiv.org/abs/2606.03180

作者:Jonggwon Park, Seongeun Lee, Junhyun Park, Hannah Yun, Hyunwoong Kim, Sohyun Jeong, Hyewon Kang, Byungmu Yoon, Kyoyun Choi
摘要

【2】Neural Networks Provably Learn Spectral Representations for Group Composition
标题:神经网络可证明地学习群组成的谱表示
链接:https://arxiv.org/abs/2606.02993

作者:Jianliang He,Leda Wang,Fengzhuo Zhang,Siyu Chen,Zhuoran Yang
摘要:Understanding how structured internal structure emerges during neural network training is central to the study of deep learning. We investigate this phenomenon through the group composition task, where a two-layer neural network is trained to predict $g_1 \star g_2$ for elements of a finite group $G$. By lifting the projected gradient flow to the Fourier domain, we demonstrate that the training dynamics are governed by a Riemannian gradient ascent on a representation-theoretic energy functional. We prove that, under random initialization, this flow drives each neuron to converge almost surely toward a single irreducible representation, while the cross-layer Fourier coefficients achieve a rotational rank-one alignment. This framework provides a representation-theoretic account of feature learning and characterizes a novel low-rank compression phenomenon for matrix-valued group representations. Moreover, for Abelian groups, we provide a complete population-level description: random initialization promotes uniform diversification across nontrivial representations and induces Haar-uniform phases, jointly approximating the indicator via a majority-vote mechanism. We further prove that both phase alignment and representation competition emerge with exponential convergence rates.

【3】Learning Coherent Representations: A Topological Approach to Interpretability
标题:学习连贯表示:可解释性的拓扑方法
链接:https://arxiv.org/abs/2606.02841

作者:Sigurd Gaukstad, Melvin Vaupel, Valdemar Kargård Olsen, Erik Hermansen, Benjamin Dunn
备注:To appear in ICML 2026
摘要

【4】QUIVER: Quantum-Informed Views for Enhanced Representations in Large ML Models
标题:QUIVER:用于增强大型ML模型表示的量子知识引导视图
链接:https://arxiv.org/abs/2606.02785

作者:Aritra Bal, Michael Binder, Markus Klute, Benedikt Maier, Michael Spannowsky
备注:9 pages, 1 figure and 2 tables. Accepted as a poster at the AI4Physics Workshop, ICML 2026 (Seoul, South Korea)
摘要

3D|3D重建等相关(2篇)

【1】A 3D Isovist World Model -- Revealing a City's Unseen Geometry and Its Emergent Cross-City Signature
标题:3D Isovist世界模型--揭示城市未见的几何形状及其涌现的跨城市特征
链接:https://arxiv.org/abs/2606.03609

作者:Xuhui Lin, Stephen Law, Nanjiang Chen, Kunyao Li, Tao Yang
摘要

【2】EqGINO: Equivariant Geometry-Informed Fourier Neural Operators for 3D PDEs
标题:EqGINO:用于3D PDE的等变几何信息傅里叶神经算子
链接:https://arxiv.org/abs/2606.03260

作者:Sungwon Kim, Juho Song, Seungmin Shin, Guimok Cho, Sangkook Kim, Chanyoung Park
备注:ICML 2026
摘要

优化|敛散性(4篇)

【1】Analytical Evaluation of DCA Convergence Properties for Minimizing Prediction Functions of Gaussian RBF Support Vector Regression
标题:最小化高斯RBF支持向量回归预测函数的DCA收敛性的解析评估
链接:https://arxiv.org/abs/2606.03559

作者:Yohei Kakimoto, Yuto Omae, Hirotaka Takahashi
备注:29 pages, 5 figures, 2 tables
摘要

【2】P\textsuperscript{2}-DPO: Grounding Hallucination in Perceptual Processing via Calibration Direct Preference Optimization
标题:P\textsuperscript{2}-DPO:通过校准直接偏好优化将幻觉归因于感知处理
链接:https://arxiv.org/abs/2606.03376

作者:Ruipeng Zhang, Zhihao Li, Haozhang Yuan, C. L. Philip Chen, Tong Zhang
摘要

【3】Learning Temporal Causal Structure via Smooth Differentiable Optimization
标题:基于光滑可微优化的时序因果结构学习
链接:https://arxiv.org/abs/2606.03227

作者:Tong Zhao, Ce Guo, Wayne Luk, Emil Lupu, Ray Dipojjwal
摘要

【4】Rethinking Neural Width for Alternating Current Optimal Power Flow Proxies
标题:交流最优潮流代理神经网络宽度的再思考
链接:https://arxiv.org/abs/2606.03125

作者:Dhruvi Khandelwal, Anurag Basistha, Ayushi Jolotia, Parikshit Pareek
摘要

预测|估计(9篇)

【1】Forecasting Conceptual Diffusion in Science: The Case of Quantum Computing
标题:预测科学中的概念扩散:量子计算的案例
链接:https://arxiv.org/abs/2606.03919

作者:Thomas Maillart,Thibaut Chataing,David Dosu,Paul Bagourd,Julian Jang-Jaccard,Alain Mermoud
备注:19 pages, 5 figures, 6 tables. Code and manuscript sources: https://github.com/wazaahhh/breakthroughs-diffusion . An earlier version was presented at the Global Tech Mining Conference (GTM) 2026 (submission #117)
摘要:Understanding and anticipating scientific change requires models that distinguish between endogenous consolidation and exogenous diffusion of scientific concepts. Using the quantum computing subtree of concepts in OpenAlex, we construct a temporally resolved concept co-occurrence network and track each concept pair through its upstream citation lineage and downstream diffusion. We train LightGBM models on distributional and diversity-aware features to predict four outcomes: endogenous reinforcement, exogenous diffusion, their ratio, and diffusion entropy. After controlling for overall publication growth of the scientific body, endogenous reinforcement proves largely unpredictable in the primary quantum-computing benchmark. In contrast, exogenous diffusion and entropy are strongly predictable ($R^2$ up to $0.78$) and are driven by upstream heterogeneity, citation breadth, and distributional dispersion, as shown by SHAP analyses; replications on robotics, advanced materials, and neuro implants confirm that exogenous diffusion remains the top-ranked target across fields ($R^2_{\text{test}} \sim 0.60$-$0.87$), while endogenous predictability rises markedly in neuro implants ($R^2_{\text{test}} = 0.83$), indicating that the quantum-computing asymmetry does not generalise uniformly. Case studies reveal that sharp entropy increases coincide with the opening of new conceptual frontiers, while entropy collapses signal technological convergence or paradigm displacement. These results demonstrate that conceptual diffusion is governed by stable structural regularities embedded in semantic and citation environments. By identifying early diversity-based signals of cross-domain uptake, the approach provides a scalable foundation for anticipatory scientometrics, technology foresight, and innovation-oriented policy analysis in rapidly evolving research fields.

【2】Training a Predictive Coding Network on ImageNet using Equilibrium Propagation
标题:使用平衡传播在ImageNet上训练预测编码网络
链接:https://arxiv.org/abs/2606.03584

作者:Tugdual Kerjan, Rasmus Høier, Benjamin Scellier
摘要

【3】Fast Organic Crystal Structure Prediction with Unit Cell Flow Matching
标题:利用晶胞流匹配快速预测有机晶体结构
链接:https://arxiv.org/abs/2606.03199

作者:Alston Lo, Luka Mucko, Austin H. Cheng, Andy Cai, Alastair J. A. Price, Wojciech Matusik, Alán Aspuru-Guzik
摘要

【4】RESCAST-100K: A Comprehensive Dataset for Cross-Domain Residential Load and Indoor Temperature Forecasting
标题:RESCAST-100K:跨域住宅负荷和室内温度预测的综合数据集
链接:https://arxiv.org/abs/2606.02852

作者:Jainam Dhruva, Yousaf Raza, A.B. Siddique, Simone Silvestri
摘要

【5】A Systematic Evaluation of Current Architectures in Wind Power Forecasting
标题:风电预测中当前架构的系统评估
链接:https://arxiv.org/abs/2606.02849

作者:Vinicius Bortolini, Gilson Adamczuk Oliveira, Erick Oliveira Rodrigues, Matheus Henrique Dal Molin Ribeiro
摘要

【6】Assessing Region-Level EEG Contributions to Cognitive Workload Prediction
标题:评估脑区级脑电对认知负荷预测的贡献
链接:https://arxiv.org/abs/2606.02598

作者:Jacob Wong,Sohan Singh,Prannaya Gupta,Jin Xing Ang,Kritika Johari,U-Xuan Tan
备注:Accepted to EMBC 2026
摘要:Accurate and generalizable estimation of cognitive workload from electroencephalography (EEG) is critical for human-centered and safety-critical systems. Although EEG is widely used for workload assessment, the consistency of region-level EEG contributions across tasks, datasets, and subjects remains unclear. This paper presents a region-level evaluation framework for EEG-based workload prediction in which models are trained and evaluated using features extracted exclusively from electrodes belonging to anatomically defined scalp regions. We perform a large-scale analysis across four publicly available EEG workload datasets spanning diverse task demands, recording hardware, and electrode montages. Region importance is quantified using a model-agnostic, performance-based approach under both mixed-subject and subject-independent evaluation protocols, with results aggregated using a rank-based strategy to ensure robustness across experimental configurations. Across all datasets and subject-independent evaluations, frontal electrode groups outperform the full-scalp baseline by approximately 15-20% in relative rank position while using substantially fewer electrodes. Fronto-central regions exhibit the most stable predictive utility, whereas posterior and occipital regions contribute less consistently across experimental conditions. These findings indicate that workload-relevant EEG information is most consistently retained within frontal and fronto-central electrode groups, supporting the design of efficient and generalizable EEG-based workload monitoring systems.

【7】FinStressTS: A Parametric Synthetic Benchmark for Time-Series Forecasting in Finance
标题:FinStressTS:金融时间序列预测的参数化合成基准
链接:https://arxiv.org/abs/2606.03184

作者:Jiaze Sun,Kelvin J. L. Koa,Ruiyang Ni,Yize Liu,Haonan Chen,Ke-Wei Huang
备注:KDD 2026 (Oral)
摘要:Financial forecasting is difficult due to low signal-to-noise ratios, latent factors, heavy tails, regime shifts, and jumps. Real-world benchmarks offer limited failure attribution: researchers can observe underperformance, but often cannot isolate why because mechanisms are unobservable and entangled. Real financial data reveal only one realized path, making it difficult to assess tail-risk calibration or data efficiency. We introduce FinStressTS, a mechanism-aware synthetic benchmark that links model behavior to controlled structural causes. FinStressTS comprises 30 diagnostic environments around six mechanism families: volatility clustering, multi-scale persistence, heavy-tailed shocks, regime switching, self-exciting jumps, and zero-inflated processes. We evaluate two tasks: point forecasting, using NMAE across five settings, and probabilistic forecasting, using CRPS under known data-generating mechanisms. We benchmark 15 models, from classical methods (HAR, VAR) to Transformer forecasters (PatchTST, iTransformer) and deep probabilistic architectures (DeepAR, TSFlow), and use learning curves to measure sample efficiency. Our evaluation reveals three insights. First, performance is mechanism-dependent: autoregressive and linear models are highly competitive, and often outperform Transformer-based models, in several volatility-, tail-, and jump-driven environments. Second, distributional alignment matters: parametric probabilistic models such as DeepAR calibrate well in stationary settings, while flexible models can help when distributions become multimodal or sparse. Third, neural models often require more data to match simple baselines, with larger gains mainly when learning latent regimes or complex distributions. FinStressTS provides an open framework for diagnosing failure modes and advancing risk-aware forecasting.
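The abstract above scores probabilistic forecasts with CRPS. As a rough illustration (not the benchmark's own code), CRPS can be estimated from forecast samples with the standard identity CRPS(F, y) = E|X - y| - 0.5 E|X - X'|, where X and X' are independent draws from the forecast distribution F. The function name below is an assumption made for this sketch.

```python
import numpy as np

def crps_from_samples(samples, y):
    """Sample-based CRPS estimate for one scalar observation y.

    Uses CRPS = E|X - y| - 0.5 * E|X - X'| with both expectations
    replaced by averages over the supplied forecast samples.
    """
    s = np.asarray(samples, dtype=float)
    accuracy_term = np.mean(np.abs(s - y))
    spread_term = 0.5 * np.mean(np.abs(s[:, None] - s[None, :]))
    return accuracy_term - spread_term

# A forecast concentrated on the observation scores 0; lower is better.
print(crps_from_samples([1.0, 1.0, 1.0], 1.0))
# A degenerate forecast at c reduces CRPS to the absolute error |c - y|.
print(crps_from_samples([2.0, 2.0, 2.0], 5.0))
```

The pairwise term costs O(n^2) memory in this form, which is fine for a few thousand samples; larger ensembles would need a sorted-sample formulation instead.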

【8】Learning to Refine: Spectral-Decoupled Iterative Refinement Framework for Precipitation Nowcasting
标题:学习细化:用于降水临近预报的谱解耦迭代细化框架
链接:https://arxiv.org/abs/2606.02661

作者:Yunlong Zhou,Chen Zhao,Danyang Peng,Fanfan Ji,Xiao-Tong Yuan
备注:21 pages, 10 figures, accepted at ICML 2026
摘要:Accurate precipitation nowcasting is vital for disaster mitigation, but deep learning methods face a key trade-off: regression models produce over-smoothed, spectrally decaying predictions that blur convective details and violate turbulence power laws; diffusion models generate realistic yet unanchored hallucinations lacking physical grounding. We propose Spectral-Decoupled Iterative Refinement (SDIR), a deterministic framework that reformulates nowcasting as progressive frequency-decoupled refinement. SDIR first extracts a stable low-frequency synoptic skeleton, then iteratively refines high-frequency textures under physical constraints, eliminating both blurring and hallucinations. It features a dual-path design: the Synoptic Frequency-Guided Former (SFG-Former) with Scale-Adaptive Transformers for global structure, and the Fourier Residual Refiner (FR-Refiner) with Scale-Conditioned Fourier Neural Operators for fine residuals. A Physically Consistent Power Spectral Density (PCPSD) loss with dynamic masking enforces a turbulence-consistent spectral distribution. Experiments on three benchmarks show SDIR significantly outperforms SOTA methods in spatial accuracy while achieving spectral fidelity competitive with diffusion-based methods, enabling reliable high-resolution operational nowcasting. Code link: https://github.com/RuntimeWarning/SDIR.

【9】Enhancing Protein-Protein Interaction Prediction with Hierarchical Motif-based Multimodal Protein Embedding
标题:利用基于层次基序的多模态蛋白质嵌入增强蛋白质-蛋白质相互作用预测
链接:https://arxiv.org/abs/2606.02629

作者:Zaifei Yang,Samuel Ping-Man Choi,James Kwok
摘要:Protein-protein interactions (PPIs) are essential for many biological processes. However, existing PPI prediction approaches suffer from two major limitations: they overlook the hierarchical organization of proteins, particularly meso-scale motifs that critically regulate PPIs, and fail to effectively integrate sequence, structure, and function modalities. To address these limitations, we propose MMM-PPI, a Hierarchical Motif-based Multi-Modal protein Encoder for PPI Prediction that constructs PPI embeddings in a bottom-up multi-modal manner across three scales. At the micro-scale, we encode three modal residue features; at the meso-scale, a novel multimodal motif encoder aggregates residues into spatially-informed motif embeddings; at the macro-scale, a multimodal protein encoder integrates motifs into protein embeddings by jointly modeling motif importance and inter-modal correlations. The pre-trained encoder can be used off-the-shelf for large-scale PPI prediction. Extensive experiments on multiple PPI datasets show that MMM-PPI outperforms state-of-the-art multi-label PPI prediction models, particularly under challenging data partitions and limited data scenarios. Codes are in https://github.com/yzf-code/MMM-PPI.

其他神经网络|深度学习|模型|建模(30篇)

【1】FFR: Forward-Forward Learning for Regression
标题:FFR:用于回归的前向-前向学习
链接:https://arxiv.org/abs/2606.03927

作者:Xinyang Liu,Xuanyu Liang,Shiqi Ding,Boyang Li,Zhiqiang Que,Jiayang Li,Guosheng Hu
摘要:The Forward-Forward (FF) algorithm offers a computationally efficient and biologically plausible alternative to backpropagation (BP) by training neural networks through purely local, layer-wise optimization. However, FF is inherently designed for classification via contrastive positive-negative sample pairs, and extending it to regression poses fundamental challenges: continuous target spaces lack natural "opposites" for contrastive learning, and the standard goodness function carries no information about target magnitude or ordering. We propose FFR (Forward-Forward for Regression), which is, to our knowledge, the first framework to extend FF to real-world regression and demonstrate competitive performance across diverse real-world datasets. FFR introduces three key innovations: (1) an ordinal competitive goodness function that replaces contrastive pairs with competitive learning between partitioned neuron groups under distance-aware ordinal supervision; (2) a stratified ladder architecture where shallow layers learn coarse ordinal discrimination and deeper layers refine into fine-grained regression, with multi-scale feature aggregation for inter-layer collaboration; and (3) hierarchical prediction with uncertainty estimation, where multi-scale predictors jointly provide robust predictions and prediction confidence as a free lunch. Extensive experimental results show FFR recovers on average 98.6% of BP's accuracy across five real-world regression benchmarks while reducing peak training memory to only 27% of BP's at depth 8 and 8% at depth 32, with per-iteration time around 72% of BP's, and substantially outperforms all BP-free competitors.

【2】Online Learning with Gradient-Variation Interval Regret
标题:具有梯度变化区间遗憾的在线学习
链接:https://arxiv.org/abs/2606.03831

作者:Yan-Feng Xie,Shuche Wang,Peng Zhao,Zhi-Hua Zhou
摘要:This paper investigates non-stationary online learning using the metric of interval regret, which requires an online algorithm to perform well over every time interval. We propose the first online learning algorithm that achieves an interval regret bound scaling with gradient variation, a fundamental measure of the cumulative change in online function gradients, which relates to various problem-dependent quantities and is closely connected to stochastic optimization and other problems. Our method employs a simple and efficient two-layer online ensemble structure that achieves strong theoretical guarantees. Specifically, it enjoys a regret bound that simultaneously adapts to various problem-dependent quantities while also preserving the minimax-optimal rate in the worst case. Moreover, recognizing the challenge of hyperparameter tuning, we introduce a Lipschitz- and smoothness-agnostic variant that automatically adapts to these potentially unknown constants. This is primarily enabled by a novel Lipschitz-adaptive meta algorithm, which may be of independent interest. Beyond interval regret, our method also yields broader implications: it provides versatile bounds for interval dynamic regret, a stronger measure that competes with changing comparators over any interval, and yields the first piecewise characterization for stochastic extended adversarial optimization. Theoretical findings are validated by experiments.

【3】Conformal Language Modeling via Posterior Sampling
标题:通过后验抽样的保形语言建模
链接:https://arxiv.org/abs/2606.03731

作者:Nicolas Emmenegger, Theo X. Olausson, Armando Solar-Lezama, Chara Podimata
摘要

【4】Speedrunning Tabular Foundation Model Pretraining
标题:竞速表格基础模型预训练
链接:https://arxiv.org/abs/2606.03681

作者:Salih Bora Ozturk, Alexander Pfefferle, Frank Hutter
摘要

【5】Spatial Transcriptomics-Guided Alignment Enhances Molecular Profiling in Pathology Foundation Model
标题:空间转录组学引导的比对增强病理学基础模型中的分子分析
链接:https://arxiv.org/abs/2606.03644

作者:Fengtao Zhou, Yingxue Xu, Zhengyu Zhang, Yihui Wang, Zhengrui Guo, Ling Liang, Jiabo Ma, Cheng Jin, Ziyi Liu, Huajun Zhou, Hongyi Wang, Du Cai, Chenglong Zhao, Xi Wang, Can Yang, Yu Wang, Wenbin Li, Feng Gao, Zhe Wang, Zhenhui Li, Xiuming Zhang, Li Liang, Hao Chen
摘要

【6】Low-Frequency Shortcuts in Texture-Driven Visual Learning
标题:纹理驱动视觉学习中的低频捷径
链接:https://arxiv.org/abs/2606.03493

作者:Utku Şirin, Cathy Hou, David Alvarez-Melis, Stratos Idreos
摘要

【7】When Model Merging Breaks Routing: Training-Free Calibration for MoE
标题:当模型合并破坏路由时:MoE的免训练校准
链接:https://arxiv.org/abs/2606.03391

作者:Canbin Huang, Tianyuan Shi, Xiaojun Quan, Jingang Wang, Jianfei Zhang, Qifan Wang
摘要

【8】AugMask: Training Diffusion Models on Incomplete Tabular Data via Stochastic Augmentation and Masking
标题:AugMask:通过随机增强和掩蔽在不完整表格数据上训练扩散模型
链接:https://arxiv.org/abs/2606.03347

作者:Jungkyu Kim, Taeyoung Park, Kibok Lee
摘要

【9】Bayesian Tensor Decomposition with Diffusion Model Prior
标题:具有扩散模型先验的贝叶斯张量分解
链接:https://arxiv.org/abs/2606.03212

作者:Zerui Tao, Qibin Zhao
备注:ICML 2026
摘要

【10】HARVE: Hacking-Aware Reward-Head Vector Editing for Robust Reward Models
标题:HARVE:面向稳健奖励模型的奖励欺骗感知奖励头向量编辑
链接:https://arxiv.org/abs/2606.03131

作者:Shuang Liu, Yuxuan Bo, Qiuyang Zhao, Caiyue Huang, Xiaorong Chen, Yanguang Liu, Mengnan Du
摘要

【11】Synthetic Hallucinations, Real Gains: Hard Negatives from Frontier Models for FIM Hallucination Mitigation
标题:合成幻觉,真实收益:利用前沿模型生成的困难负样本缓解FIM幻觉
链接:https://arxiv.org/abs/2606.03130

作者:Mahdi Erfanian, Nelson Daniel Troncoso, Aashna Garg, Amabel Gale, Xiaoyu Liu, Pareesa Ameneh Golnari, Shengyu Fu
摘要

【12】TiWeaver: Unified Temporal Dynamics Modeling via Contextual Patching
标题:TiWeaver:基于上下文补丁的统一时间动态建模
链接:https://arxiv.org/abs/2606.03121

作者:Zhe Li, Jindong Tian, Hao Miao, Zhi Lei, Chenjuan Guo, Bin Yang
摘要

【13】GuidedBridge: Training-freely Improving Bridge Models with Prior Guidance
标题:GuidedBridge:利用先验引导免训练地改进桥模型
链接:https://arxiv.org/abs/2606.03119

作者:Zehua Chen, Yucheng Yang, Binjie Yuan, Kaiwen Zheng, Jun S. Liu, Jun Zhu
备注:ICML 2026
摘要

【14】Learning to Solve, Forgetting to Retain: Correct-Set Turnover in RLVR
标题:学会求解,忘记保持:RLVR中的正确集更替
链接:https://arxiv.org/abs/2606.03087

作者:Chuanyu Qin, Chenxu Yang, Qingyi Si, Naibin Gu, Peng Fu, Zheng Lin
摘要

【15】Brief Announcement: Generative Markov Model for Distributed Computing Systems
标题:简短公告:分布式计算系统的生成马尔科夫模型
链接:https://arxiv.org/abs/2606.03061

作者:Alfreds Lapkovskis, Ali Beikmohammadi, Sindri Magnússon, Praveen Kumar Donta
备注:Submitted to 40th International Symposium on Distributed Computing (DISC 2026)
摘要

【16】Are we really tilting? The mechanics of reward guidance in flow and diffusion models
标题:我们真的在倾斜吗?流动和扩散模型中奖励引导的机制
链接:https://arxiv.org/abs/2606.02884

作者:Sanjit Dandapanthula, Nicholas M. Boffi
摘要

【17】Cosmos 3: Omnimodal World Models for Physical AI
标题:Cosmos 3:面向物理AI的全模态世界模型
链接:https://arxiv.org/abs/2606.02800

作者:Aditi, Niket Agarwal, Arslan Ali, Jon Allen, Martin Antolini, Adeline Aubame, Alisson Azzolini, Junjie Bai, Maciej Bala, Yogesh Balaji, Josh Bapst, Aarti Basant, Mukesh Beladiya, Mohammad Qazim Bhat, Zaid Pervaiz Bhat, Dan Blick, Vanni Brighella, Han Cai, Tiffany Cai, Eric Cameracci, Jiaxin Cao, Yulong Cao, Mark Carlson, Carlos Casanova, Ting-Yun Chang, Yan Chang, Yu-Wei Chao, Prithvijit Chattopadhyay, Roshan Chaudhari, Chieh-Yun Chen, Junyu Chen, Ke Chen, Qizhi Chen, Wenkai Chen, Xiaotong Chen, Yu Chen, An-Chieh Cheng, Click Cheng, Xiu Chia, Jeana Choi, Chaeyeon Chung, Wenyan Cong, Yin Cui, Magdalena Dadela, Nalin Dadhich, Wenliang Dai, Joyjit Daw, Alperen Degirmenci, Rodrigo Vieira Del Monte, Robert Denomme, Sameer Dharur, Marco Di Lucca, Ke Ding, Wenhao Ding, Yifan Ding, Yuzhu Dong, Nicole Drumheller, Yilun Du, Aigul Dzhumamuratova, Aleksandr Efitorov, Hamid Eghbalzadeh, Naomi Eigbe, Imad El Hanafi, Hassan Eslami, Benedikt Falk, Jiaojiao Fan, Jim Fan, Amol Fasale, Sergiy Fefilatyev, Liang Feng, Francesco Ferroni, Sanja Fidler, Xiao Fu, Vikram Fugro, Prashant Gaikwad, TJ Galda, Katelyn Gao, Yihuai Gao, Wenhang Ge, Sreyan Ghosh, Arushi Goel, Vivek Goel, Akash Gokul, Rama Govindaraju, Jinwei Gu, Miguel Guerrero, Elfie Guo, Aryaman Gupta, Siddharth Gururani, Hugo Hadfield, Song Han, Ankur Handa, Zekun Hao, Mohammad Harrim, Ali Hassani, Nathan Hayes-Roth, Yufan He, Chris Helvig, Cyrus Hogg, Madison Huang
摘要

【18】Improvise, Adapt, Overcome: An On-The-Fly Multifidelity Algorithm for Efficient Machine Learning
标题:即兴、适应、克服:高效机器学习的实时多保真算法
链接:https://arxiv.org/abs/2606.02662

作者:Vivin Vinod,Peter Zaspel
备注:Supplementary Information added as separate PDF
摘要:Machine learning has accelerated quantum chemistry but is hindered by the prohibitive cost of generating high fidelity training data. Multifidelity machine learning (MFML) mitigates this overhead by systematically combining abundant low fidelity data with sparse high fidelity data. In spite of its success, standard MFML schemes rely on pre-defined scaling factors to determine sparse data ratio across fidelities, often generating redundant multifidelity data resulting in a loss of efficiency. Here, we introduce an adaptive on-the-fly multifidelity framework for machine learning that autonomously determines training dataset composition. By dynamically querying training samples at each fidelity, the algorithm saturates model accuracy at lower fidelities before moving up to more expensive reference calculations. We benchmark the novel adaptive-MFML across diverse chemical properties including the computational chemistry gold standard coupled cluster energies, and the more chemically challenging excitation energies. In our numerical experiments we show that our adaptive algorithm reduces data generation costs by up to a factor of 30 compared to single fidelity methods and improves upon standard MFML by up to a factor of 5. The mitigation of data redundancy establishes a high-accuracy low-cost pathway for sustainable cost-aware machine learning in quantum chemistry.

【19】CL-DMDF: Dynamic Multimodal Data Fusion Model Based on Contrastive Learning
标题:CL-DMDF:基于对比学习的动态多模态数据融合模型
链接:https://arxiv.org/abs/2606.02659

作者:Dong Li,Lingling Zhang,Binghao Han,Linlin Ding,Yue Kou
备注:9 pages, 5 figures, 7 tables
摘要:Multimodal data fusion involves integrating and analyzing information from multiple modalities to uncover latent correlations and complementary patterns, thereby enhancing data processing and decision-making. While existing methods for structured multimodal inputs are typically designed around specific tasks and assume fully observed modalities, real-world applications often suffer from uncertain or missing modality inputs due to various factors. Some traditional models overly emphasize local interactions within missing modalities, neglecting the global complementary cues embedded in multimodal representations. To overcome these limitations, we propose a Dynamic Multimodal Data Fusion model based on Contrastive Learning (CL-DMDF). CL-DMDF introduces a novel attention mechanism that operates across both feature and modality dimensions to compute reliable attention scores, effectively reflecting importance at each level. The CL-DMDF further incorporates an entity-centroid contrastive learning module that constructs centroid-based positive samples from entity features to enhance discriminative learning. Additionally, an adaptive fusion module is employed to improve the efficiency and accuracy of dynamic fusion strategies. Extensive experiments conducted on three datasets demonstrate the effectiveness of the CL-DMDF across diverse multimodal fusion tasks.

【20】Oscillatory State-Space Models as Inductive Biases for Physics-Informed Neural PDE Solvers
标题:振荡状态空间模型作为物理信息神经PDE求解器的归纳偏置
链接:https://arxiv.org/abs/2606.02623

作者:Abhishek Chandra,Taniya Kapoor
摘要:Solving time-dependent partial differential equations (PDEs) is an important problem in computational science and engineering. Physics-informed neural networks (PINNs) learn PDE solutions from governing equations. However, accurately capturing temporal evolution remains challenging. Recent sequence-model-based approaches parameterize time evolution using general-purpose sequence models, which capture temporal dependencies but do not explicitly encode the structured dynamics of PDE solutions. In addition, their memory requirements can scale unfavorably with sequence length and resolution, limiting applicability in large-scale or high-dimensional settings. This work introduces a PINN approach that incorporates oscillatory state-space dynamics to represent the modal structure of PDE solutions. The proposed method leverages a linear-oscillator-based temporal evolution, together with a PDE-aware spectral basis in space. This design enables closed-form spatial differentiation and consistent enforcement of boundary conditions. The method is evaluated on forward, inverse, and high-dimensional PDE problems, including cases up to 100 spatial dimensions. The results show improved accuracy and reduced memory usage compared to recent sequence-model-based PINN approaches. Overall, this work highlights the benefits of incorporating structured dynamical priors into the temporal evolution of neural PDE solvers and suggests designing more physics-aligned and computationally efficient PINN architectures.

【21】Pruning Deep Neural Networks via the Marchenko--Pastur Distribution
标题:基于Marchenko--Pastur分布的深度神经网络剪枝
链接:https://arxiv.org/abs/2606.02608

作者:Leonid Berlyand,Theo Bourdais,Houman Owhad,Yitzchak Shmalo
摘要:We study a Marchenko--Pastur (MP) random-matrix approach to pruning deep neural networks with very small post-pruning fine-tuning budgets. The main practical contribution is accuracy retention under short calibration and fine-tuning schedules, rather than a long post-pruning reoptimization pipeline. The theory gives deterministic data-path certificates: if the removed component $R$ has small propagated logit effect $L_s \| R \psi_1(s) \|_\infty$, pruning decreases an elastic-net objective and preserves samples whose dense margin exceeds twice the perturbation. The zero-budget case gives perfect pruning; a prune--restore extension models weight restoration inside a fixed sparse-execution pattern; and an additive $L_2$-regularized model shows admissible random-like components vanish at the training limit, with persistent spikes stabilizing as the MP bulk collapses. Under iid-Gaussian sufficient conditions, the fitted MP edge $\sigma_+$ gives a high-probability layerwise budget signal. On ImageNet-1k, after only three distillation epochs, ViT-B/16 $2{:}4{+}$ToMe reaches $83.41\%$ top-1 ($-1.70$ pp from dense) at $59.81\%$ sparse-execution MAC reduction, with $1.388\times$ best-observed A40 native-$2{:}4$ backend speedup for the same checkpoint and ToMe graph; a separate no-ToMe A100 endpoint gives $2.705\times$. At structured sparsity, ViT-B/16 $6{:}12$ reaches $83.74\%$, ViT-L/16 $8{:}16$ dense+permutation reaches $85.33\%$ ($-0.51$ pp), and ConvNeXtV2-Base $12{:}16$ reaches $86.35\%$ ($-0.37$ pp). For CNNs, ResNet50 $8{:}16$ dense+permutation reaches $75.87\%$ ($-0.26$ pp), and ResNet152d CAST-conv+permutation reaches $81.33\%$ ($-1.53$ pp) at ${\sim}50\%$ MAC accounting with a $1.62\times$ A40 im2col$+2{:}4$ sparse-GEMM audit.
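The margin statement in the abstract above (samples are preserved when the dense margin exceeds twice the perturbation) matches a standard argument: if every logit moves by at most eps in sup-norm, the gap between the top two logits can shrink by at most 2 * eps. The sketch below illustrates only that generic argument; `eps` stands in for the paper's propagated logit-effect bound, and the function names are hypothetical.

```python
import numpy as np

def dense_margin(logits):
    """Gap between the largest and second-largest logit."""
    top2 = np.sort(np.asarray(logits, dtype=float))[-2:]
    return top2[1] - top2[0]

def prediction_certified(logits, eps):
    """True if any logit perturbation with sup-norm <= eps keeps the argmax."""
    return dense_margin(logits) > 2.0 * eps

logits = np.array([3.0, 1.0, 0.5])
rng = np.random.default_rng(0)
eps = 0.9  # assumed perturbation bound; the margin here is 2.0 > 1.8

# Empirical check: random perturbations inside the sup-norm ball
# never change the predicted class when the certificate holds.
unchanged = all(
    np.argmax(logits + rng.uniform(-eps, eps, size=logits.shape)) == np.argmax(logits)
    for _ in range(1000)
)
print(prediction_certified(logits, eps), unchanged)
```

The certificate is sufficient, not necessary: a sample with a smaller margin may still keep its label under a particular perturbation.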

【22】Auditable Climate Risk Intelligence from Fragmented ESG Data: Deterministic Orchestration and Imbalance-Aware Learning for Scope 1-3 Validation
标题:来自碎片化ESG数据的可审计气候风险情报:范围1-3验证的确定性编排和不平衡感知学习
链接:https://arxiv.org/abs/2606.02604

作者:Karan Sehgal,Khawar Naveed Bhatti
备注:22 pages, 7 figures. Preprint
摘要:ESG and climate risk data remain fragmented across heterogeneous Scope 1, Scope 2, and Scope 3 reporting environments, while conventional validation pipelines lack provenance aware auditability, hidden drift detection, and reproducibility oriented governance. This paper proposes a deterministic climate risk intelligence framework integrating single source of truth orchestration, temporal anomaly detection, imbalance aware ensemble learning, and explainability oriented governance for auditable ESG validation. To support open reproducibility, we construct and release a synthetic ESG validation benchmark calibrated against publicly reported characteristics of the GHG Protocol, PCAF, and ISSB standards. The methodology incorporates temporal drift analysis, SMOTE based rare event optimization, ensemble learning, provenance aware orchestration, and TreeSHAP based interpretability for governance inspection and audit reconstruction. We evaluate the framework against statistical classifiers, anomaly detection methods, temporal forecasting baselines, and a threshold based system using classification metrics (recall, F1, ROC AUC), calibration metrics (ECE, Brier score), and a governance oriented audit trace completeness metric measuring the fraction of flagged anomalies for which a deterministic source to escalation provenance chain can be reconstructed. Results are reported as mean and standard deviation across stratified five fold cross validation with paired significance testing. The framework reframes ESG reporting toward deterministic climate risk governance infrastructure supporting reproducibility, explainability, and operational auditability.

【23】Spectral Asymptotics of Neural Network Loss Landscapes: An Exact Decomposition of the Curvature Exponent
标题:神经网络损失景观的谱渐近性:曲率指数的精确分解
链接:https://arxiv.org/abs/2606.02596

作者:Anherutowa Calvo
备注:13 pages, 6 figures, 3 tables. Code and data: https://github.com/9D-Labs/9d-spectral-alignment-decomposition
摘要:The curvature exponent $α$ in $h_k \propto σ_k^α$ -- governing how Hessian eigenvalues scale with gradient singular values -- varies systematically across layer types ($α\approx 2$ for convolutions, $\approx 1$ for transformer attention, $< 1$ for MLP up-projections). Why? We prove the Spectral Alignment Decomposition: $α= 2 + d\logΦ_k / d\logσ_k$, where $Φ_k$ measures alignment between Kronecker factor eigenbases and gradient singular directions. This reduces "why does $α$ vary?" to a geometric question we answer for LayerNorm, residual connections, and softmax heads. The decomposition implies a spectral transfer identity $s = αγ$ linking curvature exponent, effective gradient rank-decay $γ$, and Hessian decay exponent $s$. The identity is algebraic; its empirical content is that $α$ and $γ$, fit on independent data (HVPs vs. SVD), recover $s$ to ~2% median error across 93 layers, five architectures, and three datasets -- with no free parameters. A zeta-function bound on participation ratio shows curvature concentrates onto effectively one direction per layer. As a proof of concept, we derive the architecture-adaptive preconditioner $T(σ;α)$ and show that Spectral Newton -- implementing $T$ in the gradient singular basis -- outperforms AdamW on vision benchmarks where $α\approx 2$.
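
One way to read the spectral transfer identity $s = αγ$: if the gradient singular values decay as a power law in the rank index, $σ_k \propto k^{-γ}$, and the Hessian eigenvalues decay as $h_k \propto k^{-s}$, then $h_k \propto σ_k^α$ forces $s = αγ$. These power-law forms are an assumption about how the abstract's exponents are defined. The sketch checks the algebra on synthetic values, not on the paper's data.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    var = sum((a - mean_x) ** 2 for a in lx)
    return cov / var


alpha, gamma = 2.0, 0.75           # illustrative exponents (made up)
ranks = list(range(1, 51))
sigma = [k ** -gamma for k in ranks]   # assumed gradient spectrum
hess = [s ** alpha for s in sigma]     # curvature law h_k = sigma_k ** alpha

gamma_fit = -loglog_slope(ranks, sigma)
s_fit = -loglog_slope(ranks, hess)
```

On exact power laws the fitted `s_fit` equals `alpha * gamma` up to rounding. The empirical claim in the abstract is that the same relation holds to about 2% median error when the exponents are fitted on independent measurements.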

【24】Human-in-the-Loop Contextual Bandits for Short-Term Rental Dynamic Pricing: Structural Equivalence of Historical Warm-Up and Approval-Gated Live Learning
标题:短期租赁动态定价的人在回路上下文老虎机:历史预热与审批门控实时学习的结构等价性
链接:https://arxiv.org/abs/2606.02595

作者:Oleg Miroshnichenko
摘要:Dynamic pricing in short-term rental (STR) markets presents a distinctive challenge for online learning algorithms: pricing decisions carry significant financial risk, operators require explainability, and market feedback is sparse (one booking outcome per listed night). We introduce the Human-in-the-Loop Gated Bandit (HITL-GB) framework, in which a contextual bandit algorithm generates price recommendations but a human agent retains authority to accept, modify, or reject each recommendation before it is applied. We show that under this approval constraint, historical pricing data -- collected under a prior deterministic policy -- is structurally equivalent to on-policy warm-up data for initialising the bandit's posterior, bypassing the weeks-to-months cold-start period that renders pure online bandit learning impractical in sparse-feedback markets. We formalise the approval-gated reward signal, derive a regularised ridge-regression warm-up procedure from historical episodes, and validate the approach on real STR production data (anonymised urban market, 2 rooms, April 2022 -- April 2026, 1,461 nightly pricing episodes). Our warm-up procedure compresses effective cold-start from ~150 episodes to ~30 episodes when initialising agents from the Hierarchical Factored Thompson Sampling (HF-TS) family. We further argue that the structural equivalence result is domain-agnostic: any high-stakes domain where human approval is legally or operationally required -- including clinical drug dosing, credit origination, content moderation, and radiological diagnosis -- satisfies the same conditions and benefits from the same warm-up strategy. In regulated industries, mandatory human oversight is thus a statistical asset rather than a deployment constraint.
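
The ridge-regression warm-up can be sketched for a plain linear bandit: historical (context, reward) pairs are accumulated into the regularised Gram matrix `A` and the vector `b`, and the posterior mean is the solution of `A theta = b`. This is the textbook linear Thompson sampling initialisation, used here as a stand-in; the HF-TS agents and the approval-gated reward signal from the abstract are not modelled, and all data below are made up.

```python
def ridge_warm_up(contexts, rewards, lam=1.0):
    """Accumulate A = lam * I + sum(x x^T) and b = sum(r x) from past episodes."""
    d = len(contexts[0])
    A = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for x, r in zip(contexts, rewards):
        for i in range(d):
            b[i] += r * x[i]
            for j in range(d):
                A[i][j] += x[i] * x[j]
    return A, b


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


# Made-up historical episodes generated by reward = 2 * x0 - 1 * x1.
CONTEXTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
REWARDS = [2.0, -1.0, 1.0, 3.0]

A, b = ridge_warm_up(CONTEXTS, REWARDS, lam=1.0)
theta_hat = solve(A, b)  # posterior mean used to initialise the bandit
```

With `lam=1.0` the estimate is shrunk toward zero (`[1.5, -0.75]` instead of `[2, -1]`). A Thompson sampling agent would then draw parameters around `theta_hat` with covariance proportional to the inverse of `A`.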

【25】Applying Two-Grid Preconditioner for Subsurface Flow Simulation using Attention-enhanced Hybrid Network to Accelerate Multiscale Discretization in High-contrast Media
标题:使用注意力增强混合网络将双网格预条件子应用于地下流模拟,以加速高对比度介质中的多尺度离散化
链接:https://arxiv.org/abs/2606.02582

作者:Peiqi Li,Jie Chen,Shubin Fu
摘要:In this paper, we study the efficient numerical solution of Darcy equations in strongly heterogeneous media with high-contrast permeability and propose a hybrid framework that combines learning with multiscale numerical methods. The learning component is used for the prediction of multiscale basis functions in the mixed generalized multiscale finite element method (mixed GMsFEM), with the goal of reducing the repeated local computations required in the offline stage. Once these basis functions are predicted, the global system is assembled and the pressure field is computed by a two-grid preconditioned solver. The resulting method accelerates the costly local basis-construction stage while retaining the multiscale discretization and preconditioned iterative structure of the underlying solver. Numerical experiments on two-dimensional heterogeneous Darcy problems show that the proposed framework yields more accurate final pressure reconstruction than several representative learning-based methods and remains stable under strong heterogeneity and high-contrast coefficients. In comparison with the traditional mixed GMsFEM, its main advantage lies in the efficiency of the basis-generation stage, while the quality of the global solve is still ensured by the two-grid preconditioner. These results indicate that accelerating multiscale basis construction through learning, while preserving a mature numerical solver for the global problem, provides a viable approach for high-resolution Darcy-type simulations.

【26】An Asymptotic Theory of Chain-of-Thought in In-Context Learning
标题:上下文学习中思维链的渐近理论
链接:https://arxiv.org/abs/2606.03217

作者:Kaito Takanami,Cengiz Pehlevan
摘要:Chain-of-thought (CoT) reasoning has become a widely used mechanism for eliciting multi-step reasoning in large language models by generating intermediate reasoning steps at inference time. Yet the scaling behavior of generalization with CoT depth remains poorly understood. To address this question, we study a theoretically solvable model of CoT for in-context weight prediction in linear regression, where test-time reasoning is represented as an iterative refinement of the weight-parameter estimate. Using tools from random matrix theory under high-dimensional asymptotics, we derive an exact formula for the generalization error as a function of reasoning depth, pretraining data amount, and context length. Our analysis reveals a sharp phase transition separating exponential and polynomial improvement, saturation, and overthinking, and characterizes how the optimal reasoning depth scales. We further show that deeper reasoning is most effective with sufficiently rich pretraining and in-context information, whereas limited pretraining or context makes longer reasoning prone to error amplification or saturation. We also validate these predictions through experiments on fully learned linear attention and softmax attention models. Our results provide a unified theoretical account of how test-time CoT depth affects generalization.
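
The idea of reasoning depth as iterative refinement of a weight estimate can be sketched with plain gradient descent on the in-context least-squares loss, treating each step as one reasoning step. The update rule, step size, and data below are assumptions for illustration; the abstract does not specify the paper's refinement rule.

```python
def refine(w, xs, ys, lr):
    """One refinement step: a gradient step on the mean squared context error."""
    n, d = len(xs), len(w)
    grad = [0.0] * d
    for x, y in zip(xs, ys):
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i in range(d):
            grad[i] += residual * x[i] / n
    return [wi - lr * g for wi, g in zip(w, grad)]


def squared_error(w, w_true):
    return sum((a - b) ** 2 for a, b in zip(w, w_true))


w_true = [1.0, -2.0]                                  # made-up task weights
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
ys = [sum(wi * xi for wi, xi in zip(w_true, x)) for x in xs]  # noise-free labels

w = [0.0, 0.0]
errors = []
for depth in range(20):
    errors.append(squared_error(w, w_true))
    w = refine(w, xs, ys, lr=0.5)
```

In this noise-free toy case the error shrinks geometrically with depth. The saturation and overthinking regimes described in the abstract would need noisy or limited context, which this sketch omits.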

【27】Theoretical Aspects of Lie Groupoid and Lie Algebroid Equivariant Convolutional Neural Networks
标题:李群胚和李代数胚等变卷积神经网络的理论方面
链接:https://arxiv.org/abs/2606.02758

作者:Michael Astwood
备注:28 pages, 2 figures. Preliminary version. Comments and criticism welcome!
摘要:We introduce Lie groupoid equivariant neural networks as a specialization of recently proposed topological category-equivariant neural networks to the differentiable setting. Lie groupoid equivariant neural networks are composed from Lie groupoid lifting convolutions and Lie groupoid convolution layers, and we show how for suitable Lie groupoids they are equivalent to certain Lie algebroid-equivariant neural networks. We additionally describe groupoid invariant global pooling as a generalization of group invariant global pooling. Furthermore, we show that each of the aforementioned layers is a special case of recently introduced admissible category-equivariant layers by demonstrating that they define continuous natural transformations between continuous feature functors.

【28】Coherent Swap Regret and Channel-Proof Learning
标题:相干交换遗憾与抗信道学习
链接:https://arxiv.org/abs/2606.02655

作者:Sohail Sarkar
备注:23 pages
摘要:External regret certifies stability only against replacing one's behavior by a fixed alternative. In a quantum game, this misses a natural physical move: a player can apply a local completely positive trace-preserving (CPTP) map to the state it actually received or prepared. We introduce coherent swap regret as the regret benchmark against all such local CPTP deviations, and give an algorithm achieving $O(\sqrt{dT\log d})$ coherent swap regret via entropic mirror ascent on the CPTP Choi slice with a fixed-point play rule. The main result is a three-level deviation-class landscape. Replacement channels recover ordinary external regret at rate $Θ(\sqrt{T\log d})$. Unital channels, including unitary deviations and mixtures of unitaries, have zero minimax regret. Deterministic measurement-and-preparation channels already force $Ω(\sqrt{dT\log d})$ regret in the moderate-horizon regime, and this rate is also sufficient for all CPTP deviations. Thus the hardness comes from non-unital use of the recommendation register, not from quantum coherence alone. As an application, decentralized full-information learning in finite quantum games reaches an $\varepsilon$-approximate separable quantum correlated equilibrium after $T=O(\max_i d_i\log d_i/\varepsilon^2)$ rounds. We identify these equilibria with channel-proofness of mediated quantum recommendation protocols, give an SDP audit for local CPTP exploitability applicable to arbitrary finite-dimensional states, and include a probing-bandit extension with pseudo-regret $O(d^{4/3}T^{2/3}(\log d)^{1/3})$ under Haar-random pure-state probes.

【29】Target Updates May Stabilize Linear Q-Learning: Periodic and Soft Dynamics
标题:目标更新可能稳定线性Q学习:周期性和软动态
链接:https://arxiv.org/abs/2606.02645

作者:Donghwan Lee
摘要:Periodic target updates in Q-learning and soft target updates in actor-critic methods are empirically well established stabilization mechanisms, but their precise theoretical explanation is still incomplete. This paper gives a rigorous and exact analysis of these mechanisms for Q-learning with linear function approximation (linear Q-learning) using the exact switched linear system (SLS) dynamics induced by the Bellman maximum and the joint spectral radius (JSR) of the resulting switching matrix families. Although linear Q-learning can fail to converge in general, we prove that, under explicit spectral and step-size conditions, periodic hard target updates and soft target updates can guarantee convergence to the exact projected Q-Bellman solution. The main analysis is carried out for deterministic linear Q-learning, where the target-update mechanism is most transparent. Once the corresponding JSR certificate is established for the mean recursion, the stochastic reinforcement-learning setting can be treated by replacing deterministic modes with sampled stochastic modes and adding the corresponding stochastic-noise analysis.
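
A periodic hard target update can be sketched in the simplest special case, where one-hot features make linear Q-learning tabular and the deterministic mean recursion updates every state-action pair at once. The two-state MDP, discount, step size, and update period are made up, and the sketch does not reproduce the paper's switched-linear-system or JSR analysis.

```python
GAMMA = 0.5

# (state, action) -> (reward, next_state); a made-up deterministic MDP.
MDP = {
    (0, 0): (0.0, 0),
    (0, 1): (1.0, 1),
    (1, 0): (0.0, 0),
    (1, 1): (2.0, 1),
}


def greedy_value(q, state):
    """Bellman maximum over the two actions."""
    return max(q[(state, action)] for action in (0, 1))


def run(num_steps, period, step_size):
    q = {sa: 0.0 for sa in MDP}
    target = dict(q)
    for t in range(num_steps):
        if t % period == 0:
            target = dict(q)  # hard target update every `period` steps
        for sa, (reward, next_state) in MDP.items():
            td_target = reward + GAMMA * greedy_value(target, next_state)
            q[sa] += step_size * (td_target - q[sa])
    return q


q_final = run(num_steps=200, period=5, step_size=0.5)
```

Between target updates the online estimate moves toward a fixed regression target, so each period behaves like a damped step of value iteration. For this MDP the iterates approach the optimal values 1.5, 3, 1.5, and 4.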

【30】Position: Prioritize Identifying Structure, Not Complex Models, for Scientific Discovery
标题:立场:优先识别结构,而不是复杂模型,以进行科学发现
链接:https://arxiv.org/abs/2606.02632

作者:Tyler H. McCormick
备注:Will appear as a position paper in ICML
摘要:Modern Machine Learning (ML) and Artificial Intelligence (AI) models, especially large language models (LLMs), are increasingly used to generate scientific hypotheses and mechanistic explanations from observational data. This position paper argues that in the high-dimensional proxy regimes where modern ML excels, mechanistic learning is generically underdetermined: many incompatible mechanisms induce essentially the same observational relationships on the support of the data, so predictive success and coherent explanations are insufficient evidence of mechanism discovery. This underdetermination becomes uniquely hazardous with LLMs, which tend to collapse large equivalence classes of explanations into a single fluent narrative. This paper proposes concrete standards for "mechanistic ML," and argues these norms are necessary if LLM-centered workflows are to support science rather than merely simulate it.

其他(68篇)

【1】Neuron Populations Exhibit Divergent Selectivity with Scale
标题:神经元群体的选择性随规模出现分化
链接:https://arxiv.org/abs/2606.03990

作者:Amil Dravid,Yasaman Bahri,Alexei A. Efros,Yossi Gandelsman
备注:Project page and code: https://avdravid.github.io/rosetta-neuron-scaling/
摘要:We investigate whether neuron populations within neural networks evolve predictably with scale, extending scaling laws beyond macroscopic observables such as loss. To probe this question, we study Rosetta Neurons, a previously characterized class of neurons whose activation patterns are similar across independently trained models (Dravid et al., 2023). In separate analyses of language models up to 30B parameters and vision models up to 5B parameters, we observe that the population of Rosetta Neurons follows a sublinear power law in model size, growing in absolute number but occupying a shrinking fraction of the total neuron count. We further observe a Neuron Polarization Effect: Rosetta Neurons become more selective and increasingly monosemantic with scale, separating from a growing non-Rosetta population that remains less selective. An analytical model balancing feature utility against limited neuron capacity explains the sublinear power-law scaling and this polarization effect. Finally, we find that Rosetta Neurons become more domain-specialized with scale and illustrate their selectivity through a targeted data-filtering case study for continued pretraining. Our results point to a scaling law for interpretable, shared neuron-level structure, linking model size to systematic changes in neuron universality, selectivity, and specialization.

【2】Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill
标题:Skill-RM:通过代理技能统一异类评估标准
链接:https://arxiv.org/abs/2606.03980

作者:Tao Chen,Gangwei Jiang,Pengyu Cheng,Siyuan Huang,Yihao Liu,Jingwei Ni,Jiaqi Guo,Mengyu Zhou,Kai Tang,Junling Liu,Qinliang Su,Xiaoxi Jiang,Guanjun Jiang
摘要:Reward models (RMs) provide critical feedback signals for LLM post-training, notably in reinforced fine-tuning (RFT) and reinforcement learning (RL) pipelines. However, current reward evaluation relies on heterogeneous criteria such as rule-based verifiers, ground-truth references, procedural checklists, and complex rubrics, where a unified mechanism to integrate all types of evidence remains unexplored. To this end, we propose Skill Reward Model (Skill-RM), a unified framework that reformulates reward modeling as the execution of a reusable Reward-Evaluation Skill. By treating reward computation as a structured agentic task, Skill-RM provides a consistent interface to orchestrate heterogeneous resources, dynamically selecting and aggregating evidence tailored to the specific requirements of each input. This approach enables the reward model to move beyond static evaluation, ensuring consistency and transparency across diverse tasks. Extensive experiments on reward benchmarks and downstream applications, including best-of-N selection and reinforcement learning, demonstrate that Skill-RM consistently outperforms traditional judge baselines. Our findings suggest that Skill-RM not only provides a unified solution for reward modeling but also achieves superior performance through the strategic and dynamic orchestration of evidence. The code is at https://github.com/Qwen-Applications/Skill-RM.

【3】Formalizing the Binding Problem
标题:形式化绑定问题
链接:https://arxiv.org/abs/2606.03976

作者:Lianghuan Huang,Yihao Li,Saeed Salehi,Yingshan Chang,Ansh Soni,Konrad P. Kording
备注:Accepted to ICML 2026
摘要:Representations of the world, arguably, contain information about features (e.g. something is blue, something is a circle) but also information about which features are part of the same object (e.g. the circle is blue), which we call binding information. Any system with the ability to understand scenes with multiple objects must be able to solve the binding problem: it needs to know which features belong together. However, despite work showing that Vision Transformers (ViTs) know which patches belong together, it is not known whether current deep learning models learn binding information for features. One might expect little binding information: after all, misattributing features to the wrong objects is a common failure of ViT-based architectures, especially in scenes with objects sharing features. Here we formalize the binding problem with an information-theoretic approach, and introduce a probing method to measure binding information in model representations. We perform experiments on ViTs, measuring binding from different components of the architecture, such as the image summary token [CLS] or the spatial tokens. We use datasets with different binding challenges, such as feature sharing, occlusion, and natural features, while comparing the performance of several pre-trained ViTs. Overall, our research demonstrates binding as a key ingredient to strong visual recognition and reasoning.

【4】VLESA: Vision-Language Embodied Safety Agent for Human Activity Monitoring
标题:VLESA:用于人类活动监控的视觉语言具身安全智能体
链接:https://arxiv.org/abs/2606.03954

作者:Hanjiang Hu,Yiyuan Pan,Jiaxing Li,Xusheng Luo,Alexander Robey,Na Li,Yebin Wang,Changliu Liu
备注:18 pages, 5 tables, 5 figures
摘要:As AI systems increasingly assist humans in physical tasks, ensuring safety becomes paramount -- physical actions carry immediate and irreversible consequences that digital errors do not. We introduce the Vision-Language Embodied Safety Agent (VLESA), a framework that monitors human activities from egocentric video and triggers real-time safety interventions when dangerous actions are predicted. VLESA addresses intent-dependent safety where identical actions can be safe or dangerous depending on context. A dataset pairing egocentric frames with goal-conditioned safety annotations is introduced, enabling a goal-conditioned safety Q-filter trained via GRPO that evaluates actions with respect to inferred intent without retraining. On top of that, an intent-action prediction agent is proposed to jointly infer goals and predict future actions from video. On the ASIMOV-2.0 benchmark, VLESA achieves higher intervention accuracy at the exact ground-truth frame compared to baselines, while the GRPO-trained Q-filter improves action safety by over 41 percentage points through goal-conditioned constrained decoding. Code is available at https://github.com/HanjiangHu/VLESA.

【5】MLSkip: Data Skipping for ML Filters via Lightweight Metadata
标题:MLSkip:通过轻量级元数据实现ML过滤器的数据跳过
链接:https://arxiv.org/abs/2606.03946

作者:Mihail Stoian,Mark Gerarts,Pascal Ginter,Andreas Zimmerer,Jan Van den Bussche,Andreas Kipf
摘要:Database vendors recently released AI functions that can be used in filter predicates. As such functions often rely on costly, black-box ML models, they unveil new data management challenges. Concretely, traditional data skipping techniques for integer and string data fail to be applicable to the new filter type. Indeed, there is no known mechanism for pruning non-qualifying row groups, e.g., when reading files from blob storage. In this work, we initiate the study of data skipping techniques for ML filters. We make the case that Parquet's default min-max metadata is enough to enable pruning. To this end, we draw connections to two lines of research: (i) the recently proposed query language for ML models and (ii) neural network verification. Our preliminary results on ReLU architectures show that on tables from TPC-H and TPC-DS, the average pruning effectiveness for filters of selectivity below 0.1% amounts to 27.4%. Finally, inspired by research on spatial joins, we propose an enhanced metadata structure: a size-bounded 2D convex hull that verification tools can make better use of, increasing the pruning effectiveness to 38.31%, while occupying at most 45 bytes per row group and column pair. We observe an end-to-end speedup of 1.07$\times$ over PyTorch in DuckDB.
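
To see why min-max metadata can be enough for pruning, consider interval bound propagation, one standard neural network verification technique: the per-column minimum and maximum of a row group define a box, the box is pushed through the ReLU network, and the row group is skipped when the output upper bound cannot satisfy the filter. The tiny network, the filter form `model(x) >= threshold`, and all names are illustrative assumptions, not the paper's tooling.

```python
def affine_bounds(lo, hi, weights, bias):
    """Propagate an axis-aligned box through x -> W x + b."""
    out_lo, out_hi = [], []
    for row, b in zip(weights, bias):
        low, high = b, b
        for w, a, c in zip(row, lo, hi):
            # A positive weight maps min to min; a negative weight swaps them.
            low += w * (a if w >= 0 else c)
            high += w * (c if w >= 0 else a)
        out_lo.append(low)
        out_hi.append(high)
    return out_lo, out_hi


def output_bounds(col_min, col_max, layers):
    lo, hi = list(col_min), list(col_max)
    for index, (weights, bias) in enumerate(layers):
        lo, hi = affine_bounds(lo, hi, weights, bias)
        if index < len(layers) - 1:  # ReLU on hidden layers only
            lo = [max(0.0, v) for v in lo]
            hi = [max(0.0, v) for v in hi]
    return lo, hi


def can_skip(col_min, col_max, layers, threshold):
    """Skip a row group if the filter `model(x) >= threshold` cannot hold."""
    _, hi = output_bounds(col_min, col_max, layers)
    return hi[0] < threshold


def forward(x, layers):
    """Exact model output for one row, used to sanity-check the bounds."""
    for index, (weights, bias) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
        if index < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


# Made-up two-layer ReLU model over two columns.
LAYERS = [([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]), ([[1.0, 1.0]], [0.0])]
bounds = output_bounds([0.0, 0.0], [1.0, 1.0], LAYERS)
```

The bounds are sound but can be loose, since a box ignores correlations between columns. That looseness is a plausible motivation for richer metadata such as the 2D convex hull mentioned in the abstract.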

【6】q0: Primitives for Hyper-Epoch Pretraining
标题:q0:超纪元预训练的基本要素
链接:https://arxiv.org/abs/2606.03938

作者:Bishwas Mandal,Shmuel Berman,Akshay Vegesna,Samip Dahal
摘要:Multi-epoch training is becoming the standard now that compute is growing faster than the supply of high-quality text. But pretraining a single model saturates within a few passes, long before the compute budget is exhausted. We argue this calls for a conceptual shift from training a single model toward exploring a population of models and aggregating their predictions. We introduce hyper-epoch pretraining (q0), which turns a multi-epoch budget into a population of diverse models whose combined predictions reach a lower validation loss than a single refined model. q0 reduces to three core primitives. A cyclic schedule with anti-correlated learning rate and weight decay collects diverse models from a few parallel trajectories. Chain distillation trains each model against its predecessor so that model quality compounds across the population. A learned prior, fit on a held out set, selects and weights members for any inference budget. On a 1.8B-parameter model trained on 100M FineWeb tokens, q0 matches a strong 256-epoch ensemble baseline using only ${\sim}56$ epochs (${\sim}4.6\times$ fewer), or ${\sim}67$ epochs (${\sim}3.8\times$ fewer) when matched to the baseline's ensemble size, and continues to improve beyond it. These gains reach cumulative ${\sim}12.9\times$ data efficiency under the Slowrun setting and transfer to downstream benchmarks. Crucially, the optimal allocation shifts with the budget, so we give prescriptive recipes for how to spend a given epoch budget to maximize generalization, from a single epoch up to the largest budgets.
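
A cyclic schedule with anti-correlated learning rate and weight decay can be sketched as two cosine waves in opposite phase: the learning rate peaks exactly when the weight decay is at its minimum. The cosine shape, the cycle length, and the numeric ranges are assumptions for illustration; the abstract does not give the schedule's functional form.

```python
import math


def cyclic_schedule(step, cycle_len, lr_min, lr_max, wd_min, wd_max):
    """Return (learning_rate, weight_decay) for a step, in opposite phase."""
    phase = (step % cycle_len) / cycle_len
    wave = 0.5 * (1.0 + math.cos(2.0 * math.pi * phase))  # 1 at start, 0 mid-cycle
    lr = lr_min + (lr_max - lr_min) * wave
    wd = wd_max - (wd_max - wd_min) * wave
    return lr, wd


# Hypothetical ranges, for illustration only.
schedule = [cyclic_schedule(s, 100, 1e-4, 1e-3, 0.01, 0.1) for s in range(300)]
```

Each cycle ends in a low learning rate, high weight decay phase, which is a natural point to collect one population member before the next cycle restarts.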

【7】Correcting Neural Operator Spectral Bias via Diffusion Posterior Sampling with Sparse Observations
标题:通过具有稀疏观测的扩散后验采样来纠正神经算子谱偏差
链接:https://arxiv.org/abs/2606.03936

作者:Niccolò Perrone,Fanny Lehmann,Stefania Fresca,Filippo Gatti
摘要:Neural operator surrogates (NO) approximate PDE solutions orders of magnitude faster than numerical solvers, but suffer from spectral bias: high-frequency content is systematically attenuated, limiting reliability where fine-scale structure matters. Sparse sensor measurements of the field are often available too, offering pointwise accuracy without spectral distortion but covering only a small fraction of the domain. We address this by treating NO predictions as auxiliary observations in a diffusion posterior sampling framework. Our method, FreqNO-DPS (https://github.com/niccoloperrone/FreqNO-DPS), combines an unconditional score-based diffusion prior, trained on high-fidelity simulations, with diffusion posterior sampling (DPS) conditioned on sparse observations and guided by a frozen neural operator. Naive integration reintroduces the surrogate's spectral bias; we resolve this with a closed-form, spectrally shaped guidance score that weights the surrogate by its frequency-dependent accuracy and needs no denoiser backpropagation. A distribution-free analysis bounds the approximation error across the frequency-diffusion-time plane and shows the guidance's frequency dependence is preserved regardless of distributional assumptions. On 3D elastic wavefield prediction at 5% and 2% sensor coverage, the method reaches near-zero spectral bias across all bands, where both the surrogate and sensor-only DPS show systematic high-frequency attenuation. Isotropic guidance, the natural baseline, improves pointwise accuracy but carries the bias into the posterior nearly intact, confirming that frequency-dependent calibration is essential, not merely beneficial. The framework needs only paired surrogate/reference data and exploits no problem-specific structure beyond the residual's approximate spectral diagonality, verifiable for new surrogates via the coherence diagnostic we provide.

【8】Quadratic integrate-and-fire neurons exhibit less fragmented loss landscapes and outperform leaky integrate-and-fire neurons in spike-based gradient descent
标题:二次积分发放神经元表现出更少碎片化的损失景观,并且在基于脉冲的梯度下降中优于泄漏积分发放神经元
链接:https://arxiv.org/abs/2606.03935

作者:Carlo Wenig,Raoul-Martin Memmesheimer,Christian Klos
备注:9 pages, 5 figures (main part)
摘要:The ability to train spiking neural networks is essential for modeling biological neural networks as well as for neuromorphic computing. However, for the extensively used leaky integrate-and-fire (LIF) neurons, arbitrarily small parameter changes can induce spike (dis)appearances that disrupt subsequent activity, leading to unstable neural representations and permanently silent neurons during exact spike-based gradient descent. Recent work shows that a class of neuron models, which includes the quadratic integrate-and-fire (QIF) neuron, avoids these discontinuities and enables continuous and even smooth spike-based gradient descent. However, it remains unclear whether these advantages translate into practice. Here, we demonstrate that they do so via a controlled comparison between networks of LIF and QIF neurons on the popular Spiking Heidelberg Digits dataset. Specifically, in a first step, we perform a thorough hyperparameter search to optimize both models, revealing a clear performance advantage of QIF neurons. In a second step, we visualize the loss and gradient landscapes. Consistent with their inferior performance, we find that the loss landscapes of LIF neurons, which are discontinuous, appear more fragmented and the related gradients more erratic. An analysis of the landscapes of single samples indicates that these features arise from changes in the temporal order of spikes, which often cause disruptive spike (dis)appearances. Overall, our results advocate replacing LIF neurons with neuron models exhibiting continuous spiking dynamics, such as QIF neurons, for gradient descent training.

【9】MAdam: Metric-Aware Multi-Objective Adam
标题:MAdam:度量感知的多目标Adam
链接:https://arxiv.org/abs/2606.03904

作者:Fengbei Liu,Rachit Saluja,Sunwoo Kwak,Ruibo Wang,Ruining Deng,Heejong Kim,Johannes C. Paetzold,Mert R. Sabuncu
摘要:Multi-objective optimization (MOO) underlies many machine learning problems, yet MOO solvers across the loss-balancing, gradient-balancing, and Pareto-based families almost universally hand their reconciled directions to Adam (Kingma and Ba, 2015). We show this coupling introduces two systematic gaps between the solver's intent and the optimizer's execution. The first is a weighting mismatch: Adam's second-moment denominator entangles the time-varying preference vector with gradient statistics, marginalizing the preference into a history average and collapsing distinct Pareto trade-offs toward a near-uniform mixture. The second is a geometric mismatch: Adam's adaptive metric distorts the Euclidean geometry MOO solvers assume, turning aligned objectives into apparent conflicts. To resolve both jointly, we introduce MAdam (Metric-Aware Multi-Objective Adam), a drop-in wrapper that leaves both solver and optimizer unchanged. MAdam preconditions the reconciled direction by the preference-conditioned curvature of the scalarized objective; on this whitened input, Adam's second moment collapses to identity, so the realized update is governed by the preference-conditioned metric. Across multi-task learning, Pareto-front recovery, physics-informed neural networks, and medical imaging, MAdam consistently improves over Adam for every solver family.

【10】Attribution via Distributional Paths for Information Revelation
标题:通过信息披露的分布路径归因
链接:https://arxiv.org/abs/2606.03885

作者:Kieran A. Murphy,Shameen Shrestha
备注:Code: https://github.com/murphyka/Reveal-IG
摘要:Feature attribution methods explain predictions by assigning importance scores to input features. Path-based methods such as Integrated Gradients are especially appealing because they satisfy completeness: attributions sum to the change in model output between a reference state and the input. Yet most path methods define this trajectory in input space, explaining a model through pointwise perturbed inputs along a chosen path. An input-space path integrates the model's raw response at each point it passes through, with no control over the resolution at which a feature is queried; the early, baseline-adjacent part of the trajectory contributes to the explanation on equal footing with the input itself. Here, we lift path attribution from input space to a space of structured probe distributions around the example of interest, and call our method Reveal-IG. Rather than traversing raw input values, Reveal-IG progressively reveals information about the input and attributes changes in the model's expected output along this distributional path. The result is a path-attribution framework that retains completeness with respect to the expected model response, and naturally accommodates multiscale image probes and feature-wise uncertainty in tabular data. Synthetic diagnostics show that Reveal-IG avoids path artifacts that affect input-space methods, and across ImageNet classification and tabular regression it produces stable, signed attributions -- leading on metrics that use attribution sign while remaining competitive on the rest.
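
The completeness property can be checked on the standard input-space Integrated Gradients that this abstract takes as its starting point. The sketch uses a made-up two-feature function with an analytic gradient and a straight-line path from a zero baseline; it is not an implementation of Reveal-IG, whose distributional path is not specified in the abstract.

```python
def f(x):
    """Toy model: an interaction term plus a quadratic term."""
    return x[0] * x[1] + x[0] ** 2


def grad_f(x):
    """Analytic gradient of f."""
    return [x[1] + 2.0 * x[0], x[0]]


def integrated_gradients(x, baseline, steps=100):
    """Midpoint-rule approximation of IG along the straight line to x."""
    attributions = [0.0] * len(x)
    for k in range(steps):
        t = (k + 0.5) / steps
        point = [b + t * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_f(point)
        for i in range(len(x)):
            attributions[i] += (x[i] - baseline[i]) * g[i] / steps
    return attributions


x, baseline = [1.0, 2.0], [0.0, 0.0]
attributions = integrated_gradients(x, baseline)
```

The attributions come out near 2 and 1, and their sum matches `f(x) - f(baseline) = 3`. Every point on the path, including those near the baseline, carries equal weight in the integral, which is the behaviour the abstract argues against.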

【11】Visual Instruction Tuning Aligns Modalities through Abstraction
标题:视觉指令微调通过抽象对齐模态
链接:https://arxiv.org/abs/2606.03871

作者:Luis Palacios,Lorenzo Basile,Diego Doimo,Alberto Cazzaniga
摘要:Visual instruction tuning effectively adapts a pre-trained Large Language Model (LLM) to process image information alongside text. Yet, it remains unclear how visual features are embedded into the layer-wise hierarchy of abstractions of the LLM backbone. Across a diverse set of vision-language architectures, we show that instruction tuning primarily serves as a bridge, embedding visual features directly into the intermediate semantic layers of the LLM, bypassing the early layers devoted to unimodal processing. With probing analyses and causal interventions, we show that these intermediate layers are the semantic core of vision-language processing and play a critical role in the performance on a broad set of multimodal benchmarks. In addition, by comparing the geometry of semantically equivalent visual and textual representations, we find that fine-tuning extends and strengthens the existing abstraction phase, aligning visual features with pre-existing textual ones. Finally, we confirm the functional role of this localized alignment by restricting fine-tuning to intermediate layers alone: this strategy preserves the performance of full fine-tuning on vision-centric benchmarks while reducing training time. Our results suggest that multimodal integration is a localized phenomenon driven by the repurposing of the internal abstraction engine of the LLM.

【12】Two-Action Apple Tasting with Switching Costs
标题:具有转换成本的双动苹果品尝
链接:https://arxiv.org/abs/2606.03851

作者:Tommaso Cesari,Roberto Colomboni
摘要:We study the two-action apple-tasting problem with switching costs against an oblivious adversary. In an equivalent normalized formulation, at each round the learner chooses between a revealing action and a blind action: the revealing action gives reward $0$ and reveals the hidden value $x_t\in[-1,1]$ of the blind action; the blind action gives reward $x_t$ but reveals nothing. The learner pays one unit whenever they switch actions, and regret is measured against the best fixed action in hindsight. General feedback-graph algorithms with switching costs give $\widetilde O(T^{2/3})$ regret guarantees for this problem. The two-action apple-tasting graph was the natural candidate for the missing $Ω(T^{2/3})$ obstruction in the switching-cost classification: such a lower bound would have transferred to a large family of still-unclassified feedback graphs. We prove that this obstruction is not there: the oblivious minimax expected regret for this problem satisfies \[ \frac{1}{2\sqrt3}\cdot\sqrt T \le R_T^\star \le 2\sqrt{3}\cdot \sqrt{T}. \]

【13】Finding Needles in the Haystack: Transductive Active Labeling in Ecology
Title: Finding Needles in the Haystack: Transductive Active Labeling in Ecology
Link: https://arxiv.org/abs/2606.03821

Authors: Rupa Kurinchi-Vendhan, Sara Beery
Abstract: Active learning is now standard practice in labeling ecological data, enabling ecologists to quickly process large volumes of field data to understand and monitor natural environments. Current practices evaluate active learning inductively, estimating predictive performance on a held-out test set. We argue that this evaluation is misaligned with most ecological tasks, where the goal is to transductively label an entire pool of data as efficiently as possible. We demonstrate that ignoring the human-in-the-loop underestimates the importance of continuing to label, particularly for classes in the long tail which may be of disproportionate ecological importance (rare species, uncommon behaviors, etc.). Our analysis shows that, for this long tail, the transductive objective shifts importance from prediction to discovery: the true challenge becomes finding "needles in the haystack," examples of rare classes that are embedded within dense regions of abundant classes in the latent geometry, which we quantify with a novel metric of sampling difficulty. Finally, to translate these insights to practical ecological workflows, we propose a conservative hybrid stopping criterion inspired by ecological rarefaction curves, and show that combining predictive performance with discovery criteria reduces premature stopping on long-tailed pools, improving rare-class recovery when discovery, not classification, is the limiting factor.

【14】TreeFlash: Parallel AR-Approximation for Faster Speculative Decoding
Title: TreeFlash: Parallel AR-Approximation for Faster Speculative Decoding
Link: https://arxiv.org/abs/2606.03819

Authors: Peer Rheinboldt, Frédéric Berdoz, Roger Wattenhofer
Abstract: One-shot block drafters for speculative decoding generate the full draft in a single forward pass, achieving strong throughput by eliminating sequential token generation. However, they predict each draft token conditioned only on the prefix context, with no dependence on previously drafted tokens. This non-autoregressive conditioning causes the drafter's distribution to diverge from the verifier's true autoregressive distribution as draft depth grows. This problem becomes more severe in tree-based drafting, where distinct branches are forced to share the same marginal distribution for subsequent tokens. We propose TreeFlash, which addresses this by incorporating an MLP layer conditioned on the drafter's hidden state and the previous token to approximate an autoregressive distribution. TreeFlash retains the $\mathcal{O}(1)$ decoding time complexity of one-shot drafters by employing a two-stage approximation mechanism. TreeFlash achieves state-of-the-art performance across a variety of tasks and models, improving over marginal tree drafting with $12\%$ higher block efficiency and $9\%$ higher speedup.

【15】PURGE: Projected Unlearning via Retain-Guided Erasure
Title: PURGE: Projected Unlearning via Retain-Guided Erasure
Link: https://arxiv.org/abs/2606.03808

Authors: Vedant Jawandhia, Daksh Ahuja, Ghufran Alam Siddiqui, Prashant Trivedi, Yash Sinha, Pratik Narang
Comments: 13 pages, 10 figures, 6 tables
Abstract: We propose PURGE, a machine unlearning algorithm built on a simple but under-exploited observation: continual learning (CL) and machine unlearning (MU) are fundamentally dual problems. CL tries to learn new tasks without forgetting old ones; MU tries to erase specific data without hurting retained performance, representing the same underlying tension in opposite directions. PURGE leverages this duality by adapting gradient projection from A-GEM (Chaudhry et al., 2019) so that every unlearning step is constrained to not increase the retain-set loss. On top of this, it performs multi-layer representation erasure, pushing forget-set activations in intermediate layers towards the retain distribution to remove information from hidden representations rather than just suppressing it at the output. A key design choice is the retain-confusion target: rather than pushing forget outputs toward the uniform distribution, which we found to be surprisingly easy for membership inference attacks to detect, we instead target the model's natural confusion pattern on retain data. This makes the unlearned model hard to distinguish from one retrained from scratch. Two self-regulating stopping criteria (a retain-loss budget and a forget-accuracy target) let the algorithm decide on its own when to stop, removing the need for manual epoch tuning. In experiments on five datasets (CIFAR-10, MNIST, SVHN, STL10, PathMNIST) across 22 class-level forgetting tasks, PURGE consistently keeps retain accuracy above 96% while achieving MIA AUROC close to 0.5 (the ideal), outperforming gradient ascent, KL-uniform, and several published baselines on the privacy-utility frontier.
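
The gradient-projection idea borrowed from A-GEM can be illustrated in a few lines. This is a minimal sketch of the standard A-GEM projection rule, assuming flat NumPy gradient vectors; the function name `project_gradient` and the `eps` stabilizer are hypothetical, and the abstract does not specify how PURGE adapts the rule, so this should not be read as the authors' implementation.

```python
import numpy as np


def project_gradient(g_forget: np.ndarray, g_retain: np.ndarray,
                     eps: float = 1e-12) -> np.ndarray:
    """A-GEM-style projection of an unlearning gradient.

    g_forget: gradient of the unlearning objective (descent is taken along -g_forget).
    g_retain: gradient of the retain-set loss at the same parameters.
    If the two gradients conflict (negative inner product), a descent step on
    g_forget would increase the retain loss to first order, so the conflicting
    component is removed. Otherwise the gradient is returned unchanged.
    """
    dot = float(np.dot(g_forget, g_retain))
    if dot >= 0.0:
        return g_forget.copy()
    scale = dot / (float(np.dot(g_retain, g_retain)) + eps)
    return g_forget - scale * g_retain


if __name__ == "__main__":
    g_f = np.array([1.0, -2.0])
    g_r = np.array([0.0, 1.0])
    g_proj = project_gradient(g_f, g_r)
    # After projection the update is (numerically) orthogonal to g_retain.
    print(g_proj, float(np.dot(g_proj, g_r)))
```

The projection only guarantees a first-order non-increase of the retain loss, which is presumably why the abstract pairs it with an explicit retain-loss budget as a stopping criterion.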

【16】Trading Human Curation for Synthetic Augmentation in RLVR
Title: Trading Human Curation for Synthetic Augmentation in RLVR
Link: https://arxiv.org/abs/2606.03800

Authors: Akshansh, Leonardo Rosa Rodrigues, Michael Korostelev, Youssef Hassan, Mark E. Whiting
Comments: 21 pages, 5 main-text figures, 4 appendix figures. Preprint
Abstract: The supply of high-quality training tasks is a central bottleneck for reinforcement learning from verifiable rewards (RLVR) on agentic language models. Each task requires a sandboxed setup, a prompt, and a hand-authored reward function, and only tasks that pass a quality bar produce useful training signal. Hand-curation at this quality bar does not scale economically to the task counts effective RL training requires, and the substitution rate between automatically generated task variants and human-authored ones is not yet established. We investigate using pre-specified, gate-filtered augmentations of a small hand-authored base as a substitute for additional human curation during RLVR. We formalize the cost-adjusted trade rate $\rho_{\text{cost}}$ between augmented and human-authored tasks, measure it through a controlled ablation across training corpora with varying augmentation share, and characterize the end-to-end economics of the augmentation pipeline. Substituting augmented content for additional human-authored tasks retains aggregate held-out generalization on a ten-benchmark suite spanning code, instruction following, reasoning, and multi-turn agentic function-calling. The cost-adjusted trade rate $\rho_{\text{cost}}$ between gated synthetic and human-authored RLVR tasks stays in $[1.4\times, 11.6\times]$ across the plausible $c_{\text{human}}/c_{\text{aug}}$ range.

【17】Training-Free Multi-Concept LoRA Composition with Prompt-Aware Weighting
Title: Training-Free Multi-Concept LoRA Composition with Prompt-Aware Weighting
Link: https://arxiv.org/abs/2606.03792

Authors: Georgios Tsoumplekas, Stella Bounareli, Vasileios Argyriou
Comments: Accepted at IEEE FG 2026
Abstract: Low-Rank Adaptation (LoRA) successfully enables personalization in text-to-image generation by adapting pre-trained diffusion models to specific visual concepts and styles. However, extending such models to multi-concept customization remains challenging. Naively combining multiple LoRA weights or their outputs often leads to interference among concepts, resulting in degraded visual quality and reduced fidelity to the reference images of individual concepts. This paper proposes a simple yet effective approach for multi-concept customization by optimally combining the outputs of multiple LoRA modules. We leverage the relative importance of each concept during generation, as inferred from its corresponding prompt tokens, and introduce two methods, W-Switch and W-Composite, that employ a prompt-aware importance weighting strategy in which each LoRA is weighted according to the semantic influence of its trigger words in the target prompt. In addition, we extend existing quantitative evaluation metrics by proposing a new image-based similarity evaluation framework that assesses image fidelity and identity preservation through comparisons between real-world reference images and automatically segmented concept regions from generated images. We evaluate our approach on the ComposLoRA testbed and demonstrate consistent improvements over existing state-of-the-art methods in terms of visual quality, identity preservation and compositionality. Qualitative evaluations, including a Large Language Model (LLM) based assessment and a user study, further validate the effectiveness of the proposed methods and align with the newly introduced quantitative image-based metrics. Our code is available at https://github.com/GeorgeTsoumplekas/Prompt-Aware-Multi-LoRA-Composition.

【18】Qwen-Image-Flash: Beyond Objective Design
Title: Qwen-Image-Flash: Beyond Objective Design
Link: https://arxiv.org/abs/2606.03746

Authors: Tianhe Wu, Kun Yan, Zikai Zhou, Lihan Jiang, Jiahao Li, Jie Zhang, Kaiyuan Gao, Ningyuan Tang, Shengming Yin, Xiaoyue Chen, Xiao Xu, Yilei Chen, Yuxiang Chen, Yan Shu, Yixian Xu, Yanran Zhang, Zihao Liu, Zhendong Wang, Zekai Zhang, Deqing Li, Liang Peng, Yi Wang, Jingren Zhou, Chenfei Wu
Abstract: Few-step distillation has become an effective strategy for accelerating advanced visual generative models, yet prior work has largely focused on distillation objectives. In this work, we revisit few-step distillation from a complementary perspective, focusing on the training recipe that critically shapes student performance. Using Qwen-Image-2.0 as a representative case, we systematically investigate three factors in unified text-to-image generation and instruction-guided image editing distillation: data composition, teacher guidance, and task mixture. Our empirical analysis reveals several non-obvious behaviors, which motivate the development of Qwen-Image-Flash. Overall, our results suggest that effective few-step distillation requires not only carefully designed objectives, but also principled organization of the broader training pipeline.

【19】Compress then Merge: From Multiple LoRAs into One Low-Rank Adapter
Title: Compress then Merge: From Multiple LoRAs into One Low-Rank Adapter
Link: https://arxiv.org/abs/2606.03723

Authors: Zhengbao He, Ruiqi Ding, Zhehao Huang, Ruikai Yang, Tao Li, Xiaolin Huang
Comments: Accepted to ICML 2026. Code: this https URL
Abstract:

【20】How Many Trees in a Random Forest? A Revisited Approach with Plateau Search and Optuna Integration
Title: How Many Trees in a Random Forest? A Revisited Approach with Plateau Search and Optuna Integration
Link: https://arxiv.org/abs/2606.03549

Authors: Vadim Porvatov, Andrey Dukhovny, Andrey Lange
Abstract:

【21】High-Precision APT Malware Attribution with Out-of-Scope Resilience
Title: High-Precision APT Malware Attribution with Out-of-Scope Resilience
Link: https://arxiv.org/abs/2606.03523

Authors: Peter Williams, Adam Sobey, Erisa Karafili
Abstract:

【22】Demystifying Pipeline Parallelism: First Theory for PipeDream
Title: Demystifying Pipeline Parallelism: First Theory for PipeDream
Link: https://arxiv.org/abs/2606.03498

Authors: Ivan Ilin, Peter Richtárik
Comments: 40 pages, 4 figures
Abstract:

【23】Analyzing Stream Collapse in Hyper-Connections: From Diagnosis to Mitigation
Title: Analyzing Stream Collapse in Hyper-Connections: From Diagnosis to Mitigation
Link: https://arxiv.org/abs/2606.03483

Authors: Ekaterina Alimaskina, Gleb Molodtsov, Aleksandr Beznosikov
Abstract:

【24】PerchRL: Vision-Based Agile Perching on Inclined Platforms under Rapid and Irregular Motion
Title: PerchRL: Vision-Based Agile Perching on Inclined Platforms under Rapid and Irregular Motion
Link: https://arxiv.org/abs/2606.03441

Authors: Zihong Lu, Zongzhuo Liu, Huaxu Li, Jinqiang Cui, Jie Mei, Youmin Gong, U Kei Cheang, Boyu Zhou
Abstract:

【25】Local Guidance, Global Impact: Gaussian-Reshaped Trust Region Unlocks Behavior Transitions
Title: Local Guidance, Global Impact: Gaussian-Reshaped Trust Region Unlocks Behavior Transitions
Link: https://arxiv.org/abs/2606.03382

Authors: Bingxu Liu, Jiashun Liu, Johan Obando-Ceron, Hao Wang, Runze Liu, Pablo Samuel Castro, Aaron Courville, Ling Pan
Comments: 21 pages
Abstract:

【26】APIC: Amortized Physics-Informed Calibration using Neural Processes
Title: APIC: Amortized Physics-Informed Calibration using Neural Processes
Link: https://arxiv.org/abs/2606.03355

Authors: Aishwarya Venkataramanan, Sai Karthikeya Vemuri, Joachim Denzler
Comments: Accepted at UAI 2026
Abstract:

【27】From Script to Semantics: Prompting Strategies for African NLI
Title: From Script to Semantics: Prompting Strategies for African NLI
Link: https://arxiv.org/abs/2606.03304

Authors: Anuj Tiwari, Terry Oko-odion, Hannah Nwokocha
Comments: Accepted at the RAIL Workshop, LREC 2026
Abstract:

【28】A Geometric Lens on Physics-Aligned Data Compression
Title: A Geometric Lens on Physics-Aligned Data Compression
Link: https://arxiv.org/abs/2606.03279

Authors: Aleix Segui, Wesley Armour
Comments: Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026
Abstract:

【29】Let There Be Light: Reflection, Refraction and Scattering for Neural Operators
Title: Let There Be Light: Reflection, Refraction and Scattering for Neural Operators
Link: https://arxiv.org/abs/2606.03262

Authors: Keke Wu, Yixuan Zhang, Jingrun Chen
Abstract:

【30】Do Real-World Datasets Contain Natural Experiments? An Empirical Study Using Causal Feature Selection
Title: Do Real-World Datasets Contain Natural Experiments? An Empirical Study Using Causal Feature Selection
Link: https://arxiv.org/abs/2606.03251

Authors: Gautam Gare, John Galeotti, Michael Mozer, Deva Ramanan, Nan Rosemary Ke
Abstract:

【31】When RLHF Fails: A Mechanistic Taxonomy of Reward Hacking, Collapse, and Evaluator Gaming
Title: When RLHF Fails: A Mechanistic Taxonomy of Reward Hacking, Collapse, and Evaluator Gaming
Link: https://arxiv.org/abs/2606.03238

Authors: Zelalem Abahana
Comments: 20 pages, 8 figures; includes code, artifacts, and live demo
Abstract:

【32】Solipsistic Superintelligence is Unlikely to be Cooperative
Title: Solipsistic Superintelligence is Unlikely to be Cooperative
Link: https://arxiv.org/abs/2606.03237

Authors: Rakshit S Trivedi, Natasha Jaques, Logan Cross, Alexander Sasha Vezhnevets, Joel Z Leibo
Comments: 24 pages, 1 figure, Accepted at Proceedings of the 43rd International Conference on Machine Learning, 2026
Abstract:

【33】Sample-Size Scaling of the African Languages NLI Evaluation
Title: Sample-Size Scaling of the African Languages NLI Evaluation
Link: https://arxiv.org/abs/2606.03219

Authors: Anuj Tiwari, Oluwapelumi Ogunremu, Terry Oko-odion, Jesujuwon Egbewale, Hannah Nwokocha
Comments: Accepted at the AfricaNLP Workshop, EACL 2026
Abstract:

【34】Libra: Efficient Resource Management for Agentic RL Post-Training
Title: Libra: Efficient Resource Management for Agentic RL Post-Training
Link: https://arxiv.org/abs/2606.03077

Authors: Kaiwen Chen, Xin Tan, Jingzong Li, Hong Xu
Comments: 18 pages, 13 figures
Abstract:

【35】RMPrior: Bridging Propagation Priors and Diffusion Refinement for Efficient Radio Map Construction
Title: RMPrior: Bridging Propagation Priors and Diffusion Refinement for Efficient Radio Map Construction
Link: https://arxiv.org/abs/2606.03074

Authors: Zixuan Guo, Xiucheng Wang, Nan Cheng
Abstract:

【36】Will Accurate Fields Mislead Photonic Design? From Global Accuracy to Port Readout
Title: Will Accurate Fields Mislead Photonic Design? From Global Accuracy to Port Readout
Link: https://arxiv.org/abs/2606.03038

Authors: Yitian Zhang, Yonghong Chen, Youming Chen, Yiyang Li, Xing Zhe, Renhe Lu, Shaolin Liao, Yuzhe Ma, Zhong Guan
Abstract: Neural field surrogates can accelerate photonic design loops, but a surrogate that looks accurate in global field error can still mis-rank candidate devices when the final decision depends on localized output-port readouts. This risk is acute in propagation-dominated MMI splitters and couplers, where port power, splitting, phase, and coupling are determined by accumulated modal interference and output-window aggregation rather than by average field similarity alone. We study this field-to-design mismatch through a Field/Mediator/Readout view that separates dense complex-field error from propagation-profile and output-window errors before port aggregation. To align the surrogate with this chain, we propose PaNO, a propagation-aligned neural operator that keeps the full-field prediction interface while organizing latent states around local boundary structure, transverse modal content, axial propagation, and cross-mode interaction. We also evaluate PaNO-R2, an output-aware feedback variant for residual field components near the port region. On a 15-wavelength tunable $3{\times}3$ MMI benchmark with 4608 held-out fields, PaNO lowers NeurOLight's port-power error from 0.2018 to 0.0739 despite slightly higher cMAE, showing that global field accuracy alone is not sufficient for design-relevant readout fidelity. PaNO-R2 attains the best cMAE, propagation-profile error, output-profile error, and port-power error, reducing NeurOLight's port-power and output-profile errors by 72.7\% and 72.5\%, respectively.

【37】ConTraIRL: Factorized Contrastive Abstractions for Transferable IRL
Title: ConTraIRL: Factorized Contrastive Abstractions for Transferable IRL
Link: https://arxiv.org/abs/2606.03017

Authors: Yikang Gui, Bikramjit Banerjee, Prashant Doshi
Abstract: Reward transfer in Inverse Reinforcement Learning (IRL) is unreliable when policies must generalize to unseen combinations of environment dynamics and task goals. We propose Factorized Contrastive Abstractions for Transferable IRL (ConTraIRL), a framework that enables compositional reward transfer by learning decoupled latent representations of these two factors. ConTraIRL uses a dual-encoder architecture that maps observations into separate dynamics and goal latent spaces, trained with a dual contrastive objective. Temporal alignment encourages the dynamics encoder to learn goal-invariant structure, while the goal encoder captures dynamics-invariant features. This factorization supports reward inference under recombined dynamics-goal settings. Experiments on continuous control benchmarks demonstrate effective few-shot transfer to unseen dynamics-goal pairings, improving sample efficiency and reward recovery over transfer IRL baselines.

【38】Outsmarting the Chameleon: Counterfactual Decoupling for Tactical OOD Shifts in Live Streaming Risk Assessment
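Title: Outsmarting the Chameleon: Counterfactual Decoupling for Tactical OOD Shifts in Live Streaming Risk Assessment
Link: https://arxiv.org/abs/2606.02946

Authors: Yiran Qiao, Jing Chen, Jiaqi Xu, Yang Liu, Qiwei Zhong, Xiang Ao
Comments: Accepted by KDD'26
Abstract: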

【39】Fast Unlearning at Scale via Margin Self-Correction
Title: Fast Unlearning at Scale via Margin Self-Correction
Link: https://arxiv.org/abs/2606.02920

Authors: Federico Di Gennaro, Alexander Shevchenko, Fanny Yang
Abstract:

【40】Forgetting is Not Erasure: Recovering Latent Knowledge via Transport Keys
Title: Forgetting is Not Erasure: Recovering Latent Knowledge via Transport Keys
Link: https://arxiv.org/abs/2606.02860

Authors: Archie Chaudhury
Comments: Technical report showcasing results from transport keys
Abstract:

【41】Mitigating Spurious Correlations with Memorization-Guided Dataset De-Biasing
Title: Mitigating Spurious Correlations with Memorization-Guided Dataset De-Biasing
Link: https://arxiv.org/abs/2606.02830

Authors: Arda Fazla, Abolfazl Hashemi
Abstract:

【42】$\Psi$-Bench: Evaluating Persona-Sensitive Influencing in Persuasive Dialogues
Title: $\Psi$-Bench: Evaluating Persona-Sensitive Influencing in Persuasive Dialogues
Link: https://arxiv.org/abs/2606.02754

Authors: Peixuan Han, Hongyi Du, Jiayu Liu, Yihang Sun, Yutong Liu, Jiaxuan You
Abstract: Personalization is a crucial capability of modern language agents. However, current research primarily positions personalized agents as passive responders to user preferences, limiting their ability to interact with users and provide suggestions or guidance proactively. To systematically evaluate such proactive personalization in realistic interactions, we propose $\Psi$-Bench, a benchmark for assessing LLMs' ability to influence realistic users through conversation. We design three real-world interaction scenarios that involve persuasion in $\Psi$-Bench, and endow simulated clients with personal characteristics through explicit user profiles derived from dialogue histories. We evaluate 10 frontier LLMs on $\Psi$-Bench and find that while most models can produce coherent and reasonable arguments, even state-of-the-art models still leave considerable room for improvement in persuasion. We also find that providing access to client profiles yields an average performance gain of 18.24\%, highlighting the importance of user-specific information for effective persuasion. Overall, our work highlights persona-sensitive influencing as a challenging yet practical direction for evaluating and developing more proactive personalized LLM agents. Code is available at: https://github.com/Hanpx20/Psi-Bench.

【43】SeeTraceAct: Visibility-Aware Latent Planning from Cross-Embodiment Demonstration Videos
Title: SeeTraceAct: Visibility-Aware Latent Planning from Cross-Embodiment Demonstration Videos
Link: https://arxiv.org/abs/2606.02745

Authors: Jaehyeon Son, Junhyun Kim, Kyle Kam, Jeremiah Coholich, Seok Joon Kim, Jinhoo Kim, Chris Dongjoo Kim, Jaemin Cho, Dieter Fox, Zsolt Kira
Abstract: Vision-language-action models (VLAs) are promising general-purpose robot policies, but adapting them to new tasks typically requires costly task-specific teleoperation data. As an alternative, we study one-shot demo-conditioned VLAs, where a robot policy is conditioned on a single demonstration video of an unseen task. We find that existing end-to-end approaches often struggle when successful execution requires precisely localizing small target regions. To address this limitation, we propose SeeTraceAct, a demo-conditioned VLA framework that encourages precise spatial grounding through visibility-aware prediction of future end-effector traces. To enable reproducible evaluation with cross-embodiment demonstrations, we introduce and release RoboCasa-DC, a demo-conditioned extension of RoboCasa with episode-paired humanoid videos. Experiments on RoboCasa-DC and a real-world benchmark, where a Franka Panda arm is conditioned on human demonstrations, show that SeeTraceAct outperforms baselines, achieving the best success rate across all four RoboCasa-DC settings and improving real-world average success by 12.5 percentage points.

【44】See Less, Specify More: Visual Evidence Budgets for Generalizable VLAs
Title: See Less, Specify More: Visual Evidence Budgets for Generalizable VLAs
Link: https://arxiv.org/abs/2606.02735

Authors: Yueh-Hua Wu, Tatsuya Matsushima, Kei Ota
Comments: Project page: https://s2.airoa.io
Abstract: Generalization remains a central bottleneck for vision-language-action (VLA) models: under distractors, appearance shifts, and semantically similar tasks, the policy must often infer local execution details from coarse instructions while also deciding which parts of the image matter for control. We present S2 (See Less, Specify More), a framework for improving VLA generalization by training the executor under a cleaner interface. Specify More preserves the original instruction as a stable high-level goal while relabeling each trajectory into refined trajectory- and subtask-level language that disambiguates the current execution mode. Unlike native attention, See Less imposes an explicit visual evidence budget, training the executor to act from task-sufficient evidence rather than unconstrained visual context, without any region or mask annotation. This interface lets the executor follow detailed guidance without relying on distracting visual patches or resolving avoidable ambiguity on its own, and it remains compatible with off-the-shelf VLM planners through in-context learning. Across our main evaluation settings, S2 improves overall generalization metrics by changing the executor's learning problem: coarse instructions induce avoidable supervision aliasing, goal-preserving local guidance outperforms instruction replacement in our main ablations, and explicit evidence budgeting reduces dependence on broad visual context beyond efficiency considerations. Across eight real-robot tasks on TX-G2 (an AgiBot G2-compatible variant) and HSR, S2 raises mean subtask success from 54.2% to 79.0% over pi0.5. Together, these results suggest that VLA generalization improves when the executor is trained to act from informative local guidance and task-sufficient visual evidence, rather than recovering both from weak supervision.

【45】Locality Does Not Imply Reachability: Boundary Repair in Block-Sparse Causal Attention
Title: Locality Does Not Imply Reachability: Boundary Repair in Block-Sparse Causal Attention
Link: https://arxiv.org/abs/2606.02680

Authors: Zhibo Yang
Comments: 36 pages, 5 figures, 16 tables
Abstract: Sparse causal attention is usually described by sequence locality: nearby tokens should remain easy to access, while distant tokens may be dropped to reduce cost. This paper studies a mismatch between sequence locality and attention-graph reachability. In fixed block causal attention, two adjacent tokens can be disconnected in the attention graph at every depth. We formalize this boundary artifact through structural dependency sets: if every attention layer uses the same fixed block causal mask and all remaining operations are positionwise, a target representation can depend only on tokens in its own block prefix. This yields an architecture-level boundary-copy separation for a constructed K-way boundary-copy distribution, with top-1 accuracy upper bound 1/K and expected cross-entropy lower bound log K. We then derive phase-conditioned coverage functions showing that reachability depends on both source-target distance and the target's offset within its block. These coverage laws predict when a sparse pattern should fail, when a repair can help, and why sliding-window attention and boundary repair are not interchangeable. Boundary Bridge Attention is treated as a constructive witness: it preserves the fixed block path and adds zero-additional-parameter auxiliary causal edges near block boundaries using shared projections. Controlled 1024-token experiments show that gains concentrate in coverage-aligned diagnostics. As secondary external-validity evidence, a fixed-checkpoint 8K-token Qwen2.5-7B probe shows the same coverage-incomparability pattern. The contribution is a theory-guided diagnostic framework for locality-reachability mismatch in block-sparse causal attention, together with phase-conditioned coverage analysis and a minimal constructive repair.
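
The boundary artifact described in this abstract can be checked numerically on a toy sequence. The sketch below is an illustration under stated assumptions: every layer uses the same boolean mask and all other operations are positionwise, so the dependency set after `depth` layers is the boolean `depth`-th power of the mask. The function names are hypothetical, and the sliding-window mask is included only for contrast; neither is the paper's Boundary Bridge Attention.

```python
import numpy as np


def block_causal_mask(n: int, block: int) -> np.ndarray:
    """mask[i, j] is True if token i may attend to token j:
    j is not in the future and lies in the same fixed block as i."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return (j <= i) & (i // block == j // block)


def sliding_window_mask(n: int, window: int) -> np.ndarray:
    """Causal mask where token i sees itself and the previous window-1 tokens."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return (j <= i) & (i - j < window)


def dependency_sets(mask: np.ndarray, depth: int) -> np.ndarray:
    """reach[i, j] is True if token i's representation after `depth`
    identical masked-attention layers can depend on input token j."""
    n = mask.shape[0]
    m = mask.astype(int)
    reach = np.eye(n, dtype=int)
    for _ in range(depth):
        reach = ((m @ reach) > 0).astype(int)
    return reach.astype(bool)


if __name__ == "__main__":
    blocked = dependency_sets(block_causal_mask(8, 4), depth=10)
    # Token 4 opens the second block; its neighbour, token 3, stays unreachable.
    print(blocked[4, 3], blocked[3, 0])
    windowed = dependency_sets(sliding_window_mask(8, 2), depth=2)
    print(windowed[4, 3], windowed[4, 2])
```

Stacking more layers widens the reach of the sliding-window mask but never that of the fixed block mask, which matches the abstract's point that locality and reachability are distinct properties.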

【46】Before Fusion, Ask What to Keep: Contextual Calibration of Multimodal Signals
Title: Before Fusion, Ask What to Keep: Contextual Calibration of Multimodal Signals
Link: https://arxiv.org/abs/2606.02679

Authors: Jiyuan Liu, Liangwei Nathan Zheng, Wei Emma Zhang, Xinpei Wang, Weitong Chen
Comments: 11 pages, 7 figures, 9 tables
Abstract: Multimodal systems often benefit from combining information across language, sound, and visual streams, but this benefit is not guaranteed. A modality that is useful for one input may become distracting for another, and local feature responses within the same modality can disagree with evidence from other sources. This work investigates how to adjust multimodal representations before they are merged by a downstream predictor. We develop a compact calibration module that compares each modality with the others at the summary level, extracts cues of cross-source support and conflict, and converts these cues into instance-wise and dimension-wise modulation signals. The calibration is applied to the original modality features rather than to already fused representations, enabling the model to suppress misleading components, preserve weak but useful evidence, and emphasize responses that are better supported by the current multimodal context. The module is designed as a plug-in component and can be attached to different fusion backbones without changing their prediction heads. Across five benchmarks covering sentiment understanding, action recognition, audio-visual event detection, and audio-visual emotion classification, the proposed pre-combination calibration strategy improves performance under both sequence-based and convolutional fusion settings. Additional analyses under modality removal, synthetic corruption, training dynamics, and feature-level visualization show that calibrating signals before fusion can reduce interference from unreliable modalities and produce more stable multimodal optimization.

【47】Anomalies in Multivariate Time Series Benchmarks Are Mostly Univariate
Title: Anomalies in Multivariate Time Series Benchmarks Are Mostly Univariate
Link: https://arxiv.org/abs/2606.02670

Authors: Marc Pinet, Julien Cumin, Samuel Berlemont, Dominique Vaufreydaz
Abstract: Many recent multivariate time series anomaly detection (MTSAD) models incorporate cross-channel modeling, under the implicit assumption that the structure of anomalies may be spread across multiple channels. We evaluate this assumption on eight widely used public benchmarks by introducing a per-segment diagnostic framework that flags, for each labeled anomaly, whether at least one channel deviates individually from its normal history, whether the cross-channel correlation structure changes, or both. The framework shows that no cross-channel rupture occurs without an accompanying univariate deviation across a range of reasonable thresholds. A complementary metric also reveals that on six of the eight benchmarks, at least half of the labeled anomaly segments deviate univariately on 79% to 100% of their timesteps, reaching 100% on three of these datasets. To verify that our framework captures cross-channel structure when present, we construct synthetic data of phase-shifted sinusoidal channels with shared noise. Each anomalous segment is altered through one of two channel-wise corruptions that preserve the per-channel marginal distribution while breaking cross-channel structure, and our framework correctly characterizes these segments as cross-channel-only. On these data, channel-dependent (CD) models successfully exploit the cross-channel signal whereas channel-independent (CI) ones fail. The CI/CD comparison of a recent SOTA detector on real benchmarks further confirms that CD modeling brings no measurable gain. We conclude that current MTSAD benchmarks are unsuitable for validating cross-channel modeling capabilities, and we call for the development of more structurally diverse evaluation sets. The code for this study is publicly available.

【48】AdaWeather: Adaptively Mixing Probabilistic Weather Forecasts with Logarithmic Regret
Title: AdaWeather: Adaptively Mixing Probabilistic Weather Forecasts with Logarithmic Regret
Link: https://arxiv.org/abs/2606.02663

Authors: Saptarishi Dhanuka, Sarvesh Iyer, Manmeet Singh, Mihir More, Rushil Gupta, Dhruman Gupta, Parthasarathi Mukhopadhyay, Sandeep Juneja
Comments: 36 pages, 16 figures. Submitted to arXiv. Forecast aggregation for probabilistic weather prediction using offline supervised learning and online prediction with expert advice. Includes theoretical regret guarantees and empirical evaluation on temperature forecasting. Submitted to NeurIPS 2026
Abstract: Recent advances in machine learning have produced probabilistic weather forecasting models comparable to state-of-the-art numerical weather predictors. But no model consistently dominates spatio-temporally, and relative performance is highly context-dependent. This motivates adaptive methods for combining multiple forecasts to obtain improvements and robustness. While combined forecasts have been proposed in the literature, these are achieved either through supervised learning or through prediction with expert advice methods. We introduce AdaWeather, an adaptive framework that combines many probabilistic forecasts using both machine learning and a mixture of experts to arrive at a unified improved probabilistic forecast. While traditional expert methods derive regret bounds with respect to the best single expert in hindsight, we extend the algorithm and analysis to show our method has logarithmic regret compared to the best static mixture of experts in hindsight. Empirically, we focus on forecasting temperature, and observe improvements over existing methods.

【49】Samudra 2: Scaling Ocean Emulators across Resolutions
Title: Samudra 2: Scaling Ocean Emulators across Resolutions
Link: https://arxiv.org/abs/2606.02610

Authors: Yuan Yuan, Jesse Rusak, Alexander Merose, Adam Subel, Pavel Perezhogin, Alistair Adcroft, Carlos Fernandez-Granda, Laure Zanna
Abstract: Ocean general circulation models (OGCMs) are essential to climate science but computationally expensive, limiting ensemble size and forcing scenarios. Neural emulators promise orders-of-magnitude speedups, yet existing ocean emulators have not combined fine spatial resolution with multi-year autoregressive rollouts. Samudra, the first autoregressive neural ocean emulator to produce multi-decade global rollouts, is limited to $1^\circ$ resolution and exhibits two long-horizon failure modes: \emph{variance collapse}, the loss of temporal variability, and \emph{imprinting artifacts}, in which velocity patterns leak into deep-ocean fields. We present Samudra 2, which introduces a wider U-Net backbone with modified ConvNeXt-style blocks and a reduced block-internal expansion factor, together with a dynamic loss that reweights output channels according to their prediction errors, strengthening gradients for slow-evolving deep-ocean fields. At $1^\circ$, Samudra 2 increases upper-ocean global-mean temperature $R^2$ from 0.56 to 0.87 and reduces deep-ocean temperature error by roughly sevenfold. The same architecture scales to $1/2^\circ$ and $1/4^\circ$ over approximately 8-year autoregressive rollouts, recovering mesoscale eddies and sharp western boundary currents. Running on a single GPU, Samudra 2 enables larger ensembles for sea-level projections, ocean heat uptake, and climate variability studies. We provide code, documentation, and benchmark resources at https://openathena.ai/Ocean_Emulator/.

【50】Building Better Activation Oracles
标题:打造更好的激活先知
链接:https://arxiv.org/abs/2606.02609

作者:Jan Bauer,Celeste De Schamphelaere,Adam Karvonen,Niclas Luick,Neel Nanda
备注:Jan Bauer and Celeste De Schamphelaere contributed equally; author order determined randomly
摘要:Activation Oracles (AOs) are promising methods for interpreting residual stream activations. However, current AOs face important issues, such as hallucinations and vagueness. Additionally, text-inversion confounds make them hard to evaluate. To this end, we improve the Activation Oracle (AO) training regime in four ways: training on on-policy rollouts, improving the conversational dataset, feeding in more layers, and improving the injection formula. The capability improvements are marginal, but the quality-of-life improvements are quite substantial. In addition, we open-source the first comprehensive evaluation suite for AO quality, which we call AObench. Overall, we hope that our work sets a foundation that helps improve AOs and other models in the paradigm of scalable, end-to-end interpretability.

【51】Geometry-Aware Tabular Diffusion
标题:具有几何意识的表格扩散
链接:https://arxiv.org/abs/2606.02607

作者:David Turtora Zagardo
备注:Accepted to the ICML 2026 main track. 24 pages, 10 figures, 22 tables
摘要:Tabular synthesis is critical for privacy-preserving sharing and augmentation, yet diffusion models rely on implicit mechanisms to capture inter-column relationships. We introduce Geometry-Aware Tabular Diffusion (GATD), which augments tabular diffusion denoisers with pairwise angles and lengths computed from column value differences and used as inputs and auxiliary targets. Our MLP instantiation achieves state-of-the-art benchmark performance while using 3.5x fewer parameters on average (up to 25x for classification tasks): on ten datasets, it wins 8/10 Shape, 7/10 Trend, and 9/10 downstream utility (F1/RMSE), reducing Shape and Trend error by 27% and 20%. Default loss weights transfer to GNN and Transformer denoisers, improving Shape on 27/30 and Trend on 25/30 architecture-dataset cells. A matched ablation shows supervision (not extra inputs or capacity) drives the gain. This shows explicit relational supervision is a portable inductive bias for tabular diffusion.

【52】Making Brain-Computer Interfaces More Secure
标题:使脑机接口更加安全
链接:https://arxiv.org/abs/2606.02597

作者:Md Fahimul Kabir Chowdhury,Gahangir Hossain
备注:Accepted and presented at IEEE World AI IoT Congress 2026
摘要:The development of brain-computer interfaces (BCIs) based on electroencephalograms (EEGs) has advanced significantly, mainly due to machine learning. Although the majority of earlier research has focused on increasing classification accuracy, relatively little attention has been paid to security and robustness. According to recent research, EEG-based BCIs are susceptible to adversarial attacks, which can cause misdiagnosis through minute, well-crafted perturbations. Evaluating model robustness against such perturbations is therefore critical for ensuring reliable deployment. In this study, we propose a lightweight custom Convolutional Neural Network (CNN) architecture to investigate adversarial robustness in EEG-based BCIs. The suggested method is assessed using two EEG datasets and contrasted with three novel CNN models tailored to EEG, namely EEGNet, DeepConvNet, and SleepEEGNet, under gradient-based adversarial attack scenarios. According to experimental findings, the suggested model consistently performs better in classification under adversarial perturbations compared to baseline models, indicating improved robustness. These findings highlight the potential of lightweight architectures for enhancing the reliability of EEG-based BCI systems under adversarial conditions.

【53】VESTA: Visual Exploration with Statistical Tool Agents
标题:VESTA:使用统计工具代理的视觉探索
链接:https://arxiv.org/abs/2606.00384

作者:William Rudman,Abhishek Divekar,Kanishk Jain,Sebastian Joseph,Stella S. R. Offner,Matthew Lease,Kyle Mahowald,Greg Durrett,Junyi Jessy Li
摘要:Fitting quantitative models to data is a central step in scientific workflows, yet it remains one of the least automated. Recent agent-based systems leverage language and vision-language models (VLMs) to iteratively propose and refine statistical models, but these systems struggle on more challenging modeling tasks. To address these limitations, we introduce VESTA: Visual Exploration with Statistical Tool Agents, a framework that equips VLMs with a dynamically growing exploration toolkit to guide model refinement through data transformations, hypothesis-driven visualizations, and robust statistical tests. Unlike prior systems that rely on iterative critique alone, VESTA actively explores data before and during refinement by selecting or creating diagnostic tools, which accumulate in the model's context and can be reused later. We evaluate VESTA against established baselines in three toolkit configurations: no tools, static expert-written tools, and dynamic model-written tools. To support this evaluation, we introduce DAWN (Dataset for Automated Workflows and Numerical Modeling), a benchmark targeting distribution fitting and time series modeling with varying difficulty tiers, and culminating in real-world astronomy tasks including modeling initial mass functions and gravitational-wave chirp signals. We find that VESTA's dynamic tool creation outperforms prior agentic pipelines, with the largest gains on complex and domain-specific tasks. We further show that dynamically generated tools are substantially more sophisticated than those produced by existing visual tool-creation systems, covering more diagnostic categories per function and strongly preferring visual outputs that the VLM critic can reason over directly.

【54】PaintBench: Deterministic Evaluation of Precise Visual Editing
标题:PaintBench:精确视觉编辑的确定性评估
链接:https://arxiv.org/abs/2606.00188

作者:Kai Xu,Ellis Brown,Shrikar Madhu,Rob Fergus,He He,Saining Xie
备注:Project Page: https://paintbench.github.io/
摘要:While current multimodal models are proficient at open-ended visual editing, executing precise single-answer edits remains an important obstacle. To probe this challenge, we introduce PaintBench, a dynamically scalable benchmark targeting 20 fundamental precise visual editing operations across four categories: geometric transformation, structural manipulation, color change, and symbolic reasoning. Procedural generation with configurable complexity enables an effectively infinite, contamination-resistant evaluation suite, and deterministic pixel-level evaluation eliminates reliance on bias-prone judge models. Across 11 image editing models, we find overall low performance, with the current highest-performing industry leader scoring only 17.1% (mIoU). Task decomposition reveals especially challenging operation types (geometric transformation, most structural manipulation, formula-based color change) and model-specific specializations. Fine-grained benchmark diagnostics further show performance degradations induced by scene variations in object count, background complexity, color scheme, and edit-region size. To test generalization of PaintBench scores to applied task performance, we create a procedural, deterministic evaluation for data visualization editing (TinyGrafixBench) and find strong linear correlation with PaintBench scores ($R^2 = 0.91$, $p < 0.001$). Altogether, PaintBench provides a rigorous foundation for measuring and driving progress in precise multimodal visual editing.
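
A deterministic pixel-level score of the kind reported above (mIoU) can be illustrated with a plain intersection-over-union over binary masks. This is a minimal sketch under assumptions: the abstract does not specify how PaintBench derives its masks or averages them, so the mask representation (nested lists of 0/1) and the helper names `iou` and `mean_iou` are hypothetical.

```python
def iou(pred_mask, target_mask):
    """Intersection-over-union of two equally sized binary masks."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred_mask, target_mask):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly, so score them as 1.0.
    return inter / union if union else 1.0

def mean_iou(pairs):
    """Average IoU over (prediction, target) mask pairs."""
    return sum(iou(p, t) for p, t in pairs) / len(pairs)

# Toy 2x2 masks: the prediction paints one extra pixel.
target = [[1, 0], [0, 0]]
pred = [[1, 1], [0, 0]]
score = mean_iou([(target, target), (pred, target)])
```

Because the score depends only on pixel values, repeated evaluation of the same output always yields the same number, which is the property the abstract contrasts with judge-model scoring.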

【55】SEAOTTER: Sensor Embedded Autoencoding with One-Time Transcode for Efficient Reconstruction
标题:SEAOTTER:传感器嵌入式自动编码,一次性转码,用于高效重建
链接:https://arxiv.org/abs/2606.03940

作者:Dan Jacobellis,Neeraja J. Yadwadkar
摘要:In robotics systems, vast amounts of visual data are easily captured at high resolution using low-cost, low-power hardware. Yet, limited bandwidth and on-device compute resources prevent full utilization when transmitted via conventional codecs like JPEG/MPEG. Newer codecs, like AV1/AVIF, improve the rate-distortion trade-off, but demand far more resources for encoding, impractical without custom ASICs. Recent asymmetric autoencoders deliver high quality under extreme power and bandwidth constraints, but add prohibitive decoding cost and use bespoke formats that ignore decades of infrastructure built around standards like JPEG. To address these limitations, we introduce a compression framework for cloud robotics based on a Sensor Embedded Autoencoder paired with a One-Time Transcode for Efficient Reconstruction (SEAOTTER). Because the sensor, cloud, and consumer stages face very different power and bandwidth budgets, SEAOTTER combines the compactness of a learned latent with the broad usability of a standard JPEG file. Since naive transcoding degrades performance, we propose a learnable JPEG color and quantization transform that enables increased accuracy for global, dense, and vision-language-based perception. Using SEAOTTER, we train both general-purpose and task-aware transcoding pipelines for a pre-trained, frozen encoder. At a compression ratio of 200:1 and compared to AVIF, we observe 7 times faster encoding, 3.5 times faster decoding, and +8% ImageNet top-1 accuracy, while retaining compatibility with JPEG infrastructure. Our code is available at https://github.com/UT-SysML/seaotter .

【56】Beyond Gradient Descent: Adam for Analog Ising Machines
标题:超越梯度下降:用于模拟伊辛机的Adam
链接:https://arxiv.org/abs/2606.03917

作者:Stijn Van Vooren,Guy Van der Sande,Guy Verschaffelt
备注:submitted to Physical Review E
摘要:As Moore's law reaches its limits, Ising machines offer a promising alternative computing approach for difficult optimization problems. However, many analog, time-continuous Ising machines rely on gradient-descent-like dynamics to find solutions, which can limit speed and robustness. We investigate whether momentum and Adam optimization can improve these systems. Since these optimizers are traditionally formulated in discrete time, we derive continuous-time versions suitable for analog, time-continuous Ising-machine dynamics. On Max-Cut benchmarks, we find that Adam-based dynamics substantially reduce time-to-target and improve solution quality compared with gradient-descent- and momentum-based dynamics. We further introduce a first-order continuous-time approximation of Adam that is intended as a simpler starting point for future physical implementations while performing better than the full Adam formulation in a continuous-time setting. We also study a purely algorithmic discrete-time setting, where the performance gap is reduced on easier problem instances, while the Adam-based update rule performs best on harder weighted problem instances. These results identify continuous-time Adam dynamics as a powerful design principle for analog Ising machines.

【57】Privacy-Robust Incrementality Measurement for Advertising Systems under Signal Loss
标题:信号丢失下广告系统的隐私稳健增量测量
链接:https://arxiv.org/abs/2606.03878

作者:Prashant Shekhar,Caroline Howard
摘要:Advertising platforms use randomized lift tests to measure incrementality, but privacy-preserving reporting systems degrade the observed signal through match-rate loss, linkability loss, attribution-window loss, aggregation-threshold suppression, randomized reporting noise, and segment-heterogeneous signal loss. This paper formulates privacy-constrained advertising measurement as a robust causal decision problem under the mentioned signal losses. Given a randomized experiment and an ambiguity set for privacy-induced degradation, the framework projects the observation-compatible fiber of clean/unfiltered experimental worlds onto the incrementality functional and returns certified, rejected, and unresolved decisions. The main result gives a sharp decision frontier. Reports outside the frontier support uniformly valid certification or rejection, whereas reports inside it contain too little information for any method to uniformly distinguish above-threshold incrementality from non-incrementality. Supporting results give finite-sample certification, sample-complexity guarantees, a minimax lower bound showing that signal loss reduces effective information, and a reporting-granularity tradeoff. On 2.0M Criteo Uplift rows and the 64K-row Hillstrom email experiment, clean conversion lift is positive in both datasets, with lifts 0.00112 and 0.00495, respectively. Population certification survives mild degradation in Criteo and severe degradation in Hillstrom, while all considered finite-sample stress settings in both datasets remain unresolved after simultaneous uncertainty and reporting noise are included. Overall, the research contributes a decision-theoretic layer for privacy-aware incrementality measurement whose output is the strongest causal-claim justified by degraded ads signals.

【58】Bregman meets Lévy: Stochastic mirror descent with heavy-tailed noise in continuous and discrete time
标题:布雷格曼遇见莱维:连续和离散时间中具有重尾噪声的随机镜像下降
链接:https://arxiv.org/abs/2606.03769

作者:Pierre-Louis Cauvin,Panayotis Mertikopoulos
备注:68 pages, 3 figures; to appear in the proceedings of ICML 2026
摘要:We study the robustness of stochastic mirror descent (SMD) under heavy-tailed noise, focusing on whether the method retains its convergence guarantees when run with infinite-variance stochastic gradient input. To address this question in a principled manner, we begin by introducing a continuous-time model of SMD as a stochastic differential equation (SDE) driven by a centered Lévy noise process with finite $p$-th order moments, $1 < p \leq 2$. This scheme -- which we call the Lévy mirror flow (LMF) -- arises naturally as the scaling limit of SMD in the presence of heavy-tailed noise. In particular, when $p < 2$ -- the heavy noise regime -- the trajectories of LMF generically exhibit jump discontinuities of arbitrary magnitude which, if frequent enough, lead to infinite variance. Nonetheless, despite this highly singular behavior, we show that LMF attains $\varepsilon$-optimality within $\mathcal{O}(\varepsilon^{-p/(p-1)})$ time in the convex case, and within $\tilde{\mathcal{O}}(\varepsilon^{-1/(p-1)})$ time for (relatively) strongly convex objectives. These guarantees provide a transparent characterization of the impact of frequent long jumps on the convergence of the process, and percolate to a series of matching discrete-time guarantees for several variants of SMD under heavy-tailed noise.

【59】Set-Preserving Calibration from Conformal P-Values to E-Values
标题:从保形P值到E值的集保持校准
链接:https://arxiv.org/abs/2606.03600

作者:Nabil Alami,Jad Zakharia,Souhaib Ben Taieb
摘要:Standard conformal prediction (CP) procedures are typically formulated in terms of p-values, but reliance on p-values alone limits flexibility, for example, when combining dependent evidence across models or data splits. Recent work has explored e-value formulations for conformal inference, yet a direct connection between p- and e-value formulations in CP has been missing, especially regarding their statistical efficiency. We first identify limitations of classical p-to-e calibrators in the CP setting, showing that they are not set-preserving and can lead to overly conservative prediction sets. To address this, we propose a novel P2E calibrator that converts conformal p-values into e-values without altering the prediction set induced by the original conformal p-value. We establish both theoretically and empirically that our calibrator can yield significant efficiency gains over existing p-to-e calibrators. This e-value formulation enables principled use of recent advances in e-value merging and randomization, where we demonstrate its impact in two applications: cross-conformal prediction (CCP), whose variants typically provide only approximate $1-2α$ coverage, and conformal aggregation (CA). In both cases, our e-value-based methods satisfy the desired $1-α$ coverage guarantee while improving efficiency over standard baselines. More broadly, our approach expands the flexibility of CP and opens new directions for efficient, distribution-free uncertainty quantification.
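
The claim that classical p-to-e calibrators are not set-preserving can be checked on a small example. The sketch below uses the standard split-conformal p-value and the well-known power calibrator $e = \kappa p^{\kappa-1}$ with $\kappa \in (0,1)$. It does not implement the paper's P2E calibrator, and the calibration scores, test score, and function names are hypothetical.

```python
def conformal_p_value(cal_scores, test_score):
    """Split-conformal p-value: rank of the test nonconformity score."""
    n = len(cal_scores)
    return (1 + sum(s >= test_score for s in cal_scores)) / (n + 1)

def power_calibrator(p, kappa=0.5):
    """Classical p-to-e calibrator e = kappa * p**(kappa - 1), 0 < kappa < 1."""
    return kappa * p ** (kappa - 1)

alpha = 0.1
# 99 hypothetical calibration scores: 0.00, 0.01, ..., 0.98.
cal_scores = [i / 100 for i in range(99)]

# Candidate label whose score exceeds all but four calibration scores.
p = conformal_p_value(cal_scores, test_score=0.945)
e = power_calibrator(p)

in_p_set = p > alpha      # p-value rule: keep the label if p > alpha
in_e_set = e < 1 / alpha  # e-value rule (Markov): keep the label if e < 1/alpha
```

Here the p-value rule excludes the candidate label ($p = 0.05 \le \alpha$), while the calibrated e-value stays well below $1/\alpha = 10$, so the e-value rule keeps it. The e-based set is therefore larger, which matches the conservativeness described in the abstract.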

【60】Trajectory-Aware Node Contributions and the Limits of Static Controllability
标题:轨迹感知节点贡献和静态可控性的极限
链接:https://arxiv.org/abs/2606.03067

作者:Valentina Kuskova,Dmitry Zaytsev,Michael Coppedge
备注:11 pages, 1 figure
摘要:A recurring data mining task in complex networks is to determine how individual nodes contribute to system behavior. Existing approaches rely on either static-graph centralities or control-theoretic quantities such as controllability Gramians, which assume linear, time-invariant dynamics. Estimated systems, however, are typically nonlinear and time-varying. We define "emergent contribution (EC)," a finite-horizon measure of a node's dynamical leverage: the metric-weighted energy of its impulse response accumulated along the system trajectory. Computed from the Jacobians of any differentiable model, EC is estimator-agnostic and reduces exactly to average controllability in the linear, time-invariant limit. Our contribution is a characterization of when the two measures agree and diverge. Using a controlled synthetic family with known ground-truth contribution, we construct a phase diagram spanning nonlinearity, regime structure, persistence, and perturbation amplitude. EC and average controllability agree under static or smoothly drifting dynamics and both track ground truth. Divergence emerges under persistent regime switching, is strongest under persistent sign reversal, and disappears when the sign reversal is removed. At extreme perturbation amplitudes, both measures degrade, identifying the limits of local linearization. We place five estimated real systems from several domains within this phase space. Their placement serves as a diagnostic of when EC provides information beyond static controllability and therefore justifies its additional computational cost. On one panel examined in depth, a twenty-seed retraining ensemble reveals a robust variance--leverage dissociation: nodes whose perturbations propagate widely despite low within-system variance, which is not recovered by static centralities nor variance-based summaries.

【61】A Fast Screening Approach for High-dimensional Outcomes and High-dimensional Predictors
标题:高维结果和高维预测因子的快速筛选方法
链接:https://arxiv.org/abs/2606.03018

作者:Hongju Park,Zhenyao Ye,Shuo Chen
备注:38 pages, 2 figures
摘要:Modeling interactions among multimodal, high-dimensional data is intrinsically challenging due to ultra-high dimensionality and complex dependence structure with high level noise. Screening methods are effective for reducing dimensionality, but most existing approaches shrink only the predictor space while retaining all outcomes. In cross-modal analyses, different outcomes often select different predictor subsets, so the union remains large and the response dimension is unchanged, limiting the practical benefit of screening. This gives rise to heavy computational burdens and poor interpretability. To address these limitations, we propose a new screening framework, Graph Independence Dual Screening (GIDS), which simultaneously reduces the dimensionality of response variables and predictors. We design computationally efficient algorithms that facilitate downstream selection procedures, improving accuracy and scalability, and establish supporting theoretical results. Extensive simulation studies demonstrate that GIDS outperforms existing methods that screen only predictors. To illustrate its utility, we applied GIDS to the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, analyzing interactions between genome-wide 865,353 DNA methylation and 49,386 transcriptomic variables. GIDS reduced the feature space to approximately 9,000 CpGs and 2,000 transcripts, uncovering blockwise interaction structures: clusters of CpG sites and gene transcripts with strong associations. These findings not only improve computational tractability but also yield interpretable biological insights, highlighting coordinated regulatory mechanisms underlying Alzheimer's disease.

【62】Scalable Derivative Gaussian Processes via Exact Gradient Reduction
标题:通过精确梯度约简实现可扩展的导数高斯过程
链接:https://arxiv.org/abs/2606.02909

作者:Hyunseok Seung,Matthias Katzfuss
摘要:Gradient observations can substantially improve Gaussian process (GP) surrogates, particularly in high-dimensional settings where function evaluations are expensive. However, exact inference with $n$ function values and $n$ full gradients in $d$ dimensions scales cubically in the joint state size, imposing an intractable $\mathcal{O}(n^3 d^3)$ computational bottleneck. We introduce TERA, a highly scalable derivative GP method based on target-specific exact gradient reduction. We prove that for stationary kernels, the gradient components orthogonal to the directions connecting the target and conditioning points are conditionally independent of the target function value; consequently, the exact conditional density is fully characterized by at most $m^2$ directional derivatives once a conditioning set of size $m$ is specified. By using these reduced, dimension-free conditionals as local factors in a Vecchia approximation, TERA effectively decouples $n$ and $d$ from the dense matrix inversion. This reduces the per-target evaluation cost to $\mathcal{O}(dm^2 + m^6)$ time and $\mathcal{O}(dm^2 + m^4)$ memory, leaving the underlying derivative GP model mathematically unchanged. Empirical evaluations demonstrate that TERA achieves state-of-the-art predictive accuracy while operating orders of magnitude faster than standard derivative GPs. Crucially, both computation time and peak GPU memory remain essentially flat with respect to $d$, enabling highly scalable inference in high-dimensional spaces.

【63】Neutrino Fingerprints: Image-Based Encodings of IceCube Events for CNN Direction Reconstruction
标题:中微子指纹:用于CNN方向重建的IceCube事件基于图像的编码
链接:https://arxiv.org/abs/2606.02788

作者:Floriano Tori,Brecht Verbeken,Vincent Ginis
备注:6 pages, 1 figure
摘要:Reconstructing the direction of incoming neutrinos in the IceCube Neutrino Observatory is an important problem in astrophysics. The public IceCube--Neutrinos in Deep Ice Kaggle competition provided 140 million simulated events to benchmark reconstruction techniques. To address this challenge from a novel perspective, we introduce neutrino fingerprints: compact $72 \times 72 \times 3$ images in which each pixel represents a single detector, with pulse timing and charge statistics encoded as color channels. This representation transforms sparse, irregular pulse data into dense images suitable for convolutional processing. Our ResNet18 model achieves a mean angular error of $1.10$ rad, indicating that convolutional networks trained on fingerprints rival more complex architectures while offering an effective, interpretable baseline for IceCube event reconstruction.
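
The mean angular error quoted above is the average angle between true and predicted direction vectors. The sketch below computes it under the assumption that each direction is given as an (azimuth, zenith) pair in radians. The function names and toy inputs are hypothetical, and nothing here reproduces the authors' pipeline.

```python
import math

def unit_vector(azimuth, zenith):
    """Convert spherical angles (radians) to a 3D unit vector."""
    return (
        math.sin(zenith) * math.cos(azimuth),
        math.sin(zenith) * math.sin(azimuth),
        math.cos(zenith),
    )

def angular_error(true_dir, pred_dir):
    """Angle in radians between two (azimuth, zenith) directions."""
    u = unit_vector(*true_dir)
    v = unit_vector(*pred_dir)
    dot = sum(a * b for a, b in zip(u, v))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, dot)))

def mean_angular_error(true_dirs, pred_dirs):
    """Average angular error over paired lists of directions."""
    errors = [angular_error(t, p) for t, p in zip(true_dirs, pred_dirs)]
    return sum(errors) / len(errors)

# Toy check: one orthogonal pair and one opposite pair.
true_dirs = [(0.0, math.pi / 2), (0.0, 0.0)]
pred_dirs = [(math.pi / 2, math.pi / 2), (0.0, math.pi)]
mae = mean_angular_error(true_dirs, pred_dirs)
```

The metric ranges from 0 (perfect alignment) to $\pi$ (opposite directions).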

【64】ScoreStop: Gradient-based early stopping using functional score tests
标题:ScoreStop:使用泛函得分检验的基于梯度的提前停止
链接:https://arxiv.org/abs/2606.02740

作者:Oliver J. Hines,Christian L. Hines
备注:Presented at the International Conference on Machine Learning 2026 Workshop on Hypothesis Testing
摘要:Gradient boosted decision trees require a stopping rule to avoid overfitting. The standard rule monitors a validation loss and stops if the loss fails to improve for a fixed patience period. However, the patience parameter has no interpretable scale and validation losses can be noisy or implicitly defined by a user-specified gradient. We propose ScoreStop, a gradient-based early-stopping rule that casts the stopping decision at each iteration as a test of the null hypothesis that the current predictor is the population risk minimizer. We use a functional score test, computed on validation data, with a statistic that is scale-invariant in the update direction, with a known asymptotic distribution under the null. Because our test uses gradients rather than loss values, the same construction applies to implicit losses such as LambdaRank, and data-dependent losses such as Cox regression via influence functions. In synthetic experiments and real-data benchmarks, we show that ScoreStop is competitive with loss-based methods.
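
For reference, the standard patience rule that the abstract describes as the baseline can be sketched as follows. This illustrates only the loss-based rule, not ScoreStop's score test. The function name `patience_stop` and the toy loss curve are hypothetical.

```python
def patience_stop(val_losses, patience=3):
    """Return (stop_iteration, best_iteration) under the patience rule.

    Training stops at the first iteration where the validation loss has
    failed to improve on its best value for `patience` consecutive rounds.
    """
    best_loss = float("inf")
    best_iter = -1
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_iter = loss, i
        elif i - best_iter >= patience:
            return i, best_iter
    # Never triggered: training ran to the end of the curve.
    return len(val_losses) - 1, best_iter

# Noisy toy curve: the rule stops at iteration 5 and never sees the
# later improvement at iteration 6.
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.6]
stop_iter, best_iter = patience_stop(losses, patience=3)
```

The toy curve shows the sensitivity the abstract points to: the outcome depends on a patience value that has no interpretable scale, and a slightly larger value would have reached the better model.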

【65】State-Coupled Volatility in Latent Dynamical Systems: Recovery Under Partial Observation
标题:潜在动态系统中的状态耦合波动性:部分观测下的恢复
链接:https://arxiv.org/abs/2606.02664

作者:Imani Beckett
备注:40 pages, 16 figures
摘要:Latent state-space models are widely used to study partially observed dynamical systems, yet most formulations assume that process variability is independent of latent-state position. In many biological, behavioral, and physiological systems, however, variability may depend systematically on the underlying dynamical state, producing structured stochasticity that is not captured by constant-variance models. We introduce a state-coupled stochastic volatility framework in which latent process variance depends on displacement from a latent equilibrium. To estimate this relationship under partial observation, we develop a particle expectation-maximization procedure combining bootstrap particle filtering and backward trajectory smoothing. The model includes a coupling parameter, $γ$, that quantifies the strength of association between latent-state position and process variability. A large-scale simulation benchmark evaluated recovery and detection performance across varying coupling strengths, observation noise levels, trajectory lengths, and persistence regimes. The proposed framework consistently reduced recovery bias relative to an observed-state heteroskedastic proxy, with the largest improvements occurring under strong coupling. Recovery performance improved with increasing latent persistence, while detection performance remained competitive across a broad range of conditions and became increasingly advantageous as observation noise increased. Taken together, the results demonstrate that state-coupled volatility can be identified and estimated under partial observation when latent-state structure is explicitly modeled. The framework provides a practical methodological foundation for studying state-dependent variability and evaluating whether structured stochasticity contributes information about system dynamics beyond that contained in mean-state trajectories alone.

【66】Wavelet as Tokenizer: Preliminary Results on a Shared Wavelet Token Schema for Natural Signals
标题:小波作为分词器:自然信号共享小波令牌模式的初步结果
链接:https://arxiv.org/abs/2606.02631

作者:Shenghao Ding
备注:12 pages, 3 figures
摘要:This paper studies whether audio, images, and video can share a common wavelet token schema rather than relying on separate modality-specific latent grids. It introduces a preliminary continuous-token model built around a one-level Haar DWT/IDWT frontend, a shared coefficient-token layout, optional structural metadata, lightweight modality value adapters, and a shared token-wise encoder-decoder trunk. On Speech Commands, EuroSAT RGB, and DAVIS 2017 data, a dense shared model reaches 39.92 dB audio, 29.37 dB image, and 23.93 dB video PSNR. A matched-rate sweep under continuous latent scalar budgets indicates that the visual gains are not explained solely by latent capacity, while also showing that additive metadata embeddings are not a universal source of improvement. Finally, fixed-rate energy selection provides a strong non-parametric baseline: energy_global improves average PSNR over uniform selection by 16.73 dB for audio, 16.90 dB for images, and 15.86 dB for video under compressed keep ratios. Masked sparse training reaches 34.45 dB video PSNR with 50% of dense tokens. The results support a unified wavelet token schema and sparse token interface, while stopping short of establishing a universal discrete vocabulary.
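
The one-level Haar DWT/IDWT frontend and the idea of energy-based coefficient selection can be illustrated on a 1D signal. This is a minimal sketch: the coefficient layout (approximation followed by detail), the top-k selection rule, and all function names are assumptions made for illustration, not the paper's schema.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(signal):
    """One-level orthonormal Haar transform of an even-length signal."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / SQRT2)
        signal.append((a - d) / SQRT2)
    return signal

def keep_top_k_by_energy(coeffs, k):
    """Zero all but the k coefficients with the largest squared magnitude."""
    ranked = sorted(range(len(coeffs)), key=lambda i: coeffs[i] ** 2, reverse=True)
    keep = set(ranked[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]

x = [4.0, 2.0, 5.0, 5.0]
approx, detail = haar_dwt(x)
exact = haar_idwt(approx, detail)

# Keep half of the coefficient "tokens", chosen by energy.
sparse = keep_top_k_by_energy(approx + detail, k=2)
lossy = haar_idwt(sparse[:2], sparse[2:])
```

With all coefficients kept, the transform reconstructs the input exactly up to floating-point error. With only the two highest-energy coefficients, the first pair of samples is replaced by its average while the constant second pair is preserved.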

【67】TadA-Bench: A Million-Variant Benchmark for Future-Round Discovery Toward Agentic Protein Engineering
标题:TadA-Bench:面向智能体蛋白质工程的未来轮次发现百万变体基准
链接:https://arxiv.org/abs/2606.02624

作者:Jin Gao,Juntu Zhao,Zirui Zeng,Jiaqi Shen,Junhao Shi,Dukun Zhao,Yuming Lu,Dequan Wang
备注:Accepted at the 43rd International Conference on Machine Learning (ICML 2026). Data: https://huggingface.co/datasets/JinGao/TadABench-1M . Code: https://github.com/shiyegao/TadABench-1M
摘要:AI for scientific discovery is entering an agentic era, where protein-engineering systems are expected to prioritize future wet-lab experiments rather than merely fit static measurements. We introduce TadA-Bench, a million-variant wet-lab replay benchmark from 31 TadA directed-evolution rounds for future-round discovery toward agentic protein engineering. TadA-Bench preserves the campaign chronology and defines a fixed-data replay task: given earlier experimental rounds, models rank variants that appear only in later rounds. It provides aligned DNA, RNA, and protein views, and uses Seq2Graph, a graph-based label-unification pipeline, to reconcile noisy enrichment measurements into consistent cross-round activity labels. Random-split controls show strong interpolation, but future-round ranking and finite-budget candidate selection are much weaker. Controlled analyses suggest that evolutionary coverage is more informative than local data density, positioning TadA-Bench as a reproducible wet-lab replay substrate for future-round discovery toward agentic protein engineering; the data and code are released on Hugging Face and GitHub.

【68】High-Dimensional Latents Should Be Diagnosed Through Phase Structure
标题:应通过相结构诊断高维潜变量
链接:https://arxiv.org/abs/2606.02600

作者:Alejandro Ascarate,Leo Lebrat,Rodrigo Santa Cruz,Clinton Fookes,Olivier Salvado
备注:9+22 pages, 4+6 figures, under review
摘要:We study autoencoder and variational-autoencoder latent spaces through the lens of spin-glass theory. The paper has two components. First, we formalize a latent-space spin-glass dictionary: for a fixed decoder, the reconstruction term together with a hyperspherical coordinates prior induces a Hamiltonian on the latent sphere, where latent coordinates play the role of continuous spins and the prior acts as an external magnetic field. This allows us to import operational spin-glass diagnostics -- overlap distributions, susceptibility, and block-spin coarse-graining -- to detect ordered, disordered, and edge-of-stability phases in trained latent representations. Second, we show that deliberately driving the latent system toward the edge-of-stability of the topological trivialization regime has concrete downstream consequences. In generation, hyperspherical compression improves the reconstruction-generation trade-off on CIFAR-10 and CelebA64, yielding lower self-FID while preserving or improving reconstruction. In anomaly detection, the same semi-ordered latent geometry improves both fully unsupervised and conditional OOD detection, including real-world Mars Rover and Galaxy Zoo datasets, as well as CIFAR-10/100 and Imagenette-based OOD benchmarks. We therefore advocate a phase-aware evaluation paradigm for AEs/VAEs, in which spin-glass observables complement standard ML metrics and expose the latent regimes that underlie downstream success or failure in many cases.

机器翻译由腾讯交互翻译提供,仅供参考

本文地址:http://www.python88.com/topic/197237