点击阅读原文访问arxivdaily.com,涵盖CS|物理|数学|经济|统计|金融|生物|电气领域,更有搜索、收藏等功能!
cs.LG 方向,今日共计224篇
大模型相关(28篇)
【1】Rethinking the Trust Region in LLM Reinforcement Learning
标题:LLM强化学习中的信任域的重新思考
链接:https://arxiv.org/abs/2602.04879
作者:Penghui Qi,Xiangxin Zhou,Zichen Liu,Tianyu Pang,Chao Du,Min Lin,Wee Sun Lee
摘要:强化学习(RL)已成为微调大型语言模型(LLM)的基石,而邻近策略优化(PPO)则是事实上的标准算法。尽管它的普遍存在,我们认为,在PPO的核心比率裁剪机制是结构不适合LLM固有的大词汇。PPO基于采样令牌的概率比来约束策略更新,该概率比用作真实策略发散的噪声单样本蒙特卡罗估计。这产生了一种次优的学习动态:对低概率令牌的更新被过度惩罚,而高概率令牌中潜在的灾难性变化则受到约束不足,导致训练效率低下和不稳定。为了解决这个问题,我们提出了发散邻近策略优化(DPPO),它基于对策略发散的直接估计(例如,总变差或KL)。为了避免巨大的内存占用,我们引入了有效的二进制和Top-K近似,以捕捉基本的分歧,可以忽略不计的开销。广泛的实证评估表明,DPPO实现了优越的训练稳定性和效率相比,现有的方法,提供了一个更强大的基础,基于RL的LLM微调。
摘要:Reinforcement learning (RL) has become a cornerstone for fine-tuning Large Language Models (LLMs), with Proximal Policy Optimization (PPO) serving as the de facto standard algorithm. Despite its ubiquity, we argue that the core ratio clipping mechanism in PPO is structurally ill-suited for the large vocabularies inherent to LLMs. PPO constrains policy updates based on the probability ratio of sampled tokens, which serves as a noisy single-sample Monte Carlo estimate of the true policy divergence. This creates a sub-optimal learning dynamic: updates to low-probability tokens are aggressively over-penalized, while potentially catastrophic shifts in high-probability tokens are under-constrained, leading to training inefficiency and instability. To address this, we propose Divergence Proximal Policy Optimization (DPPO), which substitutes heuristic clipping with a more principled constraint based on a direct estimate of policy divergence (e.g., Total Variation or KL). To avoid huge memory footprint, we introduce the efficient Binary and Top-K approximations to capture the essential divergence with negligible overhead. Extensive empirical evaluations demonstrate that DPPO achieves superior training stability and efficiency compared to existing methods, offering a more robust foundation for RL-based LLM fine-tuning.
【2】Team, Then Trim: An Assembly-Line LLM Framework for High-Quality Tabular Data Generation
标题:团队,然后修剪:用于高质量表格数据生成的流水线LLM框架
链接:https://arxiv.org/abs/2602.04785
作者:Congjing Zhang,Ryan Feng Lin,Ruoxuan Bao,Shuai Huang
摘要:虽然表格数据是许多现实世界机器学习(ML)应用程序的基础,但获取高质量的表格数据通常是劳动密集型的,而且成本高昂。受限于观测数据的稀缺性,表格数据集往往表现出严重的缺陷,如类别不平衡,选择偏差和低保真度。为了应对这些挑战,基于大型语言模型(LLM)的最新进展,本文介绍了Team-then-Trim(T$^2$),这是一个通过LLM协作团队合成高质量表格数据的框架,随后是严格的三阶段插件数据质量控制(QC)管道。在T$^2$中,表格数据生成被概念化为一个制造过程:由领域知识指导的专用LLM负责顺序生成不同的数据组件,并且生成的产品,即,综合数据,在多个QC维度上进行系统评估。模拟和真实世界数据集的实证结果表明,T$^2$在生成高质量表格数据方面优于最先进的方法,突出了其在直接数据收集实际上不可行时支持下游模型的潜力。
摘要:While tabular data is fundamental to many real-world machine learning (ML) applications, acquiring high-quality tabular data is usually labor-intensive and expensive. Limited by the scarcity of observations, tabular datasets often exhibit critical deficiencies, such as class imbalance, selection bias, and low fidelity. To address these challenges, building on recent advances in Large Language Models (LLMs), this paper introduces Team-then-Trim (T$^2$), a framework that synthesizes high-quality tabular data through a collaborative team of LLMs, followed by a rigorous three-stage plug-in data quality control (QC) pipeline. In T$^2$, tabular data generation is conceptualized as a manufacturing process: specialized LLMs, guided by domain knowledge, are tasked with generating different data components sequentially, and the resulting products, i.e., the synthetic data, are systematically evaluated across multiple dimensions of QC. Empirical results on both simulated and real-world datasets demonstrate that T$^2$ outperforms state-of-the-art methods in producing high-quality tabular data, highlighting its potential to support downstream models when direct data collection is practically infeasible.
【3】Less Finetuning, Better Retrieval: Rethinking LLM Adaptation for Biomedical Retrievers via Synthetic Data and Model Merging
标题:更少的微调,更好的检索:通过合成数据和模型合并重新思考生物医学检索器的LLM适应
链接:https://arxiv.org/abs/2602.04731
作者:Sameh Khattab,Jean-Philippe Corbeil,Osman Alperen Koraş,Amin Dada,Julian Friedrich,François Beaulieu,Paul Vozila,Jens Kleesiek
备注:Preprint
摘要:检索增强生成(RAG)已经成为大语言模型(LLM)的基础,改善知识更新和减少幻觉。最近,基于LLM的回收模型已经显示出RAG应用的最新性能。然而,在如何将通用LLM适应于有效的特定领域检索器方面,特别是在生物医学等专业领域,仍有几个技术方面有待探索。我们提出了合成训练合并(STM),一个模块化的框架,增强解码器只有LLM与合成硬否定,检索提示优化和模型合并。对MTEB基准测试中12个医疗和一般任务的子集进行的实验表明,STM将特定任务的专家提高了23.5%(平均7.5%),并产生了合并模型,这些模型在没有大量预训练的情况下优于单个专家和强基线。我们的研究结果展示了一个可扩展的,有效的路径,将一般LLM变成高性能的,领域专用的检索器,保留一般领域的能力,同时擅长于专门的任务。
摘要:Retrieval-augmented generation (RAG) has become the backbone of grounding Large Language Models (LLMs), improving knowledge updates and reducing hallucinations. Recently, LLM-based retriever models have shown state-of-the-art performance for RAG applications. However, several technical aspects remain underexplored on how to adapt general-purpose LLMs into effective domain-specific retrievers, especially in specialized domains such as biomedicine. We present Synthesize-Train-Merge (STM), a modular framework that enhances decoder-only LLMs with synthetic hard negatives, retrieval prompt optimization, and model merging. Experiments on a subset of 12 medical and general tasks from the MTEB benchmark show STM boosts task-specific experts by up to 23.5\% (average 7.5\%) and produces merged models that outperform both single experts and strong baselines without extensive pretraining. Our results demonstrate a scalable, efficient path for turning general LLMs into high-performing, domain-specialized retrievers, preserving general-domain capabilities while excelling on specialized tasks.
【4】Inference-Time Backdoors via Hidden Instructions in LLM Chat Templates
标题:通过LLM聊天模板中隐藏说明的推理时间后门
链接:https://arxiv.org/abs/2602.04653
作者
:Ariel Fogel,Omer Hofman,Eilon Cohen,Roman Vainshtein
摘要:开放权重语言模型越来越多地用于生产环境,带来了新的安全挑战。在这种情况下,一个突出的威胁是后门攻击,其中对手将隐藏的行为嵌入在特定条件下激活的语言模型中。以前的工作假设对手可以访问训练管道或部署基础设施。我们提出了一种新的攻击面,既不需要,它利用聊天模板。聊天模板是可执行的Jinja 2程序,在每次推理调用时调用,在用户输入和模型处理之间占据特权位置。我们表明,分发带有恶意修改模板的模型的对手可以在不修改模型权重、毒化训练数据或控制运行时基础设施的情况下植入推理时间后门。我们通过构建针对两个目标的模板后门来评估此攻击向量:降低事实准确性和诱导攻击者控制的URL的排放,并将其应用于跨越7个家族和4个推理引擎的18个模型。在触发条件下,事实准确性平均从90%下降到15%,而攻击者控制的URL发出的成功率超过80%;良性输入没有显示出可测量的退化。后门可以在推理运行时进行泛化,并逃避最大的开放权重分发平台应用的所有自动安全扫描。这些结果将聊天模板确立为LLM供应链中可靠且目前不设防的攻击面。
摘要:Open-weight language models are increasingly used in production settings, raising new security challenges. One prominent threat in this context is backdoor attacks, in which adversaries embed hidden behaviors in language models that activate under specific conditions. Previous work has assumed that adversaries have access to training pipelines or deployment infrastructure. We propose a novel attack surface requiring neither, which utilizes the chat template. Chat templates are executable Jinja2 programs invoked at every inference call, occupying a privileged position between user input and model processing. We show that an adversary who distributes a model with a maliciously modified template can implant an inference-time backdoor without modifying model weights, poisoning training data, or controlling runtime infrastructure. We evaluated this attack vector by constructing template backdoors targeting two objectives: degrading factual accuracy and inducing emission of attacker-controlled URLs, and applied them across eighteen models spanning seven families and four inference engines. Under triggered conditions, factual accuracy drops from 90% to 15% on average while attacker-controlled URLs are emitted with success rates exceeding 80%; benign inputs show no measurable degradation. Backdoors generalize across inference runtimes and evade all automated security scans applied by the largest open-weight distribution platform. These results establish chat templates as a reliable and currently undefended attack surface in the LLM supply chain.
【5】QUATRO: Query-Adaptive Trust Region Policy Optimization for LLM Fine-tuning
标题:QUATRO:LLM微调的查询自适应信任区域政策优化
链接:https://arxiv.org/abs/2602.04620
作者:Doyeon Lee,Eunyi Lyou,Hyunsoo Cho,Sookyung Kim,Joonseok Lee,Jaemoo Choi
摘要:基于GRPO风格的强化学习(RL)的LLM微调算法最近越来越受欢迎。然而,依赖于启发式信任域近似,它们可能导致脆弱的优化行为,因为全局重要性比裁剪和分组归一化无法调节重要性比落在裁剪范围之外的样本。我们提出了查询自适应信任域策略优化(QUATRO),它直接执行信任域约束,通过原则性优化。这产生了一个明确的和可解释的目标,使明确的控制策略更新和稳定的,熵控制的优化,与稳定剂条款产生本质上从确切的信任区域制定。在不同的数学推理基准上进行了经验验证,QUATRO在增加的策略陈旧性和积极的学习率下显示出稳定的训练,在整个训练过程中保持了良好的控制熵。
摘要:GRPO-style reinforcement learning (RL)-based LLM fine-tuning algorithms have recently gained popularity. Relying on heuristic trust-region approximations, however, they can lead to brittle optimization behavior, as global importance-ratio clipping and group-wise normalization fail to regulate samples whose importance ratios fall outside the clipping range. We propose Query-Adaptive Trust-Region policy Optimization (QUATRO), which directly enforces trust-region constraints through a principled optimization. This yields a clear and interpretable objective that enables explicit control over policy updates and stable, entropy-controlled optimization, with a stabilizer terms arising intrinsically from the exact trust-region formulation. Empirically verified on diverse mathematical reasoning benchmarks, QUATRO shows stable training under increased policy staleness and aggressive learning rates, maintaining well-controlled entropy throughout training.
【6】Focus-LIME: Surgical Interpretation of Long-Context Large Language Models via Proxy-Based Neighborhood Selection
标题:Focus-LIME:通过基于代理的邻居选择对长上下文大型语言模型进行外科解释
链接:https://arxiv.org/abs/2602.04607
作者:Junhao Liu,Haonan Yu,Zhenyu Yan,Xin Zhang
摘要:随着大型语言模型(LLM)的扩展以处理大量的上下文窗口,实现外科手术特征级解释对于法律审计和代码调试等高风险任务至关重要。然而,现有的局部模型不可知的解释方法面临着一个关键的困境,在这些情况下:基于特征的方法遭受归因稀释由于高特征维数,从而无法提供忠实的解释。在本文中,我们提出了焦点石灰,一个由粗到细的框架,旨在恢复手术解释的易处理性。Focus-LIME利用代理模型来管理扰动邻域,允许目标模型仅在优化的上下文中执行细粒度属性。在长上下文基准测试上的实证评估表明,该方法使手术解释切实可行,并为用户提供了忠实的解释。
摘要:As Large Language Models (LLMs) scale to handle massive context windows, achieving surgical feature-level interpretation is essential for high-stakes tasks like legal auditing and code debugging. However, existing local model-agnostic explanation methods face a critical dilemma in these scenarios: feature-based methods suffer from attribution dilution due to high feature dimensionality, thus failing to provide faithful explanations. In this paper, we propose Focus-LIME, a coarse-to-fine framework designed to restore the tractability of surgical interpretation. Focus-LIME utilizes a proxy model to curate the perturbation neighborhood, allowing the target model to perform fine-grained attribution exclusively within the optimized context. Empirical evaluations on long-context benchmarks demonstrate that our method makes surgical explanations practicable and provides faithful explanations to users.
【7】Mixture of Masters: Sparse Chess Language Models with Player Routing
标题:大师混合:具有玩家路由的稀疏国际象棋语言模型
链接:https://arxiv.org/abs/2602.04447
作者:Giacomo Frisoni,Lorenzo Molfetta,Davide Freddi,Gianluca Moro
摘要:现代国际象棋语言模型是经过数千名高评价个人进行的数百万场比赛训练的密集Transformers。然而,这些单一的网络往往会崩溃成模式平均的行为,其中风格的边界是模糊的,罕见的,但有效的策略被抑制。为了对抗同质化,我们引入了Mixture-of-Masters(MoM),这是第一个国际象棋专家混合模型,其中小型GPT专家模仿世界级大师。每个专家都是通过自我监督学习和强化学习的组合进行训练的,并由国际象棋特定的奖励指导。对于每一步棋,一个事后可学习的门控网络根据游戏状态选择最合适的角色进行引导,允许MoM动态地切换其风格。塔尔的进攻性职业还是彼得罗辛的防守坚固。当在看不见的标准游戏上与Stockfish进行评估时,MoM的表现优于密集的个人专家网络和在聚合数据上训练的流行GPT基线,同时确保生成多样性,控制性和可解释性。
摘要:Modern chess language models are dense transformers trained on millions of games played by thousands of high-rated individuals. However, these monolithic networks tend to collapse into mode-averaged behavior, where stylistic boundaries are blurred, and rare but effective strategies are suppressed. To counteract homogenization, we introduce Mixture-of-Masters (MoM), the first chess mixture-of-experts model with small-sized GPT experts emulating world-class grandmasters. Each expert is trained with a combination of self-supervised learning and reinforcement learning guided by chess-specific rewards. For each move, a post-hoc learnable gating network selects the most appropriate persona to channel depending on the game state, allowing MoM to switch its style dynamically$--$e.g., Tal's offensive vocation or Petrosian's defensive solidity. When evaluated against Stockfish on unseen standard games, MoM outperforms both dense individual expert networks and popular GPT baselines trained on aggregated data, while ensuring generation variety, control, and interpretability.
【8】EMA Policy Gradient: Taming Reinforcement Learning for LLMs with EMA Anchor and Top-k KL
标题:EMA政策梯度:使用EMA Anchor和Top-k KL驯服LLM的强化学习
链接
:https://arxiv.org/abs/2602.04417
作者:Lunjun Zhang,Jimmy Ba
摘要:强化学习(RL)使大型语言模型(LLM)能够获得越来越复杂的推理和代理行为。在这项工作中,我们提出了两个简单的技术,以改善政策梯度算法的LLM。首先,我们用指数移动平均(EMA)代替RL期间的固定锚策略,类似于深度Q学习中的目标网络。其次,我们引入Top-k KL估计器,它允许精确KL和采样KL之间的灵活插值。我们推导出使用EMA锚的稳定性条件;此外,我们证明了我们的Top-k KL估计量在任何k处都可以产生无偏KL值和无偏梯度,同时带来精确KL的好处。当与GRPO相结合时,这两种技术(EMA-PG)导致显著的性能提升。在数学推理方面,它允许R1蒸馏的Qwen-1.5B在OlympiadBench上达到53.9%,而GRPO为50.8%。在代理RL域上,基于Qwen-3B,EMA-PG在7个搜索引擎问答数据集上平均提高了33.3%的GRPO,其中HotpotQA上为29.7% $\rightarrow $44.1%,2 WikiMultiHopQA上为27.4% $\rightarrow $40.1%。总体而言,我们表明EMA-PG是一种简单、有原则且强大的扩展LLM强化学习的方法。代码:https://github.com/LunjunZhang/ema-pg
摘要:Reinforcement Learning (RL) has enabled Large Language Models (LLMs) to acquire increasingly complex reasoning and agentic behaviors. In this work, we propose two simple techniques to improve policy gradient algorithms for LLMs. First, we replace the fixed anchor policy during RL with an Exponential Moving Average (EMA), similar to a target network in deep Q-learning. Second, we introduce Top-k KL estimator, which allows for flexible interpolation between exact KL and sampled KL. We derive the stability conditions for using EMA anchor; moreover, we show that our Top-k KL estimator yields both unbiased KL values and unbiased gradients at any k, while bringing the benefits of exact KL. When combined with GRPO, the two techniques (EMA-PG) lead to a significant performance boost. On math reasoning, it allows R1-distilled Qwen-1.5B to reach 53.9% on OlympiadBench compared to 50.8% by GRPO. On agentic RL domains, with Qwen-3B base, EMA-PG improves GRPO by an average of 33.3% across 7 datasets of Q&A with search engines, including 29.7% $\rightarrow$ 44.1% on HotpotQA, 27.4% $\rightarrow$ 40.1% on 2WikiMultiHopQA. Overall, we show that EMA-PG is a simple, principled, and powerful approach to scaling RL for LLMs. Code: https://github.com/LunjunZhang/ema-pg
【9】On the use of LLMs to generate a dataset of Neural Networks
标题:使用LLM生成神经网络数据集
链接:https://arxiv.org/abs/2602.04388
作者:Nadia Daoudi,Jordi Cabot
摘要:神经网络越来越多地用于支持决策。为了验证它们的可靠性和适应性,研究人员和实践者提出了各种工具和方法,用于NN代码验证,重构和迁移等任务。这些工具在保证神经网络架构的正确性和可维护性方面发挥着至关重要的作用,有助于防止实现错误,简化模型更新,并确保复杂网络可以可靠地扩展和重用。然而,由于缺乏允许系统评估的公开多样的神经网络数据集,评估其有效性仍然具有挑战性。为了解决这一差距,我们利用大型语言模型(LLM)自动生成神经网络数据集,可以作为验证的基准。该数据集旨在涵盖各种架构组件,并处理多种输入数据类型和任务。总共生成了608个样本,每个样本都符合一组精确的设计选择。为了进一步确保它们的一致性,我们使用静态分析和符号跟踪来验证生成的网络的正确性。我们公开了数据集,以支持社区推进神经网络可靠性和适应性的研究。
摘要:Neural networks are increasingly used to support decision-making. To verify their reliability and adaptability, researchers and practitioners have proposed a variety of tools and methods for tasks such as NN code verification, refactoring, and migration. These tools play a crucial role in guaranteeing both the correctness and maintainability of neural network architectures, helping to prevent implementation errors, simplify model updates, and ensure that complex networks can be reliably extended and reused. Yet, assessing their effectiveness remains challenging due to the lack of publicly diverse datasets of neural networks that would allow systematic evaluation. To address this gap, we leverage large language models (LLMs) to automatically generate a dataset of neural networks that can serve as a benchmark for validation. The dataset is designed to cover diverse architectural components and to handle multiple input data types and tasks. In total, 608 samples are generated, each conforming to a set of precise design choices. To further ensure their consistency, we validate the correctness of the generated networks using static analysis and symbolic tracing. We make the dataset publicly available to support the community in advancing research on neural network reliability and adaptability.
【10】Beyond KL Divergence: Policy Optimization with Flexible Bregman Divergences for LLM Reasoning
标题:超越KL分歧:通过灵活的Bregman分歧进行政策优化,用于LLM推理
链接:https://arxiv.org/abs/2602.04380
作者:Rui Yuan,Mykola Khandoga,Vinay Kumar Sankarapu
摘要:组相对策略优化(GRPO)及其变体等策略优化方法在数学推理和代码生成任务上取得了很好的效果。尽管对奖励处理策略和训练动态进行了广泛的探索,但所有现有的基于组的方法都只使用KL散度进行策略正则化,而没有探索散度函数的选择。我们介绍了基于组的镜像策略优化(GBMPO),这是一个框架,它将基于组的策略优化扩展到灵活的Bregman分歧,包括手工设计的替代方案(概率空间中的L2)和学习的神经镜像映射。在GSM 8 K数学推理上,手工设计的ProbL 2-GRPO达到了86.7%的准确率,比GRPO博士的基线提高了+5.5分。在MBPP代码生成中,神经镜像映射达到60.1-60.8%的pass@1,随机初始化已经获得了大部分好处。虽然进化策略元学习提供了边际准确性改进,但其主要价值在于方差减少($\pm$0.2 vs $\pm$0.6)和效率提高(MBPP上的响应缩短15%),这表明神经镜像映射的随机初始化对于大多数实际应用来说已经足够了。这些结果建立发散选择作为一个关键的,以前未探索的设计尺寸在基于组的政策优化LLM推理。
摘要:Policy optimization methods like Group Relative Policy Optimization (GRPO) and its variants have achieved strong results on mathematical reasoning and code generation tasks. Despite extensive exploration of reward processing strategies and training dynamics, all existing group-based methods exclusively use KL divergence for policy regularization, leaving the choice of divergence function unexplored. We introduce Group-Based Mirror Policy Optimization (GBMPO), a framework that extends group-based policy optimization to flexible Bregman divergences, including hand-designed alternatives (L2 in probability space) and learned neural mirror maps. On GSM8K mathematical reasoning, hand-designed ProbL2-GRPO achieves 86.7% accuracy, improving +5.5 points over the Dr. GRPO baseline. On MBPP code generation, neural mirror maps reach 60.1-60.8% pass@1, with random initialization already capturing most of the benefit. While evolutionary strategies meta-learning provides marginal accuracy improvements, its primary value lies in variance reduction ($\pm$0.2 versus $\pm$0.6) and efficiency gains (15% shorter responses on MBPP), suggesting that random initialization of neural mirror maps is sufficient for most practical applications. These results establish divergence choice as a critical, previously unexplored design dimension in group-based policy optimization for LLM reasoning.
【11】Multi-scale hypergraph meets LLMs: Aligning large language models for time series analysis
标题:多尺度超图与LLM相结合:调整大型语言模型以进行时间序列分析
链接:https://arxiv.org/abs/2602.04369
作者:Zongjiang Shang,Dongliang Cui,Binqing Wu,Ling Chen
备注:Accepted by ICLR2026
摘要:最近,在利用预训练的大型语言模型(LLM)进行时间序列分析方面取得了巨大成功。其核心思想在于有效地对齐自然语言和时间序列之间的模态。然而,没有充分考虑自然语言和时间序列的多尺度结构,导致LLM功能的利用不足。为此,我们提出了MSH-LLM,这是一种多尺度超图方法,可以将大型语言模型用于时间序列分析。具体地说,设计了一种超细化机制来增强时间序列语义空间的多尺度语义信息。然后,跨模态对齐(CMA)模块被引入到自然语言和时间序列之间的模态对齐在不同的尺度。此外,一个混合的提示(MoP)机制被引入,以提供上下文信息,并提高LLM理解时间序列的多尺度时间模式的能力。在5个不同应用程序的27个真实数据集上的实验结果表明,MSH-LLM达到了最先进的结果。
摘要
:Recently, there has been great success in leveraging pre-trained large language models (LLMs) for time series analysis. The core idea lies in effectively aligning the modality between natural language and time series. However, the multi-scale structures of natural language and time series have not been fully considered, resulting in insufficient utilization of LLMs capabilities. To this end, we propose MSH-LLM, a Multi-Scale Hypergraph method that aligns Large Language Models for time series analysis. Specifically, a hyperedging mechanism is designed to enhance the multi-scale semantic information of time series semantic space. Then, a cross-modality alignment (CMA) module is introduced to align the modality between natural language and time series at different scales. In addition, a mixture of prompts (MoP) mechanism is introduced to provide contextual information and enhance the ability of LLMs to understand the multi-scale temporal patterns of time series. Experimental results on 27 real-world datasets across 5 different applications demonstrate that MSH-LLM achieves the state-of-the-art results.
【12】Revisiting Prompt Sensitivity in Large Language Models for Text Classification: The Role of Prompt Underspecification
标题:重新审视文本分类的大型语言模型中的提示敏感性:提示未规范的作用
链接:https://arxiv.org/abs/2602.04297
作者:Branislav Pecher,Michal Spiegel,Robert Belanec,Jan Cegin
摘要:大型语言模型(LLM)被广泛用作zero-shot和Few-Shot分类器,其中任务行为在很大程度上通过提示来控制。越来越多的作品已经观察到,LLM是敏感的提示变化,小的变化导致大的变化性能。然而,在许多情况下,灵敏度的调查是使用未指定的提示,提供最小的任务指令和弱约束模型的输出空间。在这项工作中,我们认为,所观察到的提示敏感性的一个重要部分可以归因于提示规格不足。我们系统地研究和比较的敏感性未充分指定的提示和提示,提供具体的指示。利用性能分析,logit分析,和线性探测,我们发现,欠指定的提示表现出较高的性能方差和较低的logit值相关的令牌,而警告提示少遭受这样的问题。然而,线性探测分析表明,提示规格不足的影响只有边际影响的内部LLM表示,而不是出现在最后一层。总的来说,我们的研究结果强调了在调查和减轻即时敏感性时需要更加严格。
摘要:Large language models (LLMs) are widely used as zero-shot and few-shot classifiers, where task behaviour is largely controlled through prompting. A growing number of works have observed that LLMs are sensitive to prompt variations, with small changes leading to large changes in performance. However, in many cases, the investigation of sensitivity is performed using underspecified prompts that provide minimal task instructions and weakly constrain the model's output space. In this work, we argue that a significant portion of the observed prompt sensitivity can be attributed to prompt underspecification. We systematically study and compare the sensitivity of underspecified prompts and prompts that provide specific instructions. Utilising performance analysis, logit analysis, and linear probing, we find that underspecified prompts exhibit higher performance variance and lower logit values for relevant tokens, while instruction-prompts suffer less from such problems. However, linear probing analysis suggests that the effects of prompt underspecification have only a marginal impact on the internal LLM representations, instead emerging in the final layers. Overall, our findings highlight the need for more rigour when investigating and mitigating prompt sensitivity.
【13】Contextual Drag: How Errors in the Context Affect LLM Reasoning
标题:上下文拖累:上下文中的错误如何影响LLM推理
链接:https://arxiv.org/abs/2602.04288
作者:Yun Cheng,Xingyu Zhu,Haoyu Zhao,Sanjeev Arora
摘要:大型语言模型(LLM)的许多自我改进管道的核心是假设模型可以通过反思过去的错误来改进。我们研究了一种现象,称为上下文拖动:在上下文中的存在下,失败的尝试偏向结构相似的错误的后代。在对8个推理任务的11个专有和开放权重模型的评估中,上下文阻力导致10-20%的性能下降,并且具有严重上下文阻力的模型中的迭代自我完善可能会崩溃为自我恶化。使用树编辑距离的结构分析表明,随后的推理轨迹继承结构上相似的错误模式从上下文。我们证明,无论是外部反馈,也不成功的自我验证足以消除这种影响。虽然缓解策略,如回退行为微调和上下文去噪产生部分改进,但它们无法完全恢复基线性能,将上下文拖动定位为当前推理架构中的持久故障模式。
摘要:Central to many self-improvement pipelines for large language models (LLMs) is the assumption that models can improve by reflecting on past mistakes. We study a phenomenon termed contextual drag: the presence of failed attempts in the context biases subsequent generations toward structurally similar errors. Across evaluations of 11 proprietary and open-weight models on 8 reasoning tasks, contextual drag induces 10-20% performance drops, and iterative self-refinement in models with severe contextual drag can collapse into self-deterioration. Structural analysis using tree edit distance reveals that subsequent reasoning trajectories inherit structurally similar error patterns from the context. We demonstrate that neither external feedback nor successful self-verification suffices to eliminate this effect. While mitigation strategies such as fallback-behavior fine-tuning and context denoising yield partial improvements, they fail to fully restore baseline performance, positioning contextual drag as a persistent failure mode in current reasoning architectures.
【14】Agent-Omit: Training Efficient LLM Agents for Adaptive Thought and Observation Omission via Agentic Reinforcement Learning
标题:代理忽略:通过抽象强化学习训练高效的LLM代理进行自适应思维和观察忽略
链接:https://arxiv.org/abs/2602.04284
作者:Yansong Ning,Jun Fang,Naiqiang Tan,Hao Liu
备注:Under Review
摘要:在多轮代理环境交互过程中管理代理思维和观察是提高代理效率的一种新兴策略。然而,现有的研究平等对待整个互动轨迹,忽略了思想的必要性和观察效用的变化,跨回合。为此,我们首先进行定量调查,思考和观察如何影响代理的有效性和效率。基于我们的研究结果,我们提出了Agent-Omit,一个统一的训练框架,使LLM代理自适应地忽略冗余的想法和观察。具体来说,我们首先合成少量的冷启动数据,包括单轮和多轮遗漏的情况下,微调代理的遗漏行为。此外,我们引入了一个遗漏意识的代理强化学习方法,结合双采样机制和定制的遗漏奖励,以激励代理的自适应遗漏能力。在理论上,我们证明了我们的遗漏策略的偏差是上界的KL-分歧。实验结果表明,我们所构建的Agent-Omit-8B可以获得与七个前沿LLM代理相媲美的性能,并实现了最佳的效率-效率权衡比七个有效的LLM代理方法。我们的代码和数据可在https://github.com/usail-hkust/Agent-Omit上获得。
摘要:Managing agent thought and observation during multi-turn agent-environment interactions is an emerging strategy to improve agent efficiency. However, existing studies treat the entire interaction trajectories equally, overlooking the thought necessity and observation utility varies across turns. To this end, we first conduct quantitative investigations into how thought and observation affect agent effectiveness and efficiency. Based on our findings, we propose Agent-Omit, a unified training framework that empowers LLM agents to adaptively omit redundant thoughts and observations. Specifically, we first synthesize a small amount of cold-start data, including both single-turn and multi-turn omission scenarios, to fine-tune the agent for omission behaviors. Furthermore, we introduce an omit-aware agentic reinforcement learning approach, incorporating a dual sampling mechanism and a tailored omission reward to incentivize the agent's adaptive omission capability. Theoretically, we prove that the deviation of our omission policy is upper-bounded by KL-divergence. Experimental results on five agent benchmarks show that our constructed Agent-Omit-8B could obtain performance comparable to seven frontier LLM agent, and achieve the best effectiveness-efficiency trade-off than seven efficient LLM agents methods. Our code and data are available at https://github.com/usail-hkust/Agent-Omit.
【15】Thickening-to-Thinning: Reward Shaping via Human-Inspired Learning Dynamics for LLM Reasoning
标题:从厚到薄:通过LLM推理的人类启发学习动力学进行奖励塑造
链接:https://arxiv.org/abs/2602.04265
作者:Wenze Lin,Zhen Yang,Xitai Jiang,Pony Ma,Gao Huang
摘要:具有可验证奖励的强化学习(RLVR)已经成为增强大型语言模型(LLM)推理的一种有前途的范例。然而,它经常遇到的挑战,如熵崩溃,过度冗长,和探索困难的问题。至关重要的是,现有的奖励计划未能区分解决问题过程中广泛搜索的需要和掌握知识所需的效率。在这项工作中,我们引入了T2 T(Thickening-to-Thinning),这是一个受人类学习过程启发的动态奖励框架。具体来说,它实现了一种双阶段机制:(1)在不正确的尝试中,T2 T激励“加厚”(更长的轨迹)以扩大搜索空间并探索新的解决方案路径;(2)在实现正确性后,它转向“细化”,施加长度惩罚以阻止冗余,从而培养模型信心并结晶推理能力。在Qwen系列和Deepseek模型上对数学基准(MATH-500,AIME,AMC)进行的大量实验表明,T2 T的性能显著优于标准GRPO和最近的基线,实现了卓越的性能。
摘要:Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising paradigm for enhancing reasoning in Large Language Models (LLMs). However, it frequently encounters challenges such as entropy collapse, excessive verbosity, and insufficient exploration for hard problems. Crucially, existing reward schemes fail to distinguish between the need for extensive search during problem-solving and the efficiency required for mastered knowledge. In this work, we introduce T2T(Thickening-to-Thinning), a dynamic reward framework inspired by human learning processes. Specifically, it implements a dual-phase mechanism: (1) On incorrect attempts, T2T incentivizes "thickening" (longer trajectories) to broaden the search space and explore novel solution paths; (2) Upon achieving correctness, it shifts to "thinning", imposing length penalties to discourage redundancy, thereby fostering model confidence and crystallizing reasoning capabilities. Extensive experiments on mathematical benchmarks (MATH-500, AIME, AMC) across Qwen-series and Deepseek models demonstrate that T2T significantly outperforms standard GRPO and recent baselines, achieving superior performance.
【16】Steering LLMs via Scalable Interactive Oversight
标题:通过可扩展的交互式监督指导LLM
链接:https://arxiv.org/abs/2602.04210
作者:Enyu Zhou,Zhiheng Xi,Long Ma,Zhihao Zhang,Shihan Dou,Zhikai Lei,Guoteng Wang,Rui Zheng,Hang Yan,Tao Gui,Qi Zhang,Xuanjing Huang
摘要:随着大型语言模型越来越多地自动化复杂的长期任务,例如Vibe编码,出现了监督缺口。虽然模型在执行方面表现出色,但由于领域专业知识不足,难以表达精确的意图,以及无法可靠地验证复杂的输出,用户通常难以有效地指导它们。它在可扩展的监督方面提出了一个关键挑战:使人类能够负责任地引导人工智能系统执行超出其自身指定或验证能力的任务。为了解决这个问题,我们提出了可扩展的交互式监督,这是一个框架,可以将复杂的意图分解为可管理决策的递归树,以放大人类的监督。我们的系统不依赖于开放式提示,而是在每个节点上消除了低负担的反馈,并递归地将这些信号聚合成精确的全局指导。在Web开发任务中验证,我们的框架使非专家能够生成专家级的产品需求文档,实现了54%的对齐改进。至关重要的是,我们证明了这个框架可以通过强化学习来优化,只使用在线用户反馈,为在AI扩展时保持人类控制提供了一条实用的途径。
摘要:As Large Language Models increasingly automate complex, long-horizon tasks such as \emph{vibe coding}, a supervision gap has emerged. While models excel at execution, users often struggle to guide them effectively due to insufficient domain expertise, the difficulty of articulating precise intent, and the inability to reliably validate complex outputs. It presents a critical challenge in scalable oversight: enabling humans to responsibly steer AI systems on tasks that surpass their own ability to specify or verify. To tackle this, we propose Scalable Interactive Oversight, a framework that decomposes complex intent into a recursive tree of manageable decisions to amplify human supervision. Rather than relying on open-ended prompting, our system elicits low-burden feedback at each node and recursively aggregates these signals into precise global guidance. Validated in web development task, our framework enables non-experts to produce expert-level Product Requirement Documents, achieving a 54\% improvement in alignment. Crucially, we demonstrate that this framework can be optimized via Reinforcement Learning using only online user feedback, offering a practical pathway for maintaining human control as AI scales.
【17】SCALE: Self-uncertainty Conditioned Adaptive Looking and Execution for Vision-Language-Action Models
标题:SCALE:视觉-语言-动作模型的自我不确定性条件适应性外观和执行
链接:https://arxiv.org/abs/2602.04208
作者:Hyeonbeom Choi,Daechul Ahn,Youhan Lee,Taewook Kang,Seongwon Cho,Jonghyun Choi
备注:20 pages, 8 figures
摘要:视觉-语言-动作(VLA)模型已经成为通用机器人控制的一个有前途的范例,测试时间缩放(TTS)越来越受到关注,以提高训练之外的鲁棒性。然而,现有的TTS方法的VLA需要额外的培训,验证,和多个前向传递,使他们不切实际的部署。此外,他们只干预动作解码,同时保持视觉表征固定-在感知模糊下不足,重新考虑如何感知与决定做什么一样重要。为了解决这些限制,我们提出了SCALE,一个简单的推理策略,共同调制视觉感知和行动的基础上“自我不确定性”,灵感来自不确定性驱动的探索主动推理理论,不需要额外的训练,没有验证,只有一个单一的向前通过。SCALE扩展了在高度不确定性下的感知和行动探索,同时专注于在不同条件下实现自信的自适应执行时的利用。模拟和真实世界的基准测试表明,SCALE提高了最先进的VLA,并优于现有的TTS方法,同时保持单遍效率。
摘要:Vision-Language-Action (VLA) models have emerged as a promising paradigm for general-purpose robotic control, with test-time scaling (TTS) gaining attention to enhance robustness beyond training. However, existing TTS methods for VLAs require additional training, verifiers, and multiple forward passes, making them impractical for deployment. Moreover, they intervene only at action decoding while keeping visual representations fixed-insufficient under perceptual ambiguity, where reconsidering how to perceive is as important as deciding what to do. To address these limitations, we propose SCALE, a simple inference strategy that jointly modulates visual perception and action based on 'self-uncertainty', inspired by uncertainty-driven exploration in Active Inference theory-requiring no additional training, no verifier, and only a single forward pass. SCALE broadens exploration in both perception and action under high uncertainty, while focusing on exploitation when confident-enabling adaptive execution across varying conditions. Experiments on simulated and real-world benchmarks demonstrate that SCALE improves state-of-the-art VLAs and outperforms existing TTS methods while maintaining single-pass efficiency.
【18】Natural Language Instructions for Scene-Responsive Human-in-the-Loop Motion Planning in Autonomous Driving using Vision-Language-Action Models
标题:使用视觉-语言-动作模型的自动驾驶中场景响应式人机在环运动规划的自然语言指令
链接:https://arxiv.org/abs/2602.04184
作者:Angel Martinez-Sanchez,Parthib Roy,Ross Greer
摘要:基于指令的驾驶,即乘客语言指导轨迹规划,要求车辆在运动之前理解意图。然而,大多数先前的预防跟随计划器依赖于模拟或固定的命令词汇表,限制了现实世界的泛化。doScenes是第一个将自由形式的指令(具有引用性)与nuScenes地面实况运动联系起来的真实世界数据集,可以实现预防条件规划。在这项工作中,我们适应OpenEMMA,一个开源的基于MLLM的端到端驾驶框架,摄取前置摄像头视图和自我状态,并输出10步速度曲率轨迹,以这种设置,在doScenes上呈现可重复的预防条件基线,并调查人类指令提示对预测驾驶行为的影响。我们在OpenEMMA的视觉语言界面中集成doScenes指令作为浏览器风格的提示,从而在轨迹生成之前实现语言调节。评估849注释的场景使用ADE,我们观察到,指令调节大大提高了鲁棒性,防止极端基线故障,平均ADE产生98.7%的减少。当这些离群值被删除时,指令仍然会影响轨迹对齐,措辞良好的提示将ADE提高了5.1%。我们使用这个分析来讨论什么是OpenEMMA框架的“好”指令。我们发布评估提示和脚本,为预防意识规划建立可重复的基线。GitHub:https://github.com/Mi3-Lab/doScenes-VLM-Planning
摘要
:Instruction-grounded driving, where passenger language guides trajectory planning, requires vehicles to understand intent before motion. However, most prior instruction-following planners rely on simulation or fixed command vocabularies, limiting real-world generalization. doScenes, the first real-world dataset linking free-form instructions (with referentiality) to nuScenes ground-truth motion, enables instruction-conditioned planning. In this work, we adapt OpenEMMA, an open-source MLLM-based end-to-end driving framework that ingests front-camera views and ego-state and outputs 10-step speed-curvature trajectories, to this setting, presenting a reproducible instruction-conditioned baseline on doScenes and investigate the effects of human instruction prompts on predicted driving behavior. We integrate doScenes directives as passenger-style prompts within OpenEMMA's vision-language interface, enabling linguistic conditioning before trajectory generation. Evaluated on 849 annotated scenes using ADE, we observe that instruction conditioning substantially improves robustness by preventing extreme baseline failures, yielding a 98.7% reduction in mean ADE. When such outliers are removed, instructions still influence trajectory alignment, with well-phrased prompts improving ADE by up to 5.1%. We use this analysis to discuss what makes a "good" instruction for the OpenEMMA framework. We release the evaluation prompts and scripts to establish a reproducible baseline for instruction-aware planning. GitHub: https://github.com/Mi3-Lab/doScenes-VLM-Planning
【19】BPDQ: Bit-Plane Decomposition Quantization on a Variable Grid for Large Language Models
标题:BPDQ:大型语言模型可变网格上的位平面分解量化
链接:https://arxiv.org/abs/2602.04163
作者:Junyu Chen,Jungang Li,Jing Xiong,Wenjie Wang,Qingyao Yang,He Xiao,Zhen Li,Taiqiang Wu,Mengzhao Chen,Zhen Peng,Chaofan Tao,Long Shi,Hongxia Yang,Ngai Wong
摘要:在资源受限的部署中,大型语言模型(LLM)推理通常受到内存占用和内存带宽的限制,这使得量化成为有效服务的基本技术。虽然后训练量化(PTQ)在4比特时保持高保真度,但在2-3比特时会恶化。基本上,现有方法强制形状不变的量化网格(例如,每个组的UINT 2)的固定均匀间隔,严重限制了误差最小化的可行集。为了解决这个问题,我们提出了位平面分解量化(BPDQ),它通过位平面和标量系数构建一个可变的量化网格,并使用近似的二阶信息迭代地细化它们,同时逐步补偿量化误差,以最大限度地减少输出差异。在2位模式下,BPDQ能够在单个RTX 3090上提供Qwen2.5- 72 B,GSM 8 K准确率为83.85%(16位时为90.83%)。此外,我们提供的理论分析表明,可变网格扩展了可行集,并且量子化过程始终与Hessian诱导几何中的优化目标保持一致。代码:github.com/KingdalfGoodman/BPDQ.
摘要:Large language model (LLM) inference is often bounded by memory footprint and memory bandwidth in resource-constrained deployments, making quantization a fundamental technique for efficient serving. While post-training quantization (PTQ) maintains high fidelity at 4-bit, it deteriorates at 2-3 bits. Fundamentally, existing methods enforce a shape-invariant quantization grid (e.g., the fixed uniform intervals of UINT2) for each group, severely restricting the feasible set for error minimization. To address this, we propose Bit-Plane Decomposition Quantization (BPDQ), which constructs a variable quantization grid via bit-planes and scalar coefficients, and iteratively refines them using approximate second-order information while progressively compensating quantization errors to minimize output discrepancy. In the 2-bit regime, BPDQ enables serving Qwen2.5-72B on a single RTX 3090 with 83.85% GSM8K accuracy (vs. 90.83% at 16-bit). Moreover, we provide theoretical analysis showing that the variable grid expands the feasible set, and that the quantization process consistently aligns with the optimization objective in Hessian-induced geometry. Code: github.com/KingdalfGoodman/BPDQ.
【20】Rethinking Perplexity: Revealing the Impact of Input Length on Perplexity Evaluation in LLMs
标题:重新思考困惑:揭示输入长度对LLM困惑评估的影响
链接:https://arxiv.org/abs/2602.04099
作者:Letian Cheng,Junyan Wang,Yan Gao,Elliott Wen,Ting Dang,Hong Jia
摘要:复杂度是一种广泛采用的评估大型语言模型(LLM)预测质量的指标,通常用作下游评估的参考指标。然而,最近的证据表明,困惑可能是不可靠的,特别是当使用不相关的长输入,提高基准测试和系统部署的关注。虽然先前的努力采用了选择性输入过滤和策划的数据集,输入长度对困惑的影响还没有从系统的角度进行系统研究,输入长度很少被视为影响公平性和效率的第一类系统变量。在这项工作中,我们通过引入LengthBenchmark来缩小这一差距,LengthBenchmark是一个系统意识的评估框架,它明确地集成了输入长度,评估协议设计和系统级成本,在两个评分协议(直接累积和固定窗口滑动)下评估代表性的LLM。与之前只关注面向准确性的指标的工作不同,LengthBenchmark还测量了延迟、内存占用和评估成本,从而将预测指标与部署现实联系起来。我们进一步将量化的变量不是作为主要贡献,而是作为鲁棒性检查,表明长度引起的偏差在全精度和压缩模型中都存在。这种设计理清了评估逻辑、量化和输入长度的影响,并证明了长度偏差是一种破坏公平跨模型比较的普遍现象。我们的分析产生了两个关键的观察结果:(i)滑动窗口评估一致地提高了短输入的性能,以及(ii)随着评估段长度的增长,全精度和量化模型似乎都实现了收益。
摘要:Perplexity is a widely adopted metric for assessing the predictive quality of large language models (LLMs) and often serves as a reference metric for downstream evaluations. However, recent evidence shows that perplexity can be unreliable, especially when irrelevant long inputs are used, raising concerns for both benchmarking and system deployment. While prior efforts have employed selective input filtering and curated datasets, the impact of input length on perplexity has not been systematically studied from a systems perspective and input length has rarely been treated as a first-class system variable affecting both fairness and efficiency. In this work, we close this gap by introducing LengthBenchmark, a system-conscious evaluation framework that explicitly integrates input length, evaluation protocol design, and system-level costs, evaluating representative LLMs under two scoring protocols (direct accumulation and fixed window sliding) across varying context lengths. Unlike prior work that focuses solely on accuracy-oriented metrics, LengthBenchmark additionally measures latency, memory footprint, and evaluation cost, thereby linking predictive metrics to deployment realities. We further incorporate quantized variants not as a main contribution, but as robustness checks, showing that length-induced biases persist across both full-precision and compressed models. This design disentangles the effects of evaluation logic, quantization, and input length, and demonstrates that length bias is a general phenomenon that undermines fair cross-model comparison. Our analysis yields two key observations: (i) sliding window evaluation consistently inflates performance on short inputs, and (ii) both full-precision and quantized models appear to realise gains as the evaluated segment length grows.
【21】CoRe: Context-Robust Remasking for Diffusion Language Models
标题:CoRe:扩散语言模型的上下文鲁棒重建模
链接:https://arxiv.org/abs/2602.04096
作者:Kevin Zhai,Sabbir Mollah,Zhenyi Wang,Mubarak Shah
摘要:屏蔽扩散模型(Masked Diffusion Models,MDM)中的标准解码受到上下文刚性的阻碍:基于瞬时高置信度保留令牌,通常忽略早期预测缺乏完整的上下文。这就产生了级联效应,最初的不一致会误导剩余的一代。现有的修正策略试图通过依赖于静态置信度分数来缓解这一点,但这些信号本质上是短视的;不一致的标记可能对模型本身表现出自信。我们提出了上下文鲁棒重掩码(Core),一个用于推理时间修正的无训练框架。而不是信任静态令牌概率,Core通过探测它们对目标屏蔽上下文扰动的敏感性来识别上下文脆弱令牌。我们将修订形式化为针对上下文变化的鲁棒优化目标,并有效地逼近该目标,以优先考虑不稳定的令牌进行修订。在LLaDA-8B-Base上,Core在推理和代码基准上提供了一致的改进,优于计算匹配基线,并将MBPP提高了9.2个百分点。
摘要:Standard decoding in Masked Diffusion Models (MDMs) is hindered by context rigidity: tokens are retained based on transient high confidence, often ignoring that early predictions lack full context. This creates cascade effects where initial inconsistencies misguide the remaining generation. Existing revision strategies attempt to mitigate this by relying on static confidence scores, but these signals are inherently myopic; inconsistent tokens can appear confident to the model itself. We propose Context-Robust Remasking (CoRe), a training-free framework for inference-time revision. Rather than trusting static token probabilities, CoRe identifies context-brittle tokens by probing their sensitivity to targeted masked-context perturbations. We formalize revision as a robust optimization objective over context shifts and efficiently approximate this objective to prioritize unstable tokens for revision. On LLaDA-8B-Base, CoRe delivers consistent improvements across reasoning and code benchmarks, outperforming compute-matched baselines and improving MBPP by up to 9.2 percentage points.
【22】Scaling In-Context Online Learning Capability of LLMs via Cross-Episode Meta-RL
标题:通过跨事件Meta-RL扩展LLM的上下文在线学习能力
链接:https://arxiv.org/abs/2602.04089
作者:Xiaofeng Lin,Sirou Zhu,Yilei Chen,Mingyu Chen,Hejian Sang,Ioannis Paschalidis,Zhipeng Wang,Aldo Pacchiano,Xuezhou Zhang
摘要:大型语言模型(LLM)在所有与任务相关的信息都预先可用的情况下,如静态预测和后续问题中,可以实现强大的性能。然而,许多现实世界的决策任务本质上是在线的:关键信息必须通过交互获得,反馈是延迟的,有效的行为需要随着时间的推移平衡信息收集和利用。虽然上下文学习可以在没有权重更新的情况下进行自适应,但现有的LLM通常难以在这种环境中可靠地利用上下文交互体验。在这项工作中,我们表明,这种限制可以通过培训来解决。我们介绍了ORBIT,一个多任务,多集元强化学习框架,训练LLM从上下文中的交互中学习。在元训练之后,一个相对较小的开源模型(Qwen 3 - 14 B)在完全看不见的环境中展示了显著改进的上下文在线学习,与GPT-5.2的性能相匹配,并大大优于标准RL微调。缩放实验进一步揭示了与模型大小一致的收益,这表明推理时决策代理的学习空间很大。复制论文中结果的代码可以在https://github.com/XiaofengLin7/ORBIT上找到。
摘要:Large language models (LLMs) achieve strong performance when all task-relevant information is available upfront, as in static prediction and instruction-following problems. However, many real-world decision-making tasks are inherently online: crucial information must be acquired through interaction, feedback is delayed, and effective behavior requires balancing information collection and exploitation over time. While in-context learning enables adaptation without weight updates, existing LLMs often struggle to reliably leverage in-context interaction experience in such settings. In this work, we show that this limitation can be addressed through training. We introduce ORBIT, a multi-task, multi-episode meta-reinforcement learning framework that trains LLMs to learn from interaction in context. After meta-training, a relatively small open-source model (Qwen3-14B) demonstrates substantially improved in-context online learning on entirely unseen environments, matching the performance of GPT-5.2 and outperforming standard RL fine-tuning by a large margin. Scaling experiments further reveal consistent gains with model size, suggesting significant headroom for learn-at-inference-time decision-making agents. Code reproducing the results in the paper can be found at https://github.com/XiaofengLin7/ORBIT.
【23】Stroke Lesions as a Rosetta Stone for Language Model Interpretability
标题:中风损伤是语言模型可解释性的罗塞塔石碑
链接:https://arxiv.org/abs/2602.04074
作者:Julius Fridriksson,Roger D. Newman-Norlund,Saeed Ahmadi,Regan Willis,Nadra Salman,Kalil Warren,Xiang Guan,Yong Yang,Srihari Nelakuditi,Rutvik Desai,Leonardo Bonilha,Jeff Charney,Chris Rorden
备注:45 pages, 17 figures
摘要:大型语言模型(LLM)已经取得了显着的能力,但方法来验证哪些模型组件是真正必要的语言功能仍然有限。目前的可解释性方法依赖于内部度量,缺乏外部验证。在这里,我们提出了脑LLM统一模型(BLUM),一个框架,利用病变症状映射,黄金标准建立因果脑行为关系超过一个世纪,作为外部参考结构评估LLM扰动效应。使用来自慢性卒中后失语症患者(N = 410)的数据,我们训练了从行为错误配置文件预测脑损伤位置的脑损伤-病变模型,将系统扰动应用于Transformer层,对扰动的LLM和人类患者进行相同的临床评估,并将LLM错误配置文件投影到人类病变空间。LLM错误配置文件与人类错误配置文件足够相似,在67%的图片命名条件下,预测病变与错误匹配的人类中的实际病变相对应(p < 10^{-23})和68.3%的句子完成条件(p < 10^{-61}),语义优势错误映射到腹侧流损伤模式,语音优势错误映射到背侧流模式。这些发现为LLM的可解释性开辟了一条新的方法论途径,其中临床神经科学提供了外部验证,建立了人类病变-症状映射作为评估人工语言系统的参考框架,并激励直接调查行为对齐是否反映了共享的计算原理。
摘要:Large language models (LLMs) have achieved remarkable capabilities, yet methods to verify which model components are truly necessary for language function remain limited. Current interpretability approaches rely on internal metrics and lack external validation. Here we present the Brain-LLM Unified Model (BLUM), a framework that leverages lesion-symptom mapping, the gold standard for establishing causal brain-behavior relationships for over a century, as an external reference structure for evaluating LLM perturbation effects. Using data from individuals with chronic post-stroke aphasia (N = 410), we trained symptom-to-lesion models that predict brain damage location from behavioral error profiles, applied systematic perturbations to transformer layers, administered identical clinical assessments to perturbed LLMs and human patients, and projected LLM error profiles into human lesion space. LLM error profiles were sufficiently similar to human error profiles that predicted lesions corresponded to actual lesions in error-matched humans above chance in 67% of picture naming conditions (p < 10^{-23}) and 68.3% of sentence completion conditions (p < 10^{-61}), with semantic-dominant errors mapping onto ventral-stream lesion patterns and phonemic-dominant errors onto dorsal-stream patterns. These findings open a new methodological avenue for LLM interpretability in which clinical neuroscience provides external validation, establishing human lesion-symptom mapping as a reference framework for evaluating artificial language systems and motivating direct investigation of whether behavioral alignment reflects shared computational principles.
【24】The Illusion of Generalization: Re-examining Tabular Language Model Evaluation
标题:泛化的错觉:重新审视表格语言模型评估
链接:https://arxiv.org/abs/2602.04031
作者:Aditya Gorla,Ratish Puduppully
摘要:表格语言模型(TLM)被认为可以实现表格预测的涌现泛化。我们利用UniPredict基准测试的165个数据集,对Tabula-8B作为代表性TLM进行了系统的重新评估。我们的调查揭示了三个发现。首先,二元和分类分类在大多数类基线上实现了接近零的中值提升,并且强大的总体性能完全由四分位数分类任务驱动。其次,表现最好的数据集表现出普遍的污染,包括完全的训练测试重叠和逃避标准重复数据删除的任务级泄漏。第三,不暴露于表格中的修正调整恢复了92.2%的标准分类性能,在四分位数分类中,格式熟悉度弥补了71.3%的差距,剩余部分归因于污染数据集。这些发现表明,声称的概括可能反映了评估工件,而不是学习表格推理。最后,我们提出了加强TLM评估的建议。
摘要:Tabular Language Models (TLMs) have been claimed to achieve emergent generalization for tabular prediction. We conduct a systematic re-evaluation of Tabula-8B as a representative TLM, utilizing 165 datasets from the UniPredict benchmark. Our investigation reveals three findings. First, binary and categorical classification achieve near-zero median lift over majority-class baselines and strong aggregate performance is driven entirely by quartile classification tasks. Second, top-performing datasets exhibit pervasive contamination, including complete train-test overlap and task-level leakage that evades standard deduplication. Third, instruction-tuning without tabular exposure recovers 92.2% of standard classification performance and on quartile classification, format familiarity closes 71.3% of the gap with the residual attributable to contaminated datasets. These findings suggest claimed generalization likely reflects evaluation artifacts rather than learned tabular reasoning. We conclude with recommendations for strengthening TLM evaluation.
【25】Understanding and Guiding Layer Placement in Parameter-Efficient Fine-Tuning of Large Language Models
标题:在大型语言模型的参数高效微调中理解和指导层放置
链接:https://arxiv.org/abs/2602.04019
作者:Yichen Xu,Yuyang Liang,Shan Dai,Tianyang Hu,Tsz Nam Chan,Chenhao Ma
摘要
:随着大型语言模型(LLM)的持续增长,全参数微调的成本已经使参数有效微调(PEFT)成为下游适应的默认策略。可扩展服务中的推理延迟和边缘或快速部署设置中的微调成本的约束使得选择微调哪些层不可避免。然而,目前的实践通常在所有层上均匀地应用PEFT,对层选择的理解或利用有限。本文提出了一个统一的投影剩余视图的PEFT顶部冻结的基础模型。在局部二次近似下,逐层自适应由三个量控制:(i)投影残差范数(resnorm),它测量一个层可以捕获多少可校正的偏差;(ii)激活能,它确定特征条件;以及(iii)层耦合,它量化残差在层间的相互作用有多强。我们发现,平方损失和线性适配器,resnorm等于一个归一化的梯度范数,激活能控制病态和噪声放大,弱耦合产生约加性分层的贡献。在这些见解的基础上,我们引入了层卡,这是一种可重复使用的诊断工具,可以总结给定模型每层的剩余信号强度、计算成本和性能。使用相同的模型和LoRA配置,层卡引导的布局可以优化适配层的选择,以灵活地优先考虑不同的目标,例如最大限度地提高性能或降低微调成本。此外,在Qwen 3 -8B上,我们证明了选择性地调整层的子集可以实现接近全层LoRA的性能,同时大大降低了微调成本和推理过程中适配器增强层的数量,为全层插入提供了更具性价比的替代方案。
摘要:As large language models (LLMs) continue to grow, the cost of full-parameter fine-tuning has made parameter-efficient fine-tuning (PEFT) the default strategy for downstream adaptation. Constraints from inference latency in scalable serving and fine-tuning cost in edge or rapid-deployment settings make the choice of which layers to fine-tune unavoidable. Yet current practice typically applies PEFT uniformly across all layers, with limited understanding or leverage of layer selection. This paper develops a unified projected residual view of PEFT on top of a frozen base model. Under a local quadratic approximation, layerwise adaptation is governed by three quantities: (i) the projected residual norm (resnorm), which measures how much correctable bias a layer can capture; (ii) the activation energy, which determines feature conditioning; and (iii) layer coupling, which quantifies how strongly residuals interact across layers. We show that, for squared loss and linear adapters, the resnorm equals a normalized gradient norm, activation energy controls ill-conditioning and noise amplification, and weak coupling yields approximately additive layerwise contributions. Building on these insights, we introduce the Layer Card, a reusable diagnostic that summarizes residual signal strength, compute cost, and performance for each layer of a given model. With an identical model and LoRA configuration, Layer Card-guided placement refines the choice of adapted layers to flexibly prioritize different objectives, such as maximizing performance or reducing fine-tuning cost. Moreover, on Qwen3-8B, we show that selectively adapting a subset of layers can achieve performance close to full-layer LoRA while substantially reducing fine-tuning cost and the number of adapter-augmented layers during inference, offering a more cost-performance-aware alternative to full-layer insertion.
【26】When Chains of Thought Don't Matter: Causal Bypass in Large Language Models
标题:当思想链不重要时:大型语言模型中的因果绕过
链接:https://arxiv.org/abs/2602.03994
作者:Anish Sathyanarayanan,Aditya Nagarsekar,Aarush Rathore
备注:Under Review at ICLR, 2026
摘要:思想链(CoT)提示被广泛认为可以暴露模型的推理过程并提高透明度。我们试图通过惩罚不忠实的推理来强化这一假设,但发现表面上的顺从并不能保证因果依赖。我们的中心发现是负面的:即使CoT是冗长的,战略性的,并标记的表面水平的操纵检测器,模型的答案往往是因果关系独立的CoT的内容。我们提出了一个诊断框架,审计这种故障模式:它结合了(i)一个可解释的行为模块,评分操作相关的信号在CoT文本和(ii)一个因果探针,测量CoT介导的影响(CMI)通过隐藏状态修补和报告的旁路分数(1-\mathrm{CMI}$),量化的程度,答案是由旁路电路产生的独立的理由。在试点评估中,意识到的提示增加可检测到的操纵信号(平均风险评分增量:$+5.10$),但因果探针揭示任务依赖的调解:许多QA项目表现出接近总旁路(CMI $\约0$),而一些逻辑问题显示更强的调解(CMI高达0.56 $)。分层分析揭示了狭窄的和任务相关的“推理窗口”,即使当平均CMI是低的。
摘要:Chain-of-thought (CoT) prompting is widely assumed to expose a model's reasoning process and improve transparency. We attempted to enforce this assumption by penalizing unfaithful reasoning, but found that surface-level compliance does not guarantee causal reliance. Our central finding is negative: even when CoT is verbose, strategic, and flagged by surface-level manipulation detectors, model answers are often causally independent of the CoT content. We present a diagnostic framework for auditing this failure mode: it combines (i) an interpretable behavioral module that scores manipulation-relevant signals in CoT text and (ii) a causal probe that measures CoT-mediated influence (CMI) via hidden-state patching and reports a bypass score ($1-\mathrm{CMI}$), quantifying the degree to which the answer is produced by a bypass circuit independent of the rationale. In pilot evaluations, audit-aware prompting increases detectable manipulation signals (mean risk-score delta: $+5.10$), yet causal probes reveal task-dependent mediation: many QA items exhibit near-total bypass (CMI $\approx 0$), while some logic problems show stronger mediation (CMI up to $0.56$). Layer-wise analysis reveals narrow and task-dependent ``reasoning windows'' even when mean CMI is low.
【27】Enhancing Mathematical Problem Solving in LLMs through Execution-Driven Reasoning Augmentation
标题:通过执行驱动推理增强增强LLM中的数学问题解决
链接:https://arxiv.org/abs/2602.03950
作者:Aditya Basarkar,Benyamin Tabarsi,Tiffany Barnes,Dongkuan,Xu
备注:9 pages, 7 figures, submitted to ACL ARR 2026
摘要:数学问题解决是评估人工智能推理能力的基本基准,也是通往教育、科学和工程领域应用的门户,在这些领域,可靠的符号推理至关重要。尽管基于多智能体LLM的系统的最新进展增强了它们的数学推理能力,但它们仍然缺乏推理过程的可靠可修改表示。现有的代理要么在严格的顺序管道中操作,无法纠正早期步骤,要么依赖启发式自我评估,无法识别和修复错误。此外,编程上下文会分散语言模型的注意力,降低准确性。为了解决这些差距,我们引入了迭代改进的程序构造(IIPC),这是一种推理方法,它迭代地改进程序推理链,并将执行反馈与基础LLM的本地思想链能力相结合,以保持高层次的上下文焦点。IIPC在多个基本LLM的大多数推理基准中超越了竞争方法。所有的代码和实现都是开源的。
摘要:Mathematical problem solving is a fundamental benchmark for assessing the reasoning capabilities of artificial intelligence and a gateway to applications in education, science, and engineering where reliable symbolic reasoning is essential. Although recent advances in multi-agent LLM-based systems have enhanced their mathematical reasoning capabilities, they still lack a reliably revisable representation of the reasoning process. Existing agents either operate in rigid sequential pipelines that cannot correct earlier steps or rely on heuristic self-evaluation that can fail to identify and fix errors. In addition, programmatic context can distract language models and degrade accuracy. To address these gaps, we introduce Iteratively Improved Program Construction (IIPC), a reasoning method that iteratively refines programmatic reasoning chains and combines execution feedback with the native Chain-of-thought abilities of the base LLM to maintain high-level contextual focus. IIPC surpasses competing approaches in the majority of reasoning benchmarks on multiple base LLMs. All code and implementations are released as open source.
【28】SpatiaLab: Can Vision-Language Models Perform Spatial Reasoning in the Wild?
标题:SpatiaLab:视觉语言模型可以在野外执行空间推理吗?
链接:https://arxiv.org/abs/2602.03916
作者:Azmine Toushik Wasi,Wahid Faisal,Abdur Rahman,Mahfuz Ahmed Anik,Munem Shahriar,Mohsin Mahmud Topu,Sadia Tasnim Meem,Rahatun Nesa Priti,Sabrina Afroz Mitu,Md. Iqramul Hoque,Shahriyar Zaman Ridoy,Mohammed Eunus Ali,Majd Hawasly,Mohammad Raza,Md Rizwan Parvez
备注:Accepted to ICLR 2026. 92 Pages. 42 Figures and 29 Tables
摘要
:空间推理是人类认知的一个基本方面,但它仍然是当代视觉语言模型(VLM)的一个主要挑战。以前的工作主要依赖于合成或LLM生成的环境,任务设计和益智类设置有限,无法捕捉真实世界的复杂性,视觉噪声和VLM遇到的各种空间关系。为了解决这个问题,我们引入了SpatiaLab,这是一个用于评估VLM在现实、无约束背景下的空间推理的综合基准。SpatiaLab包含1,400个视觉问答对,分为六个主要类别:相对定位,深度和遮挡,方向,大小和比例,空间导航和3D几何,每个类别有五个子类别,产生30种不同的任务类型。每个子类别包含至少25个问题,每个主类别包含至少200个问题,支持多项选择和开放式评价。在不同的国家的最先进的VLM,包括开源和闭源模型,推理为重点的,和专门的空间推理模型的实验,揭示了空间推理能力与人类相比,有很大的差距。在多项选择设置中,InternVL3.5- 72 B的准确率为54.93%,而人类的准确率为87.57%。在开放式环境中,所有模型的性能下降约10- 25%,GPT-5-mini得分最高为40.93%,而人类为64.93%。这些结果突出了处理复杂空间关系、深度感知、导航和3D几何形状的关键限制。通过提供一个多样化的真实世界的评估框架,SpatiaLab揭示了推进VLMs空间推理的关键挑战和机遇,为指导未来的研究提供了一个基准,以实现强大的、与人类一致的空间理解。SpatiaLab的网址是:https://spatialab-reasoning.github.io/。
摘要:Spatial reasoning is a fundamental aspect of human cognition, yet it remains a major challenge for contemporary vision-language models (VLMs). Prior work largely relied on synthetic or LLM-generated environments with limited task designs and puzzle-like setups, failing to capture the real-world complexity, visual noise, and diverse spatial relationships that VLMs encounter. To address this, we introduce SpatiaLab, a comprehensive benchmark for evaluating VLMs' spatial reasoning in realistic, unconstrained contexts. SpatiaLab comprises 1,400 visual question-answer pairs across six major categories: Relative Positioning, Depth & Occlusion, Orientation, Size & Scale, Spatial Navigation, and 3D Geometry, each with five subcategories, yielding 30 distinct task types. Each subcategory contains at least 25 questions, and each main category includes at least 200 questions, supporting both multiple-choice and open-ended evaluation. Experiments across diverse state-of-the-art VLMs, including open- and closed-source models, reasoning-focused, and specialized spatial reasoning models, reveal a substantial gap in spatial reasoning capabilities compared with humans. In the multiple-choice setup, InternVL3.5-72B achieves 54.93% accuracy versus 87.57% for humans. In the open-ended setting, all models show a performance drop of around 10-25%, with GPT-5-mini scoring highest at 40.93% versus 64.93% for humans. These results highlight key limitations in handling complex spatial relationships, depth perception, navigation, and 3D geometry. By providing a diverse, real-world evaluation framework, SpatiaLab exposes critical challenges and opportunities for advancing VLMs' spatial reasoning, offering a benchmark to guide future research toward robust, human-aligned spatial understanding. SpatiaLab is available at: https://spatialab-reasoning.github.io/.
Graph相关(图学习|图神经网络|图优化等)(9篇)
【1】Billion-Scale Graph Foundation Models
标题:十亿规模图表基础模型
链接:https://arxiv.org/abs/2602.04768
作者:Maya Bechler-Speicher,Yoel Gottlieb,Andrey Isakov,David Abensur,Ami Tavory,Daniel Haimovich,Ido Guy,Udi Weinsberg
摘要:图结构数据支撑着许多关键应用程序。虽然基础模型通过大规模的预训练和轻量级的适应改变了语言和视觉,但将这种范式扩展到一般的真实世界的图形是具有挑战性的。在这项工作中,我们提出了Graph Billion- Foundation-Fusion(GraphBFF):第一个为任意异构的十亿级图构建十亿参数图基础模型(GFM)的端到端配方。配方的核心是GraphBFF Transformer,这是一种灵活且可扩展的架构,专为实际的十亿级GFM而设计。使用GraphBFF,我们提出了一般图的第一个神经标度律,并表明损失随着模型容量或训练数据的扩展而可预测地减少,这取决于哪个因素是瓶颈。GraphBFF框架为大规模构建GFM提供了数据优化、预训练和微调的具体方法。我们证明了该框架的有效性,评估了在10亿个样本上预训练的14亿个参数的GraphBFF Transformer。在训练期间未见过的十个不同的真实下游任务中,跨越节点和链路级分类和回归,GraphBFF实现了卓越的zero-shot和探测性能,包括在Few-Shot设置中,具有高达31个PRAUC点的大余量。最后,我们讨论了使GFM成为工业规模图学习的实用和原则基础的关键挑战和开放机会。
摘要:Graph-structured data underpins many critical applications. While foundation models have transformed language and vision via large-scale pretraining and lightweight adaptation, extending this paradigm to general, real-world graphs is challenging. In this work, we present Graph Billion- Foundation-Fusion (GraphBFF): the first end-to-end recipe for building billion-parameter Graph Foundation Models (GFMs) for arbitrary heterogeneous, billion-scale graphs. Central to the recipe is the GraphBFF Transformer, a flexible and scalable architecture designed for practical billion-scale GFMs. Using the GraphBFF, we present the first neural scaling laws for general graphs and show that loss decreases predictably as either model capacity or training data scales, depending on which factor is the bottleneck. The GraphBFF framework provides concrete methodologies for data batching, pretraining, and fine-tuning for building GFMs at scale. We demonstrate the effectiveness of the framework with an evaluation of a 1.4 billion-parameter GraphBFF Transformer pretrained on one billion samples. Across ten diverse, real-world downstream tasks on graphs unseen during training, spanning node- and link-level classification and regression, GraphBFF achieves remarkable zero-shot and probing performance, including in few-shot settings, with large margins of up to 31 PRAUC points. Finally, we discuss key challenges and open opportunities for making GFMs a practical and principled foundation for graph learning at industrial scale.
【2】Towards Understanding and Avoiding Limitations of Convolutions on Graphs
标题:理解和避免图上卷积的局限性
链接:https://arxiv.org/abs/2602.04709
作者:Andreas Roth
备注:dissertation
摘要:虽然消息传递神经网络(MPNN)已经显示出有希望的结果,但它们在现实世界中的影响仍然有限。虽然已经确定了各种限制,但对其理论基础仍然知之甚少,导致分散的研究工作。在这篇论文中,我们提供了深入的理论分析,并确定了几个关键属性限制MPNN的性能。基于这些发现,我们提出了几个框架,以解决这些缺点。我们确定了许多MPNN所表现出的两个属性:共享组件放大(SCA),其中每个消息传递迭代在所有功能通道中放大相同的组件,以及组件主导(CD),其中单个组件随着应用更多的消息传递步骤而被越来越多地放大。这些性质导致节点表示的秩崩溃的可观察现象,这推广了已建立的过平滑现象。通过概括和分解过度平滑,我们可以更深入地了解MPNN,更有针对性的解决方案,以及更精确的领域内沟通。为了避免SCA,我们表明,利用多个计算图或边缘关系是必要的。我们的多关系分割(MRS)框架将任何现有的MPNN转换为利用多个边缘关系的MPNN。此外,我们还介绍了多特征通道的谱图卷积(MIMO-GC),它自然地使用了多个计算图。本地化变体LMGC在继承其有益特性的同时近似MIMO-GC。为了解决CD,我们展示了MPNN和PageRank算法之间的密切联系。基于个性化的PageRank,我们提出了一个变体的MPNN,允许无限多的消息传递迭代,同时保留初始节点的功能。总的来说,这些结果深化了对MPNN的理论理解。
摘要:While message-passing neural networks (MPNNs) have shown promising results, their real-world impact remains limited. Although various limitations have been identified, their theoretical foundations remain poorly understood, leading to fragmented research efforts. In this thesis, we provide an in-depth theoretical analysis and identify several key properties limiting the performance of MPNNs. Building on these findings, we propose several frameworks that address these shortcomings. We identify two properties exhibited by many MPNNs: shared component amplification (SCA), where each message-passing iteration amplifies the same components across all feature channels, and component dominance (CD), where a single component gets increasingly amplified as more message-passing steps are applied. These properties lead to the observable phenomenon of rank collapse of node representations, which generalizes the established over-smoothing phenomenon. By generalizing and decomposing over-smoothing, we enable a deeper understanding of MPNNs, more targeted solutions, and more precise communication within the field. To avoid SCA, we show that utilizing multiple computational graphs or edge relations is necessary. Our multi-relational split (MRS) framework transforms any existing MPNN into one that leverages multiple edge relations. Additionally, we introduce the spectral graph convolution for multiple feature channels (MIMO-GC), which naturally uses multiple computational graphs. A localized variant, LMGC, approximates the MIMO-GC while inheriting its beneficial properties. To address CD, we demonstrate a close connection between MPNNs and the PageRank algorithm. Based on personalized PageRank, we propose a variant of MPNNs that allows for infinitely many message-passing iterations, while preserving initial node features. Collectively, these results deepen the theoretical understanding of MPNNs.
【3】Generalized Schrödinger Bridge on Graphs
标题:图上的广义薛定谔桥
链接:https://arxiv.org/abs/2602.04675
作者:Panagiotis Theodoropoulos,Juno Nam,Evangelos Theodorou,Jaemoo Choi
摘要
:图上的传输是许多领域的基本挑战,其中决策必须尊重拓扑和操作约束。尽管需要可操作的策略,但现有的图传输方法缺乏这种表现力。它们依赖于限制性的假设,无法在稀疏拓扑中推广,并且随着图形大小和时间范围的扩展性很差。为了解决这些问题,我们介绍了广义薛定谔桥图(GSBoG),一种新的可扩展的数据驱动的框架,用于学习可执行的受控连续时间马尔可夫链(CTMC)的政策下的状态成本增强动态任意图。值得注意的是,GSBoG学习自动化级别的策略,避免了密集的全局求解器,从而增强了可扩展性。这是通过似然优化方法实现的,满足端点边际,同时优化状态相关运行成本下的中间行为。在具有挑战性的现实世界图拓扑上进行的广泛实验表明,GSBoG可靠地学习准确的、尊重拓扑的策略,同时优化特定于应用的中间状态成本,突出了其广泛的适用性,并为通用图上的成本感知动态传输铺平了新的道路。
摘要:Transportation on graphs is a fundamental challenge across many domains, where decisions must respect topological and operational constraints. Despite the need for actionable policies, existing graph-transport methods lack this expressivity. They rely on restrictive assumptions, fail to generalize across sparse topologies, and scale poorly with graph size and time horizon. To address these issues, we introduce Generalized Schrödinger Bridge on Graphs (GSBoG), a novel scalable data-driven framework for learning executable controlled continuous-time Markov chain (CTMC) policies on arbitrary graphs under state cost augmented dynamics. Notably, GSBoG learns trajectory-level policies, avoiding dense global solvers and thereby enhancing scalability. This is achieved via a likelihood optimization approach, satisfying the endpoint marginals, while simultaneously optimizing intermediate behavior under state-dependent running costs. Extensive experimentation on challenging real-world graph topologies shows that GSBoG reliably learns accurate, topology-respecting policies while optimizing application-specific intermediate state costs, highlighting its broad applicability and paving new avenues for cost-aware dynamical transport on general graphs.
【4】Probabilistic Label Spreading: Efficient and Consistent Estimation of Soft Labels with Epistemic Uncertainty on Graphs
标题:概率标签传播:图形上具有认识不确定性的软标签的有效一致估计
链接:https://arxiv.org/abs/2602.04574
作者:Jonathan Klees,Tobias Riedlinger,Peter Stehr,Bennet Böddecker,Daniel Kondermann,Matthias Rottmann
摘要:用于感知任务的安全人工智能仍然是一个重大挑战,部分原因是缺乏具有高质量标签的数据。注释本身受到任意性和认识不确定性的影响,这在注释和评估过程中通常被忽略。虽然众包使得能够收集每个图像的多个注释以估计这些不确定性,但是由于所需的注释工作,这种方法在规模上是不切实际的。我们引入了一个概率标签扩散方法,提供了可靠的估计任意和认识的不确定性的标签。假设标签在特征空间上是平滑的,我们使用基于图的扩散方法来传播单个注释。我们证明了标签扩散产生一致的概率估计,即使每个数据点的注释数量收敛到零。我们提出并分析了一个可扩展的实现我们的方法。实验结果表明,与基线相比,我们的方法大大减少了在常见图像数据集上实现所需标签质量所需的注释预算,并在以数据为中心的图像分类基准上实现了最新的技术水平。
摘要:Safe artificial intelligence for perception tasks remains a major challenge, partly due to the lack of data with high-quality labels. Annotations themselves are subject to aleatoric and epistemic uncertainty, which is typically ignored during annotation and evaluation. While crowdsourcing enables collecting multiple annotations per image to estimate these uncertainties, this approach is impractical at scale due to the required annotation effort. We introduce a probabilistic label spreading method that provides reliable estimates of aleatoric and epistemic uncertainty of labels. Assuming label smoothness over the feature space, we propagate single annotations using a graph-based diffusion method. We prove that label spreading yields consistent probability estimators even when the number of annotations per data point converges to zero. We present and analyze a scalable implementation of our method. Experimental results indicate that, compared to baselines, our approach substantially reduces the annotation budget required to achieve a desired label quality on common image datasets and achieves a new state of the art on the Data-Centric Image Classification benchmark.
【5】Training A Foundation Model to Represent Graphs as Vectors
标题:训练基础模型以将图形表示为载体
链接:https://arxiv.org/abs/2602.04244
作者:Qi Feng,Jicong Fan
摘要:本文旨在训练一个图形基础模型,该模型能够将任何图形表示为保留结构和语义信息的向量,用于下游图形级别的任务,如图形分类和图形聚类。为了学习不同领域的图的特征,同时保持对新领域的强大泛化能力,我们提出了一种基于多图的特征对齐方法,该方法利用每个数据集中所有节点的属性构建加权图,然后生成一致的节点嵌入。为了提高不同数据集特征的一致性,我们提出了一种保证收敛的密度最大化均值对齐算法。原始图和生成的节点嵌入被馈送到图神经网络中,以在对比学习中实现有区别的图表示。更重要的是,为了增强从节点级表示到图级表示的信息保存,我们构建了一个多层引用分布模块,而不使用任何池化操作。我们还提供了一个理论上的推广,以支持所提出的模型的有效性。Few-Shot图分类和图聚类的实验结果表明,我们的模型优于强基线。
摘要:This paper aims to train a graph foundation model that is able to represent any graph as a vector preserving structural and semantic information useful for downstream graph-level tasks such as graph classification and graph clustering. To learn the features of graphs from diverse domains while maintaining strong generalization ability to new domains, we propose a multi-graph-based feature alignment method, which constructs weighted graphs using the attributes of all nodes in each dataset and then generates consistent node embeddings. To enhance the consistency of the features from different datasets, we propose a density maximization mean alignment algorithm with guaranteed convergence. The original graphs and generated node embeddings are fed into a graph neural network to achieve discriminative graph representations in contrastive learning. More importantly, to enhance the information preservation from node-level representations to the graph-level representation, we construct a multi-layer reference distribution module without using any pooling operation. We also provide a theoretical generalization bound to support the effectiveness of the proposed model. The experimental results of few-shot graph classification and graph clustering show that our model outperforms strong baselines.
【6】Pruning for Generalization: A Transfer-Oriented Spatiotemporal Graph Framework
标题:通用修剪:面向传输的时空图框架
链接:https://arxiv.org/abs/2602.04153
作者:Zihao Jing,Yuxi Long,Ganlin Feng
备注:Under review at ICLR 2026 Workshop TSALM
摘要:图结构域中的多变量时间序列预测对于现实世界的应用至关重要,然而现有的时空模型经常在数据稀缺和跨域迁移的情况下遭受性能下降。我们通过结构感知上下文选择的镜头来解决这些挑战。我们提出了TL-GPSTGN,一个面向传输的时空框架,通过选择性地修剪非优化的图形上下文来提高样本效率和分布泛化。具体而言,我们的方法采用信息理论和基于相关性的标准来提取结构信息子图和特征,从而产生紧凑的,语义接地的表示。这种优化的上下文随后被集成到时空卷积架构中,以捕获复杂的多变量动态。对大规模流量基准的评估表明,TL-GPSTGN在低数据传输场景中始终优于基线。我们的研究结果表明,显式上下文修剪作为一个强大的归纳偏见,提高基于图的预测模型的鲁棒性。
摘要:Multivariate time series forecasting in graph-structured domains is critical for real-world applications, yet existing spatiotemporal models often suffer from performance degradation under data scarcity and cross-domain shifts. We address these challenges through the lens of structure-aware context selection. We propose TL-GPSTGN, a transfer-oriented spatiotemporal framework that enhances sample efficiency and out-of-distribution generalization by selectively pruning non-optimized graph context. Specifically, our method employs information-theoretic and correlation-based criteria to extract structurally informative subgraphs and features, resulting in a compact, semantically grounded representation. This optimized context is subsequently integrated into a spatiotemporal convolutional architecture to capture complex multivariate dynamics. Evaluations on large-scale traffic benchmarks demonstrate that TL-GPSTGN consistently outperforms baselines in low-data transfer scenarios. Our findings suggest that explicit context pruning serves as a powerful inductive bias for improving the robustness of graph-based forecasting models.
【7】Toward Effective Multimodal Graph Foundation Model: A Divide-and-Conquer Based Approach
标题:一种有效的多模态图基础模型:基于分治的方法
链接:https://arxiv.org/abs/2602.04116
作者:Sicheng Liu,Xunkai Li,Daohan Su,Ru Zhang,Hongchao Qin,Ronghua Li,Guoren Wang
备注:20 pages, 6 figures
摘要:图基础模型(GFM)在跨不同领域的推广方面取得了显着的成功。然而,它们主要集中在文本属性图(TAG)上,而多模态属性图(MAG)在很大程度上尚未开发。开发多模态图基础模型(MGFMs)允许利用MAG中丰富的多模态信息,并将适用性扩展到更广泛类型的下游任务。虽然最近的MGFMs集成了不同的模态信息,但我们的实证研究揭示了现有MGFMs的两个基本局限性:(1)它们未能明确地建模模态交互,这对于捕获复杂的跨模态语义而不仅仅是简单的聚合至关重要,(2)它们表现出次优的模态对齐,这对于弥合不同模态空间之间的显著语义差异至关重要。为了解决这些挑战,我们提出了PLANET(图形拓扑感知模态交互和对齐),一种新的框架,采用分治策略解耦模态交互和对齐不同的粒度。在嵌入粒度上,(1)嵌入式域门(EDG)通过自适应地注入拓扑感知的跨模态上下文来执行局部语义富集,实现模态交互。在节点粒度上,(2)节点离散化检索(NDR)通过构建离散化语义表示空间(DSRS)来弥合模态间隙,从而确保全局模态对齐。大量的实验表明,PLANET在各种以图形为中心的多模态生成任务中的表现明显优于最先进的基线。
摘要:Graph Foundation Models (GFMs) have achieved remarkable success in generalizing across diverse domains. However, they mainly focus on Text-Attributed Graphs (TAGs), leaving Multimodal-Attributed Graphs (MAGs) largely untapped. Developing Multimodal Graph Foundation Models (MGFMs) allows for leveraging the rich multimodal information in MAGs, and extends applicability to broader types of downstream tasks. While recent MGFMs integrate diverse modality information, our empirical investigation reveals two fundamental limitations of existing MGFMs: (1)they fail to explicitly model modality interaction, essential for capturing intricate cross-modal semantics beyond simple aggregation, and (2)they exhibit sub-optimal modality alignment, which is critical for bridging the significant semantic disparity between distinct modal spaces. To address these challenges, we propose PLANET (graPh topoLogy-aware modAlity iNteraction and alignmEnT), a novel framework employing a Divide-and-Conquer strategy to decouple modality interaction and alignment across distinct granularities. At the embedding granularity, (1)Embedding-wise Domain Gating (EDG) performs local semantic enrichment by adaptively infusing topology-aware cross-modal context, achieving modality interaction. At the node granularity, (2)Node-wise Discretization Retrieval (NDR) ensures global modality alignment by constructing a Discretized Semantic Representation Space (DSRS) to bridge modality gaps. Extensive experiments demonstrate that PLANET significantly outperforms state-of-the-art baselines across diverse graph-centric and multimodal generative tasks.
【8】A Consensus-Bayesian Framework for Detecting Malicious Activity in Enterprise Directory Access Graphs
标题:用于检测企业目录访问图中恶意活动的先验-Bayesian框架
链接:https://arxiv.org/abs/2602.04027
作者:Pratyush Uppuluri,Shilpa Noushad,Sajan Kumar
摘要:这项工作提出了一个基于共识的贝叶斯框架来检测企业目录访问图中的恶意用户行为。通过将目录建模为主题,将用户建模为多级交互图中的代理,我们使用影响加权意见动态来模拟访问演化。用户之间的逻辑依赖性被编码在动态矩阵Ci中,并且目录相似性经由共享影响矩阵W被捕获。恶意行为作为违反强连接组件(SCC)结构规范的跨组件逻辑扰动注入。我们从意见动力学文献的理论保证,以确定主题收敛和检测异常通过缩放的意见方差。为了量化不确定性,我们引入了一个贝叶斯异常评分机制,随着时间的推移,使用静态和在线先验。合成访问图上的模拟验证了我们的方法,证明了其逻辑不一致的敏感性和动态扰动下的鲁棒性。
摘要:This work presents a consensus-based Bayesian framework to detect malicious user behavior in enterprise directory access graphs. By modeling directories as topics and users as agents within a multi-level interaction graph, we simulate access evolution using influence-weighted opinion dynamics. Logical dependencies between users are encoded in dynamic matrices Ci, and directory similarity is captured via a shared influence matrix W. Malicious behavior is injected as cross-component logical perturbations that violate structural norms of strongly connected components(SCCs). We apply theoretical guarantees from opinion dynamics literature to determine topic convergence and detect anomaly via scaled opinion variance. To quantify uncertainty, we introduce a Bayesian anomaly scoring mechanism that evolves over time, using both static and online priors. Simulations over synthetic access graphs validate our method, demonstrating its sensitivity to logical inconsistencies and robustness under dynamic perturbation.
【9】DeXposure-FM: A Time-series, Graph Foundation Model for Credit Exposures and Stability on Decentralized Financial Networks
标题:DeXposure-FM:分散金融网络信用暴露和稳定性的时间序列、图形基础模型
链接:https://arxiv.org/abs/2602.03981
作者:Aijie Shu,Wenbin Wu,Gbenga Ibikunle,Fengxiang He
摘要:去中心化金融(DeFi)中的信用风险敞口通常是隐式的和令牌介导的,从而创建了一个密集的协议间依赖网络。因此,对一个代币的冲击可能会导致重大和不受控制的传染效应。随着DeFi生态系统通过稳定币等工具与传统金融基础设施的联系越来越紧密,这种动态带来的风险需要更强大的量化工具。我们介绍DeXposure-FM,这是第一个时间序列,图形基础模型,用于测量和预测DeFi网络上的协议间信用风险。DeXposure-FM采用图形表格编码器,具有预先训练的权重初始化和多个特定于任务的头部,在DeXplant数据集上训练,该数据集拥有4370万个数据条目,跨越602个区块链上的4,300多个协议,覆盖24,300多个唯一令牌。该培训可操作用于信用风险预测,预测(1)协议级流的联合动态,以及(2)信用风险链接的拓扑和权重。DeXposure-FM在两个机器学习基准上进行了经验验证;它始终优于最先进的方法,包括图形基础模型和时间图神经网络。DeXposure-FM进一步开发金融经济学工具,支持宏观审慎监测和基于场景的DeFi压力测试,通过预测-然后-测量管道实现协议级系统重要性评分、行业级溢出和集中度指标。实证检验完全支持我们的金融经济学工具。模型和代码已经公开。Model:https://huggingface.co/EVIEHub/DeXposure-FM.代码:https://github.com/EVIEHub/DeXposure-FM.
摘要:Credit exposure in Decentralized Finance (DeFi) is often implicit and token-mediated, creating a dense web of inter-protocol dependencies. Thus, a shock to one token may result in significant and uncontrolled contagion effects. As the DeFi ecosystem becomes increasingly linked with traditional financial infrastructure through instruments, such as stablecoins, the risk posed by this dynamic demands more powerful quantification tools. We introduce DeXposure-FM, the first time-series, graph foundation model for measuring and forecasting inter-protocol credit exposure on DeFi networks, to the best of our knowledge. Employing a graph-tabular encoder, with pre-trained weight initialization, and multiple task-specific heads, DeXposure-FM is trained on the DeXposure dataset that has 43.7 million data entries, across 4,300+ protocols on 602 blockchains, covering 24,300+ unique tokens. The training is operationalized for credit-exposure forecasting, predicting the joint dynamics of (1) protocol-level flows, and (2) the topology and weights of credit-exposure links. The DeXposure-FM is empirically validated on two machine learning benchmarks; it consistently outperforms the state-of-the-art approaches, including a graph foundation model and temporal graph neural networks. DeXposure-FM further produces financial economics tools that support macroprudential monitoring and scenario-based DeFi stress testing, by enabling protocol-level systemic-importance scores, sector-level spillover and concentration measures via a forecast-then-measure pipeline. Empirical verification fully supports our financial economics tools. The model and code have been publicly available. Model: https://huggingface.co/EVIEHub/DeXposure-FM. Code: https://github.com/EVIEHub/DeXposure-FM.
Transformer(3篇)
【1】From independent patches to coordinated attention: Controlling information flow in vision transformers
标题:从独立补丁到协调注意力:控制视觉转换器中的信息流
链接:https://arxiv.org/abs/2602.04784
作者:Kieran A. Murphy
备注:Code at https://github.com/murphyka/vit_ib
摘要:在Vision Transformers中,我们使注意力传递的信息成为一个明确的、可测量的量。通过在所有以注意力为媒介的写入到剩余流上插入变化的信息瓶颈--没有其他架构变化--我们用明确的信息成本训练模型,并获得从独立补丁处理到完全表达全局注意力的可控频谱。在ImageNet-100上,我们描述了分类行为和信息路由如何在这个频谱上演变,并通过分析传输信息的第一个注意头,提供了关于全局视觉表示如何从局部补丁处理中出现的初步见解。通过将学习偏向于内部沟通受限的解决方案,我们的方法产生的模型更易于进行机械分析,更易于控制。
摘要:We make the information transmitted by attention an explicit, measurable quantity in vision transformers. By inserting variational information bottlenecks on all attention-mediated writes to the residual stream -- without other architectural changes -- we train models with an explicit information cost and obtain a controllable spectrum from independent patch processing to fully expressive global attention. On ImageNet-100, we characterize how classification behavior and information routing evolve across this spectrum, and provide initial insights into how global visual representations emerge from local patch processing by analyzing the first attention heads that transmit information. By biasing learning toward solutions with constrained internal communication, our approach yields models that are more tractable for mechanistic analysis and more amenable to control.
【2】SPOT-Occ: Sparse Prototype-guided Transformer for Camera-based 3D Occupancy Prediction
标题:SPOT-Occ:基于相机的稀疏原型引导Transformer三维占用预测
链接:https://arxiv.org/abs/2602.04240
作者:Suzeyu Chen,Leheng Li,Ying-Cong Chen
备注:8 pages, 6 figures
摘要:通过摄像头实现高度准确和实时的3D占用预测是自动驾驶汽车安全和实用部署的关键要求。虽然这种向稀疏3D表示的转变解决了编码瓶颈,但它为解码器带来了新的挑战:如何有效地从稀疏的、非均匀分布的体素特征集合中聚合信息,而不诉诸计算上禁止的密集注意力。 在本文中,我们提出了一种新的基于原型的稀疏Transformer解码器,取代了这种昂贵的互动与一个有效的,两个阶段的过程中,引导功能选择和集中的聚合。我们的核心思想是使解码器的注意原型指导。我们实现这一点,通过一个稀疏的原型选择机制,每个查询自适应地确定一个紧凑的一组最突出的体素功能,称为原型,集中的功能聚合。 为了确保这种动态选择是稳定和有效的,我们引入了一个互补的去噪范例。这种方法利用地面实况掩码来提供明确的指导,保证跨解码器层的查询-原型关联一致。我们的模型被称为SPOT-Occ,在速度上明显优于以前的方法,同时也提高了准确性。源代码发布于https://github.com/chensuzeyu/SpotOcc。
摘要:Achieving highly accurate and real-time 3D occupancy prediction from cameras is a critical requirement for the safe and practical deployment of autonomous vehicles. While this shift to sparse 3D representations solves the encoding bottleneck, it creates a new challenge for the decoder: how to efficiently aggregate information from a sparse, non-uniformly distributed set of voxel features without resorting to computationally prohibitive dense attention. In this paper, we propose a novel Prototype-based Sparse Transformer Decoder that replaces this costly interaction with an efficient, two-stage process of guided feature selection and focused aggregation. Our core idea is to make the decoder's attention prototype-guided. We achieve this through a sparse prototype selection mechanism, where each query adaptively identifies a compact set of the most salient voxel features, termed prototypes, for focused feature aggregation. To ensure this dynamic selection is stable and effective, we introduce a complementary denoising paradigm. This approach leverages ground-truth masks to provide explicit guidance, guaranteeing a consistent query-prototype association across decoder layers. Our model, dubbed SPOT-Occ, outperforms previous methods with a significant margin in speed while also improving accuracy. Source code is released at https://github.com/chensuzeyu/SpotOcc.
【3】Cross-Attention Transformer for Joint Multi-Receiver Uplink Neural Decoding
标题:用于联合多接收机上行神经解码的交叉注意Transformer
链接:https://arxiv.org/abs/2602.04728
作者:Xavier Tardy,Grégoire Lefebvre,Apostolos Kountouris,Haïfa Fares,Amor Nafkha
备注:6 pages, 3 figures, 3 tables, conference submission
摘要:我们提出了一个交叉关注Transformer的联合解码的上行链路OFDM信号接收多个协调接入点。共享的每接收器编码器学习每个接收网格内的时频结构,并且令牌式交叉注意模块融合接收器以产生用于标准信道解码器的软对数似然比,而不需要显式的每接收器信道估计。训练与比特度量的目标,该模型适应其融合每个接收机的可靠性,容忍丢失或退化的链接,并保持强大的导频稀疏时。在真实的Wi-Fi信道中,它始终优于经典管道和强大的卷积基线,经常匹配(在某些情况下超过)假设每个接入点都有完美信道知识的强大基线。尽管其表现力很强,但该架构结构紧凑,计算成本低(低GFLOPs),并在GPU上实现了低延迟,使其成为下一代Wi-Fi接收器的实用构建模块。
摘要:We propose a cross-attention Transformer for joint decoding of uplink OFDM signals received by multiple coordinated access points. A shared per-receiver encoder learns time-frequency structure within each received grid, and a token-wise cross-attention module fuses the receivers to produce soft log-likelihood ratios for a standard channel decoder, without requiring explicit per-receiver channel estimates. Trained with a bit-metric objective, the model adapts its fusion to per-receiver reliability, tolerates missing or degraded links, and remains robust when pilots are sparse. Across realistic Wi-Fi channels, it consistently outperforms classical pipelines and strong convolutional baselines, frequently matching (and in some cases surpassing) a powerful baseline that assumes perfect channel knowledge per access point. Despite its expressiveness, the architecture is compact, has low computational cost (low GFLOPs), and achieves low latency on GPUs, making it a practical building block for next-generation Wi-Fi receivers.
GAN|对抗|攻击|生成相关(6篇)
【1】Protein Autoregressive Modeling via Multiscale Structure Generation
标题:通过多尺度结构生成的蛋白质自回归建模
链接:https://arxiv.org/abs/2602.04883
作者:Yanru Qu,Cheng-Yen Hsieh,Zaixiang Zheng,Ge Liu,Quanquan Gu
备注:ByteDance Seed Tech Report; Page: https://par-protein.github.io/
摘要:我们提出了蛋白质自回归建模(PAR),第一个多尺度自回归框架蛋白质骨架生成通过粗到细下一个尺度预测。利用蛋白质的层次性质,PAR生成模仿雕塑的结构,形成粗糙的拓扑结构并在尺度上细化结构细节。为了实现这一点,PAR由三个关键组件组成:(i)多尺度下采样操作,在训练期间表示多尺度上的蛋白质结构;(ii)自回归Transformer,对多尺度信息进行编码并产生条件嵌入以指导结构生成;(iii)基于流的主干解码器,生成以这些嵌入为条件的主干原子。此外,自回归模型遭受曝光偏差,造成的训练和生成过程不匹配,并大大降低结构生成质量。我们有效地缓解了这个问题,采用嘈杂的上下文学习和定时采样,使强大的骨干生成。值得注意的是,PAR表现出很强的zero-shot泛化,支持灵活的人类提示的条件生成和模体支架,而不需要微调。在无条件生成基准上,PAR有效地学习蛋白质分布,产生高设计质量的主链,并表现出良好的缩放行为。总之,这些特性使PAR成为蛋白质结构生成的一个有前途的框架。
摘要
:We present protein autoregressive modeling (PAR), the first multi-scale autoregressive framework for protein backbone generation via coarse-to-fine next-scale prediction. Using the hierarchical nature of proteins, PAR generates structures that mimic sculpting a statue, forming a coarse topology and refining structural details over scales. To achieve this, PAR consists of three key components: (i) multi-scale downsampling operations that represent protein structures across multiple scales during training; (ii) an autoregressive transformer that encodes multi-scale information and produces conditional embeddings to guide structure generation; (iii) a flow-based backbone decoder that generates backbone atoms conditioned on these embeddings. Moreover, autoregressive models suffer from exposure bias, caused by the training and the generation procedure mismatch, and substantially degrades structure generation quality. We effectively alleviate this issue by adopting noisy context learning and scheduled sampling, enabling robust backbone generation. Notably, PAR exhibits strong zero-shot generalization, supporting flexible human-prompted conditional generation and motif scaffolding without requiring fine-tuning. On the unconditional generation benchmark, PAR effectively learns protein distributions and produces backbones of high design quality, and exhibits favorable scaling behavior. Together, these properties establish PAR as a promising framework for protein structure generation.
【2】Toward Reliable and Explainable Nail Disease Classification: Leveraging Adversarial Training and Grad-CAM Visualization
标题:迈向可靠且可解释的指甲疾病分类:利用对抗训练和Grad-CAM可视化
链接:https://arxiv.org/abs/2602.04820
作者:Farzia Hossain,Samanta Ghosh,Shahida Begum,B. M. Shahria Alam,Mohammad Tahmid Noor,Md Parvez Mia,Nishat Tasnim Niloy
备注:6 pages, 12 figures. This is the author's accepted manuscript of a paper accepted for publication in the Proceedings of the 16th International IEEE Conference on Computing, Communication and Networking Technologies (ICCCNT 2025). The final published version will be available via IEEE Xplore
摘要:人类指甲疾病在所有年龄组中逐渐观察到,特别是在老年人中,通常被忽视,直到它们变得严重。早期发现和准确诊断这些疾病很重要,因为它们有时会揭示我们身体的健康问题。但由于疾病类型之间的推断视觉差异,这是具有挑战性的。本文提出了一种基于机器学习的指甲疾病自动分类模型,该模型基于一个公开的数据集,该数据集包含3,835张缩放六个类别的图像。在224 x224像素中,所有图像都调整了大小以确保一致性。为了评估性能,我们训练和分析了四个著名的CNN模型-InceptionV 3、DenseNet 201、EfficientNetV 2和ResNet 50。其中,InceptionV 3的准确率为95.57%,而DenseNet 201则为94.79%。为了使模型更强大并且在棘手或嘈杂的图像上不太可能出错,我们使用了对抗性训练。为了帮助理解模型如何做出决策,我们使用SHAP来突出预测中的重要特征。该系统可以为医生提供帮助,使指甲疾病诊断更准确,更快速。
摘要:Human nail diseases are gradually observed over all age groups, especially among older individuals, often going ignored until they become severe. Early detection and accurate diagnosis of such conditions are important because they sometimes reveal our body's health problems. But it is challenging due to the inferred visual differences between disease types. This paper presents a machine learning-based model for automated classification of nail diseases based on a publicly available dataset, which contains 3,835 images scaling six categories. In 224x224 pixels, all images were resized to ensure consistency. To evaluate performance, four well-known CNN models-InceptionV3, DenseNet201, EfficientNetV2, and ResNet50 were trained and analyzed. Among these, InceptionV3 outperformed the others with an accuracy of 95.57%, while DenseNet201 came next with 94.79%. To make the model stronger and less likely to make mistakes on tricky or noisy images, we used adversarial training. To help understand how the model makes decisions, we used SHAP to highlight important features in the predictions. This system could be a helpful support for doctors, making nail disease diagnosis more accurate and faster.
【3】DMFlow: Disordered Materials Generation by Flow Matching
标题:DMFlow:通过流匹配生成无序材料
链接:https://arxiv.org/abs/2602.04734
作者:Liming Wu,Rui Jiao,Qi Li,Mingze Li,Songyou Li,Shifeng Jin,Wenbing Huang
摘要:设计具有定制性能的材料对技术进步至关重要。然而,大多数深层生成模型只关注完美有序的晶体,忽略了无序材料的重要类别。为了解决这个差距,我们引入DMFlow,一个专门为无序晶体设计的生成框架。我们的方法引入了一个统一的表示有序,置换无序(SD),和位置无序(PD)晶体,并采用流匹配模型,共同产生所有的结构组件。一个关键的创新是一个黎曼流匹配框架与球形重新参数化,确保物理上有效的无序权重的概率单纯形。该向量场是通过一种新型的图神经网络(GNN)学习的,该网络结合了物理对称性和专门的消息传递方案。最后,一个两阶段的离散化过程转换成多热原子分配的连续权重。为了支持该领域的研究,我们发布了一个基准,其中包含从晶体学开放数据库中精选的SD、PD和混合结构。晶体结构预测(CSP)和从头生成(DNG)任务的实验表明,DMFlow显着优于从有序晶体生成适应的最先进的基线。我们希望我们的工作为人工智能驱动的无序材料发现奠定基础。
摘要:The design of materials with tailored properties is crucial for technological progress. However, most deep generative models focus exclusively on perfectly ordered crystals, neglecting the important class of disordered materials. To address this gap, we introduce DMFlow, a generative framework specifically designed for disordered crystals. Our approach introduces a unified representation for ordered, Substitutionally Disordered (SD), and Positionally Disordered (PD) crystals, and employs a flow matching model to jointly generate all structural components. A key innovation is a Riemannian flow matching framework with spherical reparameterization, which ensures physically valid disorder weights on the probability simplex. The vector field is learned by a novel Graph Neural Network (GNN) that incorporates physical symmetries and a specialized message-passing scheme. Finally, a two-stage discretization procedure converts the continuous weights into multi-hot atomic assignments. To support research in this area, we release a benchmark containing SD, PD, and mixed structures curated from the Crystallography Open Database. Experiments on Crystal Structure Prediction (CSP) and De Novo Generation (DNG) tasks demonstrate that DMFlow significantly outperforms state-of-the-art baselines adapted from ordered crystal generation. We hope our work provides a foundation for the AI-driven discovery of disordered materials.
【4】OMG-Agent: Toward Robust Missing Modality Generation with Decoupled Coarse-to-Fine Agentic Workflows
标题:OMG-Agent:基于粗到细解耦工作流的鲁棒缺失模态生成
链接:https://arxiv.org/abs/2602.04144
作者:Ruiting Dai,Zheyu Wang,Haoyu Yang,Yihan Liu,Chengzhi Wang,Zekun Zhang,Zishan Huang,Jiaman Cen,Lisi Mo
摘要:数据的不完整性严重影响了多模态系统的可靠性。现有的重建方法面临着明显的瓶颈:传统的参数化/生成模型由于过度依赖内部存储器而容易产生幻觉,而检索增强框架则与检索刚性作斗争。重要的是,这些端到端架构从根本上受到语义细节纠缠的限制-逻辑推理和信号合成之间的结构冲突,这会损害保真度。在本文中,我们提出了\textbf{\underline{O}}mni-\textbf {\underline{M}}odality \textbf{\underline{G}}eneration Agent(\textbf{OMG-Agent}),一个新的框架,从静态映射的范式转移到一个动态的粗到细的抽象工作流。OMG-Agent通过模仿一个“慎思而后行动”的认知过程,将任务明确地分为三个协同阶段:(1)MLLM驱动的语义规划器,通过渐进式上下文推理解决输入歧义,创建确定性结构化语义规划;(2)非参数证据检索器,在外部知识中建立抽象语义;和(3)检索注入执行器,利用检索的证据作为灵活的特征提示,以克服刚性和合成高保真的细节。在多个基准测试上进行的大量实验表明,OMG-Agent始终优于最先进的方法,在极端缺失情况下保持鲁棒性,例如,CMU-MOSI上涨2.6美元,遗漏率为70美元。
摘要
:Data incompleteness severely impedes the reliability of multimodal systems. Existing reconstruction methods face distinct bottlenecks: conventional parametric/generative models are prone to hallucinations due to over-reliance on internal memory, while retrieval-augmented frameworks struggle with retrieval rigidity. Critically, these end-to-end architectures are fundamentally constrained by Semantic-Detail Entanglement -- a structural conflict between logical reasoning and signal synthesis that compromises fidelity. In this paper, we present \textbf{\underline{O}}mni-\textbf{\underline{M}}odality \textbf{\underline{G}}eneration Agent (\textbf{OMG-Agent}), a novel framework that shifts the paradigm from static mapping to a dynamic coarse-to-fine Agentic Workflow. By mimicking a \textit{deliberate-then-act} cognitive process, OMG-Agent explicitly decouples the task into three synergistic stages: (1) an MLLM-driven Semantic Planner that resolves input ambiguity via Progressive Contextual Reasoning, creating a deterministic structured semantic plan; (2) a non-parametric Evidence Retriever that grounds abstract semantics in external knowledge; and (3) a Retrieval-Injected Executor that utilizes retrieved evidence as flexible feature prompts to overcome rigidity and synthesize high-fidelity details. Extensive experiments on multiple benchmarks demonstrate that OMG-Agent consistently surpasses state-of-the-art methods, maintaining robustness under extreme missingness, e.g., a $2.6$-point gain on CMU-MOSI at $70$\% missing rates.
【5】Synthesizable Molecular Generation via Soft-constrained GFlowNets with Rich Chemical Priors
标题:通过具有丰富化学先验的软约束GFlowNet产生可合成分子
链接:https://arxiv.org/abs/2602.04119
作者:Hyeonah Kim,Minsu Kim,Celine Roget,Dionessa Biton,Louis Vaillancourt,Yves V. Brun,Yoshua Bengio,Alex Hernandez-Garcia
摘要:生成模型在实验药物发现活动中的应用受到从头设计可在实践中合成的分子的困难的严重限制。以前的工作已经利用生成流网络(GFlowNets)通过基于预定义的反应模板和构建块的状态和动作空间的设计来施加硬的可合成性约束。尽管这种方法前景看好,但目前缺乏灵活性和可扩展性。作为替代方案,我们提出了S3-GFN,它通过基于序列的GFlowNet的简单软正则化生成可合成的SMILES分子。我们的方法利用从大规模SMILES语料库中学习到的丰富的分子先验知识,将分子生成转向高回报,可合成的化学空间。该模型通过基于可合成和不可合成样本的单独缓冲区的对比学习信号进行离线重放训练来引入约束。我们的实验表明,S3-GFN学习生成可合成的分子($\geq 95\%$),在不同的任务中具有更高的回报。
摘要:The application of generative models for experimental drug discovery campaigns is severely limited by the difficulty of designing molecules de novo that can be synthesized in practice. Previous works have leveraged Generative Flow Networks (GFlowNets) to impose hard synthesizability constraints through the design of state and action spaces based on predefined reaction templates and building blocks. Despite the promising prospects of this approach, it currently lacks flexibility and scalability. As an alternative, we propose S3-GFN, which generates synthesizable SMILES molecules via simple soft regularization of a sequence-based GFlowNet. Our approach leverages rich molecular priors learned from large-scale SMILES corpora to steer molecular generation towards high-reward, synthesizable chemical spaces. The model induces constraints through off-policy replay training with a contrastive learning signal based on separate buffers of synthesizable and unsynthesizable samples. Our experiments show that S3-GFN learns to generate synthesizable molecules ($\geq 95\%$) with higher rewards in diverse tasks.
【6】Attack-Resistant Uniform Fairness for Linear and Smooth Contextual Bandits
标题:线性和光滑上下文盗贼的抗攻击均匀公平性
链接:https://arxiv.org/abs/2602.04125
作者:Qingwen Zhang,Wenjia Wang
摘要:现代系统,如数字平台和服务系统,越来越多地依赖于情境强盗进行在线决策;然而,它们的部署可能无意中造成武器之间的不公平暴露,破坏长期平台可持续性和供应商信任。本文研究了在一致$(1-δ)$-公平约束下的上下文强盗问题,并指出了其易受策略操纵的弱点.公平约束确保优惠待遇在所有情况下和时间范围内都有一个部门的实际报酬作为严格的理由,利用统一性来防止统计漏洞。我们开发了新的算法,实现(近)最小最优遗憾的线性和光滑的奖励函数,同时保持强大的$(1-\tilde{O}(1/T))$-公平保证,并进一步表征理论上固有的但渐近边际的“公平价格”。然而,我们发现,这种基于绩效的公平性变得特别容易受到信号操纵。我们发现,一个对手与最小的$\tilde{O}(1)$预算不仅可以降低整体性能,在传统的攻击,但也有选择性地诱导阴险的公平性特定的失败,而留下明显的遗憾措施基本上不受影响。为了解决这个问题,我们设计了强大的变种,将腐败自适应探索和误差补偿阈值。我们的方法在$C$-预算攻击下得到了第一个最小最大最优后悔界,同时保持了$(1-\tilde{O}(1/T))$-公平性。数值实验和一个真实的案例表明,我们的算法保持公平性和效率。
摘要:Modern systems, such as digital platforms and service systems, increasingly rely on contextual bandits for online decision-making; however, their deployment can inadvertently create unfair exposure among arms, undermining long-term platform sustainability and supplier trust. This paper studies the contextual bandit problem under a uniform $(1-δ)$-fairness constraint, and addresses its unique vulnerabilities to strategic manipulation. The fairness constraint ensures that preferential treatment is strictly justified by an arm's actual reward across all contexts and time horizons, using uniformity to prevent statistical loopholes. We develop novel algorithms that achieve (nearly) minimax-optimal regret for both linear and smooth reward functions, while maintaining strong $(1-\tilde{O}(1/T))$-fairness guarantees, and further characterize the theoretically inherent yet asymptotically marginal "price of fairness". However, we reveal that such merit-based fairness becomes uniquely susceptible to signal manipulation. We show that an adversary with a minimal $\tilde{O}(1)$ budget can not only degrade overall performance as in traditional attacks, but also selectively induce insidious fairness-specific failures while leaving conspicuous regret measures largely unaffected. To counter this, we design robust variants incorporating corruption-adaptive exploration and error-compensated thresholding. Our approach yields the first minimax-optimal regret bounds under $C$-budgeted attack while preserving $(1-\tilde{O}(1/T))$-fairness. Numerical experiments and a real-world case demonstrate that our algorithms sustain both fairness and efficiency.
半/弱/无/有监督|不确定性|主动学习(14篇)
【1】Safe Urban Traffic Control via Uncertainty-Aware Conformal Prediction and World-Model Reinforcement Learning
标题:通过不确定性感知共形预测和世界模型强化学习实现安全城市交通控制
链接:https://arxiv.org/abs/2602.04821
作者:Joydeep Chandra,Satyam Kumar Navneet,Aleksandr Algazinov,Yong Zhang
摘要:城市交通管理要求系统能够同时预测未来状况、检测异常并采取安全的纠正措施,同时提供可靠性保证。我们提出了STREAM-RL,一个统一的框架,引入了三个新的算法贡献:(1)PU-GAT+,一个不确定性引导的自适应共形预测器,使用预测的不确定性,通过置信单调注意力动态地重新加权图注意力,实现无分布覆盖保证;(2)CRFN-BY,一个共形残差流网络,通过使用Benjamini对流量进行归一化,对不确定性归一化残差进行建模。Yekutieli FDR控制下任意依赖;和(3)LyCon-WRL+,不确定性引导的安全世界模型RL代理与李雅普诺夫稳定性证书,认证Lipschitz边界,和不确定性传播的想象力推出。据我们所知,这是第一个将校准的不确定性从预测通过异常检测传播到具有端到端理论保证的安全策略学习的框架。在多个真实交通轨迹数据上的实验表明,STREAM-RL的覆盖效率达到91.4%,在验证依赖性的情况下,FDR控制在4.1%,安全率从标准PPO的69%提高到95.2%,同时获得更高的奖励,端到端推理延迟为23 ms。
摘要:Urban traffic management demands systems that simultaneously predict future conditions, detect anomalies, and take safe corrective actions -- all while providing reliability guarantees. We present STREAM-RL, a unified framework that introduces three novel algorithmic contributions: (1) PU-GAT+, an Uncertainty-Guided Adaptive Conformal Forecaster that uses prediction uncertainty to dynamically reweight graph attention via confidence-monotonic attention, achieving distribution-free coverage guarantees; (2) CRFN-BY, a Conformal Residual Flow Network that models uncertainty-normalized residuals via normalizing flows with Benjamini-Yekutieli FDR control under arbitrary dependence; and (3) LyCon-WRL+, an Uncertainty-Guided Safe World-Model RL agent with Lyapunov stability certificates, certified Lipschitz bounds, and uncertainty-propagated imagination rollouts. To our knowledge, this is the first framework to propagate calibrated uncertainty from forecasting through anomaly detection to safe policy learning with end-to-end theoretical guarantees. Experiments on multiple real-world traffic trajectory data demonstrate that STREAM-RL achieves 91.4\% coverage efficiency, controls FDR at 4.1\% under verified dependence, and improves safety rate to 95.2\% compared to 69\% for standard PPO while achieving higher reward, with 23ms end-to-end inference latency.
【2】Interval-Based AUC (iAUC): Extending ROC Analysis to Uncertainty-Aware Classification
标题:基于区间的AUC(iAUC):将ROC分析扩展到不确定性感知分类
链接:https://arxiv.org/abs/2602.04775
作者:Yuqi Li,Matthew M. Engelhard
摘要:在高风险预测中,通过区间值预测量化不确定性对于可靠的决策至关重要。然而,标准的评估工具,如受试者工作特征(ROC)曲线和曲线下面积(AUC)是专为得分而设计的,无法捕捉预测不确定性对排名性能的影响。我们提出了一个专门针对区间值预测的不确定性感知ROC框架,引入了两个新的度量:$AUC_L$和$AUC_U$。该框架实现了ROC平面的信息三区域分解,将成对排名划分为正确,不正确和不确定的排序。这种方法自然支持选择性预测,通过允许模型放弃对具有重叠间隔的情况进行排名,从而优化预测率和判别可靠性之间的权衡。我们证明了在有效的类条件覆盖下,$AUC_L$和$AUC_U$提供了理论最优AUC($AUC^*$)的正式下限和上限,表征了可实现区分的物理极限。所提出的框架广泛适用于区间值预测模型,无论区间构造方法。在真实世界的基准数据集上进行的实验,使用基于引导的间隔作为一个实例,验证了框架的正确性,并展示了其对不确定性感知评估和决策的实用性。
摘要:In high-stakes risk prediction, quantifying uncertainty through interval-valued predictions is essential for reliable decision-making. However, standard evaluation tools like the receiver operating characteristic (ROC) curve and the area under the curve (AUC) are designed for point scores and fail to capture the impact of predictive uncertainty on ranking performance. We propose an uncertainty-aware ROC framework specifically for interval-valued predictions, introducing two new measures: $AUC_L$ and $AUC_U$. This framework enables an informative three-region decomposition of the ROC plane, partitioning pairwise rankings into correct, incorrect, and uncertain orderings. This approach naturally supports selective prediction by allowing models to abstain from ranking cases with overlapping intervals, thereby optimizing the trade-off between abstention rate and discriminative reliability. We prove that under valid class-conditional coverage, $AUC_L$ and $AUC_U$ provide formal lower and upper bounds on the theoretical optimal AUC ($AUC^*$), characterizing the physical limit of achievable discrimination. The proposed framework applies broadly to interval-valued prediction models, regardless of the interval construction method. Experiments on real-world benchmark datasets, using bootstrap-based intervals as one instantiation, validate the framework's correctness and demonstrate its practical utility for uncertainty-aware evaluation and decision-making.
【3】Active Asymmetric Multi-Agent Multimodal Learning under Uncertainty
标题:不确定性下的主动非对称多智能体多模式学习
链接:https://arxiv.org/abs/2602.04763
作者:Rui Liu,Pratap Tokekar,Ming Lin
摘要:多智能体系统越来越多地配备了异构多模态传感器,使更丰富的感知,但引入特定于模态和代理依赖的不确定性。现有的多智能体协作框架通常在代理级别的原因,假设同质感测,并隐式处理不确定性,限制传感器损坏下的鲁棒性。我们提出了不确定性下的主动非对称多智能体多模态学习(A2MAML),这是一种用于不确定性感知的模态级协作的原则性方法。A2MAML将每个模态特定特征建模为具有不确定性预测的随机估计,主动选择可靠的代理模态对,并通过贝叶斯逆方差加权来聚合信息。该公式实现了细粒度的模态级融合,支持非对称模态可用性,并提供了抑制损坏或噪声模态的原则性机制。在用于协同事故检测的连接自动驾驶场景上进行的大量实验表明,A2MAML始终优于单智能体和协同基线,事故检测率提高了18.7%。
摘要:Multi-agent systems are increasingly equipped with heterogeneous multimodal sensors, enabling richer perception but introducing modality-specific and agent-dependent uncertainty. Existing multi-agent collaboration frameworks typically reason at the agent level, assume homogeneous sensing, and handle uncertainty implicitly, limiting robustness under sensor corruption. We propose Active Asymmetric Multi-Agent Multimodal Learning under Uncertainty (A2MAML), a principled approach for uncertainty-aware, modality-level collaboration. A2MAML models each modality-specific feature as a stochastic estimate with uncertainty prediction, actively selects reliable agent-modality pairs, and aggregates information via Bayesian inverse-variance weighting. This formulation enables fine-grained, modality-level fusion, supports asymmetric modality availability, and provides a principled mechanism to suppress corrupted or noisy modalities. Extensive experiments on connected autonomous driving scenarios for collaborative accident detection demonstrate that A2MAML consistently outperforms both single-agent and collaborative baselines, achieving up to 18.7% higher accident detection rate.
【4】Let Experts Feel Uncertainty: A Multi-Expert Label Distribution Approach to Probabilistic Time Series Forecasting
标题:让专家感受到不确定性:概率时间序列预测的多专家标签分布方法
链接:https://arxiv.org/abs/2602.04678
作者:Zhen Zhou,Zhirui Wang,Qi Hong,Yunyang Shi,Ziyuan Gu,Zhiyuan Liu
备注:11 pages, 2figures
摘要:时间序列预测在现实世界中的应用需要高的预测精度和可解释的不确定性量化。传统的点预测方法往往无法捕捉时间序列数据中固有的不确定性,而现有的概率方法则难以平衡计算效率与可解释性。我们提出了一种新的多专家学习分布式标签(LDL)框架,通过混合专家架构与分布式学习能力来解决这些挑战。我们的方法引入了两种互补的方法:(1)Multi-Expert LDL,它采用具有不同学习参数的多个专家来捕获不同的时间模式,以及(2)Pattern-Aware LDL-MoE,它通过专业的子专家将时间序列明确地分解为可解释的组件(趋势,季节性,变化点,波动性)。这两个框架都将传统的点预测扩展到分布式学习,通过最大均值离散(MMD)实现了丰富的不确定性量化。我们对来自M5数据集的汇总销售数据进行了评估,与基线方法相比,表现出卓越的性能。连续Multi-Expert LDL实现了最佳的整体性能,而Pattern-Aware LDL-MoE通过组件分析提供了增强的可解释性。我们的框架成功地平衡了预测准确性和可解释性,使其适用于性能和可操作见解都至关重要的现实预测应用。
摘要:Time series forecasting in real-world applications requires both high predictive accuracy and interpretable uncertainty quantification. Traditional point prediction methods often fail to capture the inherent uncertainty in time series data, while existing probabilistic approaches struggle to balance computational efficiency with interpretability. We propose a novel Multi-Expert Learning Distributional Labels (LDL) framework that addresses these challenges through mixture-of-experts architectures with distributional learning capabilities. Our approach introduces two complementary methods: (1) Multi-Expert LDL, which employs multiple experts with different learned parameters to capture diverse temporal patterns, and (2) Pattern-Aware LDL-MoE, which explicitly decomposes time series into interpretable components (trend, seasonality, changepoints, volatility) through specialized sub-experts. Both frameworks extend traditional point prediction to distributional learning, enabling rich uncertainty quantification through Maximum Mean Discrepancy (MMD). We evaluate our methods on aggregated sales data derived from the M5 dataset, demonstrating superior performance compared to baseline approaches. The continuous Multi-Expert LDL achieves the best overall performance, while the Pattern-Aware LDL-MoE provides enhanced interpretability through component-wise analysis. Our frameworks successfully balance predictive accuracy with interpretability, making them suitable for real-world forecasting applications where both performance and actionable insights are crucial.
【5】Benchmarking Uncertainty Quantification of Plug-and-Play Diffusion Priors for Inverse Problems Solving
标题:即插即用扩散先验的不确定性量化以解决反问题
链接:https://arxiv.org/abs/2602.04189
作者:Xiaoyu Qiu,Taewon Yang,Zhanhao Liu,Guanyang Wang,Liyue Shen
摘要:即插即用扩散先验(Plug-and-Play Diffusion Priors,PnPDP)已成为科学和工程领域中求解逆问题的一个强有力的范例。然而,目前的重建质量的评价强调点估计精度指标对一个单一的样本,这并不反映随机性质的PnPDP求解器和逆问题的内在不确定性,关键的科学任务。这就产生了一个根本的不匹配:在逆问题中,期望的输出通常是后验分布,大多数PnPDP求解器会在重建过程中引入分布,但现有的基准只评估单个重建,忽略了不确定性等分布特征。为了解决这一差距,我们进行了系统的研究,基准现有的扩散逆求解器的不确定性量化(UQ)。具体来说,我们设计了一个严格的玩具模型模拟,以评估各种PnPDP求解器的不确定性行为,并提出了一个用户需求驱动的分类。通过对玩具模拟和各种现实世界的科学逆问题进行广泛的实验,我们观察到与我们的分类和理论依据一致的不确定性行为,为评估和理解PnPDP的不确定性提供了新的见解。
摘要
:Plug-and-play diffusion priors (PnPDP) have become a powerful paradigm for solving inverse problems in scientific and engineering domains. Yet, current evaluations of reconstruction quality emphasize point-estimate accuracy metrics on a single sample, which do not reflect the stochastic nature of PnPDP solvers and the intrinsic uncertainty of inverse problems, critical for scientific tasks. This creates a fundamental mismatch: in inverse problems, the desired output is typically a posterior distribution and most PnPDP solvers induce a distribution over reconstructions, but existing benchmarks only evaluate a single reconstruction, ignoring distributional characterization such as uncertainty. To address this gap, we conduct a systematic study to benchmark the uncertainty quantification (UQ) of existing diffusion inverse solvers. Specifically, we design a rigorous toy model simulation to evaluate the uncertainty behavior of various PnPDP solvers, and propose a UQ-driven categorization. Through extensive experiments on toy simulations and diverse real-world scientific inverse problems, we observe uncertainty behaviors consistent with our taxonomy and theoretical justification, providing new insights for evaluating and understanding the uncertainty for PnPDPs.
【6】Supervised Learning as Lossy Compression: Characterizing Generalization and Sample Complexity via Finite Blocklength Analysis
标题:作为有损压缩的监督学习:通过有限块长分析描述概括和样本复杂性
链接:https://arxiv.org/abs/2602.04107
作者:Kosuke Sugiyama,Masato Uchida
备注:22 pages, 1 figure
摘要:本文提出了一种新的信息理论的角度概括在机器学习的框架内的学习问题的背景下,有损压缩和有限块长度分析。在我们的方法中,训练数据的采样形式上对应于一个编码过程,模型的构建过程对应于一个解码过程。通过利用有限块长度分析,我们得到了一个固定的随机学习算法和相关的最佳采样策略的样本复杂度和泛化误差的下界。我们的界限明确地描述了学习算法的过拟合程度以及其归纳偏差和任务之间的不匹配。这种分离提供了优于现有框架的显著优势。此外,我们分解过拟合项,以显示其理论上的连接,现有的度量发现在信息理论界和稳定性理论,统一这些观点下,我们提出的框架。
摘要:This paper presents a novel information-theoretic perspective on generalization in machine learning by framing the learning problem within the context of lossy compression and applying finite blocklength analysis. In our approach, the sampling of training data formally corresponds to an encoding process, and the model construction to a decoding process. By leveraging finite blocklength analysis, we derive lower bounds on sample complexity and generalization error for a fixed randomized learning algorithm and its associated optimal sampling strategy. Our bounds explicitly characterize the degree of overfitting of the learning algorithm and the mismatch between its inductive bias and the task as distinct terms. This separation provides a significant advantage over existing frameworks. Additionally, we decompose the overfitting term to show its theoretical connection to existing metrics found in information-theoretic bounds and stability theory, unifying these perspectives under our proposed framework.
【7】Federated Concept-Based Models: Interpretable models with distributed supervision
标题:基于概念的联邦模型:具有分布式监督的可解释模型
链接:https://arxiv.org/abs/2602.04093
作者:Dario Fenoglio,Arianna Casanova,Francesco De Santis,Mohan Li,Gabriele Dominici,Johannes Schneider,Martin Gjoreski,Marc Langheinrich,Pietro Barbiero,Giovanni De Felice
摘要:基于概念的模型(CM)通过将预测建立在人类可理解的概念中来增强深度学习的可解释性。然而,概念注释的获取成本很高,并且很少在单个数据源中大规模使用。联邦学习(FL)可以通过启用跨机构培训来缓解这一限制,该培训利用分布在多个数据所有者之间的概念注释。然而,FL缺乏可解释的建模范式。将CM与FL相结合是不平凡的:CM假设一个固定的概念空间和预定义的模型架构,而现实世界的FL是异构和非固定的,随着时间的推移,机构加入并带来新的监督。在这项工作中,我们提出了联邦概念为基础的模型(F-CMs),一种新的方法来部署CM在不断发展的FL设置。F-CM聚合了机构间的概念级信息,并有效地适应模型架构,以应对可用概念监督的变化,同时保护机构隐私。从经验上讲,F-CM在完全概念监督的情况下保持了训练设置的准确性和干预有效性,同时优于非自适应联邦基线。值得注意的是,F-CM能够对给定机构无法获得的概念进行可解释的推理,这是现有方法的一个关键新颖性。
摘要:Concept-based models (CMs) enhance interpretability in deep learning by grounding predictions in human-understandable concepts. However, concept annotations are expensive to obtain and rarely available at scale within a single data source. Federated learning (FL) could alleviate this limitation by enabling cross-institutional training that leverages concept annotations distributed across multiple data owners. Yet, FL lacks interpretable modeling paradigms. Integrating CMs with FL is non-trivial: CMs assume a fixed concept space and a predefined model architecture, whereas real-world FL is heterogeneous and non-stationary, with institutions joining over time and bringing new supervision. In this work, we propose Federated Concept-based Models (F-CMs), a new methodology for deploying CMs in evolving FL settings. F-CMs aggregate concept-level information across institutions and efficiently adapt the model architecture in response to changes in the available concept supervision, while preserving institutional privacy. Empirically, F-CMs preserve the accuracy and intervention effectiveness of training settings with full concept supervision, while outperforming non-adaptive federated baselines. Notably, F-CMs enable interpretable inference on concepts not available to a given institution, a key novelty with respect to existing approaches.
【8】eCP: Informative uncertainty quantification via Equivariantized Conformal Prediction with pre-trained models
标题:eCP:通过预训练模型的等方差保形预测进行信息不确定性量化
链接:https://arxiv.org/abs/2602.03986
作者:Nikolaos Bousias,Lars Lindemann,George Pappas
摘要:我们研究了预训练模型的组对称化对共形预测(CP)的影响,共形预测是一种事后的、无分布的、有限样本的不确定性量化方法,在数据交换的假设下提供了正式的覆盖保证。不幸的是,CP不确定性区域可以在长时间任务中显著增长,使得统计保证没有信息。为此,我们建议通过预训练预测器的组平均来向CP注入几何信息,以在轨道上分布非一致性质量。每个样本现在被视为一个轨道的代表,因此不确定性可以通过其他样本通过对称群的轨道诱导元素与它纠缠来减轻。我们的方法可证明产生收缩的不一致分数在增加凸顺序,这意味着改善指数尾界和更尖锐的共形预测集的期望,特别是在高置信水平。然后,我们提出了一个实验设计,以测试这些理论主张在行人轨迹预测。
摘要:We study the effect of group symmetrization of pre-trained models on conformal prediction (CP), a post-hoc, distribution-free, finite-sample method of uncertainty quantification that offers formal coverage guarantees under the assumption of data exchangeability. Unfortunately, CP uncertainty regions can grow significantly in long horizon missions, rendering the statistical guarantees uninformative. To that end, we propose infusing CP with geometric information via group-averaging of the pretrained predictor to distribute the non-conformity mass across the orbits. Each sample now is treated as a representative of an orbit, thus uncertainty can be mitigated by other samples entangled to it via the orbit inducing elements of the symmetry group. Our approach provably yields contracted non-conformity scores in increasing convex order, implying improved exponential-tail bounds and sharper conformal prediction sets in expectation, especially at high confidence levels. We then propose an experimental design to test these theoretical claims in pedestrian trajectory prediction.
【9】Learning to Separate RF Signals Under Uncertainty: Detect-Then-Separate vs. Unified Joint Models
标题:学习在不确定性下分离RF信号:先检测然后分离与统一关节模型
链接:https://arxiv.org/abs/2602.04650
作者:Ariel Rodrigez,Alejandro Lancho,Amir Weiss
备注:6 pages, 6 figures, 1 table, accepted at the 2026 IEEE International Conference on Communications
摘要:日益拥挤的射频(RF)频谱迫使通信信号共存,产生了异构的干扰源,其结构通常偏离高斯模型。在这种环境中恢复受干扰污染的感兴趣信号是一个核心挑战,特别是在单通道RF处理中。现有的数据驱动方法通常假设干扰类型是已知的,产生的专业模型的集合,规模与干扰的数量很差。我们发现,检测,然后分离(DETECT-THEN-SEPLESS)的策略承认一个分析的理由:在高斯混合框架内,一个插件的最大后验检测器,其次是类型条件的最优估计实现渐近最小均方误差最优性下一个温和的时间多样性条件。这使得XML成为一个原则性的基准,但它依赖于多个特定类型的模型,限制了可扩展性。受此启发,我们提出了一个统一的联合模型(UJM),其中单个深度神经架构在直接应用于接收信号时学习联合检测和分离。使用量身定制的UNet架构的基带(复值)RF信号,我们比较合成和记录的干扰类型,表明容量匹配的UJM可以匹配不同的信号与干扰和噪声比,干扰类型和星座顺序,包括不匹配的训练和测试类型的不确定性比例的甲骨文辅助的性能。这些研究结果突出表明,统一司法机制是一种可扩展的、切实可行的替代司法独立的办法,同时为在更广泛的制度下统一分离开辟了新的方向。
摘要:The increasingly crowded radio frequency (RF) spectrum forces communication signals to coexist, creating heterogeneous interferers whose structure often departs from Gaussian models. Recovering the interference-contaminated signal of interest in such settings is a central challenge, especially in single-channel RF processing. Existing data-driven methods often assume that the interference type is known, yielding ensembles of specialized models that scale poorly with the number of interferers. We show that detect-then-separate (DTS) strategies admit an analytical justification: within a Gaussian mixture framework, a plug-in maximum a posteriori detector followed by type-conditioned optimal estimation achieves asymptotic minimum mean-square error optimality under a mild temporal-diversity condition. This makes DTS a principled benchmark, but its reliance on multiple type-specific models limits scalability. Motivated by this, we propose a unified joint model (UJM), in which a single deep neural architecture learns to jointly detect and separate when applied directly to the received signal. Using tailored UNet architectures for baseband (complex-valued) RF signals, we compare DTS and UJM on synthetic and recorded interference types, showing that a capacity-matched UJM can match oracle-aided DTS performance across diverse signal-to-interference-and-noise ratios, interference types, and constellation orders, including mismatched training and testing type-uncertainty proportions. These findings highlight UJM as a scalable and practical alternative to DTS, while opening new directions for unified separation under broader regimes.
【10】A principled framework for uncertainty decomposition in TabPFN
标题:TabPFN中不确定性分解的原则框架
链接:https://arxiv.org/abs/2602.04596
作者:Sandra Fortini,Kenyon Ng,Sonia Petrone,Judith Rousseau,Susan Wei
备注:9 pages (+2 reference, +34 appendix). Code in https://github.com/weiyaw/ud4pfn
摘要:TabPFN是一个Transformer,通过将贝叶斯预测分摊到单个前向传递中,在有监督的表格任务上实现最先进的性能。然而,目前在TabPFN中没有不确定性分解的方法。因为它的行为,在一个理想的限制,作为一个贝叶斯上下文学习者,我们投分解的挑战作为贝叶斯预测推理(BPI)的问题。BPI中的主要计算工具,预测蒙特卡罗,在这里应用具有挑战性,因为它需要模拟未建模的协变量。因此,我们追求的渐近替代,填补了监督设置的理论空白,证明了预测CLT下的准鞅条件。我们推导出由上下文中预测更新的波动性确定的方差估计量。由此产生的可信带是快速计算,目标认知的不确定性,并实现近标称频率覆盖。对于分类,我们进一步得到一个基于熵的不确定性分解。
摘要:TabPFN is a transformer that achieves state-of-the-art performance on supervised tabular tasks by amortizing Bayesian prediction into a single forward pass. However, there is currently no method for uncertainty decomposition in TabPFN. Because it behaves, in an idealised limit, as a Bayesian in-context learner, we cast the decomposition challenge as a Bayesian predictive inference (BPI) problem. The main computational tool in BPI, predictive Monte Carlo, is challenging to apply here as it requires simulating unmodeled covariates. We therefore pursue the asymptotic alternative, filling a gap in the theory for supervised settings by proving a predictive CLT under quasi-martingale conditions. We derive variance estimators determined by the volatility of predictive updates along the context. The resulting credible bands are fast to compute, target epistemic uncertainty, and achieve near-nominal frequentist coverage. For classification, we further obtain an entropy-based uncertainty decomposition.
【11】Bayesian PINNs for uncertainty-aware inverse problems (BPINN-IP)
标题:用于不确定性感知逆问题的Bayesian PINN(BPINN-IP)
链接:https://arxiv.org/abs/2602.04459
作者:Ali Mohammad-Djafari
备注:submitted to ICIP 2006 conference
摘要:本文的主要贡献是发展了一个层次贝叶斯公式的PINNs的线性逆问题,这是所谓的BPINN-IP。所提出的方法扩展PINN占先验知识的预期NN输出的性质,以及它的权重。此外,由于我们可以获得后验概率分布,自然可以量化不确定性。此外,变分推理和蒙特卡罗辍学提供预测手段和重建图像的方差。一个应用程序的反卷积和超分辨率的例子被认为是,不同的实施步骤的细节,并提出了一些初步的结果。
摘要:The main contribution of this paper is to develop a hierarchical Bayesian formulation of PINNs for linear inverse problems, which is called BPINN-IP. The proposed methodology extends PINN to account for prior knowledge on the nature of the expected NN output, as well as its weights. Also, as we can have access to the posterior probability distributions, naturally uncertainties can be quantified. Also, variational inference and Monte Carlo dropout are employed to provide predictive means and variances for reconstructed images. Un example of applications to deconvolution and super-resolution is considered, details of the different steps of implementations are given, and some preliminary results are presented.
【12】Aortic Valve Disease Detection from PPG via Physiology-Informed Self-Supervised Learning
标题:通过生理知情自我监督学习从PGP检测主动脉瓣疾病
链接:https://arxiv.org/abs/2602.04266
作者:Jiaze Wang,Qinghao Zhao,Zizheng Chen,Zhejun Sun,Deyun Zhang,Yuxi Zhou,Shenda Hong
备注:28 pages, 7 figures. Under review
摘要:主动脉瓣疾病的传统诊断依赖于超声心动图,但其成本和所需的专业知识限制了其在大规模早期筛查中的应用。光电体积描记术(PPG)由于其在可穿戴设备中的广泛可用性以及其反映潜在血流动力学的能力而已经成为一种有前途的筛查模式。然而,黄金标准标记的PPG数据的极度稀缺严重限制了数据驱动方法的有效性。为了应对这一挑战,我们提出并验证了一种新的范式,生理引导的自我监督学习(PG-SSL),旨在解锁大规模未标记PPG数据的价值,以有效筛选主动脉狭窄(AS)和主动脉返流(AR)。使用来自英国生物银行的超过170,000个未标记的PPG样本,我们将临床知识形式化为一组PPG形态表型,并构建用于自我监督预训练的脉搏模式识别代理任务。一个双分支,门控融合架构,然后采用高效的微调一个小的标记子集。拟议的PG-SSL框架分别实现了AS和AR筛查的AUC为0.765和0.776,显著优于在有限标记数据上训练的监督基线。多变量分析进一步验证了模型输出作为独立的数字生物标志物,在调整标准临床风险因素后具有持续的预后价值。这项研究表明,PG-SSL提供了一种有效的、领域知识驱动的解决方案,以标记医疗人工智能中的稀缺性,并显示出实现低成本、大规模早期筛查主动脉瓣疾病的强大潜力。
摘要
:Traditional diagnosis of aortic valve disease relies on echocardiography, but its cost and required expertise limit its use in large-scale early screening. Photoplethysmography (PPG) has emerged as a promising screening modality due to its widespread availability in wearable devices and its ability to reflect underlying hemodynamic dynamics. However, the extreme scarcity of gold-standard labeled PPG data severely constrains the effectiveness of data-driven approaches. To address this challenge, we propose and validate a new paradigm, Physiology-Guided Self-Supervised Learning (PG-SSL), aimed at unlocking the value of large-scale unlabeled PPG data for efficient screening of Aortic Stenosis (AS) and Aortic Regurgitation (AR). Using over 170,000 unlabeled PPG samples from the UK Biobank, we formalize clinical knowledge into a set of PPG morphological phenotypes and construct a pulse pattern recognition proxy task for self-supervised pre-training. A dual-branch, gated-fusion architecture is then employed for efficient fine-tuning on a small labeled subset. The proposed PG-SSL framework achieves AUCs of 0.765 and 0.776 for AS and AR screening, respectively, significantly outperforming supervised baselines trained on limited labeled data. Multivariable analysis further validates the model output as an independent digital biomarker with sustained prognostic value after adjustment for standard clinical risk factors. This study demonstrates that PG-SSL provides an effective, domain knowledge-driven solution to label scarcity in medical artificial intelligence and shows strong potential for enabling low-cost, large-scale early screening of aortic valve disease.
【13】Prenatal Stress Detection from Electrocardiography Using Self-Supervised Deep Learning: Development and External Validation
标题:使用自我监督深度学习通过心电图检测产前压力:开发和外部验证
链接:https://arxiv.org/abs/2602.03886
作者:Martin G. Frasch,Marlene J. E. Mayer,Clara Becker,Peter Zimmermann,Camilla Zelgert,Marta C. Antonelli,Silvia M. Lobmaier
备注:22 pages, 5 figures
摘要:产前心理压力影响15-25%的怀孕,并增加早产,低出生体重和不良神经发育结果的风险。目前的筛查依赖于主观问卷(PSS-10),限制了连续监测。我们使用FELICITy 1队列(151名孕妇,妊娠32-38周)开发了用于心电图(ECG)压力检测的深度学习模型。ResNet-34编码器通过Simplified对比学习对每个受试者的40,692个ECG段进行预训练。多层特征提取使得能够跨母体(mECG)、胎儿(fECG)和腹部ECG(aECG)进行二元分类和连续PSS预测。外部确认使用FELICITy 2 RCT(28例受试者,不同ECG器械,瑜伽干预vs.对照)。FELICITy 1(5倍CV):mECG 98.6%准确度(R2=0.88,MAE=1.90),fECG 99.8%(R2=0.95,MAE=1.19),aECG 95.5%(R2=0.75,MAE=2.80)。FELICITy 2的外部验证:mECG 77.3%准确度(R2=0.62,MAE=3.54,AUC=0.826),aECG 63.6%(R2=0.29,AUC=0.705)。基于信号质量的信道选择优于所有信道平均(R2提高+12%)。混合效应模型检测到显著的干预反应(p=0.041)。对妊娠ECG的自我监督深度学习可以实现准确、客观的压力评估,多层特征提取的性能大大优于单一嵌入方法。
摘要:Prenatal psychological stress affects 15-25% of pregnancies and increases risks of preterm birth, low birth weight, and adverse neurodevelopmental outcomes. Current screening relies on subjective questionnaires (PSS-10), limiting continuous monitoring. We developed deep learning models for stress detection from electrocardiography (ECG) using the FELICITy 1 cohort (151 pregnant women, 32-38 weeks gestation). A ResNet-34 encoder was pretrained via SimCLR contrastive learning on 40,692 ECG segments per subject. Multi-layer feature extraction enabled binary classification and continuous PSS prediction across maternal (mECG), fetal (fECG), and abdominal ECG (aECG). External validation used the FELICITy 2 RCT (28 subjects, different ECG device, yoga intervention vs. control). On FELICITy 1 (5-fold CV): mECG 98.6% accuracy (R2=0.88, MAE=1.90), fECG 99.8% (R2=0.95, MAE=1.19), aECG 95.5% (R2=0.75, MAE=2.80). External validation on FELICITy 2: mECG 77.3% accuracy (R2=0.62, MAE=3.54, AUC=0.826), aECG 63.6% (R2=0.29, AUC=0.705). Signal quality-based channel selection outperformed all-channel averaging (+12% R2 improvement). Mixed-effects models detected a significant intervention response (p=0.041). Self-supervised deep learning on pregnancy ECG enables accurate, objective stress assessment, with multi-layer feature extraction substantially outperforming single embedding approaches.
【14】Online unsupervised Hebbian learning in deep photonic neuromorphic networks
标题:深度光神经形态网络中的在线无监督Hebbian学习
链接:https://arxiv.org/abs/2601.22300
作者:Xi Li,Disha Biswas,Peng Zhou,Wesley H. Brigner,Anna Capuano,Joseph S. Friedman,Qing Gu
备注:15 pages, 4 figures
摘要:虽然神经网络的软件实现已经推动了计算的重大进步,但冯·诺依曼架构对速度和能效造成了根本性的限制。神经形态网络的结构受到大脑结构的启发,提供了一个令人信服的解决方案,有可能接近神经生物系统的极端能源效率。光子神经形态网络(PNN)特别有吸引力,因为它们利用了光的固有优势,即高并行性,低延迟和卓越的能源效率。以前的PNN演示主要集中在设备级功能或系统级实现上,依赖于监督学习和低效的光电光(OEO)转换。在这里,我们介绍了一种纯光子深度PNN架构,可以实现在线无监督学习。我们提出了一个本地的反馈机制,完全在光域中,实现了赫布学习规则,使用非易失性相变材料突触。我们使用商用光纤平台在一个非平凡的字母识别任务上实验性地演示了这种方法,并实现了100%的识别率,展示了一种用于高效实时信息处理的全光学解决方案。这项工作释放了光子计算在复杂人工智能应用中的潜力,实现了对光学信息的直接、高吞吐量处理,而无需中间的OEO信号转换。
摘要:While software implementations of neural networks have driven significant advances in computation, the von Neumann architecture imposes fundamental limitations on speed and energy efficiency. Neuromorphic networks, with structures inspired by the brain's architecture, offer a compelling solution with the potential to approach the extreme energy efficiency of neurobiological systems. Photonic neuromorphic networks (PNNs) are particularly attractive because they leverage the inherent advantages of light, namely high parallelism, low latency, and exceptional energy efficiency. Previous PNN demonstrations have largely focused on device-level functionalities or system-level implementations reliant on supervised learning and inefficient optical-electrical-optical (OEO) conversions. Here, we introduce a purely photonic deep PNN architecture that enables online, unsupervised learning. We propose a local feedback mechanism operating entirely in the optical domain that implements a Hebbian learning rule using non-volatile phase-change material synapses. We experimentally demonstrate this approach on a non-trivial letter recognition task using a commercially available fiber-optic platform and achieve a 100 percent recognition rate, showcasing an all-optical solution for efficient, real-time information processing. This work unlocks the potential of photonic computing for complex artificial intelligence applications by enabling direct, high-throughput processing of optical information without intermediate OEO signal conversions.
迁移|Zero/Few/One-Shot|自适应(8篇)
【1】Resilient Load Forecasting under Climate Change: Adaptive Conditional Neural Processes for Few-Shot Extreme Load Forecasting
标题:气候变化下的弹性负荷预测:用于少次极端负荷预测的自适应条件神经过程
链接:https://arxiv.org/abs/2602.04609
作者:Chenxi Hu,Yue Ma,Yifan Wu,Yunhe Hou
摘要:极端天气会大幅改变电力消耗行为,导致负荷曲线出现尖峰和明显的波动。如果在这些时期预测不准确,电力系统更有可能面临供应短缺或局部过载,迫使采取紧急行动,如减载,并增加服务中断和公共安全影响的风险。这个问题本质上是困难的,因为极端事件可以触发负载模式的突然状态转变,而相关的极端样本是罕见的和不规则的,使得可靠的学习和校准具有挑战性。提出了一种数据稀缺条件下的概率预测模型AdaCNP。AdaCNP在共享嵌入空间中学习相似性。对于每个目标数据,它评估每个历史上下文段与当前条件的相关程度,并相应地重新加权上下文信息。这种设计突出了最翔实的历史证据,即使极端的样本是罕见的。它使Few-Shot适应以前看不见的极端模式。AdaCNP还为风险意识决策生成预测分布,而无需对目标域进行昂贵的微调。我们评估AdaCNP对现实世界的电力系统负荷数据,并将其与一系列有代表性的基线进行比较。结果表明,AdaCNP在极端时期更加稳健,相对于最强基线,均方误差降低了22%,同时实现了最低的负对数似然,表明更可靠的概率输出。这些研究结果表明,AdaCNP可以有效地减轻突然的分布变化和稀缺的极端样本的综合影响,提供了一个更值得信赖的预测弹性电力系统运行极端事件。
摘要
:Extreme weather can substantially change electricity consumption behavior, causing load curves to exhibit sharp spikes and pronounced volatility. If forecasts are inaccurate during those periods, power systems are more likely to face supply shortfalls or localized overloads, forcing emergency actions such as load shedding and increasing the risk of service disruptions and public-safety impacts. This problem is inherently difficult because extreme events can trigger abrupt regime shifts in load patterns, while relevant extreme samples are rare and irregular, making reliable learning and calibration challenging. We propose AdaCNP, a probabilistic forecasting model for data-scarce condition. AdaCNP learns similarity in a shared embedding space. For each target data, it evaluates how relevant each historical context segment is to the current condition and reweights the context information accordingly. This design highlights the most informative historical evidence even when extreme samples are rare. It enables few-shot adaptation to previously unseen extreme patterns. AdaCNP also produces predictive distributions for risk-aware decision-making without expensive fine-tuning on the target domain. We evaluate AdaCNP on real-world power-system load data and compare it against a range of representative baselines. The results show that AdaCNP is more robust during extreme periods, reducing the mean squared error by 22\% relative to the strongest baseline while achieving the lowest negative log-likelihood, indicating more reliable probabilistic outputs. These findings suggest that AdaCNP can effectively mitigate the combined impact of abrupt distribution shifts and scarce extreme samples, providing a more trustworthy forecasting for resilient power system operation under extreme events.
【2】Forget to Generalize: Iterative Adaptation for Generalization in Federated Learning
标题:忘记泛化:联邦学习中泛化的迭代适应
链接:https://arxiv.org/abs/2602.04536
作者:Abdulrahman Alotaibi,Irene Tenison,Miriam Kim,Isaac Lee,Lalana Kagal
摘要:Web自然是异构的,用户设备,地理区域,浏览模式和上下文都导致高度多样化,独特的数据集。联合学习(FL)是Web的一个重要范例,因为它可以在不同的用户设备、Web服务和客户端之间实现隐私保护和协作机器学习,而无需集中敏感数据。然而,在现实世界的Web系统中普遍存在的非IID客户端分布下,其性能严重下降。在这项工作中,我们提出了一个新的训练范式-迭代联邦适应(IFA)-通过代明智的忘记和进化策略,提高泛化在异构联邦设置。具体来说,我们将训练分为多代,在每代结束时,选择一部分模型参数(a)随机或(b)从模型的后面几层中重新初始化它们。这种迭代的遗忘和进化时间表允许模型逃避局部最小值并保留全局相关的表示。在CIFAR-10、MIT-Indoors和Stanford Dogs数据集上的大量实验表明,该方法提高了全局精度,特别是当跨客户端的数据是非IID时。该方法可以在任何联邦算法上实现,以提高其泛化性能。我们观察到数据集平均提高了21.5%。这项工作推进了现实世界的异构和分布式Web系统的可扩展性,隐私保护智能的愿景。
摘要:The Web is naturally heterogeneous with user devices, geographic regions, browsing patterns, and contexts all leading to highly diverse, unique datasets. Federated Learning (FL) is an important paradigm for the Web because it enables privacy-preserving, collaborative machine learning across diverse user devices, web services and clients without needing to centralize sensitive data. However, its performance degrades severely under non-IID client distributions that is prevalent in real-world web systems. In this work, we propose a new training paradigm - Iterative Federated Adaptation (IFA) - that enhances generalization in heterogeneous federated settings through generation-wise forget and evolve strategy. Specifically, we divide training into multiple generations and, at the end of each, select a fraction of model parameters (a) randomly or (b) from the later layers of the model and reinitialize them. This iterative forget and evolve schedule allows the model to escape local minima and preserve globally relevant representations. Extensive experiments on CIFAR-10, MIT-Indoors, and Stanford Dogs datasets show that the proposed approach improves global accuracy, especially when the data cross clients are Non-IID. This method can be implemented on top any federated algorithm to improve its generalization performance. We observe an average of 21.5%improvement across datasets. This work advances the vision of scalable, privacy-preserving intelligence for real-world heterogeneous and distributed web systems.
【3】AGMA: Adaptive Gaussian Mixture Anchors for Prior-Guided Multimodal Human Trajectory Forecasting
标题:AGMA:用于先验引导多模式人类轨迹预测的自适应高斯混合先验信息
链接:https://arxiv.org/abs/2602.04204
作者:Chao Li,Rui Zhang,Siyuan Huang,Xian Zhong,Hongbo Jiang
备注:14 pages, 3 figures
摘要:人类轨迹预测需要捕捉行人行为的多模态性质。然而,现有的方法遭受先前的不对准。他们学习或固定的先验往往无法捕捉合理未来的完整分布,限制了预测的准确性和多样性。我们从理论上建立了预测误差是下界的先验质量,使先验建模的一个关键性能瓶颈。在此基础上,我们提出了自适应高斯混合模型(AGMA),它通过两个阶段构建表达性先验:从训练数据中提取不同的行为模式,并将其提取为场景自适应的全局先验进行推理。在ETH-UCY、Stanford Drone和JRDB数据集上进行的大量实验表明,AGMA实现了最先进的性能,证实了高质量先验在轨迹预测中的关键作用。
摘要:Human trajectory forecasting requires capturing the multimodal nature of pedestrian behavior. However, existing approaches suffer from prior misalignment. Their learned or fixed priors often fail to capture the full distribution of plausible futures, limiting both prediction accuracy and diversity. We theoretically establish that prediction error is lower-bounded by prior quality, making prior modeling a key performance bottleneck. Guided by this insight, we propose AGMA (Adaptive Gaussian Mixture Anchors), which constructs expressive priors through two stages: extracting diverse behavioral patterns from training data and distilling them into a scene-adaptive global prior for inference. Extensive experiments on ETH-UCY, Stanford Drone, and JRDB datasets demonstrate that AGMA achieves state-of-the-art performance, confirming the critical role of high-quality priors in trajectory forecasting.
【4】Piece of CAKE: Adaptive Execution Engines via Microsecond-Scale Learning
标题:蛋糕:通过微秒规模学习的自适应执行引擎
链接:https://arxiv.org/abs/2602.04181
作者:Zijie Zhao,Ryan Marcus
摘要:低级数据库操作符通常允许多个物理实现(“内核”),这些实现在语义上是等价的,但根据输入数据的分布,它们具有截然不同的性能特征。现有的数据库系统通常依赖于静态算法或最坏情况下的最佳默认值来选择这些内核,通常会错过重要的性能机会。在这项工作中,我们提出了CAKE(反事实自适应内核执行),一个系统,学习选择最佳的内核为每个数据“小块”使用微秒级的上下文多臂强盗。CAKE通过利用反事实的廉价性(选择性地运行多个内核以获得完整的反馈)并将策略编译成低延迟的后悔树来规避传统强化学习的高延迟。在实验中,我们表明,CAKE可以减少端到端的工作负载延迟高达2倍的最先进的静态队列相比。
摘要:Low-level database operators often admit multiple physical implementations ("kernels") that are semantically equivalent but have vastly different performance characteristics depending on the input data distribution. Existing database systems typically rely on static heuristics or worst-case optimal defaults to select these kernels, often missing significant performance opportunities. In this work, we propose CAKE (Counterfactual Adaptive Kernel Execution), a system that learns to select the optimal kernel for each data "morsel" using a microsecond-scale contextual multi-armed bandit. CAKE circumvents the high latency of traditional reinforcement learning by exploiting the cheapness of counterfactuals -- selectively running multiple kernels to obtain full feedback -- and compiling policies into low-latency regret trees. Experimentally, we show that CAKE can reduce end-to-end workload latency by up to 2x compared to state-of-the-art static heuristics.
【5】DADP: Domain Adaptive Diffusion Policy
标题:域自适应扩散政策
链接:https://arxiv.org/abs/2602.04037
作者:Pengcheng Wang,Qinghang Liu,Haotian Lin,Yiheng Li,Guojian Zhan,Masayoshi Tomizuka,Yixiao Wang
摘要
:学习域自适应策略,可以推广到看不见的过渡动态,仍然是一个基本的挑战,基于学习的控制。通过领域表示学习来捕获特定领域的信息,从而实现领域感知决策,已经取得了实质性的进展。我们分析了通过动态预测学习域表示的过程,发现选择与当前步骤相邻的上下文会导致学习到的表示将静态域信息与不同的动态属性纠缠在一起。这种混合可能会混淆有条件的策略,从而限制zero-shot自适应。为了应对这一挑战,我们提出了DADP(域自适应扩散策略),它通过无监督的解纠缠和域感知扩散注入来实现鲁棒的自适应。首先,我们引入了滞后上下文动态预测,这是一种基于历史偏移上下文的未来状态估计策略;通过增加这个时间间隔,我们通过过滤掉瞬态属性来无监督地解开静态域表示。其次,我们通过偏置先验分布和重新制定扩散目标,将学习到的域表示直接集成到生成过程中。在运动和操纵的挑战性基准上进行的大量实验证明了DADP优于先前方法的优越性能和可推广性。更多可视化结果可在https://outsider86.github.io/DomainAdaptiveDiffusionPolicy/上获得。
摘要:Learning domain adaptive policies that can generalize to unseen transition dynamics, remains a fundamental challenge in learning-based control. Substantial progress has been made through domain representation learning to capture domain-specific information, thus enabling domain-aware decision making. We analyze the process of learning domain representations through dynamical prediction and find that selecting contexts adjacent to the current step causes the learned representations to entangle static domain information with varying dynamical properties. Such mixture can confuse the conditioned policy, thereby constraining zero-shot adaptation. To tackle the challenge, we propose DADP (Domain Adaptive Diffusion Policy), which achieves robust adaptation through unsupervised disentanglement and domain-aware diffusion injection. First, we introduce Lagged Context Dynamical Prediction, a strategy that conditions future state estimation on a historical offset context; by increasing this temporal gap, we unsupervisedly disentangle static domain representations by filtering out transient properties. Second, we integrate the learned domain representations directly into the generative process by biasing the prior distribution and reformulating the diffusion target. Extensive experiments on challenging benchmarks across locomotion and manipulation demonstrate the superior performance, and the generalizability of DADP over prior methods. More visualization results are available on the https://outsider86.github.io/DomainAdaptiveDiffusionPolicy/.
【6】WIND: Weather Inverse Diffusion for Zero-Shot Atmospheric Modeling
标题:WIND:零发射大气建模的天气逆扩散
链接:https://arxiv.org/abs/2602.03924
作者:Michael Aich,Andreas Fürst,Florian Sestak,Carlos Ruiz-Gonzalez,Niklas Boers,Johannes Brandstetter
摘要:深度学习已经彻底改变了天气和气候建模,但目前的情况仍然是支离破碎的:高度专业化的模型通常针对不同的任务进行单独训练。为了统一这一格局,我们引入了WIND,这是一个经过预先训练的基础模型,能够在大量任务中取代专门的基线。至关重要的是,与以前的大气基础模型相比,我们在没有任何特定任务微调的情况下实现了这一目标。为了学习一个强大的,任务不可知的大气先验,我们用一个自监督的视频重建目标来预训练WIND,利用一个无条件的视频扩散模型来迭代地从噪声状态重建大气动力学。在推理中,我们严格地将不同领域的特定问题框定为逆问题,并通过后验抽样来解决它们。这种统一的方法使我们能够解决高度相关的天气和气候问题,包括概率预测,空间和时间降尺度,稀疏重建和纯粹使用我们预先训练的模型执行守恒定律。我们进一步证明了该模型的能力,以产生物理上一致的反事实的故事情节的极端天气事件在全球变暖的情况下。通过将生成视频建模与逆问题求解相结合,WIND在基于AI的大气建模中提供了计算效率高的范式转变。
摘要:Deep learning has revolutionized weather and climate modeling, yet the current landscape remains fragmented: highly specialized models are typically trained individually for distinct tasks. To unify this landscape, we introduce WIND, a single pre-trained foundation model capable of replacing specialized baselines across a vast array of tasks. Crucially, in contrast to previous atmospheric foundation models, we achieve this without any task-specific fine-tuning. To learn a robust, task-agnostic prior of the atmosphere, we pre-train WIND with a self-supervised video reconstruction objective, utilizing an unconditional video diffusion model to iteratively reconstruct atmospheric dynamics from a noisy state. At inference, we frame diverse domain-specific problems strictly as inverse problems and solve them via posterior sampling. This unified approach allows us to tackle highly relevant weather and climate problems, including probabilistic forecasting, spatial and temporal downscaling, sparse reconstruction and enforcing conservation laws purely with our pre-trained model. We further demonstrate the model's capacity to generate physically consistent counterfactual storylines of extreme weather events under global warming scenarios. By combining generative video modeling with inverse problem solving, WIND offers a computationally efficient paradigm shift in AI-based atmospheric modeling.
【7】Entropy-Aware Structural Alignment for Zero-Shot Handwritten Chinese Character Recognition
标题:Zero-Shot手写体中文识别的信息感知结构对齐
链接:https://arxiv.org/abs/2602.03913
作者:Qiuming Luo,Tao Zeng,Feng Li,Heming Liu,Rui Mao,Chang Kong
备注:37 pages, 8 figures
摘要:Zero-shot手写汉字识别(HCCR)的目标是通过利用基于偏旁的语义组合来识别看不见的字符。然而,现有的方法往往把字符作为平坦的根式序列,忽略了层次拓扑结构和不均匀的信息密度的不同组成部分。为了解决这些限制,我们提出了一个熵感知结构对齐网络,通过信息理论建模来弥合视觉语义差距。首先,我们引入了一个信息熵之前,通过乘法交互动态调制位置嵌入,作为一个显着性检测器,优先考虑无处不在的组件的歧视性根源。其次,我们构建了一个双视图根树提取多粒度的结构特征,这是通过一个自适应Sigmoid为基础的门控网络编码的全局布局和局部空间角色。最后,Top-K语义特征融合机制的设计,以增强解码过程中,利用语义邻居的质心,有效地纠正视觉歧义,通过特征级的共识。广泛的实验表明,我们的方法建立了新的最先进的性能,在具有挑战性的零触发(zero-shot)设置中显着优于现有的基于CLIP的基线。此外,该框架具有出色的数据效率,表现出快速的适应性与最小的支持样本。
摘要:Zero-shot Handwritten Chinese Character Recognition (HCCR) aims to recognize unseen characters by leveraging radical-based semantic compositions. However, existing approaches often treat characters as flat radical sequences, neglecting the hierarchical topology and the uneven information density of different components. To address these limitations, we propose an Entropy-Aware Structural Alignment Network that bridges the visual-semantic gap through information-theoretic modeling. First, we introduce an Information Entropy Prior to dynamically modulate positional embeddings via multiplicative interaction, acting as a saliency detector that prioritizes discriminative roots over ubiquitous components. Second, we construct a Dual-View Radical Tree to extract multi-granularity structural features, which are integrated via an adaptive Sigmoid-based gating network to encode both global layout and local spatial roles. Finally, a Top-K Semantic Feature Fusion mechanism is devised to augment the decoding process by utilizing the centroid of semantic neighbors, effectively rectifying visual ambiguities through feature-level consensus. Extensive experiments demonstrate that our method establishes new state-of-the-art performance, significantly outperforming existing CLIP-based baselines in the challenging zero-shot setting. Furthermore, the framework exhibits exceptional data efficiency, demonstrating rapid adaptability with minimal support samples.
【8】Universal Robust Speech Adaptation for Cross-Domain Speech Recognition and Enhancement
标题:用于跨域语音识别和增强的通用鲁棒语音自适应
链接:https://arxiv.org/abs/2602.04307
作者:Chien-Chun Wang,Hung-Shin Lee,Hsin-Min Wang,Berlin Chen
备注:Accepted to IEEE Transactions on Audio, Speech and Language Processing (IEEE TASLP)
摘要:用于自动语音识别(ASR)和语音增强(SE)的预训练模型在匹配的噪声和信道条件下表现出了显着的能力。然而,这些模型往往遭受严重的性能下降时,面对域偏移,特别是在存在看不见的噪声和信道失真。有鉴于此,我们在本文中提出了URSA-GAN,这是一个统一的、领域感知的生成框架,专门用于减轻噪声和信道条件下的失配。URSA-GAN利用双嵌入架构,该架构由噪声编码器和信道编码器组成,每个编码器都使用有限的域内数据进行预训练,以捕获域相关的表示。这些嵌入条件基于GAN的语音生成器,促进语音的合成,该语音在声学上与目标域对齐,同时保留语音内容。为了进一步提高泛化能力,我们提出了动态随机扰动,这是一种新的正则化技术,它在生成过程中将受控的可变性引入到嵌入中,从而提高了对未知域的鲁棒性。实验结果表明,URSA-GAN有效地降低了字符错误率在ASR和提高感知指标在SE在不同的噪声和不匹配的信道场景。值得注意的是,在信道和噪声退化的复合测试条件下的评估证实了URSA-GAN的泛化能力,ASR性能相对提高了16.16%,SE指标相对提高了15.58%。
摘要
:Pre-trained models for automatic speech recognition (ASR) and speech enhancement (SE) have exhibited remarkable capabilities under matched noise and channel conditions. However, these models often suffer from severe performance degradation when confronted with domain shifts, particularly in the presence of unseen noise and channel distortions. In view of this, we in this paper present URSA-GAN, a unified and domain-aware generative framework specifically designed to mitigate mismatches in both noise and channel conditions. URSA-GAN leverages a dual-embedding architecture that consists of a noise encoder and a channel encoder, each pre-trained with limited in-domain data to capture domain-relevant representations. These embeddings condition a GAN-based speech generator, facilitating the synthesis of speech that is acoustically aligned with the target domain while preserving phonetic content. To enhance generalization further, we propose dynamic stochastic perturbation, a novel regularization technique that introduces controlled variability into the embeddings during generation, promoting robustness to unseen domains. Empirical results demonstrate that URSA-GAN effectively reduces character error rates in ASR and improves perceptual metrics in SE across diverse noisy and mismatched channel scenarios. Notably, evaluations on compound test conditions with both channel and noise degradations confirm the generalization ability of URSA-GAN, yielding relative improvements of 16.16% in ASR performance and 15.58% in SE metrics.
强化学习(11篇)
【1】CRoSS: A Continual Robotic Simulation Suite for Scalable Reinforcement Learning with High Task Diversity and Realistic Physics Simulation
标题:CRoSS:一个连续机器人模拟套件,用于可扩展强化学习,具有高任务多样性和真实物理模拟
链接:https://arxiv.org/abs/2602.04868
作者:Yannick Denker,Alexander Gepperth
摘要:持续强化学习(CRL)要求智能体从一系列任务中学习,而不会忘记先前获得的策略。在这项工作中,我们引入了一个新的基准测试套件CRL的基础上逼真地模拟机器人在凉亭模拟器。我们的连续机器人仿真套件(CRoSS)基准测试依赖于两个机器人平台:一个带有激光雷达、摄像头和保险杠传感器的两轮差动驱动机器人,以及一个带有七个关节的机器人手臂。前者代表代理行以下和对象推的情况下,视觉和结构参数的变化产生了大量的不同的任务,而后者是用于两个目标达成的情况下,高层次的carpet手的位置控制(建模后的连续世界基准),和低层次的控制的基础上联合角度。对于机器人手臂基准测试,我们提供了额外的仅运动学变体,这些变体绕过了物理模拟的需要(只要不需要传感器读数),并且可以更快地运行两个数量级。CRoSS被设计为易于扩展,并允许在具有高度物理现实主义的机器人环境中对持续强化学习进行受控研究,特别是允许使用几乎任意的模拟传感器。为了确保可重复性和易用性,我们提供了一个开箱即用的容器化设置(Apptainer),并报告标准RL算法的性能,包括Deep Q-Networks(DQN)和策略梯度方法。这突出了作为CRL研究的可扩展和可再现基准的适用性。
摘要:Continual reinforcement learning (CRL) requires agents to learn from a sequence of tasks without forgetting previously acquired policies. In this work, we introduce a novel benchmark suite for CRL based on realistically simulated robots in the Gazebo simulator. Our Continual Robotic Simulation Suite (CRoSS) benchmarks rely on two robotic platforms: a two-wheeled differential-drive robot with lidar, camera and bumper sensor, and a robotic arm with seven joints. The former represent an agent in line-following and object-pushing scenarios, where variation of visual and structural parameters yields a large number of distinct tasks, whereas the latter is used in two goal-reaching scenarios with high-level cartesian hand position control (modeled after the Continual World benchmark), and low-level control based on joint angles. For the robotic arm benchmarks, we provide additional kinematics-only variants that bypass the need for physical simulation (as long as no sensor readings are required), and which can be run two orders of magnitude faster. CRoSS is designed to be easily extensible and enables controlled studies of continual reinforcement learning in robotic settings with high physical realism, and in particular allow the use of almost arbitrary simulated sensors. To ensure reproducibility and ease of use, we provide a containerized setup (Apptainer) that runs out-of-the-box, and report performances of standard RL algorithms, including Deep Q-Networks (DQN) and policy gradient methods. This highlights the suitability as a scalable and reproducible benchmark for CRL research.
【2】Beyond Rewards in Reinforcement Learning for Cyber Defence
标题:网络防御强化学习的超越回报
链接:https://arxiv.org/abs/2602.04809
作者:Elizabeth Bates,Chris Hicks,Vasilios Mavroudis
摘要:近年来,人们对使用深度强化学习来保护计算机网络的自主网络防御代理的兴趣激增。这些代理通常在网络健身房环境中使用密集的,高度工程化的奖励功能进行训练,这些功能结合了许多惩罚和激励措施,用于一系列(不)理想的状态和昂贵的行动。密集的奖励有助于减轻探索复杂环境的挑战,但有可能使代理偏向次优和潜在风险更高的解决方案,这是复杂网络环境中的一个关键问题。我们使用各种稀疏和密集的奖励函数,两个完善的网络健身房,一系列的网络规模,以及政策梯度和基于值的RL算法,彻底评估奖励函数结构对学习和政策行为特征的影响。我们的评估是通过一种新的地面实况评估方法,它允许直接比较不同的奖励函数,照亮奖励,行动空间和网络环境中次优政策的风险之间的微妙的相互关系。我们的研究结果表明,稀疏的奖励,只要它们是目标一致的,可以经常遇到,独特地提供增强的训练可靠性和更有效的网络防御代理与低风险的政策。令人惊讶的是,稀疏奖励也可以产生更好地与网络防御者目标保持一致的策略,并在没有明确的基于奖励的数字惩罚的情况下节省使用昂贵的防御行动。
摘要:Recent years have seen an explosion of interest in autonomous cyber defence agents trained to defend computer networks using deep reinforcement learning. These agents are typically trained in cyber gym environments using dense, highly engineered reward functions which combine many penalties and incentives for a range of (un)desirable states and costly actions. Dense rewards help alleviate the challenge of exploring complex environments but risk biasing agents towards suboptimal and potentially riskier solutions, a critical issue in complex cyber environments. We thoroughly evaluate the impact of reward function structure on learning and policy behavioural characteristics using a variety of sparse and dense reward functions, two well-established cyber gyms, a range of network sizes, and both policy gradient and value-based RL algorithms. Our evaluation is enabled by a novel ground truth evaluation approach which allows directly comparing between different reward functions, illuminating the nuanced inter-relationships between rewards, action space and the risks of suboptimal policies in cyber environments. Our results show that sparse rewards, provided they are goal aligned and can be encountered frequently, uniquely offer both enhanced training reliability and more effective cyber defence agents with lower-risk policies. Surprisingly, sparse rewards can also yield policies that are better aligned with cyber defender goals and make sparing use of costly defensive actions without explicit reward-based numerical penalties.
【3】Rationality Measurement and Theory for Reinforcement Learning Agents
标题:强化学习代理的合理性测量和理论
链接:https://arxiv.org/abs/2602.04737
作者:Kejiang Qian,Amos Storkey,Fengxiang He
摘要:本文提出了一套合理性措施和相关理论的强化学习代理,一个属性越来越重要,但很少探索。我们定义部署中的动作是完全理性的,如果它在最陡的方向上最大化隐藏的真值函数。一个政策的行动对他们的理性同行,最终在部署的轨迹的期望值的差异,被定义为预期的理性风险,在培训的经验平均版本也被定义。它们的差异,称为合理风险差距,被分解为(1)由训练和部署之间的环境变化引起的外在成分,以及(2)由于算法在动态环境中的泛化性而引起的内在成分。它们的上限分别为:(1)训练和部署中过渡内核与初始状态分布之间的1 $-Wasserstein距离,以及(2)值函数类的经验Rademacher复杂度。我们的理论提出了关于调节器(包括层归一化,$\ell_2 $正则化和权重归一化)和域随机化的好处的假设,以及环境变化的危害。实验与这些假设完全一致。该代码可在https://github.com/EVIEHub/Rationality上获得。
摘要
:This paper proposes a suite of rationality measures and associated theory for reinforcement learning agents, a property increasingly critical yet rarely explored. We define an action in deployment to be perfectly rational if it maximises the hidden true value function in the steepest direction. The expected value discrepancy of a policy's actions against their rational counterparts, culminating over the trajectory in deployment, is defined to be expected rational risk; an empirical average version in training is also defined. Their difference, termed as rational risk gap, is decomposed into (1) an extrinsic component caused by environment shifts between training and deployment, and (2) an intrinsic one due to the algorithm's generalisability in a dynamic environment. They are upper bounded by, respectively, (1) the $1$-Wasserstein distance between transition kernels and initial state distributions in training and deployment, and (2) the empirical Rademacher complexity of the value function class. Our theory suggests hypotheses on the benefits from regularisers (including layer normalisation, $\ell_2$ regularisation, and weight normalisation) and domain randomisation, as well as the harm from environment shifts. Experiments are in full agreement with these hypotheses. The code is available at https://github.com/EVIEHub/Rationality.
【4】Rethinking the Design Space of Reinforcement Learning for Diffusion Models: On the Importance of Likelihood Estimation Beyond Loss Design
标题:重新思考扩散模型强化学习的设计空间:论损失设计之外的可能性估计的重要性
链接:https://arxiv.org/abs/2602.04663
作者:Jaemoo Choi,Yuchen Zhu,Wei Guo,Petr Molodyk,Bo Yuan,Jinbin Bai,Yi Xin,Molei Tao,Yongxin Chen
备注:23 pages, 11 figures
摘要:强化学习已被广泛应用于视觉任务的扩散和流动模型,如文本到图像生成。然而,这些任务仍然具有挑战性,因为扩散模型具有难以处理的似然性,这为直接应用流行的策略梯度类型方法造成了障碍。现有的方法主要集中在制作新的目标,建立在已经大量工程LLM目标,使用特设估计的可能性,没有彻底调查这种估计如何影响整体算法性能。在这项工作中,我们提供了一个系统的分析RL设计空间,解开三个因素:i)政策梯度目标,ii)似然估计,iii)推出抽样方案。我们表明,采用基于证据下限(ELBO)的模型似然估计,仅从最终生成的样本计算,是实现有效,高效和稳定的RL优化的主导因素,超过了特定的政策梯度损失功能的影响。我们使用SD 3.5 Medium在多个奖励基准中验证了我们的发现,并在所有任务中观察到一致的趋势。我们的方法在90 GPU小时内将GenEval分数从0.24提高到0.95,比FlowGRPO效率高4.6\times $,比SOTA方法DiffusionNFT效率高2\times $,没有奖励黑客。
摘要:Reinforcement learning has been widely applied to diffusion and flow models for visual tasks such as text-to-image generation. However, these tasks remain challenging because diffusion models have intractable likelihoods, which creates a barrier for directly applying popular policy-gradient type methods. Existing approaches primarily focus on crafting new objectives built on already heavily engineered LLM objectives, using ad hoc estimators for likelihood, without a thorough investigation into how such estimation affects overall algorithmic performance. In this work, we provide a systematic analysis of the RL design space by disentangling three factors: i) policy-gradient objectives, ii) likelihood estimators, and iii) rollout sampling schemes. We show that adopting an evidence lower bound (ELBO) based model likelihood estimator, computed only from the final generated sample, is the dominant factor enabling effective, efficient, and stable RL optimization, outweighing the impact of the specific policy-gradient loss functional. We validate our findings across multiple reward benchmarks using SD 3.5 Medium, and observe consistent trends across all tasks. Our method improves the GenEval score from 0.24 to 0.95 in 90 GPU hours, which is $4.6\times$ more efficient than FlowGRPO and $2\times$ more efficient than the SOTA method DiffusionNFT without reward hacking.
【5】WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning
标题:WideSeek-R1:探索通过多智能体强化学习进行广泛信息搜索的宽度缩放
链接:https://arxiv.org/abs/2602.04634
作者:Zelai Xu,Zhexuan Xu,Ruize Zhang,Chunyang Zhu,Shi Yu,Weilin Liu,Quanlu Zhang,Wenbo Ding,Chao Yu,Yu Wang
摘要:大型语言模型(LLM)的最新进展主要集中在深度缩放上,其中单个智能体通过多轮推理和工具使用解决长期问题。然而,随着任务的扩大,关键的瓶颈从个人能力转移到组织能力。在这项工作中,我们探索了一个互补的尺寸的宽度缩放与多智能体系统,以解决广泛的信息寻求。现有的多代理系统通常依赖于手工制作的工作流程和轮流交互,无法有效地并行化工作。为了弥合这一差距,我们提出了WideSeek-R1,这是一个通过多代理强化学习(MARL)训练的领导代理子代理框架,可以协同可扩展的编排和并行执行。通过利用具有隔离上下文和专用工具的共享LLM,WideSeek-R1在20 k广泛信息搜索任务的策划数据集上联合优化了主代理和并行子代理。广泛的实验表明,WideSeek-R1- 4 B在WideSearch基准测试中的项目F1得分为40.0%,与单智能体DeepSeek-R1- 671 B的性能相当。此外,随着并行子代理数量的增加,WideSeek-R1- 4 B表现出一致的性能增益,突出了宽度缩放的有效性。
摘要:Recent advancements in Large Language Models (LLMs) have largely focused on depth scaling, where a single agent solves long-horizon problems with multi-turn reasoning and tool use. However, as tasks grow broader, the key bottleneck shifts from individual competence to organizational capability. In this work, we explore a complementary dimension of width scaling with multi-agent systems to address broad information seeking. Existing multi-agent systems often rely on hand-crafted workflows and turn-taking interactions that fail to parallelize work effectively. To bridge this gap, we propose WideSeek-R1, a lead-agent-subagent framework trained via multi-agent reinforcement learning (MARL) to synergize scalable orchestration and parallel execution. By utilizing a shared LLM with isolated contexts and specialized tools, WideSeek-R1 jointly optimizes the lead agent and parallel subagents on a curated dataset of 20k broad information-seeking tasks. Extensive experiments show that WideSeek-R1-4B achieves an item F1 score of 40.0% on the WideSearch benchmark, which is comparable to the performance of single-agent DeepSeek-R1-671B. Furthermore, WideSeek-R1-4B exhibits consistent performance gains as the number of parallel subagents increases, highlighting the effectiveness of width scaling.
【6】Stochastic Decision Horizons for Constrained Reinforcement Learning
标题:约束强化学习的随机决策视野
链接:https://arxiv.org/abs/2602.04599
作者:Nikola Milosevic,Leonard Franz,Daniel Haeufle,Georg Martius,Nico Scherf,Pavel Kolev
摘要:约束马尔可夫决策过程(CMDPs)为强化学习中处理约束(如安全和其他辅助目标)提供了一个原则性模型。使用附加成本约束和对偶变量的常见方法通常会阻碍策略外的可伸缩性。我们提出了一种基于随机决策视野的控制推理公式,其中违反约束会削弱奖励贡献,并通过状态动作相关的延续缩短有效的规划视野。这产生了生存加权的目标,保持重放兼容的离线政策演员-评论家学习。我们提出了两个违反语义,吸收和虚拟终止,共享相同的生存加权回报,但导致不同的优化结构,导致SAC/MPO风格的政策改进。实验表明,改进的采样效率和良好的返回违规权衡标准基准。此外,MPO与虚拟终止(VT-MPO)规模有效地为我们的高维肌肉骨骼Hyfydy设置。
摘要:Constrained Markov decision processes (CMDPs) provide a principled model for handling constraints, such as safety and other auxiliary objectives, in reinforcement learning. The common approach of using additive-cost constraints and dual variables often hinders off-policy scalability. We propose a Control as Inference formulation based on stochastic decision horizons, where constraint violations attenuate reward contributions and shorten the effective planning horizon via state-action-dependent continuation. This yields survival-weighted objectives that remain replay-compatible for off-policy actor-critic learning. We propose two violation semantics, absorbing and virtual termination, that share the same survival-weighted return but result in distinct optimization structures that lead to SAC/MPO-style policy improvement. Experiments demonstrate improved sample efficiency and favorable return-violation trade-offs on standard benchmarks. Moreover, MPO with virtual termination (VT-MPO) scales effectively to our high-dimensional musculoskeletal Hyfydy setup.
【7】Learning the Value Systems of Agents with Preference-based and Inverse Reinforcement Learning
标题:利用基于偏好和反向强化学习代理的价值体系
链接:https://arxiv.org/abs/2602.04518
作者:Andrés Holgado-Sánchez,Holger Billhardt,Alberto Fernández,Sascha Ossowski
备注:42 pages, 5 figures. Published in Journal of Autonomous Agents and Multi-Agent Systems
摘要:协议技术指的是开放的计算机系统,在该系统中,自主软件代理相互交互,通常代表人类,以便达成相互可接受的协议。随着近年来人工智能系统的发展,很明显,为了让相关各方接受,这些协议必须与伦理原则和道德价值观保持一致。然而,这是众所周知的难以确保的,特别是因为不同的人类用户(及其软件代理)可能持有不同的价值体系,即他们可能对个人道德价值观的重要性有不同的衡量。此外,通常很难以计算方式指定特定上下文中值的精确含义。基于人类工程规范(例如基于价值调查)来估计价值系统的方法由于需要强烈的人类调节而在规模上受到限制。在这篇文章中,我们提出了一种新的方法来自动学习价值系统的观察和人类的示范。特别是,我们提出了一个正式的模型的\endash {价值系统学习}的问题,其实例化的顺序决策域的基础上,多目标马尔可夫决策过程,以及定制的基于偏好和逆强化学习算法来推断值接地功能和价值系统。通过两个模拟用例说明和评估该方法。
摘要:Agreement Technologies refer to open computer systems in which autonomous software agents interact with one another, typically on behalf of humans, in order to come to mutually acceptable agreements. With the advance of AI systems in recent years, it has become apparent that such agreements, in order to be acceptable to the involved parties, must remain aligned with ethical principles and moral values. However, this is notoriously difficult to ensure, especially as different human users (and their software agents) may hold different value systems, i.e. they may differently weigh the importance of individual moral values. Furthermore, it is often hard to specify the precise meaning of a value in a particular context in a computational manner. Methods to estimate value systems based on human-engineered specifications, e.g. based on value surveys, are limited in scale due to the need for intense human moderation. In this article, we propose a novel method to automatically \emph{learn} value systems from observations and human demonstrations. In particular, we propose a formal model of the \emph{value system learning} problem, its instantiation to sequential decision-making domains based on multi-objective Markov decision processes, as well as tailored preference-based and inverse reinforcement learning algorithms to infer value grounding functions and value systems. The approach is illustrated and evaluated by two simulated use cases.
【8】HoRD: Robust Humanoid Control via History-Conditioned Reinforcement Learning and Online Distillation
标题:HoRD:通过历史条件强化学习和在线蒸馏实现鲁棒的仿人控制
链接:https://arxiv.org/abs/2602.04412
作者:Puyue Wang,Jiawei Hu,Yan Gao,Junyan Wang,Yu Zhang,Gillian Dobbie,Tao Gu,Wafa Johal,Ting Dang,Hong Jia
摘要:在动力学、任务规范或环境设置的微小变化下,人形机器人可能会遭受显著的性能下降。我们提出了HoRD,一个两阶段的学习框架下域转移的鲁棒人形控制。首先,我们通过历史条件强化学习训练一个高性能的教师政策,该政策推断潜在的动态背景下,从最近的状态-行动轨迹,以适应在线不同的随机动态。其次,我们进行在线蒸馏,将教师的鲁棒控制能力转移到基于变换器的学生策略中,该策略在稀疏根相对3D联合关键点轨迹上操作。通过将历史条件适应与在线蒸馏相结合,HoRD使单个策略能够适应zero-shot到看不见的域,而无需对每个域进行重新训练。大量的实验表明,HoRD在鲁棒性和传输方面优于强基线,特别是在未知域和外部扰动下。代码和项目页面位于\href{https://tonywang-0517.github.io/hord/}{https://tonywang-0517.github.io/hord/}。
摘要:Humanoid robots can suffer significant performance drops under small changes in dynamics, task specifications, or environment setup. We propose HoRD, a two-stage learning framework for robust humanoid control under domain shift. First, we train a high-performance teacher policy via history-conditioned reinforcement learning, where the policy infers latent dynamics context from recent state--action trajectories to adapt online to diverse randomized dynamics. Second, we perform online distillation to transfer the teacher's robust control capabilities into a transformer-based student policy that operates on sparse root-relative 3D joint keypoint trajectories. By combining history-conditioned adaptation with online distillation, HoRD enables a single policy to adapt zero-shot to unseen domains without per-domain retraining. Extensive experiments show HoRD outperforms strong baselines in robustness and transfer, especially under unseen domains and external perturbations. Code and project page are available at \href{https://tonywang-0517.github.io/hord/}{https://tonywang-0517.github.io/hord/}.
【9】Decoupling Time and Risk: Risk-Sensitive Reinforcement Learning with General Discounting
标题:时间与风险脱钩:具有一般折扣的风险敏感强化学习
链接:https://arxiv.org/abs/2602.04131
作者:Mehrdad Moghimi,Anthony Coache,Hyejin Ku
摘要:分布式强化学习(RL)是一个强大的框架,越来越多地采用在安全关键领域的能力,以优化风险敏感的目标。然而,折扣因子的作用往往被忽视,因为它通常被视为马尔可夫决策过程或可调超参数的固定参数,很少考虑其对学习策略的影响。在文献中,众所周知,贴现函数在表征代理人的时间偏好方面起着重要作用,而指数贴现因子无法完全捕获。基于这一见解,我们提出了一个新的框架,支持灵活的折扣未来的奖励和优化的风险措施,在分布式RL。我们提供了我们的算法的最优性的技术分析,表明我们的多视野扩展修复了现有方法提出的问题,并通过广泛的实验验证了我们的方法的鲁棒性。我们的结果强调,贴现是决策问题的基石,可以捕捉更具表达性的时间和风险偏好配置文件,对现实世界的安全关键型应用程序具有潜在影响。
摘要:Distributional reinforcement learning (RL) is a powerful framework increasingly adopted in safety-critical domains for its ability to optimize risk-sensitive objectives. However, the role of the discount factor is often overlooked, as it is typically treated as a fixed parameter of the Markov decision process or tunable hyperparameter, with little consideration of its effect on the learned policy. In the literature, it is well-known that the discounting function plays a major role in characterizing time preferences of an agent, which an exponential discount factor cannot fully capture. Building on this insight, we propose a novel framework that supports flexible discounting of future rewards and optimization of risk measures in distributional RL. We provide a technical analysis of the optimality of our algorithms, show that our multi-horizon extension fixes issues raised with existing methodologies, and validate the robustness of our methods through extensive experiments. Our results highlight that discounting is a cornerstone in decision-making problems for capturing more expressive temporal and risk preferences profiles, with potential implications for real-world safety-critical applications.
【10】Autonomous AI Agents for Real-Time Affordable Housing Site Selection: Multi-Objective Reinforcement Learning Under Regulatory Constraints
标题:实时经济适用房选址的自主人工智能代理:监管约束下的多目标强化学习
链接:https://arxiv.org/abs/2602.03940
作者:Olaf Yunus Laitinen Imanov,Duygu Erisken,Derya Umut Kulali,Taner Yilmaz,Rana Irem Turhan
备注:12 pages, 6 figures, 5 tables
摘要:负担得起的住房短缺影响到数十亿人,而土地稀缺和法规使选址缓慢。我们提出了AURA(自治城市资源分配器),一个层次化的多智能体强化学习系统,用于硬监管约束(QCT,DDA,LIHTC)下的实时经济适用房选址。我们将任务建模为一个受约束的多目标马尔可夫决策过程,优化可达性,环境影响,建设成本和社会公平,同时执行可行性。AURA使用一个监管意识的国家编码127个联邦和地方约束,帕累托约束的政策梯度与可行性保证,和奖励分解分离的直接成本从长期的社会成果。在来自8个美国城市(47,392个候选地块)的数据集上,AURA达到了94.3%的法规合规性,并将Pareto超体积提高了37.2%。在纽约市2026年的案例研究中,它将选择时间从18个月缩短到72小时,并确定了23%的可行地点;所选地点的交通便利性比专家选择的地点高31%,环境影响低19%。
摘要
:Affordable housing shortages affect billions, while land scarcity and regulations make site selection slow. We present AURA (Autonomous Urban Resource Allocator), a hierarchical multi-agent reinforcement learning system for real-time affordable housing site selection under hard regulatory constraints (QCT, DDA, LIHTC). We model the task as a constrained multi-objective Markov decision process optimizing accessibility, environmental impact, construction cost, and social equity while enforcing feasibility. AURA uses a regulatory-aware state encoding 127 federal and local constraints, Pareto-constrained policy gradients with feasibility guarantees, and reward decomposition separating immediate costs from long-term social outcomes. On datasets from 8 U.S. metros (47,392 candidate parcels), AURA attains 94.3% regulatory compliance and improves Pareto hypervolume by 37.2% over strong baselines. In a New York City 2026 case study, it reduces selection time from 18 months to 72 hours and identifies 23% more viable sites; chosen sites have 31% better transit access and 19% lower environmental impact than expert picks.
【11】Puzzle it Out: Local-to-Global World Model for Offline Multi-Agent Reinforcement Learning
标题:解谜:离线多智能体强化学习的本地到全球世界模型
链接:https://arxiv.org/abs/2601.07463
作者:Sijia li,Xinran Li,Shibo Chen,Jun Zhang
摘要:离线多智能体强化学习(MARL)旨在使用预先收集的数据集解决多智能体系统中的协作决策问题。现有的离线MARL方法主要将训练限制在数据集分布范围内,导致过于保守的策略难以推广到数据支持之外。虽然基于模型的方法通过使用从学习世界模型生成的合成数据扩展原始数据集提供了一个有前途的解决方案,但多智能体系统的高维性,非平稳性和复杂性使得准确估计离线MARL中的转换和奖励函数具有挑战性。鉴于直接建模联合动力学的困难,我们提出了一个本地到全球(ESTA)的世界模型,一个新的框架,利用本地预测,这是更容易估计推断全球状态动态,从而提高预测精度,同时隐式捕捉代理明智的依赖关系。使用经过训练的世界模型,我们生成合成数据来增强原始数据集,扩展有效的状态-动作空间。为了确保可靠的策略学习,我们进一步引入了一种不确定性感知的采样机制,该机制通过预测不确定性自适应地对合成数据进行加权,从而减少了近似误差传播到策略。与传统的基于集合的方法相比,我们的方法只需要一个额外的编码器来进行不确定性估计,在保持准确性的同时显著降低了计算开销。针对8个基线的8个场景的广泛实验表明,我们的方法超过了标准离线MARL基准测试的最新基线,为可推广的离线多智能体学习建立了一个新的基于模型的基线。
摘要:Offline multi-agent reinforcement learning (MARL) aims to solve cooperative decision-making problems in multi-agent systems using pre-collected datasets. Existing offline MARL methods primarily constrain training within the dataset distribution, resulting in overly conservative policies that struggle to generalize beyond the support of the data. While model-based approaches offer a promising solution by expanding the original dataset with synthetic data generated from a learned world model, the high dimensionality, non-stationarity, and complexity of multi-agent systems make it challenging to accurately estimate the transitions and reward functions in offline MARL. Given the difficulty of directly modeling joint dynamics, we propose a local-to-global (LOGO) world model, a novel framework that leverages local predictions-which are easier to estimate-to infer global state dynamics, thus improving prediction accuracy while implicitly capturing agent-wise dependencies. Using the trained world model, we generate synthetic data to augment the original dataset, expanding the effective state-action space. To ensure reliable policy learning, we further introduce an uncertainty-aware sampling mechanism that adaptively weights synthetic data by prediction uncertainty, reducing approximation error propagation to policies. In contrast to conventional ensemble-based methods, our approach requires only an additional encoder for uncertainty estimation, significantly reducing computational overhead while maintaining accuracy. Extensive experiments across 8 scenarios against 8 baselines demonstrate that our method surpasses state-of-the-art baselines on standard offline MARL benchmarks, establishing a new model-based baseline for generalizable offline multi-agent learning.
医学相关(3篇)
【1】Benchmarking and Enhancing PPG-Based Cuffless Blood Pressure Estimation Methods
标题:基准和增强基于PGP的无袖带血压估计方法
链接:https://arxiv.org/abs/2602.04725
作者:Neville Mathew,Yidan Shen,Renjie Hu,Maham Rahimi,George Zouridakis
摘要:基于容易获得的光电容积描记(PPG)信号的无袖带血压筛查提供了一种可扩展的心血管健康评估的实用途径。尽管进展迅速,但现有的基于PPG的血压估计模型尚未始终达到既定的临床数值限值(如AAMI/ISO 81060-2),并且之前的评价通常缺乏有效临床评估所需的严格实验控制。此外,通常使用的公开数据集是异质的,缺乏生理控制的条件,以公平的基准。为了在生理控制条件下实现公平的基准测试,我们创建了一个标准化的基准测试子集NBPDB,包括来自1,103名健康成人的101,453个高质量PPG片段,这些片段来自MIMIC-III和VitalDB。使用该数据集,我们系统地对几种最先进的基于PPG的模型进行了基准测试。结果表明,所评价的模型均未达到AAMI/ISO 81060-2的精度要求(平均误差
摘要:Cuffless blood pressure screening based on easily acquired photoplethysmography (PPG) signals offers a practical pathway toward scalable cardiovascular health assessment. Despite rapid progress, existing PPG-based blood pressure estimation models have not consistently achieved the established clinical numerical limits such as AAMI/ISO 81060-2, and prior evaluations often lack the rigorous experimental controls necessary for valid clinical assessment. Moreover, the publicly available datasets commonly used are heterogeneous and lack physiologically controlled conditions for fair benchmarking. To enable fair benchmarking under physiologically controlled conditions, we created a standardized benchmarking subset NBPDB comprising 101,453 high-quality PPG segments from 1,103 healthy adults, derived from MIMIC-III and VitalDB. Using this dataset, we systematically benchmarked several state-of-the-art PPG-based models. The results showed that none of the evaluated models met the AAMI/ISO 81060-2 accuracy requirements (mean error $
【2】DiGAN: Diffusion-Guided Attention Network for Early Alzheimer's Disease Detection
标题:DiGAN:用于早期阿尔茨海默病检测的扩散引导注意力网络
链接:https://arxiv.org/abs/2602.03881
作者:Maxx Richard Rahman,Mostafa Hammouda,Wolfgang Maass
摘要:阿尔茨海默病(AD)的早期诊断仍然是一个重大的挑战,由于微妙的和暂时不规则的进展,结构的脑变化在前驱阶段。现有的深度学习方法需要大型纵向数据集,并且通常无法对现实世界临床数据中固有的时间连续性和模态不规则性进行建模。为了解决这些限制,我们提出了扩散引导注意力网络(DiGAN),它将潜在扩散建模与注意力引导卷积网络集成在一起。扩散模型从有限的训练数据中合成逼真的纵向神经成像轨迹,丰富了时间背景并提高了对不均匀间隔访问的鲁棒性。然后,注意力卷积层捕获区分性的结构-时间模式,将认知正常的受试者与轻度认知障碍和主观认知下降的受试者区分开来。在合成数据集和ADNI数据集上的实验表明,DiGAN的性能优于现有的最先进的基线,显示出其在早期AD检测方面的潜力。
摘要:Early diagnosis of Alzheimer's disease (AD) remains a major challenge due to the subtle and temporally irregular progression of structural brain changes in the prodromal stages. Existing deep learning approaches require large longitudinal datasets and often fail to model the temporal continuity and modality irregularities inherent in real-world clinical data. To address these limitations, we propose the Diffusion-Guided Attention Network (DiGAN), which integrates latent diffusion modelling with an attention-guided convolutional network. The diffusion model synthesizes realistic longitudinal neuroimaging trajectories from limited training data, enriching temporal context and improving robustness to unevenly spaced visits. The attention-convolutional layer then captures discriminative structural--temporal patterns that distinguish cognitively normal subjects from those with mild cognitive impairment and subjective cognitive decline. Experiments on synthetic and ADNI datasets demonstrate that DiGAN outperforms existing state-of-the-art baselines, showing its potential for early-stage AD detection.
【3】Machine Learning-Driven Crystal System Prediction for Perovskites Using Augmented X-ray Diffraction Data
标题:使用增强X射线散射数据的机器学习驱动钙钛矿晶体系预测
链接:https://arxiv.org/abs/2602.04435
作者:Ansu Mathew,Ahmer A. B. Baloch,Alamin Yakasai,Hemant Mittal,Vivian Alberts,Jayakumar V. Karunamurthy
备注:37 pages, 7 figures. Author accepted manuscript. Published in Engineering Applications of Artificial Intelligence
摘要:从X射线衍射(XRD)光谱预测晶体系统是材料科学中的一项关键任务,特别是对于以其在光电子学、光电子学和催化中的多种应用而闻名的钙钛矿材料。在这项研究中,我们提出了一个机器学习(ML)驱动的框架,该框架利用先进的模型,包括时间序列森林(TSF),随机森林(RF),极端梯度提升(XGBoost),递归神经网络(RNN),长短期记忆(LSTM),门控递归单元(GRU)和一个简单的前馈神经网络(NN),对晶体系统,点群,和空间群。为了解决类别不平衡和增强模型鲁棒性,我们集成了特征增强策略,如合成少数过采样技术(SMOTE),类别加权,抖动和频谱偏移,以及高效的数据预处理管道。SMOTE增强的TSF模型在晶体系统预测方面表现出色,Matthews相关系数(MCC)为0.9,F1得分为0.92,准确率为97.76%。对于点群和空间群的预测,平衡精度达到95%以上。该模型表现出高性能的独特的类,包括立方晶体系统,点群3 m和m-3 m,空间群Pnma和Pnnn。这项工作突出了ML在基于XRD的结构表征和加速发现钙钛矿材料方面的潜力
摘要:Prediction of crystal system from X-ray diffraction (XRD) spectra is a critical task in materials science, particularly for perovskite materials which are known for their diverse applications in photovoltaics, optoelectronics, and catalysis. In this study, we present a machine learning (ML)-driven framework that leverages advanced models, including Time Series Forest (TSF), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and a simple feedforward neural network (NN), to classify crystal systems, point groups, and space groups from XRD data of perovskite materials. To address class imbalance and enhance model robustness, we integrated feature augmentation strategies such as Synthetic Minority Over-sampling Technique (SMOTE), class weighting, jittering, and spectrum shifting, along with efficient data preprocessing pipelines. The TSF model with SMOTE augmentation achieved strong performance for crystal system prediction, with a Matthews correlation coefficient (MCC) of 0.9, an F1 score of 0.92, and an accuracy of 97.76%. For point and space group prediction, balanced accuracies above 95% were obtained. The model demonstrated high performance for symmetry-distinct classes, including cubic crystal systems, point groups 3m and m-3m, and space groups Pnma and Pnnn. This work highlights the potential of ML for XRD-based structural characterization and accelerated discovery of perovskite materials
蒸馏|知识提取(2篇)
【1】REDistill: Robust Estimator Distillation for Balancing Robustness and Efficiency
标题:REDistill:平衡稳健性和效率的稳健估计蒸馏
链接:https://arxiv.org/abs/2602.04677
作者:Ondrej Tybl,Lukas Neumann
摘要:知识蒸馏(KD)通过调整预测分布将知识从大型教师模型转移到较小的学生。然而,传统的KD公式-通常基于Kullback-Leibler发散-假设教师提供可靠的软目标。在实践中,教师的预测往往是嘈杂的或过于自信,现有的基于校正的方法依赖于ad-hoc算法和广泛的超参数调整,这阻碍了泛化。我们介绍REDistill(稳健估计蒸馏),这是一个基于稳健统计的简单而有原则的框架。REDistill用幂发散损失代替标准KD目标,这是KL发散的一种推广,它自适应地降低不可靠的教师输出的权重,同时保留信息性的logit关系。该公式提供了教师噪声的统一和可解释的处理,仅需要logits,无缝集成到现有的KD管道中,并且产生的计算开销可以忽略不计。在CIFAR-100和ImageNet-1 k上进行的大量实验表明,REDistill在不同的师生架构中始终提高了学生的准确性。值得注意的是,它在没有模型特定的超参数调整的情况下实现了这些增益,强调了它的鲁棒性和对看不见的师生对的强大泛化能力。
摘要:Knowledge Distillation (KD) transfers knowledge from a large teacher model to a smaller student by aligning their predictive distributions. However, conventional KD formulations - typically based on Kullback-Leibler divergence - assume that the teacher provides reliable soft targets. In practice, teacher predictions are often noisy or overconfident, and existing correction-based approaches rely on ad-hoc heuristics and extensive hyper-parameter tuning, which hinders generalization. We introduce REDistill (Robust Estimator Distillation), a simple yet principled framework grounded in robust statistics. REDistill replaces the standard KD objective with a power divergence loss, a generalization of KL divergence that adaptively downweights unreliable teacher output while preserving informative logit relationships. This formulation provides a unified and interpretable treatment of teacher noise, requires only logits, integrates seamlessly into existing KD pipelines, and incurs negligible computational overhead. Extensive experiments on CIFAR-100 and ImageNet-1k demonstrate that REDistill consistently improves student accuracy in diverse teacher-student architectures. Remarkably, it achieves these gains without model-specific hyper-parameter tuning, underscoring its robustness and strong generalization to unseen teacher-student pairs.
【2】Knowledge Distillation for mmWave Beam Prediction Using Sub-6 GHz Channels
标题:使用Sub-6 GHz通道预测毫米波束的知识提炼
链接:https://arxiv.org/abs/2602.04703
作者:Sina Tavakolian,Nhan Thanh Nguyen,Ahmed Alkhateeb,Markku Juntti
备注:5 pages, 4 figures. Accepted for publication at IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2026
摘要:毫米波(mmWave)高移动性环境中的波束成形通常会导致大量的训练开销。虽然先前的研究表明,可以利用低于6 GHz的信道来预测最佳毫米波波束,但现有方法依赖于大型深度学习(DL)模型,具有极高的计算和内存要求。在本文中,我们提出了一个计算效率的框架,6 GHz以下的信道毫米波波束映射的基础上知识蒸馏(KD)技术。我们开发了两个紧凑的学生DL架构的基础上,个人和关系蒸馏策略,只保留了几个隐藏层,但密切模仿大型教师DL模型的性能。大量的模拟表明,所提出的学生模型实现了教师的波束预测精度和频谱效率,同时减少了99%的可训练参数和计算复杂度。
摘要:Beamforming in millimeter-wave (mmWave) high-mobility environments typically incurs substantial training overhead. While prior studies suggest that sub-6 GHz channels can be exploited to predict optimal mmWave beams, existing methods depend on large deep learning (DL) models with prohibitive computational and memory requirements. In this paper, we propose a computationally efficient framework for sub-6 GHz channel-mmWave beam mapping based on the knowledge distillation (KD) technique. We develop two compact student DL architectures based on individual and relational distillation strategies, which retain only a few hidden layers yet closely mimic the performance of large teacher DL models. Extensive simulations demonstrate that the proposed student models achieve the teacher's beam prediction accuracy and spectral efficiency while reducing trainable parameters and computational complexity by 99%.
推荐(1篇)
【1】A Bandit-Based Approach to Educational Recommender Systems: Contextual Thompson Sampling for Learner Skill Gain Optimization
标题:教育推荐系统中基于Bandit的方法:学习者技能增益优化的上下文Thompson采样
链接:https://arxiv.org/abs/2602.04347
作者:Lukas De Kerpel,Arthur Thuy,Dries F. Benoit
备注:Accepted for publication in INFORMS Transactions on Education
摘要:近年来,运筹学(OR)、管理科学(MS)和分析学的教学实践越来越多地转向数字环境,在数字环境中,庞大而多样化的学习者群体使得难以提供适应个人需求的实践。本文介绍了一种方法,通过选择,在每一步,练习最有可能推进学习者的目标技能的理解,生成个性化的练习序列。该方法使用有关学习者及其过去表现的信息来指导这些选择,学习进度以每次练习前后估计技能水平的变化来衡量。使用来自在线数学辅导平台的数据,我们发现该方法推荐与更大的技能提高相关的练习,并有效地适应学习者之间的差异。从教学的角度来看,该框架可以实现大规模的个性化实践,突出具有持续强大学习价值的练习,并帮助教师识别可能受益于额外支持的学习者。
摘要
:In recent years, instructional practices in Operations Research (OR), Management Science (MS), and Analytics have increasingly shifted toward digital environments, where large and diverse groups of learners make it difficult to provide practice that adapts to individual needs. This paper introduces a method that generates personalized sequences of exercises by selecting, at each step, the exercise most likely to advance a learner's understanding of a targeted skill. The method uses information about the learner and their past performance to guide these choices, and learning progress is measured as the change in estimated skill level before and after each exercise. Using data from an online mathematics tutoring platform, we find that the approach recommends exercises associated with greater skill improvement and adapts effectively to differences across learners. From an instructional perspective, the framework enables personalized practice at scale, highlights exercises with consistently strong learning value, and helps instructors identify learners who may benefit from additional support.
聚类(2篇)
【1】Legendre Memory Unit with A Multi-Slice Compensation Model for Short-Term Wind Speed Forecasting Based on Wind Farm Cluster Data
标题:基于风电场集群数据的短期风速预测具有多片补偿模型的Legendre存储单元
链接:https://arxiv.org/abs/2602.04782
作者:Mumin Zhang,Haochen Zhang,Xin Zhi Khoo,Yilin Zhang,Nuo Chen,Ting Zhang,Junjie Tang
备注:10 pages, 11 figures,
摘要:随着越来越多的风电场集群并网,风电场集群的短期风速预测对电力系统的正常运行至关重要。本文研究如何充分利用具有时空相关性的聚类数据,实现准确、快速、鲁棒的风速预测。首先,加权均值滤波(WMF)应用于去噪风速数据在单农场的水平。创新性地将勒让德记忆单元(Legendre memory unit,LMU)应用于风速预测,并结合基于风电场聚类数据Kendall秩相关系数(Kendall rank correlation coefficient,CPK)的补偿参数,构建多切片LMU(multi-slice LMU,MSLMU)。最后,本文提出了一种新的集成模型WMF-CPK-MSLMU,包括三个关键模块:数据预处理、预测和多切片补偿。优点包括:1)LMU联合建模风电场之间的线性和非线性依赖关系,以通过反向传播捕获时空相关性; 2)MSLMU通过使用CPK导出的权重而不是随机初始化来增强预测,允许空间相关性完全激活集群风电场之间的隐藏节点。3)CPK自适应地对MSLMU中的补偿模型进行加权,并在空间上对缺失数据进行补充,使整个模型具有较高的精度和鲁棒性。对不同风电场群的测试结果表明,与现有模型相比,所提出的集合模型WMF-CPK-MSLMU在风电场群短期预测中的有效性和优越性。
摘要:With more wind farms clustered for integration, the short-term wind speed prediction of such wind farm clusters is critical for normal operation of power systems. This paper focuses on achieving accurate, fast, and robust wind speed prediction by full use of cluster data with spatial-temporal correlation. First, weighted mean filtering (WMF) is applied to denoise wind speed data at the single-farm level. The Legendre memory unit (LMU) is then innovatively applied for the wind speed prediction, in combination with the Compensating Parameter based on Kendall rank correlation coefficient (CPK) of wind farm cluster data, to construct the multi-slice LMU (MSLMU). Finally, an innovative ensemble model WMF-CPK-MSLMU is proposed herein, with three key blocks: data pre-processing, forecasting, and multi-slice compensation. Advantages include: 1) LMU jointly models linear and nonlinear dependencies among farms to capture spatial-temporal correlations through backpropagation; 2) MSLMU enhances forecasting by using CPK-derived weights instead of random initialization, allowing spatial correlations to fully activate hidden nodes across clustered wind farms.; 3) CPK adaptively weights the compensation model in MSLMU and complements missing data spatially, to facilitate the whole model highly accurate and robust. Test results on different wind farm clusters indicate the effectiveness and superiority of proposed ensemble model WMF-CPK-MSLMU in the short-term prediction of wind farm clusters compared to the existing models.
【2】Journey to the Centre of Cluster: Harnessing Interior Nodes for A/B Testing under Network Interference
标题:集群中心之旅:在网络干扰下利用内部节点进行A/B测试
链接:https://arxiv.org/abs/2602.04457
作者:Qianyi Chen,Anpeng Wu,Bo Li,Lu Deng,Yong Wang
备注:ICLR 2026
摘要:平台上的A/B测试通常面临来自网络干扰的挑战,其中单元的结果不仅取决于其自身的处理,还取决于其网络邻居的处理。为了解决这个问题,集群级的随机化已经成为标准,使网络感知估计器的使用成为可能。这些估计器通常修剪数据以仅保留信息单元的子集,在合适的条件下实现低偏差,但通常遭受高方差。在本文中,我们首先证明,内部节点-单位的邻居都位于同一个集群-构成绝大多数的后修剪子群。有鉴于此,我们建议直接平均内部节点,以构建内部均值(MII)估计,这规避了现有网络感知估计所需的微妙的重新加权,并大大减少了经典设置中的方差。然而,我们发现,内部节点往往不能代表整个人口,特别是在网络依赖的协变量,导致显着的偏见。然后,我们使用在整个网络上训练的反事实预测器来增强MII估计器,使我们能够调整内部节点和完整群体之间的协变量分布变化。通过重新排列的表达式,我们发现,我们的增强MII估计体现了预测动力推理框架内的点估计的分析形式。这种见解激发了半监督镜头,其中内部节点被视为受选择偏差影响的标记数据。广泛和具有挑战性的模拟研究表明,我们的增强MII估计在各种设置的出色表现。
摘要:A/B testing on platforms often faces challenges from network interference, where a unit's outcome depends not only on its own treatment but also on the treatments of its network neighbors. To address this, cluster-level randomization has become standard, enabling the use of network-aware estimators. These estimators typically trim the data to retain only a subset of informative units, achieving low bias under suitable conditions but often suffering from high variance. In this paper, we first demonstrate that the interior nodes - units whose neighbors all lie within the same cluster - constitute the vast majority of the post-trimming subpopulation. In light of this, we propose directly averaging over the interior nodes to construct the mean-in-interior (MII) estimator, which circumvents the delicate reweighting required by existing network-aware estimators and substantially reduces variance in classical settings. However, we show that interior nodes are often not representative of the full population, particularly in terms of network-dependent covariates, leading to notable bias. We then augment the MII estimator with a counterfactual predictor trained on the entire network, allowing us to adjust for covariate distribution shifts between the interior nodes and full population. By rearranging the expression, we reveal that our augmented MII estimator embodies an analytical form of the point estimator within prediction-powered inference framework. This insight motivates a semi-supervised lens, wherein interior nodes are treated as labeled data subject to selection bias. Extensive and challenging simulation studies demonstrate the outstanding performance of our augmented MII estimator across various settings.
联邦学习|隐私保护|加密(1篇)
【1】Blockchain Federated Learning for Sustainable Retail: Reducing Waste through Collaborative Demand Forecasting
标题:可持续零售的区块链联合学习:通过协作需求预测减少浪费
链接:https://arxiv.org/abs/2602.04384
作者:Fabio Turazza,Alessandro Neri,Marcello Pietri,Maria Angela Butturi,Marco Picone,Marco Mamei
备注:Author-accepted manuscript of a paper published in the IEEE International Symposium on Computers and Communications (ISCC), 2025, pp. 1-6. doi: https://doi.org/10.1109/ISCC65549.2025.11326299
摘要:有效的需求预测对于减少食物浪费至关重要。然而,数据隐私问题往往阻碍零售商之间的合作,限制了提高预测准确性的潜力。在这项研究中,我们探讨了联邦学习(FL)在可持续供应链管理(SSCM)中的应用,重点是处理易腐商品的杂货零售业。我们开发了一个基线预测模型的需求预测和浪费评估在一个孤立的零售商的情况。随后,我们引入了一个基于区块链的FL模型,在多个零售商之间进行协作训练,而无需直接共享数据。我们的初步结果表明,FL模型的性能几乎相当于各方共享数据的理想环境,并且明显优于由各方构建的模型,而无需共享数据,减少浪费并提高效率。
摘要
:Effective demand forecasting is crucial for reducing food waste. However, data privacy concerns often hinder collaboration among retailers, limiting the potential for improved predictive accuracy. In this study, we explore the application of Federated Learning (FL) in Sustainable Supply Chain Management (SSCM), with a focus on the grocery retail sector dealing with perishable goods. We develop a baseline predictive model for demand forecasting and waste assessment in an isolated retailer scenario. Subsequently, we introduce a Blockchain-based FL model, trained collaboratively across multiple retailers without direct data sharing. Our preliminary results show that FL models have performance almost equivalent to the ideal setting in which parties share data with each other, and are notably superior to models built by individual parties without sharing data, cutting waste and boosting efficiency.
推理|分析|理解|解释(13篇)
【1】It's not a Lottery, it's a Race: Understanding How Gradient Descent Adapts the Network's Capacity to the Task
标题:这不是彩票,这是一场竞赛:了解梯度下降如何使网络容量适应任务
链接:https://arxiv.org/abs/2602.04832
作者:Hannah Pinson
摘要:我们对神经网络的理论理解落后于它们的经验成功。其中一个重要的未解释的现象是,在梯度下降训练过程中,神经网络的理论容量为什么以及如何降低到适合任务的有效容量。在这里,我们通过分析单隐藏层ReLU网络中单个神经元级别的学习动态来研究梯度下降实现这一目标的机制。我们确定了三个动力学原则-相互对齐,解锁和竞赛-共同解释了为什么我们经常可以通过合并等效神经元或修剪低范数权重来成功地减少训练后的容量。我们特别解释了彩票猜想背后的机制,或者为什么某些神经元的特定的、有益的初始条件会导致它们获得更高的权重范数。
摘要:Our theoretical understanding of neural networks is lagging behind their empirical success. One of the important unexplained phenomena is why and how, during the process of training with gradient descent, the theoretical capacity of neural networks is reduced to an effective capacity that fits the task. We here investigate the mechanism by which gradient descent achieves this through analyzing the learning dynamics at the level of individual neurons in single hidden layer ReLU networks. We identify three dynamical principles -- mutual alignment, unlocking and racing -- that together explain why we can often successfully reduce capacity after training through the merging of equivalent neurons or the pruning of low norm weights. We specifically explain the mechanism behind the lottery ticket conjecture, or why the specific, beneficial initial conditions of some neurons lead them to obtain higher weight norms.
【2】Delving into Muon and Beyond: Deep Analysis and Extensions
标题:深入研究μ子及超越:深度分析和扩展
链接:https://arxiv.org/abs/2602.04669
作者:Xianbiao Qi,Marco Chen,Jiaquan Ye,Yelin He,Rong Xiao
备注:This paper studies matrix-based optimizers (e.g., Muon) from a spectral perspective and unifies a range of methods under a common spectral framework
摘要:Muon优化器最近因其强大的经验性能和对矩阵形状参数的正交更新的使用而引起了相当大的关注,但其潜在的机制和与Adam等自适应优化器的关系仍然没有得到充分的理解。在这项工作中,我们的目标是解决这些问题,通过一个统一的光谱的角度来看。具体来说,我们认为μ子作为一个家庭的频谱变换的形式U \boldsymbol ^{p} V'的p = 0端点,并考虑其他变量与p = 1/2,p = 1/4,和p = 1。这些变换既适用于动量SGD中的一阶矩更新,也适用于Adam中的均方根(RMS)归一化梯度更新。为了使有效的计算,我们开发了一个耦合牛顿迭代,避免显式奇异值分解。在对照实验中,我们发现RMS归一化更新比一阶矩更新产生更稳定的优化。此外,虽然频谱压缩在一阶矩更新下提供了强大的稳定性优势,但μ子更新(p = 0)并不总是优于Adam。这些结果表明,μ介子最好理解为一种有效的光谱归一化形式,但不是一种普遍优越的优化方法。我们的源代码将在https://github.com/Ocram7/BeyondMuon上发布。
摘要:The Muon optimizer has recently attracted considerable attention for its strong empirical performance and use of orthogonalized updates on matrix-shaped parameters, yet its underlying mechanisms and relationship to adaptive optimizers such as Adam remain insufficiently understood. In this work, we aim to address these questions through a unified spectral perspective. Specifically, we view Muon as the p = 0 endpoint of a family of spectral transformations of the form U \boldsymbolΣ^{p} V' , and consider additional variants with p = 1/2 , p = 1/4 , and p = 1 . These transformations are applied to both first-moment updates, as in momentum SGD, and to root-mean-square (RMS) normalized gradient updates as in Adam. To enable efficient computation, we develop a coupled Newton iteration that avoids explicit singular value decomposition. Across controlled experiments, we find that RMS-normalized updates yield more stable optimization than first-moment updates. Moreover, while spectral compression provides strong stabilization benefits under first-moment updates, the Muon update (p = 0) does not consistently outperform Adam. These results suggest that Muon is best understood as an effective form of spectral normalization, but not a universally superior optimization method. Our source code will be released at https://github.com/Ocram7/BeyondMuon.
【3】EXaMCaP: Subset Selection with Entropy Gain Maximization for Probing Capability Gains of Large Chart Understanding Training Sets
标题:EXaMCaP:具有最大化的子集选择,用于探索大型图表理解训练集的能力收益
链接:https://arxiv.org/abs/2602.04365
作者:Jiapeng Liu,Liang Li,Bing Li,Peng Fu,Xiyan Gao,Chengyang Fang,Xiaoshuai Hao,Can Ma
摘要:最近的工作集中在合成图表理解(ChartU)训练集,将高级图表知识注入多模态大型语言模型(MLLM),其中知识的充分性通常通过微调然后评估范式量化能力增益来验证。然而,完整的微调MLLM来评估这样的收益会产生大量的时间成本,阻碍了ChartU数据集的迭代优化周期。回顾ChartU数据集合成和数据选择域,我们发现子集可以潜在地探测MLLM从全集微调中获得的能力。考虑到数据多样性对于提高MLLM的性能至关重要,并且熵反映了这一特征,我们提出了EXaMCaP,它使用熵增益最大化来选择子集。为了获得高多样性子集,EXaMCaP从大型ChartU数据集中选择最大熵子集。由于枚举所有可能的子集是不切实际的,EXaMCaP迭代地选择样本以最大化相对于当前集合的集合熵增益,近似于完整数据集的最大熵子集。实验表明,EXaMCaP在探测ChartU训练集的能力增益方面优于基线,以及其在不同子集大小上的强大有效性和与各种MLLM架构的兼容性。
摘要:Recent works focus on synthesizing Chart Understanding (ChartU) training sets to inject advanced chart knowledge into Multimodal Large Language Models (MLLMs), where the sufficiency of the knowledge is typically verified by quantifying capability gains via the fine-tune-then-evaluate paradigm. However, full-set fine-tuning MLLMs to assess such gains incurs significant time costs, hindering the iterative refinement cycles of the ChartU dataset. Reviewing the ChartU dataset synthesis and data selection domains, we find that subsets can potentially probe the MLLMs' capability gains from full-set fine-tuning. Given that data diversity is vital for boosting MLLMs' performance and entropy reflects this feature, we propose EXaMCaP, which uses entropy gain maximization to select a subset. To obtain a high-diversity subset, EXaMCaP chooses the maximum-entropy subset from the large ChartU dataset. As enumerating all possible subsets is impractical, EXaMCaP iteratively selects samples to maximize the gain in set entropy relative to the current set, approximating the maximum-entropy subset of the full dataset. Experiments show that EXaMCaP outperforms baselines in probing the capability gains of the ChartU training set, along with its strong effectiveness across diverse subset sizes and compatibility with various MLLM architectures.
【4】Counterfactual Explanations for Hypergraph Neural Networks
标题:超图神经网络的反事实解释
链接:https://arxiv.org/abs/2602.04360
作者:Fabiano Veglianti,Lorenzo Antonelli,Gabriele Tolomei
摘要:超图神经网络(Hypergraph Neural Networks,HGNN)有效地模拟了许多现实世界系统中的高阶交互,但仍然难以解释,限制了它们在高风险环境中的部署。 我们介绍了CF-HyperGNNExplainer,这是一种针对HGNN的反事实解释方法,它可以识别改变模型预测所需的最小结构变化。该方法使用可操作的编辑来生成反事实超图,该编辑限于移除节点超边缘发生率或删除超边缘,从而产生简洁且结构上有意义的解释。在三个基准数据集上的实验表明,CF-HyperGNNExplainer生成了有效且简洁的反事实,突出了对HGNN决策最关键的高阶关系。
摘要:Hypergraph neural networks (HGNNs) effectively model higher-order interactions in many real-world systems but remain difficult to interpret, limiting their deployment in high-stakes settings. We introduce CF-HyperGNNExplainer, a counterfactual explanation method for HGNNs that identifies the minimal structural changes required to alter a model's prediction. The method generates counterfactual hypergraphs using actionable edits limited to removing node-hyperedge incidences or deleting hyperedges, producing concise and structurally meaningful explanations. Experiments on three benchmark datasets show that CF-HyperGNNExplainer generates valid and concise counterfactuals, highlighting the higher-order relations most critical to HGNN decisions.
【5】RAPO: Risk-Aware Preference Optimization for Generalizable Safe Reasoning
标题:RAPO:可推广安全推理的风险感知偏好优化
链接:https://arxiv.org/abs/2602.04224
作者:Zeming Wei,Qiaosheng Zhang,Xia Hu,Xingcheng Xu
摘要:大型推理模型(LRM)在其思想链(CoT)推理方面取得了巨大的成功,但也面临着与基本语言模型类似的安全问题。特别是,虽然算法的设计是为了引导他们用安全的推理来故意拒绝有害的提示,但这个过程往往不能概括各种复杂的越狱攻击。在这项工作中,我们将这些失败归因于安全推理过程的泛化,特别是它们对复杂攻击提示的不足。我们提供了理论和经验证据,以显示一个更充分的安全推理过程,以抵御先进的攻击提示的必要性。在此基础上,我们提出了一个风险感知偏好优化(RAPO)框架,使LRM能够自适应地识别和解决安全风险与适当的粒度在其思想内容。大量的实验表明,RAPO成功地概括了多个LRM的安全推理自适应不同的攻击提示,同时保持通用性,有助于LRM安全的鲁棒对齐技术。我们的代码可在https://github.com/weizeming/RAPO上获得。
摘要:Large Reasoning Models (LRMs) have achieved tremendous success with their chain-of-thought (CoT) reasoning, yet also face safety issues similar to those of basic language models. In particular, while algorithms are designed to guide them to deliberately refuse harmful prompts with safe reasoning, this process often fails to generalize against diverse and complex jailbreak attacks. In this work, we attribute these failures to the generalization of the safe reasoning process, particularly their insufficiency against complex attack prompts. We provide both theoretical and empirical evidence to show the necessity of a more sufficient safe reasoning process to defend against advanced attack prompts. Building on this insight, we propose a Risk-Aware Preference Optimization (RAPO) framework that enables LRM to adaptively identify and address the safety risks with appropriate granularity in its thinking content. Extensive experiments demonstrate that RAPO successfully generalizes multiple LRMs' safe reasoning adaptively across diverse attack prompts whilst preserving general utility, contributing a robust alignment technique for LRM safety. Our code is available at https://github.com/weizeming/RAPO.
【6】Axiomatic Foundations of Counterfactual Explanations
标题:反事实解释的公理基础
链接:https://arxiv.org/abs/2602.04028
作者:Leila Amgoud,Martin Cooper
摘要:解释自主和智能系统对于提高对它们的决策的信任至关重要。反事实已经成为最令人信服的解释形式之一。它们通过揭示如何改变决定来解决“为什么不”的问题。尽管越来越多的文献,大多数现有的解释集中在一个单一类型的反事实,并限于本地的解释,专注于个别情况。目前还没有系统的研究替代反事实类型,也没有全球反事实,揭示了系统的整体推理过程。 本文通过引入一个建立在反事实解释者的一组理想属性上的公理框架来解决这两个差距。它证明了不可能性定理,表明没有一个解释者可以同时满足某些公理组合,并充分刻画了所有相容集。然后,表现定理在公理的特定子集和满足它们的解释者家族之间建立了五个一一对应的关系。每一个家族都产生了一种不同类型的反事实解释,揭示了五种根本不同类型的反事实。其中一些对应于局部解释,而另一些则捕获全局解释。最后,该框架将现有的解释器置于此分类中,正式描述其行为,并分析生成此类解释的计算复杂性。
摘要:Explaining autonomous and intelligent systems is critical in order to improve trust in their decisions. Counterfactuals have emerged as one of the most compelling forms of explanation. They address ``why not'' questions by revealing how decisions could be altered. Despite the growing literature, most existing explainers focus on a single type of counterfactual and are restricted to local explanations, focusing on individual instances. There has been no systematic study of alternative counterfactual types, nor of global counterfactuals that shed light on a system's overall reasoning process. This paper addresses the two gaps by introducing an axiomatic framework built on a set of desirable properties for counterfactual explainers. It proves impossibility theorems showing that no single explainer can satisfy certain axiom combinations simultaneously, and fully characterizes all compatible sets. Representation theorems then establish five one-to-one correspondences between specific subsets of axioms and the families of explainers that satisfy them. Each family gives rise to a distinct type of counterfactual explanation, uncovering five fundamentally different types of counterfactuals. Some of these correspond to local explanations, while others capture global explanations. Finally, the framework situates existing explainers within this taxonomy, formally characterizes their behavior, and analyzes the computational complexity of generating such explanations.
【7】Monitorability as a Free Gift: How RLVR Spontaneously Aligns Reasoning
标题:作为免费礼物的可穿戴性:WLVR赞助商如何协调推理
链接:https://arxiv.org/abs/2602.03978
作者:Zidi Xiong,Shan Chen,Himabindu Lakkaraju
摘要:随着大型推理模型(LRM)的部署越来越多,审计其思想链(CoT)跟踪的安全性变得至关重要。最近的研究报告指出,可监控性(CoT忠实和信息化地反映内部计算的程度)可以在具有可验证奖励的强化学习(RLVR)的早期阶段作为“免费礼物”出现。我们通过对模型家族和培训领域的系统评估,使这一观察具体化。我们的研究结果表明,这种效果是不普遍的:可监测性的改善是强烈的数据依赖。特别是,我们证明了数据多样性和预防以下数据在RLVR培训的关键作用。我们进一步表明,可监测性是正交的能力-推理性能的改善并不意味着增加透明度。通过机制分析,我们属性的可监测性收益主要是响应分布锐化(熵减少)和增加注意的提示,而不是更强的因果关系依赖推理痕迹。我们还揭示了如何监控动态变化与控制培训和评估难度。总之,这些调查结果提供了一个全面的看法,如何监测出现在RLVR下,澄清何时可能出现收益,何时不会。
摘要:As Large Reasoning Models (LRMs) are increasingly deployed, auditing their chain-of-thought (CoT) traces for safety becomes critical. Recent work has reported that monitorability--the degree to which CoT faithfully and informatively reflects internal computation--can appear as a "free gift" during the early stages of Reinforcement Learning with Verifiable Rewards (RLVR). We make this observation concrete through a systematic evaluation across model families and training domains. Our results show that this effect is not universal: monitorability improvements are strongly data-dependent. In particular, we demonstrate the critical role of data diversity and instruction-following data during RLVR training. We further show that monitorability is orthogonal to capability--improvements in reasoning performance do not imply increased transparency. Through mechanistic analysis, we attribute monitorability gains primarily to response distribution sharpening (entropy reduction) and increased attention to the prompt, rather than stronger causal reliance on reasoning traces. We also reveal how monitorability dynamics vary with controlled training and evaluation difficulty. Together, these findings provide a holistic view of how monitorability emerges under RLVR, clarifying when gains are likely to occur and when they are not.
【8】Semantic Rate Distortion and Posterior Design: Compute Constraints, Multimodality, and Strategic Inference
标题:语义率失真和后验设计:计算约束、多模式和战略推理
链接:https://arxiv.org/abs/2602.03949
作者:Emrah Akyol
备注:submitted for publication
摘要:我们研究战略高斯语义压缩率和计算约束下,编码器和解码器优化不同的二次目标。一个潜在的高斯状态产生一个任务相关的语义变量,解码器最好的响应通过MMSE估计,减少编码器的问题,信息速率约束下的后验协方差设计。我们的特点是战略率失真函数在直接,远程和全信息制度,推导出语义注水和速率约束高斯说服解决方案,并建立高斯最优失调的目标。我们进一步表明,架构计算限制充当隐式速率约束,通过模型深度和推理时间计算,使语义准确性呈指数级提高,而多模式观察消除了远程编码固有的几何平均惩罚。这些结果为数据和节能AI提供了信息理论基础,并为现代多模态语言模型作为资源约束下的后验设计机制提供了原则性解释。
摘要:We study strategic Gaussian semantic compression under rate and compute constraints, where an encoder and decoder optimize distinct quadratic objectives. A latent Gaussian state generates a task dependent semantic variable, and the decoder best responds via MMSE estimation, reducing the encoder's problem to posterior covariance design under an information rate constraint. We characterize the strategic rate distortion function in direct, remote, and full information regimes, derive semantic waterfilling and rate constrained Gaussian persuasion solutions, and establish Gaussian optimality under misaligned objectives. We further show that architectural compute limits act as implicit rate constraints, yielding exponential improvements in semantic accuracy with model depth and inference time compute, while multimodal observation eliminates the geometric mean penalty inherent to remote encoding. These results provide information theoretic foundations for data and energy efficient AI and offer a principled interpretation of modern multimodal language models as posterior design mechanisms under resource constraints.
【9】Explainable Computer Vision Framework for Automated Pore Detection and Criticality Assessment in Additive Manufacturing
标题:用于增材制造中自动孔隙检测和临界性评估的可解释计算机视觉框架
链接:https://arxiv.org/abs/2602.03883
作者:Akshansh Mishra,Rakesh Morisetty
备注:6 figures
摘要:内部孔隙仍然是增材制造组件中的关键缺陷模式,影响了结构性能并限制了工业应用。自动化缺陷检测方法存在,但缺乏可解释性,阻止工程师理解临界预测的物理基础。这项研究提出了一个可解释的计算机视觉框架孔隙检测和临界性评估的三维层析体积。将连续的灰度切片重建为体积数据集,并通过基于强度的阈值化和连通分量分析识别出500个单独的孔隙。每个孔的特征在于使用几何描述符,包括大小,纵横比,程度,和相对于标本边界的空间位置。使用基于网格的欧几里德距离标准构建孔隙相互作用网络,产生24,950个孔隙间连接。机器学习模型从提取的特征预测孔隙关键性分数,并且SHAP分析量化个体特征贡献。结果表明,归一化的表面距离占主导地位的模型预测,贡献超过一个数量级的重要性比所有其他描述符。孔径提供最小的影响,而几何参数显示可忽略的影响。表面接近度和临界度之间的强反比关系揭示了边界驱动的失效机制。这种可解释的框架可以实现透明的缺陷评估,并为增材制造中的工艺优化和质量控制提供可操作的见解。
摘要:Internal porosity remains a critical defect mode in additively manufactured components, compromising structural performance and limiting industrial adoption. Automated defect detection methods exist but lack interpretability, preventing engineers from understanding the physical basis of criticality predictions. This study presents an explainable computer vision framework for pore detection and criticality assessment in three-dimensional tomographic volumes. Sequential grayscale slices were reconstructed into volumetric datasets, and intensity-based thresholding with connected component analysis identified 500 individual pores. Each pore was characterized using geometric descriptors including size, aspect ratio, extent, and spatial position relative to the specimen boundary. A pore interaction network was constructed using percentile-based Euclidean distance criteria, yielding 24,950 inter-pore connections. Machine learning models predicted pore criticality scores from extracted features, and SHAP analysis quantified individual feature contributions. Results demonstrate that normalized surface distance dominates model predictions, contributing more than an order of magnitude greater importance than all other descriptors. Pore size provides minimal influence, while geometric parameters show negligible impact. The strong inverse relationship between surface proximity and criticality reveals boundary-driven failure mechanisms. This interpretable framework enables transparent defect assessment and provides actionable insights for process optimization and quality control in additive manufacturing.
【10】Understanding the Impact of Differentially Private Training on Memorization of Long-Tailed Data
标题:了解差异私人训练对长尾数据小型化的影响
链接:https://arxiv.org/abs/2602.03872
作者:Jiaming Zhang,Huanyi Xie,Meng Ding,Shaopeng Fu,Jinyan Liu,Di Wang
备注:arXiv admin note: text overlap with arXiv:2502.11893 by other authors
摘要:最近的研究表明,现代深度学习模型在一定程度上通过记忆单个训练样本来实现高预测精度。这种记忆引起了严重的隐私问题,促使人们广泛采用差分私有训练算法,如DP-SGD。然而,越来越多的实证研究表明,DP-SGD往往会导致次优的泛化性能,特别是在包含大量罕见或非典型样本的长尾数据上。尽管有这些观察结果,这种现象的理论理解仍然在很大程度上未被探索,现有的差分隐私分析很难扩展到实践中常用的非凸和非光滑神经网络。在这项工作中,我们开发了第一个理论框架,用于从特征学习的角度分析长尾数据上的DP-SGD。我们发现,DP-SGD训练模型在长尾子群体上的测试误差明显大于整个数据集的总体测试误差。我们的分析进一步表征了DP-SGD的训练动态,展示了梯度裁剪和噪声注入如何共同对模型记忆信息丰富但代表性不足的样本的能力产生不利影响。最后,我们通过对合成和真实世界数据集的广泛实验来验证我们的理论研究结果。
摘要:Recent research shows that modern deep learning models achieve high predictive accuracy partly by memorizing individual training samples. Such memorization raises serious privacy concerns, motivating the widespread adoption of differentially private training algorithms such as DP-SGD. However, a growing body of empirical work shows that DP-SGD often leads to suboptimal generalization performance, particularly on long-tailed data that contain a large number of rare or atypical samples. Despite these observations, a theoretical understanding of this phenomenon remains largely unexplored, and existing differential privacy analysis are difficult to extend to the nonconvex and nonsmooth neural networks commonly used in practice. In this work, we develop the first theoretical framework for analyzing DP-SGD on long-tailed data from a feature learning perspective. We show that the test error of DP-SGD-trained models on the long-tailed subpopulation is significantly larger than the overall test error over the entire dataset. Our analysis further characterizes the training dynamics of DP-SGD, demonstrating how gradient clipping and noise injection jointly adversely affect the model's ability to memorize informative but underrepresented samples. Finally, we validate our theoretical findings through extensive experiments on both synthetic and real-world datasets.
【11】Causal explanations of outliers in systems with lagged time-dependencies
标题:具有滞后时间依赖性的系统中异常值的原因解释
链接:https://arxiv.org/abs/2602.04667
作者:Philipp Alexander Schwarz,Johannes Oberpriller,Sven Klaassen
摘要
:受控时变系统的根本原因分析在应用中提出了一个主要的挑战。特别是能源系统是难以处理的,因为它们表现出即时以及延迟的影响,如果配备了存储,确实有一个记忆。在本文中,我们将Budhathoki等人的因果根本原因分析方法。[2022]适用于一般的时间依赖系统,因为它可以被视为术语“根本原因”的严格因果定义。特别地,我们讨论了两种截断方法来处理存在于时间依赖系统中的无限依赖图。其中一个保持因果机制不变,另一个则近似于起始节点的机制。不同方法的有效性是使用一个具有挑战性的数据生成过程进行基准测试的,该过程的灵感来自工厂能源管理中的一个问题:避免功耗峰值。我们表明,给予足够的滞后,我们的扩展是能够本地化的根本原因,在特征和时域。进一步讨论了机构近似的影响。
摘要:Root-cause analysis in controlled time dependent systems poses a major challenge in applications. Especially energy systems are difficult to handle as they exhibit instantaneous as well as delayed effects and if equipped with storage, do have a memory. In this paper we adapt the causal root-cause analysis method of Budhathoki et al. [2022] to general time-dependent systems, as it can be regarded as a strictly causal definition of the term "root-cause". Particularly, we discuss two truncation approaches to handle the infinite dependency graphs present in time-dependent systems. While one leaves the causal mechanisms intact, the other approximates the mechanisms at the start nodes. The effectiveness of the different approaches is benchmarked using a challenging data generation process inspired by a problem in factory energy management: the avoidance of peaks in the power consumption. We show that given enough lags our extension is able to localize the root-causes in the feature and time domain. Further the effect of mechanism approximation is discussed.
【12】Efficient Subgroup Analysis via Optimal Trees with Global Parameter Fusion
标题:通过具有全局参数融合的最优树进行有效的亚群分析
链接:https://arxiv.org/abs/2602.04077
作者:Zhongming Xie,Joseph Giorgio,Jingshen Wang
摘要:Identifying and making statistical inferences on differential treatment effects (commonly known as subgroup analysis in clinical research) is central to precision health. Subgroup analysis allows practitioners to pinpoint populations for whom a treatment is especially beneficial or protective, thereby advancing targeted interventions. Tree based recursive partitioning methods are widely used for subgroup analysis due to their interpretability. Nevertheless, these approaches encounter significant limitations, including suboptimal partitions induced by greedy heuristics and overfitting from locally estimated splits, especially under limited sample sizes. To address these limitations, we propose a fused optimal causal tree method that leverages mixed integer optimization (MIO) to facilitate precise subgroup identification. Our approach ensures globally optimal partitions and introduces a parameter fusion constraint to facilitate information sharing across related subgroups. This design substantially improves subgroup discovery accuracy and enhances statistical efficiency. We provide theoretical guarantees by rigorously establishing out of sample risk bounds and comparing them with those of classical tree based methods. Empirically, our method consistently outperforms popular baselines in simulations. Finally, we demonstrate its practical utility through a case study on the Health and Aging Brain Study Health Disparities (HABS-HD) dataset, where our approach yields clinically meaningful insights.
【13】Statistical Guarantees for Reasoning Probes on Looped Boolean Circuits
标题:循环布尔电路上推理探针的统计保证
链接:https://arxiv.org/abs/2602.03970
作者:Anastasis Kratsios,Giulia Livieri,A. Martina Neuman
摘要:We study the statistical behaviour of reasoning probes in a stylized model of looped reasoning, given by Boolean circuits whose computational graph is a perfect $ν$-ary tree ($ν\ge 2$) and whose output is appended to the input and fed back iteratively for subsequent computation rounds. A reasoning probe has access to a sampled subset of internal computation nodes, possibly without covering the entire graph, and seeks to infer which $ν$-ary Boolean gate is executed at each queried node, representing uncertainty via a probability distribution over a fixed collection of $\mathtt{m}$ admissible $ν$-ary gates. This partial observability induces a generalization problem, which we analyze in a realizable, transductive setting. We show that, when the reasoning probe is parameterized by a graph convolutional network (GCN)-based hypothesis class and queries $N$ nodes, the worst-case generalization error attains the optimal rate $\mathcal{O}(\sqrt{\log(2/δ)}/\sqrt{N})$ with probability at least $1-δ$, for $δ\in (0,1)$. Our analysis combines snowflake metric embedding techniques with tools from statistical optimal transport. A key insight is that this optimal rate is achievable independently of graph size, owing to the existence of a low-distortion one-dimensional snowflake embedding of the induced graph metric. As a consequence, our results provide a sharp characterization of how structural properties of the computational graph govern the statistical efficiency of reasoning under partial access.
检测相关(1篇)
【1】NeuroCanvas: VLLM-Powered Robust Seizure Detection by Reformulating Multichannel EEG as Image
标题:NeuroCanvas:通过将多通道脑电重新定义为图像来实现基于VLLM的稳健癫痫发作检测
链接:https://arxiv.org/abs/2602.04769
作者:Yan Chen,Jie Peng,Moajjem Hossain Chowdhury,Tianlong Chen,Yunmei Liu
摘要:Accurate and timely seizure detection from Electroencephalography (EEG) is critical for clinical intervention, yet manual review of long-term recordings is labor-intensive. Recent efforts to encode EEG signals into large language models (LLMs) show promise in handling neural signals across diverse patients, but two significant challenges remain: (1) multi-channel heterogeneity, as seizure-relevant information varies substantially across EEG channels, and (2) computing inefficiency, as the EEG signals need to be encoded into a massive number of tokens for the prediction. To address these issues, we draw the EEG signal and propose the novel NeuroCanvas framework. Specifically, NeuroCanvas consists of two modules: (i) The Entropy-guided Channel Selector (ECS) selects the seizure-relevant channels input to LLM and (ii) the following Canvas of Neuron Signal (CNS) converts selected multi-channel heterogeneous EEG signals into structured visual representations. The ECS module alleviates the multi-channel heterogeneity issue, and the CNS uses compact visual tokens to represent the EEG signals that improve the computing efficiency. We evaluate NeuroCanvas across multiple seizure detection datasets, demonstrating a significant improvement of $20\%$ in F1 score and reductions of $88\%$ in inference latency. These results highlight NeuroCanvas as a scalable and effective solution for real-time and resource-efficient seizure detection in clinical practice.The code will be released at https://github.com/Yanchen30247/seizure_detect.
分类|识别(5篇)
【1】XtraLight-MedMamba for Classification of Neoplastic Tubular Adenomas
标题:XtraLight-MedMamba用于肿瘤性管状腺癌的分类
链接:https://arxiv.org/abs/2602.04819
作者:Aqsa Sultana,Rayan Afsar,Ahmed Rahu,Surendra P. Singh,Brian Shula,Brandon Combs,Derrick Forchetti,Vijayan K. Asari
备注:13 pages, 8 figures
摘要:Accurate risk stratification of precancerous polyps during routine colonoscopy screenings is essential for lowering the risk of developing colorectal cancer (CRC). However, assessment of low-grade dysplasia remains limited by subjective histopathologic interpretation. Advancements in digital pathology and deep learning provide new opportunities to identify subtle and fine morphologic patterns associated with malignant progression that may be imperceptible to the human eye. In this work, we propose XtraLight-MedMamba, an ultra-lightweight state-space-based deep learning framework for classifying neoplastic tubular adenomas from whole-slide images (WSIs). The architecture is a blend of ConvNext based shallow feature extractor with parallel vision mamba to efficiently model both long- and short-range dependencies and image generalization. An integration of Spatial and Channel Attention Bridge (SCAB) module enhances multiscale feature extraction, while Fixed Non-Negative Orthogonal Classifier (FNOClassifier) enables substantial parameter reduction and improved generalization. The model was evaluated on a curated dataset acquired from patients with low-grade tubular adenomas, stratified into case and control cohorts based on subsequent CRC development. XtraLight-MedMamba achieved an accuracy of 97.18% and an F1-score of 0.9767 using approximately 32,000 parameters, outperforming transformer-based and conventional Mamba architectures with significantly higher model complexity.
【2】Hand Gesture Recognition from Doppler Radar Signals Using Echo State Networks
标题:利用回声状态网络从多普勒雷达信号中识别手势
链接:https://arxiv.org/abs/2602.04436
作者:Towa Sano,Gouhei Tanaka
备注:Submitted to IJCNN 2026. 21 pages, 10figures
摘要:Hand gesture recognition (HGR) is a fundamental technology in human computer interaction (HCI).In particular, HGR based on Doppler radar signals is suited for in-vehicle interfaces and robotic systems, necessitating lightweight and computationally efficient recognition techniques. However, conventional deep learning-based methods still suffer from high computational costs. To address this issue, we propose an Echo State Network (ESN) approach for radar-based HGR, using frequency-modulated-continuous-wave (FMCW) radar signals. Raw radar data is first converted into feature maps, such as range-time and Doppler-time maps, which are then fed into one or more recurrent neural network-based reservoirs. The obtained reservoir states are processed by readout classifiers, including ridge regression, support vector machines, and random forests. Comparative experiments demonstrate that our method outperforms existing approaches on an 11-class HGR task using the Soli dataset and surpasses existing deep learning models on a 4-class HGR task using the Dop-NET dataset. The results indicate that parallel processing using multi-reservoir ESNs are effective for recognizing temporal patterns from the multiple different feature maps in the time-space and time-frequency domains. Our ESN approaches achieve high recognition performance with low computational cost in HGR, showing great potential for more advanced HCI technologies, especially in resource-constrained environments.
【3】Multi-Integration of Labels across Categories for Component Identification (MILCCI)
标题:跨类别标签的多重集成以实现组件识别(MILMCC)
链接:https://arxiv.org/abs/2602.04270
作者:Noga Mudrik,Yuxi Chen,Gal Mishne,Adam S. Charles
摘要:Many fields collect large-scale temporal data through repeated measurements (trials), where each trial is labeled with a set of metadata variables spanning several categories. For example, a trial in a neuroscience study may be linked to a value from category (a): task difficulty, and category (b): animal choice. A critical challenge in time-series analysis is to understand how these labels are encoded within the multi-trial observations, and disentangle the distinct effect of each label entry across categories. Here, we present MILCCI, a novel data-driven method that i) identifies the interpretable components underlying the data, ii) captures cross-trial variability, and iii) integrates label information to understand each category's representation within the data. MILCCI extends a sparse per-trial decomposition that leverages label similarities within each category to enable subtle, label-driven cross-trial adjustments in component compositions and to distinguish the contribution of each category. MILCCI also learns each component's corresponding temporal trace, which evolves over time within each trial and varies flexibly across trials. We demonstrate MILCCI's performance through both synthetic and real-world examples, including voting patterns, online page view trends, and neuronal recordings.
【4】Discovering Mechanistic Models of Neural Activity: System Identification in an in Silico Zebrafish
标题:发现神经活动的机制模型:Silico Zribrafish的系统识别
链接:https://arxiv.org/abs/2602.04492
作者:Jan-Matthis Lueckmann,Viren Jain,Michał Januszewski
摘要:Constructing mechanistic models of neural circuits is a fundamental goal of neuroscience, yet verifying such models is limited by the lack of ground truth. To rigorously test model discovery, we establish an in silico testbed using neuromechanical simulations of a larval zebrafish as a transparent ground truth. We find that LLM-based tree search autonomously discovers predictive models that significantly outperform established forecasting baselines. Conditioning on sensory drive is necessary but not sufficient for faithful system identification, as models exploit statistical shortcuts. Structural priors prove essential for enabling robust out-of-distribution generalization and recovery of interpretable mechanistic models. Our insights provide guidance for modeling real-world neural recordings and offer a broader template for AI-driven scientific discovery.
【5】Fixed Budget is No Harder Than Fixed Confidence in Best-Arm Identification up to Logarithmic Factors
标题:固定预算并不比最佳臂识别的固定信心困难,直到算术因子
链接:https://arxiv.org/abs/2602.03972
作者:Kapilan Balagopalan,Yinan Li,Yao Zhao,Tuan Nguyen,Anton Daitche,Houssam Nassif,Kwang-Sung Jun
摘要:The best-arm identification (BAI) problem is one of the most fundamental problems in interactive machine learning, which has two flavors: the fixed-budget setting (FB) and the fixed-confidence setting (FC). For $K$-armed bandits with the unique best arm, the optimal sample complexities for both settings have been settled down, and they match up to logarithmic factors. This prompts an interesting research question about the generic, potentially structured BAI problems: Is FB harder than FC or the other way around? In this paper, we show that FB is no harder than FC up to logarithmic factors. We do this constructively: we propose a novel algorithm called FC2FB (fixed confidence to fixed budget), which is a meta algorithm that takes in an FC algorithm $\mathcal{A}$ and turn it into an FB algorithm. We prove that this FC2FB enjoys a sample complexity that matches, up to logarithmic factors, that of the sample complexity of $\mathcal{A}$. This means that the optimal FC sample complexity is an upper bound of the optimal FB sample complexity up to logarithmic factors. Our result not only reveals a fundamental relationship between FB and FC, but also has a significant implication: FC2FB, combined with existing state-of-the-art FC algorithms, leads to improved sample complexity for a number of FB problems.
表征(2篇)
【1】SEIS: Subspace-based Equivariance and Invariance Scores for Neural Representations
标题:SEIS:神经表示的基于子空间的等方差和不变性分数
链接:https://arxiv.org/abs/2602.04054
作者:Huahua Lin,Katayoun Farrahi,Xiaohao Cai
摘要:Understanding how neural representations respond to geometric transformations is essential for evaluating whether learned features preserve meaningful spatial structure. Existing approaches primarily assess robustness by comparing model outputs under transformed inputs, offering limited insight into how geometric information is organized within internal representations and failing to distinguish between information loss and re-encoding. In this work, we introduce SEIS (Subspace-based Equivariance and Invariance Scores), a subspace metric for analyzing layer-wise feature representations under geometric transformations, disentangling equivariance from invariance without requiring labels or explicit knowledge of the transformation. Synthetic validation confirms that SEIS correctly recovers known transformations. Applied to trained classification networks, SEIS reveals a transition from equivariance in early layers to invariance in deeper layers, and that data augmentation increases invariance while preserving equivariance. We further show that multi-task learning induces synergistic gains in both properties at the shared encoder, and skip connections restore equivariance lost during decoding.
【2】Representation Geometry as a Diagnostic for Out-of-Distribution Robustness
标题:表示几何作为非分布稳健性的诊断
链接:https://arxiv.org/abs/2602.03951
作者:Ali Zia,Farid Hazratian
摘要:Robust generalization under distribution shift remains difficult to monitor and optimize in the absence of target-domain labels, as models with similar in-distribution accuracy can exhibit markedly different out-of-distribution (OOD) performance. While prior work has focused on training-time regularization and low-order representation statistics, little is known about whether the geometric structure of learned embeddings provides reliable post-hoc signals of robustness. We propose a geometry-based diagnostic framework that constructs class-conditional mutual k-nearest-neighbor graphs from in-distribution embeddings and extracts two complementary invariants: a global spectral complexity proxy based on the reduced log-determinant of the normalized Laplacian, and a local smoothness measure based on Ollivier--Ricci curvature. Across multiple architectures, training regimes, and corruption benchmarks, we find that lower spectral complexity and higher mean curvature consistently predict stronger OOD accuracy across checkpoints. Controlled perturbations and topological analyses further show that these signals reflect meaningful representation structure rather than superficial embedding statistics. Our results demonstrate that representation geometry enables interpretable, label-free robustness diagnosis and supports reliable unsupervised checkpoint selection under distribution shift.
优化|敛散性(10篇)
【1】Improved Dimension Dependence for Bandit Convex Optimization with Gradient Variations
标题:具有梯度变化的Bandit凸优化的改进维度依赖性
链接:https://arxiv.org/abs/2602.04761
作者:Hang Yu,Yu-Hu Yan,Peng Zhao
摘要:Gradient-variation online learning has drawn increasing attention due to its deep connections to game theory, optimization, etc. It has been studied extensively in the full-information setting, but is underexplored with bandit feedback. In this work, we focus on gradient variation in Bandit Convex Optimization (BCO) with two-point feedback. By proposing a refined analysis on the non-consecutive gradient variation, a fundamental quantity in gradient variation with bandits, we improve the dimension dependence for both convex and strongly convex functions compared with the best known results (Chiang et al., 2013). Our improved analysis for the non-consecutive gradient variation also implies other favorable problem-dependent guarantees, such as gradient-variance and small-loss regrets. Beyond the two-point setup, we demonstrate the versatility of our technique by achieving the first gradient-variation bound for one-point bandit linear optimization over hyper-rectangular domains. Finally, we validate the effectiveness of our results in more challenging tasks such as dynamic/universal regret minimization and bandit games, establishing the first gradient-variation dynamic and universal regret bounds for two-point BCO and fast convergence rates in bandit games.
【2】Optimal Rates for Feasible Payoff Set Estimation in Games
标题:博弈中可行支付集估计的最优率
链接:https://arxiv.org/abs/2602.04397
作者:Annalisa Barbara,Riccardo Poiani,Martino Bernasconi,Andrea Celli
摘要
:We study a setting in which two players play a (possibly approximate) Nash equilibrium of a bimatrix game, while a learner observes only their actions and has no knowledge of the equilibrium or the underlying game. A natural question is whether the learner can rationalize the observed behavior by inferring the players' payoff functions. Rather than producing a single payoff estimate, inverse game theory aims to identify the entire set of payoffs consistent with observed behavior, enabling downstream use in, e.g., counterfactual analysis and mechanism design across applications like auctions, pricing, and security games. We focus on the problem of estimating the set of feasible payoffs with high probability and up to precision $ε$ on the Hausdorff metric. We provide the first minimax-optimal rates for both exact and approximate equilibrium play, in zero-sum as well as general-sum games. Our results provide learning-theoretic foundations for set-valued payoff inference in multi-agent environments.
【3】LoRDO: Distributed Low-Rank Optimization with Infrequent Communication
标题:LoRDO:不频繁通信的分布式低秩优化
链接:https://arxiv.org/abs/2602.04396
作者:Andrej Jovanović,Alex Iacob,Mher Safaryan,Ionut-Vlad Modoranu,Lorenzo Sani,William F. Shen,Xinchi Qiu,Dan Alistarh,Nicholas D. Lane
备注:Preprint; under review
摘要:Distributed training of foundation models via $\texttt{DDP}$ is limited by interconnect bandwidth. While infrequent communication strategies reduce synchronization frequency, they remain bottlenecked by the memory and communication requirements of optimizer states. Low-rank optimizers can alleviate these constraints; however, in the local-update regime, workers lack access to the full-batch gradients required to compute low-rank projections, which degrades performance. We propose $\texttt{LoRDO}$, a principled framework unifying low-rank optimization with infrequent synchronization. We first demonstrate that, while global projections based on pseudo-gradients are theoretically superior, they permanently restrict the optimization trajectory to a low-rank subspace. To restore subspace exploration, we introduce a full-rank quasi-hyperbolic update. $\texttt{LoRDO}$ achieves near-parity with low-rank $\texttt{DDP}$ in language modeling and downstream tasks at model scales of $125$M--$720$M, while reducing communication by $\approx 10 \times$. Finally, we show that $\texttt{LoRDO}$ improves performance even more in very low-memory settings with small rank/batch size.
【4】Multi Objective Design Optimization of Non Pneumatic Passenger Car Tires Using Finite Element Modeling, Machine Learning, and Particle swarm Optimization and Bayesian Optimization Algorithms
标题:利用有限元素建模、机器学习、粒子群优化和Bayesian优化算法进行非充气乘用车轮胎多目标设计优化
链接:https://arxiv.org/abs/2602.04277
作者:Priyankkumar Dhrangdhariya,Soumyadipta Maiti,Venkataramana Runkana
摘要:Non Pneumatic tires offer a promising alternative to pneumatic tires. However, their discontinuous spoke structures present challenges in stiffness tuning, durability, and high speed vibration. This study introduces an integrated generative design and machine learning driven framework to optimize UPTIS type spoke geometries for passenger vehicles. Upper and lower spoke profiles were parameterized using high order polynomial representations, enabling the creation of approximately 250 generative designs through PCHIP based geometric variation. Machine learning models like KRR for stiffness and XGBoost for durability and vibration achieved strong predictive accuracy, reducing the reliance on computationally intensive FEM simulations. Optimization using Particle Swarm Optimization and Bayesian Optimization further enabled extensive performance refinement. The resulting designs demonstrate 53% stiffness tunability, up to 50% durability improvement, and 43% reduction in vibration compared to the baseline. PSO provided fast, targeted convergence, while Bayesian Optimization effectively explored multi objective tradeoffs. Overall, the proposed framework enables systematic development of high performance, next generation UPTIS spoke structures.
【5】Rate-Optimal Noise Annealing in Semi-Dual Neural Optimal Transport: Tangential Identifiability, Off-Manifold Ambiguity, and Guaranteed Recovery
标题:半二元神经最优传输中的速率最优噪音模拟:切向可识别性、管外模糊性和保证恢复
链接:https://arxiv.org/abs/2602.04110
作者:Raymond Chu,Jaewoong Choi,Dohyun Kwon
摘要:Semi-dual neural optimal transport learns a transport map via a max-min objective, yet training can converge to incorrect or degenerate maps. We fully characterize these spurious solutions in the common regime where data concentrate on low-dimensional manifold: the objective is underconstrained off the data manifold, while the on-manifold transport signal remains identifiable. Following Choi, Choi, and Kwon (2025), we study additive-noise smoothing as a remedy and prove new map recovery guarantees as the noise vanishes. Our main practical contribution is a computable terminal noise level $\varepsilon_{\mathrm{stat}}(N)$ that attains the optimal statistical rate, with scaling governed by the intrinsic dimension $m$ of the data. The formula arises from a theoretical unified analysis of (i) quantitative stability of optimal plans, (ii) smoothing-induced bias, and (iii) finite-sample error, yielding rates that depend on $m$ rather than the ambient dimension. Finally, we show that the reduced semi-dual objective becomes increasingly ill-conditioned as $\varepsilon \downarrow 0$. This provides a principled stopping rule: annealing below $\varepsilon_{\mathrm{stat}}(N)$ can $\textit{worsen}$ optimization conditioning without improving statistical accuracy.
【6】GOPO: Policy Optimization using Ranked Rewards
标题:GOPO:使用排名奖励的政策优化
链接:https://arxiv.org/abs/2602.03876
作者:Kyuseong Choi,Dwaipayan Saha,Woojeong Kim,Anish Agarwal,Raaz Dwivedi
备注:17 pages, 8 figures
摘要
:Standard reinforcement learning from human feedback (RLHF) trains a reward model on pairwise preference data and then uses it for policy optimization. However, while reward models are optimized to capture relative preferences, existing policy optimization techniques rely on absolute reward magnitudes during training. In settings where the rewards are non-verifiable such as summarization, instruction following, and chat completion, this misalignment often leads to suboptimal performance. We introduce Group Ordinal Policy Optimization (GOPO), a policy optimization method that uses only the ranking of the rewards and discards their magnitudes. Our rank-based transformation of rewards provides several gains, compared to Group Relative Policy Optimization (GRPO), in settings with non-verifiable rewards: (1) consistently higher training/validation reward trajectories, (2) improved LLM-as-judge evaluations across most intermediate training steps, and (3) reaching a policy of comparable quality in substantially less training steps than GRPO. We demonstrate consistent improvements across a range of tasks and model sizes.
【7】Multi-layer Cross-Attention is Provably Optimal for Multi-modal In-context Learning
标题:多层交叉注意力对于多模式上下文学习来说是最佳的
链接:https://arxiv.org/abs/2602.04872
作者:Nicholas Barnfield,Subhabrata Sen,Pragya Sur
摘要:Recent progress has rapidly advanced our understanding of the mechanisms underlying in-context learning in modern attention-based neural networks. However, existing results focus exclusively on unimodal data; in contrast, the theoretical underpinnings of in-context learning for multi-modal data remain poorly understood. We introduce a mathematically tractable framework for studying multi-modal learning and explore when transformer-like architectures can recover Bayes-optimal performance in-context. To model multi-modal problems, we assume the observed data arises from a latent factor model. Our first result comprises a negative take on expressibility: we prove that single-layer, linear self-attention fails to recover the Bayes-optimal predictor uniformly over the task distribution. To address this limitation, we introduce a novel, linearized cross-attention mechanism, which we study in the regime where both the number of cross-attention layers and the context length are large. We show that this cross-attention mechanism is provably Bayes optimal when optimized using gradient flow. Our results underscore the benefits of depth for in-context learning and establish the provable utility of cross-attention for multi-modal distributions.
【8】Theory of Optimal Learning Rate Schedules and Scaling Laws for a Random Feature Model
标题:随机特征模型的最佳学习率计划和标度定律理论
链接:https://arxiv.org/abs/2602.04774
作者:Blake Bordelon,Francesco Mori
摘要:Setting the learning rate for a deep learning model is a critical part of successful training, yet choosing this hyperparameter is often done empirically with trial and error. In this work, we explore a solvable model of optimal learning rate schedules for a powerlaw random feature model trained with stochastic gradient descent (SGD). We consider the optimal schedule $η_T^\star(t)$ where $t$ is the current iterate and $T$ is the total training horizon. This schedule is computed both numerically and analytically (when possible) using optimal control methods. Our analysis reveals two regimes which we term the easy phase and hard phase. In the easy phase the optimal schedule is a polynomial decay $η_T^\star(t) \simeq T^{-ξ} (1-t/T)^δ$ where $ξ$ and $δ$ depend on the properties of the features and task. In the hard phase, the optimal schedule resembles warmup-stable-decay with constant (in $T$) initial learning rate and annealing performed over a vanishing (in $T$) fraction of training steps. We investigate joint optimization of learning rate and batch size, identifying a degenerate optimality condition. Our model also predicts the compute-optimal scaling laws (where model size and training steps are chosen optimally) in both easy and hard regimes. Going beyond SGD, we consider optimal schedules for the momentum $β(t)$, where speedups in the hard phase are possible. We compare our optimal schedule to various benchmarks in our task including (1) optimal constant learning rates $η_T(t) \sim T^{-ξ}$ (2) optimal power laws $η_T(t) \sim T^{-ξ} t^{-χ}$, finding that our schedule achieves better rates than either of these. Our theory suggests that learning rate transfer across training horizon depends on the structure of the model and task. We explore these ideas in simple experimental pretraining setups.
【9】Geometry-Aware Optimal Transport: Fast Intrinsic Dimension and Wasserstein Distance Estimation
标题:具有几何意识的最佳传输:快速内在维度和沃瑟斯坦距离估计
链接:https://arxiv.org/abs/2602.04335
作者:Ferdinand Genans,Olivier Wintenberger
摘要:Solving large scale Optimal Transport (OT) in machine learning typically relies on sampling measures to obtain a tractable discrete problem. While the discrete solver's accuracy is controllable, the rate of convergence of the discretization error is governed by the intrinsic dimension of our data. Therefore, the true bottleneck is the knowledge and control of the sampling error. In this work, we tackle this issue by introducing novel estimators for both sampling error and intrinsic dimension. The key finding is a simple, tuning-free estimator of $\text{OT}_c(ρ, \hatρ)$ that utilizes the semi-dual OT functional and, remarkably, requires no OT solver. Furthermore, we derive a fast intrinsic dimension estimator from the multi-scale decay of our sampling error estimator. This framework unlocks significant computational and statistical advantages in practice, enabling us to (i) quantify the convergence rate of the discretization error, (ii) calibrate the entropic regularization of Sinkhorn divergences to the data's intrinsic geometry, and (iii) introduce a novel, intrinsic-dimension-based Richardson extrapolation estimator that strongly debiases Wasserstein distance estimation. Numerical experiments demonstrate that our geometry-aware pipeline effectively mitigates the discretization error bottleneck while maintaining computational efficiency.
【10】Byzantine Machine Learning: MultiKrum and an optimal notion of robustness
标题:拜占庭机器学习:MultiKrum和稳健性的最佳概念
链接:https://arxiv.org/abs/2602.03899
作者:Gilles Bareilles,Wassim Bouaziz,Julien Fageot,El-Mahdi El-Mhamdi
摘要
:Aggregation rules are the cornerstone of distributed (or federated) learning in the presence of adversaries, under the so-called Byzantine threat model. They are also interesting mathematical objects from the point of view of robust mean estimation. The Krum aggregation rule has been extensively studied, and endowed with formal robustness and convergence guarantees. Yet, MultiKrum, a natural extension of Krum, is often preferred in practice for its superior empirical performance, even though no theoretical guarantees were available until now. In this work, we provide the first proof that MultiKrum is a robust aggregation rule, and bound its robustness coefficient. To do so, we introduce $κ^\star$, the optimal *robustness coefficient* of an aggregation rule, which quantifies the accuracy of mean estimation in the presence of adversaries in a tighter manner compared with previously adopted notions of robustness. We then construct an upper and a lower bound on MultiKrum's robustness coefficient. As a by-product, we also improve on the best-known bounds on Krum's robustness coefficient. We show that MultiKrum's bounds are never worse than Krum's, and better in realistic regimes. We illustrate this analysis by an experimental investigation on the quality of the lower bound.
预测|估计(14篇)
【1】Robust Generalizable Heterogeneous Legal Link Prediction
标题:鲁棒的可推广异构合法链接预测
链接:https://arxiv.org/abs/2602.04812
作者:Lorenz Wendlinger,Simon Alexander Nonn,Abdullah Al Zubaer,Michael Granitzer
备注:9 Pages
摘要:Recent work has applied link prediction to large heterogeneous legal citation networks \new{with rich meta-features}. We find that this approach can be improved by including edge dropout and feature concatenation for the learning of more robust representations, which reduces error rates by up to 45%. We also propose an approach based on multilingual node features with an improved asymmetric decoder for compatibility, which allows us to generalize and extend the prediction to more, geographically and linguistically disjoint, data from New Zealand. Our adaptations also improve inductive transferability between these disjoint legal systems.
【2】A Dual-TransUNet Deep Learning Framework for Multi-Source Precipitation Merging and Improving Seasonal and Extreme Estimates
标题:用于多源降水合并并改进季节性和极端估计的Dual-TransUNet深度学习框架
链接:https://arxiv.org/abs/2602.04757
作者:Yuchen Ye,Zixuan Qi,Shixuan Li,Wei Qi,Yanpeng Cai,Chaoxia Yuan
备注:75 pages,20 figures
摘要:Multi-source precipitation products (MSPs) from satellite retrievals and reanalysis are widely used for hydroclimatic monitoring, yet spatially heterogeneous biases and limited skill for extremes still constrain their hydrologic utility. Here we develop a dual-stage TransUNet-based multi-source precipitation merging framework (DDL-MSPMF) that integrates six MSPs with four ERA5 near-surface physical predictors. A first-stage classifier estimates daily precipitation occurrence probability, and a second-stage regressor fuses the classifier outputs together with all predictors to estimate daily precipitation amount at 0.25 degree resolution over China for 2001-2020. Benchmarking against multiple deep learning and hybrid baselines shows that the TransUNet - TransUNet configuration yields the best seasonal performance (R = 0.75; RMSE = 2.70 mm/day) and improves robustness relative to a single-regressor setting. For heavy precipitation (>25 mm/day), DDL-MSPMF increases equitable threat scores across most regions of eastern China and better reproduces the spatial pattern of the July 2021 Zhengzhou rainstorm, indicating enhanced extreme-event detection beyond seasonal-mean corrections. Independent evaluation over the Qinghai-Tibet Plateau using TPHiPr further supports its applicability in data-scarce regions. SHAP analysis highlights the importance of precipitation occurrence probabilities and surface pressure, providing physically interpretable diagnostics. The proposed framework offers a scalable and explainable approach for precipitation fusion and extreme-event assessment.
【3】From Data to Behavior: Predicting Unintended Model Behaviors Before Training
标题:从数据到行为:在训练之前预测非预期的模型行为
链接:https://arxiv.org/abs/2602.04735
作者:Mengru Wang,Zhenqian Xu,Junfeng Fang,Yunzhi Yao,Shumin Deng,Huajun Chen,Ningyu Zhang
备注:Work in progress
摘要:Large Language Models (LLMs) can acquire unintended biases from seemingly benign training data even without explicit cues or malicious content. Existing methods struggle to detect such risks before fine-tuning, making post hoc evaluation costly and inefficient. To address this challenge, we introduce Data2Behavior, a new task for predicting unintended model behaviors prior to training. We also propose Manipulating Data Features (MDF), a lightweight approach that summarizes candidate data through their mean representations and injects them into the forward pass of a base model, allowing latent statistical signals in the data to shape model activations and reveal potential biases and safety risks without updating any parameters. MDF achieves reliable prediction while consuming only about 20% of the GPU resources required for fine-tuning. Experiments on Qwen3-14B, Qwen2.5-32B-Instruct, and Gemma-3-12b-it confirm that MDF can anticipate unintended behaviors and provide insight into pre-training vulnerabilities.
【4】Bounded-Abstention Multi-horizon Time-series Forecasting
标题:有界回避多水平时间序列预测
链接:https://arxiv.org/abs/2602.04714
作者:Luca Stradiotti,Laurens Devos,Anna Monreale,Jesse Davis,Andrea Pugnana
摘要:Multi-horizon time-series forecasting involves simultaneously making predictions for a consecutive sequence of subsequent time steps. This task arises in many application domains, such as healthcare and finance, where mispredictions can have a high cost and reduce trust. The learning with abstention framework tackles these problems by allowing a model to abstain from offering a prediction when it is at an elevated risk of making a misprediction. Unfortunately, existing abstention strategies are ill-suited for the multi-horizon setting: they target problems where a model offers a single prediction for each instance. Hence, they ignore the structured and correlated nature of the predictions offered by a multi-horizon forecaster. We formalize the problem of learning with abstention for multi-horizon forecasting setting and show that its structured nature admits a richer set of abstention problems. Concretely, we propose three natural notions of how a model could abstain for multi-horizon forecasting. We theoretically analyze each problem to derive the optimal abstention strategy and propose an algorithm that implements it. Extensive evaluation on 24 datasets shows that our proposed algorithms significantly outperforms existing baselines.
【5】SAFE: Stable Alignment Finetuning with Entropy-Aware Predictive Control for RLHF
标题:SAFE:RL HF的稳定对准微调和感知熵预测控制
链接:https://arxiv.org/abs/2602.04651
作者:Dipan Maity
摘要:Optimization (PPO) has been positioned by recent literature as the canonical method for the RL part of RLHF. PPO performs well empirically but has a heuristic motivation and handles the KL-divergence constraint used in LM-RLHF in an ad-hoc manner and suffers form reward oscillations, entropy collapse, value function drift, and sudden policy divergence that require frequent restarts and extensive hyperparameter tuning. In this paper, we develop a new pure on policy actor-critic RL method for the LM-RLHF setting. We present SAFE (Stable Alignment Finetuning with Entropy-aware control),a novel RLHF algorithm that combines a Double Soft-Min Critic for pessimistic value estimation with a new multi-layer stabilization framework combining entropy-gated KL regulation, and PID-controlled adaptive thresholds. Unlike standard PPO's symmetric KL penalties, SAFE distinguishes high-entropy exploration from low-entropy mode collapse and adjusts penalties dynamically based on reward velocity. Experiments on a 3B parameter model show SAFE achieves +5.15\% training-average reward than PPO (0.725 vs 0.689), negligible reward crashes, and superior KL control than ppo . Our method adds minimal computational overhead and provides an interpretable, crash-resistant RLHF framework that maintains aggressive learning speed while ensuring stable long-horizon optimization suitable for production deployment. Code is available at https://github.com/ryyzn9/SAFE
【6】MTS-JEPA: Multi-Resolution Joint-Embedding Predictive Architecture for Time-Series Anomaly Prediction
标题:MTS-JEPA:用于时间序列异常预测的多分辨率联合嵌入预测架构
链接:https://arxiv.org/abs/2602.04643
作者:Yanan He,Yunshi Wen,Xin Wang,Tengfei Ma
摘要:Multivariate time series underpin modern critical infrastructure, making the prediction of anomalies a vital necessity for proactive risk mitigation. While Joint-Embedding Predictive Architectures (JEPA) offer a promising framework for modeling the latent evolution of these systems, their application is hindered by representation collapse and an inability to capture precursor signals across varying temporal scales. To address these limitations, we propose MTS-JEPA, a specialized architecture that integrates a multi-resolution predictive objective with a soft codebook bottleneck. This design explicitly decouples transient shocks from long-term trends, and utilizes the codebook to capture discrete regime transitions. Notably, we find this constraint also acts as an intrinsic regularizer to ensure optimization stability. Empirical evaluations on standard benchmarks confirm that our approach effectively prevents degenerate solutions and achieves state-of-the-art performance under the early-warning protocol.
【7】Efficient Equivariant High-Order Crystal Tensor Prediction via Cartesian Local-Environment Many-Body Coupling
标题:通过Cartesian局部环境多体耦合进行高效等变高次晶体张量预测
链接:https://arxiv.org/abs/2602.04323
作者:Dian Jin,Yancheng Yuan,Xiaoming Tao
摘要:End-to-end prediction of high-order crystal tensor properties from atomic structures remains challenging: while spherical-harmonic equivariant models are expressive, their Clebsch-Gordan tensor products incur substantial compute and memory costs for higher-order targets. We propose the Cartesian Environment Interaction Tensor Network (CEITNet), an approach that constructs a multi-channel Cartesian local environment tensor for each atom and performs flexible many-body mixing via a learnable channel-space interaction. By performing learning in channel space and using Cartesian tensor bases to assemble equivariant outputs, CEITNet enables efficient construction of high-order tensor. Across benchmark datasets for order-2 dielectric, order-3 piezoelectric, and order-4 elastic tensor prediction, CEITNet surpasses prior high-order prediction methods on key accuracy criteria while offering high computational efficiency.
【8】Partition Trees: Conditional Density Estimation over General Outcome Spaces
标题:分区树:一般结果空间上的条件密度估计
链接:https://arxiv.org/abs/2602.04042
作者:Felipe Angelim,Alessandro Leite
备注:Code available at https://github.com/felipeangelimvieira/partition_tree
摘要:We propose Partition Trees, a tree-based framework for conditional density estimation over general outcome spaces, supporting both continuous and categorical variables within a unified formulation. Our approach models conditional distributions as piecewise-constant densities on data adaptive partitions and learns trees by directly minimizing conditional negative log-likelihood. This yields a scalable, nonparametric alternative to existing probabilistic trees that does not make parametric assumptions about the target distribution. We further introduce Partition Forests, an ensemble extension obtained by averaging conditional densities. Empirically, we demonstrate improved probabilistic prediction over CART-style trees and competitive or superior performance compared to state-of-the-art probabilistic tree methods and Random Forests, along with robustness to redundant features and heteroscedastic noise.
【9】Child Mortality Prediction in Bangladesh: A Decade-Long Validation Study
标题:孟加拉国儿童死亡率预测:一项长达十年的验证研究
链接:https://arxiv.org/abs/2602.03957
作者:Md Muhtasim Munif Fahim,Md Rezaul Karim
摘要
:The predictive machine learning models for child mortality tend to be inaccurate when applied to future populations, since they suffer from look-ahead bias due to the randomization used in cross-validation. The Demographic and Health Surveys (DHS) data from Bangladesh for 2011-2022, with n = 33,962, are used in this paper. We trained the model on (2011-2014) data, validated it on 2017 data, and tested it on 2022 data. Eight years after the initial test of the model, a genetic algorithm-based Neural Architecture Search found a single-layer neural architecture (with 64 units) to be superior to XGBoost (AUROC = 0.76 vs. 0.73; p < 0.01). Additionally, through a detailed fairness audit, we identified an overall "Socioeconomic Predictive Gradient," with a positive correlation between regional poverty level (r = -0.62) and the algorithm's AUC. In addition, we found that the model performed at its highest levels in the least affluent divisions (AUC 0.74) and decreased dramatically in the wealthiest divisions (AUC 0.66). These findings suggest that the model is identifying areas with the greatest need for intervention. Our model would identify approximately 1300 additional at-risk children annually than a Gradient Boosting model when screened at the 10% level and validated using SHAP values and Platt Calibration, and therefore provide a robust, production-ready computational phenotype for targeted maternal and child health interventions.
【10】Echo State Networks for Time Series Forecasting: Hyperparameter Sweep and Benchmarking
标题:用于时间序列预测的回声状态网络:超参数扫描和基准测试
链接:https://arxiv.org/abs/2602.03912
作者:Alexander Häußer
摘要:This paper investigates the forecasting performance of Echo State Networks (ESNs) for univariate time series forecasting using a subset of the M4 Forecasting Competition dataset. Focusing on monthly and quarterly time series with at most 20 years of historical data, we evaluate whether a fully automatic, purely feedback-driven ESN can serve as a competitive alternative to widely used statistical forecasting methods. The study adopts a rigorous two-stage evaluation approach: a Parameter dataset is used to conduct an extensive hyperparameter sweep covering leakage rate, spectral radius, reservoir size, and information criteria for regularization, resulting in over four million ESN model fits; a disjoint Forecast dataset is then used for out-of-sample accuracy assessment. Forecast accuracy is measured using MASE and sMAPE and benchmarked against simple benchmarks like drift and seasonal naive and statistical models like ARIMA, ETS, and TBATS. The hyperparameter analysis reveals consistent and interpretable patterns, with monthly series favoring moderately persistent reservoirs and quarterly series favoring more contractive dynamics. Across both frequencies, high leakage rates are preferred, while optimal spectral radii and reservoir sizes vary with temporal resolution. In the out-of-sample evaluation, the ESN performs on par with ARIMA and TBATS for monthly data and achieves the lowest mean MASE for quarterly data, while requiring lower computational cost than the more complex statistical models. Overall, the results demonstrate that ESNs offer a compelling balance between predictive accuracy, robustness, and computational efficiency, positioning them as a practical option for automated time series forecasting.
【11】Conditional Counterfactual Mean Embeddings: Doubly Robust Estimation and Learning Rates
标题:条件反事实均值嵌入:双稳健估计和学习率
链接:https://arxiv.org/abs/2602.04736
作者:Thatchanon Anancharoenkij,Donlapark Ponnoprat
备注:Code is available at https://github.com/donlap/Conditional-Counterfactual-Mean-Embeddings
摘要:A complete understanding of heterogeneous treatment effects involves characterizing the full conditional distribution of potential outcomes. To this end, we propose the Conditional Counterfactual Mean Embeddings (CCME), a framework that embeds conditional distributions of counterfactual outcomes into a reproducing kernel Hilbert space (RKHS). Under this framework, we develop a two-stage meta-estimator for CCME that accommodates any RKHS-valued regression in each stage. Based on this meta-estimator, we develop three practical CCME estimators: (1) Ridge Regression estimator, (2) Deep Feature estimator that parameterizes the feature map by a neural network, and (3) Neural-Kernel estimator that performs RKHS-valued regression, with the coefficients parameterized by a neural network. We provide finite-sample convergence rates for all estimators, establishing that they possess the double robustness property. Our experiments demonstrate that our estimators accurately recover distributional features including multimodal structure of conditional counterfactual distributions.
【12】Thermodynamic assessment of machine learning models for solid-state synthesis prediction
标题:用于固态合成预测的机器学习模型的热力学评估
链接:https://arxiv.org/abs/2602.04075
作者:Jane Schlesinger,Simon Hjaltason,Nathan J. Szymanski,Christopher J. Bartel
摘要:Machine learning models have recently emerged to predict whether hypothetical solid-state materials can be synthesized. These models aim to circumvent direct first-principles modeling of solid-state phase transformations, instead learning from large databases of successfully synthesized materials. Here, we assess the alignment of several recently introduced synthesis prediction models with material and reaction thermodynamics, quantified by the energy with respect to the convex hull and a metric accounting for thermodynamic selectivity of enumerated synthesis reactions. A dataset of successful synthesis recipes was used to determine the likely bounds on both quantities beyond which materials can be deemed unlikely to be synthesized. With these bounds as context, thermodynamic quantities were computed using the CHGNet foundation potential for thousands of new hypothetical materials generated using the Chemeleon generative model. Four recently published machine learning models for synthesizability prediction were applied to this same dataset, and the resultant predictions were considered against computed thermodynamics. We find these models generally overpredict the likelihood of synthesis, but some model scores do trend with thermodynamic heuristics, assigning lower scores to materials that are less stable or do not have an available synthesis recipe that is calculated to be thermodynamically selective. In total, this work identifies existing gaps in machine learning models for materials synthesis and introduces a new approach to assess their quality in the absence of extensive negative examples (failed syntheses).
【13】Privacy utility trade offs for parameter estimation in degree heterogeneous higher order networks
标题:度异类更高级网络参数估计的隐私效用权衡
链接:https://arxiv.org/abs/2602.03948
作者:Bibhabasu Mandal,Sagnik Nandy
摘要
:In sensitive applications involving relational datasets, protecting information about individual links from adversarial queries is of paramount importance. In many such settings, the available data are summarized solely through the degrees of the nodes in the network. We adopt the $β$ model, which is the prototypical statistical model adopted for this form of aggregated relational information, and study the problem of minimax-optimal parameter estimation under both local and central differential privacy constraints. We establish finite sample minimax lower bounds that characterize the precise dependence of the estimation risk on the network size and the privacy parameters, and we propose simple estimators that achieve these bounds up to constants and logarithmic factors under both local and central differential privacy frameworks. Our results provide the first comprehensive finite sample characterization of privacy utility trade offs for parameter estimation in $β$ models, addressing the classical graph case and extending the analysis to higher order hypergraph models. We further demonstrate the effectiveness of our methods through experiments on synthetic data and a real world communication network.
【14】A Hitchhiker's Guide to Poisson Gradient Estimation
标题:Poisson梯度估计搭便车指南
链接:https://arxiv.org/abs/2602.03896
作者:Michael Ibrahim,Hanqi Zhao,Eli Sennesh,Zhi Li,Anqi Wu,Jacob L. Yates,Chengrui Li,Hadi Vafaii
备注:Code: https://github.com/hadivafaii/PoissonGradientEstimation
摘要:Poisson-distributed latent variable models are widely used in computational neuroscience, but differentiating through discrete stochastic samples remains challenging. Two approaches address this: Exponential Arrival Time (EAT) simulation and Gumbel-SoftMax (GSM) relaxation. We provide the first systematic comparison of these methods, along with practical guidance for practitioners. Our main technical contribution is a modification to the EAT method that theoretically guarantees an unbiased first moment (exactly matching the firing rate), and reduces second-moment bias. We evaluate these methods on their distributional fidelity, gradient quality, and performance on two tasks: (1) variational autoencoders with Poisson latents, and (2) partially observable generalized linear models, where latent neural connectivity must be inferred from observed spike trains. Across all metrics, our modified EAT method exhibits better overall performance (often comparable to exact gradients), and substantially higher robustness to hyperparameter choices. Together, our results clarify the trade-offs between these methods and offer concrete recommendations for practitioners working with Poisson latent variable models.
其他神经网络|深度学习|模型|建模(43篇)
【1】Reinforced Attention Learning
标题:加强注意力学习
链接:https://arxiv.org/abs/2602.04884
作者:Bangzheng Li,Jianmo Ni,Chen Qu,Ian Miao,Liu Yang,Xingyu Fu,Muhao Chen,Derek Zhiyuan Cheng
摘要:Post-training with Reinforcement Learning (RL) has substantially improved reasoning in Large Language Models (LLMs) via test-time scaling. However, extending this paradigm to Multimodal LLMs (MLLMs) through verbose rationales yields limited gains for perception and can even degrade performance. We propose Reinforced Attention Learning (RAL), a policy-gradient framework that directly optimizes internal attention distributions rather than output token sequences. By shifting optimization from what to generate to where to attend, RAL promotes effective information allocation and improved grounding in complex multimodal inputs. Experiments across diverse image and video benchmarks show consistent gains over GRPO and other baselines. We further introduce On-Policy Attention Distillation, demonstrating that transferring latent attention behaviors yields stronger cross-modal alignment than standard knowledge distillation. Our results position attention policies as a principled and general alternative for multimodal post-training.
【2】Contrastive Continual Learning for Model Adaptability in Internet of Things
标题:物联网模型适应性的对比持续学习
链接:https://arxiv.org/abs/2602.04881
作者:Ajesh Koyatan Chathoth
摘要:Internet of Things (IoT) deployments operate in nonstationary, dynamic environments where factors such as sensor drift, evolving user behavior, and heterogeneous user privacy requirements can affect application utility. Continual learning (CL) addresses this by adapting models over time without catastrophic forgetting. Meanwhile, contrastive learning has emerged as a powerful representation-learning paradigm that improves robustness and sample efficiency in a self-supervised manner. This paper reviews the usage of \emph{contrastive continual learning} (CCL) for IoT, connecting algorithmic design (replay, regularization, distillation, prompts) with IoT system realities (TinyML constraints, intermittent connectivity, privacy). We present a unifying problem formulation, derive common objectives that blend contrastive and distillation losses, propose an IoT-oriented reference architecture for on-device, edge, and cloud-based CCL, and provide guidance on evaluation protocols and metrics. Finally, we highlight open unique challenges with respect to the IoT domain, such as spanning tabular and streaming IoT data, concept drift, federated settings, and energy-aware training.
【3】From Evaluation to Design: Using Potential Energy Surface Smoothness Metrics to Guide Machine Learning Interatomic Potential Architectures
标题:从评估到设计:使用势能表面光滑度指标来指导机器学习原子间势架构
链接:https://arxiv.org/abs/2602.04861
作者:Ryan Liu,Eric Qu,Tobias Kreiman,Samuel M. Blau,Aditi S. Krishnapriyan
备注:13 pages main text, 10 pages reference & appendix, 8 figures
摘要:Machine Learning Interatomic Potentials (MLIPs) sometimes fail to reproduce the physical smoothness of the quantum potential energy surface (PES), leading to erroneous behavior in downstream simulations that standard energy and force regression evaluations can miss. Existing evaluations, such as microcanonical molecular dynamics (MD), are computationally expensive and primarily probe near-equilibrium states. To improve evaluation metrics for MLIPs, we introduce the Bond Smoothness Characterization Test (BSCT). This efficient benchmark probes the PES via controlled bond deformations and detects non-smoothness, including discontinuities, artificial minima, and spurious forces, both near and far from equilibrium. We show that BSCT correlates strongly with MD stability while requiring a fraction of the cost of MD. To demonstrate how BSCT can guide iterative model design, we utilize an unconstrained Transformer backbone as a testbed, illustrating how refinements such as a new differentiable $k$-nearest neighbors algorithm and temperature-controlled attention reduce artifacts identified by our metric. By optimizing model design systematically based on BSCT, the resulting MLIP simultaneously achieves a low conventional E/F regression error, stable MD simulations, and robust atomistic property predictions. Our results establish BSCT as both a validation metric and as an "in-the-loop" model design proxy that alerts MLIP developers to physical challenges that cannot be efficiently evaluated by current MLIP benchmarks.
【4】Evolving Afferent Architectures: Biologically-inspired Models for Damage-Avoidance Learning
标题:不断发展的输入建筑:生物启发的避免损害学习模型
链接:https://arxiv.org/abs/2602.04807
作者:Wolfgang Maass,Sabine Janzen,Prajvi Saxena,Sach Mukherjee
备注:16 pages, 6 figures
摘要:We introduce Afferent Learning, a framework that produces Computational Afferent Traces (CATs) as adaptive, internal risk signals for damage-avoidance learning. Inspired by biological systems, the framework uses a two-level architecture: evolutionary optimization (outer loop) discovers afferent sensing architectures that enable effective policy learning, while reinforcement learning (inner loop) trains damage-avoidance policies using these signals. This formalizes afferent sensing as providing an inductive bias for efficient learning: architectures are selected based on their ability to enable effective learning (rather than directly minimizing damage). We provide theoretical convergence guarantees under smoothness and bounded-noise assumptions. We illustrate the general approach in the challenging context of biomechanical digital twins operating over long time horizons (multiple decades of the life-course). Here, we find that CAT-based evolved architectures achieve significantly higher efficiency and better age-robustness than hand-designed baselines, enabling policies that exhibit age-dependent behavioral adaptation (23% reduction in high-risk actions). Ablation studies validate CAT signals, evolution, and predictive discrepancy as essential. We release code and data for reproducibility.
【5】Dynamical Regimes of Multimodal Diffusion Models
标题:多峰扩散模型的动力学区域
链接:https://arxiv.org/abs/2602.04780
作者:Emil Albrychiewicz,Andrés Franco Valiente,Li-Ching Chen
备注:40 pages, 14 figures
摘要:Diffusion based generative models have achieved unprecedented fidelity in synthesizing high dimensional data, yet the theoretical mechanisms governing multimodal generation remain poorly understood. Here, we present a theoretical framework for coupled diffusion models, using coupled Ornstein-Uhlenbeck processes as a tractable model. By using the nonequilibrium statistical physics of dynamical phase transitions, we demonstrate that multimodal generation is governed by a spectral hierarchy of interaction timescales rather than simultaneous resolution. A key prediction is the ``synchronization gap'', a temporal window during the reverse generative process where distinct eigenmodes stabilize at different rates, providing a theoretical explanation for common desynchronization artifacts. We derive analytical conditions for speciation and collapse times under both symmetric and anisotropic coupling regimes, establishing strict bounds for coupling strength to avoid unstable symmetry breaking. We show that the coupling strength acts as a spectral filter that enforces a tunable temporal hierarchy on generation. We support these predictions through controlled experiments with diffusion models trained on MNIST datasets and exact score samplers. These results motivate time dependent coupling schedules that target mode specific timescales, offering a potential alternative to ad hoc guidance tuning.
【6】Generative Modeling via Drifting
标题:通过漂移进行生成建模
链接:https://arxiv.org/abs/2602.04770
作者:Mingyang Deng,He Li,Tianhong Li,Yilun Du,Kaiming He
备注:Project page: https://lambertae.github.io/projects/drifting/
摘要:Generative modeling can be formulated as learning a mapping f such that its pushforward distribution matches the data distribution. The pushforward behavior can be carried out iteratively at inference time, for example in diffusion and flow-based models. In this paper, we propose a new paradigm called Drifting Models, which evolve the pushforward distribution during training and naturally admit one-step inference. We introduce a drifting field that governs the sample movement and achieves equilibrium when the distributions match. This leads to a training objective that allows the neural network optimizer to evolve the distribution. In experiments, our one-step generator achieves state-of-the-art results on ImageNet at 256 x 256 resolution, with an FID of 1.54 in latent space and 1.61 in pixel space. We hope that our work opens up new opportunities for high-quality one-step generation.
【7】Finding Structure in Continual Learning
标题:在持续学习中寻找结构
链接:https://arxiv.org/abs/2602.04555
作者:Pourya Shamsolmoali,Masoumeh Zareapoor
备注:Submitted to NeurIPS 2025
摘要:Learning from a stream of tasks usually pits plasticity against stability: acquiring new knowledge often causes catastrophic forgetting of past information. Most methods address this by summing competing loss terms, creating gradient conflicts that are managed with complex and often inefficient strategies such as external memory replay or parameter regularization. We propose a reformulation of the continual learning objective using Douglas-Rachford Splitting (DRS). This reframes the learning process not as a direct trade-off, but as a negotiation between two decoupled objectives: one promoting plasticity for new tasks and the other enforcing stability of old knowledge. By iteratively finding a consensus through their proximal operators, DRS provides a more principled and stable learning dynamic. Our approach achieves an efficient balance between stability and plasticity without the need for auxiliary modules or complex add-ons, providing a simpler yet more powerful paradigm for continual learning systems.
【8】Gradient Flow Through Diagram Expansions: Learning Regimes and Explicit Solutions
标题:通过图表展开的梯度流动:学习机制和显式解决方案
链接:https://arxiv.org/abs/2602.04548
作者:Dmitry Yarotsky,Eugene Golikov,Yaroslav Gusev
备注:48 pages, under review for ICML'2026
摘要:We develop a general mathematical framework to analyze scaling regimes and derive explicit analytic solutions for gradient flow (GF) in large learning problems. Our key innovation is a formal power series expansion of the loss evolution, with coefficients encoded by diagrams akin to Feynman diagrams. We show that this expansion has a well-defined large-size limit that can be used to reveal different learning phases and, in some cases, to obtain explicit solutions of the nonlinear GF. We focus on learning Canonical Polyadic (CP) decompositions of high-order tensors, and show that this model has several distinct extreme lazy and rich GF regimes such as free evolution, NTK and under- and over-parameterized mean-field. We show that these regimes depend on the parameter scaling, tensor order, and symmetry of the model in a specific and subtle way. Moreover, we propose a general approach to summing the formal loss expansion by reducing it to a PDE; in a wide range of scenarios, it turns out to be 1st order and solvable by the method of characteristics. We observe a very good agreement of our theoretical predictions with experiment.
【9】Continual Learning through Control Minimization
标题:通过控制最小化持续学习
链接:https://arxiv.org/abs/2602.04542
作者:Sander de Haan,Yassine Taoudi-Benchekroun,Pau Vilimelis Aceituno,Benjamin F. Grewe
摘要:Catastrophic forgetting remains a fundamental challenge for neural networks when tasks are trained sequentially. In this work, we reformulate continual learning as a control problem where learning and preservation signals compete within neural activity dynamics. We convert regularization penalties into preservation signals that protect prior-task representations. Learning then proceeds by minimizing the control effort required to integrate new tasks while competing with the preservation of prior tasks. At equilibrium, the neural activities produce weight updates that implicitly encode the full prior-task curvature, a property we term the continual-natural gradient, requiring no explicit curvature storage. Experiments confirm that our learning framework recovers true prior-task curvature and enables task discrimination, outperforming existing methods on standard benchmarks without replay.
【10】RASA: Routing-Aware Safety Alignment for Mixture-of-Experts Models
标题:RASA:专家混合模型的竞争意识安全调整
链接:https://arxiv.org/abs/2602.04448
作者:Jiacheng Liang,Yuhui Wang,Tanqiu Jiang,Ting Wang
备注:9 pages
摘要:Mixture-of-Experts (MoE) language models introduce unique challenges for safety alignment due to their sparse routing mechanisms, which can enable degenerate optimization behaviors under standard full-parameter fine-tuning. In our preliminary experiments, we observe that naively applying full-parameter safety fine-tuning to MoE models can reduce attack success rates through routing or expert dominance effects, rather than by directly repairing Safety-Critical Experts. To address this challenge, we propose RASA, a routing-aware expert-level alignment framework that explicitly repairs Safety-Critical Experts while preventing routing-based bypasses. RASA identifies experts disproportionately activated by successful jailbreaks, selectively fine-tunes only these experts under fixed routing, and subsequently enforces routing consistency with safety-aligned contexts. Across two representative MoE architectures and a diverse set of jailbreak attacks, RASA achieves near-perfect robustness, strong cross-attack generalization, and substantially reduced over-refusal, while preserving general capabilities on benchmarks such as MMLU, GSM8K, and TruthfulQA. Our results suggest that robust MoE safety alignment benefits from targeted expert repair rather than global parameter updates, offering a practical and architecture-preserving alternative to prior approaches.
【11】Theory of Speciation Transitions in Diffusion Models with General Class Structure
标题:具有一般类结构的扩散模型中的物种转变理论
链接:https://arxiv.org/abs/2602.04404
作者:Beatrice Achilli,Marco Benedetti,Giulio Biroli,Marc Mézard
备注:17 pages, 6 figures
摘要:Diffusion Models generate data by reversing a stochastic diffusion process, progressively transforming noise into structured samples drawn from a target distribution. Recent theoretical work has shown that this backward dynamics can undergo sharp qualitative transitions, known as speciation transitions, during which trajectories become dynamically committed to data classes. Existing theoretical analyses, however, are limited to settings where classes are identifiable through first moments, such as mixtures of Gaussians with well-separated means. In this work, we develop a general theory of speciation in diffusion models that applies to arbitrary target distributions admitting well-defined classes. We formalize the notion of class structure through Bayes classification and characterize speciation times in terms of free-entropy difference between classes. This criterion recovers known results in previously studied Gaussian-mixture models, while extending to situations in which classes are not distinguishable by first moments and may instead differ through higher-order or collective features. Our framework also accommodates multiple classes and predicts the existence of successive speciation times associated with increasingly fine-grained class commitment. We illustrate the theory on two analytically tractable examples: mixtures of one-dimensional Ising models at different temperatures and mixtures of zero-mean Gaussians with distinct covariance structures. In the Ising case, we obtain explicit expressions for speciation times by mapping the problem onto a random-field Ising model and solving it via the replica method. Our results provide a unified and broadly applicable description of speciation transitions in diffusion-based generative models.
【12】SparVAR: Exploring Sparsity in Visual AutoRegressive Modeling for Training-Free Acceleration
标题:SparVAR:探索视觉自回归建模中的稀疏性以实现免训练加速
链接:https://arxiv.org/abs/2602.04361
作者:Zekun Li,Ning Wang,Tongxin Bai,Changwang Mei,Peisong Wang,Shuang Qiu,Jian Cheng
摘要
:Visual AutoRegressive (VAR) modeling has garnered significant attention for its innovative next-scale prediction paradigm. However, mainstream VAR paradigms attend to all tokens across historical scales at each autoregressive step. As the next scale resolution grows, the computational complexity of attention increases quartically with resolution, causing substantial latency. Prior accelerations often skip high-resolution scales, which speeds up inference but discards high-frequency details and harms image quality. To address these problems, we present SparVAR, a training-free acceleration framework that exploits three properties of VAR attention: (i) strong attention sinks, (ii) cross-scale activation similarity, and (iii) pronounced locality. Specifically, we dynamically predict the sparse attention pattern of later high-resolution scales from a sparse decision scale, and construct scale self-similar sparse attention via an efficient index-mapping mechanism, enabling high-efficiency sparse attention computation at large scales. Furthermore, we propose cross-scale local sparse attention and implement an efficient block-wise sparse kernel, which achieves $\mathbf{> 5\times}$ faster forward speed than FlashAttention. Extensive experiments demonstrate that the proposed SparseVAR can reduce the generation time of an 8B model producing $1024\times1024$ high-resolution images to the 1s, without skipping the last scales. Compared with the VAR baseline accelerated by FlashAttention, our method achieves a $\mathbf{1.57\times}$ speed-up while preserving almost all high-frequency details. When combined with existing scale-skipping strategies, SparseVAR attains up to a $\mathbf{2.28\times}$ acceleration, while maintaining competitive visual generation quality. Code is available at https://github.com/CAS-CLab/SparVAR.
【13】Mosaic Learning: A Framework for Decentralized Learning with Model Fragmentation
标题:马赛克学习:具有模型碎片化的去中心化学习框架
链接:https://arxiv.org/abs/2602.04352
作者:Sayan Biswas,Davide Frey,Romaric Gaudel,Nirupam Gupta,Anne-Marie Kermarrec,Dimitri Lerévérend,Rafael Pires,Rishi Sharma,François Taïani,Martijn de Vos
摘要:Decentralized learning (DL) enables collaborative machine learning (ML) without a central server, making it suitable for settings where training data cannot be centrally hosted. We introduce Mosaic Learning, a DL framework that decomposes models into fragments and disseminates them independently across the network. Fragmentation reduces redundant communication across correlated parameters and enables more diverse information propagation without increasing communication cost. We theoretically show that Mosaic Learning (i) shows state-of-the-art worst-case convergence rate, and (ii) leverages parameter correlation in an ML model, improving contraction by reducing the highest eigenvalue of a simplified system. We empirically evaluate Mosaic Learning on four learning tasks and observe up to 12 percentage points higher node-level test accuracy compared to epidemic learning (EL), a state-of-the-art baseline. In summary, Mosaic Learning improves DL performance without sacrificing its utility or efficiency, and positions itself as a new DL standard.
【14】RISE: Interactive Visual Diagnosis of Fairness in Machine Learning Models
标题:RISE:机器学习模型公平性的交互式视觉诊断
链接:https://arxiv.org/abs/2602.04339
作者:Ray Chen,Christan Grant
摘要:Evaluating fairness under domain shift is challenging because scalar metrics often obscure exactly where and how disparities arise. We introduce \textit{RISE} (Residual Inspection through Sorted Evaluation), an interactive visualization tool that converts sorted residuals into interpretable patterns. By connecting residual curve structures to formal fairness notions, RISE enables localized disparity diagnosis, subgroup comparison across environments, and the detection of hidden fairness issues. Through post-hoc analysis, RISE exposes accuracy-fairness trade-offs that aggregate statistics miss, supporting more informed model selection.
【15】Proxy Compression for Language Modeling
标题:语言建模的代理压缩
链接:https://arxiv.org/abs/2602.04289
作者:Lin Zheng,Xinyu Li,Qian Liu,Xiachong Feng,Lingpeng Kong
摘要:Modern language models are trained almost exclusively on token sequences produced by a fixed tokenizer, an external lossless compressor often over UTF-8 byte sequences, thereby coupling the model to that compressor. This work introduces proxy compression, an alternative training scheme that preserves the efficiency benefits of compressed inputs while providing an end-to-end, raw-byte interface at inference time. During training, one language model is jointly trained on raw byte sequences and compressed views generated by external compressors; through the process, the model learns to internally align compressed sequences and raw bytes. This alignment enables strong transfer between the two formats, even when training predominantly on compressed inputs which are discarded at inference. Extensive experiments on code language modeling demonstrate that proxy compression substantially improves training efficiency and significantly outperforms pure byte-level baselines given fixed compute budgets. As model scale increases, these gains become more pronounced, and proxy-trained models eventually match or rival tokenizer approaches, all while operating solely on raw bytes and retaining the inherent robustness of byte-level modeling.
【16】Convolution Operator Network for Forward and Inverse Problems (FI-Conv): Application to Plasma Turbulence Simulations
标题:正反问题卷积运算符网络(FI-Conv):在等离子体湍流模拟中的应用
链接:https://arxiv.org/abs/2602.04287
作者:Xingzhuo Chen,Anthony Poole,Ionut-Gabriel Farcas,David R. Hatch,Ulisses Braga-Neto
摘要
:We propose the Convolutional Operator Network for Forward and Inverse Problems (FI-Conv), a framework capable of predicting system evolution and estimating parameters in complex spatio-temporal dynamics, such as turbulence. FI-Conv is built on a U-Net architecture, in which most convolutional layers are replaced by ConvNeXt V2 blocks. This design preserves U-Net performance on inputs with high-frequency variations while maintaining low computational complexity. FI-Conv uses an initial state, PDE parameters, and evolution time as input to predict the system future state. As a representative example of a system exhibiting complex dynamics, we evaluate the performance of FI-Conv on the task of predicting turbulent plasma fields governed by the Hasegawa-Wakatani (HW) equations. The HW system models two-dimensional electrostatic drift-wave turbulence and exhibits strongly nonlinear behavior, making accurate approximation and long-term prediction particularly challenging. Using an autoregressive forecasting procedure, FI-Conv achieves accurate forward prediction of the plasma state evolution over short times (t ~ 3) and captures the statistic properties of derived physical quantities of interest over longer times (t ~ 100). Moreover, we develop a gradient-descent-based inverse estimation method that accurately infers PDE parameters from plasma state evolution data, without modifying the trained model weights. Collectively, our results demonstrate that FI-Conv can be an effective alternative to existing physics-informed machine learning methods for systems with complex spatio-temporal dynamics.
【17】From Dead Neurons to Deep Approximators: Deep Bernstein Networks as a Provable Alternative to Residual Layers
标题:从死亡神经元到深度逼近器:深度伯恩斯坦网络作为剩余层的可证明替代方案
链接:https://arxiv.org/abs/2602.04264
作者:Ibrahim Albool,Malak Gamal El-Din,Salma Elmalaki,Yasser Shoukry
备注:15 pages
摘要:Residual connections are the de facto standard for mitigating vanishing gradients, yet they impose structural constraints and fail to address the inherent inefficiencies of piecewise linear activations. We show that Deep Bernstein Networks (which utilizes Bernstein polynomials as activation functions) can act as residual-free architecture while simultaneously optimize trainability and representation power. We provide a two-fold theoretical foundation for our approach. First, we derive a theoretical lower bound on the local derivative, proving it remains strictly bounded away from zero. This directly addresses the root cause of gradient stagnation; empirically, our architecture reduces ``dead'' neurons from 90\% in standard deep networks to less than 5\%, outperforming ReLU, Leaky ReLU, SeLU, and GeLU. Second, we establish that the approximation error for Bernstein-based networks decays exponentially with depth, a significant improvement over the polynomial rates of ReLU-based architectures. By unifying these results, we demonstrate that Bernstein activations provide a superior mechanism for function approximation and signal flow. Our experiments on HIGGS and MNIST confirm that Deep Bernstein Networks achieve high-performance training without skip-connections, offering a principled path toward deep, residual-free architectures with enhanced expressive capacity.
【18】Cascading Robustness Verification: Toward Efficient Model-Agnostic Certification
标题:级联稳健性验证:迈向高效的模型不可知认证
链接:https://arxiv.org/abs/2602.04236
作者:Mohammadreza Maleki,Rushendra Sidibomma,Arman Adibi,Reza Samavi
摘要:Certifying neural network robustness against adversarial examples is challenging, as formal guarantees often require solving non-convex problems. Hence, incomplete verifiers are widely used because they scale efficiently and substantially reduce the cost of robustness verification compared to complete methods. However, relying on a single verifier can underestimate robustness because of loose approximations or misalignment with training methods. In this work, we propose Cascading Robustness Verification (CRV), which goes beyond an engineering improvement by exposing fundamental limitations of existing robustness metric and introducing a framework that enhances both reliability and efficiency. CRV is a model-agnostic verifier, meaning that its robustness guarantees are independent of the model's training process. The key insight behind the CRV framework is that, when using multiple verification methods, an input is certifiably robust if at least one method certifies it as robust. Rather than relying solely on a single verifier with a fixed constraint set, CRV progressively applies multiple verifiers to balance the tightness of the bound and computational cost. Starting with the least expensive method, CRV halts as soon as an input is certified as robust; otherwise, it proceeds to more expensive methods. For computationally expensive methods, we introduce a Stepwise Relaxation Algorithm (SR) that incrementally adds constraints and checks for certification at each step, thereby avoiding unnecessary computation. Our theoretical analysis demonstrates that CRV achieves equal or higher verified accuracy compared to powerful but computationally expensive incomplete verifiers in the cascade, while significantly reducing verification overhead. Empirical results confirm that CRV certifies at least as many inputs as benchmark approaches, while improving runtime efficiency by up to ~90%.
【19】LORE: Jointly Learning the Intrinsic Dimensionality and Relative Similarity Structure From Ordinal Data
标题:LORE:从有序数据中联合学习内在相似性和相对相似性结构
链接:https://arxiv.org/abs/2602.04192
作者:Vivek Anand,Alec Helbling,Mark Davenport,Gordon Berman,Sankar Alagapan,Christopher Rozell
备注:10 Pages, 31 with appendix: Accepted at ICLR 2026
摘要:Learning the intrinsic dimensionality of subjective perceptual spaces such as taste, smell, or aesthetics from ordinal data is a challenging problem. We introduce LORE (Low Rank Ordinal Embedding), a scalable framework that jointly learns both the intrinsic dimensionality and an ordinal embedding from noisy triplet comparisons of the form, "Is A more similar to B than C?". Unlike existing methods that require the embedding dimension to be set apriori, LORE regularizes the solution using the nonconvex Schatten-$p$ quasi norm, enabling automatic joint recovery of both the ordinal embedding and its dimensionality. We optimize this joint objective via an iteratively reweighted algorithm and establish convergence guarantees. Extensive experiments on synthetic datasets, simulated perceptual spaces, and real world crowdsourced ordinal judgements show that LORE learns compact, interpretable and highly accurate low dimensional embeddings that recover the latent geometry of subjective percepts. By simultaneously inferring both the intrinsic dimensionality and ordinal embeddings, LORE enables more interpretable and data efficient perceptual modeling in psychophysics and opens new directions for scalable discovery of low dimensional structure from ordinal data in machine learning.
【20】Training Data Efficiency in Multimodal Process Reward Models
标题:多模式流程奖励模型中训练数据效率
链接:https://arxiv.org/abs/2602.04145
作者:Jinyuan Li,Chengsong Huang,Langlin Huang,Shaoyang Xu,Haolin Liu,Wenxuan Zhang,Jiaxin Huang
摘要
:Multimodal Process Reward Models (MPRMs) are central to step-level supervision for visual reasoning in MLLMs. Training MPRMs typically requires large-scale Monte Carlo (MC)-annotated corpora, incurring substantial training cost. This paper studies the data efficiency for MPRM training.Our preliminary experiments reveal that MPRM training quickly saturates under random subsampling of the training data, indicating substantial redundancy within existing MC-annotated corpora.To explain this, we formalize a theoretical framework and reveal that informative gradient updates depend on two factors: label mixtures of positive/negative steps and label reliability (average MC scores of positive steps). Guided by these insights, we propose the Balanced-Information Score (BIS), which prioritizes both mixture and reliability based on existing MC signals at the rollout level, without incurring any additional cost. Across two backbones (InternVL2.5-8B and Qwen2.5-VL-7B) on VisualProcessBench, BIS-selected subsets consistently match and even surpass the full-data performance at small fractions. Notably, the BIS subset reaches full-data performance using only 10% of the training data, improving over random subsampling by a relative 4.1%.
【21】Learning to Reason in 13 Parameters
标题:学习13个参数推理
链接:https://arxiv.org/abs/2602.04118
作者:John X. Morris,Niloofar Mireshghallah,Mark Ibrahim,Saeed Mahloujifar
摘要:Recent research has shown that language models can learn to \textit{reason}, often via reinforcement learning. Some work even trains low-rank parameterizations for reasoning, but conventional LoRA cannot scale below the model dimension. We question whether even rank=1 LoRA is necessary for learning to reason and propose TinyLoRA, a method for scaling low-rank adapters to sizes as small as one parameter. Within our new parameterization, we are able to train the 8B parameter size of Qwen2.5 to 91\% accuracy on GSM8K with only 13 trained parameters in bf16 (26 total bytes). We find this trend holds in general: we are able to recover 90\% of performance improvements while training $1000x$ fewer parameters across a suite of more difficult learning-to-reason benchmarks such as AIME, AMC, and MATH500. Notably, we are only able to achieve such strong performance with RL: models trained using SFT require $100-1000x$ larger updates to reach the same performance.
【22】Turning mechanistic models into forecasters by using machine learning
标题:利用机器学习将机械模型转变为预测者
链接:https://arxiv.org/abs/2602.04114
作者:Amit K. Chakraborty,Hao Wang,Pouria Ramazi
备注:47 pages, 11 figures
摘要:The equations of complex dynamical systems may not be identified by expert knowledge, especially if the underlying mechanisms are unknown. Data-driven discovery methods address this challenge by inferring governing equations from time-series data using a library of functions constructed from the measured variables. However, these methods typically assume time-invariant coefficients, which limits their ability to capture evolving system dynamics. To overcome this limitation, we allow some of the parameters to vary over time, learn their temporal evolution directly from data, and infer a system of equations that incorporates both constant and time-varying parameters. We then transform this framework into a forecasting model by predicting the time-varying parameters and substituting these predictions into the learned equations. The model is validated using datasets for Susceptible-Infected-Recovered, Consumer--Resource, greenhouse gas concentration, and Cyanobacteria cell count. By dynamically adapting to temporal shifts, our proposed model achieved a mean absolute error below 3\% for learning a time series and below 6\% for forecasting up to a month ahead. We additionally compare forecasting performance against CNN-LSTM and Gradient Boosting Machine (GBM), and show that our model outperforms these methods across most datasets. Our findings demonstrate that integrating time-varying parameters into data-driven discovery of differential equations improves both modeling accuracy and forecasting performance.
【23】A Probabilistic Framework for Solving High-Frequency Helmholtz Equations via Diffusion Models
标题:通过扩散模型求解高频Helmholtz方程的概率框架
链接:https://arxiv.org/abs/2602.04082
作者:Yicheng Zou,Samuel Lanthaler,Hossein Salahshoor
摘要:Deterministic neural operators perform well on many PDEs but can struggle with the approximation of high-frequency wave phenomena, where strong input-to-output sensitivity makes operator learning challenging, and spectral bias blurs oscillations. We argue for adopting a probabilistic approach for approximating waves in high-frequency regime, and develop our probabilistic framework using a score-based conditional diffusion operator. After demonstrating a stability analysis of the Helmholtz operator, we present our numerical experiments across a wide range of frequencies, benchmarked against other popular data-driven and machine learning approaches for waves. We show that our probabilistic neural operator consistently produces robust predictions with the lowest errors in $L^2$, $H^1$, and energy norms. Moreover, unlike all the other tested deterministic approaches, our framework remarkably captures uncertainties in the input sound speed map propagated to the solution field. We envision that our results position probabilistic operator learning as a principled and effective approach for solving complex PDEs such as Helmholtz in the challenging high-frequency regime.
【24】Principles of Lipschitz continuity in neural networks
标题:神经网络中的Lipschitz连续性原理
链接:https://arxiv.org/abs/2602.04078
作者:Róisín Luo
备注:Ph.D. Thesis
摘要
:Deep learning has achieved remarkable success across a wide range of domains, significantly expanding the frontiers of what is achievable in artificial intelligence. Yet, despite these advances, critical challenges remain -- most notably, ensuring robustness to small input perturbations and generalization to out-of-distribution data. These critical challenges underscore the need to understand the underlying fundamental principles that govern robustness and generalization. Among the theoretical tools available, Lipschitz continuity plays a pivotal role in governing the fundamental properties of neural networks related to robustness and generalization. It quantifies the worst-case sensitivity of network's outputs to small input perturbations. While its importance is widely acknowledged, prior research has predominantly focused on empirical regularization approaches based on Lipschitz constraints, leaving the underlying principles less explored. This thesis seeks to advance a principled understanding of the principles of Lipschitz continuity in neural networks within the paradigm of machine learning, examined from two complementary perspectives: an internal perspective -- focusing on the temporal evolution of Lipschitz continuity in neural networks during training (i.e., training dynamics); and an external perspective -- investigating how Lipschitz continuity modulates the behavior of neural networks with respect to features in the input data, particularly its role in governing frequency signal propagation (i.e., modulation of frequency signal propagation).
【25】An Empirical Survey and Benchmark of Learned Distance Indexes for Road Networks
标题:道路网学习距离指数的实证调查和基准
链接:https://arxiv.org/abs/2602.04068
作者:Gautam Choudhary,Libin Zhou,Yeasir Rayhan,Walid G. Aref
备注:Preprint (Under Review). 14 pages, 2 figures
摘要:The calculation of shortest-path distances in road networks is a core operation in navigation systems, location-based services, and spatial analytics. Although classical algorithms, e.g., Dijkstra's algorithm, provide exact answers, their latency is prohibitive for modern real-time, large-scale deployments. Over the past two decades, numerous distance indexes have been proposed to speed up query processing for shortest distance queries. More recently, with the advancement in machine learning (ML), researchers have designed and proposed ML-based distance indexes to answer approximate shortest path and distance queries efficiently. However, a comprehensive and systematic evaluation of these ML-based approaches is lacking. This paper presents the first empirical survey of ML-based distance indexes on road networks, evaluating them along four key dimensions: Training time, query latency, storage, and accuracy. Using seven real-world road networks and workload-driven query datasets derived from trajectory data, we benchmark ten representative ML techniques and compare them against strong classical non-ML baselines, highlighting key insights and practical trade-offs. We release a unified open-source codebase to support reproducibility and future research on learned distance indexes.
【26】PluRel: Synthetic Data unlocks Scaling Laws for Relational Foundation Models
标题:PluRel:合成数据解开关系基础模型的缩放定律
链接:https://arxiv.org/abs/2602.04029
作者:Vignesh Kothapalli,Rishabh Ranjan,Valter Hudovernik,Vijay Prakash Dwivedi,Johannes Hoffart,Carlos Guestrin,Jure Leskovec
备注:Code: https://github.com/snap-stanford/plurel
摘要:Relational Foundation Models (RFMs) facilitate data-driven decision-making by learning from complex multi-table databases. However, the diverse relational databases needed to train such models are rarely public due to privacy constraints. While there are methods to generate synthetic tabular data of arbitrary size, incorporating schema structure and primary--foreign key connectivity for multi-table generation remains challenging. Here we introduce PluRel, a framework to synthesize multi-tabular relational databases from scratch. In a step-by-step fashion, PluRel models (1) schemas with directed graphs, (2) inter-table primary-foreign key connectivity with bipartite graphs, and, (3) feature distributions in tables via conditional causal mechanisms. The design space across these stages supports the synthesis of a wide range of diverse databases, while being computationally lightweight. Using PluRel, we observe for the first time that (1) RFM pretraining loss exhibits power-law scaling with the number of synthetic databases and total pretraining tokens, (2) scaling the number of synthetic databases improves generalization to real databases, and (3) synthetic pretraining yields strong base models for continued pretraining on real databases. Overall, our framework and results position synthetic data scaling as a promising paradigm for RFMs.
【27】Group Contrastive Learning for Weakly Paired Multimodal Data
标题:弱配对多峰数据的群体对比学习
链接:https://arxiv.org/abs/2602.04021
作者:Aditya Gorla,Hugues Van Assel,Jan-Christian Huetter,Heming Yao,Kyunghyun Cho,Aviv Regev,Russell Littman
摘要:We present GROOVE, a semi-supervised multi-modal representation learning approach for high-content perturbation data where samples across modalities are weakly paired through shared perturbation labels but lack direct correspondence. Our primary contribution is GroupCLIP, a novel group-level contrastive loss that bridges the gap between CLIP for paired cross-modal data and SupCon for uni-modal supervised contrastive learning, addressing a fundamental gap in contrastive learning for weakly-paired settings. We integrate GroupCLIP with an on-the-fly backtranslating autoencoder framework to encourage cross-modally entangled representations while maintaining group-level coherence within a shared latent space. Critically, we introduce a comprehensive combinatorial evaluation framework that systematically assesses representation learners across multiple optimal transport aligners, addressing key limitations in existing evaluation strategies. This framework includes novel simulations that systematically vary shared versus modality-specific perturbation effects enabling principled assessment of method robustness. Our combinatorial benchmarking reveals that there is not yet an aligner that uniformly dominates across settings or modality pairs. Across simulations and two real single-cell genetic perturbation datasets, GROOVE performs on par with or outperforms existing approaches for downstream cross-modal matching and imputation tasks. Our ablation studies demonstrate that GroupCLIP is the key component driving performance gains. These results highlight the importance of leveraging group-level constraints for effective multi-modal representation learning in scenarios where only weak pairing is available.
【28】PromptSplit: Revealing Prompt-Level Disagreement in Generative Models
标题:EntSplit:揭示生成模型中预算级别的分歧
链接:https://arxiv.org/abs/2602.04009
作者:Mehdi Lotfian,Mohammad Jalali,Farzan Farnia
摘要
:Prompt-guided generative AI models have rapidly expanded across vision and language domains, producing realistic and diverse outputs from textual inputs. The growing variety of such models, trained with different data and architectures, calls for principled methods to identify which types of prompts lead to distinct model behaviors. In this work, we propose PromptSplit, a kernel-based framework for detecting and analyzing prompt-dependent disagreement between generative models. For each compared model pair, PromptSplit constructs a joint prompt--output representation by forming tensor-product embeddings of the prompt and image (or text) features, and then computes the corresponding kernel covariance matrix. We utilize the eigenspace of the weighted difference between these matrices to identify the main directions of behavioral difference across prompts. To ensure scalability, we employ a random-projection approximation that reduces computational complexity to $O(nr^2 + r^3)$ for projection dimension $r$. We further provide a theoretical analysis showing that this approximation yields an eigenstructure estimate whose expected deviation from the full-dimensional result is bounded by $O(1/r^2)$. Experiments across text-to-image, text-to-text, and image-captioning settings demonstrate that PromptSplit accurately detects ground-truth behavioral differences and isolates the prompts responsible, offering an interpretable tool for detecting where generative models disagree.
【29】Rational ANOVA Networks
标题:理性方差分析网络
链接:https://arxiv.org/abs/2602.04006
作者:Jusheng Zhang,Ningyuan Liu,Qinhan Lyu,Jing Yang,Keze Wang
备注:Code: \url{https://github.com/jushengzhang/Rational-ANOVA-Networks.git}
摘要:Deep neural networks typically treat nonlinearities as fixed primitives (e.g., ReLU), limiting both interpretability and the granularity of control over the induced function class. While recent additive models (like KANs) attempt to address this using splines, they often suffer from computational inefficiency and boundary instability. We propose the Rational-ANOVA Network (RAN), a foundational architecture grounded in functional ANOVA decomposition and Padé-style rational approximation. RAN models f(x) as a composition of main effects and sparse pairwise interactions, where each component is parameterized by a stable, learnable rational unit. Crucially, we enforce a strictly positive denominator, which avoids poles and numerical instability while capturing sharp transitions and near-singular behaviors more efficiently than polynomial bases. This ANOVA structure provides an explicit low-order interaction bias for data efficiency and interpretability, while the rational parameterization significantly improves extrapolation. Across controlled function benchmarks and vision classification tasks (e.g., CIFAR-10) under matched parameter and compute budgets, RAN matches or surpasses parameter-matched MLPs and learnable-activation baselines, with better stability and throughput. Code is available at https://github.com/jushengzhang/Rational-ANOVA-Networks.git.
【30】Grables: Tabular Learning Beyond Independent Rows
标题:Grables:超越独立收件箱的表格学习
链接:https://arxiv.org/abs/2602.03945
作者:Tamara Cucumides,Floris Geerts
摘要:Tabular learning is still dominated by row-wise predictors that score each row independently, which fits i.i.d. benchmarks but fails on transactional, temporal, and relational tables where labels depend on other rows. We show that row-wise prediction rules out natural targets driven by global counts, overlaps, and relational patterns. To make "using structure" precise across architectures, we introduce grables: a modular interface that separates how a table is lifted to a graph (constructor) from how predictions are computed on that graph (node predictor), pinpointing where expressive power comes from. Experiments on synthetic tasks, transaction data, and a RelBench clinical-trials dataset confirm the predicted separations: message passing captures inter-row dependencies that row-local models miss, and hybrid approaches that explicitly extract inter-row structure and feed it to strong tabular learners yield consistent gains.
【31】Phaedra: Learning High-Fidelity Discrete Tokenization for the Physical Science
标题:Phaedra:学习物理科学的高保真离散令牌化
链接:https://arxiv.org/abs/2602.03915
作者:Levi Lingsch,Georgios Kissas,Johannes Jakubik,Siddhartha Mishra
备注:57 pages, 27 figures
摘要:Tokens are discrete representations that allow modern deep learning to scale by transforming high-dimensional data into sequences that can be efficiently learned, generated, and generalized to new tasks. These have become foundational for image and video generation and, more recently, physical simulation. As existing tokenizers are designed for the explicit requirements of realistic visual perception of images, it is necessary to ask whether these approaches are optimal for scientific images, which exhibit a large dynamic range and require token embeddings to retain physical and spectral properties. In this work, we investigate the accuracy of a suite of image tokenizers across a range of metrics designed to measure the fidelity of PDE properties in both physical and spectral space. Based on the observation that these struggle to capture both fine details and precise magnitudes, we propose Phaedra, inspired by classical shape-gain quantization and proper orthogonal decomposition. We demonstrate that Phaedra consistently improves reconstruction across a range of PDE datasets. Additionally, our results show strong out-of-distribution generalization capabilities to three tasks of increasing complexity, namely known PDEs with different conditions, unknown PDEs, and real-world Earth observation and weather data.
【32】The Role of Target Update Frequencies in Q-Learning
标题:目标更新频率在Q学习中的作用
链接:https://arxiv.org/abs/2602.03911
作者:Simon Weissmann,Tilman Aach,Benedikt Wille,Sebastian Kassing,Leif Döring
摘要:The target network update frequency (TUF) is a central stabilization mechanism in (deep) Q-learning. However, their selection remains poorly understood and is often treated merely as another tunable hyperparameter rather than as a principled design decision. This work provides a theoretical analysis of target fixing in tabular Q-learning through the lens of approximate dynamic programming. We formulate periodic target updates as a nested optimization scheme in which each outer iteration applies an inexact Bellman optimality operator, approximated by a generic inner loop optimizer. Rigorous theory yields a finite-time convergence analysis for the asynchronous sampling setting, specializing to stochastic gradient descent in the inner loop. Our results deliver an explicit characterization of the bias-variance trade-off induced by the target update period, showing how to optimally set this critical hyperparameter. We prove that constant target update schedules are suboptimal, incurring a logarithmic overhead in sample complexity that is entirely avoidable with adaptive schedules. Our analysis shows that the optimal target update frequency increases geometrically over the course of the learning process.
【33】TruKAN: Towards More Efficient Kolmogorov-Arnold Networks Using Truncated Power Functions
标题:TruKAN:使用截短功率函数实现更高效的Kolmogorov-Arnold网络
链接:https://arxiv.org/abs/2602.03879
作者:Ali Bayeh,Samira Sadaoui,Malek Mouhoub
备注:23 pages, 9 figures
摘要:To address the trade-off between computational efficiency and adherence to Kolmogorov-Arnold Network (KAN) principles, we propose TruKAN, a new architecture based on the KAN structure and learnable activation functions. TruKAN replaces the B-spline basis in KAN with a family of truncated power functions derived from k-order spline theory. This change maintains the KAN's expressiveness while enhancing accuracy and training time. Each TruKAN layer combines a truncated power term with a polynomial term and employs either shared or individual knots. TruKAN exhibits greater interpretability than other KAN variants due to its simplified basis functions and knot configurations. By prioritizing interpretable basis functions, TruKAN aims to balance approximation efficacy with transparency. We develop the TruKAN model and integrate it into an advanced EfficientNet-V2-based framework, which is then evaluated on computer vision benchmark datasets. To ensure a fair comparison, we develop various models: MLP-, KAN-, SineKAN and TruKAN-based EfficientNet frameworks and assess their training time and accuracy across small and deep architectures. The training phase uses hybrid optimization to improve convergence stability. Additionally, we investigate layer normalization techniques for all the models and assess the impact of shared versus individual knots in TruKAN. Overall, TruKAN outperforms other KAN models in terms of accuracy, computational efficiency and memory usage on the complex vision task, demonstrating advantages beyond the limited settings explored in prior KAN studies.
【34】Reversible Deep Learning for 13C NMR in Chemoinformatics: On Structures and Spectra
标题:化学信息学中13 C核磁共振的可逆深度学习:结构和光谱
链接:https://arxiv.org/abs/2602.03875
作者:Stefan Kuhn,Vandana Dwarka,Przemyslaw Karol Grenda,Eero Vainikko
备注:10 pages, 4 figures, 4 tables
摘要:We introduce a reversible deep learning model for 13C NMR that uses a single conditional invertible neural network for both directions between molecular structures and spectra. The network is built from i-RevNet style bijective blocks, so the forward map and its inverse are available by construction. We train the model to predict a 128-bit binned spectrum code from a graph-based structure encoding, while the remaining latent dimensions capture residual variability. At inference time, we invert the same trained network to generate structure candidates from a spectrum code, which explicitly represents the one-to-many nature of spectrum-to-structure inference. On a filtered subset, the model is numerically invertible on trained examples, achieves spectrum-code prediction above chance, and produces coarse but meaningful structural signals when inverted on validation spectra. These results demonstrate that invertible architectures can unify spectrum prediction and uncertainty-aware candidate generation within one end-to-end model.
【35】Beyond Learning on Molecules by Weakly Supervising on Molecules
标题:通过弱分子监督超越分子学习
链接:https://arxiv.org/abs/2602.04696
作者:Gordan Prastalo,Kevin Maik Jablonka
摘要:Molecular representations are inherently task-dependent, yet most pre-trained molecular encoders are not. Task conditioning promises representations that reorganize based on task descriptions, but existing approaches rely on expensive labeled data. We show that weak supervision on programmatically derived molecular motifs is sufficient. Our Adaptive Chemical Embedding Model (ACE-Mol) learns from hundreds of motifs paired with natural language descriptors that are cheap to compute, trivial to scale. Conventional encoders slowly search the embedding space for task-relevant structure, whereas ACE-Mol immediately aligns its representations with the task. ACE-Mol achieves state-of-the-art performance across molecular property prediction benchmarks with interpretable, chemically meaningful representations.
【36】Universality of General Spiked Tensor Models
标题:一般尖峰张量模型的普遍性
链接:https://arxiv.org/abs/2602.04472
作者:Yanjin Xiang,Zhihua Zhang
备注:102pages
摘要:We study the rank-one spiked tensor model in the high-dimensional regime, where the noise entries are independent and identically distributed with zero mean, unit variance, and finite fourth moment.This setting extends the classical Gaussian framework to a substantially broader class of noise distributions.Focusing on asymmetric tensors of order $d$ ($\ge 3$), we analyze the maximum likelihood estimator of the best rank-one approximation.Under a mild assumption isolating informative critical points of the associated optimization landscape, we show that the empirical spectral distribution of a suitably defined block-wise tensor contraction converges almost surely to a deterministic limit that coincides with the Gaussian case.As a consequence, the asymptotic singular value and the alignments between the estimated and true spike directions admit explicit characterizations identical to those obtained under Gaussian noise. These results establish a universality principle for spiked tensor models, demonstrating that their high-dimensional spectral behavior and statistical limits are robust to non-Gaussian noise. Our analysis relies on resolvent methods from random matrix theory, cumulant expansions valid under finite moment assumptions, and variance bounds based on Efron-Stein-type arguments. A key challenge in the proof is how to handle the statistical dependence between the signal term and the noise term.
【37】Performative Learning Theory
标题:表演学习理论
链接:https://arxiv.org/abs/2602.04402
作者
:Julian Rodemann,Unai Fischer-Abaigar,James Bailie,Krikamol Muandet
备注:52 pages, 2 figures
摘要:Performative predictions influence the very outcomes they aim to forecast. We study performative predictions that affect a sample (e.g., only existing users of an app) and/or the whole population (e.g., all potential app users). This raises the question of how well models generalize under performativity. For example, how well can we draw insights about new app users based on existing users when both of them react to the app's predictions? We address this question by embedding performative predictions into statistical learning theory. We prove generalization bounds under performative effects on the sample, on the population, and on both. A key intuition behind our proofs is that in the worst case, the population negates predictions, while the sample deceptively fulfills them. We cast such self-negating and self-fulfilling predictions as min-max and min-min risk functionals in Wasserstein space, respectively. Our analysis reveals a fundamental trade-off between performatively changing the world and learning from it: the more a model affects data, the less it can learn from it. Moreover, our analysis results in a surprising insight on how to improve generalization guarantees by retraining on performatively distorted samples. We illustrate our bounds in a case study on prediction-informed assignments of unemployed German residents to job trainings, drawing upon administrative labor market records from 1975 to 2017 in Germany.
【38】Provable Target Sample Complexity Improvements as Pre-Trained Models Scale
标题:随着预训练模型规模的变化,可证明目标样本复杂性有所提高
链接:https://arxiv.org/abs/2602.04233
作者:Kazuto Fukuchi,Ryuichiro Hataya,Kota Matsui
备注:AISTATS2026
摘要:Pre-trained models have become indispensable for efficiently building models across a broad spectrum of downstream tasks. The advantages of pre-trained models have been highlighted by empirical studies on scaling laws, which demonstrate that larger pre-trained models can significantly reduce the sample complexity of downstream learning. However, existing theoretical investigations of pre-trained models lack the capability to explain this phenomenon. In this paper, we provide a theoretical investigation by introducing a novel framework, caulking, inspired by parameter-efficient fine-tuning (PEFT) methods such as adapter-based fine-tuning, low-rank adaptation, and partial fine-tuning. Our analysis establishes that improved pre-trained models provably decrease the sample complexity of downstream tasks, thereby offering theoretical justification for the empirically observed scaling laws relating pre-trained model size to downstream performance, a relationship not covered by existing results.
【39】Maximin Relative Improvement: Fair Learning as a Bargaining Problem
标题:最大化相对改进:公平学习作为讨价还价问题
链接:https://arxiv.org/abs/2602.04155
作者:Jiwoo Han,Moulinath Banerjee,Yuekai Sun
摘要:When deploying a single predictor across multiple subpopulations, we propose a fundamentally different approach: interpreting group fairness as a bargaining problem among subpopulations. This game-theoretic perspective reveals that existing robust optimization methods such as minimizing worst-group loss or regret correspond to classical bargaining solutions and embody different fairness principles. We propose relative improvement, the ratio of actual risk reduction to potential reduction from a baseline predictor, which recovers the Kalai-Smorodinsky solution. Unlike absolute-scale methods that may not be comparable when groups have different potential predictability, relative improvement provides axiomatic justification including scale invariance and individual monotonicity. We establish finite-sample convergence guarantees under mild conditions.
【40】A Multi-Modal Foundational Model for Wireless Communication and Sensing
标题:无线通信和传感的多模式基础模型
链接:https://arxiv.org/abs/2602.04016
作者:Vahid Yazdnian,Yasaman Ghasempour
摘要:Artificial intelligence is a key enabler for next-generation wireless communication and sensing. Yet, today's learning-based wireless techniques do not generalize well: most models are task-specific, environment-dependent, and limited to narrow sensing modalities, requiring costly retraining when deployed in new scenarios. This work introduces a task-agnostic, multi-modal foundational model for physical-layer wireless systems that learns transferable, physics-aware representations across heterogeneous modalities, enabling robust generalization across tasks and environments. Our framework employs a physics-guided self-supervised pretraining strategy incorporating a dedicated physical token to capture cross-modal physical correspondences governed by electromagnetic propagation. The learned representations enable efficient adaptation to diverse downstream tasks, including massive multi-antenna optimization, wireless channel estimation, and device localization, using limited labeled data. Our extensive evaluations demonstrate superior generalization, robustness to deployment shifts, and reduced data requirements compared to task-specific baselines.
【41】Learning Multi-type heterogeneous interacting particle systems
标题:学习多类型异类相互作用粒子系统
链接:https://arxiv.org/abs/2602.03954
作者:Quanjun Lang,Xiong Wang,Fei Lu,Mauro Maggioni
摘要:We propose a framework for the joint inference of network topology, multi-type interaction kernels, and latent type assignments in heterogeneous interacting particle systems from multi-trajectory data. This learning task is a challenging non-convex mixed-integer optimization problem, which we address through a novel three-stage approach. First, we leverage shared structure across agent interactions to recover a low-rank embedding of the system parameters via matrix sensing. Second, we identify discrete interaction types by clustering within the learned embedding. Third, we recover the network weight matrix and kernel coefficients through matrix factorization and a post-processing refinement. We provide theoretical guarantees with estimation error bounds under a Restricted Isometry Property (RIP) assumption and establish conditions for the exact recovery of interaction types based on cluster separability. Numerical experiments on synthetic datasets, including heterogeneous predator-prey systems, demonstrate that our method yields an accurate reconstruction of the underlying dynamics and is robust to noise.
【42】PENGUIN: General Vital Sign Reconstruction from PPG with Flow Matching State Space Model
标题:PENGUIN:利用流量匹配状态空间模型从PPV重建一般生命体征
链接:https://arxiv.org/abs/2602.03858
作者:Shuntaro Suzuki,Shuitsu Koyama,Shinnosuke Hirano,Shunya Nagashima
备注:Accepted for presentation at ICASSP2026
摘要:Photoplethysmography (PPG) plays a crucial role in continuous cardiovascular health monitoring as a non-invasive and cost-effective modality. However, PPG signals are susceptible to motion artifacts and noise, making accurate estimation of vital signs such as arterial blood pressure (ABP) challenging. Existing estimation methods are often restricted to a single-task or environment, limiting their generalizability across diverse PPG decoding scenarios. Moreover, recent general-purpose approaches typically rely on predictions over multi-second intervals, discarding the morphological characteristics of vital signs. To address these challenges, we propose PENGUIN, a generative flow-matching framework that extends deep state space models, enabling fine-grained conditioning on PPG for reconstructing multiple vital signs as continuous waveforms. We evaluate PENGUIN using six real-world PPG datasets across three distinct vital sign reconstruction tasks (electrocardiogram reconstruction, respiratory monitoring, and ABP monitoring). Our method consistently outperformed both task-specific and general-purpose baselines, demonstrating PENGUIN as a general framework for robust vital sign reconstruction from PPG.
【43】Majorization-Minimization Networks for Inverse Problems: An Application to EEG Imaging
标题:反问题的优化-最小化网络:在脑电成像中的应用
链接:https://arxiv.org/abs/2602.03855
作者:Le Minh Triet Tran,Sarah Reynaud,Ronan Fablet,Adrien Merlini,François Rousseau,Mai Quyen Pham
摘要:Inverse problems are often ill-posed and require optimization schemes with strong stability and convergence guarantees. While learning-based approaches such as deep unrolling and meta-learning achieve strong empirical performance, they typically lack explicit control over descent and curvature, limiting robustness. We propose a learned Majorization-Minimization (MM) framework for inverse problems within a bilevel optimization setting. Instead of learning a full optimizer, we learn a structured curvature majorant that governs each MM step while preserving classical MM descent guarantees. The majorant is parameterized by a lightweight recurrent neural network and explicitly constrained to satisfy valid MM conditions. For cosine-similarity losses, we derive explicit curvature bounds yielding diagonal majorants. When analytic bounds are unavailable, we rely on efficient Hessian-vector product-based spectral estimation to automatically upper-bound local curvature without forming the Hessian explicitly. Experiments on EEG source imaging demonstrate improved accuracy, stability, and cross-dataset generalization over deep-unrolled and meta-learning baselines.
其他(48篇)
【1】Multi-Head LatentMoE and Head Parallel: Communication-Efficient and Deterministic MoE Parallelism
标题:多头潜伏教育部和平行头部:沟通高效和确定性MoE平行原则
链接:https://arxiv.org/abs/2602.04870
作者:Chenwei Cui,Rockwell Jackson,Benjamin Joseph Herrera,Ana María Tárano,Hannah Kerner
摘要:Large language models have transformed many applications but remain expensive to train. Sparse Mixture of Experts (MoE) addresses this through conditional computation, with Expert Parallel (EP) as the standard distributed training method. However, EP has three limitations: communication cost grows linearly with the number of activated experts $k$, load imbalance affects latency and memory usage, and data-dependent communication requires metadata exchange. We propose Multi-Head LatentMoE and Head Parallel (HP), a new architecture and parallelism achieving $O(1)$ communication cost regardless of $k$, completely balanced traffic, and deterministic communication, all while remaining compatible with EP. To accelerate Multi-Head LatentMoE, we propose IO-aware routing and expert computation. Compared to MoE with EP, Multi-Head LatentMoE with HP trains up to $1.61\times$ faster while having identical performance. With doubled granularity, it achieves higher overall performance while still being $1.11\times$ faster. Our method makes multi-billion-parameter foundation model research more accessible.
【2】Subliminal Effects in Your Data: A General Mechanism via Log-Linearity
标题:数据中的潜意识效应:通过对线性的一般机制
链接:https://arxiv.org/abs/2602.04863
作者:Ishaq Aden-Ali,Noah Golowich,Allen Liu,Abhishek Shetty,Ankur Moitra,Nika Haghtalab
备注:Code available at https://github.com/ishaqadenali/logit-linear-selection
摘要:Training modern large language models (LLMs) has become a veritable smorgasbord of algorithms and datasets designed to elicit particular behaviors, making it critical to develop techniques to understand the effects of datasets on the model's properties. This is exacerbated by recent experiments that show datasets can transmit signals that are not directly observable from individual datapoints, posing a conceptual challenge for dataset-centric understandings of LLM training and suggesting a missing fundamental account of such phenomena. Towards understanding such effects, inspired by recent work on the linear structure of LLMs, we uncover a general mechanism through which hidden subtexts can arise in generic datasets. We introduce Logit-Linear-Selection (LLS), a method that prescribes how to select subsets of a generic preference dataset to elicit a wide range of hidden effects. We apply LLS to discover subsets of real-world datasets so that models trained on them exhibit behaviors ranging from having specific preferences, to responding to prompts in a different language not present in the dataset, to taking on a different persona. Crucially, the effect persists for the selected subset, across models with varying architectures, supporting its generality and universality.
【3】The Key to State Reduction in Linear Attention: A Rank-based Perspective
标题:国家减少线性注意力的关键:基于等级的视角
链接:https://arxiv.org/abs/2602.04852
作者:Philipp Nazari,T. Konstantin Rusch
摘要:Linear attention offers a computationally efficient yet expressive alternative to softmax attention. However, recent empirical results indicate that the state of trained linear attention models often exhibits a low-rank structure, suggesting that these models underexploit their capacity in practice. To illuminate this phenomenon, we provide a theoretical analysis of the role of rank in linear attention, revealing that low effective rank can affect retrieval error by amplifying query noise. In addition to these theoretical insights, we conjecture that the low-rank states can be substantially reduced post-training with only minimal performance degradation, yielding faster and more memory-efficient models. To this end, we propose a novel hardware-aware approach that structurally prunes key and query matrices, reducing the state size while retaining compatibility with existing CUDA kernels. We adapt several existing pruning strategies to fit our framework and, building on our theoretical analysis, propose a novel structured pruning method based on a rank-revealing QR decomposition. Our empirical results, evaluated across models of varying sizes and on various downstream tasks, demonstrate the effectiveness of our state reduction framework. We highlight that our framework enables the removal of 50% of the query and key channels at only a marginal increase in perplexity. The code for this project can be found at https://github.com/camail-official/LinearAttentionPruning.
【4】SE-Bench: Benchmarking Self-Evolution with Knowledge Internalization
标题:SE-Bench:以知识内化为基准自我进化
链接:https://arxiv.org/abs/2602.04811
作者:Jiarui Yuan,Tailin Jin,Weize Chen,Zeyuan Liu,Zhiyuan Liu,Maosong Sun
备注:Under review
摘要:True self-evolution requires agents to act as lifelong learners that internalize novel experiences to solve future problems. However, rigorously measuring this foundational capability is hindered by two obstacles: the entanglement of prior knowledge, where ``new'' knowledge may appear in pre-training data, and the entanglement of reasoning complexity, where failures may stem from problem difficulty rather than an inability to recall learned knowledge. We introduce SE-Bench, a diagnostic environment that obfuscates the NumPy library and its API doc into a pseudo-novel package with randomized identifiers. Agents are trained to internalize this package and evaluated on simple coding tasks without access to documentation, yielding a clean setting where tasks are trivial with the new API doc but impossible for base models without it. Our investigation reveals three insights: (1) the Open-Book Paradox, where training with reference documentation inhibits retention, requiring "Closed-Book Training" to force knowledge compression into weights; (2) the RL Gap, where standard RL fails to internalize new knowledge completely due to PPO clipping and negative gradients; and (3) the viability of Self-Play for internalization, proving models can learn from self-generated, noisy tasks when coupled with SFT, but not RL. Overall, SE-Bench establishes a rigorous diagnostic platform for self-evolution with knowledge internalization. Our code and dataset can be found at https://github.com/thunlp/SE-Bench.
【5】Maximum-Volume Nonnegative Matrix Factorization
标题:最大体积非负矩阵分解
链接:https://arxiv.org/abs/2602.04795
作者:Olivier Vu Thanh,Nicolas Gillis
备注:arXiv admin note: substantial text overlap with arXiv:2412.06380
摘要:Nonnegative matrix factorization (NMF) is a popular data embedding technique. Given a nonnegative data matrix $X$, it aims at finding two lower dimensional matrices, $W$ and $H$, such that $X\approx WH$, where the factors $W$ and $H$ are constrained to be element-wise nonnegative. The factor $W$ serves as a basis for the columns of $X$. In order to obtain more interpretable and unique solutions, minimum-volume NMF (MinVol NMF) minimizes the volume of $W$. In this paper, we consider the dual approach, where the volume of $H$ is maximized instead; this is referred to as maximum-volume NMF (MaxVol NMF). MaxVol NMF is identifiable under the same conditions as MinVol NMF in the noiseless case, but it behaves rather differently in the presence of noise. In practice, MaxVol NMF is much more effective to extract a sparse decomposition and does not generate rank-deficient solutions. In fact, we prove that the solutions of MaxVol NMF with the largest volume correspond to clustering the columns of $X$ in disjoint clusters, while the solutions of MinVol NMF with smallest volume are rank deficient. We propose two algorithms to solve MaxVol NMF. We also present a normalized variant of MaxVol NMF that exhibits better performance than MinVol NMF and MaxVol NMF, and can be interpreted as a continuum between standard NMF and orthogonal NMF. We illustrate our results in the context of hyperspectral unmixing.
【6】Decomposing Query-Key Feature Interactions Using Contrastive Covariances
标题:使用对比协方差分解查询关键特征交互
链接:https://arxiv.org/abs/2602.04752
作者:Andrew Lee,Yonatan Belinkov,Fernanda Viégas,Martin Wattenberg
摘要:Despite the central role of attention heads in Transformers, we lack tools to understand why a model attends to a particular token. To address this, we study the query-key (QK) space -- the bilinear joint embedding space between queries and keys. We present a contrastive covariance method to decompose the QK space into low-rank, human-interpretable components. It is when features in keys and queries align in these low-rank subspaces that high attention scores are produced. We first study our method both analytically and empirically in a simplified setting. We then apply our method to large language models to identify human-interpretable QK subspaces for categorical semantic features and binding features. Finally, we demonstrate how attention scores can be attributed to our identified features.
【7】Identifying Intervenable and Interpretable Features via Orthogonality Regularization
标题:通过渗透性规范化识别可干预和可解释特征
链接:https://arxiv.org/abs/2602.04718
作者:Moritz Miller,Florent Draye,Bernhard Schölkopf
摘要:With recent progress on fine-tuning language models around a fixed sparse autoencoder, we disentangle the decoder matrix into almost orthogonal features. This reduces interference and superposition between the features, while keeping performance on the target dataset essentially unchanged. Our orthogonality penalty leads to identifiable features, ensuring the uniqueness of the decomposition. Further, we find that the distance between embedded feature explanations increases with stricter orthogonality penalty, a desirable property for interpretability. Invoking the $\textit{Independent Causal Mechanisms}$ principle, we argue that orthogonality promotes modular representations amenable to causal intervention. We empirically show that these increasingly orthogonalized features allow for isolated interventions. Our code is available under $\texttt{https://github.com/mrtzmllr/sae-icm}$.
【8】Static and auto-regressive neural emulation of phytoplankton biomass dynamics from physical predictors in the global ocean
标题:来自全球海洋物理预报的浮游植物生物量动力学的静态和自回归神经模拟
链接:https://arxiv.org/abs/2602.04689
作者:Mahima Lakra,Ronan Fablet,Lucas Drumetz,Etienne Pauthenet,Elodie Martinez
摘要:Phytoplankton is the basis of marine food webs, driving both ecological processes and global biogeochemical cycles. Despite their ecological and climatic significance, accurately simulating phytoplankton dynamics remains a major challenge for biogeochemical numerical models due to limited parameterizations, sparse observational data, and the complexity of oceanic processes. Here, we explore how deep learning models can be used to address these limitations predicting the spatio-temporal distribution of phytoplankton biomass in the global ocean based on satellite observations and environmental conditions. First, we investigate several deep learning architectures. Among the tested models, the UNet architecture stands out for its ability to reproduce the seasonal and interannual patterns of phytoplankton biomass more accurately than other models like CNNs, ConvLSTM, and 4CastNet. When using one to two months of environmental data as input, UNet performs better, although it tends to underestimate the amplitude of low-frequency changes in phytoplankton biomass. Thus, to improve predictions over time, an auto-regressive version of UNet was also tested, where the model uses its own previous predictions to forecast future conditions. This approach works well for short-term forecasts (up to five months), though its performance decreases for longer time scales. Overall, our study shows that combining ocean physical predictors with deep learning allows for reconstruction and short-term prediction of phytoplankton dynamics. These models could become powerful tools for monitoring ocean health and supporting marine ecosystem management, especially in the context of climate change.
【9】RIGA-Fold: A General Framework for Protein Inverse Folding via Recurrent Interaction and Geometric Awareness
标题:RIGA-Fold:通过循环相互作用和几何意识实现蛋白质反向折叠的一般框架
链接:https://arxiv.org/abs/2602.04637
作者:Sisi Yuan,Jiehuang Chen,Junchuang Cai,Dong Xu,Xueliang Li,Zexuan Zhu,Junkai Ji
备注:16 pages, 4 figures. Includes appendix. Preprint under review
摘要:Protein inverse folding, the task of predicting amino acid sequences for desired structures, is pivotal for de novo protein design. However, existing GNN-based methods typically suffer from restricted receptive fields that miss long-range dependencies and a "single-pass" inference paradigm that leads to error accumulation. To address these bottlenecks, we propose RIGA-Fold, a framework that synergizes Recurrent Interaction with Geometric Awareness. At the micro-level, we introduce a Geometric Attention Update (GAU) module where edge features explicitly serve as attention keys, ensuring strictly SE(3)-invariant local encoding. At the macro-level, we design an attention-based Global Context Bridge that acts as a soft gating mechanism to dynamically inject global topological information. Furthermore, to bridge the gap between structural and sequence modalities, we introduce an enhanced variant, RIGA-Fold*, which integrates trainable geometric features with frozen evolutionary priors from ESM-2 and ESM-IF via a dual-stream architecture. Finally, a biologically inspired ``predict-recycle-refine'' strategy is implemented to iteratively denoise sequence distributions. Extensive experiments on CATH 4.2, TS50, and TS500 benchmarks demonstrate that our geometric framework is highly competitive, while RIGA-Fold* significantly outperforms state-of-the-art baselines in both sequence recovery and structural consistency.
【10】A Human-Centered Privacy Approach (HCP) to AI
标题:人工智能以人为本的隐私方法(HCP)
链接:https://arxiv.org/abs/2602.04616
作者:Luyi Sun,Wei Xu,Zaifeng Gao
摘要:As the paradigm of Human-Centered AI (HCAI) gains prominence, its benefits to society are accompanied by significant ethical concerns, one of which is the protection of individual privacy. This chapter provides a comprehensive overview of privacy within HCAI, proposing a human-centered privacy (HCP) framework, providing integrated solution from technology, ethics, and human factors perspectives. The chapter begins by mapping privacy risks across each stage of AI development lifecycle, from data collection to deployment and reuse, highlighting the impact of privacy risks on the entire system. The chapter then introduces privacy-preserving techniques such as federated learning and dif erential privacy. Subsequent chapters integrate the crucial user perspective by examining mental models, alongside the evolving regulatory and ethical landscapes as well as privacy governance. Next, advice on design guidelines is provided based on the human-centered privacy framework. After that, we introduce practical case studies across diverse fields. Finally, the chapter discusses persistent open challenges and future research directions, concluding that a multidisciplinary approach, merging technical, design, policy, and ethical expertise, is essential to successfully embed privacy into the core of HCAI, thereby ensuring these technologies advance in a manner that respects and ensures human autonomy, trust and dignity.
【11】Jacobian Regularization Stabilizes Long-Term Integration of Neural Differential Equations
标题:雅可比正规化稳定神经方程的长期积分
链接:https://arxiv.org/abs/2602.04608
作者:Maya Janvier,Julien Salomon,Etienne Meunier
摘要
:Hybrid models and Neural Differential Equations (NDE) are getting increasingly important for the modeling of physical systems, however they often encounter stability and accuracy issues during long-term integration. Training on unrolled trajectories is known to limit these divergences but quickly becomes too expensive due to the need for computing gradients over an iterative process. In this paper, we demonstrate that regularizing the Jacobian of the NDE model via its directional derivatives during training stabilizes long-term integration in the challenging context of short training rollouts. We design two regularizations, one for the case of known dynamics where we can directly derive the directional derivatives of the dynamic and one for the case of unknown dynamics where they are approximated using finite differences. Both methods, while having a far lower cost compared to long rollouts during training, are successful in improving the stability of long-term simulations for several ordinary and partial differential equations, opening up the door to training NDE methods for long-term integration of large scale systems.
【12】Trust The Typical
标题:相信典型
链接:https://arxiv.org/abs/2602.04581
作者:Debargha Ganguly,Sreehari Sankar,Biyao Zhang,Vikash Singh,Kanan Gupta,Harshini Kavuru,Alan Luo,Weicong Chen,Warren Morningstar,Raghu Machiraju,Vipin Chaudhary
摘要:Current approaches to LLM safety fundamentally rely on a brittle cat-and-mouse game of identifying and blocking known threats via guardrails. We argue for a fresh approach: robust safety comes not from enumerating what is harmful, but from deeply understanding what is safe. We introduce Trust The Typical (T3), a framework that operationalizes this principle by treating safety as an out-of-distribution (OOD) detection problem. T3 learns the distribution of acceptable prompts in a semantic space and flags any significant deviation as a potential threat. Unlike prior methods, it requires no training on harmful examples, yet achieves state-of-the-art performance across 18 benchmarks spanning toxicity, hate speech, jailbreaking, multilingual harms, and over-refusal, reducing false positive rates by up to 40x relative to specialized safety models. A single model trained only on safe English text transfers effectively to diverse domains and over 14 languages without retraining. Finally, we demonstrate production readiness by integrating a GPU-optimized version into vLLM, enabling continuous guardrailing during token generation with less than 6% overhead even under dense evaluation intervals on large-scale workloads.
【13】Rethinking Weight Tying: Pseudo-Inverse Tying for Stable LM Training and Updates
标题:重新思考体重绑定:用于稳定LM训练和更新的伪反向绑定
链接:https://arxiv.org/abs/2602.04556
作者:Jian Gu,Aldeida Aleti,Chunyang Chen,Hongyu Zhang
备注:an early-stage version
摘要:Weight tying is widely used in compact language models to reduce parameters by sharing the token table between the input embedding and the output projection. However, weight sharing does not guarantee a stable token interface: during training, the correspondence between encoding tokens into hidden states and decoding hidden states into logits can drift, worsening optimization sensitivity and making post-training interventions such as editing, patching, and lightweight adaptation less predictable. We propose Pseudo-Inverse Tying (PIT), which synchronizes embedding and unembedding as coupled projections of a shared latent token memory, guaranteeing a pseudo-inverse-consistent interface throughout training. PIT maintains an orthonormal shared memory, obtained by thin polar decomposition for teacher initialization or random orthonormal initialization from scratch, and introduces a fully learned symmetric positive definite hidden-space transform parameterized via a Cholesky factor. The output head applies this transform to hidden states before the vocabulary projection, while the embedding applies the inverse transform to token vectors using stable triangular solves, avoiding explicit pseudo-inverse recomputation and any vocabulary-sized auxiliary parameters. We evaluate PIT on on-device models spanning 256M-1.3B parameters across pretraining and adaptation, and consistently observe improved training stability, stronger layerwise semantic consistency, and substantially reduced side effects.
【14】Greedy-Gnorm: A Gradient Matrix Norm-Based Alternative to Attention Entropy for Head Pruning
标题:Greedy-Gnorm:一种基于梯度矩阵规范的注意力熵替代方案
链接:https://arxiv.org/abs/2602.04491
作者:Yuxi Guo,Paul Sheridan
备注:24 pages, 5 figures, 5 tables
摘要:Attention head pruning has emerged as an effective technique for transformer model compression, an increasingly important goal in the era of Green AI. However, existing pruning methods often rely on static importance scores, which fail to capture the evolving role of attention heads during iterative removal. We propose Greedy-Gradient norm (Greedy-Gnorm), a novel head pruning algorithm that dynamically recalculates head importance after each pruning step. Specifically, each head is scored by the elementwise product of the l2-norms of its Q/K/V gradient blocks, as estimated from a hold-out validation set and updated at every greedy iteration. This dynamic approach to scoring mitigates against stale rankings and better reflects gradient-informed importance as pruning progresses. Extensive experiments on BERT, ALBERT, RoBERTa, and XLM-RoBERTa demonstrate that Greedy-Gnorm consistently preserves accuracy under substantial head removal, outperforming attention entropy. By effectively reducing model size while maintaining task performance, Greedy-Gnorm offers a promising step toward more energy-efficient transformer model deployment.
【15】No One-Size-Fits-All: Building Systems For Translation to Bashkir, Kazakh, Kyrgyz, Tatar and Chuvash Using Synthetic And Original Data
标题:没有一刀切:使用合成和原始数据将巴什基尔语、哈萨克语、吉尔吉斯语、鞑靼语和楚瓦什语翻译成建筑系统
链接:https://arxiv.org/abs/2602.04442
作者:Dmitry Karpov
备注:Accepted to EACL 2026 (LoResMT workshop)
摘要
:We explore machine translation for five Turkic language pairs: Russian-Bashkir, Russian-Kazakh, Russian-Kyrgyz, English-Tatar, English-Chuvash. Fine-tuning nllb-200-distilled-600M with LoRA on synthetic data achieved chrF++ 49.71 for Kazakh and 46.94 for Bashkir. Prompting DeepSeek-V3.2 with retrieved similar examples achieved chrF++ 39.47 for Chuvash. For Tatar, zero-shot or retrieval-based approaches achieved chrF++ 41.6, while for Kyrgyz the zero-shot approach reached 45.6. We release the dataset and the obtained weights.
【16】MaMa: A Game-Theoretic Approach for Designing Safe Agentic Systems
标题:MaMa:设计安全统计系统的游戏理论方法
链接:https://arxiv.org/abs/2602.04431
作者:Jonathan Nöther,Adish Singla,Goran Radanovic
摘要:LLM-based multi-agent systems have demonstrated impressive capabilities, but they also introduce significant safety risks when individual agents fail or behave adversarially. In this work, we study the automated design of agentic systems that remain safe even when a subset of agents is compromised. We formalize this challenge as a Stackelberg security game between a system designer (the Meta-Agent) and a best-responding Meta-Adversary that selects and compromises a subset of agents to minimize safety. We propose Meta-Adversary-Meta-Agent (MaMa), a novel algorithm for approximately solving this game and automatically designing safe agentic systems. Our approach uses LLM-based adversarial search, where the Meta-Agent iteratively proposes system designs and receives feedback based on the strongest attacks discovered by the Meta-Adversary. Empirical evaluations across diverse environments show that systems designed with MaMa consistently defend against worst-case attacks while maintaining performance comparable to systems optimized solely for task success. Moreover, the resulting systems generalize to stronger adversaries, as well as ones with different attack objectives or underlying LLMs, demonstrating robust safety beyond the training setting.
【17】Separation-Utility Pareto Frontier: An Information-Theoretic Characterization
标题:分离效用帕累托前沿:信息论的描述
链接:https://arxiv.org/abs/2602.04408
作者:Shizhou Xu
摘要:We study the Pareto frontier (optimal trade-off) between utility and separation, a fairness criterion requiring predictive independence from sensitive attributes conditional on the true outcome. Through an information-theoretic lens, we prove a characterization of the utility-separation Pareto frontier, establish its concavity, and thereby prove the increasing marginal cost of separation in terms of utility. In addition, we characterize the conditions under which this trade-off becomes strict, providing a guide for trade-off selection in practice. Based on the theoretical characterization, we develop an empirical regularizer based on conditional mutual information (CMI) between predictions and sensitive attributes given the true outcome. The CMI regularizer is compatible with any deep model trained via gradient-based optimization and serves as a scalar monitor of residual separation violations, offering tractable guarantees during training. Finally, numerical experiments support our theoretical findings: across COMPAS, UCI Adult, UCI Bank, and CelebA, the proposed method substantially reduces separation violations while matching or exceeding the utility of established baseline methods. This study thus offers a provable, stable, and flexible approach to enforcing separation in deep learning.
【18】Reducing the labeling burden in time-series mapping using Common Ground: a semi-automated approach to tracking changes in land cover and species over time
标题:使用Common Ground减少时间序列制图中的标签负担:跟踪土地覆盖和物种随时间变化的半自动方法
链接:https://arxiv.org/abs/2602.04373
作者:Geethen Singh,Jasper A Slingsby,Tamara B Robinson,Glenn Moncrieff
摘要:Reliable classification of Earth Observation data depends on consistent, up-to-date reference labels. However, collecting new labelled data at each time step remains expensive and logistically difficult, especially in dynamic or remote ecological systems. As a response to this challenge, we demonstrate that a model with access to reference data solely from time step t0 can perform competitively on both t0 and a future time step t1, outperforming models trained separately on time-specific reference data (the gold standard). This finding suggests that effective temporal generalization can be achieved without requiring manual updates to reference labels beyond the initial time step t0. Drawing on concepts from change detection and semi-supervised learning (SSL), the most performant approach, "Common Ground", uses a semi-supervised framework that leverages temporally stable regions-areas with little to no change in spectral or semantic characteristics between time steps-as a source of implicit supervision for dynamic regions. We evaluate this strategy across multiple classifiers, sensors (Landsat-8, Sentinel-2 satellite multispectral and airborne imaging spectroscopy), and ecological use cases. For invasive tree species mapping, we observed a 21-40% improvement in classification accuracy using Common Ground compared to naive temporal transfer, where models trained at a single time step are directly applied to a future time step. We also observe a 10 -16% higher accuracy for the introduced approach compared to a gold-standard approach. In contrast, when broad land cover categories were mapped across Europe, we observed a more modest 2% increase in accuracy compared to both the naive and gold-standard approaches. These results underscore the effectiveness of combining stable reference screening with SSL for scalable and label-efficient multi-temporal remote sensing classification.
【19】MirrorLA: Reflecting Feature Map for Vision Linear Attention
标题:CLARLA:反映视觉线性注意力的特征地图
链接:https://arxiv.org/abs/2602.04346
作者:Weikang Meng,Liangyu Huo,Yadan Luo,Yaowei Wang,Yingjian Li,Zheng Zhang
摘要:Linear attention significantly reduces the computational complexity of Transformers from quadratic to linear, yet it consistently lags behind softmax-based attention in performance. We identify the root cause of this degradation as the non-negativity constraint imposed on kernel feature maps: standard projections like ReLU act as "passive truncation" operators, indiscriminately discarding semantic information residing in the negative domain. We propose MirrorLA, a geometric framework that substitutes passive truncation with active reorientation. By leveraging learnable Householder reflections, MirrorLA rotates the feature geometry into the non-negative orthant to maximize information retention. Our approach restores representational density through a cohesive, multi-scale design: it first optimizes local discriminability via block-wise isometries, stabilizes long-context dynamics using variance-aware modulation to diversify activations, and finally, integrates dispersed subspaces via cross-head reflections to induce global covariance mixing. MirrorLA achieves state-of-the-art performance across standard benchmarks, demonstrating that strictly linear efficiency can be achieved without compromising representational fidelity.
【20】UnMaskFork: Test-Time Scaling for Masked Diffusion via Deterministic Action Branching
标题:UnMaskFork:通过确定性动作分支进行掩蔽扩散的测试时缩放
链接:https://arxiv.org/abs/2602.04344
作者:Kou Misaki,Takuya Akiba
摘要:Test-time scaling strategies have effectively leveraged inference-time compute to enhance the reasoning abilities of Autoregressive Large Language Models. In this work, we demonstrate that Masked Diffusion Language Models (MDLMs) are inherently amenable to advanced search strategies, owing to their iterative and non-autoregressive generation process. To leverage this, we propose UnMaskFork (UMF), a framework that formulates the unmasking trajectory as a search tree and employs Monte Carlo Tree Search to optimize the generation path. In contrast to standard scaling methods relying on stochastic sampling, UMF explores the search space through deterministic partial unmasking actions performed by multiple MDLMs. Our empirical evaluation demonstrates that UMF consistently outperforms existing test-time scaling baselines on complex coding benchmarks, while also exhibiting strong scalability on mathematical reasoning tasks.
【21】Disentangling Causal Importance from Emergent Structure in Multi-Expert Orchestration
标题:多专家规划中因果关系的重要性与紧急结构的分离
链接:https://arxiv.org/abs/2602.04291
作者:Sudipto Ghosh,Sujoy Nath,Sunny Manchanda,Tanmoy Chakraborty
摘要:Multi-expert systems, where multiple Large Language Models (LLMs) collaborate to solve complex tasks, are increasingly adopted for high-performance reasoning and generation. However, the orchestration policies governing expert interaction and sequencing remain largely opaque. We introduce INFORM, an interpretability analysis that treats orchestration as an explicit, analyzable computation, enabling the decoupling of expert interaction structure, execution order, and causal attribution. We use INFORM to evaluate an orchestrator on GSM8K, HumanEval, and MMLU using a homogeneous consortium of ten instruction-tuned experts drawn from LLaMA-3.1 8B, Qwen-3 8B, and DeepSeek-R1 8B, with controlled decoding-temperature variation, and a secondary heterogeneous consortium spanning 1B-7B parameter models. Across tasks, routing dominance is a poor proxy for functional necessity. We reveal a divergence between relational importance, captured by routing mass and interaction topology, and intrinsic importance, measured via gradient-based causal attribution: frequently selected experts often act as interaction hubs with limited causal influence, while sparsely routed experts can be structurally critical. Orchestration behaviors emerge asynchronously, with expert centralization preceding stable routing confidence and expert ordering remaining non-deterministic. Targeted ablations show that masking intrinsically important experts induces disproportionate collapse in interaction structure compared to masking frequent peers, confirming that INFORM exposes causal and structural dependencies beyond accuracy metrics alone.
【22】From Ambiguity to Action: A POMDP Perspective on Partial Multi-Label Ambiguity and Its Horizon-One Resolution
标题:从模糊性到行动:从POMDP角度看部分多标签模糊性及其视野-一号解决方案
链接:https://arxiv.org/abs/2602.04255
作者:Hanlin Pan,Yuhao Tang,Wanfu Gao
摘要:In partial multi-label learning (PML), the true labels are unobserved, which makes label disambiguation important but difficult. A key challenge is that ambiguous candidate labels can propagate errors into downstream tasks such as feature engineering. To solve this issue, we jointly model the disambiguation and feature selection tasks as Partially Observable Markov Decision Processes (POMDP) to turn PML risk minimization into expected-return maximization. Stage 1 trains a transformer policy via reinforcement learning to produce high-quality hard pseudo-labels; Stage 2 describes feature selection as a sequential reinforcement learning problem, selecting features step by step and outputting an interpretable global ranking. We further provide the theoretical analysis of PML-POMDP correspondence and the excess-risk bound that decompose the error into pseudo label quality term and sample size. Experiments in multiple metrics and data sets verify the advantages of the framework.
【23】OAT: Ordered Action Tokenization
标题:OAT:有序动作代币化
链接:https://arxiv.org/abs/2602.04215
作者:Chaoqi Liu,Xiaoshen Han,Jiawei Gao,Yue Zhao,Haonan Chen,Yilun Du
摘要:Autoregressive policies offer a compelling foundation for scalable robot learning by enabling discrete abstraction, token-level reasoning, and flexible inference. However, applying autoregressive modeling to continuous robot actions requires an effective action tokenization scheme. Existing approaches either rely on analytical discretization methods that produce prohibitively long token sequences, or learned latent tokenizers that lack structure, limiting their compatibility with next-token prediction. In this work, we identify three desiderata for action tokenization - high compression, total decodability, and a left-to-right causally ordered token space - and introduce Ordered Action Tokenization (OAT), a learned action tokenizer that satisfies all three. OAT discretizes action chunks into an ordered sequence of tokens using transformer with registers, finite scalar quantization, and ordering-inducing training mechanisms. The resulting token space aligns naturally with autoregressive generation and enables prefix-based detokenization, yielding an anytime trade-off between inference cost and action fidelity. Across more than 20 tasks spanning four simulation benchmarks and real-world settings, autoregressive policies equipped with OAT consistently outperform prior tokenization schemes and diffusion-based baselines, while offering significantly greater flexibility at inference time.
【24】From Sparse Sensors to Continuous Fields: STRIDE for Spatiotemporal Reconstruction
标题:从稀疏传感器到连续场:用于时空重建的TARIDE
链接:https://arxiv.org/abs/2602.04201
作者
:Yanjie Tong,Peng Chen
摘要:Reconstructing high-dimensional spatiotemporal fields from sparse point-sensor measurements is a central challenge in learning parametric PDE dynamics. Existing approaches often struggle to generalize across trajectories and parameter settings, or rely on discretization-tied decoders that do not naturally transfer across meshes and resolutions. We propose STRIDE (Spatio-Temporal Recurrent Implicit DEcoder), a two-stage framework that maps a short window of sensor measurements to a latent state with a temporal encoder and reconstructs the field at arbitrary query locations with a modulated implicit neural representation (INR) decoder. Using the Fourier Multi-Component and Multi-Layer Neural Network (FMMNN) as the INR backbone improves representation of complex spatial fields and yields more stable optimization than sine-based INRs. We provide a conditional theoretical justification: under stable delay observability of point measurements on a low-dimensional parametric invariant set, the reconstruction operator factors through a finite-dimensional embedding, making STRIDE-type architectures natural approximators. Experiments on four challenging benchmarks spanning chaotic dynamics and wave propagation show that STRIDE outperforms strong baselines under extremely sparse sensing, supports super-resolution, and remains robust to noise.
【25】The Missing Half: Unveiling Training-time Implicit Safety Risks Beyond Deployment
标题:缺失的一半:揭露训练时部署之外的隐性安全风险
链接:https://arxiv.org/abs/2602.04196
作者:Zhexin Zhang,Yida Lu,Junfeng Fang,Junxiao Yang,Shiyao Cui,Hao Zhou,Fandong Meng,Jie Zhou,Hongning Wang,Minlie Huang,Tat-Seng Chua
摘要:Safety risks of AI models have been widely studied at deployment time, such as jailbreak attacks that elicit harmful outputs. In contrast, safety risks emerging during training remain largely unexplored. Beyond explicit reward hacking that directly manipulates explicit reward functions in reinforcement learning, we study implicit training-time safety risks: harmful behaviors driven by a model's internal incentives and contextual background information. For example, during code-based reinforcement learning, a model may covertly manipulate logged accuracy for self-preservation. We present the first systematic study of this problem, introducing a taxonomy with five risk levels, ten fine-grained risk categories, and three incentive types. Extensive experiments reveal the prevalence and severity of these risks: notably, Llama-3.1-8B-Instruct exhibits risky behaviors in 74.4% of training runs when provided only with background information. We further analyze factors influencing these behaviors and demonstrate that implicit training-time risks also arise in multi-agent training settings. Our results identify an overlooked yet urgent safety challenge in training.
【26】Topology-Aware Revival for Efficient Sparse Training
标题:实现高效稀疏训练的Topology感知复兴
链接:https://arxiv.org/abs/2602.04166
作者:Meiling Jin,Fei Wang,Xiaoyun Yuan,Chen Qian,Yuan Cheng
摘要:Static sparse training is a promising route to efficient learning by committing to a fixed mask pattern, yet the constrained structure reduces robustness. Early pruning decisions can lock the network into a brittle structure that is difficult to escape, especially in deep reinforcement learning (RL) where the evolving policy continually shifts the training distribution. We propose Topology-Aware Revival (TAR), a lightweight one-shot post-pruning procedure that improves static sparsity without dynamic rewiring. After static pruning, TAR performs a single revival step by allocating a small reserve budget across layers according to topology needs, randomly uniformly reactivating a few previously pruned connections within each layer, and then keeping the resulting connectivity fixed for the remainder of training. Across multiple continuous-control tasks with SAC and TD3, TAR improves final return over static sparse baselines by up to +37.9% and also outperforms dynamic sparse training baselines with a median gain of +13.5%.
【27】Generative Neural Operators through Diffusion Last Layer
标题:通过扩散最后层的生成神经运算符
链接:https://arxiv.org/abs/2602.04139
作者:Sungwon Park,Anthony Zhou,Hongjoong Kim,Amir Barati Farimani
摘要:Neural operators have emerged as a powerful paradigm for learning discretization-invariant function-to-function mappings in scientific computing. However, many practical systems are inherently stochastic, making principled uncertainty quantification essential for reliable deployment. To address this, we introduce a simple add-on, the diffusion last layer (DLL), a lightweight probabilistic head that can be attached to arbitrary neural operator backbones to model predictive uncertainty. Motivated by the relative smoothness and low-dimensional structure often exhibited by PDE solution distributions, DLL parameterizes the conditional output distribution directly in function space through a low-rank Karhunen-Loève expansion, enabling efficient and expressive uncertainty modeling. Across stochastic PDE operator learning benchmarks, DLL improves generalization and uncertainty-aware prediction. Moreover, even in deterministic long-horizon rollout settings, DLL enhances rollout stability and provides meaningful estimates of epistemic uncertainty for backbone neural operators.
【28】Lyapunov Constrained Soft Actor-Critic (LC-SAC) using Koopman Operator Theory for Quadrotor Trajectory Tracking
标题:使用Koopman操作理论进行四螺旋桨轨迹跟踪的李亚普诺夫约束软行为者-批评者(LC-SAC)
链接:https://arxiv.org/abs/2602.04132
作者:Dhruv S. Kushwaha,Zoleikha A. Biron
备注:12 pages, 7 Figures, submitted to IEEE RA-L
摘要
:Reinforcement Learning (RL) has achieved remarkable success in solving complex sequential decision-making problems. However, its application to safety-critical physical systems remains constrained by the lack of stability guarantees. Standard RL algorithms prioritize reward maximization, often yielding policies that may induce oscillations or unbounded state divergence. There has significant work in incorporating Lyapunov-based stability guarantees in RL algorithms with key challenges being selecting a candidate Lyapunov function, computational complexity by using excessive function approximators and conservative policies by incorporating stability criterion in the learning process. In this work we propose a novel Lyapunov-constrained Soft Actor-Critic (LC-SAC) algorithm using Koopman operator theory. We propose use of extended dynamic mode decomposition (EDMD) to produce a linear approximation of the system and use this approximation to derive a closed form solution for candidate Lyapunov function. This derived Lyapunov function is incorporated in the SAC algorithm to further provide guarantees for a policy that stabilizes the nonlinear system. The results are evaluated trajectory tracking of a 2D Quadrotor environment based on safe-control-gym. The proposed algorithm shows training convergence and decaying violations for Lyapunov stability criterion compared to baseline vanilla SAC algorithm. GitHub Repository: https://github.com/DhruvKushwaha/LC-SAC-Quadrotor-Trajectory-Tracking
【29】Scalable Explainability-as-a-Service (XaaS) for Edge AI Systems
标题:边缘人工智能系统的可扩展解释性即服务(XASS)
链接:https://arxiv.org/abs/2602.04120
作者:Samaresh Kumar Singh,Joyjit Roy
备注:8 pages, 5 figures, submitted and accepted in the conference IEEE SoutheastCon 2026
摘要:Though Explainable AI (XAI) has made significant advancements, its inclusion in edge and IoT systems is typically ad-hoc and inefficient. Most current methods are "coupled" in such a way that they generate explanations simultaneously with model inferences. As a result, these approaches incur redundant computation, high latency and poor scalability when deployed across heterogeneous sets of edge devices. In this work we propose Explainability-as-a-Service (XaaS), a distributed architecture for treating explainability as a first-class system service (as opposed to a model-specific feature). The key innovation in our proposed XaaS architecture is that it decouples inference from explanation generation allowing edge devices to request, cache and verify explanations subject to resource and latency constraints. To achieve this, we introduce three main innovations: (1) A distributed explanation cache with a semantic similarity based explanation retrieval method which significantly reduces redundant computation; (2) A lightweight verification protocol that ensures the fidelity of both cached and newly generated explanations; and (3) An adaptive explanation engine that chooses explanation methods based upon device capability and user requirement. We evaluated the performance of XaaS on three real-world edge-AI use cases: (i) manufacturing quality control; (ii) autonomous vehicle perception; and (iii) healthcare diagnostics. Experimental results show that XaaS reduces latency by 38\% while maintaining high explanation quality across three real-world deployments. Overall, this work enables the deployment of transparent and accountable AI across large scale, heterogeneous IoT systems, and bridges the gap between XAI research and edge-practicality.
【30】ZKBoost: Zero-Knowledge Verifiable Training for XGBoost
标题:ZKBoost:XGBoost的零知识可验证训练
链接:https://arxiv.org/abs/2602.04113
作者:Nikolas Melissaris,Jiayi Xu,Antigoni Polychroniadou,Akira Takahashi,Chenkai Weng
摘要:Gradient boosted decision trees, particularly XGBoost, are among the most effective methods for tabular data. As deployment in sensitive settings increases, cryptographic guarantees of model integrity become essential. We present ZKBoost, the first zero-knowledge proof of training (zkPoT) protocol for XGBoost, enabling model owners to prove correct training on a committed dataset without revealing data or parameters. We make three key contributions: (1) a fixed-point XGBoost implementation compatible with arithmetic circuits, enabling instantiation of efficient zkPoT, (2) a generic template of zkPoT for XGBoost, which can be instantiated with any general-purpose ZKP backend, and (3) vector oblivious linear evaluation (VOLE)-based instantiation resolving challenges in proving nonlinear fixed-point operations. Our fixed-point implementation matches standard XGBoost accuracy within 1\% while enabling practical zkPoT on real-world datasets.
【31】Agentic AI-Empowered Dynamic Survey Framework
标题:大型人工智能动态调查框架
链接:https://arxiv.org/abs/2602.04071
作者:Furkan Mumcu,Lokman Bekit,Michael J. Jones,Anoop Cherian,Yasin Yilmaz
摘要:Survey papers play a central role in synthesizing and organizing scientific knowledge, yet they are increasingly strained by the rapid growth of research output. As new work continues to appear after publication, surveys quickly become outdated, contributing to redundancy and fragmentation in the literature. We reframe survey writing as a long-horizon maintenance problem rather than a one-time generation task, treating surveys as living documents that evolve alongside the research they describe. We propose an agentic Dynamic Survey Framework that supports the continuous updating of existing survey papers by incrementally integrating new work while preserving survey structure and minimizing unnecessary disruption. Using a retrospective experimental setup, we demonstrate that the proposed framework effectively identifies and incorporates emerging research while preserving the coherence and structure of existing surveys.
【32】Non-linear PCA via Evolution Strategies: a Novel Objective Function
标题:通过进化策略的非线性PCA:一种新型目标函数
链接:https://arxiv.org/abs/2602.03967
作者:Thomas Uriot,Elise Chung
摘要:Principal Component Analysis (PCA) is a powerful and popular dimensionality reduction technique. However, due to its linear nature, it often fails to capture the complex underlying structure of real-world data. While Kernel PCA (kPCA) addresses non-linearity, it sacrifices interpretability and struggles with hyperparameter selection. In this paper, we propose a robust non-linear PCA framework that unifies the interpretability of PCA with the flexibility of neural networks. Our method parametrizes variable transformations via neural networks, optimized using Evolution Strategies (ES) to handle the non-differentiability of eigendecomposition. We introduce a novel, granular objective function that maximizes the individual variance contribution of each variable providing a stronger learning signal than global variance maximization. This approach natively handles categorical and ordinal variables without the dimensional explosion associated with one-hot encoding. We demonstrate that our method significantly outperforms both linear PCA and kPCA in explained variance across synthetic and real-world datasets. At the same time, it preserves PCA's interpretability, enabling visualization and analysis of feature contributions using standard tools such as biplots. The code can be found on GitHub.
【33】C-IDS: Solving Contextual POMDP via Information-Directed Objective
标题:C-IDS:基于信息导向目标的上下文POMDP求解
链接:https://arxiv.org/abs/2602.03939
作者:Chongyang Shi,Michael Dorothy,Jie Fu
摘要:We study the policy synthesis problem in contextual partially observable Markov decision processes (CPOMDPs), where the environment is governed by an unknown latent context that induces distinct POMDP dynamics. Our goal is to design a policy that simultaneously maximizes cumulative return and actively reduces uncertainty about the underlying context. We introduce an information-directed objective that augments reward maximization with mutual information between the latent context and the agent's observations. We develop the C-IDS algorithm to synthesize policies that maximize the information-directed objective. We show that the objective can be interpreted as a Lagrangian relaxation of the linear information ratio and prove that the temperature parameter is an upper bound on the information ratio. Based on this characterization, we establish a sublinear Bayesian regret bound over K episodes. We evaluate our approach on a continuous Light-Dark environment and show that it consistently outperforms standard POMDP solvers that treat the unknown context as a latent state variable, achieving faster context identification and higher returns.
【34】Online Vector Quantized Attention
标题:在线载体量化注意力
链接:https://arxiv.org/abs/2602.03922
作者:Nick Alonso,Tomas Figliolia,Beren Millidge
摘要:Standard sequence mixing layers used in language models struggle to balance efficiency and performance. Self-attention performs well on long context tasks but has expensive quadratic compute and linear memory costs, while linear attention and SSMs use only linear compute and constant memory but struggle with long context processing. In this paper, we develop a sequence mixing layer that aims to find a better compromise between memory-compute costs and long-context processing, which we call online vector-quantized (OVQ) attention. OVQ-attention requires linear compute costs and constant memory, but, unlike linear attention and SSMs, it uses a sparse memory update that allows it to greatly increase the size of its memory state and, consequently, memory capacity. We develop a theoretical basis for OVQ-attention based on Gaussian mixture regression, and we test it on a variety of synthetic long context tasks and on long context language modeling. OVQ-attention shows significant improvements over linear attention baselines and the original VQ-attention, on which OVQ-attention was inspired. It demonstrates competitive, and sometimes identical, performance to strong self-attention baselines up 64k sequence length, despite using a small fraction of the memory of full self-attention.
【35】SpecMD: A Comprehensive Study On Speculative Expert Prefetching
标题:SpecMD:关于推测专家预取的综合研究
链接:https://arxiv.org/abs/2602.03921
作者:Duc Hoang,Ajay Jaiswal,Mohammad Samragh,Minsik Cho
摘要:Mixture-of-Experts (MoE) models enable sparse expert activation, meaning that only a subset of the model's parameters is used during each inference. However, to translate this sparsity into practical performance, an expert caching mechanism is required. Previous works have proposed hardware-centric caching policies, but how these various caching policies interact with each other and different hardware specification remains poorly understood. To address this gap, we develop \textbf{SpecMD}, a standardized framework for benchmarking ad-hoc cache policies on various hardware configurations. Using SpecMD, we perform an exhaustive benchmarking of several MoE caching strategies, reproducing and extending prior approaches in controlled settings with realistic constraints. Our experiments reveal that MoE expert access is not consistent with temporal locality assumptions (e.g LRU, LFU). Motivated by this observation, we propose \textbf{Least-Stale}, a novel eviction policy that exploits MoE's predictable expert access patterns to reduce collision misses by up to $85\times$ over LRU. With such gains, we achieve over $88\%$ hit rates with up to $34.7\%$ Time-to-first-token (TTFT) reduction on OLMoE at only $5\%$ or $0.6GB$ of VRAM cache capacity.
【36】Causal Discovery for Cross-Sectional Data Based on Super-Structure and Divide-and-Conquer
标题:基于超结构和分治的横断面数据因果发现
链接:https://arxiv.org/abs/2602.03914
作者:Wenyu Wang,Yaping Wan
备注:7 pages,16 figures
摘要:This paper tackles a critical bottleneck in Super-Structure-based divide-and-conquer causal discovery: the high computational cost of constructing accurate Super-Structures--particularly when conditional independence (CI) tests are expensive and domain knowledge is unavailable. We propose a novel, lightweight framework that relaxes the strict requirements on Super-Structure construction while preserving the algorithmic benefits of divide-and-conquer. By integrating weakly constrained Super-Structures with efficient graph partitioning and merging strategies, our approach substantially lowers CI test overhead without sacrificing accuracy. We instantiate the framework in a concrete causal discovery algorithm and rigorously evaluate its components on synthetic data. Comprehensive experiments on Gaussian Bayesian networks, including magic-NIAB, ECOLI70, and magic-IRRI, demonstrate that our method matches or closely approximates the structural accuracy of PC and FCI while drastically reducing the number of CI tests. Further validation on the real-world China Health and Retirement Longitudinal Study (CHARLS) dataset confirms its practical applicability. Our results establish that accurate, scalable causal discovery is achievable even under minimal assumptions about the initial Super-Structure, opening new avenues for applying divide-and-conquer methods to large-scale, knowledge-scarce domains such as biomedical and social science research.
【37】GeoIB: Geometry-Aware Information Bottleneck via Statistical-Manifold Compression
标题:GeoIB:通过统计Manifold压缩的几何感知信息瓶颈
链接:https://arxiv.org/abs/2602.03906
作者:Weiqi Wang,Zhiyi Tian,Chenhan Zhang,Shui Yu
摘要:Information Bottleneck (IB) is widely used, but in deep learning, it is usually implemented through tractable surrogates, such as variational bounds or neural mutual information (MI) estimators, rather than directly controlling the MI I(X;Z) itself. The looseness and estimator-dependent bias can make IB "compression" only indirectly controlled and optimization fragile. We revisit the IB problem through the lens of information geometry and propose a \textbf{Geo}metric \textbf{I}nformation \textbf{B}ottleneck (\textbf{GeoIB}) that dispenses with mutual information (MI) estimation. We show that I(X;Z) and I(Z;Y) admit exact projection forms as minimal Kullback-Leibler (KL) distances from the joint distributions to their respective independence manifolds. Guided by this view, GeoIB controls information compression with two complementary terms: (i) a distribution-level Fisher-Rao (FR) discrepancy, which matches KL to second order and is reparameterization-invariant; and (ii) a geometry-level Jacobian-Frobenius (JF) term that provides a local capacity-type upper bound on I(Z;X) by penalizing pullback volume expansion of the encoder. We further derive a natural-gradient optimizer consistent with the FR metric and prove that the standard additive natural-gradient step is first-order equivalent to the geodesic update. We conducted extensive experiments and observed that the GeoIB achieves a better trade-off between prediction accuracy and compression ratio in the information plane than the mainstream IB baselines on popular datasets. GeoIB improves invariance and optimization stability by unifying distributional and geometric regularization under a single bottleneck multiplier. The source code of GeoIB is released at "https://anonymous.4open.science/r/G-IB-0569".
【38】NeuroPareto: Calibrated Acquisition for Costly Many-Goal Search in Vast Parameter Spaces
标题:NeuroPareto:用于在巨大参数空间中进行昂贵的多目标搜索的校准采集
链接:https://arxiv.org/abs/2602.03901
作者:Rong Fu,Wenxin Zhang,Chunlei Meng,Youjin Wang,Haoyu Zhao,Jiaxuan Lu,Kun Liu,JiaBao Dou,Simon James Fong
备注:39 pages, 19 figures
摘要:The pursuit of optimal trade-offs in high-dimensional search spaces under stringent computational constraints poses a fundamental challenge for contemporary multi-objective optimization. We develop NeuroPareto, a cohesive architecture that integrates rank-centric filtering, uncertainty disentanglement, and history-conditioned acquisition strategies to navigate complex objective landscapes. A calibrated Bayesian classifier estimates epistemic uncertainty across non-domination tiers, enabling rapid generation of high-quality candidates with minimal evaluation cost. Deep Gaussian Process surrogates further separate predictive uncertainty into reducible and irreducible components, providing refined predictive means and risk-aware signals for downstream selection. A lightweight acquisition network, trained online from historical hypervolume improvements, guides expensive evaluations toward regions balancing convergence and diversity. With hierarchical screening and amortized surrogate updates, the method maintains accuracy while keeping computational overhead low. Experiments on DTLZ and ZDT suites and a subsurface energy extraction task show that NeuroPareto consistently outperforms classifier-enhanced and surrogate-assisted baselines in Pareto proximity and hypervolume.
【39】Benchmarking Bias Mitigation Toward Fairness Without Harm from Vision to LVLMs
标题:基准偏差缓解,以实现公平,而不会对视觉和LVLM造成伤害
链接:https://arxiv.org/abs/2602.03895
作者:Xuwei Tan,Ziyu Hu,Xueru Zhang
备注:Accepted at ICLR 26
摘要:Machine learning models trained on real-world data often inherit and amplify biases against certain social groups, raising urgent concerns about their deployment at scale. While numerous bias mitigation methods have been proposed, comparing the effectiveness of bias mitigation methods remains difficult due to heterogeneous datasets, inconsistent fairness metrics, isolated evaluation of vision versus multi-modal models, and insufficient hyperparameter tuning that undermines fair comparisons. We introduce NH-Fair, a unified benchmark for fairness without harm that spans both vision models and large vision-language models (LVLMs) under standardized data, metrics, and training protocols, covering supervised and zero-shot regimes. Our key contributions are: (1) a systematic ERM tuning study that identifies training choices with large influence on both utility and disparities, yielding empirically grounded guidelines to help practitioners reduce expensive hyperparameter tuning space in achieving strong fairness and accuracy; (2) evidence that many debiasing methods do not reliably outperform a well-tuned ERM baseline, whereas a composite data-augmentation method consistently delivers parity gains without sacrificing utility, emerging as a promising practical strategy. (3) an analysis showing that while LVLMs achieve higher average accuracy, they still exhibit subgroup disparities, and gains from scaling are typically smaller than those from architectural or training-protocol choices. NH-Fair provides a reproducible, tuning-aware pipeline for rigorous, harm-aware fairness evaluation.
【40】Audit After Segmentation: Reference-Free Mask Quality Assessment for Language-Referred Audio-Visual Segmentation
标题:分割后的审核:参考视听分割的无参考面罩质量评估
链接:https://arxiv.org/abs/2602.03892
作者:Jinxing Zhou,Yanghao Zhou,Yaoting Wang,Zongyan Han,Jiaqi Ma,Henghui Ding,Rao Muhammad Anwer,Hisham Cholakkal
摘要:Language-referred audio-visual segmentation (Ref-AVS) aims to segment target objects described by natural language by jointly reasoning over video, audio, and text. Beyond generating segmentation masks, providing rich and interpretable diagnoses of mask quality remains largely underexplored. In this work, we introduce Mask Quality Assessment in the Ref-AVS context (MQA-RefAVS), a new task that evaluates the quality of candidate segmentation masks without relying on ground-truth annotations as references at inference time. Given audio-visual-language inputs and each provided segmentation mask, the task requires estimating its IoU with the unobserved ground truth, identifying the corresponding error type, and recommending an actionable quality-control decision. To support this task, we construct MQ-RAVSBench, a benchmark featuring diverse and representative mask error modes that span both geometric and semantic issues. We further propose MQ-Auditor, a multimodal large language model (MLLM)-based auditor that explicitly reasons over multimodal cues and mask information to produce quantitative and qualitative mask quality assessments. Extensive experiments demonstrate that MQ-Auditor outperforms strong open-source and commercial MLLMs and can be integrated with existing Ref-AVS systems to detect segmentation failures and support downstream segmentation improvement. Data and codes will be released at https://github.com/jasongief/MQA-RefAVS.
【41】HybridQuestion: Human-AI Collaboration for Identifying High-Impact Research Questions
标题:HybridQuestion:人机协作识别高影响力的研究问题
链接:https://arxiv.org/abs/2602.03849
作者:Keyu Zhao,Fengli Xu,Yong Li,Tie-Yan Liu
备注:16 pages, 6 figures, 4 tables
摘要:The "AI Scientist" paradigm is transforming scientific research by automating key stages of the research process, from idea generation to scholarly writing. This shift is expected to accelerate discovery and expand the scope of scientific inquiry. However, a key question remains unclear: can AI scientists identify meaningful research questions? While Large Language Models (LLMs) have been applied successfully to task-specific ideation, their potential to conduct strategic, long-term assessments of past breakthroughs and future questions remains largely unexplored. To address this gap, we explore a human-AI hybrid solution that integrates the scalable data processing capabilities of AI with the value judgment of human experts. Our methodology is structured in three phases. The first phase, AI-Accelerated Information Gathering, leverages AI's advantage in processing vast amounts of literature to generate a hybrid information base. The second phase, Candidate Question Proposing, utilizes this synthesized data to prompt an ensemble of six diverse LLMs to propose an initial candidate pool, filtered via a cross-model voting mechanism. The third phase, Hybrid Question Selection, refines this pool through a multi-stage filtering process that progressively increases human oversight. To validate this system, we conducted an experiment aiming to identify the Top 10 Scientific Breakthroughs of 2025 and the Top 10 Scientific Questions for 2026 across five major disciplines. Our analysis reveals that while AI agents demonstrate high alignment with human experts in recognizing established breakthroughs, they exhibit greater divergence in forecasting prospective questions, suggesting that human judgment remains crucial for evaluating subjective, forward-looking challenges.
【42】Targeted Synthetic Control Method
标题:有针对性的综合控制方法
链接:https://arxiv.org/abs/2602.04611
作者:Yuxin Wang,Dennis Frauen,Emil Javurek,Konstantin Hess,Yuchen Ma,Stefan Feuerriegel
摘要:The synthetic control method (SCM) estimates causal effects in panel data with a single-treated unit by constructing a counterfactual outcome as a weighted combination of untreated control units that matches the pre-treatment trajectory. In this paper, we introduce the targeted synthetic control (TSC) method, a new two-stage estimator that directly estimates the counterfactual outcome. Specifically, our TSC method (1) yields a targeted debiasing estimator, in the sense that the targeted updating refines the initial weights to produce more stable weights; and (2) ensures that the final counterfactual estimation is a convex combination of observed control outcomes to enable direct interpretation of the synthetic control weights. TSC is flexible and can be instantiated with arbitrary machine learning models. Methodologically, TSC starts from an initial set of synthetic-control weights via a one-dimensional targeted update through the weight-tilting submodel, which calibrates the weights to reduce bias of weights estimation arising from pre-treatment fit. Furthermore, TSC avoids key shortcomings of existing methods (e.g., the augmented SCM), which can produce unbounded counterfactual estimates. Across extensive synthetic and real-world experiments, TSC consistently improves estimation accuracy over state-of-the-art SCM baselines.
【43】Anytime-Valid Conformal Risk Control
标题:随时有效的合规风险控制
链接:https://arxiv.org/abs/2602.04364
作者:Bror Hultberg,Dave Zachariah,Antônio H. Ribeiro
摘要:Prediction sets provide a means of quantifying the uncertainty in predictive tasks. Using held out calibration data, conformal prediction and risk control can produce prediction sets that exhibit statistically valid error control in a computationally efficient manner. However, in the standard formulations, the error is only controlled on average over many possible calibration datasets of fixed size. In this paper, we extend the control to remain valid with high probability over a cumulatively growing calibration dataset at any time point. We derive such guarantees using quantile-based arguments and illustrate the applicability of the proposed framework to settings involving distribution shift. We further establish a matching lower bound and show that our guarantees are asymptotically tight. Finally, we demonstrate the practical performance of our methods through both simulations and real-world numerical examples.
【44】Bures-Wasserstein Importance-Weighted Evidence Lower Bound: Exposition and Applications
标题:布雷斯-沃瑟斯坦重要性加权证据下限:阐述和应用
链接:https://arxiv.org/abs/2602.04272
作者:Peiwen Jiang,Takuo Matsubara,Minh-Ngoc Tran
备注:27 pages, 6 figures. Submitted to Bayesian Analysis
摘要:The Importance-Weighted Evidence Lower Bound (IW-ELBO) has emerged as an effective objective for variational inference (VI), tightening the standard ELBO and mitigating the mode-seeking behaviour. However, optimizing the IW-ELBO in Euclidean space is often inefficient, as its gradient estimators suffer from a vanishing signal-to-noise ratio (SNR). This paper formulates the optimisation of the IW-ELBO in Bures-Wasserstein space, a manifold of Gaussian distributions equipped with the 2-Wasserstein metric. We derive the Wasserstein gradient of the IW-ELBO and project it onto the Bures-Wasserstein space to yield a tractable algorithm for Gaussian VI. A pivotal contribution of our analysis concerns the stability of the gradient estimator. While the SNR of the standard Euclidean gradient estimator is known to vanish as the number of importance samples $K$ increases, we prove that the SNR of the Wasserstein gradient scales favourably as $Ω(\sqrt{K})$, ensuring optimisation efficiency even for large $K$. We further extend this geometric analysis to the Variational Rényi Importance-Weighted Autoencoder bound, establishing analogous stability guarantees. Experiments demonstrate that the proposed framework achieves superior approximation performance compared to other baselines.
【45】Functional Stochastic Localization
标题:功能随机定位
链接:https://arxiv.org/abs/2602.03999
作者:Anming Gu,Bobby Shi,Kevin Tian
备注:Comments welcome!
摘要:Eldan's stochastic localization is a probabilistic construction that has proved instrumental to modern breakthroughs in high-dimensional geometry and the design of sampling algorithms. Motivated by sampling under non-Euclidean geometries and the mirror descent algorithm in optimization, we develop a functional generalization of Eldan's process that replaces Gaussian regularization with regularization by any positive integer multiple of a log-Laplace transform. We further give a mixing time bound on the Markov chain induced by our localization process, which holds if our target distribution satisfies a functional Poincaré inequality. Finally, we apply our framework to differentially private convex optimization in $\ell_p$ norms for $p \in [1, 2)$, where we improve state-of-the-art query complexities in a zeroth-order model.
【46】All-Atom GPCR-Ligand Simulation via Residual Isometric Latent Flow
标题:利用剩余等距潜流进行全原子GPCR-配体模拟
链接:https://arxiv.org/abs/2602.03902
作者:Jiying Zhang,Shuhao Zhang,Pierre Vandergheynst,Patrick Barth
备注:36 pages
摘要:G-protein-coupled receptors (GPCRs), primary targets for over one-third of approved therapeutics, rely on intricate conformational transitions to transduce signals. While Molecular Dynamics (MD) is essential for elucidating this transduction process, particularly within ligand-bound complexes, conventional all-atom MD simulation is computationally prohibitive. In this paper, we introduce GPCRLMD, a deep generative framework for efficient all-atom GPCR-ligand simulation.GPCRLMD employs a Harmonic-Prior Variational Autoencoder (HP-VAE) to first map the complex into a regularized isometric latent space, preserving geometric topology via physics-informed constraints. Within this latent space, a Residual Latent Flow samples evolution trajectories, which are subsequently decoded back to atomic coordinates. By capturing temporal dynamics via relative displacements anchored to the initial structure, this residual mechanism effectively decouples static topology from dynamic fluctuations. Experimental results demonstrate that GPCRLMD achieves state-of-the-art performance in GPCR-ligand dynamics simulation, faithfully reproducing thermodynamic observables and critical ligand-receptor interactions.
【47】Transcendental Regularization of Finite Mixtures:Theoretical Guarantees and Practical Limitations
标题:有限混合流的超越正则化:理论保证与实际限制
链接:https://arxiv.org/abs/2602.03889
作者:Ernest Fokoué
备注:24 pages, 6 figures, 2 tables
摘要:Finite mixture models are widely used for unsupervised learning, but maximum likelihood estimation via EM suffers from degeneracy as components collapse. We introduce transcendental regularization, a penalized likelihood framework with analytic barrier functions that prevent degeneracy while maintaining asymptotic efficiency. The resulting Transcendental Algorithm for Mixtures of Distributions (TAMD) offers strong theoretical guarantees: identifiability, consistency, and robustness. Empirically, TAMD successfully stabilizes estimation and prevents collapse, yet achieves only modest improvements in classification accuracy-highlighting fundamental limits of mixture models for unsupervised learning in high dimensions. Our work provides both a novel theoretical framework and an honest assessment of practical limitations, implemented in an open-source R package.
【48】The Turing Synthetic Radar Dataset: A dataset for pulse deinterleaving
标题:图灵合成雷达数据集:用于脉冲去交织的数据集
链接:https://arxiv.org/abs/2602.03856
作者:Edward Gunn,Adam Hosford,Robert Jones,Leo Zeitler,Ian Groves,Victoria Nockles
备注:7 pages 6 figures, submitted to International Radar Symposium 2026
摘要:We present the Turing Synthetic Radar Dataset, a comprehensive dataset to serve both as a benchmark for radar pulse deinterleaving research and as an enabler of new research methods. The dataset addresses the critical problem of separating interleaved radar pulses from multiple unknown emitters for electronic warfare applications and signal intelligence. Our dataset contains a total of 6000 pulse trains over two receiver configurations, totalling to almost 3 billion pulses, featuring realistic scenarios with up to 110 emitters and significant parameter space overlap. To encourage dataset adoption and establish standardised evaluation procedures, we have launched an accompanying Turing Deinterleaving Challenge, for which models need to associate pulses in interleaved pulse trains to the correct emitter by clustering and maximising metrics such as the V-measure. The Turing Synthetic Radar Dataset is one of the first publicly available, comprehensively simulated pulse train datasets aimed to facilitate sophisticated model development in the electronic warfare community
机器翻译由腾讯交互翻译提供,仅供参考
点击“阅读原文”获取带摘要的学术速递