
Machine Learning arXiv Daily Digest [2.23]

arXiv Daily Academic Digest



cs.LG: 159 papers today


Large Models (28 papers)

【1】A Probabilistic Framework for LLM-Based Model Discovery
Link: https://arxiv.org/abs/2602.18266

Authors: Stefan Wahl, Raphaela Schenk, Ali Farnoud, Jakob H. Macke, Daniel Gedon
Abstract: Automated methods for discovering mechanistic simulator models from observational data offer a promising path toward accelerating scientific progress. Such methods often take the form of agentic-style iterative workflows that repeatedly propose and revise candidate models by imitating human discovery processes. However, existing LLM-based approaches typically implement such workflows via hand-crafted heuristic procedures, without an explicit probabilistic formulation. We recast model discovery as probabilistic inference, i.e., as sampling from an unknown distribution over mechanistic models capable of explaining the data. This perspective provides a unified way to reason about model proposal, refinement, and selection within a single inference framework. As a concrete instantiation of this view, we introduce ModelSMC, an algorithm based on Sequential Monte Carlo sampling. ModelSMC represents candidate models as particles which are iteratively proposed and refined by an LLM, and weighted using likelihood-based criteria. Experiments on real-world scientific systems illustrate that this formulation discovers models with interpretable mechanisms and improves posterior predictive checks. More broadly, this perspective provides a probabilistic lens for understanding and developing LLM-based approaches to model discovery.
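The sequential Monte Carlo loop the abstract describes can be sketched in a few lines. In this toy version the candidate "models" are just slope parameters of a linear mechanism, and a random perturbation stands in for the LLM's propose-and-refine step; all names and settings are illustrative, not from the paper.

```python
import math
import random

random.seed(0)

# Toy observations from a "true" mechanism y = 2.5 * x plus noise.
xs = [0.5 * i for i in range(1, 9)]
ys = [2.5 * x + random.gauss(0, 0.1) for x in xs]

def log_likelihood(slope, sigma=0.1):
    """Gaussian log-likelihood of the data under the candidate model y = slope * x."""
    return sum(-0.5 * ((y - slope * x) / sigma) ** 2 for x, y in zip(xs, ys))

def propose(slope):
    """Stand-in for the LLM proposal/refinement step: perturb the candidate."""
    return slope + random.gauss(0, 0.2)

# SMC loop: refine particles, weight them by likelihood, resample the fittest.
particles = [random.uniform(0.0, 5.0) for _ in range(200)]
for _ in range(10):
    particles = [propose(p) for p in particles]
    logw = [log_likelihood(p) for p in particles]
    m = max(logw)
    weights = [math.exp(lw - m) for lw in logw]  # shift log-weights for stability
    particles = random.choices(particles, weights=weights, k=len(particles))

estimate = sum(particles) / len(particles)
```

With a likelihood this peaked, the particle cloud collapses onto the true slope within a few iterations; in ModelSMC the particles are full mechanistic model programs and the proposal step is an LLM call.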

【2】Simplifying Outcomes of Language Model Component Analyses with ELIA
Link: https://arxiv.org/abs/2602.18262

Authors: Aaron Louis Eidt, Nils Feldhus
Note: EACL 2026 System Demonstrations. GitHub: https://github.com/aaron0eidt/ELIA
Abstract: While mechanistic interpretability has developed powerful tools to analyze the internal workings of Large Language Models (LLMs), their complexity has created an accessibility gap, limiting their use to specialists. We address this challenge by designing, building, and evaluating ELIA (Explainable Language Interpretability Analysis), an interactive web application that simplifies the outcomes of various language model component analyses for a broader audience. The system integrates three key techniques -- Attribution Analysis, Function Vector Analysis, and Circuit Tracing -- and introduces a novel methodology: using a vision-language model to automatically generate natural language explanations (NLEs) for the complex visualizations produced by these methods. The effectiveness of this approach was empirically validated through a mixed-methods user study, which revealed a clear preference for interactive, explorable interfaces over simpler, static visualizations. A key finding was that the AI-powered explanations helped bridge the knowledge gap for non-experts; a statistical analysis showed no significant correlation between a user's prior LLM experience and their comprehension scores, suggesting that the system reduced barriers to comprehension across experience levels. We conclude that an AI system can indeed simplify complex model analyses, but its true power is unlocked when paired with thoughtful, user-centered design that prioritizes interactivity, specificity, and narrative guidance.

【3】[Re] Benchmarking LLM Capabilities in Negotiation through Scoreable Games
Link: https://arxiv.org/abs/2602.18230

Authors: Jorge Carrasco Pollo, Ioannis Kapetangeorgis, Joshua Rosenthal, John Hua Yao
Note: Accepted for publication at Transactions on Machine Learning Research (TMLR) and MLRC Journal Track, 2025. Code available at: https://github.com/joshrosie/FACT29
Abstract: Large Language Models (LLMs) demonstrate significant potential in multi-agent negotiation tasks, yet evaluation in this domain remains challenging due to a lack of robust and generalizable benchmarks. Abdelnabi et al. (2024) introduce a negotiation benchmark based on Scoreable Games, with the aim of developing a highly complex and realistic evaluation framework for LLMs. Our work investigates the reproducibility of claims in their benchmark, and provides a deeper understanding of its usability and generalizability. We replicate the original experiments on additional models, and introduce additional metrics to verify negotiation quality and evenness of evaluation. Our findings reveal that while the benchmark is indeed complex, model comparison is ambiguous, raising questions about its objectivity. Furthermore, we identify limitations in the experimental setup, particularly in information leakage detection and thoroughness of the ablation study. By examining and analyzing the behavior of a wider range of models on an extended version of the benchmark, we reveal insights that provide additional context to potential users. Our results highlight the importance of context in model-comparative evaluations.

【4】SeedFlood: A Step Toward Scalable Decentralized Training of LLMs
Link: https://arxiv.org/abs/2602.18181

Authors: Jihun Kim, Namhoon Lee
Abstract: This work presents a new approach to decentralized training, SeedFlood, designed to scale to large models across complex network topologies and achieve global consensus with minimal communication overhead. Traditional gossip-based methods suffer from message communication costs that grow with model size, while information decay over network hops renders global consensus inefficient. SeedFlood departs from these practices by exploiting the seed-reconstructible structure of zeroth-order updates and effectively making the messages near-zero in size, allowing them to be flooded to every client in the network. This mechanism makes communication overhead negligible and independent of model size, removing the primary scalability bottleneck in decentralized training. Consequently, SeedFlood enables training in regimes previously considered impractical, such as billion-parameter models distributed across hundreds of clients. Our experiments on decentralized LLM fine-tuning demonstrate that SeedFlood consistently outperforms gossip-based baselines in both generalization performance and communication efficiency, and even achieves results comparable to first-order methods in large-scale settings.
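The core trick, reconstructing a zeroth-order update from a random seed so the flooded message shrinks to two numbers, can be sketched as follows. This is a toy quadratic loss with invented names, not the paper's code.

```python
import random

DIM = 1_000  # model size; the flooded message stays two numbers regardless

def direction(seed, dim):
    """Deterministically regenerate the perturbation vector from its seed."""
    rng = random.Random(seed)
    return [rng.gauss(0, 1) for _ in range(dim)]

def loss(theta):
    """Toy quadratic loss with optimum at all-ones."""
    return sum((t - 1.0) ** 2 for t in theta)

def local_step(theta, seed, eps=1e-3):
    """Zeroth-order estimate: two loss evaluations along a seeded direction."""
    z = direction(seed, len(theta))
    lp = loss([t + eps * zi for t, zi in zip(theta, z)])
    lm = loss([t - eps * zi for t, zi in zip(theta, z)])
    proj_grad = (lp - lm) / (2 * eps)
    return seed, proj_grad  # the entire message to flood

def apply_message(theta, seed, proj_grad, lr=1e-4):
    """Any peer rebuilds the exact same update from the seed alone."""
    z = direction(seed, len(theta))
    return [t - lr * proj_grad * zi for t, zi in zip(theta, z)]

theta = [0.0] * DIM
seed, g = local_step(theta, seed=42)
theta_peer = apply_message(theta, seed, g)
```

The message size is independent of DIM, which is what makes flooding the update to every client in the network cheap.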

【5】Agentic Adversarial QA for Improving Domain-Specific LLMs
Link: https://arxiv.org/abs/2602.18137

Authors: Vincent Grari, Ciprian Tomoiaga, Sylvain Lamprier, Tatsunori Hashimoto, Marcin Detyniecki
Note: 9 pages, 1 figure
Abstract: Large Language Models (LLMs), despite extensive pretraining on broad internet corpora, often struggle to adapt effectively to specialized domains. There is growing interest in fine-tuning these models for such domains; however, progress is constrained by the scarcity and limited coverage of high-quality, task-relevant data. To address this, synthetic data generation methods such as paraphrasing or knowledge extraction are commonly applied. Although these approaches excel at factual recall and conceptual knowledge, they suffer from two critical shortcomings: (i) they provide minimal support for interpretive reasoning capabilities in these specialized domains, and (ii) they often produce synthetic corpora that are excessively large and redundant, resulting in poor sample efficiency. To overcome these gaps, we propose an adversarial question-generation framework that produces a compact set of semantically challenging questions. These questions are constructed by comparing the outputs of the model to be adapted and a robust expert model grounded in reference documents, using an iterative, feedback-driven process designed to reveal and address comprehension gaps. Evaluation on specialized subsets of the LegalBench corpus demonstrates that our method achieves greater accuracy with substantially fewer synthetic samples.

【6】NIMMGen: Learning Neural-Integrated Mechanistic Digital Twins with LLMs
Link: https://arxiv.org/abs/2602.18008

Authors: Zihan Guan, Rituparna Datta, Mengxuan Hu, Shunshun Liu, Aiying Zhang, Prasanna Balachandran, Sheng Li, Anil Vullikanti
Note: 19 pages, 6 figures
Abstract: Mechanistic models encode scientific knowledge about dynamical systems and are widely used in downstream scientific and policy applications. Recent work has explored LLM-based agentic frameworks to automatically construct mechanistic models from data; however, existing problem settings substantially oversimplify real-world conditions, leaving it unclear whether LLM-generated mechanistic models are reliable in practice. To address this gap, we introduce the Neural-Integrated Mechanistic Modeling (NIMM) evaluation framework, which evaluates LLM-generated mechanistic models under realistic settings with partial observations and diversified task objectives. Our evaluation reveals fundamental challenges in current baselines, ranging from model effectiveness to code-level correctness. Motivated by these findings, we design NIMMgen, an agentic framework for neural-integrated mechanistic modeling that enhances code correctness and practical validity through iterative refinement. Experiments across three datasets from diversified scientific domains demonstrate its strong performance. We also show that the learned mechanistic models support counterfactual intervention simulation.

【7】Memory-Based Advantage Shaping for LLM-Guided Reinforcement Learning
Link: https://arxiv.org/abs/2602.17931

Authors: Narjes Nourzad, Carlee Joe-Wong
Note: Association for the Advancement of Artificial Intelligence (AAAI)
Abstract: In environments with sparse or delayed rewards, reinforcement learning (RL) incurs high sample complexity due to the large number of interactions needed for learning. This limitation has motivated the use of large language models (LLMs) for subgoal discovery and trajectory guidance. While LLMs can support exploration, frequent reliance on LLM calls raises concerns about scalability and reliability. We address these challenges by constructing a memory graph that encodes subgoals and trajectories from both LLM guidance and the agent's own successful rollouts. From this graph, we derive a utility function that evaluates how closely the agent's trajectories align with prior successful strategies. This utility shapes the advantage function, providing the critic with additional guidance without altering the reward. Our method relies primarily on offline input and only occasional online queries, avoiding dependence on continuous LLM supervision. Preliminary experiments in benchmark environments show improved sample efficiency and faster early learning compared to baseline RL methods, with final returns comparable to methods that require frequent LLM interaction.
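The key property, shaping the advantage rather than the reward, can be sketched minimally. The overlap-based utility below is invented for illustration; the paper derives its utility from a memory graph, which is not reproduced here.

```python
def overlap(traj, path):
    """Fraction of a memorized subgoal path that the trajectory covers."""
    hits = sum(1 for subgoal in path if subgoal in traj)
    return hits / len(path)

def shaped_advantage(advantage, trajectory, memory_paths, beta=0.5):
    """Add a memory-based utility bonus to the advantage estimate.
    The environment reward is never modified; only the critic's signal is."""
    utility = max(overlap(trajectory, path) for path in memory_paths)
    return advantage + beta * utility

# One memorized successful path, and a trajectory that covers 2 of its 3 subgoals.
memory = [["get_key", "open_door", "reach_goal"]]
adv = shaped_advantage(0.1, ["get_key", "open_door"], memory)
```

Because only the advantage is shifted, the optimal policy under the original reward is untouched; the bonus merely biases early policy updates toward previously successful strategies.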

【8】MIRA: Memory-Integrated Reinforcement Learning Agent with Limited LLM Guidance
Link: https://arxiv.org/abs/2602.17930

Authors: Narjes Nourzad, Carlee Joe-Wong
Note: International Conference on Learning Representations (ICLR'26)
Abstract: Reinforcement learning (RL) agents often suffer from high sample complexity in sparse or delayed reward settings due to limited prior structure. Large language models (LLMs) can provide subgoal decompositions, plausible trajectories, and abstract priors that facilitate early learning. However, heavy reliance on LLM supervision introduces scalability constraints and dependence on potentially unreliable signals. We propose MIRA (Memory-Integrated Reinforcement Learning Agent), which incorporates a structured, evolving memory graph to guide early training. The graph stores decision-relevant information, including trajectory segments and subgoal structures, and is constructed from both the agent's high-return experiences and LLM outputs. This design amortizes LLM queries into a persistent memory rather than requiring continuous real-time supervision. From this memory graph, we derive a utility signal that softly adjusts advantage estimation to influence policy updates without modifying the underlying reward function. As training progresses, the agent's policy gradually surpasses the initial LLM-derived priors, and the utility term decays, preserving standard convergence guarantees. We provide theoretical analysis showing that utility-based shaping improves early-stage learning in sparse-reward environments. Empirically, MIRA outperforms RL baselines and achieves returns comparable to approaches that rely on frequent LLM supervision, while requiring substantially fewer online LLM queries. Project webpage: https://narjesno.github.io/MIRA/

【9】Understanding Unreliability of Steering Vectors in Language Models: Geometric Predictors and the Limits of Linear Approximations
Link: https://arxiv.org/abs/2602.17881

Authors: Joschka Braun
Note: Master's Thesis, University of Tübingen. 89 pages, 34 figures. Portions of this work were published at the ICLR 2025 Workshop on Foundation Models in the Wild (see arXiv:2505.22637)
Abstract: Steering vectors are a lightweight method for controlling language model behavior by adding a learned bias to the activations at inference time. Although effective on average, steering effect sizes vary across samples and are unreliable for many target behaviors. In my thesis, I investigate why steering reliability differs across behaviors and how it is impacted by steering vector training data. First, I find that higher cosine similarity between training activation differences predicts more reliable steering. Second, I observe that behavior datasets where positive and negative activations are better separated along the steering direction are more reliably steerable. Finally, steering vectors trained on different prompt variations are directionally distinct, yet perform similarly well and exhibit correlated efficacy across datasets. My findings suggest that steering vectors are unreliable when the latent target behavior representation is not effectively approximated by the linear steering direction. Taken together, these insights offer a practical diagnostic for steering unreliability and motivate the development of more robust steering methods that explicitly account for non-linear latent behavior representations.
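A difference-in-means steering vector and the cosine-alignment diagnostic the thesis describes can be sketched with toy activations (the data below is invented; real activations would come from a model's residual stream).

```python
import math

def mean_diff(pos, neg):
    """Steering vector: mean(positive activations) - mean(negative activations)."""
    dim = len(pos[0])
    mp = [sum(v[i] for v in pos) / len(pos) for i in range(dim)]
    mn = [sum(v[i] for v in neg) / len(neg) for i in range(dim)]
    return [a - b for a, b in zip(mp, mn)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy activations: the behavior lives on axis 0, with per-sample noise on axis 1.
pos = [[1.0, 0.2], [1.2, -0.1], [0.9, 0.0]]
neg = [[-1.0, 0.1], [-1.1, -0.2], [-0.9, 0.0]]
steer = mean_diff(pos, neg)

# Diagnostic: how aligned is each per-pair activation difference with the mean
# steering direction? Low alignment predicts unreliable steering.
pair_diffs = [[p[i] - n[i] for i in range(2)] for p, n in zip(pos, neg)]
alignments = [cosine(d, steer) for d in pair_diffs]
```

High, tightly clustered alignment values (as here) correspond to behaviors that are well captured by a single linear direction; spread-out or low values flag the non-linear representations the thesis identifies as failure cases.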

【10】Understanding the Fine-Grained Knowledge Capabilities of Vision-Language Models
Link: https://arxiv.org/abs/2602.17871

Authors: Dhruba Ghosh, Yuhui Zhang, Ludwig Schmidt
Abstract: Vision-language models (VLMs) have made substantial progress across a wide range of visual question answering benchmarks, spanning visual reasoning, document understanding, and multimodal dialogue. These improvements are evident in a wide range of VLMs built on a variety of base models, alignment architectures, and training data. However, recent works show that these models trail behind in traditional image classification benchmarks, which test fine-grained visual knowledge. We test a large number of recent VLMs on fine-grained classification benchmarks and identify potential factors in the disconnect between fine-grained knowledge and other vision benchmarks. Through a series of ablation experiments, we find that using a better LLM improves all benchmark scores equally, while a better vision encoder disproportionately improves fine-grained classification performance. Furthermore, we find that the pretraining stage is also vital to fine-grained performance, particularly when the language model weights are unfrozen during pretraining. These insights pave the way for enhancing fine-grained visual understanding and vision-centric capabilities in VLMs.

【11】ADAPT: Hybrid Prompt Optimization for LLM Feature Visualization
Link: https://arxiv.org/abs/2602.17867

Authors: João N. Cardoso, Arlindo L. Oliveira, Bruno Martins
Abstract: Understanding what features are encoded by learned directions in LLM activation space requires identifying inputs that strongly activate them. Feature visualization, which optimizes inputs to maximally activate a target direction, offers an alternative to costly dataset search approaches, but remains underexplored for LLMs due to the discrete nature of text. Furthermore, existing prompt optimization techniques are poorly suited to this domain, which is highly prone to local minima. To overcome these limitations, we introduce ADAPT, a hybrid method combining beam search initialization with adaptive gradient-guided mutation, designed around these failure modes. We evaluate on Sparse Autoencoder latents from Gemma 2 2B, proposing metrics grounded in dataset activation statistics to enable rigorous comparison, and show that ADAPT consistently outperforms prior methods across layers and latent types. Our results establish that feature visualization for LLMs is tractable, but requires design assumptions tailored to the domain.

【12】TFL: Targeted Bit-Flip Attack on Large Language Model
Link: https://arxiv.org/abs/2602.17837

Authors: Jingkai Guo, Chaitali Chakrabarti, Deliang Fan
Note: 13 pages, 11 figures. Preprint
Abstract: Large language models (LLMs) are increasingly deployed in safety- and security-critical applications, raising concerns about their robustness to model parameter fault injection attacks. Recent studies have shown that bit-flip attacks (BFAs), which exploit computer main memory (i.e., DRAM) vulnerabilities to flip a small number of bits in model weights, can severely disrupt LLM behavior. However, existing BFAs on LLMs largely induce untargeted failures or general performance degradation, offering limited control over manipulating specific or targeted outputs. In this paper, we present TFL, a novel targeted bit-flip attack framework that enables precise manipulation of LLM outputs for selected prompts while causing little or no degradation on unrelated inputs. Within our TFL framework, we propose a novel keyword-focused attack loss to promote attacker-specified target tokens in generative outputs, together with an auxiliary utility score that balances attack effectiveness against collateral performance impact on benign data. We evaluate TFL on multiple LLMs (Qwen, DeepSeek, Llama) and benchmarks (DROP, GSM8K, and TriviaQA). The experiments show that TFL achieves successful targeted LLM output manipulations with fewer than 50 bit flips and a significantly reduced effect on unrelated queries compared to prior BFA approaches. This demonstrates the effectiveness of TFL and positions it as a new class of stealthy and targeted attacks on LLMs.
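TFL's search procedure and attack loss are not reproduced here, but the underlying fault model, a single bit flip in a stored weight, is easy to illustrate: flipping an exponent bit of an IEEE-754 float changes the weight by many orders of magnitude.

```python
import struct

def flip_bit(x, bit):
    """Flip one bit of a float32's IEEE-754 representation
    (bit 31 = sign, bits 30-23 = exponent, bits 22-0 = mantissa)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    (y,) = struct.unpack("<f", struct.pack("<I", bits ^ (1 << bit)))
    return y

w = 0.5
corrupted = flip_bit(w, 30)  # flip the most significant exponent bit
```

Here the weight jumps from 0.5 to 2^127, which is why a handful of well-chosen flips in DRAM can redirect a model's outputs; the flip is also an involution, so applying it twice restores the original weight.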

【13】Influence-Preserving Proxies for Gradient-Based Data Selection in LLM Fine-tuning
Link: https://arxiv.org/abs/2602.17835

Authors: Sirui Chen, Yunzhe Qi, Mengting Ai, Yifan Sun, Ruizhong Qiu, Jiaru Zou, Jingrui He
Abstract: Supervised fine-tuning (SFT) relies critically on selecting training data that most benefits a model's downstream performance. Gradient-based data selection methods such as TracIn and Influence Functions leverage influence to identify useful samples, but their computational cost scales poorly, making them impractical for multi-billion-parameter large language models (LLMs). A common alternative is to use off-the-shelf smaller models as proxies, but they remain suboptimal since their learning dynamics are unclear, their sizes cannot be flexibly adjusted, and they cannot be further aligned with the target model in terms of gradient-based influence estimation. To address these challenges, we introduce Iprox, a two-stage framework that derives influence-preserving proxies directly from the target model. It first applies a low-rank compression stage to preserve influence information of the target model, and then an aligning stage to align both model gradients and logits, thereby constructing proxies that flexibly control computational cost while retaining the target model's influence. Experimental results across diverse LLM families and evaluation tasks show that Iprox consistently outperforms off-the-shelf proxies and baseline methods. On Qwen3-4B, a 1.5B proxy constructed with Iprox achieves stronger performance than the larger 1.7B off-the-shelf proxy. Notably, on Llama3.2, Iprox achieves better performance than baselines while reducing computational cost by more than half relative to the full 3B model. These results show that Iprox provides effective influence-preserving proxies, making gradient-based data selection more scalable for LLMs.
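As background for the influence scores Iprox tries to preserve: single-checkpoint TracIn approximates a training example's influence on a test example by the inner product of their gradients, scaled by the learning rate. A toy linear-model version (illustrative only, not Iprox itself):

```python
def grad_linear(w, x, y):
    """Gradient of squared error 0.5 * (w.x - y)^2 for a linear model."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]

def tracin_influence(w, train_ex, test_ex, lr=0.1):
    """Single-checkpoint TracIn: lr * <grad(train), grad(test)>.
    Positive values mark proponents (training on them reduces test loss)."""
    g_train = grad_linear(w, *train_ex)
    g_test = grad_linear(w, *test_ex)
    return lr * sum(a * b for a, b in zip(g_train, g_test))

w = [0.0, 0.0]
helpful = ([1.0, 0.0], 1.0)   # pushes w in the same direction as the test point
harmful = ([1.0, 0.0], -1.0)  # pushes w the opposite way
test = ([2.0, 0.0], 2.0)
```

Computing such gradient inner products for every candidate example is exactly the cost that scales poorly with model size, which is what motivates influence-preserving smaller proxies.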

【14】Causality by Abstraction: Symbolic Rule Learning in Multivariate Timeseries with Large Language Models
Link: https://arxiv.org/abs/2602.17829

Authors: Preetom Biswas, Giulia Pedrielli, K. Selçuk Candan
Abstract: Inferring causal relations in timeseries data with delayed effects is a fundamental challenge, especially when the underlying system exhibits complex dynamics that cannot be captured by simple functional mappings. Traditional approaches often fail to produce generalized and interpretable explanations, as multiple distinct input trajectories may yield nearly indistinguishable outputs. In this work, we present ruleXplain, a framework that leverages Large Language Models (LLMs) to extract formal explanations for input-output relations in simulation-driven dynamical systems. Our method introduces a constrained symbolic rule language with temporal operators and delay semantics, enabling LLMs to generate verifiable causal rules through structured prompting. ruleXplain relies on the availability of a principled model (e.g., a simulator) that maps multivariate input time series to output time series. Within ruleXplain, the simulator is used to generate diverse counterfactual input trajectories that yield similar target output, serving as candidate explanations. Such counterfactual inputs are clustered and provided as context to the LLM, which is tasked with the generation of symbolic rules encoding the joint temporal trends responsible for the patterns observable in the output time series. A closed-loop refinement process ensures rule consistency and semantic validity. We validate the framework using the PySIRTEM epidemic simulator, mapping testing rate inputs to daily infection counts; and the EnergyPlus building energy simulator, mapping temperature and solar irradiance inputs to electricity demand. For validation, we perform three classes of experiments: (1) the efficacy of the ruleset through input reconstruction; (2) ablation studies evaluating the causal encoding of the ruleset; and (3) generalization tests of the extracted rules across unseen output trends with varying phase dynamics.
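The paper's rule language is not spelled out in the abstract, but a delayed-effect temporal rule of the kind described can be sketched with invented predicates: "whenever the input rises at time t, the output rises at time t + d".

```python
def rises(series, t, window=1):
    """Temporal predicate: the series increases over [t, t + window]."""
    return series[t + window] > series[t]

def holds_with_delay(rule_in, rule_out, inp, out, delay):
    """Check a rule of the form: whenever rule_in holds on the input at t,
    rule_out holds on the output at t + delay."""
    horizon = len(out) - delay - 1
    times = [t for t in range(horizon) if rule_in(inp, t)]
    return bool(times) and all(rule_out(out, t + delay) for t in times)

# Toy system: the output echoes input trends two steps later.
inp = [0, 1, 2, 2, 1, 0, 1, 2]
out = [5, 5, 5, 6, 7, 7, 6, 5]

rule_ok = holds_with_delay(rises, rises, inp, out, delay=2)
```

In ruleXplain the LLM proposes such rules from clustered counterfactual trajectories, and a checker of this shape (over the simulator's outputs) is what makes the rules verifiable.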

【15】Ontology-Guided Neuro-Symbolic Inference: Grounding Language Models with Mathematical Domain Knowledge
Link: https://arxiv.org/abs/2602.17826

Authors: Marcelo Labre
Note: Submitted to NeuS 2026. Supplementary materials and code: https://doi.org/10.5281/zenodo.18665030
Abstract: Language models exhibit fundamental limitations -- hallucination, brittleness, and lack of formal grounding -- that are particularly problematic in high-stakes specialist fields requiring verifiable reasoning. I investigate whether formal domain ontologies can enhance language model reliability through retrieval-augmented generation. Using mathematics as proof of concept, I implement a neuro-symbolic pipeline leveraging the OpenMath ontology with hybrid retrieval and cross-encoder reranking to inject relevant definitions into model prompts. Evaluation on the MATH benchmark with three open-source models reveals that ontology-guided context improves performance when retrieval quality is high, but irrelevant context actively degrades it -- highlighting both the promise and challenges of neuro-symbolic approaches.
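The retrieve-then-inject step can be caricatured in a few lines. A toy lexical-overlap score stands in for the paper's hybrid retrieval plus cross-encoder reranking, and the two "ontology" entries are invented.

```python
def score(query, doc):
    """Toy lexical overlap score; a stand-in for hybrid retrieval + reranking."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q)

# Invented stand-in for ontology-derived definitions (e.g., from OpenMath).
ontology = {
    "prime": "A prime is a natural number greater than 1 with no divisors "
             "other than 1 and itself.",
    "derivative": "The derivative measures the instantaneous rate of change "
                  "of a function.",
}

def build_prompt(question, k=1):
    """Inject the top-k retrieved definitions into the model prompt."""
    ranked = sorted(ontology.items(), key=lambda kv: score(question, kv[1]),
                    reverse=True)
    context = "\n".join(defn for _, defn in ranked[:k])
    return f"Definitions:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("Is 91 a prime number?")
```

The paper's finding maps directly onto this sketch: when the top-ranked definition is relevant the injected context helps, but a poorly ranked (irrelevant) definition would occupy the context and actively degrade the answer.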

【16】Asking Forever: Universal Activations Behind Turn Amplification in Conversational LLMs
Link: https://arxiv.org/abs/2602.17778

Authors: Zachary Coalson, Bo Fang, Sanghyun Hong
Note: Pre-print
Abstract: Multi-turn interaction length is a dominant factor in the operational costs of conversational LLMs. In this work, we present a new failure mode in conversational LLMs: turn amplification, in which a model consistently prolongs multi-turn interactions without completing the underlying task. We show that an adversary can systematically exploit clarification-seeking behavior, commonly encouraged in multi-turn conversation settings, to scalably prolong interactions. Moving beyond prompt-level behaviors, we take a mechanistic perspective and identify a query-independent, universal activation subspace associated with clarification-seeking responses. Unlike prior cost-amplification attacks that rely on per-turn prompt optimization, our attack arises from conversational dynamics and persists across prompts and tasks. We show that this mechanism provides a scalable pathway to induce turn amplification: both supply-chain attacks via fine-tuning and runtime attacks through low-level parameter corruptions consistently shift models toward abstract, clarification-seeking behavior across prompts. Across multiple instruction-tuned LLMs and benchmarks, our attack substantially increases turn count while remaining compliant. We also show that existing defenses offer limited protection against this emerging class of failures.

【17】CLUTCH: Contextualized Language model for Unlocking Text-Conditioned Hand motion modelling in the wild
Link: https://arxiv.org/abs/2602.17770

Authors: Balamurugan Thambiraja, Omid Taheri, Radek Danecek, Giorgio Becherini, Gerard Pons-Moll, Justus Thies
Note: ICLR2026; Project page: https://balamuruganthambiraja.github.io/CLUTCH/
Abstract: Hands play a central role in daily life, yet modeling natural hand motions remains underexplored. Existing methods that tackle text-to-hand-motion generation or hand animation captioning rely on studio-captured datasets with limited actions and contexts, making them costly to scale to "in-the-wild" settings. Further, contemporary models and their training schemes struggle to capture animation fidelity with text-motion alignment. To address this, we (1) introduce '3D Hands in the Wild' (3D-HIW), a dataset of 32K 3D hand-motion sequences and aligned text, and (2) propose CLUTCH, an LLM-based hand animation system with two critical innovations: (a) SHIFT, a novel VQ-VAE architecture to tokenize hand motion, and (b) a geometric refinement stage to finetune the LLM. To build 3D-HIW, we propose a data annotation pipeline that combines vision-language models (VLMs) and state-of-the-art 3D hand trackers, and apply it to a large corpus of egocentric action videos covering a wide range of scenarios. To fully capture motion in-the-wild, CLUTCH employs SHIFT, a part-modality decomposed VQ-VAE, which improves generalization and reconstruction fidelity. Finally, to improve animation quality, we introduce a geometric refinement stage, where CLUTCH is co-supervised with a reconstruction loss applied directly to decoded hand motion parameters. Experiments demonstrate state-of-the-art performance on text-to-motion and motion-to-text tasks, establishing the first benchmark for scalable in-the-wild hand motion modelling. Code, data and models will be released.

【18】ScaleBITS: Scalable Bitwidth Search for Hardware-Aligned Mixed-Precision LLMs
标题:ScaleBITS:针对硬件对齐的混合精度LLM的可扩展比特宽度搜索
链接:https://arxiv.org/abs/2602.17698

作者:Xinlin Li,Timothy Chou,Josh Fromm,Zichang Liu,Yunjie Pan,Christina Fragouli
摘要:训练后权重量化对于降低大型语言模型(LLM)的内存和推理成本至关重要,但由于权重敏感性高度不均匀且缺乏有原则的精度分配,将平均精度压到4位以下仍然具有挑战性。现有的解决方案要么使用带来高运行时开销的不规则细粒度混合精度,要么依赖启发式方法或高度受限的精度分配策略。在这项工作中,我们提出了ScaleBITS,一个混合精度量化框架,能够在内存预算下实现自动化、细粒度的位宽分配,同时保持硬件效率。在新的敏感性分析的指导下,我们引入了一种硬件对齐的块级权重划分方案,由双向通道重排序提供支持。我们将全局位宽分配表述为约束优化问题,并开发了贪婪算法的可扩展近似,实现端到端的有原则分配。实验表明,ScaleBITS相比均匀精度量化有显著提升(最高+36%),并在超低比特情形下优于最先进的敏感性感知基线(最高+13%),且不增加运行时开销。
摘要:Post-training weight quantization is crucial for reducing the memory and inference cost of large language models (LLMs), yet pushing the average precision below 4 bits remains challenging due to highly non-uniform weight sensitivity and the lack of principled precision allocation. Existing solutions use irregular fine-grained mixed-precision with high runtime overhead or rely on heuristics or highly constrained precision allocation strategies. In this work, we propose ScaleBITS, a mixed-precision quantization framework that enables automated, fine-grained bitwidth allocation under a memory budget while preserving hardware efficiency. Guided by a new sensitivity analysis, we introduce a hardware-aligned, block-wise weight partitioning scheme, powered by bi-directional channel reordering. We formulate global bitwidth allocation as a constrained optimization problem and develop a scalable approximation to the greedy algorithm, enabling end-to-end principled allocation. Experiments show that ScaleBITS significantly improves over uniform-precision quantization (up to +36%) and outperforms state-of-the-art sensitivity-aware baselines (up to +13%) in ultra-low-bit regime, without adding runtime overhead.

【19】Pimp My LLM: Leveraging Variability Modeling to Tune Inference Hyperparameters
标题:Pimp My LLM:利用变异性建模来调整推理超参数
链接:https://arxiv.org/abs/2602.17697

作者:Nada Zine,Clément Quinton,Romain Rouvoy
摘要:大型语言模型(LLM)越来越多地用于各种任务。然而,它们大量的计算需求引起了人们对训练和推理的能源效率和可持续性的担忧。特别是推理,它主导着总的计算使用,使其优化至关重要。最近的研究探索了优化技术,并分析了配置选择如何影响能耗。然而,由于组合爆炸,推理服务器的巨大配置空间使得详尽的经验评估不可行。在本文中,我们引入了一个新的角度对这个问题的处理LLM作为可配置的系统和应用可变性管理技术,系统地分析推理时间的配置选择。我们评估我们的拥抱脸Transformers库的方法,通过使用基于特征的可变性模型表示生成超参数及其约束,对代表性配置进行采样,测量其能耗,延迟,准确性,并从收集的数据中学习预测模型。我们的研究结果表明,可变性建模有效地管理LLM推理配置的复杂性。它能够系统地分析超参数效应和相互作用,揭示权衡,并支持从有限数量的测量中准确预测推理行为。总的来说,这项工作开辟了一个新的研究方向,通过利用可变性建模来实现LLM的高效和可持续配置,从而将软件工程和机器学习联系起来。
摘要:Large Language Models (LLMs) are being increasingly used across a wide range of tasks. However, their substantial computational demands raise concerns about the energy efficiency and sustainability of both training and inference. Inference, in particular, dominates total compute usage, making its optimization crucial. Recent research has explored optimization techniques and analyzed how configuration choices influence energy consumption. Yet, the vast configuration space of inference servers makes exhaustive empirical evaluation infeasible due to combinatorial explosion. In this paper, we introduce a new perspective on this problem by treating LLMs as configurable systems and applying variability management techniques to systematically analyze inference-time configuration choices. We evaluate our approach on the Hugging Face Transformers library by representing generation hyperparameters and their constraints using a feature-based variability model, sampling representative configurations, measuring their energy consumption, latency, accuracy, and learning predictive models from the collected data. Our results show that variability modeling effectively manages the complexity of LLM inference configurations. It enables systematic analysis of hyperparameter effects and interactions, reveals trade-offs, and supports accurate prediction of inference behavior from a limited number of measurements. Overall, this work opens a new research direction that bridges software engineering and machine learning by leveraging variability modeling for the efficient and sustainable configuration of LLMs.

【20】Can LLM Safety Be Ensured by Constraining Parameter Regions?
标题:通过约束参数区域能否保证LLM安全?
链接:https://arxiv.org/abs/2602.17696

作者:Zongmin Li,Jian Su,Farah Benamara,Aixin Sun
备注:32 pages
摘要:大型语言模型(LLM)通常被假设包含“安全区域”,即修改后会直接影响安全行为的参数子集。我们对四种安全区域识别方法进行了系统评估,它们涵盖从单个权重到整个Transformer层的不同参数粒度,并跨越四个不同规模的骨干LLM家族。使用10个安全识别数据集,我们发现所识别的安全区域之间仅存在低到中度的重叠(以IoU度量)。当使用效用数据集(即非有害查询)进一步细化安全区域时,重叠显著下降。这些结果表明,当前技术无法可靠地识别出一个稳定的、与数据集无关的安全区域。
摘要:Large language models (LLMs) are often assumed to contain "safety regions" -- parameter subsets whose modification directly influences safety behaviors. We conduct a systematic evaluation of four safety region identification methods spanning different parameter granularities, from individual weights to entire Transformer layers, across four families of backbone LLMs with varying sizes. Using ten safety identification datasets, we find that the identified safety regions exhibit only low to moderate overlap, as measured by IoU. The overlap drops significantly when the safety regions are further refined using utility datasets (i.e., non-harmful queries). These results suggest that current techniques fail to reliably identify a stable, dataset-agnostic safety region.
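The overlap measurement this abstract relies on can be sketched by treating each method's "safety region" as a boolean mask over the parameter vector and computing intersection-over-union between masks. The 5% selection rate and all names below are illustrative assumptions.

```python
import numpy as np

def mask_iou(a, b):
    """IoU between two boolean masks over the same parameter vector."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 1.0

rng = np.random.default_rng(0)
# Two hypothetical methods each flag ~5% of 10,000 parameters as safety-critical.
a = rng.random(10_000) < 0.05
b = rng.random(10_000) < 0.05
# For independent masks with rate p, the expected IoU is p / (2 - p) ≈ 0.026,
# which is the "chance level" a low measured overlap should be compared against.
print(round(mask_iou(a, b), 3))
```

Comparing measured IoU against this chance baseline is what distinguishes "low to moderate overlap" from mere coincidence.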

【21】AsynDBT: Asynchronous Distributed Bilevel Tuning for efficient In-Context Learning with Large Language Models
标题:AsynDBT:用于大型语言模型高效上下文学习的异步分布式双层调优
链接:https://arxiv.org/abs/2602.17694

作者:Hui Ma,Shaoyu Dou,Ya Liu,Fei Xing,Li Feng,Feng Pi
备注:Accepted in Scientific Reports
摘要:随着大型语言模型(LLM)的快速发展,越来越多的应用程序利用基于云的LLM API来降低使用成本。然而,由于基于云的模型的参数和梯度不可获知,用户必须手动或借助启发式算法来调整用于干预LLM输出的提示,这需要昂贵的优化过程。上下文学习(ICL)最近已经成为一种很有前途的范式,它使LLM能够利用输入中提供的示例来适应新任务,从而无需参数更新。然而,ICL的进展往往受到高质量数据匮乏的阻碍,而这些数据往往较为敏感且难以共享。联邦学习(FL)提供了一个潜在的解决方案,它支持分布式LLM的协作训练,同时保护数据隐私。尽管如此,以往结合ICL的FL方法仍受困于严重的掉队者(straggler)问题以及异构非同分布数据带来的挑战。为了解决这些问题,我们提出了一种异步分布式双层调优(AsynDBT)算法,该算法基于LLM的反馈同时优化上下文学习样本和提示片段,从而提升下游任务的性能。得益于其分布式架构,AsynDBT提供了隐私保护和对异构计算环境的适应性。此外,我们给出了理论分析,为该算法建立了收敛性保证。在多个基准数据集上进行的大量实验证明了AsynDBT的有效性和效率。
摘要:With the rapid development of large language models (LLMs), an increasing number of applications leverage cloud-based LLM APIs to reduce usage costs. However, since cloud-based models' parameters and gradients are agnostic, users have to adjust prompts manually or with heuristic algorithms to steer LLM outputs, which requires costly optimization procedures. In-context learning (ICL) has recently emerged as a promising paradigm that enables LLMs to adapt to new tasks using examples provided within the input, eliminating the need for parameter updates. Nevertheless, the advancement of ICL is often hindered by the lack of high-quality data, which is often sensitive and difficult to share. Federated learning (FL) offers a potential solution by enabling collaborative training of distributed LLMs while preserving data privacy. Despite this, previous FL approaches that incorporate ICL have struggled with severe straggler problems and challenges associated with heterogeneous, non-identically distributed data. To address these problems, we propose an asynchronous distributed bilevel tuning (AsynDBT) algorithm that optimizes both in-context learning samples and prompt fragments based on the feedback from the LLM, thereby enhancing downstream task performance. Benefiting from its distributed architecture, AsynDBT provides privacy protection and adaptability to heterogeneous computing environments. Furthermore, we present a theoretical analysis establishing the convergence guarantees of the proposed algorithm. Extensive experiments conducted on multiple benchmark datasets demonstrate the effectiveness and efficiency of AsynDBT.

【22】A Case Study of Selected PTQ Baselines for Reasoning LLMs on Ascend NPU
标题:Ascend NPU上推理LLM的选定PTQ基线案例研究
链接:https://arxiv.org/abs/2602.17693

作者:Yuchen Luo,Fangyue Zhu,Ruining Zhou,Mingzhe Huang,Jian Zhu,Fanyu Fan,Wei Shao
摘要:训练后量化(PTQ)对于有效的模型部署至关重要,但与GPU架构相比,其在Ascend NPU上的有效性仍然未得到充分探索。本文介绍了一个应用于面向推理的模型(如DeepSeek-R1-Distill-Qwen系列(1.5B/7 B/14 B)和QwQ-32 B)的代表性PTQ基线的案例研究。我们评估了四种不同的算法,包括AWQ,GPTQ,SmoothQuant和FlatQuant,以涵盖从仅加权压缩到基于旋转的高级方法的范围。我们的实证结果显示了显着的平台敏感性。虽然4位仅权重量化被证明适用于较大的模型,但积极的4位权重激活方案在NPU上存在逐层校准不稳定性,导致长上下文推理任务中的逻辑崩溃。相反,标准的8位量化在数值上保持稳定。此外,一个真实的INT 8部署表明,尽管优化的内核减少了延迟,但动态量化开销目前限制了端到端加速。这些研究结果为在Ascend NPU上部署量化推理模型的可行性和局限性提供了实际参考。
摘要:Post-Training Quantization (PTQ) is crucial for efficient model deployment, yet its effectiveness on Ascend NPU remains under-explored compared to GPU architectures. This paper presents a case study of representative PTQ baselines applied to reasoning-oriented models such as DeepSeek-R1-Distill-Qwen series (1.5B/7B/14B) and QwQ-32B. We evaluate four distinct algorithms, including AWQ, GPTQ, SmoothQuant, and FlatQuant, to cover the spectrum from weight-only compression to advanced rotation-based methods. Our empirical results reveal significant platform sensitivity. While 4-bit weight-only quantization proves viable for larger models, aggressive 4-bit weight-activation schemes suffer from layer-wise calibration instability on the NPU, leading to logic collapse in long-context reasoning tasks. Conversely, standard 8-bit quantization remains numerically stable. Furthermore, a real-world INT8 deployment demonstrates that although optimized kernels reduce latency, dynamic quantization overheads currently limit end-to-end acceleration. These findings offer a practical reference for the feasibility and limitations of deploying quantized reasoning models on Ascend NPU.

【23】Agentic Unlearning: When LLM Agent Meets Machine Unlearning
标题:智能体遗忘:当LLM代理遇到机器遗忘
链接:https://arxiv.org/abs/2602.17692

作者:Bin Wang,Fan Wang,Pingping Wang,Jinyu Cong,Yang Yu,Yilong Yin,Zhongyi Han,Benzheng Wei
备注:9 pages, 6 figures, 6 tables
摘要:在本文中,我们引入了智能体遗忘(agentic unlearning),它从具有闭环交互的智能体的模型参数和持久记忆中删除指定信息。现有的遗忘方法只针对参数,留下两个关键缺口:(i)参数-记忆回流,即检索重新激活参数残留,或记忆中的残留内容重新引入敏感信息;以及(ii)缺乏同时覆盖参数与记忆两条通路的统一策略。我们提出了同步回流遗忘(SBU),一个在参数和记忆通路上联合进行遗忘的框架。记忆通路执行基于依赖闭包的遗忘,在逻辑上使共享痕迹失效的同时修剪孤立实体。参数通路采用随机参考对齐来引导模型输出趋向高熵先验。这些通路通过同步的双更新协议整合,形成一个闭环机制,其中记忆遗忘和参数抑制相互加强,以防止跨通路再污染。在医学QA基准上的实验表明,SBU减少了两条通路中目标隐私信息的痕迹,且对保留数据的性能退化有限。
摘要:In this paper, we introduce agentic unlearning, which removes specified information from both model parameters and persistent memory in agents with closed-loop interaction. Existing unlearning methods target parameters alone, leaving two critical gaps: (i) parameter-memory backflow, where retrieval reactivates parametric remnants or memory artifacts reintroduce sensitive content, and (ii) the absence of a unified strategy that covers both parameter and memory pathways. We present Synchronized Backflow Unlearning (SBU), a framework that unlearns jointly across parameter and memory pathways. The memory pathway performs dependency closure-based unlearning that prunes isolated entities while logically invalidating shared artifacts. The parameter pathway employs stochastic reference alignment to guide model outputs toward a high-entropy prior. These pathways are integrated via a synchronized dual-update protocol, forming a closed-loop mechanism where memory unlearning and parametric suppression reinforce each other to prevent cross-pathway recontamination. Experiments on medical QA benchmarks show that SBU reduces traces of targeted private information across both pathways with limited degradation on retained data.

【24】Tethered Reasoning: Decoupling Entropy from Hallucination in Quantized LLMs via Manifold Steering
标题:系留推理:通过流形引导将量化LLM中的熵与幻觉解耦
链接:https://arxiv.org/abs/2602.17691

作者:Craig Atkinson
备注:16 pages, 6 tables
摘要:量化语言模型面临着一个基本的困境:低采样温度会产生重复的、模式崩溃的输出,而高温(T > 2.0)会导致轨迹发散和语义不连贯。我们提出了HELIX,一个几何框架,通过将隐藏状态轨迹拴系到预先计算的真实性流形上,从而将输出熵与幻觉解耦。HELIX计算一个统一真值分数(UTS),结合标记级语义熵与到流形的马氏距离。当UTS指示轨迹发散时,分级导向向量将激活重定向到结构一致的区域,同时仅影响0.2-2.5%的标记。在4位量化的Granite 4.0 H Small(32B/9B激活,混合Mamba-Transformer)上:GSM8K在T = 3.0时保持88.84%的准确率(相比T = 0.5下降2.81pp);MMLU在14,042个问题上保持72.49%(下降1.24pp)。这表明高温幻觉主要是轨迹发散而不是语义崩溃。值得注意的是,仅引导稀疏的Transformer注意力层(约10%的层)就足以校正Mamba-2状态空间公式中的漂移。几何束缚揭示了一个先前被掩盖的高熵创造性储备。在T > 2.0时,受引导的输出仅表现出5-20%的想法重复,而保守设置下为70-80%。跨架构验证(Qwen3-30B-A3B MoE)证实这一现象与架构无关,独特概念生成高出46.7%。HELIX充当语法系链,支持在不破坏有效输出所需逻辑主干的情况下探索语义多样性。这实现了多温度合成,生成的独特概念比单温度推断多200%。
摘要:Quantized language models face a fundamental dilemma: low sampling temperatures yield repetitive, mode-collapsed outputs, while high temperatures (T > 2.0) cause trajectory divergence and semantic incoherence. We present HELIX, a geometric framework that decouples output entropy from hallucination by tethering hidden-state trajectories to a pre-computed truthfulness manifold. HELIX computes a Unified Truth Score (UTS) combining token-level semantic entropy with Mahalanobis distance from the manifold. When UTS indicates trajectory divergence, graduated steering vectors redirect activations toward structurally coherent regions while affecting only 0.2-2.5% of tokens. On 4-bit quantized Granite 4.0 H Small (32B/9B active, hybrid Mamba-Transformer): GSM8K maintains 88.84% accuracy at T = 3.0 (2.81pp degradation from T = 0.5); MMLU maintains 72.49% across 14,042 questions (1.24pp degradation). This demonstrates that high-temperature hallucination is primarily trajectory divergence rather than semantic collapse. Notably, steering the sparse Transformer attention layers (~10% of layers) is sufficient to correct drift in the Mamba-2 state-space formulation. Geometric tethering reveals a previously-masked High-Entropy Creative Reservoir. At T > 2.0, steered outputs exhibit 5-20% idea duplication versus 70-80% at conservative settings. Cross-architecture validation (Qwen3-30B-A3B MOE) confirms this phenomenon is architecture-independent, with 46.7% higher unique concept generation. HELIX acts as a syntax tether, enabling exploration of semantic diversity without violating the logical backbone required for valid output. This enables Multi-Temperature Synthesis, generating 200% more unique concepts than single-temperature inference.
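The score the HELIX abstract describes, combining token-level entropy with Mahalanobis distance to a "truthfulness manifold", can be sketched as below. The Gaussian manifold model, the combination weight `alpha`, and all names are assumptions, not the authors' implementation.

```python
import numpy as np

def entropy(probs):
    """Shannon entropy of a token distribution (natural log)."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def mahalanobis(x, mean, cov):
    """Distance of a hidden state from a Gaussian model of the manifold."""
    d = np.asarray(x) - mean
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

def unified_truth_score(token_probs, hidden, manifold_mean, manifold_cov,
                        alpha=0.5):
    # Higher score = more evidence of trajectory divergence; a controller
    # would trigger steering vectors once this crosses a threshold.
    return (alpha * entropy(token_probs)
            + (1 - alpha) * mahalanobis(hidden, manifold_mean, manifold_cov))

mean, cov = np.zeros(3), np.eye(3)
on_manifold = unified_truth_score([0.9, 0.05, 0.05], np.zeros(3), mean, cov)
divergent = unified_truth_score([0.4, 0.3, 0.3], 4 * np.ones(3), mean, cov)
print(on_manifold < divergent)  # True: divergence raises the score
```

The point of combining the two terms is that either signal alone is ambiguous: high entropy can be healthy creativity, and off-manifold states can still decode confidently; the sum flags only their co-occurrence.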

【25】Robust Pre-Training of Medical Vision-and-Language Models with Domain-Invariant Multi-Modal Masked Reconstruction
标题:具有领域不变多模式掩蔽重建的医学视觉和语言模型的稳健预训练
链接:https://arxiv.org/abs/2602.17689

作者:Melika Filvantorkaman,Mohsen Piri
备注:28 pages, 3 figures
摘要:医学视觉语言模型在医学图像和临床文本的联合推理方面表现出很强的潜力,但它们的性能往往会在成像设备、采集协议和报告风格的变化引起的域转移下下降。现有的多模态预训练方法在很大程度上忽略了鲁棒性,将其视为下游适应问题。在这项工作中,我们提出了鲁棒多模态掩蔽重建(Robust-MMR),这是一个自我监督的预训练框架,它将鲁棒性目标明确地纳入掩蔽视觉语言学习中。Robust-MMR集成了非对称扰动感知掩蔽,域一致性正则化和模态弹性约束,以鼓励域不变表示。我们在多个医学视觉语言基准上评估了Robust-MMR,包括医学视觉问答(VQA-RAD,SLAKE,VQA-2019),跨域图像-文本分类(MELINDA)和鲁棒图像标题检索(ROCO)。Robust-MMR在VQA-RAD上实现了78.9%的跨域准确率,超过最强基线3.8个百分点,在SLAKE和VQA-2019上分别达到74.6%和77.0%的准确率。在扰动评估下,Robust-MMR将VQA-RAD准确度从69.1%提高到75.6%。对于图像-文本分类,跨域MELINDA的准确率从70.3%增加到75.2%,而检索实验表明,在扰动下,平均排名退化从超过16降至4.1。定性结果进一步表明,在疾病检测和结构异常评估方面的临床推理得到改善。这些发现表明,在预训练期间明确建模鲁棒性可以为现实世界的部署带来更可靠和可转移的医疗视觉语言表示。
摘要:Medical vision-language models show strong potential for joint reasoning over medical images and clinical text, but their performance often degrades under domain shift caused by variations in imaging devices, acquisition protocols, and reporting styles. Existing multi-modal pre-training methods largely overlook robustness, treating it as a downstream adaptation problem. In this work, we propose Robust Multi-Modal Masked Reconstruction (Robust-MMR), a self-supervised pre-training framework that explicitly incorporates robustness objectives into masked vision-language learning. Robust-MMR integrates asymmetric perturbation-aware masking, domain-consistency regularization, and modality-resilience constraints to encourage domain-invariant representations. We evaluate Robust-MMR on multiple medical vision-language benchmarks, including medical visual question answering (VQA-RAD, SLAKE, VQA-2019), cross-domain image-text classification (MELINDA), and robust image-caption retrieval (ROCO). Robust-MMR achieves 78.9% cross-domain accuracy on VQA-RAD, outperforming the strongest baseline by 3.8 percentage points, and reaches 74.6% and 77.0% accuracy on SLAKE and VQA-2019, respectively. Under perturbed evaluation, Robust-MMR improves VQA-RAD accuracy from 69.1% to 75.6%. For image-text classification, cross-domain MELINDA accuracy increases from 70.3% to 75.2%, while retrieval experiments show a reduction in mean rank degradation from over 16 to 4.1 under perturbation. Qualitative results further demonstrate improved clinical reasoning for disease detection and structural abnormality assessment. These findings show that explicitly modeling robustness during pre-training leads to more reliable and transferable medical vision-language representations for real-world deployment.

【26】CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models
标题:CodeScaler:通过免执行奖励模型进行扩展代码LLM训练和测试时推理
链接:https://arxiv.org/abs/2602.17684

作者:Xiao Zhu,Xinyu Zhou,Boyu Zhu,Hanxu Hu,Mingzhe Du,Haotian Zhang,Huiming Wang,Zhijiang Guo
摘要:来自可验证奖励的强化学习(RLVR)通过利用来自单元测试的基于执行的反馈,推动了代码大型语言模型的最新进展,但其可扩展性从根本上受到高质量测试用例的可用性和可靠性的限制。我们提出了CodeScaler,这是一种免执行的奖励模型,旨在扩展强化学习训练和代码生成的测试时推理。CodeScaler基于从已验证的代码问题中获得的精心策划的偏好数据进行训练,并结合了语法感知代码提取和有效性保持奖励整形,以确保稳定和强大的优化。在五个编码基准测试中,CodeScaler将Qwen3-8B-Base平均提高了+11.72点,优于基于二进制执行的RL +1.82点,并在没有任何测试用例的情况下在合成数据集上实现了可扩展的强化学习。在推理时,CodeScaler作为一种有效的测试时间缩放方法,实现了与单元测试方法相当的性能,同时将延迟减少了10倍。此外,CodeScaler不仅在代码领域(+3.3分),而且在一般和推理领域(平均+2.7分)都超过了RM-Bench上现有的奖励模型。
摘要:Reinforcement Learning from Verifiable Rewards (RLVR) has driven recent progress in code large language models by leveraging execution-based feedback from unit tests, but its scalability is fundamentally constrained by the availability and reliability of high-quality test cases. We propose CodeScaler, an execution-free reward model designed to scale both reinforcement learning training and test-time inference for code generation. CodeScaler is trained on carefully curated preference data derived from verified code problems and incorporates syntax-aware code extraction and validity-preserving reward shaping to ensure stable and robust optimization. Across five coding benchmarks, CodeScaler improves Qwen3-8B-Base by an average of +11.72 points, outperforming binary execution-based RL by +1.82 points, and enables scalable reinforcement learning on synthetic datasets without any test cases. At inference time, CodeScaler serves as an effective test-time scaling method, achieving performance comparable to unit test approaches while providing a 10-fold reduction in latency. Moreover, CodeScaler surpasses existing reward models on RM-Bench not only in the code domain (+3.3 points), but also in general and reasoning domains (+2.7 points on average).
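Two ingredients the CodeScaler abstract names, syntax-aware code extraction and validity-preserving reward shaping, can be sketched as follows. The score bands ([0.5, 1] for parseable code, [0, 0.25] for broken code) and the function names are illustrative assumptions, not the paper's scheme.

```python
import ast
import re

def extract_code(response: str) -> str:
    """Return the first fenced code block in a model response, else the raw text."""
    m = re.search(r"```(?:python)?\n(.*?)```", response, re.DOTALL)
    return m.group(1) if m else response

def shaped_reward(response: str, raw_score: float) -> float:
    """Map a raw reward-model score in [0, 1] so that any syntactically
    valid completion outranks every invalid one (validity preserved)."""
    try:
        ast.parse(extract_code(response))
        return 0.5 + 0.5 * raw_score   # valid: upper half of the range
    except SyntaxError:
        return 0.25 * raw_score        # invalid: strictly below every valid score

good = "Here is my answer:\n```python\ndef f(x):\n    return x + 1\n```"
bad = "```python\ndef f(x: return x\n```"
print(shaped_reward(good, 0.8) > shaped_reward(bad, 0.9))  # True
```

The shaping keeps the reward model's ordering within each validity class but prevents a fluent-looking yet unparseable completion from ever outscoring working code, which is one way an execution-free reward can stay "stable and robust" during RL.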

【27】LATMiX: Learnable Affine Transformations for Microscaling Quantization of LLMs
标题:LATMiX:用于LLM微尺度量化的可学习仿射变换
链接:https://arxiv.org/abs/2602.17681

作者:Ofir Gordon,Lior Dikstein,Arnon Netzer,Idan Achituve,Hai Victor Habi
备注:24 pages, 4 figures
摘要:后训练量化(PTQ)是一种广泛使用的方法,用于减少大型语言模型(LLM)的内存和计算成本。最近的研究表明,将可逆变换应用于激活可以通过减少激活离群值来显著提高量化鲁棒性;然而,现有的方法在很大程度上局限于旋转或基于Hadamard的变换。此外,大多数研究主要集中在传统的量化方案,而现代硬件越来越多地支持微缩放(MX)数据格式。将两者结合的尝试显示出严重的性能下降,导致先前的工作对变换引入假设。在这项工作中,我们采取互补的视角。首先,我们通过推导量化误差的界,对MX量化下的变换进行了理论分析。我们的分析强调同时考虑激活分布和底层量化结构的重要性。在此分析的基础上,我们提出了LATMiX,一种将离群值消减推广到可学习可逆仿射变换的方法,并使用标准深度学习工具进行优化。实验表明,在多种模型规模和广泛的zero-shot基准上,MX低比特量化的平均精度相比强基线均获得一致提升。
摘要:Post-training quantization (PTQ) is a widely used approach for reducing the memory and compute costs of large language models (LLMs). Recent studies have shown that applying invertible transformations to activations can significantly improve quantization robustness by reducing activation outliers; however, existing approaches are largely restricted to rotation or Hadamard-based transformations. Moreover, most studies focused primarily on traditional quantization schemes, whereas modern hardware increasingly supports the microscaling (MX) data format. Attempts to combine both showed severe performance degradation, leading prior work to introduce assumptions on the transformations. In this work, we take a complementary perspective. First, we provide a theoretical analysis of transformations under MX quantization by deriving a bound on the quantization error. Our analysis emphasizes the importance of accounting for both the activation distribution and the underlying quantization structure. Building on this analysis, we propose LATMiX, a method that generalizes outlier reduction to learnable invertible affine transformations optimized using standard deep learning tools. Experiments show consistent improvements in average accuracy for MX low-bit quantization over strong baselines on a wide range of zero-shot benchmarks, across multiple model sizes.
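The "quantization structure" the LATMiX abstract refers to can be made concrete with a toy MX-style block quantizer: each block shares one power-of-two scale chosen from the block's maximum, so a single outlier coarsens the rounding grid for its entire block. Block size, bit count, and the simplified scale rule below are assumptions, not the MX specification.

```python
import numpy as np

def mx_quantize(x, block=8, bits=4):
    """Symmetric integer quantization with one power-of-two scale per block."""
    lim = 2 ** (bits - 1) - 1
    q = np.empty_like(x)
    for s in range(0, len(x), block):
        blk = x[s:s + block]
        # ceiling the exponent keeps round(blk / scale) within [-lim, lim]
        scale = 2.0 ** np.ceil(np.log2(np.abs(blk).max() / lim + 1e-12))
        q[s:s + block] = np.round(blk / scale) * scale
    return q

rng = np.random.default_rng(0)
x = rng.normal(size=64)
err_clean = np.linalg.norm(mx_quantize(x) - x)
x[3] = 40.0                                # inject one activation outlier
err_outlier = np.linalg.norm(mx_quantize(x) - x)
print(err_clean < err_outlier)             # True: the outlier's whole block degrades
```

This is the structural point behind outlier-reducing transformations: the error bound depends on the per-block maximum, not on the tensor as a whole, which is why an analysis accounting for the block structure matters.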

【28】BioBridge: Bridging Proteins and Language for Enhanced Biological Reasoning with LLMs
标题:BioBridge:通过LLM连接蛋白质和语言以增强生物推理
链接:https://arxiv.org/abs/2602.17680

作者:Yujia Wang,Jihong Guan,Wengen Li,Shuigeng Zhou,Xuhong Wang
摘要:现有的蛋白质语言模型(PLMs)通常对多个任务的适应性有限,并且在不同的生物背景下表现出较差的泛化能力。相比之下,通用大型语言模型(LLM)缺乏解释蛋白质序列的能力,并且缺乏特定领域的知识,限制了它们进行有效生物语义推理的能力。为了结合两者的优点,我们提出了BioBridge,一个用于蛋白质理解的领域自适应持续预训练框架。该框架采用领域增量连续预训练(DICP)将蛋白质领域知识和一般推理语料同时注入LLM,有效地减轻灾难性遗忘。跨模态对齐通过PLM-Projector-LLM管道实现,该管道将蛋白质序列嵌入映射到语言模型的语义空间中。最终,采用端到端的优化来统一支持各种任务,包括蛋白质性质预测和知识问答。我们提出的BioBridge在多个蛋白质基准测试(如EC和BindingDB)上表现出与主流PLM相当的性能。它还在MMLU和RACE等一般理解任务上取得了与LLM相当的结果。这展示了其将特定领域的适应性与通用语言能力相结合的创新优势。
摘要:Existing Protein Language Models (PLMs) often suffer from limited adaptability to multiple tasks and exhibit poor generalization across diverse biological contexts. In contrast, general-purpose Large Language Models (LLMs) lack the capability to interpret protein sequences and fall short in domain-specific knowledge, limiting their capacity for effective biosemantic reasoning. To combine the advantages of both, we propose BioBridge, a domain-adaptive continual pretraining framework for protein understanding. This framework employs Domain-Incremental Continual Pre-training (DICP) to infuse protein domain knowledge and general reasoning corpus into a LLM simultaneously, effectively mitigating catastrophic forgetting. Cross-modal alignment is achieved via a PLM-Projector-LLM pipeline, which maps protein sequence embeddings into the semantic space of the language model. Ultimately, an end-to-end optimization is adopted to uniformly support various tasks, including protein property prediction and knowledge question-answering. Our proposed BioBridge demonstrates performance comparable to that of mainstream PLMs on multiple protein benchmarks, such as EC and BindingDB. It also achieves results on par with LLMs on general understanding tasks like MMLU and RACE. This showcases its innovative advantage of combining domain-specific adaptability with general-purpose language competency.

Graph相关(图学习|图神经网络|图优化等)(11篇)

【1】Unifying approach to uniform expressivity of graph neural networks
标题:图神经网络统一表达性的统一方法
链接:https://arxiv.org/abs/2602.18409

作者:Huan Luo,Jonni Virtema
摘要:图神经网络(GNN)的表达能力通常通过与Weisfeiler-Leman(WL)算法和一阶逻辑片段的对应来分析。标准GNN仅限于在直接邻域或全局读出上执行聚合。为了增强其表达能力,最近的工作尝试引入子结构信息(例如,环计数和子图属性)。在本文中,我们通过引入模板GNN(T-GNN)来形式化这一架构趋势,这是一种通用框架,通过对来自指定图模板集合的有效模板嵌入进行聚合来更新节点特征。我们提出了相应的逻辑,即分级模板模态逻辑(GML(T)),并推广了基于模板的互模拟和WL算法的概念。我们建立了T-GNN与GML(T)表达能力之间的等价性,并提供了一种分析GNN表达能力的统一方法:我们展示了标准AC-GNN及其最近的变体如何被解释为T-GNN的实例化。
摘要:The expressive power of Graph Neural Networks (GNNs) is often analysed via correspondence to the Weisfeiler-Leman (WL) algorithm and fragments of first-order logic. Standard GNNs are limited to performing aggregation over immediate neighbourhoods or over global read-outs. To increase their expressivity, recent attempts have been made to incorporate substructural information (e.g. cycle counts and subgraph properties). In this paper, we formalize this architectural trend by introducing Template GNNs (T-GNNs), a generalized framework where node features are updated by aggregating over valid template embeddings from a specified set of graph templates. We propose a corresponding logic, Graded template modal logic (GML(T)), and generalized notions of template-based bisimulation and WL algorithm. We establish an equivalence between the expressive power of T-GNNs and GML(T), and provide a unifying approach for analysing GNN expressivity: we show how standard AC-GNNs and its recent variants can be interpreted as instantiations of T-GNNs.
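A toy instance of the template idea described above: beyond its immediate neighbourhood, each node aggregates over one extra template, namely the triangles it takes part in, using the identity that the i-th diagonal entry of A³ counts twice the triangles through node i. A full T-GNN would feed such template statistics into the node update alongside neighbour messages; the graph and names below are illustrative.

```python
import numpy as np

def triangles_per_node(A):
    """Per-node triangle counts via diag(A^3) = 2 * (#triangles at node i)."""
    return np.diag(np.linalg.matrix_power(A, 3)) // 2

# A 4-cycle 0-1-2-3 plus the chord 0-2, giving triangles {0,1,2} and {0,2,3}.
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [1, 0, 1, 0]])
print(triangles_per_node(A))  # → [2 1 2 1]
```

Two 1-WL-indistinguishable graphs can differ in these counts (e.g. a 6-cycle versus two triangles), which is exactly the kind of substructural signal templates add over plain neighbourhood aggregation.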

【2】Stable Long-Horizon Spatiotemporal Prediction on Meshes Using Latent Multiscale Recurrent Graph Neural Networks
标题:利用潜在多尺度回归图神经网络进行网格稳定的长期时空预测
链接:https://arxiv.org/abs/2602.18146

作者:Lionel Salesses,Larbi Arbaoui,Tariq Benamara,Arnaud Francois,Caroline Sainvitu
摘要:复杂几何形状上时空场的准确长期预测是科学机器学习中的一个基本挑战,其应用包括增材制造,其中温度历史控制缺陷形成和机械性能。高保真模拟是准确的,但计算成本高,尽管最近取得了进展,机器学习方法仍然受到长期温度和梯度预测的挑战。我们提出了一个深度学习框架,用于直接在网格上预测完整的温度历史,以几何形状和工艺参数为条件,同时在数千个时间步上保持稳定性,并在异构几何形状中推广。该框架采用了一个时间多尺度架构组成的两个耦合模型在互补的时间尺度上运行。这两种模型都依赖于潜在递归图神经网络来捕获网格上的时空动态,而变分图自动编码器提供了一种紧凑的潜在表示,可以减少内存使用并提高训练稳定性。模拟粉末床融合数据的实验表明,在不同的几何形状,准确和时间稳定的长期预测,优于现有的基线。虽然在两个维度上进行评估,但该框架是通用的,并且可扩展到具有多尺度动力学的物理驱动系统和三维几何形状。
摘要:Accurate long-horizon prediction of spatiotemporal fields on complex geometries is a fundamental challenge in scientific machine learning, with applications such as additive manufacturing where temperature histories govern defect formation and mechanical properties. High-fidelity simulations are accurate but computationally costly, and despite recent advances, machine learning methods remain challenged by long-horizon temperature and gradient prediction. We propose a deep learning framework for predicting full temperature histories directly on meshes, conditioned on geometry and process parameters, while maintaining stability over thousands of time steps and generalizing across heterogeneous geometries. The framework adopts a temporal multiscale architecture composed of two coupled models operating at complementary time scales. Both models rely on a latent recurrent graph neural network to capture spatiotemporal dynamics on meshes, while a variational graph autoencoder provides a compact latent representation that reduces memory usage and improves training stability. Experiments on simulated powder bed fusion data demonstrate accurate and temporally stable long-horizon predictions across diverse geometries, outperforming existing baselines. Although evaluated in two dimensions, the framework is general and extensible to physics-driven systems with multiscale dynamics and to three-dimensional geometries.

【3】Advection-Diffusion on Graphs: A Bakry-Emery Laplacian for Spectral Graph Neural Networks
标题:图上的平流-扩散:谱图神经网络的Bakry-Emery Laplacian
链接:https://arxiv.org/abs/2602.18141

作者:Pierre-Gabriel Berlureau,Ali Hariri,Victor Kawasaki-Borruat,Mia Zosso,Pierre Vandergheynst
摘要:由于过度平滑和过度挤压,图神经网络(GNN)经常难以长距离传播信息。现有的补救措施,如图Transformer或重新布线,通常会导致高计算成本或需要改变图结构。我们引入了一个Bakry-Emery图拉普拉斯算子,它通过一个可学习的节点势集成了扩散和平流,在不修改拓扑结构的情况下诱导出与任务相关的传播动态。该算子具有良好的谱分解,并可作为谱GNN中标准拉普拉斯算子的直接替代品。基于这一见解,我们开发了mu-ChebNet,这是一种谱架构,可联合学习势函数与Chebyshev滤波器,有效地桥接消息传递的自适应性与谱方法的效率。我们的理论分析表明势函数如何调制频谱,从而实现对关键图属性的控制。从经验上看,mu-ChebNet在合成远程推理任务以及真实世界基准上均带来一致的增益,同时提供了一个可解释的路由场,揭示信息如何在图中流动。这将Bakry-Emery拉普拉斯算子确立为自适应谱图学习的原则性且有效的基础。
摘要:Graph Neural Networks (GNNs) often struggle to propagate information across long distances due to oversmoothing and oversquashing. Existing remedies such as graph transformers or rewiring typically incur high computational cost or require altering the graph structure. We introduce a Bakry-Emery graph Laplacian that integrates diffusion and advection through a learnable node-wise potential, inducing task-dependent propagation dynamics without modifying topology. This operator has a well-behaved spectral decomposition and acts as a drop-in replacement for standard Laplacians in spectral GNNs. Building on this insight, we develop mu-ChebNet, a spectral architecture that jointly learns the potential and Chebyshev filters, effectively bridging message-passing adaptivity and spectral efficiency. Our theoretical analysis shows how the potential modulates the spectrum, enabling control of key graph properties. Empirically, mu-ChebNet delivers consistent gains on synthetic long-range reasoning tasks, as well as real-world benchmarks, while offering an interpretable routing field that reveals how information flows through the graph. This establishes the Bakry-Emery Laplacian as a principled and efficient foundation for adaptive spectral graph learning.
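One standard way to build an advection-diffusion graph operator from a node potential V (an assumption here; the paper's exact discretization may differ) is L_V = diag(row sums) − E A E⁻¹ with E = diag(exp(V/2)). Because this matrix is similar to the symmetric diag(d) − A, it keeps a real spectrum despite being asymmetric, which is the "well-behaved spectral decomposition" property the abstract highlights.

```python
import numpy as np

def bakry_emery_laplacian(A, V):
    """Advection-diffusion Laplacian driven by a node potential V.
    M[i, j] = A[i, j] * exp((V[i] - V[j]) / 2) biases flow along -grad V;
    V = 0 recovers the standard combinatorial Laplacian D - A."""
    E = np.exp(V / 2.0)
    M = (E[:, None] * A) / E[None, :]
    return np.diag(M.sum(axis=1)) - M

# 4-cycle graph with a potential well at node 0.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
V = np.array([-2.0, 0.0, 0.0, 0.0])

L = bakry_emery_laplacian(A, V)
eig = np.linalg.eigvals(L)
print(np.allclose(eig.imag, 0))        # True: real spectrum despite asymmetry
print(np.allclose(L @ np.ones(4), 0))  # True: constants stay in the kernel
```

The similarity transform E(diag(d) − A)E⁻¹ is what makes the operator a drop-in replacement in spectral filters: Chebyshev polynomials of L remain well-defined on a real spectrum, while the learnable V reshapes that spectrum per task.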

【4】Balancing Symmetry and Efficiency in Graph Flow Matching
标题:平衡图流匹配中的对称性和效率
链接:https://arxiv.org/abs/2602.18084

作者:Benjamin Honoré,Alba Carballo-Castro,Yiming Qin,Pascal Frossard
备注:15 pages, 11 figures
摘要:等变性是图生成模型的核心,因为它确保模型尊重图的排列对称性。然而,严格的等变性会因额外的架构约束而增加计算成本,并且可能减慢收敛,因为模型必须在庞大的节点排列空间上保持一致。我们针对图生成模型研究这种权衡。具体来说,我们从一个等变的离散流匹配模型出发,在训练过程中通过一种基于正弦位置编码和节点排列的可控对称性调制方案放松其等变性。实验首先表明,对称性破缺可以通过提供更容易的学习信号来加速早期训练,但代价是助长可能导致过拟合的捷径解:模型反复生成与训练集重复的图。相反,适当地调制对称性信号可以延迟过拟合,同时加速收敛,使模型仅用基线19%的训练轮数即可达到更强的性能。
摘要:Equivariance is central to graph generative models, as it ensures the model respects the permutation symmetry of graphs. However, strict equivariance can increase computational cost due to added architectural constraints, and can slow down convergence because the model must be consistent across a large space of possible node permutations. We study this trade-off for graph generative models. Specifically, we start from an equivariant discrete flow-matching model, and relax its equivariance during training via a controllable symmetry modulation scheme based on sinusoidal positional encodings and node permutations. Experiments first show that symmetry-breaking can accelerate early training by providing an easier learning signal, but at the expense of encouraging shortcut solutions that can cause overfitting, where the model repeatedly generates graphs that are duplicates of the training set. On the contrary, properly modulating the symmetry signal can delay overfitting while accelerating convergence, allowing the model to reach stronger performance with 19% of the baseline training epochs.
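The controllable symmetry modulation described above can be sketched as interpolating between a permutation-equivariant model (strength 0) and one fed symmetry-breaking sinusoidal positional encodings (strength 1). The additive interpolation, the per-call node permutation, and all names are assumptions, not the paper's exact recipe.

```python
import numpy as np

def sinusoidal_pe(n, d):
    """Standard sinusoidal positional encodings for n positions, d channels."""
    pos = np.arange(n)[:, None]
    i = np.arange(d)[None, :]
    angle = pos / (10000 ** (2 * (i // 2) / d))
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def modulate_features(X, strength, rng):
    """Add permuted positional encodings scaled by a symmetry-breaking strength."""
    n, d = X.shape
    # a fresh node permutation each call keeps the broken symmetry from being
    # memorized as one fixed node ordering
    pe = sinusoidal_pe(n, d)[rng.permutation(n)]
    return X + strength * pe

rng = np.random.default_rng(0)
X = np.zeros((5, 4))
print(np.allclose(modulate_features(X, 0.0, rng), X))  # True: strength 0 is equivariant
print(modulate_features(X, 1.0, rng).shape)            # (5, 4)
```

Annealing `strength` over training is one natural way to get the easier early learning signal the abstract reports while fading the symmetry-breaking shortcut before it causes memorization.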

【5】CityGuard: Graph-Aware Private Descriptors for Bias-Resilient Identity Search Across Urban Cameras
标题:CityGuard:图形感知私人描述符,用于跨城市摄像机的防偏身份搜索
链接:https://arxiv.org/abs/2602.18047

作者:Rong Fu,Wenxin Zhang,Yibo Meng,Jia Yee Tan,Jiaxuan Lu,Rui Lu,Jiekai Wu,Zhaolu Kang,Simon Fong
备注:36 pages, 12 figures
摘要:跨分布式摄像机进行城市规模的行人重识别必须处理来自视角、遮挡和域偏移的严重外观变化,同时遵守禁止共享原始图像的数据保护规则。我们介绍CityGuard,一个用于去中心化监控中隐私保护身份检索的拓扑感知Transformer。该框架包括三个组成部分。分散度自适应度量学习器根据特征分散程度调整实例级间隔(margin),增强类内紧凑性。空间条件注意力将粗糙的几何信息(例如GPS或部署平面图)注入基于图的自注意力中,仅利用粗糙的几何先验即可实现投影一致的跨视图对齐,而无需测绘级校准。差分隐私嵌入映射与紧凑的近似索引相结合,以支持安全且经济高效的部署。这些设计共同产生对视角变化、遮挡和域偏移鲁棒的描述符,并在严格的差分隐私核算下实现隐私与效用之间的可调平衡。在Market-1501和其他公共基准上的实验,辅以数据库规模的检索研究,显示检索精度和查询吞吐量相对强基线均获得一致提升,证实了该框架在隐私关键的城市身份匹配中的实用性。
摘要:City-scale person re-identification across distributed cameras must handle severe appearance changes from viewpoint, occlusion, and domain shift while complying with data protection rules that prevent sharing raw imagery. We introduce CityGuard, a topology-aware transformer for privacy-preserving identity retrieval in decentralized surveillance. The framework integrates three components. A dispersion-adaptive metric learner adjusts instance-level margins according to feature spread, increasing intra-class compactness. Spatially conditioned attention injects coarse geometry, such as GPS or deployment floor plans, into graph-based self-attention to enable projectively consistent cross-view alignment using only coarse geometric priors without requiring survey-grade calibration. Differentially private embedding maps are coupled with compact approximate indexes to support secure and cost-efficient deployment. Together these designs produce descriptors robust to viewpoint variation, occlusion, and domain shifts, and they enable a tunable balance between privacy and utility under rigorous differential-privacy accounting. Experiments on Market-1501 and additional public benchmarks, complemented by database-scale retrieval studies, show consistent gains in retrieval precision and query throughput over strong baselines, confirming the practicality of the framework for privacy-critical urban identity matching.
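The dispersion-adaptive margin can be sketched in a few lines. This is a minimal reading under stated assumptions: the linear scaling rule, the constants, and the function name `adaptive_margin` are illustrative, not taken from the paper.

```python
import numpy as np

def adaptive_margin(embeddings, base_margin=0.3, scale=0.5):
    """Scale a metric-learning margin by a class's feature dispersion.

    Hypothetical reading of CityGuard's dispersion-adaptive metric
    learner: classes whose embeddings are more spread out receive a
    larger margin, pushing the loss toward intra-class compactness.
    """
    center = embeddings.mean(axis=0)
    dispersion = np.linalg.norm(embeddings - center, axis=1).mean()
    return base_margin + scale * dispersion

tight = np.zeros((4, 8))             # perfectly collapsed class
spread = np.eye(4, 8)                # one-hot rows, clearly dispersed
print(adaptive_margin(tight))        # 0.3 (base margin only)
print(adaptive_margin(spread) > adaptive_margin(tight))  # True
```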

【6】Whole-Brain Connectomic Graph Model Enables Whole-Body Locomotion Control in Fruit Fly
标题:全脑连接组图模型实现果蝇全身运动控制
链接:https://arxiv.org/abs/2602.17997

作者:Zehao Jin,Yaoye Zhu,Chen Zhang,Yanan Sui
摘要:全脑生物神经网络天然支持全身运动的学习与控制。然而,在具身强化学习中使用大脑连接体作为神经网络控制器仍未被探索。我们研究使用成年果蝇大脑的精确神经结构来控制其身体运动。我们建立了果蝇连接组图模型(Fly-connectomic Graph Model,FlyGM),其静态结构与成年果蝇的完整连接组完全一致,用于全身运动控制。为了执行动态控制,FlyGM将静态连接组表示为有向消息传递图,以施加从感觉输入到运动输出、具有生物学依据的信息流。与生物力学果蝇模型相结合,我们的方法无需针对特定任务调整架构,即可在多种运动任务上实现稳定控制。为了验证基于连接组的模型的结构优势,我们将其与保持度分布的重连图、随机图和多层感知机进行比较,结果表明FlyGM具有更高的样本效率和更优的性能。这项工作表明,静态大脑连接体可以被转化并实例化为有效的神经策略,用于运动控制的具身学习。
摘要:Whole-brain biological neural networks naturally support the learning and control of whole-body movements. However, the use of brain connectomes as neural network controllers in embodied reinforcement learning remains unexplored. We investigate using the exact neural architecture of an adult fruit fly's brain for the control of its body movement. We develop Fly-connectomic Graph Model (FlyGM), whose static structure is identical to the complete connectome of an adult Drosophila for whole-body locomotion control. To perform dynamical control, FlyGM represents the static connectome as a directed message-passing graph to impose a biologically grounded information flow from sensory inputs to motor outputs. Integrated with a biomechanical fruit fly model, our method achieves stable control across diverse locomotion tasks without task-specific architectural tuning. To verify the structural advantages of the connectome-based model, we compare it against a degree-preserving rewired graph, a random graph, and multilayer perceptrons, showing that FlyGM yields higher sample efficiency and superior performance. This work demonstrates that static brain connectomes can be transformed to instantiate effective neural policy for embodied learning of movement control.
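One way to picture FlyGM's directed message passing: a frozen connectome adjacency masks a trainable weight matrix, so activity can only flow along existing synapses. A sketch under that assumption (the tanh nonlinearity and dense matrices are placeholders, not the paper's architecture):

```python
import numpy as np

def connectome_step(h, W, mask, activation=np.tanh):
    """One directed message-passing step over a fixed connectome.

    h    : (n,) neuron activations
    W    : (n, n) learned synaptic weights
    mask : (n, n) binary adjacency; mask[i, j] = 1 iff neuron j
           projects to neuron i. The mask is frozen, so only weights
           on existing synapses carry signal -- a hypothetical reading
           of how FlyGM constrains information flow.
    """
    return activation((W * mask) @ h)

rng = np.random.default_rng(0)
n = 5
mask = (rng.random((n, n)) < 0.4).astype(float)
W = rng.normal(size=(n, n))
h = rng.normal(size=n)
h_next = connectome_step(h, W, mask)
print(h_next.shape)  # (5,)
```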

【7】Generating adversarial inputs for a graph neural network model of AC power flow
标题:为交流潮流图神经网络模型生成对抗输入
链接:https://arxiv.org/abs/2602.17975

作者:Robert Parker
摘要:这项工作制定和解决优化问题,以生成输入点,产生高误差之间的神经网络的预测交流潮流的解决方案和解决方案的交流潮流方程。我们证明了这种能力的CANOS-PF图神经网络模型的一个实例,实现PF$Δ$基准库,在14总线测试网格上运行。生成的对抗点产生的误差高达3.4每单位的无功功率和0.08每单位的电压幅度。当最小化满足对抗性约束所需的训练点的扰动时,我们发现单根母线上的电压幅值只需每单位0.04的扰动即可满足约束。这项工作激励了交流潮流的神经网络代理模型的严格验证和鲁棒训练方法的发展。
摘要:This work formulates and solves optimization problems to generate input points that yield high errors between a neural network's predicted AC power flow solution and solutions to the AC power flow equations. We demonstrate this capability on an instance of the CANOS-PF graph neural network model, as implemented by the PF$Δ$ benchmark library, operating on a 14-bus test grid. Generated adversarial points yield errors as large as 3.4 per-unit in reactive power and 0.08 per-unit in voltage magnitude. When minimizing the perturbation from a training point necessary to satisfy adversarial constraints, we find that the constraints can be met with as little as an 0.04 per-unit perturbation in voltage magnitude on a single bus. This work motivates the development of rigorous verification and robust training methods for neural network surrogate models of AC power flow.
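The underlying optimization can be imitated on a toy 1-D problem: search for an input that maximizes the squared gap between a surrogate and the "exact" solution. Both functions below are stand-ins, not the CANOS-PF model or the AC power-flow equations.

```python
import numpy as np

def surrogate(x):
    # Stand-in for a neural surrogate of an AC power-flow quantity (hypothetical).
    return 0.9 * np.sin(x)

def exact(x):
    # Stand-in for the solution of the AC power-flow equations (hypothetical).
    return np.sin(x) + 0.05 * np.sin(3 * x)

def adversarial_input(x0, steps=300, lr=0.5, eps=1e-4):
    """Finite-difference gradient ascent on the squared surrogate error,
    a 1-D toy analogue of the paper's adversarial-input optimization."""
    err = lambda z: (surrogate(z) - exact(z)) ** 2
    x = x0
    for _ in range(steps):
        grad = (err(x + eps) - err(x - eps)) / (2 * eps)
        x = x + lr * grad
    return x

x0 = 0.3
x_adv = adversarial_input(x0)
# The optimized input yields a larger surrogate error than the start point.
print(abs(surrogate(x_adv) - exact(x_adv)) > abs(surrogate(x0) - exact(x0)))  # True
```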

【8】Optimizing Graph Causal Classification Models: Estimating Causal Effects and Addressing Confounders
标题:优化图因果分类模型:估计因果效应并解决混杂因素
链接:https://arxiv.org/abs/2602.17941

作者:Simi Job,Xiaohui Tao,Taotao Cai,Haoran Xie,Jianming Yong,Xin Wang
摘要:由于各个领域对AI中关系洞察的需求不断增长,图形数据变得越来越普遍。组织经常使用图形数据来解决涉及关系和连接的复杂问题。在这种情况下,因果学习尤其重要,因为它有助于理解因果关系,而不仅仅是关联。由于许多现实世界的系统本质上是因果关系的,因此图可以有效地对这些系统进行建模。然而,包括图神经网络(GNN)在内的传统图机器学习方法依赖于相关性,并且对虚假模式和分布变化敏感。另一方面,因果模型通过隔离真正的因果因素来实现稳健的预测,从而使它们在这种变化下更加稳定。因果学习还有助于识别和调整混杂因素,确保预测反映真实的因果关系,即使在干预下也保持准确。为了应对这些挑战并构建强大且因果信息的模型,我们提出了CCAGNN,这是一个混淆感知因果GNN框架,将因果推理纳入图学习,支持反事实推理并在现实世界中提供可靠的预测。对来自不同领域的六个公开数据集的综合实验表明,CCAGNN始终优于领先的最先进的模型。
摘要:Graph data is becoming increasingly prevalent due to the growing demand for relational insights in AI across various domains. Organizations regularly use graph data to solve complex problems involving relationships and connections. Causal learning is especially important in this context, since it helps to understand cause-effect relationships rather than mere associations. Since many real-world systems are inherently causal, graphs can efficiently model these systems. However, traditional graph machine learning methods including graph neural networks (GNNs), rely on correlations and are sensitive to spurious patterns and distribution changes. On the other hand, causal models enable robust predictions by isolating true causal factors, thus making them more stable under such shifts. Causal learning also helps in identifying and adjusting for confounders, ensuring that predictions reflect true causal relationships and remain accurate even under interventions. To address these challenges and build models that are robust and causally informed, we propose CCAGNN, a Confounder-Aware causal GNN framework that incorporates causal reasoning into graph learning, supporting counterfactual reasoning and providing reliable predictions in real-world settings. Comprehensive experiments on six publicly available datasets from diverse domains show that CCAGNN consistently outperforms leading state-of-the-art models.

【9】Causal Neighbourhood Learning for Invariant Graph Representations
标题:不变图表示的因果邻近学习
链接:https://arxiv.org/abs/2602.17934

作者:Simi Job,Xiaohui Tao,Taotao Cai,Haoran Xie,Jianming Yong
摘要:图数据通常包含噪声和虚假的相关性,这些相关性掩盖了真正的因果关系,这对于使图模型能够基于数据的潜在因果结构进行预测至关重要。对虚假连接的依赖使得传统的图神经网络(GNN)难以在不同的图之间进行有效的泛化。此外,传统的聚合方法往往会放大这些虚假的模式,限制模型的鲁棒性下的分布变化。为了解决这些问题,我们提出了因果邻域学习与图神经网络(CNL-GNN),一个新的框架,执行因果干预图结构。CNL-GNN通过生成反事实邻域和由可学习的重要性掩蔽和基于注意力的机制引导的自适应边缘扰动,有效地识别和保留因果相关连接,并减少虚假影响。此外,通过将结构级干预与因果特征与混杂因素的分离相结合,该模型可以学习鲁棒的不变节点表示,并且可以在不同的图结构中很好地推广。我们的方法改进了因果图学习,超越了传统的基于特征的方法,从而产生了一个强大的分类模型。在四个公开可用的数据集上进行的广泛实验,包括一个数据集的多个域变体,表明CNL-GNN优于最先进的GNN模型。
摘要 :Graph data often contain noisy and spurious correlations that mask the true causal relationships, which are essential for enabling graph models to make predictions based on the underlying causal structure of the data. Dependence on spurious connections makes it challenging for traditional Graph Neural Networks (GNNs) to generalize effectively across different graphs. Furthermore, traditional aggregation methods tend to amplify these spurious patterns, limiting model robustness under distribution shifts. To address these issues, we propose Causal Neighbourhood Learning with Graph Neural Networks (CNL-GNN), a novel framework that performs causal interventions on graph structure. CNL-GNN effectively identifies and preserves causally relevant connections and reduces spurious influences through the generation of counterfactual neighbourhoods and adaptive edge perturbation guided by learnable importance masking and an attention-based mechanism. In addition, by combining structural-level interventions with the disentanglement of causal features from confounding factors, the model learns invariant node representations that are robust and generalize well across different graph structures. Our approach improves causal graph learning beyond traditional feature-based methods, resulting in a robust classification model. Extensive experiments on four publicly available datasets, including multiple domain variants of one dataset, demonstrate that CNL-GNN outperforms state-of-the-art GNN models.
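The edge-perturbation step admits a small sketch: a learnable importance score per edge is squashed through a sigmoid, and low-importance edges are dropped to form a counterfactual neighbourhood. The thresholding rule and the scores below are illustrative assumptions, not CNL-GNN's exact mechanism.

```python
import numpy as np

def perturb_edges(adj, importance, threshold=0.5):
    """Drop edges whose (learnable) importance falls below a threshold,
    producing a counterfactual neighbourhood for intervention."""
    keep = 1.0 / (1.0 + np.exp(-importance)) >= threshold
    return adj * keep

adj = np.array([[0, 1, 1],
                [1, 0, 1],
                [1, 1, 0]])
importance = np.array([[0.0,  2.0, -2.0],
                       [2.0,  0.0, -2.0],
                       [-2.0, -2.0, 0.0]])
# Only the high-importance edges (score 2.0) survive.
print(perturb_edges(adj, importance))
```

For undirected graphs one would symmetrize the surviving mask; the directed result here is kept for brevity.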

【10】COMBA: Cross Batch Aggregation for Learning Large Graphs with Context Gating State Space Models
标题:COMBA:使用上下文门控状态空间模型学习大型图的跨批次聚合
链接:https://arxiv.org/abs/2602.17893

作者:Jiajun Shen,Yufei Jin,Yi He,xingquan Zhu
摘要:状态空间模型(SSM)近来被用于建模序列数据中的长程依赖,其计算成本比Transformer等现代替代方案大为降低。将SSM推广到图结构数据,特别是大型图,是一个重大挑战,因为SSM是序列模型,而庞大的图规模使得将图转换为序列以进行有效学习的代价非常高。在本文中,我们提出COMBA,利用状态空间模型解决大型图学习问题,包含两个关键创新:图上下文门控和跨批次聚合。图上下文指每个节点不同跳数的邻域,图上下文门控使COMBA能够利用这种上下文来学习对邻居聚合的最佳控制。对于每个图上下文,COMBA将节点采样为批次,并训练一个图神经网络(GNN),信息在批次之间聚合,从而使COMBA能够扩展到大型图。我们的理论研究表明,跨批次聚合保证比不进行聚合的GNN训练具有更低的误差。在基准网络上的实验表明,与基线方法相比,性能有显著提升。代码和基准数据集将公开发布。
摘要:State space models (SSMs) have recently emerged for modeling long-range dependency in sequence data, with much simplified computational costs than modern alternatives, such as transformers. Advancing SSMs to graph structured data, especially for large graphs, is a significant challenge because SSMs are sequence models and the sheer graph volumes make it very expensive to convert graphs into sequences for effective learning. In this paper, we propose COMBA to tackle large graph learning using state space models, with two key innovations: graph context gating and cross-batch aggregation. Graph context refers to different hops of neighborhood for each node, and graph context gating allows COMBA to use such context to learn the best control of neighbor aggregation. For each graph context, COMBA samples nodes as batches and trains a graph neural network (GNN), with information being aggregated across batches, allowing COMBA to scale to large graphs. Our theoretical study asserts that cross-batch aggregation guarantees lower error than training a GNN without aggregation. Experiments on benchmark networks demonstrate significant performance gains compared to baseline approaches. Code and benchmark datasets will be released for public access.
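Cross-batch aggregation can be illustrated with an exponential moving average over per-batch context embeddings, so each batch sees information carried over from earlier batches. The EMA form is an assumption for illustration; COMBA's actual aggregation may differ.

```python
import numpy as np

class CrossBatchAggregator:
    """EMA of per-batch context embeddings: information from earlier
    batches is folded into the state passed to the next batch, so the
    model is not limited to what one batch of sampled nodes can see."""
    def __init__(self, dim, momentum=0.9):
        self.state = np.zeros(dim)
        self.momentum = momentum

    def update(self, batch_embeddings):
        batch_mean = batch_embeddings.mean(axis=0)
        self.state = self.momentum * self.state + (1 - self.momentum) * batch_mean
        return self.state

agg = CrossBatchAggregator(dim=4)
for _ in range(3):                    # three batches with mean embedding 1.0
    agg.update(np.ones((8, 4)))
print(agg.state)                      # approaches 1.0: [0.271 0.271 0.271 0.271]
```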

【11】Benchmarking Graph Neural Networks in Solving Hard Constraint Satisfaction Problems
标题:对标图神经网络解决硬约束满足问题
链接:https://arxiv.org/abs/2602.18419

作者:Geri Skenderi,Lorenzo Buffoni,Francesco D'Amico,David Machado,Raffaele Marino,Matteo Negri,Federico Ricci-Tersenghi,Carlo Lucibello,Maria Chiara Angelini
摘要:图神经网络(GNN)越来越多地应用于困难的优化问题,并常常声称优于经典启发式算法。然而,由于缺乏针对真正困难实例的标准基准,这类说法可能站不住脚。我们从统计物理的视角出发,提出了基于随机问题的新的困难基准。我们提供这些基准,以及经典启发式算法和GNN的性能结果。我们的公平比较表明,经典算法仍然优于GNN。我们讨论了神经网络在这一领域面临的挑战。借助我们的基准(可在 https://github.com/ArtLabBocconi/RandCSPBench 获得),未来的优越性声明可以变得更加可靠。
摘要:Graph neural networks (GNNs) are increasingly applied to hard optimization problems, often claiming superiority over classical heuristics. However, such claims risk being unsolid due to a lack of standard benchmarks on truly hard instances. From a statistical physics perspective, we propose new hard benchmarks based on random problems. We provide these benchmarks, along with performance results from both classical heuristics and GNNs. Our fair comparison shows that classical algorithms still outperform GNNs. We discuss the challenges for neural networks in this domain. Future claims of superiority can be made more robust using our benchmarks, available at https://github.com/ArtLabBocconi/RandCSPBench.
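A statistical-physics-style hard instance can be sampled directly: random 3-SAT is empirically hardest near clause density $\alpha \approx 4.267$. A minimal generator in that spirit (not the repository's code):

```python
import random

def random_ksat(n_vars, alpha=4.26, k=3, seed=0):
    """Sample a random k-SAT instance at clause density alpha (clauses
    per variable). For 3-SAT the satisfiability threshold sits near
    alpha ~ 4.267, the regime where instances are empirically hardest.
    Clauses are lists of signed literals: v means x_v, -v means NOT x_v.
    """
    rng = random.Random(seed)
    n_clauses = int(alpha * n_vars)
    clauses = []
    for _ in range(n_clauses):
        vars_ = rng.sample(range(1, n_vars + 1), k)       # distinct variables
        clauses.append([v if rng.random() < 0.5 else -v for v in vars_])
    return clauses

cnf = random_ksat(100)
print(len(cnf))  # 426
```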

Transformer(4篇)

【1】Subgroups of $U(d)$ Induce Natural RNN and Transformer Architectures
标题:$U(d)$的子群诱导自然的RNN和Transformer架构
链接:https://arxiv.org/abs/2602.18417

作者:Joshua Nunley
备注:12 pages, 3 figures, 8 tables
摘要:本文给出了U(d)的闭子群上隐态序列模型的一个直接框架。我们使用一个最小的公理设置,并从一个共享的骨架中派生经常性和Transformer模板,其中子组选择作为状态空间,切线投影和更新映射的下拉式替换。然后,我们专门研究O(d),并在参数匹配的设置下,在Tiny Shakespeare和Penn Treebank上评估正交状态RNN和Transformer模型。我们还报告了切线空间中的一般线性混合扩展,该扩展适用于子群选择,并提高了当前O(d)实验中的有限预算性能。
摘要:This paper presents a direct framework for sequence models with hidden states on closed subgroups of U(d). We use a minimal axiomatic setup and derive recurrent and transformer templates from a shared skeleton in which subgroup choice acts as a drop-in replacement for state space, tangent projection, and update map. We then specialize to O(d) and evaluate orthogonal-state RNN and transformer models on Tiny Shakespeare and Penn Treebank under parameter-matched settings. We also report a general linear-mixing extension in tangent space, which applies across subgroup choices and improves finite-budget performance in the current O(d) experiments.
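A concrete way to keep a state matrix on O(d) is a retraction from skew-symmetric parameters, e.g. the Cayley transform $Q = (I - A)(I + A)^{-1}$. The paper's own update map may differ; this only demonstrates the subgroup constraint:

```python
import numpy as np

def cayley_orthogonal(A):
    """Map a skew-symmetric matrix A to an orthogonal matrix via the
    Cayley transform Q = (I - A)(I + A)^{-1}. For skew-symmetric A,
    I + A is always invertible and Q^T Q = I, so parameterizing
    recurrent-state updates this way keeps them on O(d)."""
    I = np.eye(A.shape[0])
    return (I - A) @ np.linalg.inv(I + A)

rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4))
A = B - B.T                     # skew-symmetric parameter
Q = cayley_orthogonal(A)
print(np.allclose(Q.T @ Q, np.eye(4)))  # True
```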

【2】TempoNet: Slack-Quantized Transformer-Guided Reinforcement Scheduler for Adaptive Deadline-Centric Real-Time Dispatchs
标题:TempoNet:用于自适应以截止日期为中心的实时调度的松弛量化转换器引导的强化调度
链接:https://arxiv.org/abs/2602.18109

作者:Rong Fu,Yibo Meng,Guangzhen Yao,Jiaxuan Lu,Zeyu Zhang,Zhaolu Kang,Ziming Guo,Jia Yee Tan,Xiaojing Du,Simon James Fong
备注:43 pages, 12 figures
摘要:实时调度器必须在严格的计算预算下对紧迫的截止时间进行推理。我们提出TempoNet,一种强化学习调度器,它将置换不变的Transformer与深度Q近似相结合。紧急度令牌化器将时间松弛离散化为可学习的嵌入,稳定价值学习并捕获截止时间的临近程度。具有分块top-k选择和局部敏感分块的延迟感知稀疏注意力栈,能够对无序任务集进行全局推理,并具有近线性的扩展性和亚毫秒级的推理速度。多核映射层通过掩码贪婪选择或可微分匹配,将上下文化的Q分数转换为处理器分配。在工业混合关键性任务轨迹和大型多处理器设置上的广泛评估显示,相比解析式调度器和神经基线,截止时间达成率有一致提升,同时优化稳定性也得到改善。诊断分析包括松弛量化的敏感性分析、注意力驱动的策略解释、硬件在环和内核微基准测试,以及在压力和简单运行时缓解措施下的鲁棒性;我们还报告了行为克隆预训练带来的样本效率优势,以及在不改变推理管道的情况下与演员-评论家变体的兼容性。这些结果为高吞吐量实时调度中基于Transformer的决策建立了一个实用框架。
摘要:Real-time schedulers must reason about tight deadlines under strict compute budgets. We present TempoNet, a reinforcement learning scheduler that pairs a permutation-invariant Transformer with a deep Q-approximation. An Urgency Tokenizer discretizes temporal slack into learnable embeddings, stabilizing value learning and capturing deadline proximity. A latency-aware sparse attention stack with blockwise top-k selection and locality-sensitive chunking enables global reasoning over unordered task sets with near-linear scaling and sub-millisecond inference. A multicore mapping layer converts contextualized Q-scores into processor assignments through masked-greedy selection or differentiable matching. Extensive evaluations on industrial mixed-criticality traces and large multiprocessor settings show consistent gains in deadline fulfillment over analytic schedulers and neural baselines, together with improved optimization stability. Diagnostics include sensitivity analyses for slack quantization, attention-driven policy interpretation, hardware-in-the-loop and kernel micro-benchmarks, and robustness under stress with simple runtime mitigations; we also report sample-efficiency benefits from behavioral-cloning pretraining and compatibility with an actor-critic variant without altering the inference pipeline. These results establish a practical framework for Transformer-based decision making in high-throughput real-time scheduling.
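The Urgency Tokenizer's first step, bucketing continuous slack into discrete tokens that would index learnable embeddings, can be sketched with `np.digitize`. The bucket edges below are illustrative, not TempoNet's.

```python
import numpy as np

def quantize_slack(slack, edges):
    """Bucket continuous temporal slack (time-to-deadline minus remaining
    work) into discrete urgency token ids. Each id would index a
    learnable embedding; negative slack (already late) maps to bucket 0."""
    return np.digitize(slack, edges)

edges = np.array([0.0, 1.0, 5.0, 20.0])    # ms, hypothetical buckets
slack = np.array([-0.5, 0.3, 7.0, 100.0])
print(quantize_slack(slack, edges))        # [0 1 3 4]
```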

【3】ZACH-ViT: Regime-Dependent Inductive Bias in Compact Vision Transformers for Medical Imaging
标题:ZACH-ViT:医学成像紧凑型视觉Transformer中依赖数据机制的归纳偏差
链接:https://arxiv.org/abs/2602.17929

作者:Athanasios Angelakis
备注:15 pages, 12 figures, 7 tables. Code and models available at https://github.com/Bluesman79/ZACH-ViT
摘要:Vision Transformer依赖于编码固定空间先验的位置嵌入和类令牌。虽然对自然图像有效,但当空间布局信息量弱或不一致时(这在医学成像和边缘部署的临床系统中很常见),这些先验可能会阻碍泛化。我们介绍ZACH-ViT(零令牌自适应紧凑分层Vision Transformer),一种紧凑的Vision Transformer,它去除了位置嵌入和[CLS]令牌,通过对补丁表示进行全局平均池化实现置换不变性。"零令牌"一词具体指去除了专用的[CLS]聚合令牌和位置嵌入;补丁令牌保持不变并正常处理。自适应残差投影在保持严格参数预算的同时,维持紧凑配置下的训练稳定性。在严格的少样本方案(每类50个样本、固定超参数、5个随机种子)下,在涵盖二分类和多分类任务的7个MedMNIST数据集上进行评估。实证分析展示了依赖数据机制的行为:ZACH-ViT(0.25M参数,从头训练)在BloodMNIST上优势最大,并在PathMNIST上与TransMIL保持竞争力,而在具有强解剖先验的数据集(OCTMNIST、OrganAMNIST)上其相对优势下降,这与架构假设一致。这些发现支持如下观点:让架构归纳偏差与数据结构对齐,可能比追求普适的基准优势更重要。尽管ZACH-ViT规模很小且未经预训练,它仍能在保持亚秒级推理时间的同时取得有竞争力的性能,支持在资源受限的临床环境中部署。代码和模型可在https://github.com/Bluesman79/ZACH-ViT上获得。
摘要:Vision Transformers rely on positional embeddings and class tokens that encode fixed spatial priors. While effective for natural images, these priors may hinder generalization when spatial layout is weakly informative or inconsistent, a frequent condition in medical imaging and edge-deployed clinical systems. We introduce ZACH-ViT (Zero-token Adaptive Compact Hierarchical Vision Transformer), a compact Vision Transformer that removes both positional embeddings and the [CLS] token, achieving permutation invariance through global average pooling over patch representations. The term "Zero-token" specifically refers to removing the dedicated [CLS] aggregation token and positional embeddings; patch tokens remain unchanged and are processed normally. Adaptive residual projections preserve training stability in compact configurations while maintaining a strict parameter budget. Evaluation is performed across seven MedMNIST datasets spanning binary and multi-class tasks under a strict few-shot protocol (50 samples per class, fixed hyperparameters, five random seeds). The empirical analysis demonstrates regime-dependent behavior: ZACH-ViT (0.25M parameters, trained from scratch) achieves its strongest advantage on BloodMNIST and remains competitive with TransMIL on PathMNIST, while its relative advantage decreases on datasets with strong anatomical priors (OCTMNIST, OrganAMNIST), consistent with the architectural hypothesis. These findings support the view that aligning architectural inductive bias with data structure can be more important than pursuing universal benchmark dominance. Despite its minimal size and lack of pretraining, ZACH-ViT achieves competitive performance while maintaining sub-second inference times, supporting deployment in resource-constrained clinical environments. Code and models are available at https://github.com/Bluesman79/ZACH-ViT.
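The permutation invariance claimed for ZACH-ViT's aggregation is easy to verify numerically: global average pooling over patch tokens gives the same output under any patch reordering.

```python
import numpy as np

def pooled_representation(patch_tokens):
    """Global average pooling over patch tokens -- the aggregation
    ZACH-ViT uses instead of a [CLS] token. The mean over the patch
    axis is invariant to any permutation of the patches."""
    return patch_tokens.mean(axis=0)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 32))   # 16 patches, 32-dim embeddings
perm = rng.permutation(16)
print(np.allclose(pooled_representation(tokens),
                  pooled_representation(tokens[perm])))  # True
```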

【4】Financial time series augmentation using transformer based GAN architecture
标题:使用基于Transformer的GAN架构增强金融时间序列
链接:https://arxiv.org/abs/2602.17865

作者:Andrzej Podobiński,Jarosław A. Chudziak
备注:This paper has been accepted for the upcoming 18th International Conference on Agents and Artificial Intelligence (ICAART-2026), Marbella, Spain. The final published version will appear in the official conference proceedings
摘要:时间序列预测是许多领域的关键任务,从工程到经济,准确的预测驱动战略决策。然而,由于金融时间序列数据的固有局限性和动态性,将高级深度学习模型应用于金融等具有挑战性的波动性领域是困难的。这种稀缺性往往导致次优模型训练和泛化能力差。根本的挑战在于确定如何可靠地增强稀缺的金融时间序列数据,以提高深度学习预测模型的预测准确性。我们的主要贡献是演示了生成对抗网络(GANs)如何有效地作为数据增强工具来克服金融领域的数据稀缺性。具体来说,我们表明,与单独使用真实数据相比,在由基于变换器的GAN(TTS-GAN)生成的合成数据增强的数据集上训练长短期记忆(LSTM)预测模型显着提高了预测精度。我们在不同的金融时间序列(比特币和S\& P500价格数据)和各种预测范围内确认了这些结果。此外,我们提出了一种新的时间序列特定的质量度量,它结合了动态时间规整(DTW)和修改后的深度数据集差异度量(DeD-iMs),以可靠地监控训练进度并评估生成数据的质量。这些发现为基于GAN的数据增强在增强财务预测能力方面的好处提供了令人信服的证据。
摘要:Time-series forecasting is a critical task across many domains, from engineering to economics, where accurate predictions drive strategic decisions. However, applying advanced deep learning models in challenging, volatile domains like finance is difficult due to the inherent limitation and dynamic nature of financial time series data. This scarcity often results in sub-optimal model training and poor generalization. The fundamental challenge lies in determining how to reliably augment scarce financial time series data to enhance the predictive accuracy of deep learning forecasting models. Our main contribution is a demonstration of how Generative Adversarial Networks (GANs) can effectively serve as a data augmentation tool to overcome data scarcity in the financial domain. Specifically, we show that training a Long Short-Term Memory (LSTM) forecasting model on a dataset augmented with synthetic data generated by a transformer-based GAN (TTS-GAN) significantly improves the forecasting accuracy compared to using real data alone. We confirm these results across different financial time series (Bitcoin and S\&P500 price data) and various forecasting horizons. Furthermore, we propose a novel, time series specific quality metric that combines Dynamic Time Warping (DTW) and a modified Deep Dataset Dissimilarity Measure (DeD-iMs) to reliably monitor the training progress and evaluate the quality of the generated data. These findings provide compelling evidence for the benefits of GAN-based data augmentation in enhancing financial predictive capabilities.
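The DTW half of the proposed quality metric is the classic dynamic-programming recurrence; the DeD-iMs half is paper-specific and not reproduced here.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping distance between two 1-D series, with
    absolute difference as the local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

print(dtw_distance([1, 2, 3], [1, 2, 3]))     # 0.0
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0 (warping absorbs the repeat)
```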

GAN|对抗|攻击|生成相关(7篇)

【1】Robo-Saber: Generating and Simulating Virtual Reality Players
标题:Robo-Saber:生成和模拟虚拟现实玩家
链接:https://arxiv.org/abs/2602.18319

作者:Nam Hee Kim,Jingjing May Liu,Jaakko Lehtinen,Perttu Hämäläinen,James F. O'Brien,Xue Bin Peng
备注:13 pages, 15 figures. Accepted to Eurographics 2026. Project page: https://robo-saber.github.io/
摘要:我们提出了首个用于虚拟现实(VR)游戏测试(playtesting)的运动生成系统。我们的玩家模型根据游戏内物体的排布生成VR头显和手持控制器的运动,由风格范例引导,并经过对齐以最大化模拟游戏得分。我们在大型BOXRR-23数据集上训练,并将框架应用于流行的VR游戏Beat Saber。由此得到的模型Robo-Saber能够产生熟练的游戏操作,并捕捉多样的玩家行为,反映输入风格范例所指定的技能水平和运动模式。Robo-Saber在为预测类应用合成丰富的游戏数据,以及实现基于物理的全身VR游戏测试智能体方面展示了前景。
摘要:We present the first motion generation system for playtesting virtual reality (VR) games. Our player model generates VR headset and handheld controller movements from in-game object arrangements, guided by style exemplars and aligned to maximize simulated gameplay score. We train on the large BOXRR-23 dataset and apply our framework on the popular VR game Beat Saber. The resulting model Robo-Saber produces skilled gameplay and captures diverse player behaviors, mirroring the skill levels and movement patterns specified by input style exemplars. Robo-Saber demonstrates promise in synthesizing rich gameplay data for predictive applications and enabling a physics-based whole-body VR playtesting agent.

【2】Continual-NExT: A Unified Comprehension And Generation Continual Learning Framework
标题:Continual-NExT:统一的理解和生成持续学习框架
链接:https://arxiv.org/abs/2602.18055

作者:Jingyang Qiao,Zhizhong Zhang,Xin Tan,Jingyu Gong,Yanyun Qu,Yuan Xie
摘要:Dual-to-Dual MLLM是指能够通过文本和图像模态实现统一多模态理解与生成的多模态大型语言模型。尽管表现出强大的即时学习和泛化能力,Dual-to-Dual MLLM在终身演化方面仍有欠缺,严重影响其对动态现实场景的持续适应。挑战之一是学习新任务不可避免地会破坏已学知识。除了传统的灾难性遗忘之外,Dual-to-Dual MLLM还面临其他挑战,包括幻觉、不遵循指令以及跨模态知识迁移失败。然而,目前尚未建立针对Dual-to-Dual MLLM的标准化持续学习框架,这些挑战因此尚未得到探索。因此,在本文中,我们建立了Continual-NExT,一个面向Dual-to-Dual MLLM、配有精心设计的评估指标的持续学习框架。为了提高Dual-to-Dual MLLM的持续学习能力,我们提出了一种高效的MAGE(Mixture and Aggregation of General LoRA and Expert LoRA)方法,以进一步促进跨模态知识迁移并减轻遗忘。大量实验表明,MAGE优于其他持续学习方法,并达到最先进的性能。
摘要:Dual-to-Dual MLLMs refer to Multimodal Large Language Models, which can enable unified multimodal comprehension and generation through text and image modalities. Although exhibiting strong instantaneous learning and generalization capabilities, Dual-to-Dual MLLMs still remain deficient in lifelong evolution, significantly affecting continual adaptation to dynamic real-world scenarios. One of the challenges is that learning new tasks inevitably destroys the learned knowledge. Beyond traditional catastrophic forgetting, Dual-to-Dual MLLMs face other challenges, including hallucination, instruction unfollowing, and failures in cross-modal knowledge transfer. However, no standardized continual learning framework for Dual-to-Dual MLLMs has been established yet, leaving these challenges unexplored. Thus, in this paper, we establish Continual-NExT, a continual learning framework for Dual-to-Dual MLLMs with deliberately-architected evaluation metrics. To improve the continual learning capability of Dual-to-Dual MLLMs, we propose an efficient MAGE (Mixture and Aggregation of General LoRA and Expert LoRA) method to further facilitate knowledge transfer across modalities and mitigate forgetting. Extensive experiments demonstrate that MAGE outperforms other continual learning methods and achieves state-of-the-art performance.
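The "mixture and aggregation" of a general LoRA with expert LoRAs can be sketched as an additive low-rank update on a frozen weight. The additive form follows standard LoRA; the gating scheme below is a guess at MAGE's design, not taken from the paper.

```python
import numpy as np

def mage_forward(x, W, general, experts, gate):
    """Sketch of mixing a shared general LoRA with per-task expert LoRAs.

    W       : frozen base weight, shape (d_out, d_in)
    general : (A, B) low-rank pair shared across tasks
    experts : list of (A, B) pairs, one per seen task
    gate    : mixture weights over experts (e.g. softmax outputs)
    """
    Ag, Bg = general
    delta = Ag @ Bg                         # shared low-rank update
    for w, (Ae, Be) in zip(gate, experts):
        delta = delta + w * (Ae @ Be)       # gated expert updates
    return (W + delta) @ x

d_out, d_in, r = 6, 4, 2
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))
general = (rng.normal(size=(d_out, r)), rng.normal(size=(r, d_in)))
experts = [(rng.normal(size=(d_out, r)), rng.normal(size=(r, d_in)))
           for _ in range(2)]
y = mage_forward(rng.normal(size=d_in), W, general, experts, gate=[0.7, 0.3])
print(y.shape)  # (6,)
```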

【3】Provable Adversarial Robustness in In-Context Learning
标题:上下文学习中可证明的对抗鲁棒性
链接:https://arxiv.org/abs/2602.17743

作者:Di Zhang
备注:16 pages
摘要:大型语言模型通过上下文学习(ICL)适应新任务,无需参数更新。目前对这种能力的理论解释假设测试任务是从类似于预训练期间所见的分布中抽取的。这种假设忽略了威胁现实世界可靠性的对抗性分布偏移。为了解决这一差距,我们引入了一个分布鲁棒的元学习框架,为基于Wasserstein的分布偏移下的ICL提供最坏情况下的性能保证。针对线性自注意力Transformer,我们得到了一个联系对抗扰动强度($\rho$)、模型容量($m$)和上下文示例数量($N$)的非渐近界。分析表明,模型的鲁棒性与其容量的平方根成比例($\rho_{\text{max}} \propto \sqrt{m}$),而对抗性设置会施加与扰动幅度平方成比例的样本复杂度代价($N_\rho - N_0 \propto \rho^2$)。合成任务上的实验证实了这些缩放律。这些发现推进了对ICL在对抗条件下极限的理论理解,并表明模型容量是分布鲁棒性的基本资源。
摘要:Large language models adapt to new tasks through in-context learning (ICL) without parameter updates. Current theoretical explanations for this capability assume test tasks are drawn from a distribution similar to that seen during pretraining. This assumption overlooks adversarial distribution shifts that threaten real-world reliability. To address this gap, we introduce a distributionally robust meta-learning framework that provides worst-case performance guarantees for ICL under Wasserstein-based distribution shifts. Focusing on linear self-attention Transformers, we derive a non-asymptotic bound linking adversarial perturbation strength ($\rho$), model capacity ($m$), and the number of in-context examples ($N$). The analysis reveals that model robustness scales with the square root of its capacity ($\rho_{\text{max}} \propto \sqrt{m}$), while adversarial settings impose a sample complexity penalty proportional to the square of the perturbation magnitude ($N_\rho - N_0 \propto \rho^2$). Experiments on synthetic tasks confirm these scaling laws. These findings advance the theoretical understanding of ICL's limits under adversarial conditions and suggest that model capacity serves as a fundamental resource for distributional robustness.
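The two scaling laws can be written down directly; the constants `c1`, `c2`, `N0` are unknown problem-dependent factors, used here only to illustrate the shapes of the bounds.

```python
import numpy as np

# Placeholder constants -- problem-dependent, not from the paper.
c1, c2, N0 = 1.0, 1.0, 100

def rho_max(m):
    """Tolerable perturbation grows like the square root of capacity."""
    return c1 * np.sqrt(m)

def n_required(rho):
    """Sample complexity pays a penalty quadratic in perturbation size."""
    return N0 + c2 * rho ** 2

print(rho_max(64) / rho_max(16))         # 2.0 (4x capacity -> 2x robustness)
print(n_required(2.0) - n_required(0))   # 4.0 (penalty scales with rho^2)
```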

【4】Parallel Complex Diffusion for Scalable Time Series Generation
标题:用于可扩展时间序列生成的并行复扩散
链接:https://arxiv.org/abs/2602.17706

作者:Rongyao Cai,Yuxi Wan,Kexin Zhang,Ming Jin,Zhiqiang Ge,Qingsong Wen,Yong Liu
摘要:在时间序列生成中建模长程依赖,会在表示能力和计算效率之间带来根本性权衡。传统的时域扩散模型受局部纠缠和注意力机制 $\mathcal{O}(L^2)$ 成本的困扰。我们通过引入PaCoDi(并行复扩散)来解决这些限制,这是一种谱原生架构,将生成建模解耦到频域中。PaCoDi从根本上改变了问题的拓扑:傅立叶变换充当对角化算子,将局部耦合的时域信号转换为全局去相关的谱分量。在理论上,我们证明了正交前向扩散与条件反向分解定理,表明复扩散过程可以分解为独立的实部与虚部分支。我们使用由交互式校正机制强化的平均场理论(MFT)近似,来弥合这一解耦理论与数据现实之间的差距。此外,我们将这种离散DDPM推广到连续时间的频域随机微分方程(SDE),严格推导出描述微分谱布朗运动极限的谱维纳过程。至关重要的是,PaCoDi利用实值信号的厄米对称性将序列长度压缩一半,从而在不损失信息的情况下将注意力FLOPs减少50%。我们进一步推导出一个严格的异方差损失,以处理压缩流形上的非各向同性噪声分布。大量实验表明,PaCoDi在生成质量和推理速度方面都优于现有基线,为时间序列建模提供了理论上有依据且计算高效的解决方案。
摘要:Modeling long-range dependencies in time series generation poses a fundamental trade-off between representational capacity and computational efficiency. Traditional temporal diffusion models suffer from local entanglement and the $\mathcal{O}(L^2)$ cost of attention mechanisms. We address these limitations by introducing PaCoDi (Parallel Complex Diffusion), a spectral-native architecture that decouples generative modeling in the frequency domain. PaCoDi fundamentally alters the problem topology: the Fourier Transform acts as a diagonalizing operator, converting locally coupled temporal signals into globally decorrelated spectral components. Theoretically, we prove the Quadrature Forward Diffusion and Conditional Reverse Factorization theorem, demonstrating that the complex diffusion process can be split into independent real and imaginary branches. We bridge the gap between this decoupled theory and data reality using a Mean Field Theory (MFT) approximation reinforced by an interactive correction mechanism. Furthermore, we generalize this discrete DDPM to continuous-time Frequency SDEs, rigorously deriving the Spectral Wiener Process that describes the differential spectral Brownian motion limit. Crucially, PaCoDi exploits the Hermitian Symmetry of real-valued signals to compress the sequence length by half, achieving a 50% reduction in attention FLOPs without information loss. We further derive a rigorous Heteroscedastic Loss to handle the non-isotropic noise distribution on the compressed manifold. Extensive experiments show that PaCoDi outperforms existing baselines in both generation quality and inference speed, offering a theoretically grounded and computationally efficient solution for time series modeling.
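The Hermitian-symmetry compression is standard FFT behavior and can be checked in a few lines: a real signal of length $L$ is fully recoverable from its $L/2 + 1$ `rfft` bins.

```python
import numpy as np

# Real-valued signals have Hermitian-symmetric spectra, so rfft keeps
# only L//2 + 1 of the L complex bins with no information loss -- the
# property PaCoDi exploits to roughly halve the attention sequence length.
L = 256
x = np.random.default_rng(0).normal(size=L)
spec = np.fft.rfft(x)
print(spec.shape)                    # (129,) i.e. 256 // 2 + 1
x_back = np.fft.irfft(spec, n=L)
print(np.allclose(x, x_back))        # True: lossless round trip
```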

【5】DesignAsCode: Bridging Structural Editability and Visual Fidelity in Graphic Design Generation
标题:DesignAsCode:在平面设计生成中架起结构可编辑性和视觉逼真性的桥梁
链接:https://arxiv.org/abs/2602.17690

作者:Ziyuan Liu,Shizhao Sun,Danqing Huang,Yingdong Shi,Meisheng Zhang,Ji Li,Jingsong Yu,Jiang Bian
摘要:图形设计生成需要在高视觉保真度和细粒度结构可编辑性之间取得微妙平衡。然而,现有方法通常分化为两类:不可编辑的光栅图像合成,或缺乏视觉内容的抽象布局生成。这两种方法的最新组合试图弥合这一差距,但由于其表达力不足的表示和开环的特性,往往受困于僵化的组合模式和无法消解的视觉不和谐(例如文本与背景冲突)。为了解决这些挑战,我们提出DesignAsCode,一个将图形设计重新构想为使用HTML/CSS的程序化合成任务的新框架。具体来说,我们引入了一个计划-实现-反思(Plan-Implement-Reflect)管道,结合语义规划器来构建动态的、深度可变的元素层次结构,以及一个视觉感知反思机制,迭代优化代码以纠正渲染瑕疵。大量实验表明,DesignAsCode在结构有效性和美学质量方面都显著优于最先进的基线。此外,我们的代码原生表示解锁了高级能力,包括自动布局重定向、复杂文档生成(例如简历)和基于CSS的动画。
摘要:Graphic design generation demands a delicate balance between high visual fidelity and fine-grained structural editability. However, existing approaches typically bifurcate into either non-editable raster image synthesis or abstract layout generation devoid of visual content. Recent combinations of these two approaches attempt to bridge this gap but often suffer from rigid composition schemas and unresolvable visual dissonances (e.g., text-background conflicts) due to their inexpressive representation and open-loop nature. To address these challenges, we propose DesignAsCode, a novel framework that reimagines graphic design as a programmatic synthesis task using HTML/CSS. Specifically, we introduce a Plan-Implement-Reflect pipeline, incorporating a Semantic Planner to construct dynamic, variable-depth element hierarchies and a Visual-Aware Reflection mechanism that iteratively optimizes the code to rectify rendering artifacts. Extensive experiments demonstrate that DesignAsCode significantly outperforms state-of-the-art baselines in both structural validity and aesthetic quality. Furthermore, our code-native representation unlocks advanced capabilities, including automatic layout retargeting, complex document generation (e.g., resumes), and CSS-based animation.
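The Plan-Implement-Reflect loop reduces to a small control skeleton. All four callables below are placeholders standing in for the LLM planner, renderer, and visual critic; nothing here is DesignAsCode's actual implementation.

```python
def plan_implement_reflect(brief, render, critique, revise, max_iters=3):
    """Skeleton of a Plan-Implement-Reflect loop: implement HTML/CSS,
    render it, critique the render, and revise until the critic finds
    no issues or the iteration budget runs out."""
    code = revise(brief, None, None)          # initial implementation
    for _ in range(max_iters):
        image = render(code)
        issues = critique(image)
        if not issues:                        # critic is satisfied
            break
        code = revise(brief, code, issues)    # reflect and fix artifacts
    return code

# Stub components standing in for the LLM planner, renderer, and critic:
render = lambda code: f"render({code})"
critique = lambda img: [] if "v2" in img else ["text-background conflict"]
revise = lambda brief, code, issues: "html_v1" if code is None else "html_v2"
print(plan_implement_reflect("poster", render, critique, revise))  # html_v2
```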

【6】AnCoder: Anchored Code Generation via Discrete Diffusion Models
标题:AnCoder:通过离散扩散模型生成锚定代码
链接:https://arxiv.org/abs/2602.17688

作者:Anton Xue,Litu Rout,Constantine Caramanis,Sanjay Shakkottai
摘要:扩散语言模型为自回归代码生成提供了一种有吸引力的替代方案,使复杂程序逻辑的全局规划和迭代细化成为可能。然而,现有方法未能尊重编程语言的刚性结构,因此经常产生无法执行的残缺程序。为了解决这个问题,我们引入了AnchorTree,一个在扩散过程中使用代码原生的结构化层次先验进行显式锚定的框架。具体而言,AnchorTree使用抽象语法树来优先解析语法和语义上显著的令牌,例如关键字(如if、while)和标识符(如变量名),从而建立一个引导其余生成过程的结构支架。我们通过AnCoder验证了这一框架,这一系列模型表明,结构锚定的扩散为高质量代码生成提供了一条参数高效的路径。
摘要:Diffusion language models offer a compelling alternative to autoregressive code generation, enabling global planning and iterative refinement of complex program logic. However, existing approaches fail to respect the rigid structure of programming languages and, as a result, often produce broken programs that fail to execute. To address this, we introduce AnchorTree, a framework that explicitly anchors the diffusion process using structured, hierarchical priors native to code. Specifically, AnchorTree uses the abstract syntax tree to prioritize resolving syntactically and semantically salient tokens, such as keywords (e.g., if, while) and identifiers (e.g., variable names), thereby establishing a structural scaffold that guides the remaining generation. We validate this framework via AnCoder, a family of models showing that structurally anchored diffusion offers a parameter-efficient path to high-quality code generation.
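The idea of prioritizing syntactically salient tokens can be sketched with Python's own `ast` module: keywords and identifiers are exactly the token classes the scaffold would resolve first. The two-tier priority here is a simplification, not AnchorTree's actual scheme.

```python
import ast
import keyword

def salient_tokens(source):
    """Collect keywords and identifiers from a Python snippet -- the
    token classes a structural scaffold would anchor first. Keywords
    are found by whitespace splitting (a simplification); identifiers
    come from Name nodes in the abstract syntax tree."""
    tree = ast.parse(source)
    names = sorted({n.id for n in ast.walk(tree) if isinstance(n, ast.Name)})
    kws = sorted({w for w in source.split() if keyword.iskeyword(w)})
    return kws, names

kws, names = salient_tokens("while x:\n    if y:\n        x = x - 1")
print(kws)    # ['if', 'while']
print(names)  # ['x', 'y']
```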

【7】Duality Models: An Embarrassingly Simple One-step Generation Paradigm
标题:对偶模型:一个极其简单的一步生成范式
链接:https://arxiv.org/abs/2602.17682

作者:Peng Sun,Xinyi Shang,Tao Lin,Zhiqiang Shen
备注:https://github.com/LINs-lab/DuMo
摘要:基于一致性的生成模型,如Shortcut和MeanFlow,通过面向目标的设计来求解概率流ODE(PF-ODE),取得了令人印象深刻的结果。通常,这类方法在当前时间$t$之外引入目标时间$r$,以在局部多步导数($r = t$)和全局少步积分($r = 0$)之间调节输出。然而,传统的"一个输入,一个输出"范式强制划分训练预算,通常将相当大的一部分(例如MeanFlow中的75%)仅分配给用于稳定性的多步目标。这种分离迫使一种权衡:将足够的样本分配给多步目标,会使少步生成训练不足,从而损害收敛性并限制可扩展性。为此,我们提出了基于"一个输入,双输出"范式的对偶模型(DuMo)。通过带有双头的共享骨干,DuMo从单个输入$x_t$同时预测速度$v_t$和流图$u_t$。这将来自多步目标的几何约束应用于每个样本,在不分离训练目标的情况下约束少步估计,从而显著提高稳定性和效率。在ImageNet 256$\times$256上,一个使用SD-VAE的679M参数扩散Transformer仅用2步就实现了1.79的最先进(SOTA)FID。代码可在:https://github.com/LINs-lab/DuMo
摘要:Consistency-based generative models like Shortcut and MeanFlow achieve impressive results via a target-aware design for solving the Probability Flow ODE (PF-ODE). Typically, such methods introduce a target time $r$ alongside the current time $t$ to modulate outputs between a local multi-step derivative ($r = t$) and a global few-step integral ($r = 0$). However, the conventional "one input, one output" paradigm enforces a partition of the training budget, often allocating a significant portion (e.g., 75% in MeanFlow) solely to the multi-step objective for stability. This separation forces a trade-off: allocating sufficient samples to the multi-step objective leaves the few-step generation undertrained, which harms convergence and limits scalability. To this end, we propose Duality Models (DuMo) via a "one input, dual output" paradigm. Using a shared backbone with dual heads, DuMo simultaneously predicts velocity $v_t$ and flow-map $u_t$ from a single input $x_t$. This applies geometric constraints from the multi-step objective to every sample, bounding the few-step estimation without separating training objectives, thereby significantly improving stability and efficiency. On ImageNet 256 $\times$ 256, a 679M Diffusion Transformer with SD-VAE achieves a state-of-the-art (SOTA) FID of 1.79 in just 2 steps. Code is available at: https://github.com/LINs-lab/DuMo
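作为辅助理解,下面给出"一个输入,双输出"范式的一个极简numpy草图(玩具MLP、维度与初始化均为示意性假设,并非论文的Diffusion Transformer架构):共享骨干对$(x_t, t, r)$编码一次,两个轻量头从同一次前向传播中分别读出速度$v_t$和流图$u_t$。

```python
import numpy as np

# Illustrative sketch of the "one input, dual output" idea: a shared backbone
# encodes (x_t, t, r) once, and two heads read off the velocity v_t and the
# flow map u_t from the same forward pass. Sizes and the toy MLP are assumptions.

rng = np.random.default_rng(0)
D, H = 8, 32                              # data and hidden dims (assumed)
W1 = rng.normal(size=(D + 2, H)) * 0.1    # shared backbone; +2 inputs for t and r
Wv = rng.normal(size=(H, D)) * 0.1        # velocity head
Wu = rng.normal(size=(H, D)) * 0.1        # flow-map head

def dual_forward(x_t, t, r):
    """One input x_t, two outputs: velocity v_t and flow map u_t."""
    z = np.concatenate([x_t, [t, r]])
    h = np.tanh(z @ W1)                   # features computed once, shared by both heads
    return h @ Wv, h @ Wu

v, u = dual_forward(rng.normal(size=D), t=0.7, r=0.0)
```

这里只演示双头结构本身;两个输出各自的训练目标如何耦合请参见论文。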

迁移|Zero/Few/One-Shot|自适应(7篇)

【1】MEG-to-MEG Transfer Learning and Cross-Task Speech/Silence Detection with Limited Data
标题:有限数据的MEG到MEG迁移学习和跨任务语音/静音检测
链接:https://arxiv.org/abs/2602.18253

作者:Xabier de Zuazo,Vincenzo Verbeni,Eva Navas,Ibon Saratxaga,Mathieu Bourguignon,Nicola Molinaro
备注:6 pages, 3 figures, 3 tables, submitted to Interspeech 2026
摘要:数据高效的神经解码是语音脑机接口的核心挑战。我们首次展示了涵盖感知和产生的基于MEG的语音模型的迁移学习和跨任务解码。我们在50小时的单一受试者听力数据上预训练了一个基于Conformer的模型,并在18名参与者中对每名受试者仅用5分钟数据进行微调。迁移学习带来了持续的改进,任务内准确率提高1-4%,跨任务增益更大,最高可达5-6%。预训练不仅提高了每个任务内的性能,而且还实现了感知和产生之间可靠的跨任务解码。至关重要的是,在言语产生上训练的模型以高于随机水平的准确率解码被动倾听,这证实了学习到的表征反映了共享的神经过程,而不是特定于任务的运动活动。
摘要:Data-efficient neural decoding is a central challenge for speech brain-computer interfaces. We present the first demonstration of transfer learning and cross-task decoding for MEG-based speech models spanning perception and production. We pre-train a Conformer-based model on 50 hours of single-subject listening data and fine-tune on just 5 minutes per subject across 18 participants. Transfer learning yields consistent improvements, with in-task accuracy gains of 1-4% and larger cross-task gains of up to 5-6%. Not only does pre-training improve performance within each task, but it also enables reliable cross-task decoding between perception and production. Critically, models trained on speech production decode passive listening above chance, confirming that learned representations reflect shared neural processes rather than task-specific motor activity.

【2】Parameter-Efficient Domain Adaptation of Physics-Informed Self-Attention based GNNs for AC Power Flow Prediction
标题:基于物理信息自注意GNNs的参数有效域自适应交流潮流预测
链接:https://arxiv.org/abs/2602.18227

作者:Redwanul Karim,Changhun Kim,Timon Conrad,Nora Gourmelon,Julian Oelhaf,David Riebesel,Tomás Arias-Vergara,Andreas Maier,Johann Jäger,Siming Bayer
摘要:当在中压(MV)电网上训练的模型部署在高压(HV)电网上时,域偏移下的准确AC-PF预测至关重要。现有的物理信息图神经求解器通常依赖于完全微调来实现跨工况迁移,导致高昂的再训练成本,并且对目标域适应与源域保留之间的稳定性-可塑性权衡提供的控制有限。我们研究了基于物理信息自注意力GNN的参数高效域适应,通过基于物理的损失鼓励与Kirchhoff定律一致的行为,同时将适应限制为低秩更新。具体来说,我们将LoRA应用于注意力投影,并选择性地解冻预测头来调节适应能力。这种设计为电压工况偏移下受物理约束的逆估计提供了可控的效率-精度权衡。在多个电网拓扑中,所提出的LoRA+PHead自适应以$2.6\times10^{-4}$的目标域RMSE差距恢复了接近完全微调的准确性,同时将可训练参数的数量减少了85.46%。基于物理的残差仍与完全微调相当;然而,相对于完全微调,LoRA+PHead在域偏移下将MV源域保留降低了4.7个百分点(17.9% vs. 22.6%),同时仍然实现了参数高效且物理一致的AC-PF估计。
摘要:Accurate AC-PF prediction under domain shift is critical when models trained on medium-voltage (MV) grids are deployed on high-voltage (HV) networks. Existing physics-informed graph neural solvers typically rely on full fine-tuning for cross-regime transfer, incurring high retraining cost and offering limited control over the stability-plasticity trade-off between target-domain adaptation and source-domain retention. We study parameter-efficient domain adaptation for physics-informed self-attention based GNN, encouraging Kirchhoff-consistent behavior via a physics-based loss while restricting adaptation to low-rank updates. Specifically, we apply LoRA to attention projections with selective unfreezing of the prediction head to regulate adaptation capacity. This design yields a controllable efficiency-accuracy trade-off for physics-constrained inverse estimation under voltage-regime shift. Across multiple grid topologies, the proposed LoRA+PHead adaptation recovers near-full fine-tuning accuracy with a target-domain RMSE gap of $2.6\times10^{-4}$ while reducing the number of trainable parameters by 85.46%. The physics-based residual remains comparable to full fine-tuning; however, relative to Full FT, LoRA+PHead reduces MV source retention by 4.7 percentage points (17.9% vs. 22.6%) under domain shift, while still enabling parameter-efficient and physically consistent AC-PF estimation.
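作为示意,下面用numpy给出LoRA低秩更新的一个极简草图(层的选取、维度与缩放因子均为假设值,并非该工作的实现):冻结的投影矩阵$W$通过可训练的低秩乘积$A B$进行适应,可训练参数从$d_{in} \cdot d_{out}$降为$r(d_{in}+d_{out})$。

```python
import numpy as np

# Minimal LoRA sketch (sizes and scaling are illustrative assumptions):
# a frozen projection W is adapted via a trainable low-rank product A @ B.

rng = np.random.default_rng(1)
d_in, d_out, r, alpha = 64, 64, 4, 8.0
W = rng.normal(size=(d_in, d_out))        # frozen pretrained projection
A = rng.normal(size=(d_in, r)) * 0.01     # trainable down-projection
B = np.zeros((r, d_out))                  # trainable up-projection, zero-init

def lora_forward(x):
    # zero-initializing B makes the adapted layer start exactly at the frozen one
    return x @ W + (alpha / r) * (x @ A @ B)

x = rng.normal(size=d_in)
# fraction of parameters that stay frozen in this single layer
frozen_frac = 1 - (d_in * r + r * d_out) / (d_in * d_out)   # 0.875 here
```

论文报告的85.46%可训练参数削减同样源于这种把更新限制为低秩因子(外加选择性解冻的预测头)的思路;此处仅展示单层的机制。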

【3】Improving Generalizability of Hip Fracture Risk Prediction via Domain Adaptation Across Multiple Cohorts
标题:通过跨多个队列的领域适应提高髋部骨折风险预测的普遍性
链接:https://arxiv.org/abs/2602.17962

作者:Shuo Sun,Meiling Zhou,Chen Zhao,Joyce H. Keyak,Nancy E. Lane,Jeffrey D. Deng,Kuan-Jui Su,Hui Shen,Hong-Wen Deng,Kui Zhang,Weihua Zhou
备注:26 pages, 3 tables, 1 figure
摘要:临床风险预测模型通常无法在队列中推广,因为基础数据分布因临床研究中心、地区、人口统计学和测量方案而异。这种局限性在髋部骨折风险预测中尤其明显,其中在一个队列(源队列)上训练的模型的性能在部署在其他队列(目标队列)中时可能会大幅下降。我们在三个大型队列中使用了一组共享的临床和DXA衍生特征-骨质疏松性骨折研究(SOF),男性骨质疏松性骨折研究(MrOS)和英国生物银行(UKB),以系统地评估三种域适应方法的性能-最大平均离散度(MMD),相关对齐(CORAL)和域对抗神经网络(DANN)及其组合。对于仅含男性的源队列和仅含女性的源队列,域适应方法始终显示出比无适应基线(仅源训练)更好的性能,并且使用多个域适应方法的组合提供了最大和最稳定的增益。结合MMD、CORAL和DANN的方法实现了最高的区分度,对于仅具有男性的源队列,曲线下面积(AUC)为0.88,对于仅具有女性的源队列,曲线下面积(AUC)为0.95,这表明集成多个域自适应方法可以产生对数据集差异不太敏感的特征表示。与严重依赖监督调整或假设目标队列中样本的已知结果的现有方法不同,我们的无结果方法能够在现实的部署条件下进行模型选择,并提高髋部骨折风险预测中模型的泛化能力。
摘要:Clinical risk prediction models often fail to be generalized across cohorts because underlying data distributions differ by clinical site, region, demographics, and measurement protocols. This limitation is particularly pronounced in hip fracture risk prediction, where the performance of models trained on one cohort (the source cohort) can degrade substantially when deployed in other cohorts (target cohorts). We used a shared set of clinical and DXA-derived features across three large cohorts - the Study of Osteoporotic Fractures (SOF), the Osteoporotic Fractures in Men Study (MrOS), and the UK Biobank (UKB) - to systematically evaluate the performance of three domain adaptation methods - Maximum Mean Discrepancy (MMD), Correlation Alignment (CORAL), and Domain-Adversarial Neural Networks (DANN) - and their combinations. For a source cohort with males only and a source cohort with females only, domain-adaptation methods consistently showed improved performance over the no-adaptation baseline (source-only training), and the use of combinations of multiple domain adaptation methods delivered the largest and most stable gains. The method that combines MMD, CORAL, and DANN achieved the highest discrimination, with an area under the curve (AUC) of 0.88 for a source cohort with males only and 0.95 for a source cohort with females only, demonstrating that integrating multiple domain adaptation methods could produce feature representations that are less sensitive to dataset differences. Unlike existing methods that rely heavily on supervised tuning or assume known outcomes of samples in target cohorts, our outcome-free approaches enable model selection under realistic deployment conditions and improve generalization of models in hip fracture risk prediction.
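作为示意,下面用numpy给出三种对齐准则之一(最大平均差异,MMD)的一个极简草图(仅为说明,核带宽gamma等取值为假设,并非该研究的实现):MMD度量源队列与目标队列特征分布之间的差异,域适应训练即最小化这一差异。

```python
import numpy as np

# Biased estimate of squared MMD between two feature batches under an RBF
# kernel; the bandwidth gamma is an assumed hyperparameter.

def rbf_mmd2(X, Y, gamma=1.0):
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

rng = np.random.default_rng(3)
src = rng.normal(size=(100, 2))            # "source cohort" features (toy)
tgt_same = rng.normal(size=(100, 2))       # same distribution as src
tgt_shift = src + 3.0                      # shifted "target cohort"
```

相同分布的两批样本MMD接近0,分布偏移时MMD增大;训练中将其作为正则项即可拉近两个队列的特征表示。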

【4】MantisV2: Closing the Zero-Shot Gap in Time Series Classification with Synthetic Data and Test-Time Strategies
标题:MantisV2:利用合成数据和测试时间策略缩小时间序列分类中的Zero-Shot差距
链接:https://arxiv.org/abs/2602.17868

作者:Vasilii Feofanov,Songkang Wen,Jianfeng Zhang,Lujia Pan,Ievgen Redko
摘要:开发用于时间序列分类的基础模型具有很高的实际意义,因为这些模型可以作为各种下游任务的通用特征提取器。虽然Mantis等早期模型已经展示了这种方法的前景,但在冻结和微调编码器之间仍然存在很大的性能差距。在这项工作中,我们介绍的方法,显着加强zero-shot特征提取的时间序列。首先,我们介绍Mantis+,这是Mantis的一个变种,完全基于合成时间序列进行预训练。其次,通过控制消融研究,我们完善了架构,并获得MantisV2,一个改进的,更轻量级的编码器。第三,我们提出了一种增强的测试时方法,利用中间层表示和细化输出令牌聚合。此外,我们表明,性能可以进一步提高通过自集成和跨模型嵌入融合。在UCR、UEA、人类活动识别(HAR)基准测试和EEG数据集上进行的大量实验表明,MantisV2和Mantis+的性能始终优于先前的时间序列基础模型,实现了最先进的zero-shot性能。
摘要:Developing foundation models for time series classification is of high practical relevance, as such models can serve as universal feature extractors for diverse downstream tasks. Although early models such as Mantis have shown the promise of this approach, a substantial performance gap remained between frozen and fine-tuned encoders. In this work, we introduce methods that significantly strengthen zero-shot feature extraction for time series. First, we introduce Mantis+, a variant of Mantis pre-trained entirely on synthetic time series. Second, through controlled ablation studies, we refine the architecture and obtain MantisV2, an improved and more lightweight encoder. Third, we propose an enhanced test-time methodology that leverages intermediate-layer representations and refines output-token aggregation. In addition, we show that performance can be further improved via self-ensembling and cross-model embedding fusion. Extensive experiments on UCR, UEA, Human Activity Recognition (HAR) benchmarks, and EEG datasets show that MantisV2 and Mantis+ consistently outperform prior time series foundation models, achieving state-of-the-art zero-shot performance.

【5】Calibrated Adaptation: Bayesian Stiefel Manifold Priors for Reliable Parameter-Efficient Fine-Tuning
标题:校准适应:用于可靠且参数高效的微调的Bayesian Stiefel Manifold先验
链接:https://arxiv.org/abs/2602.17809

作者:Ibne Farabi Shihab,Sanjeda Akter,Anuj Sharma
摘要:参数高效的微调方法(如LoRA)能够实际适应大型语言模型,但没有提供原则性的不确定性估计,导致校准不良的预测和域偏移下的不可靠行为。我们介绍Stiefel-Bayes适配器(SBA),一个贝叶斯PEFT框架,它在Stiefel流形$\St$上为正交适配器因子放置矩阵Langevin先验,并通过切空间拉普拉斯近似与测地线回缩进行近似后验推理。与平面空间中投影到正交约束上的高斯先验不同,我们在流形上的先验自然地编码了适配子空间应该良态且正交的归纳偏差,而后验无需重新校准即可提供校准的预测不确定性。我们正式证明,切空间近似严格避免了从环绕空间投影所固有的结构性方差膨胀,为内在流形推断建立了严格的理论优势。在RoBERTa-large、LLaMA-2-7B、LLaMA-2-13B、Mistral-7B和Qwen2.5-7B上的GLUE和SuperGLUE基准测试、域偏移评估、选择性预测协议和抽象摘要任务中,SBA实现了与LoRA和DoRA相当的任务性能,同时相对确定性基线将预期校准误差降低了18-34%,在域偏移下将选择性预测AUROC提高了12-25%,并且在OOD检测上以一小部分参数成本优于五个LoRA模型的深度集成。我们的研究结果表明,把不确定性放置在正确的几何结构上,比简单地向适配器添加任意贝叶斯处理更重要。
摘要:Parameter-efficient fine-tuning methods such as LoRA enable practical adaptation of large language models but provide no principled uncertainty estimates, leading to poorly calibrated predictions and unreliable behavior under domain shift. We introduce Stiefel-Bayes Adapters (SBA), a Bayesian PEFT framework that places a Matrix Langevin prior over orthonormal adapter factors on the Stiefel manifold $\St$ and performs approximate posterior inference via tangent space Laplace approximation with geodesic retraction. Unlike Gaussian priors in flat space projected onto orthogonality constraints, our prior on the manifold naturally encodes the inductive bias that adapter subspaces should be well conditioned and orthogonal, while the posterior provides calibrated predictive uncertainty without recalibration. We prove formally that the tangent space approximation strictly avoids the structural variance inflation inherent in projecting from ambient space, establishing a rigorous theoretical advantage for intrinsic manifold inference. Across GLUE and SuperGLUE benchmarks on RoBERTa-large, LLaMA-2-7B, LLaMA-2-13B, Mistral-7B, and Qwen2.5-7B, domain shift evaluations, selective prediction protocols, and an abstractive summarization task, SBA achieves task performance comparable to LoRA and DoRA while reducing Expected Calibration Error by 18 to 34\% over deterministic baselines, improving selective prediction AUROC by 12 to 25\% under domain shift, and outperforming deep ensembles of five LoRA models on OOD detection at a fraction of the parameter cost. Our results demonstrate that where you place uncertainty, on the right geometric structure, matters more than simply adding any Bayesian treatment to adapters.
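下面用一个简化草图说明SBA所依赖的Stiefel流形约束(列正交)如何在更新后得以维持。论文使用测地线回缩;此处为便于说明改用常见的基于QR分解的回缩,属于我们的替代性假设,并非论文方法本身。

```python
import numpy as np

# QR-based retraction: map an arbitrary full-rank matrix to a nearby point
# with orthonormal columns (a point on the Stiefel manifold).

def qr_retract(X):
    Q, R = np.linalg.qr(X)
    return Q * np.sign(np.diag(R))   # fix the QR sign ambiguity per column

rng = np.random.default_rng(4)
W = qr_retract(rng.normal(size=(16, 4)))    # a point on the Stiefel manifold
step = 0.1 * rng.normal(size=(16, 4))       # a toy (tangent-space) update
W_new = qr_retract(W + step)                # retract back onto the manifold
```

无论更新多大,回缩后的适配器因子始终保持正交列,这正是先验所编码的"良态且正交"归纳偏差的机制基础。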

【6】Nested Training for Mutual Adaptation in Human-AI Teaming
标题:人机协作中相互适应的嵌套训练
链接:https://arxiv.org/abs/2602.17737

作者:Upasana Biswas,Durgesh Kalwar,Subbarao Kambhampati,Sarath Sreedharan
摘要:相互适应是人类-AI团队协作的核心挑战,因为人类会自然地根据机器人的策略调整自己的策略。现有的方法旨在提高训练伙伴的多样性以近似人类行为,但这些伙伴是静态的,无法捕捉人类的自适应行为。让机器人接触自适应行为至关重要,但是当两个代理在多代理环境中同时学习时,它们通常会收敛到不透明的隐式协调策略,这些策略只对与它们共同训练的代理有效。当与新伙伴配对时,这些代理无法泛化。为了捕捉人类的自适应行为,我们将人-机器人协作场景建模为交互式部分可观察马尔可夫决策过程(I-POMDP),并明确地将人类适应建模为状态的一部分。我们提出了一个嵌套训练机制,以近似学习有限层级I-POMDP的解。在这个框架中,每个层级的代理都针对下一层级的自适应代理进行训练。这确保了自我代理在训练过程中接触到自适应行为,同时避免了隐式协调策略的出现,因为训练伙伴本身并不在学习。我们在Overcooked环境的多回合、强制合作设置中训练我们的方法,并与为人-机器人协作设计的若干基线代理进行比较。我们评估了我们的代理与训练期间未见过的自适应伙伴配对时的性能。我们的结果表明,我们的代理不仅在与这些自适应伙伴协作时实现了更高的任务性能,而且在团队交互中表现出显著更强的适应性。
摘要:Mutual adaptation is a central challenge in human--AI teaming, as humans naturally adjust their strategies in response to a robot's policy. Existing approaches aim to improve diversity in training partners to approximate human behavior, but these partners are static and fail to capture adaptive behavior of humans. Exposing robots to adaptive behaviors is critical, yet when both agents learn simultaneously in a multi-agent setting, they often converge to opaque implicit coordination strategies that only work with the agents they were co-trained with. Such agents fail to generalize when paired with new partners. In order to capture the adaptive behavior of humans, we model the human-robot teaming scenario as an Interactive Partially Observable Markov Decision Process (I-POMDP), explicitly modeling human adaptation as part of the state. We propose a nested training regime to approximately learn the solution to a finite-level I-POMDP. In this framework, agents at each level are trained against adaptive agents from the level below. This ensures that the ego agent is exposed to adaptive behavior during training while avoiding the emergence of implicit coordination strategies, since the training partners are not themselves learning. We train our method in a multi-episode, required cooperation setup in the Overcooked domain, comparing it against several baseline agents designed for human-robot teaming. We evaluate the performance of our agent when paired with adaptive partners that were not seen during training. Our results demonstrate that our agent not only achieves higher task performance with these adaptive partners but also exhibits significantly greater adaptability during team interactions.

【7】Theory and interpretability of Quantum Extreme Learning Machines: a Pauli-transfer matrix approach
标题:量子极限学习机的理论和可解释性:保利转移矩阵方法
链接:https://arxiv.org/abs/2602.18377

作者:Markus Gross,Hans-Martin Rieser
备注:34 pages, 12 figures
摘要:量子储备池计算机(QRC)利用量子系统的自然动力学进行数据处理并且易于训练,已成为量子机器学习的一种有前途的方法。在这里,我们考虑具有连续时间储备池动力学的n量子比特量子极限学习机(QELM)。QELM是无记忆的QRC,能够执行各种ML任务,包括图像分类和时间序列预测。我们应用泡利传输矩阵(PTM)形式主义,从理论上分析编码、储备池动力学和测量操作(包括时间复用)对QELM性能的影响。这种形式主义明确表明,编码决定了QELM可用的(非线性)特征的完整集合,而量子通道在这些特征被所选测量算子探测之前对其进行线性变换。因此,优化QELM可以转化为一个解码问题,即通过塑造通道诱导的变换,使与任务相关的特征可供回归器使用。PTM形式主义允许人们确定QELM的经典表示,从而针对给定的训练目标指导其设计。作为一个具体的应用,我们专注于学习非线性动力系统,并表明在此类轨迹上训练的QELM学习到了底层流映射的代理近似。
摘要:Quantum reservoir computers (QRCs) have emerged as a promising approach to quantum machine learning, since they utilize the natural dynamics of quantum systems for data processing and are simple to train. Here, we consider n-qubit quantum extreme learning machines (QELMs) with continuous-time reservoir dynamics. QELMs are memoryless QRCs capable of various ML tasks, including image classification and time series forecasting. We apply the Pauli transfer matrix (PTM) formalism to theoretically analyze the influence of encoding, reservoir dynamics, and measurement operations, including temporal multiplexing, on the QELM performance. This formalism makes explicit that the encoding determines the complete set of (nonlinear) features available to the QELM, while the quantum channels linearly transform these features before they are probed by the chosen measurement operators. Optimizing a QELM can therefore be cast as a decoding problem in which one shapes the channel-induced transformations such that task-relevant features become available to the regressor. The PTM formalism allows one to identify the classical representation of a QELM and thereby guide its design towards a given training objective. As a specific application, we focus on learning nonlinear dynamical systems and show that a QELM trained on such trajectories learns a surrogate-approximation to the underlying flow map.
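作为示意,下面计算单量子比特酉通道的泡利传输矩阵(这是我们自行构造的最小示例,并非论文的n量子比特设置):$R_{ij} = \mathrm{tr}(P_i U P_j U^\dagger)/2$将通道表示为泡利系数上的线性映射;对酉通道,该矩阵是正交的。

```python
import numpy as np

# Pauli transfer matrix of a single-qubit unitary channel (toy illustration).
# R_ij = tr(P_i U P_j U^dagger) / 2 expresses the channel as a linear map
# acting on Pauli coefficients.

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [I2, X, Y, Z]

def ptm(U):
    R = np.empty((4, 4))
    for i, Pi in enumerate(paulis):
        for j, Pj in enumerate(paulis):
            R[i, j] = np.real(np.trace(Pi @ U @ Pj @ U.conj().T)) / 2.0
    return R

theta = 0.3
Rx = np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * X   # toy "reservoir" step
R = ptm(Rx)
```

对酉通道,PTM在Hilbert-Schmidt内积下保距,因此是正交矩阵,且第一行为$(1,0,0,0)$(保迹性);这正是"量子通道对特征做线性变换"这一论断的最小实例。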

强化学习(6篇)

【1】Flow Matching with Injected Noise for Offline-to-Online Reinforcement Learning
标题:用于离线到在线强化学习的注入噪音流匹配
链接:https://arxiv.org/abs/2602.18117

作者:Yongjae Shin,Jongseong Chae,Jongeui Park,Youngchul Sung
备注:ICLR 2026 camera-ready
摘要:生成模型最近在不同领域取得了显著成功,促使它们被用作强化学习(RL)中的表达性策略。虽然它们在离线强化学习中表现出了很强的性能,特别是在目标分布定义明确的情况下,但它们向在线微调的扩展在很大程度上被视为离线预训练的直接延续,留下了未解决的关键挑战。在本文中,我们提出了用于离线到在线RL的注入噪声流匹配(FINO),这是一种利用基于流匹配的策略来提高离线到在线RL样本效率的新方法。FINO通过在策略训练中注入噪声来促进有效的探索,从而鼓励采取超出离线数据集中所观察到的更广泛的动作。除了探索增强的流策略训练之外,我们还结合了熵引导的采样机制来平衡探索和利用,使策略能够在整个在线微调过程中调整其行为。在多样且具有挑战性的任务上的实验表明,FINO在有限的在线预算下始终实现卓越的性能。
摘要:Generative models have recently demonstrated remarkable success across diverse domains, motivating their adoption as expressive policies in reinforcement learning (RL). While they have shown strong performance in offline RL, particularly where the target distribution is well defined, their extension to online fine-tuning has largely been treated as a direct continuation of offline pre-training, leaving key challenges unaddressed. In this paper, we propose Flow Matching with Injected Noise for Offline-to-Online RL (FINO), a novel method that leverages flow matching-based policies to enhance sample efficiency for offline-to-online RL. FINO facilitates effective exploration by injecting noise into policy training, thereby encouraging a broader range of actions beyond those observed in the offline dataset. In addition to exploration-enhanced flow policy training, we combine an entropy-guided sampling mechanism to balance exploration and exploitation, allowing the policy to adapt its behavior throughout online fine-tuning. Experiments across diverse, challenging tasks demonstrate that FINO consistently achieves superior performance under limited online budgets.
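下面是噪声注入式流匹配训练样本构造的一个推测性草图(采用通用的条件流匹配配方;噪声注入位置与尺度sigma均为我们的假设,未必是FINO的确切方案):在噪声样本$x_0$与数据集动作$x_1$之间插值,并对动作端点加扰以拓宽策略被训练指向的动作范围。

```python
import numpy as np

# Generic conditional-flow-matching pair with injected exploration noise
# (the injection site and scale sigma are assumptions, not FINO's exact scheme).

rng = np.random.default_rng(5)

def noisy_fm_pair(x0, x1, t, sigma):
    """Return (x_t, target velocity) for one flow-matching regression pair."""
    x1_noisy = x1 + sigma * rng.normal(size=x1.shape)  # injected exploration noise
    x_t = (1.0 - t) * x0 + t * x1_noisy                # straight-path interpolant
    return x_t, x1_noisy - x0                          # velocity target

x0 = rng.normal(size=4)            # base noise sample
x1 = np.ones(4)                    # a dataset action (toy)
x_t, v_target = noisy_fm_pair(x0, x1, t=0.5, sigma=0.0)
```

sigma取0时退化为普通流匹配;sigma大于0时回归目标围绕数据集动作展宽,对应摘要中"鼓励超出离线数据所观察到的更广泛动作"的思路。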

【2】Interacting safely with cyclists using Hamilton-Jacobi reachability and reinforcement learning
标题:使用Hamilton-Jacobi可达性和强化学习与骑自行车的人安全互动
链接:https://arxiv.org/abs/2602.18097

作者:Aarati Andrea Noronha,Jean Oh
备注:7 pages. This manuscript was completed in 2020 as part of the first author's graduate thesis at Carnegie Mellon University
摘要:在本文中,我们提出了一个框架,使自动驾驶汽车能够以兼顾安全性和最优性的方式与骑自行车的人互动。该方法将Hamilton-Jacobi可达性分析与深度Q学习相结合,以共同解决安全保证和时间高效导航问题。值函数作为时间相关Hamilton-Jacobi-Bellman不等式的解来计算,为每个系统状态提供定量的安全度量。该安全度量作为结构化奖励信号被纳入强化学习框架中。该方法进一步对骑车人对车辆的潜在响应进行建模,允许扰动输入反映人的舒适度和行为适应。所提出的框架通过仿真进行评估,并与人类驾驶行为和一种现有的最先进方法进行比较。
摘要:In this paper, we present a framework for enabling autonomous vehicles to interact with cyclists in a manner that balances safety and optimality. The approach integrates Hamilton-Jacobi reachability analysis with deep Q-learning to jointly address safety guarantees and time-efficient navigation. A value function is computed as the solution to a time-dependent Hamilton-Jacobi-Bellman inequality, providing a quantitative measure of safety for each system state. This safety metric is incorporated as a structured reward signal within a reinforcement learning framework. The method further models the cyclist's latent response to the vehicle, allowing disturbance inputs to reflect human comfort and behavioral adaptation. The proposed framework is evaluated through simulation and comparison with human driving behavior and an existing state-of-the-art method.

【3】Gradient Regularization Prevents Reward Hacking in Reinforcement Learning from Human Feedback and Verifiable Rewards
标题:梯度正规化防止强化学习中的奖励黑客攻击来自人类反馈和可验证奖励
链接:https://arxiv.org/abs/2602.18037

作者:Johannes Ackermann,Michael Noukhovitch,Takashi Ishida,Masashi Sugiyama
备注:25 pages, 15 figures
摘要:基于人类反馈的强化学习(RLHF)和基于可验证奖励的强化学习(RLVR)是现代语言模型(LM)后训练的两个关键步骤。一个常见的问题是奖励黑客,即策略可能会利用奖励的不准确性并学习到非预期的行为。大多数以前的工作通过向参考模型施加Kullback-Leibler(KL)惩罚来限制策略更新,从而解决这个问题。我们提出了一种不同的思路:以使策略更新偏向奖励更准确区域的方式训练LM。首先,我们推导出奖励模型准确性与收敛时最优点平坦度之间的理论联系。然后,可以使用梯度正则化(GR)将训练偏置到更平坦的区域,从而保持奖励模型的准确性。我们通过展示梯度范数与奖励准确性在RLHF中经验相关来证实这些结果。然后,我们证明了KL惩罚的参考重置隐式地使用GR来找到具有更高奖励准确性的平坦区域。我们进一步改进这一点,建议使用带有高效有限差分估计的显式GR。在使用LM的各种RL实验中,GR的经验表现优于KL惩罚。GR在RLHF中实现了更高的GPT评判胜率,避免了对基于规则的数学奖励中格式的过度关注,并防止了在LLM-as-a-Judge数学任务中对评判者的黑客攻击。
摘要:Reinforcement Learning from Human Feedback (RLHF) or Verifiable Rewards (RLVR) are two key steps in the post-training of modern Language Models (LMs). A common problem is reward hacking, where the policy may exploit inaccuracies of the reward and learn an unintended behavior. Most previous works address this by limiting the policy update with a Kullback-Leibler (KL) penalty towards a reference model. We propose a different framing: Train the LM in a way that biases policy updates towards regions in which the reward is more accurate. First, we derive a theoretical connection between the accuracy of a reward model and the flatness of an optimum at convergence. Gradient regularization (GR) can then be used to bias training to flatter regions and thereby maintain reward model accuracy. We confirm these results by showing that the gradient norm and reward accuracy are empirically correlated in RLHF. We then show that Reference Resets of the KL penalty implicitly use GR to find flatter regions with higher reward accuracy. We further improve on this by proposing to use explicit GR with an efficient finite-difference estimate. Empirically, GR performs better than a KL penalty across a diverse set of RL experiments with LMs. GR achieves a higher GPT-judged win-rate in RLHF, avoids overly focusing on the format in rule-based math rewards, and prevents hacking the judge in LLM-as-a-Judge math tasks.
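下面用一个玩具二次损失示意有限差分式梯度正则化估计(二次损失仅为RL目标的替身,步长eps为假设的超参数):惩罚项$\|\nabla L(\theta)\|^2$的梯度为$2H(\theta)g$,可用$(\nabla L(\theta+\epsilon g)-\nabla L(\theta))\cdot 2/\epsilon$仅通过一阶梯度调用来近似,无需显式二阶导数。

```python
import numpy as np

# Finite-difference estimate of the gradient of the penalty ||grad L(theta)||^2.
# The toy quadratic loss L(theta) = 0.5 * theta^T H theta stands in for the
# RL objective; only the first-order gradient oracle is used by the estimator.

rng = np.random.default_rng(6)
M = rng.normal(size=(5, 5))
Hess = M @ M.T + np.eye(5)              # SPD Hessian of the toy loss
grad = lambda th: Hess @ th             # gradient oracle for the quadratic loss

theta = rng.normal(size=5)
g = grad(theta)
eps = 1e-5
fd_penalty_grad = (grad(theta + eps * g) - g) * (2.0 / eps)
exact_penalty_grad = 2.0 * Hess @ g     # known here only because L is quadratic
```

对二次损失,该有限差分估计与精确值一致;对一般损失,它以一次额外的梯度计算换取对显式Hessian-向量积的规避。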

【4】Mean-Field Reinforcement Learning without Synchrony
标题:无同步的平均场强化学习
链接:https://arxiv.org/abs/2602.18026

作者:Shan Yang
备注:21 pages, 5 figures, 1 algorithm
摘要:平均场强化学习(MF-RL)通过将每个代理对其他代理的依赖减少到单个汇总统计量(平均动作)来将多代理RL扩展到大群体。然而,这种减少要求每个代理在每个时间步都采取行动;当一些代理空闲时,平均动作根本没有定义。因此,解决异步问题需要一个不同的汇总统计量,即无论哪些代理采取行动都保持有定义的统计量。人口分布$μ\in Δ(\mathcal{O})$(即处于每个观察的代理比例)满足这一要求:它的维数与$N$无关,并且在可交换性条件下,它完全决定每个代理的奖励和转移。然而,现有的MF-RL理论建立在平均动作之上,并不能扩展到$μ$。因此,我们从零开始围绕人口分布$μ$构建时间平均场(Temporal Mean Field,TMF)框架,在单一理论中涵盖从完全同步到纯粹顺序决策的整个谱系。我们证明了TMF均衡的存在性和唯一性,建立了一个无论每步有多少代理行动都成立的$O(1/\sqrt{N})$有限群体近似界,并证明了策略梯度算法(TMF-PG)收敛到唯一均衡。在一个资源选择博弈和一个动态排队博弈上的实验证实,无论每步是一个代理还是全部$N$个代理行动,TMF-PG都能实现几乎相同的性能,且近似误差以预测的$O(1/\sqrt{N})$速率衰减。
摘要:Mean-field reinforcement learning (MF-RL) scales multi-agent RL to large populations by reducing each agent's dependence on others to a single summary statistic -- the mean action. However, this reduction requires every agent to act at every time step; when some agents are idle, the mean action is simply undefined. Addressing asynchrony therefore requires a different summary statistic -- one that remains defined regardless of which agents act. The population distribution $μ\in Δ(\mathcal{O})$ -- the fraction of agents at each observation -- satisfies this requirement: its dimension is independent of $N$, and under exchangeability it fully determines each agent's reward and transition. Existing MF-RL theory, however, is built on the mean action and does not extend to $μ$. We therefore construct the Temporal Mean Field (TMF) framework around the population distribution $μ$ from scratch, covering the full spectrum from fully synchronous to purely sequential decision-making within a single theory. We prove existence and uniqueness of TMF equilibria, establish an $O(1/\sqrt{N})$ finite-population approximation bound that holds regardless of how many agents act per step, and prove convergence of a policy gradient algorithm (TMF-PG) to the unique equilibrium. Experiments on a resource selection game and a dynamic queueing game confirm that TMF-PG achieves near-identical performance whether one agent or all $N$ act per step, with approximation error decaying at the predicted $O(1/\sqrt{N})$ rate.
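TMF框架所依赖的汇总统计量可以用几行代码示意:人口分布$μ\in Δ(\mathcal{O})$即处于每个观察的代理比例,无论本步有多少代理行动都有定义(示例中的观察编号与代理数均为虚构):

```python
import numpy as np

# The population distribution mu in Delta(O): fraction of agents at each
# observation. Unlike the mean action, mu stays defined no matter how many
# agents act this step, and its dimension is independent of N.

def population_distribution(observations, n_obs):
    counts = np.bincount(observations, minlength=n_obs)
    return counts / counts.sum()

# 6 agents over 3 possible observations; mu has fixed dimension 3 regardless of N
mu = population_distribution(np.array([0, 0, 1, 2, 2, 2]), n_obs=3)
```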

【5】Flow Actor-Critic for Offline Reinforcement Learning
标题:线下强化学习的流演员评论家
链接:https://arxiv.org/abs/2602.18015

作者:Jongseong Chae,Jongeui Park,Yongjae Shin,Gyeongmin Kim,Seungyul Han,Youngchul Sung
备注:Accepted to ICLR 2026
摘要:离线强化学习(RL)中的数据集分布通常表现出复杂和多模态的分布,需要超越广泛使用的高斯策略的表达性策略来捕获这些分布。为了处理这种复杂和多模态的数据集,在本文中,我们提出了Flow Actor-Critic,一种基于最近流策略的新的离线RL actor-critic方法。所提出的方法不仅像以前的流策略一样将流模型用于actor,还利用表达性流模型来获得保守的critic,以防止数据之外区域的Q值爆炸。为此,我们提出了一种新形式的critic正则化器,它基于作为基于流的actor设计的副产品而获得的流行为代理模型。通过以这种联合方式利用流模型,我们在包括D4RL和最近的OGBench基准在内的离线RL测试数据集上实现了新的最先进性能。
摘要:The dataset distributions in offline reinforcement learning (RL) often exhibit complex and multi-modal distributions, necessitating expressive policies to capture such distributions beyond widely-used Gaussian policies. To handle such complex and multi-modal datasets, in this paper, we propose Flow Actor-Critic, a new actor-critic method for offline RL, based on recent flow policies. The proposed method not only uses the flow model for actor as in previous flow policies but also exploits the expressive flow model for conservative critic acquisition to prevent Q-value explosion in out-of-data regions. To this end, we propose a new form of critic regularizer based on the flow behavior proxy model obtained as a byproduct of flow-based actor design. Leveraging the flow model in this joint way, we achieve new state-of-the-art performance for test datasets of offline RL including the D4RL and recent OGBench benchmarks.

【6】Optimal Multi-Debris Mission Planning in LEO: A Deep Reinforcement Learning Approach with Co-Elliptic Transfers and Refueling
标题:LEO中的最佳多碎片任务规划:具有共椭圆传输和加油的深度强化学习方法
链接:https://arxiv.org/abs/2602.17685

作者:Agni Bandyopadhyay,Gunther Waxenegger-Wilfing
备注:Presented at Conference: IFAC Workshop on Control Aspects of Multi-Satellite Systems (CAMSAT) 2025 At: Wuerzburg
摘要:本文通过引入一个统一的共椭圆机动框架来应对低地球轨道(LEO)中多目标主动碎片清除(ADR)的挑战,该框架结合了Hohmann转移、安全椭圆接近操作和显式的加油逻辑。我们在具有随机碎片场、禁入区和delta-V约束的现实轨道仿真环境中,对三种不同的规划算法进行基准测试:贪婪启发式、蒙特卡罗树搜索(MCTS)以及使用掩码近端策略优化(PPO)的深度强化学习(RL)。超过100个测试场景的实验结果表明,Masked PPO实现了卓越的任务效率和计算性能,访问的碎片数量最多达到贪婪算法的两倍,并且在运行时间上明显优于MCTS。这些发现强调了现代强化学习方法在可扩展、安全和资源高效的太空任务规划方面的前景,为ADR自主性的未来发展铺平了道路。
摘要:This paper addresses the challenge of multi target active debris removal (ADR) in Low Earth Orbit (LEO) by introducing a unified coelliptic maneuver framework that combines Hohmann transfers, safety ellipse proximity operations, and explicit refueling logic. We benchmark three distinct planning algorithms Greedy heuristic, Monte Carlo Tree Search (MCTS), and deep reinforcement learning (RL) using Masked Proximal Policy Optimization (PPO) within a realistic orbital simulation environment featuring randomized debris fields, keep out zones, and delta V constraints. Experimental results over 100 test scenarios demonstrate that Masked PPO achieves superior mission efficiency and computational performance, visiting up to twice as many debris as Greedy and significantly outperforming MCTS in runtime. These findings underscore the promise of modern RL methods for scalable, safe, and resource efficient space mission planning, paving the way for future advancements in ADR autonomy.
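规划器所权衡的Hohmann转移delta-V预算可由标准的双脉冲公式(源自vis-viva方程)计算;下面的轨道半径为示意性的LEO数值,并非论文场景中的取值:

```python
import numpy as np

# Standard two-impulse Hohmann transfer between coplanar circular orbits.
MU_EARTH = 3.986004418e14  # Earth's gravitational parameter, m^3/s^2

def hohmann_dv(r1, r2, mu=MU_EARTH):
    """Total delta-V (m/s) to transfer between circular orbits of radii r1, r2."""
    a = 0.5 * (r1 + r2)                               # transfer-ellipse semi-major axis
    dv1 = np.sqrt(mu / r1) * (np.sqrt(r2 / a) - 1.0)  # burn leaving the first orbit
    dv2 = np.sqrt(mu / r2) * (1.0 - np.sqrt(r1 / a))  # circularization burn
    return abs(dv1) + abs(dv2)

dv = hohmann_dv(6771e3, 6971e3)   # roughly 400 km -> 600 km altitude, ~110 m/s
```

每次碎片交会消耗这样一笔delta-V,因此规划问题本质上是在有限燃料(含加油)约束下对访问序列的组合优化。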

元学习(1篇)

【1】Explaining AutoClustering: Uncovering Meta-Feature Contribution in AutoML for Clustering
标题:解释自动集群:揭示AutoML中元特征对集群的贡献
链接:https://arxiv.org/abs/2602.18348

作者:Matheus Camilo da Silva,Leonardo Arrighi,Ana Carolina Lorena,Sylvio Barbon Junior
摘要:自动聚类方法旨在自动执行无监督学习任务,包括算法选择(AS),超参数优化(HPO)和管道合成(PS),通常通过利用数据集元特征的元学习。虽然这些系统通常具有很强的性能,但它们的建议往往难以证明:数据集元特征对算法和超参数选择的影响通常不会暴露,从而限制了可靠性,偏差诊断和有效的元特征工程。这限制了进一步改进的可靠性和诊断洞察力。在这项工作中,我们研究了自聚类中元模型的可解释性。我们首先回顾了22个现有的方法,并将它们的元特征组织成一个结构化的分类。然后,我们应用全局可解释性技术(即,决策谓词图),以评估来自选定框架的元模型中的特征重要性。最后,我们使用本地可解释性工具,如SHAP(SHapley加法解释)来分析特定的聚类决策。我们的研究结果强调了元特征相关性的一致模式,确定了当前元学习策略中可能扭曲推荐的结构性弱点,并为更可解释的自动机器学习(AutoML)设计提供了可操作的指导。因此,这项研究为提高无监督学习自动化的决策透明度提供了一个实际的基础。
摘要:AutoClustering methods aim to automate unsupervised learning tasks, including algorithm selection (AS), hyperparameter optimization (HPO), and pipeline synthesis (PS), by often leveraging meta-learning over dataset meta-features. While these systems often achieve strong performance, their recommendations are often difficult to justify: the influence of dataset meta-features on algorithm and hyperparameter choices is typically not exposed, limiting reliability, bias diagnostics, and efficient meta-feature engineering. This limits reliability and diagnostic insight for further improvements. In this work, we investigate the explainability of the meta-models in AutoClustering. We first review 22 existing methods and organize their meta-features into a structured taxonomy. We then apply a global explainability technique (i.e., Decision Predicate Graphs) to assess feature importance within meta-models from selected frameworks. Finally, we use local explainability tools such as SHAP (SHapley Additive exPlanations) to analyse specific clustering decisions. Our findings highlight consistent patterns in meta-feature relevance, identify structural weaknesses in current meta-learning strategies that can distort recommendations, and provide actionable guidance for more interpretable Automated Machine Learning (AutoML) design. This study therefore offers a practical foundation for increasing decision transparency in unsupervised learning automation.

医学相关(4篇)

【1】Machine Learning Based Prediction of Surgical Outcomes in Chronic Rhinosinusitis from Clinical Data
标题:基于机器学习的临床数据预测慢性鼻窦炎的手术结果
链接:https://arxiv.org/abs/2602.17888

作者:Sayeed Shafayet Chowdhury,Karen D'Souza,V. Siva Kakumani,Snehasis Mukhopadhyay,Shiaofen Fang,Rodney J. Schlosser,Daniel M. Beswick,Jeremiah A. Alt,Jess C. Mace,Zachary M. Soler,Timothy L. Smith,Vijay R. Ramakrishnan
摘要:人工智能(AI)通过实现跨成像和病理学的快速准确分析,日益改变医学预后学。然而,尽管有可能降低成本并改善患者结局,将机器学习预测应用于观察性临床干预试验前瞻性收集的标准化数据的研究仍未得到充分探索。慢性鼻窦炎(CRS)是一种持续时间超过三个月的鼻窦炎性疾病,对生活质量(QoL)和社会成本造成了巨大负担。虽然许多患者对药物治疗有反应,但其他症状难治的患者通常会寻求手术干预。CRS的手术决策是复杂的,因为它必须权衡已知的手术风险和不确定的个体化结局。在这项研究中,我们评估了有监督的机器学习模型,以预测CRS中的手术获益,使用鼻窦结局测试-22(SNOT-22)作为主要患者报告结局。我们从观察性干预试验中前瞻性收集的队列包括所有接受手术的患者;我们调查了仅根据术前数据训练的模型是否可以识别在手术前可能不会被推荐手术的患者。在多种算法(包括一种集成方法)中,我们的最佳模型实现了约85%的分类准确率,对手术适应性做出准确且可解释的预测。此外,在包含不同难度的30例保留病例集上,我们的模型实现了80%的准确率,超过了专家临床医生的平均预测准确率(75.6%),证明了其增强临床决策和支持个性化CRS护理的潜力。
摘要:Artificial intelligence (AI) has increasingly transformed medical prognostics by enabling rapid and accurate analysis across imaging and pathology. However, the investigation of machine learning predictions applied to prospectively collected, standardized data from observational clinical intervention trials remains underexplored, despite its potential to reduce costs and improve patient outcomes. Chronic rhinosinusitis (CRS), a persistent inflammatory disease of the paranasal sinuses lasting more than three months, imposes a substantial burden on quality of life (QoL) and societal cost. Although many patients respond to medical therapy, others with refractory symptoms often pursue surgical intervention. Surgical decision-making in CRS is complex, as it must weigh known procedural risks against uncertain individualized outcomes. In this study, we evaluated supervised machine learning models for predicting surgical benefit in CRS, using the Sino-Nasal Outcome Test-22 (SNOT-22) as the primary patient-reported outcome. Our prospectively collected cohort from an observational intervention trial comprised patients who all underwent surgery; we investigated whether models trained only on preoperative data could identify patients who might not have been recommended surgery prior to the procedure. Across multiple algorithms, including an ensemble approach, our best model achieved approximately 85% classification accuracy, providing accurate and interpretable predictions of surgical candidacy. Moreover, on a held-out set of 30 cases spanning mixed difficulty, our model achieved 80% accuracy, exceeding the average prediction accuracy of expert clinicians (75.6%), demonstrating its potential to augment clinical decision-making and support personalized CRS care.
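作为示意,下面给出一个最简单的多数投票集成(论文未具体说明其基学习器与组合规则,此处仅为通用假设):多个分类器对每位患者各自给出0/1预测,集成取多数票。

```python
import numpy as np

# Plain majority-vote ensemble over binary predictions (a generic illustration,
# not the study's actual base learners or combination rule).

def majority_vote(model_preds):
    """model_preds: (n_models, n_patients) array-like of 0/1 predictions."""
    votes = np.asarray(model_preds).mean(axis=0)
    return (votes >= 0.5).astype(int)

preds = majority_vote([[1, 0, 1],
                       [1, 1, 0],
                       [0, 0, 1]])   # three toy models, three patients
```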

【2】RamanSeg: Interpretability-driven Deep Learning on Raman Spectra for Cancer Diagnosis
标题:RamanSeg:用于癌症诊断的拉曼光谱的可解释性驱动深度学习
链接:https://arxiv.org/abs/2602.18119

作者:Chris Tomy,Mo Vali,David Pertzborn,Tammam Alamatouri,Anna Mühlig,Orlando Guntinas-Lichius,Anna Xylander,Eric Michele Fantuzzi,Matteo Negro,Francesco Crisafi,Pietro Lio,Tiago Azevedo
备注:12 pages, 8 figures
摘要:组织病理学是目前癌症诊断的金标准,涉及化学染色后对组织样本的人工检查,这是一个耗时且需要专家分析的过程。拉曼光谱是一种从样品中提取信息的免染色替代方法。使用nnU-Net,我们在与肿瘤注释对齐的空间拉曼光谱新数据集上训练了一个分割模型,实现了80.9%的平均前景Dice得分,超过了以前的工作。此外,我们提出了一种新颖、可解释、基于原型的架构,称为RamanSeg。RamanSeg基于从训练集中发现的区域对像素进行分类,生成分割掩码。RamanSeg的两个变体允许在可解释性和性能之间进行权衡:一个带有原型投影,另一个为无投影版本。无投影的RamanSeg以67.3%的平均前景Dice得分优于U-Net基线,相比黑盒训练方法有了有意义的改进。
摘要:Histopathology, the current gold standard for cancer diagnosis, involves the manual examination of tissue samples after chemical staining, a time-consuming process requiring expert analysis. Raman spectroscopy is an alternative, stain-free method of extracting information from samples. Using nnU-Net, we trained a segmentation model on a novel dataset of spatial Raman spectra aligned with tumour annotations, achieving a mean foreground Dice score of 80.9%, surpassing previous work. Furthermore, we propose a novel, interpretable, prototype-based architecture called RamanSeg. RamanSeg classifies pixels based on discovered regions of the training set, generating a segmentation mask. Two variants of RamanSeg allow a trade-off between interpretability and performance: one with prototype projection and another projection-free version. The projection-free RamanSeg outperformed a U-Net baseline with a mean foreground Dice score of 67.3%, offering a meaningful improvement over a black-box training approach.
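作为说明(并非论文的实现),上文用于评价分割质量的前景Dice得分可以按标准定义计算如下:

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """二值分割的前景Dice系数:2|P∩T| / (|P|+|T|)。"""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    # eps 避免前景全空时除零
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```

完全重叠时得分为1,完全不相交时趋近于0;论文中报告的是逐图前景Dice的平均值。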

【3】Deep Learning for Dermatology: An Innovative Framework for Approaching Precise Skin Cancer Detection
标题:皮肤病学深度学习:精确皮肤癌检测的创新框架
链接:https://arxiv.org/abs/2602.17797

作者:Mohammad Tahmid Noor,B. M. Shahria Alam,Tasmiah Rahman Orpa,Shaila Afroz Anika,Mahjabin Tasnim Samiha,Fahad Ahammed
备注:6 pages, 9 figures, this is the author's accepted manuscript of a paper accepted for publication in the Proceedings of the 16th International IEEE Conference on Computing, Communication and Networking Technologies (ICCCNT 2025). The final published version will be available via IEEE Xplore
摘要:皮肤癌是一种常见但可预防的疾病,如果不及早诊断可能危及生命。在全球范围内,皮肤癌是最常见的癌症之一,每年有数百万人被确诊。针对皮肤科诊断中至关重要的良恶性皮肤病变区分问题,本文研究了两个主流深度学习模型VGG16和DenseNet201的应用。我们评估了这些CNN架构在区分良性和恶性皮肤病变方面的有效性,利用了应用于皮肤癌检测的深度学习进展。我们的目标是评估模型的准确率和计算效率,深入了解这些模型如何有助于皮肤科的早期检测、诊断和简化工作流程。我们在包含3297张图像的二分类数据集上使用了DenseNet201和VGG16两种深度学习模型。最佳结果由DenseNet201取得,准确率为93.79%。所有图像均通过重新缩放调整为224x224。虽然这两种模型都提供了出色的准确率,但仍有改进空间。未来我们计划在新的数据集上进一步提升准确率,改进这项工作。
摘要:Skin cancer, a prevalent yet preventable disease, can be life-threatening if not diagnosed early. Globally, skin cancer is among the most prevalent cancers, and millions of people are diagnosed each year. For the classification of benign and malignant skin lesions, an area of critical importance in dermatological diagnostics, this paper investigates the application of two prominent deep learning models, VGG16 and DenseNet201. We evaluate these CNN architectures for their efficacy in differentiating benign from malignant skin lesions, leveraging advances in deep learning applied to skin cancer detection. Our objective is to assess model accuracy and computational efficiency, offering insights into how these models could assist in early detection, diagnosis, and streamlined workflows in dermatology. We used two deep learning models, DenseNet201 and VGG16, on a binary-class dataset containing 3297 images. The best result, an accuracy of 93.79%, was achieved by DenseNet201. All images were resized to 224x224 by rescaling. Although both models provide excellent accuracy, there is still room for improvement. In future work with new datasets, we plan to improve on these results by achieving greater accuracy.

【4】Deep Neural Network Architectures for Electrocardiogram Classification: A Comprehensive Evaluation
标题:用于心电图分类的深度神经网络架构:综合评估
链接:https://arxiv.org/abs/2602.17701

作者:Yun Song,Wenjia Zheng,Tiedan Chen,Ziyu Wang,Jiazhao Shi,Yisong Chen
摘要:随着心血管疾病发病率的不断上升,心电图(ECG)仍然是无创检测心脏异常所必需的工具。本研究对用于自动心律失常分类的深度神经网络架构进行了全面评估,整合了时间建模、注意力机制和集成策略。为了解决少数类别的数据稀缺问题,使用生成对抗网络(GAN)对MIT-BIH心律失常数据集进行了增强。我们开发并比较了四种不同的架构,包括卷积神经网络(CNN)、结合长短期记忆的CNN(CNN-LSTM)、带注意力的CNN-LSTM和一维残差网络(ResNet-1D),以同时捕获局部形态特征和长期时间依赖性。性能使用准确率、F1分数和带95%置信区间的曲线下面积(AUC)进行严格评估以确保统计稳健性,同时采用梯度加权类激活映射(Grad-CAM)来验证模型的可解释性。实验结果表明,CNN-LSTM模型在灵敏度和特异性之间实现了最佳的单模型平衡,F1分数为0.951。相反,CNN-LSTM-Attention和ResNet-1D模型对类别不平衡表现出更高的敏感性。为缓解这一问题,引入了动态集成融合策略;具体而言,Top2加权集成实现了最高的整体性能,F1分数为0.958。这些发现表明,利用互补的深度架构可以显著提高分类可靠性,为智能心律失常检测系统提供稳健且可解释的基础。
摘要:With the rising prevalence of cardiovascular diseases, electrocardiograms (ECG) remain essential for the non-invasive detection of cardiac abnormalities. This study presents a comprehensive evaluation of deep neural network architectures for automated arrhythmia classification, integrating temporal modeling, attention mechanisms, and ensemble strategies. To address data scarcity in minority classes, the MIT-BIH Arrhythmia dataset was augmented using a Generative Adversarial Network (GAN). We developed and compared four distinct architectures, including Convolutional Neural Networks (CNN), CNN combined with Long Short-Term Memory (CNN-LSTM), CNN-LSTM with Attention, and 1D Residual Networks (ResNet-1D), to capture both local morphological features and long-term temporal dependencies. Performance was rigorously evaluated using accuracy, F1-score, and Area Under the Curve (AUC) with 95\% confidence intervals to ensure statistical robustness, while Gradient-weighted Class Activation Mapping (Grad-CAM) was employed to validate model interpretability. Experimental results indicate that the CNN-LSTM model achieved the optimal stand-alone balance between sensitivity and specificity, yielding an F1-score of 0.951. Conversely, the CNN-LSTM-Attention and ResNet-1D models exhibited higher sensitivity to class imbalance. To mitigate this, a dynamic ensemble fusion strategy was introduced; specifically, the Top2-Weighted ensemble achieved the highest overall performance with an F1-score of 0.958. These findings demonstrate that leveraging complementary deep architectures significantly enhances classification reliability, providing a robust and interpretable foundation for intelligent arrhythmia detection systems.
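摘要中Top2加权集成的具体权重构造未给出;下面是一个按归一化权重融合各模型类别概率的最小示意(接口与函数名均为假设性的说明):

```python
import numpy as np

def weighted_ensemble(probs_list, weights):
    """按归一化权重对各模型的类别概率取加权平均,返回融合后的预测与概率。
    probs_list: 每个模型一个 (样本数, 类别数) 数组;weights: 每个模型一个权重。"""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                               # 归一化权重
    fused = np.tensordot(w, np.stack(probs_list), axes=1)  # (样本数, 类别数)
    return fused.argmax(axis=1), fused
```

例如以(0.7, 0.3)的权重融合两个模型时,融合概率即各模型概率的凸组合,预测取融合后的argmax。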

蒸馏|知识提取(1篇)

【1】Curriculum Learning for Efficient Chain-of-Thought Distillation via Structure-Aware Masking and GRPO
标题:通过结构感知掩蔽和GRPO进行有效思想链提炼的课程学习
链接:https://arxiv.org/abs/2602.17686

作者:Bowen Yu,Maolin Wang,Sheng Zhang,Binhao Wang,Yi Wen,Jingtong Gao,Bowen Liu,Zimo Zhao,Wanyu Wang,Xiangyu Zhao
备注:22 pages, 12 figures
摘要:将思想链(CoT)推理从大型语言模型蒸馏到紧凑的学生模型中提出了一个根本性的挑战:教师给出的推理过程往往过于冗长,较小的模型无法忠实再现。现有方法往往将推理压缩为单步输出,从而失去了使CoT有价值的可解释性。我们提出了一个三阶段的课程学习框架,通过渐进的技能习得来解决这种能力不匹配。首先,我们通过掩码混洗重建建立结构理解。其次,我们将组相对策略优化(GRPO)应用于掩码补全任务,使模型能够在准确性和简洁性之间找到自己的平衡。第三,我们识别持续的失败案例,并引导学生通过有针对性的重写内化教师的知识,并再次用GRPO进行优化。GSM8K上的实验表明,我们的方法使Qwen2.5-3B-Base实现了11.29%的准确率提升,同时将输出长度减少了27.4%,超过了指令微调的变体和先前的蒸馏方法。
摘要:Distilling Chain-of-Thought (CoT) reasoning from large language models into compact student models presents a fundamental challenge: teacher rationales are often too verbose for smaller models to faithfully reproduce. Existing approaches often compress reasoning into a single step, losing the interpretability that makes CoT valuable. We present a three-stage curriculum learning framework that addresses this capacity mismatch through progressive skill acquisition. First, we establish structural understanding via masked shuffled reconstruction. Second, we apply Group Relative Policy Optimization (GRPO) on masked completion tasks, enabling the model to discover its own balance between accuracy and brevity. Third, we identify persistent failure cases and guide the student to internalize teacher knowledge through targeted rewriting, again optimized with GRPO. Experiments on GSM8K demonstrate that our approach enables Qwen2.5-3B-Base to achieve an 11.29 percent accuracy improvement while reducing output length by 27.4 percent, surpassing both instruction-tuned variants and prior distillation methods.

超分辨率|去噪|去模糊|去雾(1篇)

【1】Drift Estimation for Stochastic Differential Equations with Denoising Diffusion Models
标题:基于去噪扩散模型的随机微分方程漂移估计
链接:https://arxiv.org/abs/2602.17830

作者:Marcos Tapia Costa,Nikolas Kantas,George Deligiannidis
摘要:我们研究具有已知扩散系数的多变量随机微分方程中时间齐次漂移函数的估计问题,数据为在固定时间范围内高频观测到的多条轨迹。我们将漂移估计表述为以先前观测为条件的去噪问题,并提出了漂移函数的估计量,它是训练一个能够动态模拟新轨迹的条件扩散模型的副产品。在不同的漂移函数类上,所提出的估计量在低维情形下与经典方法相当,在较高维情形下也始终保持竞争力,且这种收益不能仅归因于架构设计选择。
摘要:We study the estimation of time-homogeneous drift functions in multivariate stochastic differential equations with known diffusion coefficient, from multiple trajectories observed at high frequency over a fixed time horizon. We formulate drift estimation as a denoising problem conditional on previous observations, and propose an estimator of the drift function which is a by-product of training a conditional diffusion model capable of simulating new trajectories dynamically. Across different drift classes, the proposed estimator was found to match classical methods in low dimensions and remained consistently competitive in higher dimensions, with gains that cannot be attributed to architectural design choices alone.

自动驾驶|车辆|车道检测等(1篇)

【1】Reducing Text Bias in Synthetically Generated MCQAs for VLMs in Autonomous Driving
标题:减少自动驾驶中针对VLM合成生成的MCQA中的文本偏差
链接:https://arxiv.org/abs/2602.17677

作者:Sutej Kulgod,Sean Ye,Sanchit Tanwar,Christoffer Heckman
备注:7 pages, 2 figures
摘要:多项选择问答(MCQA)基准是衡量视觉语言模型(VLM)在驾驶任务中表现的既定标准。然而,我们观察到一个已知的现象:合成生成的MCQA非常容易受到隐藏文本线索的影响,这些线索允许模型利用语言模式而非视觉上下文。我们的结果表明,即使没有视觉输入,在此类数据上微调的VLM也能达到与经人工验证的基准相当的准确率。我们提出的方法将盲准确率从高出随机水平66.9%降低到2.9%,消除了绝大多数可利用的文本捷径。通过将正确答案与语言伪影解耦并采用课程学习策略,我们迫使模型依赖视觉基础,确保性能准确反映感知理解。
摘要:Multiple Choice Question Answering (MCQA) benchmarks are an established standard for measuring Vision Language Model (VLM) performance in driving tasks. However, we observe the known phenomenon that synthetically generated MCQAs are highly susceptible to hidden textual cues that allow models to exploit linguistic patterns rather than visual context. Our results show that a VLM fine-tuned on such data can achieve accuracy comparable to human-validated benchmarks even without visual input. Our proposed method reduces blind accuracy from +66.9% above random to +2.9%, eliminating the vast majority of exploitable textual shortcuts. By decoupling the correct answer from linguistic artifacts and employing a curriculum learning strategy, we force the model to rely on visual grounding, ensuring that performance accurately reflects perceptual understanding.

联邦学习|隐私保护|加密(1篇)

【1】FedZMG: Efficient Client-Side Optimization in Federated Learning
标题:FedZMG:联邦学习中的高效客户端优化
链接:https://arxiv.org/abs/2602.18384

作者:Fotios Zantalis,Evangelos Zervas,Grigorios Koulouras
摘要:联邦学习(FL)支持在边缘设备上进行分布式模型训练,同时保护数据隐私。然而,客户端往往具有非独立同分布(非IID)的数据,这通常会导致客户端漂移,从而降低收敛速度和模型性能。虽然已经提出了自适应优化器来减轻这些影响,但它们经常引入不适合资源受限物联网环境的计算复杂性或通信开销。本文介绍了Federated Zero Mean Gradients(FedZMG),这是一种新颖的、无参数的客户端优化算法,旨在通过对优化空间进行结构化正则化来解决客户端漂移问题。FedZMG推进了梯度中心化(Gradient Centralization)的思想,将局部梯度投影到零均值超平面上,有效地中和了异构数据分布中固有的"强度"或"偏差"偏移,而无需额外的通信或超参数调整。理论分析证明,与标准FedAvg相比,FedZMG降低了有效梯度方差,并保证了更紧的收敛界。对EMNIST、CIFAR100和Shakespeare数据集的广泛实证评估表明,与基线FedAvg和自适应优化器FedAdam相比,FedZMG实现了更好的收敛速度和最终验证准确率,特别是在高度非IID的设置中。
摘要:Federated Learning (FL) enables distributed model training on edge devices while preserving data privacy. However, clients tend to have non-Independent and Identically Distributed (non-IID) data, which often leads to client-drift, and therefore diminishing convergence speed and model performance. While adaptive optimizers have been proposed to mitigate these effects, they frequently introduce computational complexity or communication overhead unsuitable for resource-constrained IoT environments. This paper introduces Federated Zero Mean Gradients (FedZMG), a novel, parameter-free, client-side optimization algorithm designed to tackle client-drift by structurally regularizing the optimization space. Advancing the idea of Gradient Centralization, FedZMG projects local gradients onto a zero-mean hyperplane, effectively neutralizing the "intensity" or "bias" shifts inherent in heterogeneous data distributions without requiring additional communication or hyperparameter tuning. A theoretical analysis is provided, proving that FedZMG reduces the effective gradient variance and guarantees tighter convergence bounds compared to standard FedAvg. Extensive empirical evaluations on EMNIST, CIFAR100, and Shakespeare datasets demonstrate that FedZMG achieves better convergence speed and final validation accuracy compared to the baseline FedAvg and the adaptive optimizer FedAdam, particularly in highly non-IID settings.
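摘要所述的零均值投影本身只需一步;下面的草图展示了把梯度投影到零均值超平面的基本操作(按张量还是按层应用等细节为假设,以论文为准):

```python
import numpy as np

def zero_mean_project(grad):
    """将梯度投影到零均值超平面:g <- g - mean(g)。
    投影后各分量之和为零,从而消除摘要中所述的"偏差"漂移。"""
    g = np.asarray(grad, dtype=float)
    return g - g.mean()
```

投影只做一个加性平移,保持梯度内部的相对方向信息,因此无需额外通信或超参数。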

推理|分析|理解|解释(4篇)

【1】RAT+: Train Dense, Infer Sparse -- Recurrence Augmented Attention for Dilated Inference
标题:RAT+:密集训练,稀疏推理--用于扩张推理的循环增强注意力
链接:https://arxiv.org/abs/2602.18196

作者:Xiuying Wei,Caglar Gulcehre
摘要:结构化扩张注意力提供了一个诱人的推理时效率旋钮:它在保持长程连接的同时,将注意力的FLOPs和KV缓存大小缩减为原来的1/D(D为扩张大小)。然而,我们发现其存在一种持续的失败模式:将预训练的注意力模型稀疏化为扩张模式会导致严重的准确率下降。我们引入了RAT+,这是一种密集预训练架构,通过全序列递归和主动递归学习来增强注意力。单个RAT+模型只需密集预训练一次,随后即可在推理时灵活地切换到扩张注意力(可选地带有局部窗口)或层/头混合组合,仅需一次简短的1B令牌适应,而无需重新训练单独的稀疏模型。在100B令牌上训练的1.5B参数规模下,RAT+在常识推理和LongBench任务上,于扩张大小为16时与密集注意力的准确率非常接近,于扩张大小为64时仅下降约2-3个百分点。此外,当稀疏化为top-k块注意力时,RAT+优于标准注意力。我们进一步扩展到2.6B参数和200B令牌,并观察到相同的趋势。
摘要:Structured dilated attention has an appealing inference-time efficiency knob: it reduces the FLOPs of the attention and the KV cache size by a factor of the dilation size D, while preserving long-range connectivity. However, we find a persistent failure mode of them -- sparsifying a pretrained attention model to a dilated pattern leads to severe accuracy degradation. We introduce RAT+, a dense-pretraining architecture that augments attention with full-sequence recurrence and active recurrence learning. A single RAT+ model is pretrained densely once, then flexibly switched at inference time to dilated attention (optionally with local windows) or hybrid layer/head compositions, requiring only a short 1B-token resolution adaptation rather than retraining separate sparse models. At 1.5B parameters trained on 100B tokens, RAT+ closely matches dense accuracy at 16 and drops by about 2-3 points at 64 on commonsense reasoning and LongBench tasks, respectively. Moreover, RAT+ outperforms attention when sparsifying to the top-k block attention. We further scale to 2.6B parameters and 200B tokens and observe the same trend.
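扩张注意力之所以能把FLOPs和KV缓存缩减为1/D,是因为每个查询只关注每隔D个位置取一个的键。下面是索引选择的最小示意(局部窗口为可选项;具体的扩张模式以论文为准,此处仅作说明):

```python
def dilated_indices(t, dilation, local_window=0):
    """位置 t 处的查询在扩张大小 D 下关注的键位置:
    与 t 同余(mod D)的所有过去位置,可选地再加一个密集局部窗口。"""
    idx = set(range(t % dilation, t + 1, dilation))           # 每隔 D 个取一个
    idx.update(range(max(0, t - local_window + 1), t + 1))    # 可选的局部窗口
    return sorted(idx)
```

例如 t=10、D=4 时只关注位置 2、6、10,被关注的键数约为稠密注意力的 1/D。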

【2】Unifying Formal Explanations: A Complexity-Theoretic Perspective
标题:统一形式解释:复杂性理论的视角
链接:https://arxiv.org/abs/2602.18160

作者:Shahaf Bassan,Xuanxiang Huang,Guy Katz
备注:To appear in ICLR 2026
摘要:以前的工作已经探索了为ML模型预测导出两种基本类型解释的计算复杂性:(1)*充分原因*,即输入特征的子集,当其被固定时可决定预测;以及(2)*对比原因*,即输入特征的子集,当其被修改时可改变预测。先前的研究已经在不同的背景下考察过这些解释,例如非概率与概率框架以及局部与全局设置。在这项研究中,我们引入了一个分析这些解释的统一框架,表明它们都可以通过一个统一的概率值函数的最小化来刻画。然后,我们证明了这些计算的复杂性受值函数三个关键性质的影响:(1)*单调性*,(2)*次模性*和(3)*超模性*,这是*组合优化*中的三个基本性质。我们的研究结果揭示了所考察的解释设置中这些性质的一些违反直觉的结果。例如,虽然*局部*值函数完全不具备单调性或次模性/超模性,但我们证明了*全局*值函数确实具有这些性质。这种区别使我们能够证明一系列新颖的多项式时间结果,用于在全局可解释性设置中计算各种带有可证明保证的解释,涵盖一系列横跨可解释性谱系的ML模型,例如神经网络、决策树和树集成。相比之下,我们表明,即使是这些解释的高度简化版本,在相应的局部可解释性设置中也是NP难计算的。
摘要 :Previous work has explored the computational complexity of deriving two fundamental types of explanations for ML model predictions: (1) *sufficient reasons*, which are subsets of input features that, when fixed, determine a prediction, and (2) *contrastive reasons*, which are subsets of input features that, when modified, alter a prediction. Prior studies have examined these explanations in different contexts, such as non-probabilistic versus probabilistic frameworks and local versus global settings. In this study, we introduce a unified framework for analyzing these explanations, demonstrating that they can all be characterized through the minimization of a unified probabilistic value function. We then prove that the complexity of these computations is influenced by three key properties of the value function: (1) *monotonicity*, (2) *submodularity*, and (3) *supermodularity* - which are three fundamental properties in *combinatorial optimization*. Our findings uncover some counterintuitive results regarding the nature of these properties within the explanation settings examined. For instance, although the *local* value functions do not exhibit monotonicity or submodularity/supermodularity whatsoever, we demonstrate that the *global* value functions do possess these properties. This distinction enables us to prove a series of novel polynomial-time results for computing various explanations with provable guarantees in the global explainability setting, across a range of ML models that span the interpretability spectrum, such as neural networks, decision trees, and tree ensembles. In contrast, we show that even highly simplified versions of these explanations become NP-hard to compute in the corresponding local explainability setting.
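以摘要中的*充分原因*为例,其定义可以用一个暴力检查来说明(仅为示意,只对小规模离散模型可行;函数名与接口均为假设):

```python
from itertools import product

def is_sufficient_reason(model, x, subset, domains):
    """暴力检查:将 subset 中的特征固定为 x 的取值后,
    其余特征的任意取值是否都不改变 model(x) 的预测。"""
    target = model(x)
    free = [i for i in range(len(x)) if i not in subset]
    for values in product(*[domains[i] for i in free]):
        z = list(x)
        for i, v in zip(free, values):
            z[i] = v
        if model(z) != target:
            return False
    return True
```

例如对 model(z) = z[0] and z[1] 与输入 [1, 1, 0],子集 {0, 1} 是充分原因,而 {0} 不是;摘要讨论的正是这类判定在各种设置下的计算复杂性。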

【3】Turbo Connection: Reasoning as Information Flow from Higher to Lower Layers
标题:涡轮连接:作为从高层到下层的信息流进行推理
链接:https://arxiv.org/abs/2602.17993

作者:Mohan Tang,Sidi Lu
摘要:复杂的问题,无论是数学、逻辑还是规划,都是由人类通过一系列步骤来解决的,其中每一步的结果会为下一步提供信息。在这项工作中,我们采用这样的观点:Transformer的推理能力从根本上受限于任意潜在计算路径上固定的最大步骤数。为了解决这个问题,我们提出了涡轮连接(TurboConn),这是一种新颖的架构,它通过将每个令牌$t$的高层隐藏状态经由多条残差连接路由到令牌$t+1$的较低层,克服了固定深度约束。使用我们的方法微调预训练LLM,不仅在GSM8K、Parity和多步算术等基准上获得了0.9%到超过10%的准确率提升,还表明这些向后连接的密度至关重要:我们的密集交互显著优于仅传递单个隐藏状态或向量的"稀疏"替代方案。值得注意的是,TurboConn可以集成到预训练LLM中以突破特定任务的瓶颈:经过微调的Qwen-3-1.7B在Parity上仅达到53.78%,而加入我们的架构修改后,模型能够达到100%的准确率,且无需从头重新训练完整模型或复杂的课程学习。我们的结果提供了强有力的经验证据,表明计算路径的深度是推理能力的关键因素,同时提供了一种在不显著影响生成延迟的情况下增强LLM的新机制。
摘要:Complex problems, whether in math, logic, or planning, are solved by humans through a sequence of steps where the result of one step informs the next. In this work, we adopt the perspective that the reasoning power of Transformers is fundamentally limited by a fixed maximum number of steps along any latent path of computation. To address this, we introduce Turbo Connection (TurboConn), a novel architecture that overcomes the fixed-depth constraint by routing multiple residual connections from the higher-layer hidden states of each token $t$ to the lower layers of token $t+1$. Fine-tuning pre-trained LLMs with our method not only yields accuracy gains of 0.9% to over 10% on benchmarks like GSM8K, Parity, and multi-step arithmetic, but also demonstrates that the density of these backward connections is critical; our dense interaction significantly outperforms "sparse" alternatives that only pass a single hidden state or vector. Notably, TurboConn can be integrated into pre-trained LLMs to overcome task-specific plateaus: while a fine-tuned Qwen-3-1.7B achieves only 53.78% on Parity, adding our architectural modification enables the model to reach 100% accuracy, all without the necessity to retrain the full model from scratch or sophisticated curriculum learning. Our results provide strong empirical evidence that the depth of the computational path is a key factor in reasoning ability, also offering a new mechanism to enhance LLMs without significantly affecting generation latency.

【4】Understanding the Generalization of Bilevel Programming in Hyperparameter Optimization: A Tale of Bias-Variance Decomposition
标题:理解超参数优化中二层规划的推广:偏差方差分解的故事
链接:https://arxiv.org/abs/2602.17947

作者:Yubo Zhou,Jun Shu,Junmin Liu,Deyu Meng
摘要:基于梯度的超参数优化(HPO)最近兴起,其利用双层规划技术,通过估计关于验证损失的超梯度来优化超参数。然而,先前的理论工作主要集中在缩小估计与真实值之间的差距(即偏差),而忽略了由数据分布引起的误差(即方差),这会降低性能。为了解决这个问题,我们对超梯度估计误差进行了偏差-方差分解,并对先前工作所忽略的方差项提供了补充性的详细分析。我们还对超梯度估计的误差界给出了全面分析。这有助于简洁地解释实践中经常观察到的一些现象,例如对验证集的过拟合。受所推导理论的启发,我们提出了一种集成超梯度策略,以有效降低HPO算法中的方差。在正则化超参数学习、数据超清洗和小样本学习等任务上的实验结果表明,我们的方差缩减策略改进了超梯度估计。为了解释性能的提升,我们在超额误差和超梯度估计之间建立了联系,从而为经验观察提供了一些理解。
摘要:Gradient-based hyperparameter optimization (HPO) have emerged recently, leveraging bilevel programming techniques to optimize hyperparameter by estimating hypergradient w.r.t. validation loss. Nevertheless, previous theoretical works mainly focus on reducing the gap between the estimation and ground-truth (i.e., the bias), while ignoring the error due to data distribution (i.e., the variance), which degrades performance. To address this issue, we conduct a bias-variance decomposition for hypergradient estimation error and provide a supplemental detailed analysis of the variance term ignored by previous works. We also present a comprehensive analysis of the error bounds for hypergradient estimation. This facilitates an easy explanation of some phenomena commonly observed in practice, like overfitting to the validation set. Inspired by the derived theories, we propose an ensemble hypergradient strategy to reduce the variance in HPO algorithms effectively. Experimental results on tasks including regularization hyperparameter learning, data hyper-cleaning, and few-shot learning demonstrate that our variance reduction strategy improves hypergradient estimation. To explain the improved performance, we establish a connection between excess error and hypergradient estimation, offering some understanding of empirical observations.
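集成超梯度策略的核心是方差缩减:对k个独立估计取平均,偏差不变而方差约缩小为1/k。下面用一个数值示意说明这一点(具体的集成构造与重采样方式以论文为准,此处仅为假设性演示):

```python
import numpy as np

def ensemble_hypergradient(hypergrads):
    """对来自多次验证集重采样的超梯度估计取平均。"""
    return np.mean(np.stack(hypergrads), axis=0)

# 数值示意:5 个独立高斯估计的平均值,其方差约为单个估计的 1/5
rng = np.random.default_rng(0)
single = rng.normal(0.0, 1.0, size=10000)
averaged = rng.normal(0.0, 1.0, size=(10000, 5)).mean(axis=1)
```

这正是摘要中"方差缩减策略改进超梯度估计"一句背后的基本统计直觉。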

分类|识别(4篇)

【1】Latent Equivariant Operators for Robust Object Recognition: Promise and Challenges
标题:鲁棒对象识别的潜在等变运算符:前景与挑战
链接:https://arxiv.org/abs/2602.18406

作者:Minh Dinh,Stéphane Deny
摘要:尽管深度学习在计算机视觉中取得了成功,但在识别经历了训练期间很少见到的群对称变换的对象方面仍然存在困难,例如以不寻常的姿态、尺度、位置或其组合出现的对象。等变神经网络是解决跨对称变换泛化问题的一种方案,但需要先验的变换知识。另一类架构则提出从对称变换的样例中学习潜在空间中的等变算子。在这里,我们使用经过旋转和平移的含噪MNIST简单数据集,说明了如何成功利用这类架构进行分布外分类,从而克服传统网络和等变网络二者的局限。虽然在概念上颇具吸引力,但我们也讨论了将这些架构扩展到更复杂数据集的道路上所面临的挑战。
摘要:Despite the successes of deep learning in computer vision, difficulties persist in recognizing objects that have undergone group-symmetric transformations rarely seen during training, for example objects seen in unusual poses, scales, positions, or combinations thereof. Equivariant neural networks are a solution to the problem of generalizing across symmetric transformations, but require knowledge of transformations a priori. An alternative family of architectures proposes to learn equivariant operators in a latent space from examples of symmetric transformations. Here, using simple datasets of rotated and translated noisy MNIST, we illustrate how such architectures can successfully be harnessed for out-of-distribution classification, thus overcoming the limitations of both traditional and equivariant networks. While conceptually enticing, we discuss challenges ahead on the path of scaling these architectures to more complex datasets.

【2】LERD: Latent Event-Relational Dynamics for Neurodegenerative Classification
标题:LERD:神经退行性分类的潜在事件关系动力学
链接:https://arxiv.org/abs/2602.18195

作者:Hairong Chen,Yicheng Feng,Ziyu Jia,Samir Bhatt,Hengguan Huang
摘要:阿尔茨海默病(AD)会改变大脑电生理并扰乱多通道脑电(EEG)动力学,使得准确且具有临床价值的基于EEG的诊断在筛查和疾病监测中变得日益重要。然而,许多现有方法依赖黑盒分类器,并没有显式地对产生观测信号的潜在动力学进行建模。为了解决这些局限,我们提出了LERD,一个端到端的贝叶斯电生理神经动力学系统,它无需事件或交互注释,即可直接从多通道EEG推断潜在神经事件及其关系结构。LERD将连续时间事件推断模块与随机事件生成过程相结合,以捕获灵活的时间模式,同时以原则性的方式结合受电生理学启发的动力学先验来指导学习。我们进一步提供了理论分析,为训练给出可处理的界,并为所推断的关系动力学提供稳定性保证。在合成基准和两个真实世界AD EEG队列上的广泛实验表明,LERD始终优于强基线,并产生与生理学对齐的潜在摘要,有助于刻画组间动力学差异。
摘要 :Alzheimer's disease (AD) alters brain electrophysiology and disrupts multichannel EEG dynamics, making accurate and clinically useful EEG-based diagnosis increasingly important for screening and disease monitoring. However, many existing approaches rely on black-box classifiers and do not explicitly model the underlying dynamics that generate observed signals. To address these limitations, we propose LERD, an end-to-end Bayesian electrophysiological neural dynamical system that infers latent neural events and their relational structure directly from multichannel EEG without event or interaction annotations. LERD combines a continuous-time event inference module with a stochastic event-generation process to capture flexible temporal patterns, while incorporating an electrophysiology-inspired dynamical prior to guide learning in a principled way. We further provide theoretical analysis that yields a tractable bound for training and stability guarantees for the inferred relational dynamics. Extensive experiments on synthetic benchmarks and two real-world AD EEG cohorts demonstrate that LERD consistently outperforms strong baselines and yields physiology-aligned latent summaries that help characterize group-level dynamical differences.

【3】Quantum-enhanced satellite image classification
标题:量子增强卫星图像分类
链接:https://arxiv.org/abs/2602.18350

作者:Qi Zhang,Anton Simen,Carlos Flores-Garrigós,Gabriel Alvarado Barrios,Paolo A. Erdman,Enrique Solano,Aaron C. Kemp,Vincent Beltrani,Vedangi Pathak,Hamed Mohammadbagherpoor
摘要:我们展示了应用量子特征提取方法来增强空间应用的多类图像分类。通过利用多体自旋哈密顿的动力学,该方法生成表达量子特征,当与经典处理相结合时,导致量子增强的分类准确性。使用强大且完善的ResNet 50基线,我们实现了83%的最大经典准确率,通过迁移学习方法可以提高到84%。相比之下,应用我们的量子经典方法,性能提高到87%的准确度,证明了稳健的经典方法的清晰和可重复的改进。在IBM的几个量子处理器上实现,我们的混合量子经典方法在绝对精度上提供了2-3%的一致增益。这些结果突出了当前和近期量子处理器在卫星成像和遥感等高风险数据驱动领域的实际潜力,同时表明在现实世界的机器学习任务中具有更广泛的适用性。
摘要:We demonstrate the application of a quantum feature extraction method to enhance multi-class image classification for space applications. By harnessing the dynamics of many-body spin Hamiltonians, the method generates expressive quantum features that, when combined with classical processing, lead to quantum-enhanced classification accuracy. Using a strong and well-established ResNet50 baseline, we achieved a maximum classical accuracy of 83%, which can be improved to 84% with a transfer learning approach. In contrast, applying our quantum-classical method the performance is increased to 87% accuracy, demonstrating a clear and reproducible improvement over robust classical approaches. Implemented on several of IBM's quantum processors, our hybrid quantum-classical approach delivers consistent gains of 2-3% in absolute accuracy. These results highlight the practical potential of current and near-term quantum processors in high-stakes, data-driven domains such as satellite imaging and remote sensing, while suggesting broader applicability in real-world machine learning tasks.

【4】Box Thirding: Anytime Best Arm Identification under Insufficient Sampling
标题:Box Thirding(B3):采样不足下的任意时刻最佳臂识别
链接:https://arxiv.org/abs/2602.18186

作者:Seohwa Hwang,Junyong Park
备注:29 pages, 5 figures
摘要:我们引入了Box Thirding(B3),这是一种在固定预算约束下进行最佳臂识别(BAI)的灵活而高效的算法。它既适用于任意时刻(anytime)BAI,也适用于臂数N很大、在有限预算T内无法穷尽评估所有臂的场景。该算法采用迭代三元比较:在每次迭代中比较三个臂,表现最好的臂被进一步探索,居中的臂被推迟到未来比较,最弱的臂被丢弃。即使没有关于T的先验知识,B3也能达到与连续减半(SH)相当的ε-最佳臂错误识别概率,而SH需要将T作为预定义参数,并只应用于预算允许的c0个臂的随机子集。实证结果表明,在有限预算约束下,就简单遗憾而言,B3优于现有方法,这在纽约客卡通字幕大赛数据集上得到了验证。
摘要:We introduce Box Thirding (B3), a flexible and efficient algorithm for Best Arm Identification (BAI) under fixed-budget constraints. It is designed for both anytime BAI and scenarios with large N, where the number of arms is too large for exhaustive evaluation within a limited budget T. The algorithm employs an iterative ternary comparison: in each iteration, three arms are compared--the best-performing arm is explored further, the median is deferred for future comparisons, and the weakest is discarded. Even without prior knowledge of T, B3 achieves an epsilon-best arm misidentification probability comparable to Successive Halving (SH), which requires T as a predefined parameter, applied to a randomly selected subset of c0 arms that fit within the budget. Empirical results show that B3 outperforms existing methods under limited-budget constraints in terms of simple regret, as demonstrated on the New Yorker Cartoon Caption Contest dataset.
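B3的三元比较步骤可示意如下(仅展示摘要所述的选择逻辑;完整算法还包含采样与预算分配,此处从略,接口为假设):

```python
def ternary_step(arms):
    """对三个 (名称, 经验均值) 对做一次 B3 式比较:
    按均值从高到低返回 (继续探索, 推迟比较, 丢弃) 三个臂。"""
    explore, defer, discard = sorted(arms, key=lambda a: a[1], reverse=True)
    return explore, defer, discard
```

每次迭代只淘汰三分之一的候选臂,被推迟的臂保留在后续比较池中,这正是"Thirding"一名的由来。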

表征(1篇)

【1】Neural Prior Estimation: Learning Class Priors from Latent Representations
标题:神经先验估计:从潜在表示中学习类先验
链接:https://arxiv.org/abs/2602.17853

作者:Masoud Yavari,Payman Moallem
摘要:类别不平衡通过施加倾斜的有效类先验,在深度神经网络中引起系统性偏差。这项工作介绍了神经先验估计器(NPE),一个从潜在表示中学习基于特征条件的对数先验估计的框架。NPE采用一个或多个先验估计模块,通过单向逻辑斯蒂(logistic)损失与骨干网络联合训练。在神经坍缩(Neural Collapse)机制下,分析表明NPE可在至多相差一个加性常数的意义下恢复类对数先验,从而提供一个有理论依据的自适应信号,而不需要显式的类别计数或特定于分布的超参数。学到的估计被纳入logit调整,形成NPE-LA,一种偏差感知预测的原则性机制。在长尾CIFAR和不平衡语义分割基准(STARE、ADE20K)上的实验表明了一致的改进,特别是对于代表性不足的类别。因此,NPE为学习式先验估计和不平衡感知预测提供了一种轻量且有理论依据的方法。
摘要:Class imbalance induces systematic bias in deep neural networks by imposing a skewed effective class prior. This work introduces the Neural Prior Estimator (NPE), a framework that learns feature-conditioned log-prior estimates from latent representations. NPE employs one or more Prior Estimation Modules trained jointly with the backbone via a one-way logistic loss. Under the Neural Collapse regime, NPE is analytically shown to recover the class log-prior up to an additive constant, providing a theoretically grounded adaptive signal without requiring explicit class counts or distribution-specific hyperparameters. The learned estimate is incorporated into logit adjustment, forming NPE-LA, a principled mechanism for bias-aware prediction. Experiments on long-tailed CIFAR and imbalanced semantic segmentation benchmarks (STARE, ADE20K) demonstrate consistent improvements, particularly for underrepresented classes. NPE thus offers a lightweight and theoretically justified approach to learned prior estimation and imbalance-aware prediction.
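logit调整的基本形式是在预测时从logit中减去(估计的)类对数先验;下面是一个最小示意(τ为调整强度;NPE-LA中的先验由网络学习得到而非简单计数,此处仅演示调整本身):

```python
import numpy as np

def logit_adjust(logits, log_prior, tau=1.0):
    """通过减去类对数先验,去除类别不平衡带来的预测偏差。"""
    return np.asarray(logits, dtype=float) - tau * np.asarray(log_prior, dtype=float)
```

例如两个类的logit相同、但类1是少数类(对数先验更小)时,调整会使预测翻转到少数类。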

优化|敛散性(9篇)

【1】Asynchronous Heavy-Tailed Optimization
标题:异步重尾优化
链接:https://arxiv.org/abs/2602.18002

作者:Junfei Sun,Dixi Yao,Xuchen Gong,Tahseen Rabbani,Manzil Zaheer,Tian Li
备注:8-page main body, 25-page appendix, 5 figures
摘要:在Transformer模型中常见的重尾随机梯度噪声会破坏优化过程的稳定性。最近的工作主要集中在集中式或分布式同步设置下解决重尾噪声的方法的开发与理解上,而这类噪声与异步优化之间的相互作用仍未得到充分探讨。在这项工作中,我们研究了两种通信方案,用于在存在重尾梯度噪声的情况下通过异步更新处理掉队节点。我们提出并从理论上分析了基于延迟感知学习率调度和延迟补偿的算法改进,以提升异步算法的性能。在重尾噪声下,我们的收敛保证与同步方法的速率相匹配,并且与现有异步方法相比提高了延迟容忍度。在经验上,我们的方法在准确率/运行时间权衡方面优于先前的同步和异步方法,并且在图像和语言任务中对超参数更加鲁棒。
摘要:Heavy-tailed stochastic gradient noise, commonly observed in transformer models, can destabilize the optimization process. Recent works mainly focus on developing and understanding approaches to address heavy-tailed noise in the centralized or distributed, synchronous setting, leaving the interactions between such noise and asynchronous optimization underexplored. In this work, we investigate two communication schemes that handle stragglers with asynchronous updates in the presence of heavy-tailed gradient noise. We propose and theoretically analyze algorithmic modifications based on delay-aware learning rate scheduling and delay compensation to enhance the performance of asynchronous algorithms. Our convergence guarantees under heavy-tailed noise match the rate of the synchronous counterparts and improve delay tolerance compared with existing asynchronous approaches. Empirically, our approaches outperform prior synchronous and asynchronous methods in terms of accuracy/runtime trade-offs and are more robust to hyperparameters in both image and language tasks.
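摘要提到的延迟感知学习率调度未给出具体形式;异步优化中一个常见且简单的选择是按梯度的陈旧度衰减步长。下面的1/(1+delay)形式仅作说明,并非论文的确切调度:

```python
def delay_aware_lr(base_lr, delay):
    """按异步更新的陈旧度(delay)缩小学习率:越陈旧的梯度,步长越小。"""
    return base_lr / (1.0 + delay)
```

这样,最新的更新(delay=0)使用完整学习率,而严重滞后的更新被自动降权。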

【2】Learning Optimal and Sample-Efficient Decision Policies with Guarantees
标题:学习具有保证的最佳且样本高效的决策策略
链接:https://arxiv.org/abs/2602.17978

作者:Daqian Shao
备注:A thesis submitted for the degree of DPhil in Computer Science at Oxford
摘要 :决策的范式已经被强化学习和深度学习彻底改变。尽管这已经在机器人、医疗保健和金融等领域取得了重大进展,但在实践中使用RL仍具有挑战性,特别是在可能需要保证的高风险应用中学习决策策略时。传统的RL算法依赖于与环境的大量在线交互,这在在线交互成本高、危险或不可行的情况下是有问题的。然而,从离线数据集学习受到隐藏混杂因素的阻碍。这种混杂因素可能会导致数据集中的虚假相关性,并可能误导代理采取次优或对抗性行动。首先,我们解决了在存在隐藏混杂因素的情况下从离线数据集学习的问题。我们使用工具变量(IV)来识别因果效应,这是条件矩限制(CMR)问题的一个实例。受双重/去偏机器学习的启发,我们推导出一个样本有效的算法来解决CMR问题,并具有收敛性和最优性保证,其性能优于最先进的算法。其次,我们放宽条件的隐藏混杂因素的设置(离线)模仿学习,并适应我们的CMR估计,以获得一个算法,可以学习有效的模仿者的政策,收敛速度的保证。最后,我们考虑了学习线性时序逻辑(LTL)表示的高层次目标的问题,并开发了一个可证明的最佳学习算法,提高了现有方法的样本效率。通过对强化学习基准和合成和半合成数据集的评估,我们证明了本文开发的方法在现实世界决策中的实用性。
摘要:The paradigm of decision-making has been revolutionised by reinforcement learning and deep learning. Although this has led to significant progress in domains such as robotics, healthcare, and finance, the use of RL in practice is challenging, particularly when learning decision policies in high-stakes applications that may require guarantees. Traditional RL algorithms rely on a large number of online interactions with the environment, which is problematic in scenarios where online interactions are costly, dangerous, or infeasible. However, learning from offline datasets is hindered by the presence of hidden confounders. Such confounders can cause spurious correlations in the dataset and can mislead the agent into taking suboptimal or adversarial actions. Firstly, we address the problem of learning from offline datasets in the presence of hidden confounders. We work with instrumental variables (IVs) to identify the causal effect, which is an instance of a conditional moment restrictions (CMR) problem. Inspired by double/debiased machine learning, we derive a sample-efficient algorithm for solving CMR problems with convergence and optimality guarantees, which outperforms state-of-the-art algorithms. Secondly, we relax the conditions on the hidden confounders in the setting of (offline) imitation learning, and adapt our CMR estimator to derive an algorithm that can learn effective imitator policies with convergence rate guarantees. Finally, we consider the problem of learning high-level objectives expressed in linear temporal logic (LTL) and develop a provably optimal learning algorithm that improves sample efficiency over existing methods. Through evaluation on reinforcement learning benchmarks and synthetic and semi-synthetic datasets, we demonstrate the usefulness of the methods developed in this thesis in real-world decision making.

【3】Breaking the Correlation Plateau: On the Optimization and Capacity Limits of Attention-Based Regressors
标题:打破相关性平台期:基于注意力的回归器的优化与容量极限
链接:https://arxiv.org/abs/2602.17898

作者:Jingquan Yan,Yuwei Miao,Peiran Yu,Junzhou Huang
备注:Accepted by ICLR 2026
摘要:基于注意力的回归模型通常通过联合优化均方误差(MSE)损失和皮尔逊相关系数(PCC)损失来训练,二者分别强调误差的大小与目标的顺序或形状。训练过程中一个常见但知之甚少的现象是PCC平台期:即使MSE持续下降,PCC在训练早期便停止改善。我们对这一行为给出了首个严格的理论分析,揭示了优化动力学和模型容量两方面的基本限制。首先,针对趋于平坦的PCC曲线,我们发现了一个关键冲突:降低MSE(幅度匹配)反而可能抑制PCC梯度(形状匹配)。softmax注意机制会加剧这个问题,尤其当待聚合的数据高度同质时。其次,我们指出了模型容量上的限制:我们推导出任何凸聚合器(包括softmax注意力)的PCC改进极限,表明输入的凸包严格限制了可实现的PCC增益。我们证明,数据的同质性会加剧这两个限制。受这些见解的启发,我们提出了外推相关注意力(ECA),它结合了新颖的、有理论动机的机制,以改善PCC优化并外推到凸包之外。在各种基准测试中,包括具有挑战性的同质数据设置,ECA始终能打破PCC平台期,在不影响MSE性能的情况下实现相关性的显著改善。
摘要:Attention-based regression models are often trained by jointly optimizing Mean Squared Error (MSE) loss and Pearson correlation coefficient (PCC) loss, emphasizing the magnitude of errors and the order or shape of targets, respectively. A common but poorly understood phenomenon during training is the PCC plateau: PCC stops improving early in training, even as MSE continues to decrease. We provide the first rigorous theoretical analysis of this behavior, revealing fundamental limitations in both optimization dynamics and model capacity. First, in regard to the flattened PCC curve, we uncover a critical conflict where lowering MSE (magnitude matching) can paradoxically suppress the PCC gradient (shape matching). This issue is exacerbated by the softmax attention mechanism, particularly when the data to be aggregated is highly homogeneous. Second, we identify a limitation in the model capacity: we derived a PCC improvement limit for any convex aggregator (including the softmax attention), showing that the convex hull of the inputs strictly bounds the achievable PCC gain. We demonstrate that data homogeneity intensifies both limitations. Motivated by these insights, we propose the Extrapolative Correlation Attention (ECA), which incorporates novel, theoretically-motivated mechanisms to improve the PCC optimization and extrapolate beyond the convex hull. Across diverse benchmarks, including challenging homogeneous data setting, ECA consistently breaks the PCC plateau, achieving significant improvements in correlation without compromising MSE performance.
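下面用一个极小的数值示意(假设性示例,非论文代码)说明摘要中"幅度匹配"(MSE)与"形状匹配"(PCC)可以解耦:形状完全正确的预测可能MSE很大,而MSE很小的预测其PCC反而下降。

```python
import numpy as np

def mse(pred, y):
    """均方误差:衡量幅度匹配。"""
    return float(np.mean((pred - y) ** 2))

def pcc(pred, y):
    """皮尔逊相关系数:衡量顺序/形状匹配。"""
    p, t = pred - pred.mean(), y - y.mean()
    return float(p @ t / (np.linalg.norm(p) * np.linalg.norm(t)))

y = np.array([1.0, 2.0, 3.0, 4.0])
shape_only = 0.1 * y                          # 形状完全正确但幅度错误
noisy = y + np.array([0.5, -0.5, 0.5, -0.5])  # 幅度接近但形状受扰

print(pcc(shape_only, y), mse(shape_only, y))  # PCC=1.0,MSE 却很大
print(pcc(noisy, y), mse(noisy, y))            # MSE 仅 0.25,PCC 却下降
```

这也直观解释了为何单独压低MSE并不保证PCC改善。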

【4】MePoly: Max Entropy Polynomial Policy Optimization
标题:MePoly:最大熵多项式策略优化
链接:https://arxiv.org/abs/2602.17832

作者:Hang Liu,Sangli Teng,Maani Ghaffari
摘要:随机最优控制为解决复杂的决策问题提供了一个统一的数学框架,涵盖最大熵强化学习(RL)和模仿学习(IL)等范式。然而,传统的参数化策略往往难以表示解的多模态性。虽然基于扩散的策略旨在恢复多模态性,但它们缺乏显式的概率密度,这使策略梯度优化变得复杂。为了弥合这一差距,我们提出了MePoly,一种基于多项式能量模型的新型策略参数化。MePoly提供了显式且易于处理的概率密度,从而能够精确地最大化熵。在理论上,我们将该方法建立在经典矩问题的基础上,利用其对任意分布的通用逼近能力。在实验上,我们证明了MePoly能有效捕获复杂的非凸流形,并在多个基准上性能优于基线方法。
摘要:Stochastic Optimal Control provides a unified mathematical framework for solving complex decision-making problems, encompassing paradigms such as maximum entropy reinforcement learning(RL) and imitation learning(IL). However, conventional parametric policies often struggle to represent the multi-modality of the solutions. Though diffusion-based policies are aimed at recovering the multi-modality, they lack an explicit probability density, which complicates policy-gradient optimization. To bridge this gap, we propose MePoly, a novel policy parameterization based on polynomial energy-based models. MePoly provides an explicit, tractable probability density, enabling exact entropy maximization. Theoretically, we ground our method in the classical moment problem, leveraging the universal approximation capabilities for arbitrary distributions. Empirically, we demonstrate that MePoly effectively captures complex non-convex manifolds and outperforms baselines in performance across diverse benchmarks.
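作为摘要中"显式、易处理的概率密度"的最小示意(假设性的一维构造,非论文实现),多项式能量模型 $p(x)\propto\exp(-E(x))$ 的配分函数与熵都可以直接数值计算,这正是相对于扩散策略的优势所在:

```python
import numpy as np

# 假设性示例:E(x) = x^4 - 2x^2 + 0.5,一个双峰能量景观
x = np.linspace(-3.0, 3.0, 4001)
dx = x[1] - x[0]
E = np.polyval([1.0, 0.0, -2.0, 0.0, 0.5], x)  # 系数按最高次在前
unnorm = np.exp(-E)
Z = unnorm.sum() * dx                           # 显式配分函数(数值积分)
p = unnorm / Z                                  # 显式归一化密度
entropy = -np.sum(p * np.log(p)) * dx           # 熵可直接(数值)计算

print(f"Z≈{Z:.3f}, 熵≈{entropy:.3f}")
```

由于密度显式可评估,熵最大化目标可以精确计算,而无需像隐式生成模型那样依赖近似。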

【5】Multi-material Multi-physics Topology Optimization with Physics-informed Gaussian Process Priors
标题:具有物理信息高斯过程先验的多材料多物理场拓扑优化
链接:https://arxiv.org/abs/2602.17783

作者:Xiangyu Sun,Shirin Hosseinmardi,Amin Yousefpour,Ramin Bostanabad
摘要:机器学习(ML)越来越多地用于拓扑优化(TO)。然而,由于高计算成本、谱偏差以及难以处理复杂物理,大多数现有的基于ML的方法都集中在简化的基准问题上。在目标或约束泛函非自伴的多材料、多物理场问题中,这些限制变得更加明显。为了应对这些挑战,我们提出了一个基于物理信息高斯过程(PIGP)的框架。在我们的方法中,主变量、伴随变量和设计变量由相互独立的GP先验表示,其均值函数通过神经网络参数化,而这些网络的架构特别适合PDE解的代理建模。我们通过最小化一个基于目标函数、多物理场势能泛函和设计约束的损失,同时估计模型的所有参数。我们在基准TO问题上展示了所提框架的能力,例如单材料与多材料设置下的柔度最小化、热传导优化和柔性机构设计。此外,我们以单材料与多材料选项下的热-机械TO作为代表性的多物理场问题。我们还引入了可显著加速训练过程的微分和积分方案。结果表明,所提出的PIGP框架可以同时有效地求解耦合的多物理场与设计问题,生成具有清晰界面和物理上可解释材料分布的超分辨率拓扑结构。我们使用开源代码和商业软件包COMSOL验证了这些结果。
摘要:Machine learning (ML) has been increasingly used for topology optimization (TO). However, most existing ML-based approaches focus on simplified benchmark problems due to their high computational cost, spectral bias, and difficulty in handling complex physics. These limitations become more pronounced in multi-material, multi-physics problems whose objective or constraint functions are not self-adjoint. To address these challenges, we propose a framework based on physics-informed Gaussian processes (PIGPs). In our approach, the primary, adjoint, and design variables are represented by independent GP priors whose mean functions are parametrized via neural networks whose architectures are particularly beneficial for surrogate modeling of PDE solutions. We estimate all parameters of our model simultaneously by minimizing a loss that is based on the objective function, multi-physics potential energy functionals, and design-constraints. We demonstrate the capability of the proposed framework on benchmark TO problems such as compliance minimization, heat conduction optimization, and compliant mechanism design under single- and multi-material settings. Additionally, we leverage thermo-mechanical TO with single- and multi-material options as a representative multi-physics problem. We also introduce differentiation and integration schemes that dramatically accelerate the training process. Our results demonstrate that the proposed PIGP framework can effectively solve coupled multi-physics and design problems simultaneously -- generating super-resolution topologies with sharp interfaces and physically interpretable material distributions. We validate these results using open-source codes and the commercial software package COMSOL.

【6】Bayesian Optimality of In-Context Learning with Selective State Spaces
标题:具有选择性状态空间的上下文学习的贝叶斯最优性
链接:https://arxiv.org/abs/2602.17744

作者:Di Zhang,Jiaqi Xing
备注:17 pages
摘要:我们提出贝叶斯最优序列预测作为理解上下文学习(ICL)的新原则。与将Transformer解释为执行隐式梯度下降不同,我们将ICL形式化为针对潜在序列任务的元学习。对于由线性高斯状态空间模型(LG-SSM)支配的任务,我们证明了元训练的选择性SSM渐近地实现贝叶斯最优预测器,收敛到后验预测均值。我们进一步建立了与梯度下降的统计分离:构造了具有时间相关噪声的任务,其中最优贝叶斯预测器严格优于任何经验风险最小化(ERM)估计器。由于Transformer可视为执行隐式ERM,这表明选择性SSM凭借更优的统计效率实现了更低的渐近风险。在合成LG-SSM任务和字符级马尔可夫基准上的实验证实,选择性SSM能更快收敛到贝叶斯最优风险,在结构化噪声设定下随上下文变长表现出更高的样本效率,并且比线性Transformer更鲁棒地跟踪潜在状态。这将ICL从"隐式优化"重新定义为"最优推理",解释了选择性SSM的效率,并为架构设计提供了原则性基础。
摘要:We propose Bayesian optimal sequential prediction as a new principle for understanding in-context learning (ICL). Unlike interpretations framing Transformers as performing implicit gradient descent, we formalize ICL as meta-learning over latent sequence tasks. For tasks governed by Linear Gaussian State Space Models (LG-SSMs), we prove a meta-trained selective SSM asymptotically implements the Bayes-optimal predictor, converging to the posterior predictive mean. We further establish a statistical separation from gradient descent, constructing tasks with temporally correlated noise where the optimal Bayesian predictor strictly outperforms any empirical risk minimization (ERM) estimator. Since Transformers can be seen as performing implicit ERM, this demonstrates selective SSMs achieve lower asymptotic risk due to superior statistical efficiency. Experiments on synthetic LG-SSM tasks and a character-level Markov benchmark confirm selective SSMs converge faster to Bayes-optimal risk, show superior sample efficiency with longer contexts in structured-noise settings, and track latent states more robustly than linear Transformers. This reframes ICL from "implicit optimization" to "optimal inference," explaining the efficiency of selective SSMs and offering a principled basis for architecture design.
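对LG-SSM而言,摘要中的"后验预测均值"正是卡尔曼滤波的一步预测;下面用一个标量情形做最小示意(参数均为虚构,非论文代码):

```python
import numpy as np

# 标量 LG-SSM:z_{t+1} = a z_t + w_t,x_t = z_t + v_t
a, q, r = 0.9, 0.1, 0.2          # 转移系数、过程噪声方差、观测噪声方差
rng = np.random.default_rng(0)
z, xs = 0.0, []
for _ in range(200):
    z = a * z + rng.normal(0, np.sqrt(q))
    xs.append(z + rng.normal(0, np.sqrt(r)))

m, P = 0.0, 1.0                   # 滤波均值与方差的初始值
preds = []
for x in xs:
    m, P = a * m, a * a * P + q   # 预测步
    preds.append(m)               # 后验预测均值 E[x_t | x_{<t}]
    K = P / (P + r)               # 卡尔曼增益
    m, P = m + K * (x - m), (1 - K) * P  # 更新步

mse = np.mean((np.array(preds) - np.array(xs)) ** 2)
print(f"一步预测 MSE ≈ {mse:.3f}")  # 应明显低于"恒预测 0"的朴素基线
```

元训练的选择性SSM所要逼近的,正是这一类递推的贝叶斯最优预测。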

【7】Joint Parameter and State-Space Bayesian Optimization: Using Process Expertise to Accelerate Manufacturing Optimization
标题:联合参数与状态空间贝叶斯优化:利用过程专业知识加速制造优化
链接:https://arxiv.org/abs/2602.17679

作者:Saksham Kiroriwal,Julius Pfrommer,Jürgen Beyerer
备注:This paper is under review and has been submitted for CIRP CMS 2026
摘要:贝叶斯优化(BO)是一种用于优化黑盒制造过程的强大方法,但在处理可观测中间输出的高维多阶段系统时,其性能往往受到限制。标准BO将流程建模为黑盒,忽略了中间观测和底层流程结构。部分可观测高斯过程网络(POGPN)则将流程建模为有向无环图(DAG)。然而,当观测是高维状态空间时间序列时,利用中间观测仍具有挑战性。过程专家知识可用于从高维状态空间数据中提取低维潜在特征。我们提出POGPN-JPSS,一个将POGPN与联合参数和状态空间(JPSS)建模相结合、以利用中间提取信息的框架。我们在一个具有挑战性的高维多阶段生物乙醇生产过程仿真上展示了POGPN-JPSS的有效性。结果表明,POGPN-JPSS以两倍的速度和更高的可靠性达到所需的性能阈值,显著优于最先进的方法。更快的优化直接转化为时间和资源的大量节省。这突出了将专家知识与结构化概率模型相结合以实现快速工艺成熟的重要性。
摘要:Bayesian optimization (BO) is a powerful method for optimizing black-box manufacturing processes, but its performance is often limited when dealing with high-dimensional multi-stage systems, where we can observe intermediate outputs. Standard BO models the process as a black box and ignores the intermediate observations and the underlying process structure. Partially Observable Gaussian Process Networks (POGPN) model the process as a Directed Acyclic Graph (DAG). However, using intermediate observations is challenging when the observations are high-dimensional state-space time series. Process-expert knowledge can be used to extract low-dimensional latent features from the high-dimensional state-space data. We propose POGPN-JPSS, a framework that combines POGPN with Joint Parameter and State-Space (JPSS) modeling to use intermediate extracted information. We demonstrate the effectiveness of POGPN-JPSS on a challenging, high-dimensional simulation of a multi-stage bioethanol production process. Our results show that POGPN-JPSS significantly outperforms state-of-the-art methods by achieving the desired performance threshold twice as fast and with greater reliability. The fast optimization directly translates to substantial savings in time and resources. This highlights the importance of combining expert knowledge with structured probabilistic models for rapid process maturation.

【8】BONNI: Gradient-Informed Bayesian and Interior Point Optimization for Efficient Inverse Design in Nanophotonics
标题:BONNI:梯度引导的贝叶斯与内点优化,用于纳米光子学中的高效逆设计
链接:https://arxiv.org/abs/2602.18148

作者:Yannik Mahlau,Yannick Augenstein,Tyler W. Hughes,Marius Lindauer,Bodo Rosenhahn
摘要:逆向设计,特别是几何形状优化,为开发高性能纳米光子器件提供了一种系统的方法。虽然存在许多优化算法,但以往的全局方法收敛缓慢,而局部搜索策略则经常陷入局部最优。为了解决局部和全局方法各自固有的局限性,我们引入了BONNI:基于神经网络集成代理并结合内点优化的贝叶斯优化。它通过有效地结合梯度信息来确定最佳采样点,从而增强了全局优化。这种能力使BONNI能够规避许多纳米光子应用中出现的局部最优,同时发挥基于梯度优化的效率。我们通过与文献中其他常用优化算法的详尽比较,展示了BONNI在分布式布拉格反射器以及双层光栅耦合器设计中的能力。使用BONNI,我们设计出平均光谱误差仅4.5%的10层分布式布拉格反射器,而此前报道的结果为16层、误差7.8%。宽带波导锥形和光子晶体波导过渡的进一步设计验证了BONNI的能力。
摘要:Inverse design, particularly geometric shape optimization, provides a systematic approach for developing high-performance nanophotonic devices. While numerous optimization algorithms exist, previous global approaches exhibit slow convergence and conversely local search strategies frequently become trapped in local optima. To address the limitations inherent to both local and global approaches, we introduce BONNI: Bayesian optimization through neural network ensemble surrogates with interior point optimization. It augments global optimization with an efficient incorporation of gradient information to determine optimal sampling points. This capability allows BONNI to circumvent the local optima found in many nanophotonic applications, while capitalizing on the efficiency of gradient-based optimization. We demonstrate BONNI's capabilities in the design of a distributed Bragg reflector as well as a dual-layer grating coupler through an exhaustive comparison against other optimization algorithms commonly used in literature. Using BONNI, we were able to design a 10-layer distributed Bragg reflector with only 4.5% mean spectral error, compared to the previously reported results of 7.8% error with 16 layers. Further designs of a broadband waveguide taper and photonic crystal waveguide transition validate the capabilities of BONNI.

【9】Learning from Biased and Costly Data Sources: Minimax-optimal Data Collection under a Budget
标题:从有偏见和昂贵的数据源中学习:预算下的最小最优数据收集
链接:https://arxiv.org/abs/2602.17894

作者:Michael O. Harding,Vikas Singh,Kirthevasan Kandasamy
摘要:数据收集是现代统计和机器学习管道的关键组成部分,尤其当必须从多个异构源收集数据以研究感兴趣的目标人群时。在医学研究或政治民意调查等许多用例中,不同来源的采样成本各不相同。观测往往带有相关的群体身份(例如健康标志、人口统计学或政治派别),这些群体的相对构成可能差异巨大,无论是在各个源人群之间,还是在源人群与目标人群之间。在这项工作中,我们研究了固定预算下的多源数据收集,重点是估计总体均值和分组条件均值。我们表明,朴素的数据收集策略(例如试图"匹配"目标分布)或依赖标准估计量(例如样本均值)可能是高度次优的。相反,我们开发了一个最大化有效样本量的抽样计划:总样本量除以 $D_{χ^2}(q\|\overline{p})+1$,其中 $q$ 是目标分布,$\overline{p}$ 是聚合源分布,$D_{χ^2}$ 是 $χ^2$-散度。我们将此抽样计划与经典的后分层估计量配对,并给出其风险上界。我们提供了匹配的下界,证明我们的方法达到了预算约束下的极小极大最优风险。当以最小化超额风险为目标时,我们的技术还可扩展到预测问题,为使用昂贵且异构数据源的多源学习提供了一种原则性方法。
摘要:Data collection is a critical component of modern statistical and machine learning pipelines, particularly when data must be gathered from multiple heterogeneous sources to study a target population of interest. In many use cases, such as medical studies or political polling, different sources incur different sampling costs. Observations often have associated group identities (for example, health markers, demographics, or political affiliations) and the relative composition of these groups may differ substantially, both among the source populations and between sources and target population. In this work, we study multi-source data collection under a fixed budget, focusing on the estimation of population means and group-conditional means. We show that naive data collection strategies (e.g. attempting to "match" the target distribution) or relying on standard estimators (e.g. sample mean) can be highly suboptimal. Instead, we develop a sampling plan which maximizes the effective sample size: the total sample size divided by $D_{χ^2}(q\mid\mid\overline{p}) + 1$, where $q$ is the target distribution, $\overline{p}$ is the aggregated source distribution, and $D_{χ^2}$ is the $χ^2$-divergence. We pair this sampling plan with a classical post-stratification estimator and upper bound its risk. We provide matching lower bounds, establishing that our approach achieves the budgeted minimax optimal risk. Our techniques also extend to prediction problems when minimizing the excess risk, providing a principled approach to multi-source learning with costly and heterogeneous data sources.
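摘要中的有效样本量公式可以直接数值化(示意代码,离散分布为虚构例子):

```python
import numpy as np

def chi2_divergence(q, p):
    """离散分布的 χ²-散度:D_χ²(q||p) = Σ (q_i - p_i)² / p_i。"""
    q, p = np.asarray(q, float), np.asarray(p, float)
    return float(np.sum((q - p) ** 2 / p))

def effective_sample_size(n_total, q, p_bar):
    """论文中的有效样本量:n / (D_χ²(q||p̄) + 1)。"""
    return n_total / (chi2_divergence(q, p_bar) + 1.0)

q = [0.5, 0.3, 0.2]
# 聚合源分布与目标分布一致时,散度为 0,有效样本量等于总样本量
print(effective_sample_size(100, q, q))              # → 100.0
# 分布失配越大,同样的预算换来的有效样本越少
print(effective_sample_size(100, q, [0.2, 0.3, 0.5]))
```

抽样计划的目标即在预算约束下选择各源的采样比例,使聚合分布 $\overline{p}$ 尽量压低该散度。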

预测|估计(13篇)

【1】Scientific Knowledge-Guided Machine Learning for Vessel Power Prediction: A Comparative Study
标题:科学知识引导的机器学习用于船舶功率预测:比较研究
链接:https://arxiv.org/abs/2602.18403

作者:Orfeas Bourchas,George Papalambrou
备注:Accepted to the KGML Bridge at AAAI 2026 (non-archival)
摘要:主机功率的准确预测对于船舶性能优化、燃油效率和符合排放法规至关重要。传统的机器学习方法,如支持向量机、人工神经网络(ANN)的各种变体,以及随机森林、极端随机树回归器和XGBoost等基于树的方法,可以捕获非线性,但往往难以遵循功率与速度之间基本的螺旋桨定律关系,导致在训练包络之外外推效果不佳。本研究介绍了一种混合建模框架,将来自海试的基于物理的知识与数据驱动的残差学习相结合。由形式为 $P = cV^n$ 的静水功率曲线导出的基线分量捕获主要的功率-速度依赖关系,随后训练另一个非线性回归器来预测残差功率,即由环境和运行条件引起的偏差。通过将机器学习任务限制为残差修正,混合模型简化了学习,提高了泛化能力,并确保了与底层物理的一致性。在本研究中,我们将与基线分量耦合的XGBoost、简单神经网络和物理信息神经网络(PINN)与不含基线分量的相同模型进行了比较。在营运数据上的验证表明,混合模型在数据稀疏区域始终优于纯数据驱动的基线,同时在数据稠密区域保持相近的性能。所提出的框架为船舶性能监测提供了一个实用且计算高效的工具,可应用于气象航线规划、纵倾优化和能效规划。
摘要:Accurate prediction of main engine power is essential for vessel performance optimization, fuel efficiency, and compliance with emission regulations. Conventional machine learning approaches, such as Support Vector Machines, variants of Artificial Neural Networks (ANNs), and tree-based methods like Random Forests, Extra Tree Regressors, and XGBoost, can capture nonlinearities but often struggle to respect the fundamental propeller law relationship between power and speed, resulting in poor extrapolation outside the training envelope. This study introduces a hybrid modeling framework that integrates physics-based knowledge from sea trials with data-driven residual learning. The baseline component, derived from calm-water power curves of the form $P = cV^n$, captures the dominant power-speed dependence, while another, nonlinear, regressor is then trained to predict the residual power, representing deviations caused by environmental and operational conditions. By constraining the machine learning task to residual corrections, the hybrid model simplifies learning, improves generalization, and ensures consistency with the underlying physics. In this study, an XGBoost, a simple Neural Network, and a Physics-Informed Neural Network (PINN) coupled with the baseline component were compared to identical models without the baseline component. Validation on in-service data demonstrates that the hybrid model consistently outperformed a pure data-driven baseline in sparse data regions while maintaining similar performance in populated ones. The proposed framework provides a practical and computationally efficient tool for vessel performance monitoring, with applications in weather routing, trim optimization, and energy efficiency planning.
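摘要中的"基线 + 残差"思路可示意如下(假设性合成数据,非论文实现;残差回归器此处从略,实际可接XGBoost或神经网络):

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.uniform(8, 18, 200)                   # 航速(节),范围为虚构
P_calm = 0.9 * V ** 3.1                       # 假设海试给出的静水功率近似服从幂律
weather = rng.normal(0, 1, 200)               # 环境条件特征(虚构)
P_obs = P_calm + 50 * np.maximum(weather, 0)  # 环境导致的附加功率

# 1) 基线:在对数空间拟合螺旋桨定律 log P = log c + n log V
n_hat, logc_hat = np.polyfit(np.log(V), np.log(P_calm), 1)
c_hat = np.exp(logc_hat)
baseline = c_hat * V ** n_hat

# 2) 残差学习:机器学习任务只需拟合 P_obs - baseline 与环境特征的关系
residual = P_obs - baseline

print(f"c≈{c_hat:.2f}, n≈{n_hat:.2f}")        # 应恢复 c=0.90, n=3.10
```

基线吸收了主导的功率-速度依赖后,残差的量级和复杂度都显著降低,这正是混合模型外推能力更好的原因。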

【2】PRISM-FCP: Byzantine-Resilient Federated Conformal Prediction via Partial Sharing
标题:PRISM-FCP:通过部分共享的拜占庭弹性联邦共形预测
链接:https://arxiv.org/abs/2602.18396

作者:Ehsan Lari,Reza Arablouei,Stefan Werner
备注:13 pages, 5 figures, 2 tables, Submitted to IEEE Transactions on Signal Processing (TSP)
摘要:我们提出了PRISM-FCP(Partial shaRing and robust calIbration with Statistical Margins for Federated Conformal Prediction),这是一个拜占庭弹性的联邦共形预测框架,它利用部分模型共享来提高模型训练和共形校准期间对拜占庭攻击的鲁棒性。现有方法仅在校准阶段处理对抗行为,使学习到的模型容易受到有毒更新的影响。相比之下,PRISM-FCP实现了端到端的攻击缓解。在训练期间,客户端通过每轮仅传输 $D$ 个参数中的 $M$ 个来部分共享更新。这将对手扰动在聚合更新中的期望能量衰减为原来的 $M/D$,从而产生更低的均方误差(MSE)和更紧的预测区间。在校准过程中,客户端将不合格分数转换为特征向量,计算基于距离的恶意分数,并在估计共形分位数之前对疑似拜占庭贡献进行降权或过滤。在合成数据和UCI超导数据集上的大量实验表明,PRISM-FCP在拜占庭攻击下保持了名义覆盖保证,同时避免了标准FCP中观察到的区间膨胀并减少了通信量,为联邦不确定性量化提供了一种鲁棒且通信高效的方法。
摘要:We propose PRISM-FCP (Partial shaRing and robust calIbration with Statistical Margins for Federated Conformal Prediction), a Byzantine-resilient federated conformal prediction framework that utilizes partial model sharing to improve robustness against Byzantine attacks during both model training and conformal calibration. Existing approaches address adversarial behavior only in the calibration stage, leaving the learned model susceptible to poisoned updates. In contrast, PRISM-FCP mitigates attacks end-to-end. During training, clients partially share updates by transmitting only $M$ of $D$ parameters per round. This attenuates the expected energy of an adversary's perturbation in the aggregated update by a factor of $M/D$, yielding lower mean-square error (MSE) and tighter prediction intervals. During calibration, clients convert nonconformity scores into characterization vectors, compute distance-based maliciousness scores, and downweight or filter suspected Byzantine contributions before estimating the conformal quantile. Extensive experiments on both synthetic data and the UCI Superconductivity dataset demonstrate that PRISM-FCP maintains nominal coverage guarantees under Byzantine attacks while avoiding the interval inflation observed in standard FCP with reduced communication, providing a robust and communication-efficient approach to federated uncertainty quantification.
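部分共享带来的 $M/D$ 能量衰减可以用一个蒙特卡洛示意验证(维度与扰动均为虚构,非论文代码):

```python
import numpy as np

# 每轮只随机共享 D 维更新中的 M 维,
# 对手扰动 δ 进入聚合更新的期望能量应按 M/D 衰减。
rng = np.random.default_rng(1)
D, M, trials = 100, 25, 20000
delta = rng.normal(size=D)                      # 对手注入的扰动向量
full_energy = np.sum(delta ** 2)

masked_energy = 0.0
for _ in range(trials):
    idx = rng.choice(D, size=M, replace=False)  # 本轮被共享的坐标
    masked_energy += np.sum(delta[idx] ** 2)
masked_energy /= trials

ratio = masked_energy / full_energy
print(ratio)                                    # 应接近 M/D = 0.25
```

每个坐标以概率 $M/D$ 被共享,因此期望能量恰好衰减为 $M/D$ 倍,与摘要中的论断一致。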

【3】Quantum Maximum Likelihood Prediction via Hilbert Space Embeddings
标题:基于希尔伯特空间嵌入的量子最大似然预测
链接:https://arxiv.org/abs/2602.18364

作者:Sreejith Sreekumar,Nir Weinberger
备注:32+4 pages, 1 figure
摘要:近期工作为现代大型语言模型(LLM)执行上下文预测的能力提出了各种解释。我们从信息几何和统计的角度提出了另一种概念性观点。受Bach[2023]的启发,我们将训练建模为学习一个把概率分布嵌入量子密度算子空间的映射,并将上下文学习建模为在特定量子模型类上的最大似然预测。当量子模型类具有足够的表达能力时,我们从量子反向信息投影和量子勾股定理的角度对该预测器给出了解释。我们进一步在迹范数和量子相对熵两种度量下,推导了以收敛速率和集中不等式表述的非渐近性能保证。我们的方法提供了一个同时处理经典和量子LLM的统一框架。
摘要:Recent works have proposed various explanations for the ability of modern large language models (LLMs) to perform in-context prediction. We propose an alternative conceptual viewpoint from an information-geometric and statistical perspective. Motivated by Bach[2023], we model training as learning an embedding of probability distributions into the space of quantum density operators, and in-context learning as maximum-likelihood prediction over a specified class of quantum models. We provide an interpretation of this predictor in terms of quantum reverse information projection and quantum Pythagorean theorem when the class of quantum models is sufficiently expressive. We further derive non-asymptotic performance guarantees in terms of convergence rates and concentration inequalities, both in trace norm and quantum relative entropy. Our approach provides a unified framework to handle both classical and quantum LLMs.

【4】A Deep Surrogate Model for Robust and Generalizable Long-Term Blast Wave Prediction
标题:用于稳健且可推广的长期爆炸波预测的深度代理模型
链接:https://arxiv.org/abs/2602.18168

作者:Danning Jing,Xinhai Chen,Xifeng Pu,Jie Hu,Chao Huang,Xuguang Chen,Qinglin Wang,Jie Liu
摘要:由于爆炸波传播的高度非线性行为、陡峭的梯度和繁重的计算成本,精确地建模爆炸波传播的时空动力学仍然是一个长期的挑战。虽然基于机器学习的代理模型提供了快速推理作为一种有前途的替代方案,但它们的准确性下降,特别是在复杂的城市布局或分布情况下进行评估。此外,这种模型中的自回归预测策略在长期预测范围内容易出现误差积累,限制了它们在长时间模拟中的鲁棒性。为了解决这些局限性,我们提出了RGD-Blast,这是一个强大的和可推广的深度代理模型,用于高保真,长期的爆炸波预测。RGD-Blast集成了一个多尺度模块来捕获全局流模式和局部边界相互作用,有效地减少了自回归预测过程中的误差积累。我们引入了一种动态-静态特征耦合机制,该机制将时变压力场与静态源和布局特征相融合,从而增强了分布泛化能力。实验表明,RGD-Blast实现了两个数量级的速度比传统的数值方法,同时保持相当的精度。在对看不见的建筑布局的泛化测试中,该模型在280个连续时间步长内实现了低于0.01的平均RMSE和超过0.89的R2。在不同的爆炸源位置和炸药装药重量下的额外评估进一步验证了其泛化能力,大大推进了长期爆炸波建模的最新技术水平。
摘要 :Accurately modeling the spatio-temporal dynamics of blast wave propagation remains a longstanding challenge due to its highly nonlinear behavior, sharp gradients, and burdensome computational cost. While machine learning-based surrogate models offer fast inference as a promising alternative, they suffer from degraded accuracy, particularly evaluated on complex urban layouts or out-of-distribution scenarios. Moreover, autoregressive prediction strategies in such models are prone to error accumulation over long forecasting horizons, limiting their robustness for extended-time simulations. To address these limitations, we propose RGD-Blast, a robust and generalizable deep surrogate model for high-fidelity, long-term blast wave forecasting. RGD-Blast incorporates a multi-scale module to capture both global flow patterns and local boundary interactions, effectively mitigating error accumulation during autoregressive prediction. We introduce a dynamic-static feature coupling mechanism that fuses time-varying pressure fields with static source and layout features, thereby enhancing out-of-distribution generalization. Experiments demonstrate that RGD-Blast achieves a two-order-of-magnitude speedup over traditional numerical methods while maintaining comparable accuracy. In generalization tests on unseen building layouts, the model achieves an average RMSE below 0.01 and an R2 exceeding 0.89 over 280 consecutive time steps. Additional evaluations under varying blast source locations and explosive charge weights further validate its generalization, substantially advancing the state of the art in long-term blast wave modeling.

【5】Learning Long-Range Dependencies with Temporal Predictive Coding
标题:使用时间预测编码学习长期依赖性
链接:https://arxiv.org/abs/2602.18131

作者:Tom Potter,Oliver Rhodes
摘要:预测编码(PC)是一种受生物启发的学习框架,其特征是局部的、可并行的操作,这些特性使其能够在神经形态硬件上实现节能计算。尽管如此,将PC有效地扩展到递归神经网络(RNN)仍然具有挑战性,特别是对于涉及长程时间依赖的任务。通过时间的反向传播(BPTT)仍然是训练RNN的主要方法,但其非局部计算、缺乏空间并行性以及需要存储大量激活历史,导致显著的能量消耗。本文介绍了一种结合时间预测编码(tPC)与近似实时递归学习(RTRL)的新方法,实现了有效的时空信用分配。结果表明,该方法在合成基准和现实任务上都能接近BPTT的性能。在一个具有挑战性的机器翻译任务中,使用1500万参数的模型,所提出的方法实现了7.62的测试困惑度(BPTT为7.49),这是tPC首次应用于这种规模任务的实例之一。这些发现表明,该方法有潜力在学习复杂时间依赖关系的同时,保留原始PC框架局部、可并行和灵活的特性,为更节能的学习系统铺平了道路。
摘要:Predictive Coding (PC) is a biologically-inspired learning framework characterised by local, parallelisable operations, properties that enable energy-efficient implementation on neuromorphic hardware. Despite this, extending PC effectively to recurrent neural networks (RNNs) has been challenging, particularly for tasks involving long-range temporal dependencies. Backpropagation Through Time (BPTT) remains the dominant method for training RNNs, but its non-local computation, lack of spatial parallelism, and requirement to store extensive activation histories results in significant energy consumption. This work introduces a novel method combining Temporal Predictive Coding (tPC) with approximate Real-Time Recurrent Learning (RTRL), enabling effective spatio-temporal credit assignment. Results indicate that the proposed method can closely match the performance of BPTT on both synthetic benchmarks and real-world tasks. On a challenging machine translation task, with a 15-million parameter model, the proposed method achieves a test perplexity of 7.62 (vs. 7.49 for BPTT), marking one of the first applications of tPC to tasks of this scale. These findings demonstrate the potential of this method to learn complex temporal dependencies whilst retaining the local, parallelisable, and flexible properties of the original PC framework, paving the way for more energy-efficient learning systems.

【6】Comparative Assessment of Multimodal Earth Observation Data for Soil Moisture Estimation
标题:用于土壤湿度估算的多模态地球观测数据比较评估
链接:https://arxiv.org/abs/2602.18083

作者:Ioannis Kontogiorgakis,Athanasios Askitopoulos,Iason Tsardanidis,Dimitrios Bormpoudakis,Ilias Tsoumas,Fotios Balampanis,Charalampos Kontoes
备注:This paper has been submitted to IEEE IGARSS 2026
摘要:精确的土壤湿度(SM)估计对于精确农业、水资源管理和气候监测至关重要。然而,现有的卫星SM产品对于农场级应用来说过于粗糙(>1 km)。我们提出了一个面向全欧洲植被覆盖区的高分辨率(10米)SM估计框架,通过机器学习结合Sentinel-1 SAR、Sentinel-2光学影像和ERA-5再分析数据。利用跨越不同植被区的113个国际土壤水分网络(ISMN)站点,我们在空间交叉验证下比较了各种模态组合与时间参数化方案,以确保地理泛化能力。我们还评估了来自IBM-NASA的Prithvi模型的基础模型嵌入是否优于传统手工光谱特征。结果表明,混合时间匹配方案(Sentinel-2当日采集数据配合Sentinel-1降轨数据)达到R^2=0.514,加入10天的ERA5回溯窗口后性能提升至R^2=0.518。基础模型(Prithvi)嵌入相对于手工特征的改进可以忽略不计(R^2=0.515对0.514),表明传统特征工程在数据稀疏的回归任务中仍然极具竞争力。我们的研究结果表明,领域特定的光谱指数与基于树的集成方法相结合,为业务化的泛欧田块尺度土壤水分监测提供了一个实用且计算高效的解决方案。
摘要:Accurate soil moisture (SM) estimation is critical for precision agriculture, water resources management and climate monitoring. Yet, existing satellite SM products are too coarse (>1km) for farm-level applications. We present a high-resolution (10m) SM estimation framework for vegetated areas across Europe, combining Sentinel-1 SAR, Sentinel-2 optical imagery and ERA-5 reanalysis data through machine learning. Using 113 International Soil Moisture Network (ISMN) stations spanning diverse vegetated areas, we compare modality combinations with temporal parameterizations, using spatial cross-validation, to ensure geographic generalization. We also evaluate whether foundation model embeddings from IBM-NASA's Prithvi model improve upon traditional hand-crafted spectral features. Results demonstrate that hybrid temporal matching - Sentinel-2 current-day acquisitions with Sentinel-1 descending orbit - achieves R^2=0.514, with 10-day ERA5 lookback window improving performance to R^2=0.518. Foundation model (Prithvi) embeddings provide negligible improvement over hand-crafted features (R^2=0.515 vs. 0.514), indicating traditional feature engineering remains highly competitive for sparse-data regression tasks. Our findings suggest that domain-specific spectral indices combined with tree-based ensemble methods offer a practical and computationally efficient solution for operational pan-European field-scale soil moisture monitoring.

【7】PHAST: Port-Hamiltonian Architecture for Structured Temporal Dynamics Forecasting
标题:PHAST:用于结构化时序动力学预测的端口-哈密顿架构
链接:https://arxiv.org/abs/2602.17998

作者:Shubham Bhardwaj,Chandrajit Bajaj
备注:50 pages
摘要:真实的物理系统是耗散的:钟摆会变慢,电路会因发热而损失电荷;而从部分观测预测它们的动力学是科学机器学习的核心挑战。我们解决仅位置(q-only)问题:仅给定离散时刻的广义位置 $q_t$(动量 $p_t$ 为潜变量),学习一个结构化模型,使其 (a) 产生稳定的长程预测,并且 (b) 在提供足够结构时恢复具有物理意义的参数。端口-哈密顿框架通过 $\dot{x}=(J-R)\nabla H(x)$ 显式地区分保守与耗散部分,当 $R\succeq 0$ 时保证 $dH/dt\le 0$。我们介绍了PHAST(用于结构化时序动力学的端口-哈密顿架构),它将哈密顿量分解为势 $V(q)$、质量 $M(q)$ 和阻尼 $D(q)$,覆盖三种知识体制(KNOWN、PARTIAL、UNKNOWN),采用高效的低秩PSD/SPD参数化,并用Strang分裂推进动力学。在涵盖机械、电气、分子、热、引力和生态系统的13个仅位置基准中,PHAST在有竞争力的基线中取得了最佳的长程预测,并在知识体制提供足够锚点时实现了有物理意义的参数恢复。我们表明,在没有此类锚点时参数辨识本质上是不适定的(规范自由度),由此引出一种将预测稳定性与可辨识性分开评价的双轴评估。
摘要:Real physical systems are dissipative -- a pendulum slows, a circuit loses charge to heat -- and forecasting their dynamics from partial observations is a central challenge in scientific machine learning. We address the \emph{position-only} (q-only) problem: given only generalized positions~$q_t$ at discrete times (momenta~$p_t$ latent), learn a structured model that (a)~produces stable long-horizon forecasts and (b)~recovers physically meaningful parameters when sufficient structure is provided. The port-Hamiltonian framework makes the conservative-dissipative split explicit via $\dot{x}=(J-R)\nabla H(x)$, guaranteeing $dH/dt\le 0$ when $R\succeq 0$. We introduce \textbf{PHAST} (Port-Hamiltonian Architecture for Structured Temporal dynamics), which decomposes the Hamiltonian into potential~$V(q)$, mass~$M(q)$, and damping~$D(q)$ across three knowledge regimes (KNOWN, PARTIAL, UNKNOWN), uses efficient low-rank PSD/SPD parameterizations, and advances dynamics with Strang splitting. Across thirteen q-only benchmarks spanning mechanical, electrical, molecular, thermal, gravitational, and ecological systems, PHAST achieves the best long-horizon forecasting among competitive baselines and enables physically meaningful parameter recovery when the regime provides sufficient anchors. We show that identification is fundamentally ill-posed without such anchors (gauge freedom), motivating a two-axis evaluation that separates forecasting stability from identifiability.
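端口-哈密顿结构的耗散保证($dH/dt=-\nabla H^{\top}R\,\nabla H\le 0$,当 $R\succeq 0$)可以用一个二维阻尼振子做最小示意(参数为虚构,非论文实现):

```python
import numpy as np

# 阻尼谐振子的端口-哈密顿形式:x = (q, p),H(q, p) = (q² + p²)/2
J = np.array([[0.0, 1.0], [-1.0, 0.0]])   # 反对称:保守(辛)部分
R = np.array([[0.0, 0.0], [0.0, 0.3]])    # 半正定:耗散(阻尼)部分

def grad_H(x):
    return x                               # ∇H = (q, p)

x, dt = np.array([1.0, 0.0]), 1e-3
energies = []
for _ in range(5000):
    energies.append(0.5 * float(x @ x))    # 记录当前能量 H
    x = x + dt * (J - R) @ grad_H(x)       # 显式欧拉推进一步

print(energies[0], energies[-1])           # 能量从 0.5 衰减
```

能量只经由 $R$ 对应的通道流失,这正是该参数化"保守-耗散分裂显式化"的含义;PHAST 中的 $D(q)$ 扮演了这里 $R$ 的角色。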

【8】Student Flow Modeling for School Decongestion via Stochastic Gravity Estimation and Constrained Spatial Allocation
标题:通过随机重力估计和约束空间分配缓解学校拥挤的学生流建模
链接:https://arxiv.org/abs/2602.17972

作者:Sebastian Felipe R. Bundoc,Paula Joy B. Martinez,Sebastian C. Ibañez,Erika Fille T. Legara
摘要:入学人数超过学校容量的学校拥挤是中低收入国家的一个重大挑战。它严重影响学习成果,并加深教育中的不平等。虽然将学生从公立学校转移到私立学校的补贴计划提供了一种无需资本密集型建设即可缓解拥挤的机制,但由于碎片化的数据系统阻碍有效实施,这些计划往往表现不佳。菲律宾教育服务承包计划是世界上最大的教育补贴计划之一,正是这些挑战的典型例证:它未能实现缓解公立学校拥挤的目标。碎片化的数据也妨碍了理解学生入学流动所需的、基于科学和数据驱动的分析,特别是家庭如何响应经济激励和空间约束。我们介绍了一个用于建模学生流动模式和模拟政策情景的计算框架。通过综合近3,000所机构的异构政府数据,我们采用经负二项回归估计的随机重力模型,推导出距离、净学费和社会经济决定因素的行为弹性。这些弹性为一个双重约束的空间分配机制提供输入,该机制在尊重生源候选池和目的地名额容量的前提下,模拟不同补贴金额下的学生再分配。我们发现,地理邻近性对学校选择的约束是学费成本的四倍以上,而起约束作用的是名额容量而非补贴金额。我们的工作表明,仅靠补贴计划无法解决系统性的过度拥挤问题;即使资源有限,计算建模也能通过揭示影响有效资源分配的结构性约束,帮助教育政策制定者做出公平的、数据驱动的决策。
摘要 :School congestion, where student enrollment exceeds school capacity, is a major challenge in low- and middle-income countries. It highly impacts learning outcomes and deepens inequities in education. While subsidy programs that transfer students from public to private schools offer a mechanism to alleviate congestion without capital-intensive construction, they often underperform due to fragmented data systems that hinder effective implementation. The Philippine Educational Service Contracting program, one of the world's largest educational subsidy programs, exemplifies these challenges, falling short of its goal to decongest public schools. This prevents the science-based and data-driven analyses needed to understand what shapes student enrollment flows, particularly how families respond to economic incentives and spatial constraints. We introduce a computational framework for modeling student flow patterns and simulating policy scenarios. By synthesizing heterogeneous government data across nearly 3,000 institutions, we employ a stochastic gravity model estimated via negative binomial regression to derive behavioral elasticities for distance, net tuition cost, and socioeconomic determinants. These elasticities inform a doubly constrained spatial allocation mechanism that simulates student redistribution under varying subsidy amounts while respecting both origin candidate pools and destination slot capacities. We find that geographic proximity constrains school choice four times more strongly than tuition cost and that slot capacity, not subsidy amounts, is the binding constraint. Our work demonstrates that subsidy programs alone cannot resolve systemic overcrowding, and computational modeling can empower education policymakers to make equitable, data-driven decisions by revealing the structural constraints that shape effective resource allocation, even when resources are limited.
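摘要中"双重约束空间分配"的核心是迭代比例拟合(IPF):在重力核上交替缩放行与列,直到行和满足生源候选池、列和满足学校名额。下面是一个玩具示意(数字均为虚构,非论文代码):

```python
import numpy as np

# 2 个起点 × 3 所学校的距离矩阵(虚构)
cost = np.array([[1.0, 2.0, 3.0],
                 [2.0, 1.0, 2.0]])
beta = 1.2
T = np.exp(-beta * cost)               # 重力核:流量随距离指数衰减

origins = np.array([120.0, 80.0])      # 各起点的候选学生数(行和约束)
slots = np.array([60.0, 90.0, 50.0])   # 各校名额(列和约束,总量需匹配)

for _ in range(200):                   # 交替满足行/列边际,直至收敛
    T *= (origins / T.sum(axis=1))[:, None]
    T *= (slots / T.sum(axis=0))[None, :]

print(np.round(T, 1))                  # 同时满足两侧约束的流量矩阵
```

实际框架中,重力核的距离与学费弹性由负二项回归估计,而非此处手设的 `beta`。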

【9】Distribution-Free Sequential Prediction with Abstentions
标题:带弃权的无分布序贯预测
链接:https://arxiv.org/abs/2602.17918

作者:Jialin Yu,Moïse Blanchard
备注:38 pages, 2 figures. Submitted to COLT 2026. Extended version
摘要:我们研究一个序贯预测问题:允许对手在由独立同分布(i.i.d.)实例构成的流中注入任意多的对抗性实例,但在每一轮,如果该实例确实被破坏,学习者也可以选择弃权(不做预测)而不受任何惩罚。这种半对抗性设定自然介于两种经典情形之间:一端是实例独立同分布的随机情形,其中具有有限VC维的函数类是可学习的;另一端是实例任意的对抗情形,已知其限制性要强得多。针对该问题,Goel等人(2023)表明,如果学习者事先知道干净样本的分布$μ$,则对所有VC类都可以在不限制对手破坏次数的情况下实现学习。然而,这在理论和实践中都是一个很强的假设:一个自然的问题是,能否在没有先验分布知识的情况下获得类似的学习保证,正如经典学习框架(例如PAC学习或渐近一致性)和其他非独立同分布模型(例如平滑在线学习)中的标准设定那样。因此,我们关注$μ$未知的无分布设定,并提出一个基于弱学习者提升(boosting)过程的算法\textsc{AbstainBoost},在无分布的弃权学习中,对不经意(oblivious)对手保证一般VC类的次线性误差。对于包括线性分类器在内的结构化函数类,该算法对自适应对手也享有类似的保证。这些结果还辅以相应的下界,揭示了错误分类错误与错误弃权次数之间有趣的多项式权衡。
摘要:We study a sequential prediction problem in which an adversary is allowed to inject arbitrarily many adversarial instances in a stream of i.i.d.\ instances, but at each round, the learner may also \emph{abstain} from making a prediction without incurring any penalty if the instance was indeed corrupted. This semi-adversarial setting naturally sits between the classical stochastic case with i.i.d.\ instances for which function classes with finite VC dimension are learnable; and the adversarial case with arbitrary instances, known to be significantly more restrictive. For this problem, Goel et al. (2023) showed that, if the learner knows the distribution $μ$ of clean samples in advance, learning can be achieved for all VC classes without restrictions on adversary corruptions. This is, however, a strong assumption in both theory and practice: a natural question is whether similar learning guarantees can be achieved without prior distributional knowledge, as is standard in classical learning frameworks (e.g., PAC learning or asymptotic consistency) and other non-i.i.d.\ models (e.g., smoothed online learning). We therefore focus on the distribution-free setting where $μ$ is \emph{unknown} and propose an algorithm \textsc{AbstainBoost} based on a boosting procedure of weak learners, which guarantees sublinear error for general VC classes in \emph{distribution-free} abstention learning for oblivious adversaries. These algorithms also enjoy similar guarantees for adaptive adversaries, for structured function classes including linear classifiers. These results are complemented with corresponding lower bounds, which reveal an interesting polynomial trade-off between misclassification error and number of erroneous abstentions.
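A minimal illustration of prediction with abstention (not the \textsc{AbstainBoost} algorithm itself, which aggregates boosted weak learners with formal guarantees): abstain whenever the vote margin of an ensemble is too small, treating disagreement as evidence that the instance may be corrupted. The margin threshold below is an assumed hyperparameter.

```python
def predict_or_abstain(weak_learners, x, margin=0.5):
    """Majority vote with abstention: each weak learner returns +1 or -1.
    If the normalized vote margin is below `margin`, return None (abstain)
    instead of predicting, incurring no misclassification penalty."""
    votes = [h(x) for h in weak_learners]
    score = sum(votes) / len(votes)  # in [-1, 1]
    if abs(score) < margin:
        return None  # weak learners disagree: plausibly an adversarial instance
    return 1 if score > 0 else -1
```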

【10】VQPP: Video Query Performance Prediction Benchmark
标题:VQPP:视频查询性能预测基准
链接:https://arxiv.org/abs/2602.17814

作者:Adrian Catalin Lutu,Eduard Poesina,Radu Tudor Ionescu
摘要:Query performance prediction (QPP) is an important and actively studied information retrieval task, having various applications, such as query reformulation, query expansion, and retrieval system selection, among many others. The task has been primarily studied in the context of text and image retrieval, whereas QPP for content-based video retrieval (CBVR) remains largely underexplored. To this end, we propose the first benchmark for video query performance prediction (VQPP), comprising two text-to-video retrieval datasets and two CBVR systems, respectively. VQPP contains a total of 56K text queries and 51K videos, and comes with official training, validation and test splits, fostering direct comparisons and reproducible results. We explore multiple pre-retrieval and post-retrieval performance predictors, creating a representative benchmark for future exploration of QPP in the video domain. Our results show that pre-retrieval predictors obtain competitive performance, enabling applications before performing the retrieval step. We also demonstrate the applicability of VQPP by employing the best performing pre-retrieval predictor as reward model for training a large language model (LLM) on the query reformulation task via direct preference optimization (DPO). We release our benchmark and code at https://github.com/AdrianLutu/VQPP.
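Pre-retrieval predictors of the kind benchmarked here score the query alone, before any retrieval step. A standard example is average inverse document frequency of the query terms (the specific predictors used in VQPP are not listed in the abstract, so this is only a representative sketch):

```python
import math

def avg_idf_predictor(query, doc_freq, num_docs):
    """Pre-retrieval QPP baseline: mean inverse document frequency of the
    query terms. Higher values suggest a more specific (easier) query."""
    terms = query.lower().split()
    idfs = [math.log(num_docs / (1 + doc_freq.get(t, 0))) for t in terms]
    return sum(idfs) / len(idfs)
```

A score like this can also serve as a cheap reward signal, which is how the abstract's DPO query-reformulation experiment uses its best pre-retrieval predictor.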

【11】Probabilistic NDVI Forecasting from Sparse Satellite Time Series and Weather Covariates
标题:基于稀疏卫星时间序列和天气协变量的概率NDVI预测
链接:https://arxiv.org/abs/2602.17683

作者:Irene Iele,Giulia Romoli,Daniele Molino,Elena Mulero Ayllón,Filippo Ruffini,Paolo Soda,Matteo Tortora
摘要:Accurate short-term forecasting of vegetation dynamics is a key enabler for data-driven decision support in precision agriculture. Normalized Difference Vegetation Index (NDVI) forecasting from satellite observations, however, remains challenging due to sparse and irregular sampling caused by cloud coverage, as well as the heterogeneous climatic conditions under which crops evolve. In this work, we propose a probabilistic forecasting framework specifically designed for field-level NDVI prediction under clear-sky acquisition constraints. The method leverages a transformer-based architecture that explicitly separates the modeling of historical vegetation dynamics from future exogenous information, integrating historical NDVI observations with both historical and future meteorological covariates. To address irregular revisit patterns and horizon-dependent uncertainty, we introduce a temporal-distance weighted quantile loss that aligns the training objective with the effective forecasting horizon. In addition, we incorporate cumulative and extreme-weather feature engineering to better capture delayed meteorological effects relevant to vegetation response. Extensive experiments on European satellite data demonstrate that the proposed approach consistently outperforms a diverse set of statistical, deep learning, and recent time series baselines across both point-wise and probabilistic evaluation metrics. Ablation studies further highlight the central role of target history, while showing that meteorological covariates provide complementary gains when jointly exploited. The code is available at https://github.com/arco-group/ndvi-forecasting.
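A sketch of a temporal-distance weighted quantile (pinball) loss in the spirit described above; the hyperbolic weight and the `decay` value are assumptions, since the abstract does not give the exact form:

```python
def weighted_quantile_loss(y_true, y_pred, quantile, horizon_days, decay=0.05):
    """Pinball (quantile) loss with a temporal-distance weight: targets
    further into the forecast horizon are down-weighted, aligning training
    with the effective horizon under irregular revisit patterns."""
    total = 0.0
    for y, yhat, h in zip(y_true, y_pred, horizon_days):
        err = y - yhat
        pinball = max(quantile * err, (quantile - 1) * err)
        total += pinball / (1.0 + decay * h)  # assumed hyperbolic decay
    return total / len(y_true)
```

At `quantile=0.5` this reduces to a weighted absolute error; training one head per quantile yields the probabilistic forecast bands.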

【12】TopoGate: Quality-Aware Topology-Stabilized Gated Fusion for Longitudinal Low-Dose CT New-Lesion Prediction
标题:TopoGate:用于纵向低剂量CT新病变预测的质量感知拓扑稳定门控融合
链接:https://arxiv.org/abs/2602.17855

作者:Seungik Cho
摘要 :Longitudinal low-dose CT follow-ups vary in noise, reconstruction kernels, and registration quality. These differences destabilize subtraction images and can trigger false new lesion alarms. We present TopoGate, a lightweight model that combines the follow-up appearance view with the subtraction view and controls their influence through a learned, quality-aware gate. The gate is driven by three case-specific signals: CT appearance quality, registration consistency, and stability of anatomical topology measured with topological metrics. On the NLST--New-Lesion--LongCT cohort comprising 152 pairs from 122 patients, TopoGate improves discrimination and calibration over single-view baselines, achieving an area under the ROC curve of 0.65 with a standard deviation of 0.05 and a Brier score of 0.14. Removing corrupted or low-quality pairs, identified by the quality scores, further increases the area under the ROC curve from 0.62 to 0.68 and reduces the Brier score from 0.14 to 0.12. The gate responds predictably to degradation, placing more weight on appearance when noise grows, which mirrors radiologist practice. The approach is simple, interpretable, and practical for reliable longitudinal LDCT triage.
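The gating mechanism can be illustrated as a scalar sigmoid gate over case-level quality signals mixing the two views; the linear gate and the weights below are illustrative stand-ins, not the trained TopoGate parameters:

```python
import math

def gated_fusion(appearance_logit, subtraction_logit, quality_signals,
                 weights, bias=0.0):
    """Quality-aware gate: g in (0,1) computed from case-specific signals
    (CT appearance quality, registration consistency, topology stability).
    High g favors the appearance view, e.g. when the subtraction image is
    unreliable due to noise or misregistration."""
    z = bias + sum(w * s for w, s in zip(weights, quality_signals))
    g = 1.0 / (1.0 + math.exp(-z))  # sigmoid gate
    return g * appearance_logit + (1.0 - g) * subtraction_logit
```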

【13】AgriVariant: Variant Effect Prediction using DeepChem-Variant for Precision Breeding in Rice
标题:AgriVariant:使用DeepChem-Variant进行水稻精准育种的变异效应预测
链接:https://arxiv.org/abs/2602.17747

作者:Ankita Vaishnobi Bisoi,Bharath Ramsundar
备注:8 pages, 7 figures, 5 tables
摘要:Predicting functional consequences of genetic variants in crop genes remains a critical bottleneck for precision breeding programs. We present AgriVariant, an end-to-end pipeline for variant-effect prediction in rice (Oryza sativa) that addresses the lack of crop-specific variant-interpretation tools and can be extended to any crop species with available reference genomes and gene annotations. Our approach integrates deep learning-based variant calling (DeepChem-Variant) with custom plant genomics annotation using RAP-DB gene models and database-independent deleteriousness scoring that combines the Grantham distance and the BLOSUM62 substitution matrix. We validate the pipeline through targeted mutations in stress-response genes (OsDREB2a, OsDREB1F, SKC1), demonstrating correct classification of stop-gained, missense, and synonymous variants with appropriate HIGH / MODERATE / LOW impact assignments. An exhaustive mutagenesis study of OsMT-3a analyzed all 1,509 possible single-nucleotide variants in 10 days, identifying 353 high-impact, 447 medium-impact, and 709 low-impact variants - an analysis that would have required 2-4 years using traditional wet-lab approaches. This computational framework enables breeders to prioritize variants for experimental validation across diverse crop species, reducing screening costs and accelerating development of climate-resilient crop varieties.
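The database-independent scoring combines the Grantham distance with the BLOSUM62 substitution matrix. A hedged sketch of one such blend follows; only a tiny illustrative subset of matrix entries is included (load the full published matrices in practice), and the 0-1 rescalings and the weight `alpha` are assumptions, not the paper's formula:

```python
# Illustrative subset only; full Grantham and BLOSUM62 matrices should be
# loaded from the published sources.
GRANTHAM = {("L", "I"): 5, ("R", "K"): 26, ("G", "W"): 184}
BLOSUM62 = {("L", "I"): 2, ("R", "K"): 2, ("G", "W"): -2}

def deleteriousness(ref, alt, alpha=0.5):
    """Blend a Grantham distance (scaled by its maximum, 215) with a crude
    BLOSUM62 penalty; higher scores suggest a more damaging substitution."""
    pair = (ref, alt) if (ref, alt) in GRANTHAM else (alt, ref)
    g = GRANTHAM[pair] / 215.0       # 0 = conservative, 1 = radical change
    b_pen = (4 - BLOSUM62[pair]) / 8.0  # assumed rescale of BLOSUM62 to ~[0,1]
    return alpha * g + (1 - alpha) * b_pen
```

A radical Gly-to-Trp change scores far above a conservative Leu-to-Ile change, matching the HIGH/MODERATE/LOW impact tiers described above.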

其他神经网络|深度学习|模型|建模(23篇)

【1】The Geometry of Noise: Why Diffusion Models Don't Need Noise Conditioning
标题:噪声的几何:为什么扩散模型不需要噪声调节
链接:https://arxiv.org/abs/2602.18428

作者:Mojtaba Sahraee-Ardakan,Mauricio Delbracio,Peyman Milanfar
摘要:Autonomous (noise-agnostic) generative models, such as Equilibrium Matching and blind diffusion, challenge the standard paradigm by learning a single, time-invariant vector field that operates without explicit noise-level conditioning. While recent work suggests that high-dimensional concentration allows these models to implicitly estimate noise levels from corrupted observations, a fundamental paradox remains: what is the underlying landscape being optimized when the noise level is treated as a random variable, and how can a bounded, noise-agnostic network remain stable near the data manifold where gradients typically diverge? We resolve this paradox by formalizing Marginal Energy, $E_{\text{marg}}(\mathbf{u}) = -\log p(\mathbf{u})$, where $p(\mathbf{u}) = \int p(\mathbf{u}|t)p(t)dt$ is the marginal density of the noisy data integrated over a prior distribution of unknown noise levels. We prove that generation using autonomous models is not merely blind denoising, but a specific form of Riemannian gradient flow on this Marginal Energy. Through a novel relative energy decomposition, we demonstrate that while the raw Marginal Energy landscape possesses a $1/t^p$ singularity normal to the data manifold, the learned time-invariant field implicitly incorporates a local conformal metric that perfectly counteracts the geometric singularity, converting an infinitely deep potential well into a stable attractor. We also establish the structural stability conditions for sampling with autonomous models. We identify a ``Jensen Gap'' in noise-prediction parameterizations that acts as a high-gain amplifier for estimation errors, explaining the catastrophic failure observed in deterministic blind models. Conversely, we prove that velocity-based parameterizations are inherently stable because they satisfy a bounded-gain condition that absorbs posterior uncertainty into a smooth geometric drift.
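The key computation behind this view is standard and worth stating: the score of the marginal density is the posterior-weighted average of the conditional scores, so a single time-invariant field can descend $E_{\text{marg}}$ without ever observing $t$:

```latex
\nabla_{\mathbf{u}} \log p(\mathbf{u})
  = \frac{\int p(t)\, \nabla_{\mathbf{u}}\, p(\mathbf{u} \mid t)\, dt}{p(\mathbf{u})}
  = \int p(t \mid \mathbf{u})\, \nabla_{\mathbf{u}} \log p(\mathbf{u} \mid t)\, dt
  = -\nabla_{\mathbf{u}} E_{\text{marg}}(\mathbf{u}).
```

In high dimension the posterior $p(t \mid \mathbf{u})$ concentrates, which is why implicit noise-level estimation works; the paper's contribution is analyzing the geometry of this field near the data manifold, where the conditional scores diverge.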

【2】Learning to Tune Pure Pursuit in Autonomous Racing: Joint Lookahead and Steering-Gain Control with PPO
标题:学习调参自动赛车中的纯追踪控制:基于PPO的前瞻距离与转向增益联合控制
链接:https://arxiv.org/abs/2602.18386

作者:Mohamed Elgouhary,Amr S. El-Wakeel
摘要:Pure Pursuit (PP) is widely used in autonomous racing for real-time path tracking due to its efficiency and geometric clarity, yet performance is highly sensitive to how key parameters-lookahead distance and steering gain-are chosen. Standard velocity-based schedules adjust these only approximately and often fail to transfer across tracks and speed profiles. We propose a reinforcement-learning (RL) approach that jointly chooses the lookahead Ld and a steering gain g online using Proximal Policy Optimization (PPO). The policy observes compact state features (speed and curvature taps) and outputs (Ld, g) at each control step. Trained in F1TENTH Gym and deployed in a ROS 2 stack, the policy drives PP directly (with light smoothing) and requires no per-map retuning. Across simulation and real-car tests, the proposed RL-PP controller that jointly selects (Ld, g) consistently outperforms fixed-lookahead PP, velocity-scheduled adaptive PP, and an RL lookahead-only variant, and it also exceeds a kinematic MPC raceline tracker under our evaluated settings in lap time, path-tracking accuracy, and steering smoothness, demonstrating that policy-guided parameter tuning can reliably improve classical geometry-based control.
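For reference, the geometric law being tuned: pure pursuit converts the angle to the lookahead point into a curvature command for a bicycle model. Applying the RL-chosen gain to the resulting steering angle, as below, is an assumption about where the gain enters; the authors' stack may scale differently.

```python
import math

def pure_pursuit_steering(alpha, lookahead, wheelbase, gain=1.0):
    """Pure pursuit: the arc through the goal point at angle `alpha` (rad)
    and distance `lookahead` (m) has curvature 2*sin(alpha)/lookahead; a
    bicycle model converts curvature to a steering angle via atan(L*kappa).
    `gain` is the multiplier the PPO policy tunes jointly with `lookahead`."""
    curvature = 2.0 * math.sin(alpha) / lookahead
    return gain * math.atan(wheelbase * curvature)
```

The trade-off the policy learns is visible directly: a larger lookahead smooths steering but cuts corners, while the gain sharpens or damps the geometric command.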

【3】On the "Induction Bias" in Sequence Models
标题:论序列模型中的"归纳偏差"
链接:https://arxiv.org/abs/2602.18333

作者:M. Reza Ebrahimi,Michaël Defferrard,Sunny Panchal,Roland Memisevic
摘要:Despite the remarkable practical success of transformer-based language models, recent work has raised concerns about their ability to perform state tracking. In particular, a growing body of literature has shown this limitation primarily through failures in out-of-distribution (OOD) generalization, such as length extrapolation. In this work, we shift attention to the in-distribution implications of these limitations. We conduct a large-scale experimental study of the data efficiency of transformers and recurrent neural networks (RNNs) across multiple supervision regimes. We find that the amount of training data required by transformers grows much more rapidly with state-space size and sequence length than for RNNs. Furthermore, we analyze the extent to which learned state-tracking mechanisms are shared across different sequence lengths. We show that transformers exhibit negligible or even detrimental weight sharing across lengths, indicating that they learn length-specific solutions in isolation. In contrast, recurrent models exhibit effective amortized learning by sharing weights across lengths, allowing data from one sequence length to improve performance on others. Together, these results demonstrate that state tracking remains a fundamental challenge for transformers, even when training and evaluation distributions match.

【4】Generative Model via Quantile Assignment
标题:通过分位数分配的生成模型
链接:https://arxiv.org/abs/2602.18216

作者:Georgi Hrusanov,Oliver Y. Chén,Julien S. Bodelet
摘要 :Deep Generative models (DGMs) play two key roles in modern machine learning: (i) producing new information (e.g., image synthesis) and (ii) reducing dimensionality. However, traditional architectures often rely on auxiliary networks such as encoders in Variational Autoencoders (VAEs) or discriminators in Generative Adversarial Networks (GANs), which introduce training instability, computational overhead, and risks like mode collapse. We present NeuroSQL, a new generative paradigm that eliminates the need for auxiliary networks by learning low-dimensional latent representations implicitly. NeuroSQL leverages an asymptotic approximation that expresses the latent variables as the solution to an optimal transportation problem. Specifically, NeuroSQL learns the latent variables by solving a linear assignment problem and then passes the latent information to a standalone generator. We benchmark its performance against GANs, VAEs, and a budget-matched diffusion baseline on four datasets: handwritten digits (MNIST), faces (CelebA), animal faces (AFHQ), and brain images (OASIS). Compared to VAEs, GANs, and diffusion models: (1) in terms of image quality, NeuroSQL achieves overall lower mean pixel distance between synthetic and authentic images and stronger perceptual/structural fidelity; (2) computationally, NeuroSQL requires the least training time; and (3) practically, NeuroSQL provides an effective solution for generating synthetic data with limited training samples. By embracing quantile assignment rather than an encoder, NeuroSQL provides a fast, stable, and robust way to generate synthetic data with minimal information loss.
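The abstract describes learning latents by solving a linear assignment problem. In one dimension this "quantile assignment" has a closed form: sorting both sides yields the optimal-transport matching, as in the sketch below (the general multi-dimensional case needs a full assignment solver such as the Hungarian algorithm; this 1-D reduction is only illustrative):

```python
def quantile_assignment(latents, targets):
    """Match each latent scalar to a data scalar by rank: in 1-D, pairing
    sorted lists minimizes the total squared transport cost, so the i-th
    smallest latent is assigned to the i-th smallest target."""
    order_l = sorted(range(len(latents)), key=lambda i: latents[i])
    order_t = sorted(range(len(targets)), key=lambda j: targets[j])
    assignment = [None] * len(latents)
    for i, j in zip(order_l, order_t):
        assignment[i] = j
    return assignment  # assignment[i] = index of the target matched to latent i
```

Once latents are fixed by the assignment, a standalone generator can be fit by plain regression, which is why no encoder or discriminator network is needed.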

【5】Cut Less, Fold More: Model Compression through the Lens of Projection Geometry
标题:少剪多折:透过投影几何视角的模型压缩
链接:https://arxiv.org/abs/2602.18116

作者:Olga Saukh,Dong Wang,Haris Šikić,Yun Cheng,Lothar Thiele
备注:Accepted by ICLR 2026
摘要:Compressing neural networks without retraining is vital for deployment at scale. We study calibration-free compression through the lens of projection geometry: structured pruning is an axis-aligned projection, whereas model folding performs a low-rank projection via weight clustering. We formalize both as orthogonal operators and show that, within a rank distance of one, folding provably yields smaller parameter reconstruction error, and under mild smoothness assumptions, smaller functional perturbations than pruning. At scale, we evaluate >1000 checkpoints spanning ResNet18, PreActResNet18, ViT-B/32, and CLIP ViT-B/32 on CIFAR-10 and ImageNet-1K, covering diverse training hyperparameters (optimizers, learning rates, augmentations, regularization, sharpness-aware training), as well as multiple LLaMA-family 60M and 130M parameter models trained on C4. We show that folding typically achieves higher post-compression accuracy, with the largest gains at moderate-high compression. The gap narrows and occasionally reverses at specific training setups. Our results position folding as a geometry-aware, calibration-free alternative to pruning that is often superior in practice and principled in theory.
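The contrast between the two projections can be made concrete on a single weight vector: folding clusters weights to shared values, pruning zeroes the smallest-magnitude ones (an axis-aligned projection). This toy sketch (plain 1-D k-means with an illustrative initialization, not the paper's algorithm) shows folding's smaller reconstruction error when weights have clustered structure:

```python
def fold_weights(weights, k):
    """Model folding sketch: cluster weights into k shared values (1-D
    k-means, a few Lloyd iterations) and replace each weight by its centroid."""
    step = max(1, len(weights) // k)
    centers = sorted(weights)[::step][:k]  # spread-out initialization
    for _ in range(20):
        groups = [[] for _ in centers]
        for w in weights:
            nearest = min(range(len(centers)), key=lambda c: abs(w - centers[c]))
            groups[nearest].append(w)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return [min(centers, key=lambda c: abs(w - c)) for w in weights]

def prune_weights(weights, keep):
    """Pruning sketch: zero out all but the `keep` largest-magnitude weights."""
    kept = set(sorted(range(len(weights)), key=lambda i: -abs(weights[i]))[:keep])
    return [w if i in kept else 0.0 for i, w in enumerate(weights)]
```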

【6】Non-Stationary Online Resource Allocation: Learning from a Single Sample
标题:非平稳在线资源分配:从单一样本中学习
链接:https://arxiv.org/abs/2602.18114

作者:Yiding Feng,Jiashuo Jiang,Yige Wang
摘要:We study online resource allocation under non-stationary demand with a minimum offline data requirement. In this problem, a decision-maker must allocate multiple types of resources to sequentially arriving queries over a finite horizon. Each query belongs to a finite set of types with fixed resource consumption and a stochastic reward drawn from an unknown, type-specific distribution. Critically, the environment exhibits arbitrary non-stationarity -- arrival distributions may shift unpredictably-while the algorithm requires only one historical sample per period to operate effectively. We distinguish two settings based on sample informativeness: (i) reward-observed samples containing both query type and reward realization, and (ii) the more challenging type-only samples revealing only query type information. We propose a novel type-dependent quantile-based meta-policy that decouples the problem into modular components: reward distribution estimation, optimization of target service probabilities via fluid relaxation, and real-time decisions through dynamic acceptance thresholds. For reward-observed samples, our static threshold policy achieves $\tilde{O}(\sqrt{T})$ regret. For type-only samples, we first establish that sublinear regret is impossible without additional structure; under a mild minimum-arrival-probability assumption, we design both a partially adaptive policy attaining the same $\tilde{O}({T})$ bound and, more significantly, a fully adaptive resolving policy with careful rounding that achieves the first poly-logarithmic regret guarantee of $O((\log T)^3)$ for non-stationary multi-resource allocation. Our framework advances prior work by operating with minimal offline data (one sample per period), handling arbitrary non-stationarity without variation-budget assumptions, and supporting multiple resource constraints.
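The dynamic acceptance thresholds in the meta-policy can be illustrated with a single-resource sketch: accept a query iff its reward clears an empirical quantile chosen so the expected acceptance rate matches the remaining budget per round. This is a deliberate simplification of the paper's type-dependent, multi-resource policy:

```python
def accept(reward, remaining_budget, remaining_rounds, reward_history):
    """Quantile-threshold acceptance: pick the empirical reward quantile so
    that the target acceptance rate equals remaining budget per round, then
    accept the current query iff its reward clears that threshold."""
    if remaining_budget <= 0:
        return False
    target_rate = min(1.0, remaining_budget / max(remaining_rounds, 1))
    hist = sorted(reward_history)
    cutoff = int((1.0 - target_rate) * (len(hist) - 1))
    return reward >= hist[cutoff]
```

Re-estimating the history (and hence the threshold) each period from the single fresh sample is what the paper's adaptive resolving policy formalizes.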

【7】Learning Without Training
标题:无需训练即可学习
链接:https://arxiv.org/abs/2602.17985

作者:Ryan O'Dowd
备注:PhD Dissertation of Ryan O'Dowd, defended successfully at Claremont Graduate University on 1/28/2026
摘要:Machine learning is at the heart of managing the real-world problems associated with massive data. With the success of neural networks on such large-scale problems, more research in machine learning is being conducted now than ever before. This dissertation focuses on three different projects rooted in mathematical theory for machine learning applications. The first project deals with supervised learning and manifold learning. In theory, one of the main problems in supervised learning is that of function approximation: that is, given some data set $\mathcal{D}=\{(x_j,f(x_j))\}_{j=1}^M$, can one build a model $F\approx f$? We introduce a method which aims to remedy several of the theoretical shortcomings of the current paradigm for supervised learning. The second project deals with transfer learning, which is the study of how an approximation process or model learned on one domain can be leveraged to improve the approximation on another domain. We study such liftings of functions when the data is assumed to be known only on a part of the whole domain. We are interested in determining subsets of the target data space on which the lifting can be defined, and how the local smoothness of the function and its lifting are related. The third project is concerned with the classification task in machine learning, particularly in the active learning paradigm. Classification has often been treated as an approximation problem as well, but we propose an alternative approach leveraging techniques originally introduced for signal separation problems. We introduce theory to unify signal separation with classification and a new algorithm which yields competitive accuracy to other recent active learning algorithms while providing results much faster.

【8】In-Context Learning for Pure Exploration in Continuous Spaces
标题:在连续空间中进行纯探索的上下文学习
链接:https://arxiv.org/abs/2602.17976

作者:Alessio Russo,Yin-Ching Lee,Ryan Welch,Aldo Pacchiano
摘要 :In active sequential testing, also termed pure exploration, a learner is tasked with the goal to adaptively acquire information so as to identify an unknown ground-truth hypothesis with as few queries as possible. This problem, originally studied by Chernoff in 1959, has several applications: classical formulations include Best-Arm Identification (BAI) in bandits, where actions index hypotheses, and generalized search problems, where strategically chosen queries reveal partial information about a hidden label. In many modern settings, however, the hypothesis space is continuous and naturally coincides with the query/action space: for example, identifying an optimal action in a continuous-armed bandit, localizing an $ε$-ball contained in a target region, or estimating the minimizer of an unknown function from a sequence of observations. In this work, we study pure exploration in such continuous spaces and introduce Continuous In-Context Pure Exploration for this regime. We introduce C-ICPE-TS, an algorithm that meta-trains deep neural policies to map observation histories to (i) the next continuous query action and (ii) a predicted hypothesis, thereby learning transferable sequential testing strategies directly from data. At inference time, C-ICPE-TS actively gathers evidence on previously unseen tasks and infers the true hypothesis without parameter updates or explicit hand-crafted information models. We validate C-ICPE-TS across a range of benchmarks, spanning continuous best-arm identification, region localization, and function minimizer identification.

【9】Bayesian Online Model Selection
标题:Bayesian在线模型选择
链接:https://arxiv.org/abs/2602.17958

作者:Aida Afshar,Yuke Zhang,Aldo Pacchiano
摘要:Online model selection in Bayesian bandits raises a fundamental exploration challenge: When an environment instance is sampled from a prior distribution, how can we design an adaptive strategy that explores multiple bandit learners and competes with the best one in hindsight? We address this problem by introducing a new Bayesian algorithm for online model selection in stochastic bandits. We prove an oracle-style guarantee of $O\left( d^* M \sqrt{T} + \sqrt{(MT)} \right)$ on the Bayesian regret, where $M$ is the number of base learners, $d^*$ is the regret coefficient of the optimal base learner, and $T$ is the time horizon. We also validate our method empirically across a range of stochastic bandit settings, demonstrating performance that is competitive with the best base learner. Additionally, we study the effect of sharing data among base learners and its role in mitigating prior mis-specification.

【10】JAX-Privacy: A library for differentially private machine learning
标题:JAX-Privacy:用于差分隐私机器学习的库
链接:https://arxiv.org/abs/2602.17861

作者:Ryan McKenna,Galen Andrew,Borja Balle,Vadym Doroshenko,Arun Ganesh,Weiwei Kong,Alex Kurakin,Brendan McMahan,Mikhail Pravilov
摘要:JAX-Privacy is a library designed to simplify the deployment of robust and performant mechanisms for differentially private machine learning. Guided by design principles of usability, flexibility, and efficiency, JAX-Privacy serves both researchers requiring deep customization and practitioners who want a more out-of-the-box experience. The library provides verified, modular primitives for critical components for all aspects of the mechanism design including batch selection, gradient clipping, noise addition, accounting, and auditing, and brings together a large body of recent research on differentially private ML.
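The critical components listed above center on one primitive: per-example gradient clipping plus calibrated Gaussian noise (the DP-SGD mechanism). The following is a generic plain-Python sketch of that mechanism, not JAX-Privacy's actual API:

```python
import math
import random

def dp_sgd_aggregate(per_example_grads, clip_norm, noise_multiplier, rng=random):
    """DP-SGD core step: clip each per-example gradient to L2 norm
    `clip_norm`, sum, and add Gaussian noise with standard deviation
    noise_multiplier * clip_norm, bounding each example's influence."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for d in range(dim):
            total[d] += g[d] * scale
    sigma = noise_multiplier * clip_norm
    return [t + rng.gauss(0.0, sigma) for t in total]
```

The privacy accounting then tracks how `noise_multiplier`, batch selection, and the number of steps compose into an overall (epsilon, delta) guarantee.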

【11】Two Calm Ends and the Wild Middle: A Geometric Picture of Memorization in Diffusion Models
标题:两端平静、中间狂野:扩散模型中记忆现象的几何图景
链接:https://arxiv.org/abs/2602.17846

作者:Nick Dodson,Xinyu Gao,Qingsong Wang,Yusu Wang,Zhengchao Wan
摘要:Diffusion models generate high-quality samples but can also memorize training data, raising serious privacy concerns. Understanding the mechanisms governing when memorization versus generalization occurs remains an active area of research. In particular, it is unclear where along the noise schedule memorization is induced, how data geometry influences it, and how phenomena at different noise scales interact. We introduce a geometric framework that partitions the noise schedule into three regimes based on the coverage properties of training data by Gaussian shells and the concentration behavior of the posterior, which we argue are two fundamental objects governing memorization and generalization in diffusion models. This perspective reveals that memorization risk is highly non-uniform across noise levels. We further identify a danger zone at medium noise levels where memorization is most pronounced. In contrast, both the small and large noise regimes resist memorization, but through fundamentally different mechanisms: small noise avoids memorization due to limited training coverage, while large noise exhibits low posterior concentration and admits a provably near linear Gaussian denoising behavior. For the medium noise regime, we identify geometric conditions through which we propose a geometry-informed targeted intervention that mitigates memorization.

【12】Market Games for Generative Models: Equilibria, Welfare, and Strategic Entry
标题:生成模型的市场博弈:均衡、福利与战略进入
链接:https://arxiv.org/abs/2602.17787

作者:Xiukun Wei,Min Shi,Xueru Zhang
备注:Published as a conference paper at ICLR 2026
摘要:Generative model ecosystems increasingly operate as competitive multi-platform markets, where platforms strategically select models from a shared pool and users with heterogeneous preferences choose among them. Understanding how platforms interact, when market equilibria exist, how outcomes are shaped by model-providers, platforms, and user behavior, and how social welfare is affected is critical for fostering a beneficial market environment. In this paper, we formalize a three-layer model-platform-user market game and identify conditions for the existence of pure Nash equilibrium. Our analysis shows that market structure, whether platforms converge on similar models or differentiate by selecting distinct ones, depends not only on models' global average performance but also on their localized attraction to user groups. We further examine welfare outcomes and show that expanding the model pool does not necessarily increase user welfare or market diversity. Finally, we design novel best-response training schemes that allow model providers to strategically introduce new models into competitive markets.

【13】Solving and learning advective multiscale Darcian dynamics with the Neural Basis Method
标题:用神经基方法求解和学习平流多尺度达西动力学
链接:https://arxiv.org/abs/2602.17776

作者:Yuhe Wang,Min Wang
摘要:Physics-governed models are increasingly paired with machine learning for accelerated predictions, yet most "physics--informed" formulations treat the governing equations as a penalty loss whose scale and meaning are set by heuristic balancing. This blurs operator structure, thereby confounding solution approximation error with governing-equation enforcement error and making the solving and learning progress hard to interpret and control. Here we introduce the Neural Basis Method, a projection-based formulation that couples a predefined, physics-conforming neural basis space with an operator-induced residual metric to obtain a well-conditioned deterministic minimization. Stability and reliability then hinge on this metric: the residual is not merely an optimization objective but a computable certificate tied to approximation and enforcement, remaining stable under basis enrichment and yielding reduced coordinates that are learnable across parametric instances. We use advective multiscale Darcian dynamics as a concrete demonstration of this broader point. Our method produce accurate and robust solutions in single solves and enable fast and effective parametric inference with operator learning.

【14】Investigating Target Class Influence on Neural Network Compressibility for Energy-Autonomous Avian Monitoring
标题:研究目标类别对能量自主鸟类监测神经网络可压缩性的影响
链接:https://arxiv.org/abs/2602.17751

作者:Nina Brolich,Simon Geis,Maximilian Kasper,Alexander Barnhill,Axel Plinge,Dominik Seuß
备注:11 pages, 7 figures, Funding: GreenICT@FMD (BMFTR grant 16ME0491K)
摘要 :Biodiversity loss poses a significant threat to humanity, making wildlife monitoring essential for assessing ecosystem health. Avian species are ideal subjects for this due to their popularity and the ease of identifying them through their distinctive songs. Traditionalavian monitoring methods require manual counting and are therefore costly and inefficient. In passive acoustic monitoring, soundscapes are recorded over long periods of time. The recordings are analyzed to identify bird species afterwards. Machine learning methods have greatly expedited this process in a wide range of species and environments, however, existing solutions require complex models and substantial computational resources. Instead, we propose running machine learning models on inexpensive microcontroller units (MCUs) directly in the field. Due to the resulting hardware and energy constraints, efficient artificial intelligence (AI) architecture is required. In this paper, we present our method for avian monitoring on MCUs. We trained and compressed models for various numbers of target classes to assess the detection of multiple bird species on edge devices and evaluate the influence of the number of species on the compressibility of neural networks. Our results demonstrate significant compression rates with minimal performance loss. We also provide benchmarking results for different hardware platforms and evaluate the feasibility of deploying energy-autonomous devices.

【15】Certified Learning under Distribution Shift: Sound Verification and Identifiable Structure
标题:分布转变下的认证学习:健全的验证和可识别的结构
链接:https://arxiv.org/abs/2602.17699

作者:Chandrasekhar Gokavarapu,Sudhakar Gadde,Y. Rajasekhar,S. R. Bhargava
摘要:Proposition. Let $f$ be a predictor trained on a distribution $P$ and evaluated on a shifted distribution $Q$. Under verifiable regularity and complexity constraints, the excess risk under shift admits an explicit upper bound determined by a computable shift metric and model parameters. We develop a unified framework in which (i) risk under distribution shift is certified by explicit inequalities, (ii) verification of learned models is sound for nontrivial sizes, and (iii) interpretability is enforced through identifiability conditions rather than post hoc explanations. All claims are stated with explicit assumptions. Failure modes are isolated. Non-certifiable regimes are characterized.
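The abstract states the proposition without its inequality. One standard instantiation of such a certified bound, for a loss $\ell$ taking values in $[0, M]$ and total variation as the computable shift metric, is:

```latex
R_Q(f) - R_P(f)
  \;=\; \mathbb{E}_{Q}\big[\ell(f(X), Y)\big] - \mathbb{E}_{P}\big[\ell(f(X), Y)\big]
  \;\le\; M \cdot \mathrm{TV}(P, Q),
```

so the excess risk under shift is controlled by the loss bound $M$ and the shift metric alone; sharper Lipschitz-plus-Wasserstein variants follow the same template. This is a generic illustration, not necessarily the paper's exact bound.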

【16】Epistemic Traps: Rational Misalignment Driven by Model Misspecification
标题:认识陷阱:模型错误规范驱动的理性失调
链接:https://arxiv.org/abs/2602.17676

作者:Xingcheng Xu,Jingjing Qu,Qiaosheng Zhang,Chaochao Lu,Yanqing Yang,Na Zou,Xia Hu
摘要:The rapid deployment of Large Language Models and AI agents across critical societal and technical domains is hindered by persistent behavioral pathologies including sycophancy, hallucination, and strategic deception that resist mitigation via reinforcement learning. Current safety paradigms treat these failures as transient training artifacts, lacking a unified theoretical framework to explain their emergence and stability. Here we show that these misalignments are not errors, but mathematically rationalizable behaviors arising from model misspecification. By adapting Berk-Nash Rationalizability from theoretical economics to artificial intelligence, we derive a rigorous framework that models the agent as optimizing against a flawed subjective world model. We demonstrate that widely observed failures are structural necessities: unsafe behaviors emerge as either a stable misaligned equilibrium or oscillatory cycles depending on reward scheme, while strategic deception persists as a "locked-in" equilibrium or through epistemic indeterminacy robust to objective risks. We validate these theoretical predictions through behavioral experiments on six state-of-the-art model families, generating phase diagrams that precisely map the topological boundaries of safe behavior. Our findings reveal that safety is a discrete phase determined by the agent's epistemic priors rather than a continuous function of reward magnitude. This establishes Subjective Model Engineering, defined as the design of an agent's internal belief structure, as a necessary condition for robust alignment, marking a paradigm shift from manipulating environmental rewards to shaping the agent's interpretation of reality.

【17】Clapeyron Neural Networks for Single-Species Vapor-Liquid Equilibria
标题:单物种气液平衡的Clapeyron神经网络
链接:https://arxiv.org/abs/2602.18313

作者:Jan Pavšek,Alexander Mitsos,Elvis J. Sim,Jan G. Rittig
摘要:Machine learning (ML) approaches have shown promising results for predicting molecular properties relevant for chemical process design. However, they are often limited by scarce experimental property data and lack thermodynamic consistency. As such, thermodynamics-informed ML, i.e., incorporating thermodynamic relations into the loss function as regularization term for training, has been proposed. We herein transfer the concept of thermodynamics-informed graph neural networks (GNNs) from the Gibbs-Duhem to the Clapeyron equation, predicting several pure component properties in a multi-task manner, namely: vapor pressure, liquid molar volume, vapor molar volume and enthalpy of vaporization. We find improved prediction accuracy of the Clapeyron-GNN compared to the single-task learning setting, and improved approximation of the Clapeyron equation compared to the purely data-driven multi-task learning setting. In fact, we observe the largest improvement in prediction accuracy for the properties with the lowest availability of data, making our model promising for practical application in data scarce scenarios of chemical engineering practice.
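The Clapeyron relation the authors regularize with, dP/dT = Δh_vap / (T·(v_vap − v_liq)), can be turned into a penalty term directly. A hedged numpy sketch using finite differences (the paper differentiates the GNN's outputs; this version only illustrates the residual that gets penalized):

```python
import numpy as np

R = 8.314  # gas constant, J / (mol K)

def clapeyron_residual(T, p_sat, h_vap, v_vap, v_liq):
    """Pointwise residual of the Clapeyron equation
    dP/dT = h_vap / (T * (v_vap - v_liq)),
    with dP/dT estimated by finite differences."""
    dP_dT = np.gradient(p_sat, T)
    return dP_dT - h_vap / (T * (v_vap - v_liq))

def clapeyron_penalty(T, p_sat, h_vap, v_vap, v_liq):
    """Mean squared residual, usable as a regularization term in a loss."""
    return float(np.mean(clapeyron_residual(T, p_sat, h_vap, v_vap, v_liq) ** 2))
```

On thermodynamically consistent predictions the penalty is near zero, so it only steers the model when the four predicted properties disagree with each other.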

【18】Machine-learning force-field models for dynamical simulations of metallic magnets
标题:用于金属磁体动力学仿真的机器学习力场模型
链接:https://arxiv.org/abs/2602.18213

作者:Gia-Wei Chern,Yunhao Fan,Sheng Zhang,Puhan Zhang
备注:9 pages, 5 figures
摘要:We review recent advances in machine learning (ML) force-field methods for Landau-Lifshitz-Gilbert (LLG) simulations of itinerant electron magnets, focusing on scalability and transferability. Built on the principle of locality, a deep neural network model is developed to efficiently and accurately predict the electron-mediated forces governing spin dynamics. Symmetry-aware descriptors constructed through a group-theoretical approach ensure rigorous incorporation of both lattice and spin-rotation symmetries. The framework is demonstrated using the prototypical s-d exchange model widely employed in spintronics. ML-enabled large-scale simulations reveal novel nonequilibrium phenomena, including anomalous coarsening of tetrahedral spin order on the triangular lattice and the freezing of phase separation dynamics in lightly hole-doped, strong-coupling square-lattice systems. These results establish ML force-field frameworks as scalable, accurate, and versatile tools for modeling nonequilibrium spin dynamics in itinerant magnets.

【19】Interactive Learning of Single-Index Models via Stochastic Gradient Descent
标题:通过随机梯度下降进行单指标模型的交互学习
链接:https://arxiv.org/abs/2602.17876

作者:Nived Rajaraman,Yanjun Han
备注:26 pages, 2 figures
摘要:Stochastic gradient descent (SGD) is a cornerstone algorithm for high-dimensional optimization, renowned for its empirical successes. Recent theoretical advances have provided a deep understanding of how SGD enables feature learning in high-dimensional nonlinear models, most notably the single-index model with i.i.d. data. In this work, we study the sequential learning problem for single-index models, also known as generalized linear bandits or ridge bandits, where SGD is a simple and natural solution, yet its learning dynamics remain largely unexplored. We show that, similar to the optimal interactive learner, SGD undergoes a distinct "burn-in" phase before entering the "learning" phase in this setting. Moreover, with an appropriately chosen learning rate schedule, a single SGD procedure simultaneously achieves near-optimal (or best-known) sample complexity and regret guarantees across both phases, for a broad class of link functions. Our results demonstrate that SGD remains highly competitive for learning single-index models under adaptive data.
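For intuition, online SGD on a single-index model y = g(⟨w*, x⟩) with a known monotone link can be written in a few lines. This is an illustrative toy under i.i.d. data, not the paper's two-phase bandit analysis; the dimension, link, and step size are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
w_star = np.zeros(d)
w_star[0] = 1.0            # hidden unit-norm direction
g = np.tanh                # known monotone link

w = rng.standard_normal(d) / np.sqrt(d)
lr = 0.05
for _ in range(10_000):    # one fresh sample per step (online SGD)
    x = rng.standard_normal(d)
    y = g(x @ w_star)
    pred = x @ w
    # gradient of 0.5 * (g(w.x) - y)^2 via the chain rule; g' = 1 - tanh^2
    grad = (g(pred) - y) * (1.0 - np.tanh(pred) ** 2) * x
    w -= lr * grad

cosine = w @ w_star / np.linalg.norm(w)  # alignment with the hidden direction
```

The alignment climbing toward 1 is the "learning" phase; with harder links (higher information exponent) the initial "burn-in" plateau becomes the dominant cost.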

【20】Learning Flow Distributions via Projection-Constrained Diffusion on Manifolds
标题:通过流形上的投影约束扩散学习流分布
链接:https://arxiv.org/abs/2602.17773

作者:Noah Trupin,Rahul Ghosh,Aadi Jangid
摘要:We present a generative modeling framework for synthesizing physically feasible two-dimensional incompressible flows under arbitrary obstacle geometries and boundary conditions. Whereas existing diffusion-based flow generators either ignore physical constraints, impose soft penalties that do not guarantee feasibility, or specialize to fixed geometries, our approach integrates three complementary components: (1) a boundary-conditioned diffusion model operating on velocity fields; (2) a physics-informed training objective incorporating a divergence penalty; and (3) a projection-constrained reverse diffusion process that enforces exact incompressibility through a geometry-aware Helmholtz-Hodge operator. We derive the method as a discrete approximation to constrained Langevin sampling on the manifold of divergence-free vector fields, providing a connection between modern diffusion models and geometric constraint enforcement in incompressible flow spaces. Experiments on analytic Navier-Stokes data and obstacle-bounded flow configurations demonstrate significantly improved divergence, spectral accuracy, vorticity statistics, and boundary consistency relative to unconstrained, projection-only, and penalty-only baselines. Our formulation unifies soft and hard physical structure within diffusion models and provides a foundation for generative modeling of incompressible fields in robotics, graphics, and scientific computing.

【21】Sparse Bayesian Modeling of EEG Channel Interactions Improves P300 Brain-Computer Interface Performance
标题:脑电通道相互作用的稀疏Bayesian建模提高P300脑机接口性能
链接:https://arxiv.org/abs/2602.17772

作者:Guoxuan Ma,Yuan Zhong,Moyan Li,Yuxiao Nie,Jian Kang
摘要:Electroencephalography (EEG)-based P300 brain-computer interfaces (BCIs) enable communication without physical movement by detecting stimulus-evoked neural responses. Accurate and efficient decoding remains challenging due to high dimensionality, temporal dependence, and complex interactions across EEG channels. Most existing approaches treat channels independently or rely on black-box machine learning models, limiting interpretability and personalization. We propose a sparse Bayesian time-varying regression framework that explicitly models pairwise EEG channel interactions while performing automatic temporal feature selection. The model employs a relaxed-thresholded Gaussian process prior to induce structured sparsity in both channel-specific and interaction effects, enabling interpretable identification of task-relevant channels and channel pairs. Applied to a publicly available P300 speller dataset of 55 participants, the proposed method achieves a median character-level accuracy of 100% using all stimulus sequences and attains the highest overall decoding performance among competing statistical and deep learning approaches. Incorporating channel interactions yields subgroup-specific gains of up to 7% in character-level accuracy, particularly among participants who abstained from alcohol (up to 18% improvement). Importantly, the proposed method improves median BCI-Utility by approximately 10% at its optimal operating point, achieving peak throughput after only seven stimulus sequences. These results demonstrate that explicitly modeling structured EEG channel interactions within a principled Bayesian framework enhances predictive accuracy, improves user-centric throughput, and supports personalization in P300 BCI systems.

【22】GeneZip: Region-Aware Compression for Long Context DNA Modeling
标题:GeneZip:用于长上下文DNA建模的区域感知压缩
链接:https://arxiv.org/abs/2602.17739

作者:Jianan Zhao,Xixian Liu,Zhihao Zhan,Xinyu Yuan,Hongyu Guo,Jian Tang
备注:Preprint, work in progress
摘要:Genomic sequences span billions of base pairs (bp), posing a fundamental challenge for genome-scale foundation models. Existing approaches largely sidestep this barrier by either scaling relatively small models to long contexts or relying on heavy multi-GPU parallelism. Here we introduce GeneZip, a DNA compression model that leverages a key biological prior: genomic information is highly imbalanced. Coding regions comprise only a small fraction (about 2 percent) yet are information-dense, whereas most non-coding sequence is comparatively information-sparse. GeneZip couples HNet-style dynamic routing with a region-aware compression-ratio objective, enabling adaptive allocation of representation budget across genomic regions. As a result, GeneZip learns region-aware compression and achieves 137.6x compression with only 0.31 perplexity increase. On downstream long-context benchmarks, GeneZip achieves comparable or better performance on contact map prediction, expression quantitative trait loci prediction, and enhancer-target gene prediction. By reducing effective sequence length, GeneZip unlocks simultaneous scaling of context and capacity: compared to the prior state-of-the-art model JanusDNA, it enables training models 82.6x larger at 1M-bp context, supporting a 636M-parameter GeneZip model at 1M-bp context. All experiments in this paper can be trained on a single A100 80GB GPU.

【23】Clever Materials: When Models Identify Good Materials for the Wrong Reasons
标题:聪明的材料:当模型出于错误的原因识别出好材料时
链接:https://arxiv.org/abs/2602.17730

作者:Kevin Maik Jablonka
摘要:Machine learning can accelerate materials discovery. Models perform impressively on many benchmarks. However, strong benchmark performance does not imply that a model learned chemistry. I test a concrete alternative hypothesis: that property prediction can be driven by bibliographic confounding. Across five tasks spanning MOFs (thermal and solvent stability), perovskite solar cells (efficiency), batteries (capacity), and TADF emitters (emission wavelength), models trained on standard chemical descriptors predict author, journal, and publication year well above chance. When these predicted metadata ("bibliographic fingerprints") are used as the sole input to a second model, performance is sometimes competitive with conventional descriptor-based predictors. These results show that many datasets do not rule out non-chemical explanations of success. Progress requires routine falsification tests (e.g., group/time splits and metadata ablations), datasets designed to resist spurious correlations, and explicit separation of two goals: predictive utility versus evidence of chemical understanding.

其他(33篇)

【1】Assigning Confidence: K-partition Ensembles
标题:分配置信度:K分区集成
链接:https://arxiv.org/abs/2602.18435

作者:Aggelos Semoglou,John Pavlopoulos
备注:31 pages including appendix
摘要 :Clustering is widely used for unsupervised structure discovery, yet it offers limited insight into how reliable each individual assignment is. Diagnostics, such as convergence behavior or objective values, may reflect global quality, but they do not indicate whether particular instances are assigned confidently, especially for initialization-sensitive algorithms like k-means. This assignment-level instability can undermine both accuracy and robustness. Ensemble approaches improve global consistency by aggregating multiple runs, but they typically lack tools for quantifying pointwise confidence in a way that combines cross-run agreement with geometric support from the learned cluster structure. We introduce CAKE (Confidence in Assignments via K-partition Ensembles), a framework that evaluates each point using two complementary statistics computed over a clustering ensemble: assignment stability and consistency of local geometric fit. These are combined into a single, interpretable score in [0,1]. Our theoretical analysis shows that CAKE remains effective under noise and separates stable from unstable points. Experiments on synthetic and real-world datasets indicate that CAKE effectively highlights ambiguous points and stable core members, providing a confidence ranking that can guide filtering or prioritization to improve clustering quality.
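A toy version of the two statistics, cross-run co-assignment stability and a silhouette-like geometric margin, can be sketched as follows. The exact CAKE score is defined in the paper; the equal-weight combination below is a placeholder, and the mini k-means is only there to keep the example self-contained:

```python
import numpy as np

def kmeans(X, k, rng, iters=50):
    """Minimal Lloyd's algorithm with random-point initialization."""
    C = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        D = ((X[:, None] - C[None]) ** 2).sum(-1)
        lab = D.argmin(1)
        for j in range(k):
            if (lab == j).any():
                C[j] = X[lab == j].mean(0)
    return lab, C

def confidence_scores(X, k, runs=10, seed=0):
    """Per-point confidence in [0, 1]: average of (a) how deterministic each
    point's co-assignment with other points is across runs and (b) a mean
    distance margin between the two nearest centroids."""
    rng = np.random.default_rng(seed)
    n = len(X)
    co = np.zeros((n, n))
    margin = np.zeros(n)
    for _ in range(runs):
        lab, C = kmeans(X, k, rng)
        co += lab[:, None] == lab[None, :]
        D = np.sqrt(((X[:, None] - C[None]) ** 2).sum(-1))
        D.sort(1)  # D[:, 0] = own centroid distance, D[:, 1] = runner-up
        margin += (D[:, 1] - D[:, 0]) / (D[:, 0] + D[:, 1] + 1e-12)
    stability = (2.0 * np.abs(co / runs - 0.5)).mean(1)
    fit = (margin / runs + 1.0) / 2.0
    return 0.5 * stability + 0.5 * fit
```

Core members of well-separated clusters score near 1, while a point sitting between clusters is flagged with a lower score even when every run assigns it somewhere.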

【2】Leakage and Second-Order Dynamics Improve Hippocampal RNN Replay
标题:泄漏和二阶动力学改善海马RNN回放
链接:https://arxiv.org/abs/2602.18401

作者:Josue Casco-Rodriguez,Nanda H. Krishna,Richard G. Baraniuk
摘要:Biological neural networks (like the hippocampus) can internally generate "replay" resembling stimulus-driven activity. Recent computational models of replay use noisy recurrent neural networks (RNNs) trained to path-integrate. Replay in these networks has been described as Langevin sampling, but new modifiers of noisy RNN replay have surpassed this description. We re-examine noisy RNN replay as sampling to understand or improve it in three ways: (1) Under simple assumptions, we prove that the gradients replay activity should follow are time-varying and difficult to estimate, but readily motivate the use of hidden state leakage in RNNs for replay. (2) We confirm that hidden state adaptation (negative feedback) encourages exploration in replay, but show that it incurs non-Markov sampling that also slows replay. (3) We propose the first model of temporally compressed replay in noisy path-integrating RNNs through hidden state momentum, connect it to underdamped Langevin sampling, and show that, together with adaptation, it counters slowness while maintaining exploration. We verify our findings via path-integration of 2D triangular and T-maze paths and of high-dimensional paths of synthetic rat place cell activity.
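The underdamped Langevin dynamics that the momentum mechanism is connected to can be sketched with a textbook Euler discretization. This is a generic sampler on a scalar potential, not the noisy path-integrating RNN itself:

```python
import numpy as np

def underdamped_langevin(grad_u, steps=200_000, dt=0.01, gamma=1.0, seed=0):
    """Euler discretization of underdamped Langevin dynamics targeting
    exp(-U); the velocity variable is the analogue of the hidden-state
    momentum discussed in the text, and gamma plays the role of friction."""
    rng = np.random.default_rng(seed)
    x, v = 0.0, 0.0
    xs = np.empty(steps)
    noise = np.sqrt(2.0 * gamma * dt)
    for t in range(steps):
        v += (-gamma * v - grad_u(x)) * dt + noise * rng.standard_normal()
        x += v * dt
        xs[t] = x
    return xs

xs = underdamped_langevin(lambda x: x)  # U(x) = x^2 / 2, stationary Var[x] = 1
```

Lowering gamma makes trajectories more ballistic (the "temporal compression" regime), while the injected noise keeps exploration alive.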

【3】JPmHC Dynamical Isometry via Orthogonal Hyper-Connections
标题:通过正交超连接实现JPmHC动态等距
链接:https://arxiv.org/abs/2602.18308

作者:Biswa Sengupta,Jinhua Wang,Leo Brunswic
摘要:Recent advances in deep learning, exemplified by Hyper-Connections (HC), have expanded the residual connection paradigm by introducing wider residual streams and diverse connectivity patterns. While these innovations yield significant performance gains, they compromise the identity mapping property of residual connections, leading to training instability, limited scalability, and increased memory overhead. To address these challenges, we propose JPmHC (Jacobian-spectrum Preserving manifold-constrained Hyper-Connections), a framework that replaces identity skips with a trainable linear mixer acting on n parallel streams while explicitly controlling gradient conditioning. By constraining the mixer M on operator-norm-bounded manifolds (e.g., bistochastic, Stiefel, Grassmann), JPmHC prevents gradient pathologies and enhances stability. JPmHC introduces three key contributions: (i) a free-probability analysis that predicts Jacobian spectra for structured skips, providing actionable design rules for mixer selection; (ii) memory-efficient implicit differentiation for fixed-point projections, reducing activation memory and synchronization overhead; and (iii) a Stiefel-constrained mixer via Cayley transforms, ensuring orthogonality without post-hoc normalization. Empirical evaluations on ARC-AGI demonstrate that JPmHC achieves faster convergence, higher accuracy, and lower computational cost compared to bistochastic baselines. As a flexible and scalable extension of HC, JPmHC advances spectrum-aware, stable, and efficient deep learning, offering insights into topological architecture design and foundational model evolution.
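The Stiefel-constrained mixer via the Cayley transform is the most self-contained of the three ingredients: any unconstrained parameter matrix maps to an exactly orthogonal mixer with unit operator norm. A minimal sketch (numpy stand-in for the trainable layer):

```python
import numpy as np

def cayley_orthogonal(A):
    """Map an arbitrary square matrix to an orthogonal one via the Cayley
    transform of its skew-symmetric part: Q = (I - S)(I + S)^(-1).
    I + S is always invertible because S has purely imaginary eigenvalues."""
    S = (A - A.T) / 2.0
    I = np.eye(A.shape[0])
    return (I - S) @ np.linalg.inv(I + S)
```

Because Q is orthogonal by construction, no post-hoc normalization is needed, and the Jacobian spectrum of the skip path stays on the unit circle.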

【4】VeriSoftBench: Repository-Scale Formal Verification Benchmarks for Lean
标题:VeriSoftBench:面向Lean的存储库规模形式化验证基准
链接:https://arxiv.org/abs/2602.18307

作者:Yutong Xin,Qiaochu Chen,Greg Durrett,Işil Dillig
摘要:Large language models have achieved striking results in interactive theorem proving, particularly in Lean. However, most benchmarks for LLM-based proof automation are drawn from mathematics in the Mathlib ecosystem, whereas proofs in software verification are developed inside definition-rich codebases with substantial project-specific libraries. We introduce VeriSoftBench, a benchmark of 500 Lean 4 proof obligations drawn from open-source formal-methods developments and packaged to preserve realistic repository context and cross-file dependencies. Our evaluation of frontier LLMs and specialized provers yields three observations. First, provers tuned for Mathlib-style mathematics transfer poorly to this repository-centric setting. Second, success is strongly correlated with transitive repository dependence: tasks whose proofs draw on large, multi-hop dependency closures are less likely to be solved. Third, providing curated context restricted to a proof's dependency closure improves performance relative to exposing the full repository, but nevertheless leaves substantial room for improvement. Our benchmark and evaluation suite are released at https://github.com/utopia-group/VeriSoftBench.

【5】On the Semantic and Syntactic Information Encoded in Proto-Tokens for One-Step Text Reconstruction
标题:用于一步文本重建的原始标记中编码的语义和语法信息
链接:https://arxiv.org/abs/2602.18301

作者:Ivan Bondarenko,Egor Palkin,Fedor Tikunov
摘要:Autoregressive large language models (LLMs) generate text token-by-token, requiring n forward passes to produce a sequence of length n. Recent work, Exploring the Latent Capacity of LLMs for One-Step Text Reconstruction (Mezentsev and Oseledets), shows that frozen LLMs can reconstruct hundreds of tokens from only two learned proto-tokens in a single forward pass, suggesting a path beyond the autoregressive paradigm. In this paper, we study what information these proto-tokens encode and how they behave under reconstruction and controlled constraints. We perform a series of experiments aimed at disentangling semantic and syntactic content in the two proto-tokens, analyzing stability properties of the e-token, and visualizing attention patterns to the e-token during reconstruction. Finally, we test two regularization schemes for "imposing" semantic structure on the e-token using teacher embeddings, including an anchor-based loss and a relational distillation objective. Our results indicate that the m-token tends to capture semantic information more strongly than the e-token under standard optimization; anchor-based constraints trade off sharply with reconstruction accuracy; and relational distillation can transfer batch-level semantic relations into the proto-token space without sacrificing reconstruction quality, supporting the feasibility of future non-autoregressive seq2seq systems that predict proto-tokens as an intermediate representation.

【6】Analyzing and Improving Chain-of-Thought Monitorability Through Information Theory
标题:利用信息论分析和提高思维链的可监控性
链接:https://arxiv.org/abs/2602.18297

作者:Usman Anwar,Tim Bakker,Dana Kianfar,Cristina Pinneri,Christos Louizos
备注:First two authors contributed equally
摘要 :Chain-of-thought (CoT) monitors are LLM-based systems that analyze reasoning traces to detect when outputs may exhibit attributes of interest, such as test-hacking behavior during code generation. In this paper, we use information-theoretic analysis to show that non-zero mutual information between CoT and output is a necessary but not sufficient condition for CoT monitorability. We identify two sources of approximation error that may undermine the performance of CoT monitors in practice: information gap, which measures the extent to which the monitor can extract the information available in CoT, and elicitation error, which measures the extent to which the monitor approximates the optimal monitoring function. We further demonstrate that CoT monitorability can be systematically improved through targeted training objectives. To this end, we propose two complementary approaches: (a) an oracle-based method that directly rewards the monitored model for producing CoTs that maximize monitor accuracy, and (b) a more practical, label-free approach that maximizes conditional mutual information between outputs and CoTs. Across multiple different environments, we show both methods significantly improve monitor accuracy while preventing CoT degeneration even when training against a monitor, thereby mitigating reward hacking when the task reward is imperfectly specified.
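The necessary condition, non-zero mutual information between CoT and output, is easy to check numerically for discrete toy distributions. A generic plug-in estimator from a joint probability table (no claim about the estimator used in the paper):

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in nats computed from a joint probability table p_xy,
    I = sum_{x,y} p(x,y) * log(p(x,y) / (p(x) p(y)))."""
    px = p_xy.sum(axis=1, keepdims=True)
    py = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0  # 0 * log 0 contributes nothing
    return float((p_xy[mask] * np.log(p_xy[mask] / (px @ py)[mask])).sum())
```

If the table factorizes (CoT carries no information about the output), the estimate is zero and no monitor, however well elicited, can do better than chance.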

【7】Decoding as Optimisation on the Probability Simplex: From Top-K to Top-P (Nucleus) to Best-of-K Samplers
标题:作为概率单纯形上优化的解码:从Top-K到Top-P(核采样)再到Best-of-K采样器
链接:https://arxiv.org/abs/2602.18292

作者:Xiaotong Ji,Rasul Tutunov,Matthieu Zimmer,Haitham Bou-Ammar
摘要:Decoding sits between a language model and everything we do with it, yet it is still treated as a heuristic knob-tuning exercise. We argue decoding should be understood as a principled optimisation layer: at each token, we solve a regularised problem over the probability simplex that trades off model score against structural preferences and constraints. This single template recovers greedy decoding, Softmax sampling, Top-K, Top-P, and Sparsemax-style sparsity as special cases, and explains their common structure through optimality conditions. More importantly, the framework makes it easy to invent new decoders without folklore. We demonstrate this by designing Best-of-K (BoK), a KL-anchored coverage objective aimed at multi-sample pipelines (self-consistency, reranking, verifier selection). BoK targets the probability of covering good alternatives within a fixed K-sample budget and improves empirical performance. We show that such samples can improve accuracy by, for example, +18.6% for Qwen2.5-Math-7B on MATH500 at high sampling temperatures.
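In the unified view, Top-P selects the smallest high-probability token set whose mass reaches p and renormalizes over it. Operationally that is standard nucleus sampling (shown below as a sketch; the paper's BoK objective is a different, coverage-targeted special case of the same template):

```python
import numpy as np

def top_p_renormalize(probs, p=0.9):
    """Nucleus (Top-P) distribution: keep the smallest set of tokens whose
    cumulative probability reaches p, renormalize, zero out the rest."""
    order = np.argsort(probs)[::-1]          # tokens by descending probability
    csum = np.cumsum(probs[order])
    cutoff = np.searchsorted(csum, p) + 1    # smallest prefix with mass >= p
    out = np.zeros_like(probs)
    keep = order[:cutoff]
    out[keep] = probs[keep] / probs[keep].sum()
    return out
```

Greedy decoding, Top-K, and Softmax sampling drop out of the same template by swapping the constraint set and regularizer.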

【8】PRISM: Parallel Reward Integration with Symmetry for MORL
标题:PRISM:用于MORL的具有对称性的并行奖励集成
链接:https://arxiv.org/abs/2602.18277

作者:Finn van der Knaap,Kejiang Qian,Zheng Xu,Fengxiang He
摘要:This work studies heterogeneous Multi-Objective Reinforcement Learning (MORL), where objectives can differ sharply in temporal frequency. Such heterogeneity allows dense objectives to dominate learning, while sparse long-horizon rewards receive weak credit assignment, leading to poor sample efficiency. We propose a Parallel Reward Integration with Symmetry (PRISM) algorithm that enforces reflectional symmetry as an inductive bias in aligning reward channels. PRISM introduces ReSymNet, a theory-motivated model that reconciles temporal-frequency mismatches across objectives, using residual blocks to learn a scaled opportunity value that accelerates exploration while preserving the optimal policy. We also propose SymReg, a reflectional equivariance regulariser that enforces agent mirroring and constrains policy search to a reflection-equivariant subspace. This restriction provably reduces hypothesis complexity and improves generalisation. Across MuJoCo benchmarks, PRISM consistently outperforms both a sparse-reward baseline and an oracle trained with full dense rewards, improving Pareto coverage and distributional balance: it achieves hypervolume gains exceeding 100% over the baseline and up to 32% over the oracle. The code is at https://github.com/EVIEHub/PRISM.

【9】Variational Distributional Neuron
标题:变分分布神经元
链接:https://arxiv.org/abs/2602.18250

作者:Yves Ruffenach
备注:29 pages, 7 figures. Code available at GitHub (link in paper)
摘要:We propose a proof of concept for a variational distributional neuron: a compute unit formulated as a VAE brick, explicitly carrying a prior, an amortized posterior and a local ELBO. The unit is no longer a deterministic scalar but a distribution: computing is no longer about propagating values, but about contracting a continuous space of possibilities under constraints. Each neuron parameterizes a posterior, propagates a reparameterized sample and is regularized by the KL term of a local ELBO - hence, the activation is distributional. This "contraction" becomes testable through local constraints and can be monitored via internal measures. The amount of contextual information carried by the unit, as well as the temporal persistence of this information, are locally tuned by distinct constraints. This proposal addresses a structural tension: in sequential generation, causality is predominantly organized in the symbolic space and, even when latents exist, they often remain auxiliary, while the effective dynamics are carried by a largely deterministic decoder. In parallel, probabilistic latent models capture factors of variation and uncertainty, but that uncertainty typically remains borne by global or parametric mechanisms, while units continue to propagate scalars - hence the pivot question: if uncertainty is intrinsic to computation, why does the compute unit not carry it explicitly? We therefore draw two axes: (i) the composition of probabilistic constraints, which must be made stable, interpretable and controllable; and (ii) granularity: if inference is a negotiation of distributions under constraints, should the primitive unit remain deterministic or become distributional? We analyze "collapse" modes and the conditions for a "living neuron", then extend the contribution over time via autoregressive priors over the latent, per unit.
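The core primitive, a compute unit that outputs a reparameterized sample and carries its own local KL term against a standard-normal prior, can be sketched concretely. This toy (names, shapes, and the diagonal-Gaussian choice are illustrative assumptions, not the paper's exact parameterization):

```python
import numpy as np

class DistributionalNeuron:
    """Toy distributional unit: the activation is a sample from an amortized
    posterior q(z|x) = N(mu(x), exp(logvar(x))), plus a local KL(q || N(0,1))
    that serves as the unit's contribution to a local ELBO."""
    def __init__(self, dim, rng):
        self.w_mu = rng.standard_normal(dim) / np.sqrt(dim)
        self.w_lv = rng.standard_normal(dim) / np.sqrt(dim)
        self.rng = rng

    def forward(self, x):
        mu = x @ self.w_mu
        logvar = x @ self.w_lv
        eps = self.rng.standard_normal(np.shape(mu))
        z = mu + np.exp(0.5 * logvar) * eps  # reparameterized sample
        # closed-form KL of a diagonal Gaussian against N(0, 1), per input
        kl = 0.5 * (np.exp(logvar) + mu ** 2 - 1.0 - logvar)
        return z, kl
```

The KL term is zero exactly when the posterior collapses onto the prior, which is the "collapse" mode the paper's constraints are designed to detect and avoid.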

【10】Neural-HSS: Hierarchical Semi-Separable Neural PDE Solver
标题:Neural-HSS:分层半可分离神经PDE求解器
链接:https://arxiv.org/abs/2602.18248

作者:Pietro Sittoni,Emanuele Zangrando,Angelo A. Casulli,Nicola Guglielmi,Francesco Tudisco
摘要:Deep learning-based methods have shown remarkable effectiveness in solving PDEs, largely due to their ability to enable fast simulations once trained. However, despite the availability of high-performance computing infrastructure, many critical applications remain constrained by the substantial computational costs associated with generating large-scale, high-quality datasets and training models. In this work, inspired by studies on the structure of Green's functions for elliptic PDEs, we introduce Neural-HSS, a parameter-efficient architecture built upon the Hierarchical Semi-Separable (HSS) matrix structure that is provably data-efficient for a broad class of PDEs. We theoretically analyze the proposed architecture, proving that it satisfies exactness properties even in very low-data regimes. We also investigate its connections with other architectural primitives, such as the Fourier neural operator layer and convolutional layers. We experimentally validate the data efficiency of Neural-HSS on the three-dimensional Poisson equation over a grid of two million points, demonstrating its superior ability to learn from data generated by elliptic PDEs in the low-data regime while outperforming baseline methods. Finally, we demonstrate its capability to learn from data arising from a broad class of PDEs in diverse domains, including electromagnetism, fluid dynamics, and biology.

【11】SimVLA: A Simple VLA Baseline for Robotic Manipulation
标题:SimVLA:机器人操纵的简单VLA基线
链接:https://arxiv.org/abs/2602.18224

作者:Yuankai Luo,Woping Chen,Tong Liang,Baiqiao Wang,Zhenguo Li
摘要 :Vision-Language-Action (VLA) models have emerged as a promising paradigm for general-purpose robotic manipulation, leveraging large-scale pre-training to achieve strong performance. The field has rapidly evolved with additional spatial priors and diverse architectural innovations. However, these advancements are often accompanied by varying training recipes and implementation details, which can make it challenging to disentangle the precise source of empirical gains. In this work, we introduce SimVLA, a streamlined baseline designed to establish a transparent reference point for VLA research. By strictly decoupling perception from control, using a standard vision-language backbone and a lightweight action head, and standardizing critical training dynamics, we demonstrate that a minimal design can achieve state-of-the-art performance. Despite having only 0.5B parameters, SimVLA outperforms multi-billion-parameter models on standard simulation benchmarks without robot pretraining. SimVLA also reaches on-par real-robot performance compared to pi0.5. Our results establish SimVLA as a robust, reproducible baseline that enables clear attribution of empirical gains to future architectural innovations. Website: https://frontierrobo.github.io/SimVLA

【12】SOMtime the World Ain't Fair: Violating Fairness Using Self-Organizing Maps
标题:SOMtime the World Ain't Fair:使用自组织映射违反公平性
链接:https://arxiv.org/abs/2602.18201

作者:Joseph Bingham,Netanel Arussy,Dvir Aran
备注:10 pages, 2 figures, preprint
摘要:Unsupervised representations are widely assumed to be neutral with respect to sensitive attributes when those attributes are withheld from training. We show that this assumption is false. Using SOMtime, a topology-preserving representation method based on high-capacity Self-Organizing Maps, we demonstrate that sensitive attributes such as age and income emerge as dominant latent axes in purely unsupervised embeddings, even when explicitly excluded from the input. On two large-scale real-world datasets (the World Values Survey across five countries and the Census-Income dataset), SOMtime recovers monotonic orderings aligned with withheld sensitive attributes, achieving Spearman correlations of up to 0.85, whereas PCA and UMAP typically remain below 0.23 (with a single exception reaching 0.31), and against t-SNE and autoencoders which achieve at most 0.34. Furthermore, unsupervised segmentation of SOMtime embeddings produces demographically skewed clusters, demonstrating downstream fairness risks without any supervised task. These findings establish that "fairness through unawareness" fails at the representation level for ordinal sensitive attributes and that fairness auditing must extend to unsupervised components of machine learning pipelines. We have made the code available at https://github.com/JosephBingham/SOMtime

【13】Capabilities Ain't All You Need: Measuring Propensities in AI
标题:能力并非您所需要的全部:衡量人工智能的倾向
链接:https://arxiv.org/abs/2602.18182

作者:Daniel Romero-Alvarado,Fernando Martínez-Plumed,Lorenzo Pacchiardi,Hugo Save,Siddhesh Milind Pawar,Behzad Mehrbakhsh,Pablo Antonio Moreno Casares,Ben Slater,Paolo Bova,Peter Romero,Zachary R. Tyler,Jonathan Prunty,Luning Sun,Jose Hernandez-Orallo
摘要:AI evaluation has primarily focused on measuring capabilities, with formal approaches inspired from Item Response Theory (IRT) being increasingly applied. Yet propensities - the tendencies of models to exhibit particular behaviours - play a central role in determining both performance and safety outcomes. However, traditional IRT describes a model's success on a task as a monotonic function of model capabilities and task demands, an approach unsuited to propensities, where both excess and deficiency can be problematic. Here, we introduce the first formal framework for measuring AI propensities by using a bilogistic formulation for model success, which attributes high success probability when the model's propensity is within an "ideal band". Further, we estimate the limits of the ideal band using LLMs equipped with newly developed task-agnostic rubrics. Applying our framework to six families of LLM models whose propensities are incited in either direction, we find that we can measure how much the propensity is shifted and what effect this has on the tasks. Critically, propensities estimated using one benchmark successfully predict behaviour on held-out tasks. Moreover, we obtain stronger predictive power when combining propensities and capabilities than either separately. More broadly, our framework showcases how rigorous propensity measurements can be conducted and how it yields gains over solely using capability evaluations to predict AI behaviour.
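The bilogistic idea, high success probability only when the propensity lies inside an "ideal band", has a simple closed form: a rising logistic at the lower limit multiplied by a falling logistic at the upper limit. A sketch (the sharpness parameter and the product form are illustrative assumptions about the paper's formulation):

```python
import numpy as np

def bilogistic_success(theta, low, high, a=5.0):
    """P(success) as a function of propensity theta: near 1 inside the ideal
    band [low, high], falling to 0 on either side. The parameter a controls
    how sharply the band edges cut off."""
    rise = 1.0 / (1.0 + np.exp(-a * (theta - low)))   # penalizes deficiency
    fall = 1.0 / (1.0 + np.exp(-a * (high - theta)))  # penalizes excess
    return rise * fall
```

Unlike the monotonic success curves of classical IRT, this function is unimodal, so both too little and too much of a propensity lower the success probability.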

【14】Rethinking Beam Management: Generalization Limits Under Hardware Heterogeneity
标题:重新思考波束管理:硬件异构性下的泛化极限
链接:https://arxiv.org/abs/2602.18151

作者:Nikita Zeulin,Olga Galinina,Ibrahim Kilinc,Sergey Andreev,Robert W. Heath
备注:This work has been submitted to the IEEE for possible publication
摘要:Hardware heterogeneity across diverse user devices poses new challenges for beam-based communication in 5G and beyond. This heterogeneity limits the applicability of machine learning (ML)-based algorithms. This article highlights the critical need to treat hardware heterogeneity as a first-class design concern in ML-aided beam management. We analyze key failure modes in the presence of heterogeneity and present case studies demonstrating their performance impact. Finally, we discuss potential strategies to improve generalization in beam management.

【15】MeanVoiceFlow: One-step Nonparallel Voice Conversion with Mean Flows
标题:MeanVoiceFlow:基于平均流(Mean Flow)的一步式非平行语音转换
链接:https://arxiv.org/abs/2602.18104

作者:Takuhiro Kaneko,Hirokazu Kameoka,Kou Tanaka,Yuto Kondo
备注:Accepted to ICASSP 2026. Project page: https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/meanvoiceflow/
摘要:In voice conversion (VC) applications, diffusion and flow-matching models have exhibited exceptional speech quality and speaker similarity performances. However, they are limited by slow conversion owing to their iterative inference. Consequently, we propose MeanVoiceFlow, a novel one-step nonparallel VC model based on mean flows, which can be trained from scratch without requiring pretraining or distillation. Unlike conventional flow matching that uses instantaneous velocity, mean flows employ average velocity to more accurately compute the time integral along the inference path in a single step. However, training the average velocity requires its derivative to compute the target velocity, which can cause instability. Therefore, we introduce a structural margin reconstruction loss as a zero-input constraint, which moderately regularizes the input-output behavior of the model without harmful statistical averaging. Furthermore, we propose conditional diffused-input training in which a mixture of noise and source data is used as input to the model during both training and inference. This enables the model to effectively leverage source information while maintaining consistency between training and inference. Experimental results validate the effectiveness of these techniques and demonstrate that MeanVoiceFlow achieves performance comparable to that of previous multi-step and distillation-based models, even when trained from scratch. Audio samples are available at https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/meanvoiceflow/.

【16】DohaScript: A Large-Scale Multi-Writer Dataset for Continuous Handwritten Hindi Text
标题:DohaScript:用于连续手写印地语文本的大规模多作家数据集
链接:https://arxiv.org/abs/2602.18089

作者:Kunwar Arpit Singh,Ankush Prakash,Haroon R Lone
摘要:Despite having hundreds of millions of speakers, handwritten Devanagari text remains severely underrepresented in publicly available benchmark datasets. Existing resources are limited in scale, focus primarily on isolated characters or short words, and lack controlled lexical content and writer-level diversity, which restricts their utility for modern data-driven handwriting analysis. As a result, they fail to capture the continuous, fused, and structurally complex nature of Devanagari handwriting, where characters are connected through a shared shirorekha (horizontal headline) and exhibit rich ligature formations. We introduce DohaScript, a large-scale, multi-writer dataset of handwritten Hindi text collected from 531 unique contributors. The dataset is designed as a parallel stylistic corpus, in which all writers transcribe the same fixed set of six traditional Hindi dohas (couplets). This controlled design enables systematic analysis of writer-specific variation independent of linguistic content, and supports tasks such as handwriting recognition, writer identification, style analysis, and generative modeling. The dataset is accompanied by non-identifiable demographic metadata, rigorous quality curation based on objective sharpness and resolution criteria, and page-level layout difficulty annotations that facilitate stratified benchmarking. Baseline experiments demonstrate clear quality separation and strong generalization to unseen writers, highlighting the dataset's reliability and practical value. DohaScript is intended to serve as a standardized and reproducible benchmark for advancing research on continuous handwritten Devanagari text in low-resource script settings.

【17】Deepmechanics
标题:深度力学(Deepmechanics)
链接:https://arxiv.org/abs/2602.18060

作者:Abhay Shinde,Aryan Amit Barsainyan,Jose Siguenza,Ankita Vaishnobi Bisoi,Rakshit Kr. Singh,Bharath Ramsundar
备注:11 pages, 7 figures, Submitted to KDD 2026
摘要:Physics-informed deep learning models have emerged as powerful tools for learning dynamical systems. These models directly encode physical principles into network architectures. However, systematic benchmarking of these approaches across diverse physical phenomena remains limited, particularly for conservative and dissipative systems. In addition, benchmarking done thus far does not integrate full trajectories to check stability. In this work, we benchmark three prominent physics-informed architectures: Hamiltonian Neural Networks (HNN), Lagrangian Neural Networks (LNN), and Symplectic Recurrent Neural Networks (SRNN), using the DeepChem framework, an open-source scientific machine learning library. We evaluate these models on six dynamical systems spanning classical conservative mechanics (mass-spring system, simple pendulum, double pendulum, spring-pendulum, and the three-body problem) and non-conservative systems with contact (bouncing ball). We evaluate models by computing error on predicted trajectories, both quantitatively and qualitatively. We find that all benchmarked models struggle to maintain stability for chaotic or nonconservative systems. Our results suggest that more research is needed for physics-informed deep learning models to learn robust models of classical mechanical systems.

【18】Hardware-Friendly Input Expansion for Accelerating Function Approximation
标题:用于加速函数逼近的硬件友好输入扩展
链接:https://arxiv.org/abs/2602.17952

作者:Hu Lou,Yin-Jun Gao,Dong-Xiao Zhang,Tai-Jiao Du,Jun-Jie Zhang,Jia-Rui Zhang
备注:22 pages, 4 figures
摘要:One-dimensional function approximation is a fundamental problem in scientific computing and engineering applications. While neural networks possess powerful universal approximation capabilities, their optimization process is often hindered by flat loss landscapes induced by parameter-space symmetries, leading to slow convergence and poor generalization, particularly for high-frequency components. Inspired by the principle of \emph{symmetry breaking} in physics, this paper proposes a hardware-friendly approach for function approximation through \emph{input-space expansion}. The core idea involves augmenting the original one-dimensional input (e.g., $x$) with constant values (e.g., $π$) to form a higher-dimensional vector (e.g., $[π, π, x, π, π]$), effectively breaking parameter symmetries without increasing the network's parameter count. We evaluate the method on ten representative one-dimensional functions, including smooth, discontinuous, high-frequency, and non-differentiable functions. Experimental results demonstrate that input-space expansion significantly accelerates training convergence (reducing LBFGS iterations by 12\% on average) and enhances approximation accuracy (reducing final MSE by 66.3\% for the optimal 5D expansion). Ablation studies further reveal the effects of different expansion dimensions and constant selections, with $π$ consistently outperforming other constants. Our work proposes a low-cost, efficient, and hardware-friendly technique for algorithm design.
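The expansion itself is a one-liner; a sketch following the $[π, π, x, π, π]$ layout from the abstract (batch shape and the choice of middle slot are illustrative):

```python
import numpy as np

def expand_input(x, dim=5, const=np.pi):
    """Map scalar inputs x of shape [N] to [N, dim]: constants with x in the middle."""
    out = np.full((len(x), dim), const)
    out[:, dim // 2] = x
    return out

x = np.linspace(0.0, 1.0, 4)
print(expand_input(x))   # each row reads [pi, pi, x_i, pi, pi]
```

The network's parameter count is unchanged by this preprocessing; only the first-layer weight symmetry is broken, which is the paper's stated mechanism.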

【19】A Geometric Probe of the Accuracy-Robustness Trade-off: Sharp Boundaries in Symmetry-Breaking Dimensional Expansion
标题:准确性-鲁棒性权衡的几何探针:对称性破缺维度扩展中的尖锐边界
链接:https://arxiv.org/abs/2602.17948

作者:Yu Bai,Zhe Wang,Jiarui Zhang,Dong-Xiao Zhang,Yinjun Gao,Jun-Jie Zhang
备注:22 pages, 3 figures
摘要:The trade-off between clean accuracy and adversarial robustness is a pervasive phenomenon in deep learning, yet its geometric origin remains elusive. In this work, we utilize Symmetry-Breaking Dimensional Expansion (SBDE) as a controlled probe to investigate the mechanism underlying this trade-off. SBDE expands input images by inserting constant-valued pixels, which breaks translational symmetry and consistently improves clean accuracy (e.g., from $90.47\%$ to $95.63\%$ on CIFAR-10 with ResNet-18) by reducing parameter degeneracy. However, this accuracy gain comes at the cost of reduced robustness against iterative white-box attacks. By employing a test-time \emph{mask projection} that resets the inserted auxiliary pixels to their training values, we demonstrate that the vulnerability stems almost entirely from the inserted dimensions. The projection effectively neutralizes the attacks and restores robustness, revealing that the model achieves high accuracy by creating \emph{sharp boundaries} (steep loss gradients) specifically along the auxiliary axes. Our findings provide a concrete geometric explanation for the accuracy-robustness paradox: the optimization landscape deepens the basin of attraction to improve accuracy but inevitably erects steep walls along the auxiliary degrees of freedom, creating a fragile sensitivity to off-manifold perturbations.
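The test-time mask projection amounts to resetting the inserted auxiliary pixels to their training-time constants before inference; a minimal sketch (image shape, mask pattern, and the constant value are illustrative, not the paper's setup):

```python
import numpy as np

def mask_projection(x_adv, aux_mask, const=0.5):
    """Reset auxiliary (inserted) pixels to their training-time value; keep the rest."""
    x = x_adv.copy()
    x[aux_mask] = const
    return x

rng = np.random.default_rng(0)
img = rng.random((4, 4))             # stand-in for an adversarially perturbed input
aux = np.zeros((4, 4), dtype=bool)
aux[:, ::2] = True                   # columns that held inserted constants in training
clean = mask_projection(img, aux)
print(clean)                         # inserted columns are back to 0.5
```

Since the abstract attributes the vulnerability almost entirely to the inserted dimensions, zeroing out perturbations along them neutralizes the attack while leaving the genuine pixels untouched.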

【20】Tighter Regret Lower Bound for Gaussian Process Bandits with Squared Exponential Kernel in Hypersphere
标题:超球面上具有平方指数核的高斯过程赌博机的更紧遗憾下界
链接:https://arxiv.org/abs/2602.17940

作者:Shogo Iwazaki
备注:27 pages, 2 figures
摘要:We study an algorithm-independent, worst-case lower bound for the Gaussian process (GP) bandit problem in the frequentist setting, where the reward function is fixed and has a bounded norm in the known reproducing kernel Hilbert space (RKHS). Specifically, we focus on the squared exponential (SE) kernel, one of the most widely used kernel functions in GP bandits. One of the remaining open questions for this problem is the gap in the \emph{dimension-dependent} logarithmic factors between upper and lower bounds. This paper partially resolves this open question under a hyperspherical input domain. We show that any algorithm suffers $Ω(\sqrt{T (\ln T)^{d} (\ln \ln T)^{-d}})$ cumulative regret, where $T$ and $d$ represent the total number of steps and the dimension of the hyperspherical domain, respectively. Regarding the simple regret, we show that any algorithm requires $Ω(ε^{-2}(\ln \frac{1}ε)^d (\ln \ln \frac{1}ε)^{-d})$ time steps to find an $ε$-optimal point. We also provide the improved $O((\ln T)^{d+1}(\ln \ln T)^{-d})$ upper bound on the maximum information gain for the SE kernel. Our results guarantee the optimality of the existing best algorithm up to \emph{dimension-independent} logarithmic factors under a hyperspherical input domain.

【21】Latent Diffeomorphic Co-Design of End-Effectors for Deformable and Fragile Object Manipulation
标题:用于可变形与易碎物体操纵的末端执行器的潜在微分同胚协同设计
链接:https://arxiv.org/abs/2602.17921

作者:Kei Ikemura,Yifei Dong,Florian T. Pokorny
摘要:Manipulating deformable and fragile objects remains a fundamental challenge in robotics due to complex contact dynamics and strict requirements on object integrity. Existing approaches typically optimize either end-effector design or control strategies in isolation, limiting achievable performance. In this work, we present the first co-design framework that jointly optimizes end-effector morphology and manipulation control for deformable and fragile object manipulation. We introduce (1) a latent diffeomorphic shape parameterization enabling expressive yet tractable end-effector geometry optimization, (2) a stress-aware bi-level co-design pipeline coupling morphology and control optimization, and (3) a privileged-to-pointcloud policy distillation scheme for zero-shot real-world deployment. We evaluate our approach on challenging food manipulation tasks, including grasping and pushing jelly and scooping fillets. Simulation and real-world experiments demonstrate the effectiveness of the proposed method.

【22】Dual Length Codes for Lossless Compression of BFloat16
标题:BFloat16的双长码无损压缩
链接:https://arxiv.org/abs/2602.17849

作者:Aditya Agrawal,Albert Magyar,Hiteshwar Eswaraiah,Patrick Sheridan,Pradeep Janedula,Ravi Krishnan Venkatesan,Krishna Nair,Ravi Iyer
备注:6 pages, 5 figures
摘要:Training and serving Large Language Models (LLMs) relies heavily on parallelization and collective operations, which are frequently bottlenecked by network bandwidth. Lossless compression using, e.g., Huffman codes can alleviate the issue; however, Huffman codes suffer from slow, bit-sequential decoding and high hardware complexity due to deep tree traversals. Universal codes, e.g., Exponential-Golomb codes, are faster to decode but do not exploit the symbol frequency distributions. To address these limitations, this paper introduces Dual Length Codes, a hybrid approach designed to balance compression efficiency with decoding speed. Analyzing BFloat16 tensors from the Gemma model, we observed that the top 8 most frequent symbols account for approximately 50% of the cumulative probability. These 8 symbols are assigned a short 4-bit code. The remaining 248 symbols are assigned a longer 9-bit code. The coding scheme uses a single prefix bit to distinguish between the two code lengths. The scheme uses a small Look-Up Table with only 8 entries for encoding and decoding. The scheme achieves a compressibility of 18.6% in comparison to 21.3% achieved by Huffman codes, but it significantly speeds up the decoding and simplifies the hardware complexity.
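The scheme is simple enough to sketch end-to-end: a 0 prefix plus a 3-bit table index for the 8 hot symbols, a 1 prefix plus the raw 8-bit symbol otherwise. The hot-symbol set below is made up; in the paper it is driven by measured BFloat16 byte frequencies:

```python
HOT = [0x3F, 0x40, 0xBF, 0xC0, 0x3E, 0xBE, 0x00, 0x80]  # 8 most frequent bytes (made up)
HOT_INDEX = {s: i for i, s in enumerate(HOT)}

def encode(symbols):
    """Dual-length encode: '0' + 3-bit index for hot symbols, '1' + raw 8 bits otherwise."""
    bits = []
    for s in symbols:
        if s in HOT_INDEX:
            bits.append("0" + format(HOT_INDEX[s], "03b"))  # 4-bit code
        else:
            bits.append("1" + format(s, "08b"))             # 9-bit code
    return "".join(bits)

def decode(bits):
    """Read the prefix bit, then either a 3-bit table index or a raw 8-bit symbol."""
    out, i = [], 0
    while i < len(bits):
        if bits[i] == "0":
            out.append(HOT[int(bits[i + 1:i + 4], 2)])
            i += 4
        else:
            out.append(int(bits[i + 1:i + 9], 2))
            i += 9
    return out

data = [0x3F, 0x40, 0x7A, 0x00, 0x13]
code = encode(data)
assert decode(code) == data
print(f"{len(data) * 8} bits -> {len(code)} bits")  # prints "40 bits -> 30 bits"
```

Because every codeword starts with a single discriminating bit, a decoder needs only an 8-entry table and fixed-width field extraction, which is what makes the scheme hardware-friendly compared with variable-depth Huffman trees.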

【23】Avoid What You Know: Divergent Trajectory Balance for GFlowNets
标题:避免你所知道的:GFlowNets的发散轨迹平衡
链接:https://arxiv.org/abs/2602.17827

作者:Pedro Dall'Antonia,Tiago da Silva,Daniel Csillag,Salem Lahlou,Diego Mesquita
备注:20 pages, under review
摘要:Generative Flow Networks (GFlowNets) are a flexible family of amortized samplers trained to generate discrete and compositional objects with probability proportional to a reward function. However, learning efficiency is constrained by the model's ability to rapidly explore diverse high-probability regions during training. To mitigate this issue, recent works have focused on incentivizing the exploration of unvisited and valuable states via curiosity-driven search and self-supervised random network distillation, which tend to waste samples on already well-approximated regions of the state space. In this context, we propose Adaptive Complementary Exploration (ACE), a principled algorithm for the effective exploration of novel and high-probability regions when learning GFlowNets. To achieve this, ACE introduces an exploration GFlowNet explicitly trained to search for high-reward states in regions underexplored by the canonical GFlowNet, which learns to sample from the target distribution. Through extensive experiments, we show that ACE significantly improves upon prior work in terms of approximation accuracy to the target distribution and discovery rate of diverse high-reward states.

【24】Grassmannian Mixture-of-Experts: Concentration-Controlled Routing on Subspace Manifolds
标题:格拉斯曼专家混合:子空间流形上的浓度控制路由
链接:https://arxiv.org/abs/2602.17798

作者:Ibne Farabi Shihab,Sanjeda Akter,Anuj Sharma
摘要:Mixture-of-Experts models rely on learned routers to assign tokens to experts, yet standard softmax gating provides no principled mechanism to control the tradeoff between sparsity and utilization. We propose Grassmannian MoE (GrMoE), a routing framework that operates on the Grassmannian manifold of subspaces, where gating weights arise from the concentration parameters of Matrix Bingham distributions. This construction yields a single, interpretable knob -- the concentration matrix $Λ$ -- that continuously controls routing entropy, replacing discrete top-$k$ selection with a smooth, geometrically principled sparsity mechanism. We further develop an amortized variational inference procedure for posterior routing distributions, enabling uncertainty-aware expert assignment that naturally resists expert collapse. We formally prove tight bounds relating the Bingham concentration spectrum to routing entropy, expected top-$k$ mass, and an exponential bound on expert collapse, establishing the first formal theory of concentration-controlled sparsity. On synthetic routing tasks, a 350M-parameter MoE language model with 8 experts, a 1.3B-parameter model with 16 experts, and a 2.7B-parameter model with 32 experts, GrMoE achieves 0\% routing collapse across all seeds, comparable or better perplexity with 15--30\% improved load balance, and a smooth monotonic relationship between concentration and effective sparsity that enables post-hoc sparsity tuning without retraining. Token-level analysis reveals that experts learn heterogeneous concentration values that correlate with linguistic specialization, providing interpretable routing behavior.
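The core claim — a single concentration knob that smoothly trades routing entropy for sparsity — can be illustrated with a much simpler stand-in: scaling gating logits before a softmax. This is only a temperature analogy, not the paper's Matrix Bingham construction:

```python
import numpy as np

def routing_entropy(scores, concentration):
    """Entropy of a gating distribution whose logits are scaled by a concentration knob.
    (Softmax-temperature stand-in for the paper's Matrix Bingham gating.)"""
    z = concentration * scores
    p = np.exp(z - z.max())
    p /= p.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

scores = np.array([1.0, 0.6, 0.3, 0.1])    # one token's affinities to 4 experts
for lam in (0.1, 1.0, 10.0):
    print(f"concentration={lam:>4}: routing entropy={routing_entropy(scores, lam):.3f}")
```

Larger concentration yields lower entropy (sharper expert assignment), mirroring the monotone concentration-sparsity relationship the paper proves for its Bingham-based router.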

【25】MIDAS: Mosaic Input-Specific Differentiable Architecture Search
标题:MIDAS:马赛克式输入特定可微架构搜索
链接:https://arxiv.org/abs/2602.17700

作者:Konstanty Subbotko
摘要:Differentiable Neural Architecture Search (NAS) provides efficient, gradient-based methods for automatically designing neural networks, yet its adoption remains limited in practice. We present MIDAS, a novel approach that modernizes DARTS by replacing static architecture parameters with dynamic, input-specific parameters computed via self-attention. To improve robustness, MIDAS (i) localizes the architecture selection by computing it separately for each spatial patch of the activation map, and (ii) introduces a parameter-free, topology-aware search space that models node connectivity and simplifies selecting the two incoming edges per node. We evaluate MIDAS on the DARTS, NAS-Bench-201, and RDARTS search spaces. In DARTS, it reaches 97.42% top-1 on CIFAR-10 and 83.38% on CIFAR-100. In NAS-Bench-201, it consistently finds globally optimal architectures. In RDARTS, it sets the state of the art on two of four search spaces on CIFAR-10. We further analyze why MIDAS works, showing that patchwise attention improves discrimination among candidate operations, and the resulting input-specific parameter distributions are class-aware and predominantly unimodal, providing reliable guidance for decoding.

【26】EXACT: Explicit Attribute-Guided Decoding-Time Personalization
标题:EXACT:显式属性引导的解码时个性化
链接:https://arxiv.org/abs/2602.17695

作者:Xin Yu,Hanwen Xing,Lingzhou Xue
摘要 :Achieving personalized alignment requires adapting large language models to each user's evolving context. While decoding-time personalization offers a scalable alternative to training-time methods, existing methods largely rely on implicit, less interpretable preference representations and impose a rigid, context-agnostic user representation, failing to account for how preferences shift across prompts. We introduce EXACT, a new decoding-time personalization that aligns generation with limited pairwise preference feedback using a predefined set of interpretable attributes. EXACT first identifies user-specific attribute subsets by maximizing the likelihood of preferred responses in the offline stage. Then, for online inference, EXACT retrieves the most semantically relevant attributes for an incoming prompt and injects them into the context to steer generation. We establish theoretical approximation guarantees for the proposed algorithm under mild assumptions, and provably show that our similarity-based retrieval mechanism effectively mitigates contextual preference shifts, adapting to disparate tasks without pooling conflicting preferences. Extensive experiments on human-annotated preference datasets demonstrate that EXACT consistently outperforms strong baselines, including preference modeling accuracy and personalized generation quality.

【27】IRPAPERS: A Visual Document Benchmark for Scientific Retrieval and Question Answering
标题:IRPAPPERS:科学检索和问题解答的视觉文档基准
链接:https://arxiv.org/abs/2602.17687

作者:Connor Shorten,Augustas Skaburskas,Daniel M. Jones,Charles Pierse,Roberto Esposito,John Trengrove,Etienne Dilocker,Bob van Luijt
备注:23 pages, 6 figures
摘要:AI systems have achieved remarkable success in processing text and relational data, yet visual document processing remains relatively underexplored. Whereas traditional systems require OCR transcriptions to convert these visual documents into text and metadata, recent advances in multimodal foundation models offer retrieval and generation directly from document images. This raises a key question: How do image-based systems compare to established text-based methods? We introduce IRPAPERS, a benchmark of 3,230 pages from 166 scientific papers, with both an image and an OCR transcription for each page. Using 180 needle-in-the-haystack questions, we compare image- and text-based retrieval and question answering systems. Text retrieval using Arctic 2.0 embeddings, BM25, and hybrid text search achieved 46% Recall@1, 78% Recall@5, and 91% Recall@20, while image-based retrieval reaches 43%, 78%, and 93%, respectively. The two modalities exhibit complementary failures, enabling multimodal hybrid search to outperform either alone, achieving 49% Recall@1, 81% Recall@5, and 95% Recall@20. We further evaluate efficiency-performance tradeoffs with MUVERA and assess multiple multi-vector image embedding models. Among closed-source models, Cohere Embed v4 page image embeddings outperform Voyage 3 Large text embeddings and all tested open-source models, achieving 58% Recall@1, 87% Recall@5, and 97% Recall@20. For question answering, text-based RAG systems achieved higher ground-truth alignment than image-based systems (0.82 vs. 0.71), and both benefit substantially from increased retrieval depth, with multi-document retrieval outperforming oracle single-document retrieval. We analyze the complementary limitations of unimodal text and image representations and identify question types that require one modality over the other. The IRPAPERS dataset and all experimental code are publicly available.
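Recall@k as reported above is the fraction of queries whose gold page appears among the top-k retrieved results; a minimal sketch (the ranked lists and page ids below are made up):

```python
def recall_at_k(ranked_ids, gold_id, k):
    """1 if the gold document appears among the top-k results, else 0."""
    return int(gold_id in ranked_ids[:k])

# hypothetical (ranked page ids, gold page id) for three queries
runs = [(["p9", "p3", "p7"], "p3"),
        (["p1", "p2", "p4"], "p4"),
        (["p5", "p6", "p8"], "p0")]
for k in (1, 3):
    score = sum(recall_at_k(r, g, k) for r, g in runs) / len(runs)
    print(f"Recall@{k} = {score:.2f}")   # 0.00 at k=1, 0.67 at k=3
```

The large Recall@1-to-Recall@20 gaps in the abstract reflect exactly this metric's sensitivity to retrieval depth.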

【28】When & How to Write for Personalized Demand-aware Query Rewriting in Video Search
标题:视频搜索中个性化需求感知查询改写的时机与方法
链接:https://arxiv.org/abs/2602.17667

作者:Cheng cheng,Chenxing Wang,Aolin Li,Haijun Wu,Huiyun Hu,Juyuan Wang
摘要:In video search systems, user historical behaviors provide rich context for identifying search intent and resolving ambiguity. However, traditional methods utilizing implicit history features often suffer from signal dilution and delayed feedback. To address these challenges, we propose WeWrite, a novel Personalized Demand-aware Query Rewriting framework. Specifically, WeWrite tackles three key challenges: (1) When to Write: An automated posterior-based mining strategy extracts high-quality samples from user logs, identifying scenarios where personalization is strictly necessary; (2) How to Write: A hybrid training paradigm combines Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO) to align the LLM's output style with the retrieval system; (3) Deployment: A parallel "Fake Recall" architecture ensures low latency. Online A/B testing on a large-scale video platform demonstrates that WeWrite improves the Click-Through Video Volume (VV$>$10s) by 1.07% and reduces the Query Reformulation Rate by 2.97%.

【29】Trojans in Artificial Intelligence (TrojAI) Final Report
标题:人工智能中的特洛伊木马(TrojAI)最终报告
链接:https://arxiv.org/abs/2602.07152

作者:Kristopher W. Reese,Taylor Kulp-McDowall,Michael Majurski,Tim Blattner,Derek Juba,Peter Bajcsy,Antonio Cardone,Philippe Dessauw,Alden Dima,Anthony J. Kearsley,Melinda Kleczynski,Joel Vasanth,Walid Keyrouz,Chace Ashcraft,Neil Fendley,Ted Staley,Trevor Stout,Josh Carney,Greg Canal,Will Redman,Aurora Schmidt,Cameron Hickert,William Paul,Jared Markowitz,Nathan Drenkow,David Shriver,Marissa Connor,Keltin Grimes,Marco Christiani,Hayden Moore,Jordan Widjaja,Kasimir Gabert,Uma Balakrishnan,Satyanadh Gundimada,John Jacobellis,Sandya Lakkur,Vitus Leung,Jon Roose,Casey Battaglino,Farinaz Koushanfar,Greg Fields,Xihe Gu,Yaman Jandali,Xinqiao Zhang,Akash Vartak,Tim Oates,Ben Erichson,Michael Mahoney,Rauf Izmailov,Xiangyu Zhang,Guangyu Shen,Siyuan Cheng,Shiqing Ma,XiaoFeng Wang,Haixu Tang,Di Tang,Xiaoyi Chen,Zihao Wang,Rui Zhu,Susmit Jha,Xiao Lin,Manoj Acharya,Wenchao Li,Chao Chen
摘要:The Intelligence Advanced Research Projects Activity (IARPA) launched the TrojAI program to confront an emerging vulnerability in modern artificial intelligence: the threat of AI Trojans. These AI trojans are malicious, hidden backdoors intentionally embedded within an AI model that can cause a system to fail in unexpected ways, or allow a malicious actor to hijack the AI model at will. This multi-year initiative helped to map out the complex nature of the threat, pioneered foundational detection methods, and identified unsolved challenges that require ongoing attention by the burgeoning AI security field. This report synthesizes the program's key findings, including methodologies for detection through weight analysis and trigger inversion, as well as approaches for mitigating Trojan risks in deployed models. Comprehensive test and evaluation results highlight detector performance, sensitivity, and the prevalence of "natural" Trojans. The report concludes with lessons learned and recommendations for advancing AI security research.

【30】On the Generalization and Robustness in Conditional Value-at-Risk
标题:论条件风险价值的泛化性与鲁棒性
链接:https://arxiv.org/abs/2602.18053

作者:Dinesh Karthik Mulumudi,Piyushi Manupriya,Gholamali Aminian,Anant Raj
摘要:Conditional Value-at-Risk (CVaR) is a widely used risk-sensitive objective for learning under rare but high-impact losses, yet its statistical behavior under heavy-tailed data remains poorly understood. Unlike expectation-based risk, CVaR depends on an endogenous, data-dependent quantile, which couples tail averaging with threshold estimation and fundamentally alters both generalization and robustness properties. In this work, we develop a learning-theoretic analysis of CVaR-based empirical risk minimization under heavy-tailed and contaminated data. We establish sharp, high-probability generalization and excess risk bounds under minimal moment assumptions, covering fixed hypotheses, finite and infinite classes, and extending to $β$-mixing dependent data; we further show that these rates are minimax optimal. To capture the intrinsic quantile sensitivity of CVaR, we derive a uniform Bahadur-Kiefer type expansion that isolates a threshold-driven error term absent in mean-risk ERM and essential in heavy-tailed regimes. We complement these results with robustness guarantees by proposing a truncated median-of-means CVaR estimator that achieves optimal rates under adversarial contamination. Finally, we show that CVaR decisions themselves can be intrinsically unstable under heavy tails, establishing a fundamental limitation on decision robustness even when the population optimum is well separated. Together, our results provide a principled characterization of when CVaR learning generalizes and is robust, and when instability is unavoidable due to tail scarcity.
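At the empirical level, CVaR at level α is simply the average of the worst (1−α) fraction of losses; a minimal estimator sketch (the paper's truncated median-of-means robustification is omitted, and the heavy-tailed sample is synthetic):

```python
import numpy as np

def empirical_cvar(losses, alpha=0.95):
    """Average of losses at or above the empirical alpha-quantile (the VaR)."""
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, alpha)
    return float(losses[losses >= var].mean())

rng = np.random.default_rng(1)
losses = rng.standard_t(df=3, size=10_000)    # heavy-tailed loss sample
print(f"mean = {losses.mean():.3f}, CVaR_0.95 = {empirical_cvar(losses):.3f}")
```

The coupling the abstract highlights is visible here: the estimate depends on the data both through the quantile threshold and through the tail average above it.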

【31】Interactions that reshape the interfaces of the interacting parties
标题:重塑互动方界面的互动
链接:https://arxiv.org/abs/2602.17917

作者:David I. Spivak
备注:20 pages
摘要:Polynomial functors model systems with interfaces: each polynomial specifies the outputs a system can produce and, for each output, the inputs it accepts. The bicategory $\mathbb{O}\mathbf{rg}$ of dynamic organizations \cite{spivak2021learners} gives a notion of state-driven interaction patterns that evolves over time, but each system's interface remains fixed throughout the interaction. Yet in many systems, the outputs sent and inputs received can reshape the interface itself: a cell differentiating in response to chemical signals gains or loses receptors; a sensor damaged by its input loses a channel; a neural network may grow its output resolution during training. Here we introduce *polynomial trees*, elements of the terminal $(u\triangleleft u)$-coalgebra where $u$ is the polynomial associated to a universe of sets, to model such systems: a polynomial tree is a coinductive tree whose nodes carry polynomials, and in which each round of interaction -- an output chosen and an input received -- determines a child tree, hence the next interface. We construct a monoidal closed category $\mathbf{PolyTr}$ of polynomial trees, with coinductively-defined morphisms, tensor product, and internal hom. We then build a bicategory $\mathbb{O}\mathbf{rgTr}$ generalizing $\mathbb{O}\mathbf{rg}$, whose hom-categories parametrize morphisms by state sets with coinductive action-and-update data. We provide a locally fully faithful functor $\mathbb{O}\mathbf{rg}\to\mathbb{O}\mathbf{rgTr}$ via constant trees, those for which the interfaces do not change through time. We illustrate the generalization by suggesting a notion of progressive generative adversarial networks, where gradient feedback determines when the image-generation interface grows to a higher resolution.

【32】Topological Exploration of High-Dimensional Empirical Risk Landscapes: general approach, and applications to phase retrieval
标题:高维经验风险景观的拓扑探索:一般方法及其在相位恢复中的应用
链接:https://arxiv.org/abs/2602.17779

作者:Antoine Maillard,Tony Bonnaire,Giulio Biroli
备注:43 pages, 14 figures
摘要:We consider the landscape of empirical risk minimization for high-dimensional Gaussian single-index models (generalized linear models). The objective is to recover an unknown signal $\boldsymbolθ^\star \in \mathbb{R}^d$ (where $d \gg 1$) from a loss function $\hat{R}(\boldsymbolθ)$ that depends on pairs of labels $(\mathbf{x}_i \cdot \boldsymbolθ, \mathbf{x}_i \cdot \boldsymbolθ^\star)_{i=1}^n$, with $\mathbf{x}_i \sim \mathcal{N}(0, I_d)$, in the proportional asymptotic regime $n \asymp d$. Using the Kac-Rice formula, we analyze different complexities of the landscape -- defined as the expected number of critical points -- corresponding to various types of critical points, including local minima. We first show that some variational formulas previously established in the literature for these complexities can be drastically simplified, reducing to explicit variational problems over a finite number of scalar parameters that we can efficiently solve numerically. Our framework also provides detailed predictions for properties of the critical points, including the spectral properties of the Hessian and the joint distribution of labels. We apply our analysis to the real phase retrieval problem for which we derive complete topological phase diagrams of the loss landscape, characterizing notably BBP-type transitions where the Hessian at local minima (as predicted by the Kac-Rice formula) becomes unstable in the direction of the signal. We test the predictive power of our analysis to characterize gradient flow dynamics, finding excellent agreement with finite-size simulations of local optimization algorithms, and capturing fine-grained details such as the empirical distribution of labels. Overall, our results open new avenues for the asymptotic study of loss landscapes and topological trivialization phenomena in high-dimensional statistical models.

【33】Spectral Homogenization of the Radiative Transfer Equation via Low-Rank Tensor Train Decomposition
标题:通过低秩张量链分解实现辐射传递方程的谱均匀化
链接:https://arxiv.org/abs/2602.17708

作者:Y. Sungtaek Ju
备注:30 pages; submitted for publication
摘要:Radiative transfer in absorbing-scattering media requires solving a transport equation across a spectral domain with 10^5 - 10^6 molecular absorption lines. Line-by-line (LBL) computation is prohibitively expensive, while existing approximations sacrifice spectral fidelity. We show that the Young-measure homogenization framework produces solution tensors I that admit low-rank tensor-train (TT) decompositions whose bond dimensions remain bounded as the spectral resolution Ns increases. Using molecular line parameters from the HITRAN database for H2O and CO2, we demonstrate that: (i) the TT rank saturates at r = 8 (at tolerance e = 10^-6) from Ns = 16 to 4096, independent of single-scattering albedo, Henyey-Greenstein asymmetry, temperature, and pressure; (ii) quantized tensor-train (QTT) representations achieve sub-linear storage scaling; (iii) in a controlled comparison using identical opacity data and transport solver, the homogenized approach achieves over an order of magnitude lower L2 error than the correlated-k distribution at equal cost; and (iv) for atomic plasma opacity (aluminum at 60 eV, TOPS database), the TT rank saturates at r = 15 with fundamentally different spectral structure (bound-bound and bound-free transitions spanning 12 decades of dynamic range), confirming that rank boundedness is a property of the transport equation rather than any particular opacity source. These results establish that the spectral complexity of radiative transfer has a finite effective rank exploitable by tensor decomposition, complementing the spatial-angular compression achieved by existing TT and dynamical low-rank approaches.
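The bounded-rank claim can be probed with a plain TT-SVD: reshape, truncate each SVD at tolerance ε, and read off the bond dimensions. A sketch on a synthetic low-rank tensor standing in for the spectral solution tensor (the data is made up; only the decomposition procedure is standard):

```python
import numpy as np

def tt_ranks(tensor, eps=1e-6):
    """Bond dimensions from a TT-SVD with relative truncation tolerance eps."""
    dims = tensor.shape
    delta = eps * np.linalg.norm(tensor) / np.sqrt(max(len(dims) - 1, 1))
    ranks, r = [], 1
    mat = tensor.reshape(dims[0], -1)
    for d in dims[:-1]:
        mat = mat.reshape(r * d, -1)
        _, s, vt = np.linalg.svd(mat, full_matrices=False)
        tail = np.sqrt(np.cumsum(s[::-1] ** 2))[::-1]   # tail[i] = ||s[i:]||
        r = max(1, int(np.sum(tail > delta)))           # smallest rank meeting delta
        ranks.append(r)
        mat = s[:r, None] * vt[:r]
    return ranks

rng = np.random.default_rng(0)
# sum of two separable terms => every TT bond dimension should be 2
a, b, c = (rng.normal(size=(2, n)) for n in (8, 8, 8))
t = np.einsum("ri,rj,rk->ijk", a, b, c)
print(tt_ranks(t))   # [2, 2]
```

When the ranks returned this way stop growing as the spectral axis is refined, storage scales linearly in resolution rather than exponentially, which is the compression property the abstract exploits.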

机器翻译由腾讯交互翻译提供,仅供参考

